Is your user interface-driven test automation worth the effort?

I don’t know what’s happening on your side of things, dear reader, but in the projects I’m currently working on, and those that I have been working on in the recent past, there’s been a big focus on implementing user interface-driven test automation, almost always using some implementation of Selenium WebDriver (be it Java, C# or JavaScript). While that isn’t a bad thing in itself (I think Selenium is a great and powerful tool), I sometimes wonder whether all the effort that is being put into creating, stabilizing and maintaining these scripts is worth it in the end.

Recently, I’ve been thinking and talking about this question especially often, either when teaching different forms of my test automation awareness workshop, giving a talk on trusting your automation or just evaluating and thinking about my own work and projects. Yes, I too am sometimes guilty of getting caught up in the buzz of creating those perfectly stable, repeatable and maintainable Selenium tests, spending hours or sometimes even days on getting it right, thereby losing sight of the far more important questions of ‘why am I creating this test in the first place?’ and ‘will this test pay me back for the effort that I’m putting into creating it?’.

Sure, there are plenty of valid applications for user interface-driven tests. Here’s a little checklist that might be of use to you. Feel free to critique or add to it in the comments or via email. In my opinion, it is likely you’re creating a valuable test if all of these conditions apply:

  • The test simulates how an end user or customer interacts with the application and receives feedback from it (example: user searches for an item in a web shop, adds it to the cart, goes through checkout and receives feedback on the successful purchase)
  • There’s significant risk associated with the end user not being able to complete the interaction (example: not being able to complete a purchase and checkout leads to loss of revenue)
  • There’s no viable alternative available through which to perform the interaction (example: the web shop might provide an API that’s being called by the UI throughout the process, but this does not allow you to check that the end user is able to perform said interaction via the user interface. Web shop customers typically do not use APIs for their shopping needs…)
  • The test is repeatable (without an engineer having to intervene with regard to test environments or test data)

Checking all of the above boxes is no guarantee for a valuable user interface-driven test, but I tend to think it is far more likely you’re creating one if you do.

At the other end of the spectrum, a lot of useless (or ‘less valuable’ if you want the PC version) user interface-driven tests are created. And there’s more than one type of ‘useless’ here:

  • Tests that use the UI to test business logic that’s exposed through an API (use an API-level test instead; see the sketch after this list!) or implemented in code (how about those unit tests?). Testing at the wrong level leads to shallow feedback and increased execution time. Goodbye fast feedback.
  • Tests that are unreliable with regard to execution and result consistency. ‘Flaky’ is the word I see used a lot, but I prefer ‘unreliable’ or ‘untrustworthy’. ‘Flaky’ sounds like snow to me. And I like snow. I don’t like unreliable tests, though. And user interface-driven tests are the tests that are most likely to be unreliable, in my experience.
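To make that first point a bit more concrete, here’s a minimal sketch of what moving such a check down to the API level could look like. It uses Java 11’s built-in HttpClient, and the endpoint, payload and assertion are made-up assumptions, not a real web shop API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CartApiTest {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint: add item 42 to the cart via the API,
        // exercising the same business logic as clicking through the UI.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://shop.example.com/api/cart"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"itemId\": 42, \"quantity\": 1}"))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The business rule is checked here, in milliseconds,
        // without rendering a single page.
        if (response.statusCode() != 200) {
            throw new AssertionError("Expected 200 but got " + response.statusCode());
        }
    }
}
```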

What it boils down to is that these user interface-driven tests are by far the hardest to implement correctly. There’s so much to take care of: waiting for page loads and element state, proper exception handling, test data and test environment management. Granted, those last two are not limited to just this type of test, but I find that people who know how to work at the unit or API level are also far more likely to be able to work with mocks, stubs and other simulations to deal with issues related to test data or test environments.

Here’s a tweet by Alan Page that recently appeared in my timeline and that sums it all up pretty well:

So, having read this post, are you still sure that all these hours you’re putting into creating, stabilizing and maintaining your Selenium tests are worth it in the end? If so, I tip my hat to you. But for the majority of people working on user interface-driven tests (again, including myself), it wouldn’t hurt to take a step back every now and then, lose the ‘have to get it working’ tunnel vision and think for a while whether your tests are actually delivering enough value to justify the efforts put into creating them.

So, are your UI automation efforts worth it?

Remember what your tests are trying to verify

Lately, I’ve been working a lot on Selenium-based test automation solutions. And even though I’m still not overly enthusiastic about creating lots of user interface-driven automated tests, now that I’m getting more skilled at creating robust and stable Selenium tests, I am starting to appreciate the tool and what you can do with it more and more. As everybody working with Selenium can tell you, there are situations where things can get, well, interesting. And by interesting, I mean tricky. Dealing with dynamic front end frameworks and unpredictable modals and overlays can ask a lot of you in terms of debugging and exception handling skills.

Not yet being fluent in Selenese, I find myself on Google and (subsequently) StackOverflow a lot, trying to find suitable solutions for problems I encounter. While doing so, there’s one thing that I see a lot of people do, yet strikes me as really odd, given what I think is the purpose of these user interface-driven tests:

Forcing your tests to do something your users can’t.

From what I’ve understood, having studied and worked in the test automation field for a while now, user interface-driven tests, such as these Selenium tests, should be used to verify that the user of your user interface is able to complete a sequence of predefined actions. If you’re working in a hip and happening environment, these are often called ‘customer journeys’ or ‘user journeys’. But that’s not my point. My point is that quite often I see workarounds suggested that go beyond what a regular user could do with their keyboard and/or mouse.

For example, take an element that is invisible until you hover over it with your mouse. If you just try to do a click(), your test will probably throw an exception stating that the element was not visible. Now, there are (at least) two ways to deal with this situation:

  1. Use the Actions class to simulate a mouseover, then click the element.
  2. Use a JavascriptExecutor to perform the click.
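To make the two options concrete, here’s a minimal Java sketch. The locators and the ten-second wait are assumptions of mine, and it uses the Selenium 4 WebDriverWait signature:

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class HiddenElementClick {

    // Option 1: simulate the mouseover with the Actions class, wait for
    // the element to become clickable, then click it, just like a user.
    static void clickAfterHover(WebDriver driver, By menu, By item) {
        WebElement menuElement = driver.findElement(menu);
        new Actions(driver).moveToElement(menuElement).perform();
        new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.elementToBeClickable(item))
                .click();
    }

    // Option 2: fire the click through JavaScript, which works whether
    // or not the element is actually visible to a user.
    static void clickViaJavaScript(WebDriver driver, By item) {
        WebElement element = driver.findElement(item);
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", element);
    }
}
```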

While both approaches might result in a passing test, I am of the opinion that one is useful and the other is a horrible idea. Based on what I’ve written so far, can you guess which of the two options I’d suggest?

Indeed, option #1 is the way to go, for two reasons:

  • User interface-driven tests should mimic actual user interaction as closely as possible. I’ve never seen a user execute some JavaScript on a page to make an element visible. Either that, or I’m hanging around the wrong group of users…
  • What happens if a front-end developer makes a mistake (I know, they never do, but let’s assume so anyway) which causes the element not to become visible, even on a mouseover? With option #1, your test will fail, for the right reason. With #2, hello false negative!

There are some exceptions to the rule, though. The prime example I can think of is handling file uploads by directly sending the absolute path of the file to be uploaded to the input element responsible for the file upload using sendKeys(), instead of clicking on it and handling the file dialog. I’ve tried the latter before, and it’s a pain: first because you can’t do it with the standard Selenium API (the file dialog is native to the operating system), and second because different browsers use different file dialog layouts, resulting in a lot of pesky code that breaks easily. In this case, I prefer to bypass the file dialog altogether (it’s probably not the subject of my test anyway).
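In code, that bypass is essentially a one-liner. A minimal sketch, assuming the page has a standard input[type='file'] element and with a made-up file path:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class FileUploadExample {

    static void uploadFile(WebDriver driver) {
        // Send the absolute path straight to the <input type="file"> element
        // instead of clicking it and fighting the native OS file dialog.
        WebElement fileInput = driver.findElement(By.cssSelector("input[type='file']"));
        fileInput.sendKeys("/absolute/path/to/report.pdf");
    }
}
```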

In (almost) all other cases though, I’d advise you to stick to simulating your end user behavior as closely as possible. The job of your user interface-driven tests, and therefore of you as their creator, is not to force a pass, but to simulate end user interaction and see if that leads to a successfully executed scenario. Don’t fool yourself and your stakeholders with behind-the-scenes tricks that obscure potential user interface issues.

Remember what your tests are trying to verify.

Continuous Delivery and user interface-driven test automation: does that compute?

In this post, I’d like to take a closer look at the combination of Continuous Delivery on the one hand and automated tests at the user interface level on the other hand. Is this a match made in heaven? In hell? Or is the truth somewhere out there in between? (Hint: as with so many things in life, it is.)

Continuous Delivery (CD) is an approach in which software development teams produce software and release it into the production environment in very short cycles. Automation of building, testing and deploying the software is often a vital part of achieving CD. Since this is a blog on testing and test automation, I’ll focus on that, leaving the topics of build and deployment automation to those more experienced in that field.

Automated tests on the user interface level (such as those built using Selenium) traverse your complete application from the user interface to the database and back, and can therefore be considered end-to-end tests. These tests are often:

  • relatively slow to execute, since they require firing up an actual browser instance, rendering pages, dealing with page loading times, etc.,
  • demanding in terms of maintenance, since the user interface is among the components of an application that are most prone to change during the software life cycle, and
  • brittle, because objects on a web page or in an application are often dynamically generated and rendered, and because wait times are not always predictable, making synchronization a tough issue to tackle correctly.

So, we have CD striving for fast and reliable test feedback (CD is hard to do properly when you’ve got flaky tests stalling your builds) on the one hand, and user interface tests and their issues on the other. Is there a place for these tests in the CD pipeline? I’d like to argue there is, if they satisfy a number of criteria.

The tests are actually verifying the user interface
There’s a difference between verifying the user interface itself, and using the user interface to verify (business) logic implemented in lower layers of your application. If you’re merely using the user interface as a means to verify, for example, API or database logic, you should consider moving the test to that specific level. The test automation pyramid isn’t popular without reason. On the other hand, if it IS the user interface you’re testing, then you’re on the right track. But maybe there is a better option…

The user interface cannot be tested as a unit
Instead of verifying your user interface by running end-to-end tests, it might be worthwhile to see whether you can isolate the user interface in some way and test its logic as a unit instead. If this is possible, it will likely result in significant gains in terms of time needed to execute tests. I’ve recently written a blog post about this, so you might want to check that one out too.

The tests are verifying vital application logic
This point is more about the ‘what’ of the tests than the ‘how’. If you want to include end-to-end user interface-driven tests in your CD pipeline, they should verify business-critical application features or logic. In other words, ask yourself: ‘if this test fails, do we need to stop deploying into production?’ If the answer is yes, then the test has earned its place in the pipeline. If not, then maybe you should consider taking it out and running it periodically outside of the pipeline (no-one says that all tests need to be in the pipeline or that no testing can take place outside of the pipeline!), or maybe removing the test from your test set altogether, if it doesn’t provide enough value.

The tests are optimized in terms of speed and reliability
Once it’s clear that your user interface-driven end-to-end tests are worthy of being part of the CD pipeline, you should make sure that they’re as fast and stable as possible, to prevent unnecessarily long delivery times and blocked pipelines caused by flaky tests. For speed, make sure there are no superfluous waits in your tests (Thread.sleep(), anyone?), and if you have a lot of tests to execute that all need to run in the pipeline, see whether you can parallelize test execution and run them on different machines. For reliability, make sure your error handling is top notch. For example, you should avoid any StaleElementReferenceException occurrence in Selenium, something you can achieve by implementing proper wrapper methods, as sketched below.
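As an illustration, here’s a minimal sketch of such a wrapper method. The retry count and the ten-second timeout are arbitrary choices of mine, and it assumes the Selenium 4 WebDriverWait signature:

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.TimeoutException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ReliableClick {

    // Wrapper around click(): wait until the element is clickable, and if
    // the DOM is re-rendered between locating and clicking (which is what
    // throws StaleElementReferenceException), re-locate it and try again.
    static void safeClick(WebDriver driver, By locator) {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                new WebDriverWait(driver, Duration.ofSeconds(10))
                        .until(ExpectedConditions.elementToBeClickable(locator))
                        .click();
                return;
            } catch (StaleElementReferenceException e) {
                // The element reference went stale; loop and re-locate it.
            }
        }
        throw new TimeoutException("Element kept going stale: " + locator);
    }
}
```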

In short, I’d say you should free up a place for user-interface driven end-to-end tests in your CD pipeline, but it should be a very well earned place indeed.