On preventing your test suite from becoming too user interface-heavy

In August of last year, I published a blog post talking about why I don’t like to think of automation in terms of frameworks, but rather in terms of solutions. I’ve softened a little since then (this is probably a sign of me getting old..), but my belief that building a framework might lead to automation engineers subsequently trying to fit every test left, right and center into that framework still stands. One example of this phenomenon in particular I still see too often: engineers building a feature-rich end-to-end automation framework (for example using Selenium) and then automating all of their tests using that framework.

This is what I meant in the older post by ‘framework think’: because the framework has made it so easy to add new tests, engineers skip the step where they decide on the most efficient approach for a specific test and blindly add it to the test suite run by that very framework. This might not lead to harmful side effects in the short term, but as the test suite grows, chances are high that it becomes unwieldy, that a full test run takes unnecessarily long to complete and that the maintenance effort is no longer outweighed by the value the automated tests add.

In this post, I’d like to take the practical approach once more and demonstrate how you can take a closer look at your application and decide whether there might be a more efficient way to implement certain checks. We’re going to do this by opening up the user interface and seeing what happens ‘under the hood’. I’m writing this post as an addendum to my ‘Building great end-to-end tests with Selenium and Cucumber / SpecFlow’ course, by the way. Yes, that’s right: one of the first things I talk about during my course on writing tests with Selenium is when not to do so. I firmly believe that’s one of the very first steps towards creating a solid test suite: deciding what should not be in it.

The application under test
The application we’re going to write tests for is an online mortgage orientation tool, provided by a major Dutch online bank. I’ve removed all references to the client name, just to be sure, but it’s not like we’re dealing with sensitive data here. The orientation tool is a sequence of three forms, in which people that are interested in a mortgage fill in details about their financial situation, after which the orientation tool gives an indication of whether or not the applicant is eligible for a mortgage, as well as an estimate of the maximum amount of the mortgage, the interest rate and the monthly installments payable.

Our application under test - the mortgage orientation tool

What are we going to automate?
Now that we know what our application under test does, let’s see what we should automate. We’ll assume that there is a justified need for automated checks in the first place (otherwise this would have been a very short blog post!). We’ll also assume that, maybe for tests on some other part of the bank’s website, there is already a solid automation framework written around Selenium in place. So, this being a website and all, it makes sense to write some additional checks and incorporate them into the existing framework.

First of all, let’s try and make sure that the orientation tool can be used and completed, and that it displays a result. I’d say that would be a good candidate for an automated test written using Selenium, since it confirms that the application is working from an end user perspective (there is value in the test) and I can’t think of a lower level test that would give me the same feedback. Since there are a couple of different paths through the orientation tool (you can apply for a mortgage alone or with someone else, some people have a house to sell while others don’t, there are different types of contracts, etc.), I’d even go as far as to say you’ll need more than one Selenium-based test to be able to properly claim that all paths can be traversed by an end user.
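
To make this a little more concrete, here’s roughly what such a journey test could look like in Java with Selenium WebDriver and JUnit 5. Note that the URL, the locators and the expected text are all made up for the sake of the example; they obviously depend on how the orientation tool is actually built.

```java
// A minimal sketch of a UI-level journey test with Selenium WebDriver and JUnit 5.
// The URL, locators and expected result text are assumptions for illustration only.
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

import static org.junit.jupiter.api.Assertions.assertTrue;

class MortgageOrientationJourneyTest {

    private WebDriver driver;

    @BeforeEach
    void startBrowser() {
        driver = new ChromeDriver();
    }

    @Test
    void singleApplicantWithPermanentContractSeesAnIndication() {
        driver.get("https://bank.example/mortgage-orientation"); // hypothetical URL

        // Step 1: personal and financial situation (locators are made up)
        driver.findElement(By.id("applyingAlone")).click();
        driver.findElement(By.id("grossYearlyIncome")).sendKeys("45000");
        driver.findElement(By.id("nextButton")).click();

        // ... steps 2 and 3 of the form would follow here ...

        // Finally, wait for the result page and check that an indication is shown at all.
        // We only check that a result appears, not that the numbers are correct -
        // that is exactly the check we want to push down to the API or unit level.
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        String indication = wait.until(
                ExpectedConditions.visibilityOfElementLocated(By.id("mortgageIndication"))).getText();
        assertTrue(indication.contains("€"));
    }

    @AfterEach
    void stopBrowser() {
        driver.quit();
    }
}
```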

Next, I can imagine that you’d want to make sure that the numbers that are displayed are correct, so your customers aren’t misinformed when they complete the orientation tool. This would lead to some massive issues of distrust later on in the mortgage application process, I’d assume.. Since we’ve been able to add the previous tests so easily to our existing framework, it makes sense to add some more tests that walk through the forms, add the data required to trigger a specific expected outcome and verify that the result screen we saw in the screenshot above displays the expected numbers. Right?

No. Not right.

It’s highly likely that the business logic used to perform the calculation and serve the numbers displayed on screen isn’t actually implemented in the user interface. Rather, it’s probably served up by a backend service containing the business logic and rules required to perform the calculations (and with mortgages, there are quite a few of those business rules, I’ve been told..). The user interface takes the values entered by the end user, sends them to a backend service that performs calculations and returns the values indicating mortgage eligibility, interest rate, height of monthly installment, etc., which are then interpreted and displayed again by that same user interface.

So, since the business logic that we’re verifying isn’t implemented in the user interface, why use the UI to verify it in the first place? That would very likely only lead to unnecessarily slow tests and shallow feedback. Instead, let’s see if there’s a different hook we can use to write these checks.

I tend to use one of two tactics to find out whether there are better ways to write automated tests in cases like these:

  1. Talk to a developer. They’re building the stuff, so they’ll probably know more about the architecture of your application and will likely be happy to help you out.
  2. Use a network analysis tool such as Fiddler or Wireshark. Tools like these let you see what happens ‘under water’ when you’re using the user interface of a web application.

Normally, I’ll use a combination of both: first find out more about the architecture of an application by talking to developers, then use a network analyzer (I prefer Fiddler myself) to see what API calls are triggered when I perform a certain action.

Analyzing API calls using Fiddler
So, let’s put my assumption that there’s a better way to automate the tests that will verify the calculations performed by the mortgage orientation tool to the test. To do so, I’ll fire up Fiddler and have it monitor the traffic that’s being sent back and forth between my browser and the application server while I interact with the orientation tool. Here’s what that looks like:

Traffic exchanged between client and server in our mortgage orientation tool

As you can see, there’s a mortgage orientation API with a Calculate operation that returns exactly those numbers that appear on the screen. See the number I marked in yellow? It’s right there in the application screenshot I showed previously. This shows that pretty much all the front end does is perform calls to a backend API and present the data it returns in a manner attractive to the end user. This means it would not make sense to use the UI to verify the calculations. Instead, I’d advise you to mimic the API call (or sequence of calls), as this will give you both faster and more accurate feedback.
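
For illustration, here’s a minimal sketch of what such an API-level check could look like in Java with REST Assured. The host, endpoint path, request fields and expected values are assumptions based on the traffic shown above; in practice you’d replay the exact request you captured in Fiddler, and the expected figures would come from the business rules rather than from whatever the application happens to return today.

```java
// A minimal sketch of an API-level check using REST Assured, mimicking the Calculate
// call observed in Fiddler. Endpoint, payload and response fields are hypothetical.
import io.restassured.http.ContentType;
import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

class MortgageCalculationApiTest {

    @Test
    void singleIncomeOf45000YieldsExpectedMaximumMortgage() {
        given()
            .baseUri("https://bank.example")        // hypothetical host
            .basePath("/mortgage-orientation-api")  // hypothetical base path
            .contentType(ContentType.JSON)
            .body("{ \"grossYearlyIncome\": 45000, \"applyingAlone\": true }")
        .when()
            .post("/calculate")
        .then()
            .statusCode(200)
            // Expected value is made up here; in reality it follows from the business rules.
            .body("maximumMortgageAmount", equalTo(205000));
    }
}
```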

To take things even further, I’d recommend you to dive into the application even deeper and see if the calculations can be covered with a decent set of unit tests. The easiest way to do this is to start talking to a developer and see if this is a possibility, and if they haven’t already done so. No need to maintain two different sets of automated checks that cover the same logic, and no need to cover logic that can be tested through unit tests with API-level checks..
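
If the team does expose the calculation logic in a testable unit, such a check could look roughly like the sketch below. The MortgageCalculator class, its method and the expected figure are all hypothetical, purely to illustrate the idea; whether something like it exists is exactly what the conversation with a developer should tell you.

```java
// A sketch of unit-level coverage of the calculation logic, assuming a (hypothetical)
// MortgageCalculator class implemented by the development team.
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class MortgageCalculatorTest {

    @Test
    void maximumMortgageIsDerivedFromASingleIncome() {
        MortgageCalculator calculator = new MortgageCalculator(); // hypothetical production class

        int maximum = calculator.maximumMortgageFor(45000, /* applyingAlone */ true);

        // Expected value is made up for illustration; the real figure follows from the business rules.
        assertEquals(205000, maximum);
    }
}
```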

Often, though, I find that writing tests like this at the API level hits the sweet spot between coverage, the effort it takes to write the tests and the speed of execution (and, as a result, the length of the feedback loop). This might be because I’m not too well versed in writing unit tests myself, but it has worked pretty well for me so far.

Deciding what to automate where: a heuristic
The above has just been one example where it would be better (as well as easier) to move specific checks from the UI level to the API level. But can we make some more generic statements about when to use UI-level checks and when to dive deeper?

Yes, we can. And it turns out, someone already did! In a recent blog post called ‘UI Test Heuristic: Don’t Repeat Your Paths‘, Chris McMahon talked about this exact subject, and the heuristic he presents in his blog post applies here perfectly:

  • Check that the end user can complete the mortgage orientation tool and is shown an indication of mortgage eligibility and associated figures > different paths through the user interface > user interface-level tests
  • Check that the figures served up by the mortgage orientation tool are correct > repeating the same paths multiple times, but with different sets of input data and expected output values > time to dive deeper (see the sketch right after this list)
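
As a sketch of what ‘diving deeper’ looks like in practice, here’s how repeating the same path with different data could translate into a data-driven check below the UI, using JUnit 5’s parameterized tests against the same hypothetical Calculate call as before. The income / expected value pairs are made up for illustration; in reality they would come from the business rules or your test basis.

```java
// A data-driven API-level check: same path, different input data and expected outcomes.
// Host, path, fields and number pairs are hypothetical, for illustration only.
import io.restassured.http.ContentType;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

class MortgageCalculationDataDrivenTest {

    @ParameterizedTest
    @CsvSource({
            "30000, 135000",
            "45000, 205000",
            "60000, 280000"
    })
    void maximumMortgageMatchesBusinessRules(int grossYearlyIncome, int expectedMaximum) {
        given()
            .baseUri("https://bank.example")        // hypothetical host and path, see earlier sketch
            .basePath("/mortgage-orientation-api")
            .contentType(ContentType.JSON)
            .body(String.format("{ \"grossYearlyIncome\": %d, \"applyingAlone\": true }", grossYearlyIncome))
        .when()
            .post("/calculate")
        .then()
            .statusCode(200)
            .body("maximumMortgageAmount", equalTo(expectedMaximum));
    }
}
```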

So, if you want to prevent your automated test suite from becoming too bloated with UI tests, this is a rule of thumb you can (and frankly, should) apply. As always, I’d love to hear what you think.

Is your user interface-driven test automation worth the effort?

I don’t know what’s happening on your side of things, dear reader, but in the projects I’m currently working on, and those that I have been working on in the recent past, there’s been a big focus on implementing user interface-driven test automation, almost always using some implementation of Selenium WebDriver (be it Java, C# or JavaScript). While that isn’t a bad thing in itself (I think Selenium is a great and powerful tool), I sometimes wonder whether all the effort that is being put into creating, stabilizing and maintaining these scripts is worth it in the end.

Recently, I’ve been thinking and talking about this question especially often, whether teaching different forms of my test automation awareness workshop, giving a talk on trusting your automation, or simply evaluating my own work and projects. Yes, I too am sometimes guilty of getting caught up in the buzz of creating those perfectly stable, repeatable and maintainable Selenium tests, spending hours or even days on getting it right, thereby losing sight of the far more important questions: ‘why am I creating this test in the first place?’ and ‘will this test pay me back for the effort I’m putting into creating it?’.

Sure, there are plenty of valid applications for user interface-driven tests. Here’s a little checklist that might be of use to you. Feel free to critique or add to it in the comments or via email. In my opinion, it is likely you’re creating a valuable test if all of these conditions apply:

  • The test simulates how an end user or customer interacts with the application and receives feedback from it (example: user searches for an item in a web shop, adds it to the cart, goes through checkout and receives feedback on the successful purchase)
  • There’s significant risk associated with the end user not being able to complete the interaction (example: not being able to complete a purchase and checkout leads to loss of revenue)
  • There’s no viable alternative available through which to perform the interaction (example: the web shop might provide an API that’s being called by the UI throughout the process, but this does not allow you to check that the end user is able to perform said interaction via the user interface. Web shop customers typically do not use APIs for their shopping needs..)
  • The test is repeatable (without an engineer having to intervene with regards to test environments or test data)

Checking all of the above boxes is no guarantee that you’re creating a valuable user interface-driven test, but I tend to think it’s far more likely that you are if you do.

At the other end of the spectrum, a lot of useless (or ‘less valuable’ if you want the PC version) user interface-driven tests are created. And there’s more than one type of ‘useless’ here:

  • Tests that use the UI to test business logic that’s exposed through an API (use an API-level test instead!) or implemented in code (how about those unit tests?). Not testing at the right level leads to shallow feedback and increased execution times. Goodbye, fast feedback.
  • Tests that are unreliable with regards to execution and result consistency. ‘Flaky’ is the word I see used a lot, but I prefer ‘unreliable’ or ‘untrustworthy’. ‘Flaky’ sounds like snow to me. And I like snow. I don’t like unreliable tests, though. And user interface-driven tests are the tests that are most likely to be unreliable, in my experience.

What it boils down to is that these user interface-driven tests are by far the hardest to implement correctly. There’s so much to take care of: waiting for page loads and element state, proper exception handling, test data and test environment management. Granted, those last two are not limited to just this type of test, but I find that people who know how to work at the unit or API level are also far more likely to be able to work with mocks, stubs and other simulations to deal with issues related to test data or test environments.
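
To give an idea of the kind of plumbing I mean, here’s a small example of an explicit wait in Selenium (Java) that only clicks an element once it is actually clickable, instead of failing intermittently. The locator and timeout are of course placeholders.

```java
// An example of typical UI-test plumbing: wait for an element to become clickable
// before clicking it, rather than clicking immediately and hoping for the best.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

class WaitHelpers {

    // Wait up to the given timeout for the element to be clickable, then click it.
    static void clickWhenClickable(WebDriver driver, By locator, Duration timeout) {
        WebElement element = new WebDriverWait(driver, timeout)
                .until(ExpectedConditions.elementToBeClickable(locator));
        element.click();
    }
}
```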

A tweet by Alan Page that recently appeared in my timeline sums it all up pretty well.

So, having read this post, are you still sure that all these hours you’re putting into creating, stabilizing and maintaining your Selenium tests are worth it in the end? If so, I tip my hat to you. But for the majority of people working on user interface-driven tests (again, including myself), it wouldn’t hurt to take a step back every now and then, lose the ‘have to get it working’ tunnel vision and think for a while whether your tests are actually delivering enough value to justify the efforts put into creating them.

So, are your UI automation efforts worth it?

Remember what your tests are trying to verify

Lately, I’ve been working a lot on Selenium-based test automation solutions. And even though I’m still not overly enthusiastic about creating lots of user interface-driven automated tests, now that I’m getting more skilled at creating robust and stable Selenium tests, I am starting to appreciate the tool and what you can do with it more and more. As everybody working with Selenium can tell you, there are situations where things can get, well, interesting. And by interesting, I mean tricky. Dealing with dynamic front end frameworks and unpredictable modals and overlays can ask a lot of you in terms of debugging and exception handling skills.

Not yet being fluent in Selenese, I find myself on Google and (subsequently) StackOverflow a lot, trying to find suitable solutions for problems I encounter. While doing so, there’s one thing I see a lot of people do that strikes me as really odd, given what I think is the purpose of these user interface-driven tests:

Forcing your tests to do something your users can’t.

From what I’ve understood, having studied and worked in the test automation field for a while now, user interface-driven tests, such as these Selenium tests, should be used to verify that the user of your user interface is able to complete a sequence of predefined actions. If you’re working in a hip and happening environment, these are often called ‘customer journeys’ or ‘user journeys’. But that’s not my point. My point is that quite often I see workarounds suggested that go beyond what a regular user could do with his or her keyboard and / or mouse.

For example, take an element that is invisible until you hover over it with your mouse. If you just try to do a click(), your test will probably throw an exception stating that the element was not visible. Now, there are (at least) two ways to deal with this situation (both are sketched in the example below):

  1. Use the Actions class to simulate a mouseover, then click the element.
  2. Use a JavaScriptExecutor to perform the click.
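
To make the comparison concrete, here’s roughly what both options look like in Java. The scenario (a menu item that only becomes visible when you hover over its parent menu) and the locators are made up for illustration.

```java
// Two ways to click an element that is only visible after hovering over its parent.
// Locators are hypothetical.
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.Actions;

class HiddenElementClickExamples {

    // Option 1: simulate what a real user does - hover over the menu, then click the item.
    static void clickViaHover(WebDriver driver) {
        WebElement menu = driver.findElement(By.id("mainMenu"));
        WebElement item = driver.findElement(By.id("hiddenMenuItem"));
        new Actions(driver)
                .moveToElement(menu)
                .click(item)
                .perform();
    }

    // Option 2: force the click through JavaScript, regardless of whether the element
    // is visible to a real user.
    static void clickViaJavaScript(WebDriver driver) {
        WebElement item = driver.findElement(By.id("hiddenMenuItem"));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", item);
    }
}
```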

While both approaches might result in a passing test, I am of the opinion that one is useful and the other is a horrible idea. Based on what I’ve written so far, can you guess which of the two options I’d suggest?

Indeed, option #1 is the way to go, for two reasons:

  • User interface-driven tests should mimic actual user interaction as closely as possible. I’ve never seen a user execute some JavaScript on a page to make an element visible. Either that, or I’m hanging around the wrong group of users..
  • What happens if a front-end developer makes a mistake (I know, they never do, but let’s assume so anyway) which causes the element not to become visible, even on a mouseover? With option #1, your test will fail, for the right reason. With #2, hello false negative!

There are some exceptions to the rule, though. The prime example I can think of is handling file uploads by directly sending the absolute path of the file to be uploaded to the input element responsible for the file upload using sendKeys(), instead of clicking on it and handling the file dialog. I’ve tried the latter before, and it’s a pain, first because you can’t do it with the standard Selenium API (because the file dialog is native to the operating system), second because different browsers use different file dialog layouts, resulting in a lot of pesky code that easily breaks down. In this case, I prefer to bypass the file dialog altogether (it’s probably not the subject of my test anyway).
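
For reference, this is roughly what that workaround looks like in Java; the locator and file path are just for illustration.

```java
// The file upload exception: send the absolute path of the file straight to the
// <input type="file"> element instead of clicking it and fighting the native dialog.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

import java.nio.file.Path;
import java.nio.file.Paths;

class FileUploadExample {

    static void uploadDocument(WebDriver driver) {
        Path document = Paths.get("src", "test", "resources", "payslip.pdf").toAbsolutePath();

        // No click() on the upload control: typing the path into the file input
        // bypasses the operating system's file dialog entirely.
        driver.findElement(By.cssSelector("input[type='file']")).sendKeys(document.toString());
    }
}
```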

In (almost) all other cases, though, I’d advise you to stick to simulating your end users’ behavior as closely as possible. The job of your user interface-driven tests, and therefore of you as their creator, is not to force a pass, but to simulate end user interaction and see if that leads to a successfully executed scenario. Don’t fool yourself and your stakeholders with under-water tricks that obscure potential user interface issues.

Remember what your tests are trying to verify.