Do you check your automated checks?

Quis custodiet ipsos custodes? (Who will guard the guards?)

– Juvenal (Roman poet), from Satires (Satire VI, lines 347–8)

You’ve been coding away on your automated tests for weeks or even months on end, creating a test suite that you think is up there with the best of them. Worthy of a place in the test automation Champions League, should such a competition exist. But is your work really that good? While you may think it is, do you have the proof to back it up? In this post, I would like to explain why you should make a habit of testing your tests, or, in light of the checking vs. testing debate, of checking your checks.

Why would you want to check your checks?
To make the case for checking your checks, let’s take a look at what might happen when you don’t do so regularly. Essentially, there are two things that can (and will) occur:

1. Your checks will go bad
Checks that are left unattended once created might, either right from the get-go or over time as the application under test evolves, start to:

  • Fail consistently. This will probably be detected and fixed as soon as your build monitor starts turning red. When you do Continuous Delivery or Continuous Deployment, this is even more likely, as consistently failing checks will stall the delivery process. And people will start to notice that!
  • Fail intermittently. Also known as ‘flakiness’. Incredibly annoying, as it’s usually a lot harder to pinpoint the root cause than for checks that fail all the time. Still, the value of flaky checks is about the same as that of consistently failing checks: zero.
  • Always return true. This one might turn out to be very hard to detect, because your runs will stay green and therefore chances are that nobody will feel the urge to review the checks that are performed. They are passing, so everything must be fine, right? Right?
  • Check the wrong thing. This one may or may not be easy to spot, depending on what the check IS actually checking. If it fails and the root cause of the failure is analyzed, one of two things might happen: either the check is ‘fixed’, for example by changing the expected value, while the wrong thing keeps being checked, or deeper analysis reveals that the wrong thing has been checked all along and the check itself is fixed (or even removed).

No matter what the root cause is, the end result of any of these types of check defects is the same: your checks will go bad and will therefore be worthless.

[Image: High quality software or just bad checks?]

And I bet that’s not what you had in mind when you created all of these automated scripts! Also, this problem isn’t solved by simply looking at the output report for your automated test runs, as doing so won’t tell you anything about the effectiveness of your checks. It will only tell you what percentage of your checks passed, which tells only half the story.
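To make these failure modes a bit more concrete, here is a minimal sketch, in Python, of what an automated audit of individual checks could look like. It assumes that each check can be built against an interchangeable implementation of the code under test. The function `double`, its deliberately broken variant `broken_double`, and the names `audit`, `passes` and the example checks are all hypothetical, used only for illustration. The sketch reruns every check to separate consistent failures from flaky ones, then runs it once against the broken variant to catch checks that stay green no matter what.

```python
import itertools
from typing import Callable

Impl = Callable[[int], int]      # a (hypothetical) implementation of the code under test
Check = Callable[[], None]       # a check raises AssertionError when it fails
CheckFactory = Callable[[Impl], Check]


def passes(check: Check) -> bool:
    """Run a single check once; True if it passes, False if an assertion fails."""
    try:
        check()
        return True
    except AssertionError:
        return False


def audit(factory: CheckFactory, correct: Impl, broken: Impl, runs: int = 10) -> str:
    """Classify a check: rerun it against the correct implementation to expose
    consistent and intermittent failures, then run it once against a deliberately
    broken implementation to expose checks that stay green regardless."""
    results = [passes(factory(correct)) for _ in range(runs)]
    if not any(results):
        return "fails consistently"
    if not all(results):
        return "flaky"
    if passes(factory(broken)):
        return "passes even against broken code"
    return "ok"


# Hypothetical code under test and a deliberately broken variant of it.
def double(x: int) -> int:
    return 2 * x


def broken_double(x: int) -> int:
    return x + 2


_run_counter = itertools.count(1)


def good_check(impl: Impl) -> Check:
    def check() -> None:
        assert impl(5) == 10
    return check


def wrong_expectation_check(impl: Impl) -> Check:
    def check() -> None:
        assert impl(5) == 11  # the expected value itself is wrong
    return check


def flaky_check(impl: Impl) -> Check:
    def check() -> None:
        # Simulated flakiness: every third run fails, independent of the code under test
        assert impl(5) == 10 and next(_run_counter) % 3 != 0
    return check


def weak_check(impl: Impl) -> Check:
    def check() -> None:
        assert impl(2) == 4  # 2 * 2 == 2 + 2, so this cannot tell the variants apart
    return check


if __name__ == "__main__":
    for factory in (good_check, wrong_expectation_check, flaky_check, weak_check):
        print(f"{factory.__name__}: {audit(factory, double, broken_double)}")
```

Having a deliberately broken variant at hand is the key design choice here. Without it, a check that always returns true looks exactly like a healthy one.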

2. Your check coverage will go bad
Looking at the effectiveness of your checks from a different perspective reveals another problem:

  • You may have implemented too many checks. Yes, this is actually possible! Too many checks increase maintenance effort and test run (and therefore feedback) time. This is especially significant for UI-driven checks performed by tools such as Selenium WebDriver.
  • You may have implemented too few checks. This has a negative impact on coverage. Not that coverage is the be-all and end-all (far from it), but I’m sure we can agree that doing too few checks can’t be a good thing when you want to release a quality product.

I think both of these reasons warrant a periodic check validation. In such a validation, all checks performed by your automated test suite should be evaluated on the characteristics mentioned above. Checks that are no longer useful should be rewritten or removed, and any shortcomings in your test coverage should be addressed.

[Image: Check your automated checks!]

Automating the check validation
Of course, since we’re here to talk about test automation, it would be even better to perform such a validation in an automated manner. But is such a thing even possible?

Unit test level checks
It turns out that for checks at the unit test level (which should make up a significant part of your checks anyway), checking your checks can be done using mutation testing. Even better, one of the main purposes of mutation testing IS to check the validity and usefulness of your checks. A perfect match, so to speak. I am still pretty new to the concept of mutation testing, but so far I have found it a fascinating and potentially very powerful technique (if applied correctly, of course).
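To give an idea of how mutation testing works under the hood, here is a deliberately tiny, hand-rolled sketch in Python using the standard `ast` module. Real tools (such as PIT for Java or mutmut for Python) are far more capable. The function `shipping_cost`, the two test suites and the mutation operators chosen here are hypothetical and serve only as an illustration. Each mutant changes exactly one comparison operator or integer constant. A test suite ‘kills’ a mutant when at least one of its checks fails against it.

```python
import ast
import copy
from typing import Callable, Iterator, Tuple

# Hypothetical function under test: shipping is free from an order total of 50 upwards.
SOURCE = '''
def shipping_cost(order_total):
    if order_total >= 50:
        return 0
    return 5
'''

# Mutation operators (chosen for illustration): shift comparison boundaries, bump integer literals.
SWAPS = {ast.GtE: ast.Gt, ast.Gt: ast.GtE, ast.LtE: ast.Lt, ast.Lt: ast.LtE}


class Mutator(ast.NodeTransformer):
    """Applies exactly one mutation: the mutation site numbered `target` in traversal order."""

    def __init__(self, target: int):
        self.target = target
        self.sites = 0  # number of mutation sites seen so far

    def _is_target(self) -> bool:
        hit = self.sites == self.target
        self.sites += 1
        return hit

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        node.ops = [SWAPS[type(op)]() if type(op) in SWAPS and self._is_target() else op
                    for op in node.ops]
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        value = node.value
        if isinstance(value, int) and not isinstance(value, bool) and self._is_target():
            return ast.copy_location(ast.Constant(value + 1), node)
        return node


def mutants(source: str) -> Iterator[ast.Module]:
    """Yield one mutated syntax tree per mutation site in `source`."""
    tree = ast.parse(source)
    counter = Mutator(target=-1)  # target -1 never matches: this pass only counts sites
    counter.visit(copy.deepcopy(tree))
    for site in range(counter.sites):
        yield ast.fix_missing_locations(Mutator(target=site).visit(copy.deepcopy(tree)))


def load(tree: ast.Module, name: str) -> Callable:
    """Compile a (mutated) module and return the function called `name` from it."""
    namespace: dict = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace[name]


def weak_tests(fn: Callable) -> None:
    assert fn(100) == 0
    assert fn(10) == 5


def strong_tests(fn: Callable) -> None:
    weak_tests(fn)
    assert fn(50) == 0  # boundary values
    assert fn(49) == 5


def mutation_score(source: str, name: str, tests: Callable) -> Tuple[int, int]:
    """Return (killed, total): a mutant is killed when any check in `tests` fails on it."""
    killed = total = 0
    for tree in mutants(source):
        total += 1
        try:
            tests(load(tree, name))
        except Exception:  # failed assertions and crashes both count as kills
            killed += 1
    return killed, total


if __name__ == "__main__":
    print("weak suite:  ", mutation_score(SOURCE, "shipping_cost", weak_tests))
    print("strong suite:", mutation_score(SOURCE, "shipping_cost", strong_tests))
```

With this hypothetical code, the weak suite kills only the two mutants that change the return values (2 of 4). The boundary-value checks in the strong suite kill all four. The surviving mutants point straight at the checks that are missing.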

API and UI level checks
Unfortunately, mutation testing is mainly geared towards unit and (to a lesser extent) unit integration tests. I am not aware of any tools that actively support check validation for API and UI tests. If anyone reading this knows of a tool, please do let me know!

One reason such a tool might not exist is that it would be much harder to apply the basic concept of mutation testing to API and UI-level checks, if only because you would need to:

  1. Mutate the compiled application or the source code (and subsequently compile it) to create a mutant
  2. Deploy the mutant
  3. Run tests to see whether the mutant can be killed

for every possible mutant, or at least a reasonable subset of all possible mutants. As you can imagine, this would require a lot of time and computing power, which conflicts with the basic principle of fast feedback for automated checks.
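To put a rough number on ‘a lot of time’, here is a back-of-the-envelope sketch in Python. All durations are made-up assumptions, not measurements: 2 minutes to build, 3 to deploy and 15 to run the API or UI checks per mutant. With those figures, 500 mutants take roughly 167 hours on a single test environment, and still about 17 hours when spread over ten environments running in parallel.

```python
import math
from dataclasses import dataclass


@dataclass
class MutantCycle:
    """Hypothetical time, in minutes, needed for one mutant at the API or UI level."""
    build: float
    deploy: float
    test_run: float

    def minutes(self) -> float:
        return self.build + self.deploy + self.test_run


def total_hours(mutant_count: int, cycle: MutantCycle, environments: int = 1) -> float:
    """Wall-clock hours to build, deploy and test every mutant, with mutants spread
    over `environments` test environments running in parallel."""
    rounds = math.ceil(mutant_count / environments)
    return rounds * cycle.minutes() / 60


if __name__ == "__main__":
    cycle = MutantCycle(build=2, deploy=3, test_run=15)  # made-up figures: 20 minutes per mutant
    print(f"{total_hours(500, cycle):.1f} h on one environment")
    print(f"{total_hours(500, cycle, environments=10):.1f} h on ten environments")
```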

One possible approach that covers at least part of the objectives of mutation testing would be to negate all of your checks and see whether all of them fail. This might give you some interesting information about the defect-finding capabilities of your checks. However, it doesn’t tell you much about the coverage of your checks, so this very crude approach would only be part of the solution.
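The negation idea can be sketched in Python as follows. The sketch assumes that all checks route their assertions through a single shared helper, the hypothetical `Verifier` class, so that every comparison can be inverted in one place. `page_title` is a hypothetical stand-in for whatever the checks exercise. Any check that still passes with its comparisons negated never really verified anything.

```python
from typing import Callable, List


class Verifier:
    """Hypothetical assertion helper shared by all checks; it can invert every comparison."""

    def __init__(self, negate: bool = False):
        self.negate = negate

    def equals(self, actual, expected) -> None:
        matches = actual == expected
        # Normal mode fails on a mismatch; negated mode fails on a match.
        if matches == self.negate:
            raise AssertionError(f"expected {expected!r}, got {actual!r} (negated={self.negate})")


def page_title(page: str) -> str:
    """Hypothetical stand-in for the application under test."""
    return {"login": "Log in", "home": "Welcome"}[page]


def check_login_title(v: Verifier) -> None:
    v.equals(page_title("login"), "Log in")


def check_swallowed_assertion(v: Verifier) -> None:
    try:
        v.equals(page_title("login"), "Log in")
    except AssertionError:
        pass  # a defect in the check: failures are silently ignored


def check_without_assertion(v: Verifier) -> None:
    page_title("home")  # exercises the code but verifies nothing


def passes(check: Callable[[Verifier], None], negate: bool) -> bool:
    try:
        check(Verifier(negate))
        return True
    except AssertionError:
        return False


def suspicious_checks(checks: List[Callable[[Verifier], None]]) -> List[str]:
    """Names of checks that pass normally AND still pass with every comparison negated."""
    return [c.__name__ for c in checks if passes(c, negate=False) and passes(c, negate=True)]


if __name__ == "__main__":
    print(suspicious_checks([check_login_title, check_swallowed_assertion, check_without_assertion]))
```

A check with several assertions stops at the first negated one, so this only tells you something about that first assertion. That is one more reason the approach is crude.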

The above does give yet another reason for implementing as many of your checks as you can as unit tests, though. Not only are they easier to create and faster to run, but their usefulness can also be proven much more precisely in an automated manner (i.e., using mutation testing).

So, as a wrap-up, do you check your automated checks? If so, how? I’m very curious to read new insights on this matter!

P.S.: Richard Bradshaw recently wrote a great blog post on the same topic. I highly recommend reading it as well; you can find it over here.

The test automation pyramid

In his book Succeeding with Agile, Mike Cohn describes the concept of a test automation pyramid, which distinguishes three levels of test automation and shows how they relate to each other and how important each of them is. As an advocate of minimizing user interface-based test automation, I wholeheartedly support this pyramid, which is why I decided to share it with you.

Graphically, the test automation pyramid as proposed by Mike Cohn looks like this:
[Figure: The test automation pyramid]
Base layer: unit tests
Unit tests form the base layer of every solid automated testing approach. They can be written relatively quickly and give the programmer very specific information about the origins of a bug, up to the exact line of code where a failure occurs. Compare this to a typical bug report from a tester, which usually reads more like ‘function X, Y and / or Z are not working when I enter A or B, now go fix it’. Such a report often requires more analysis (reproduction, debugging) and therefore more time from the developer to fix things.

Another advantage of unit tests is that not only can they be written quickly, they also execute very fast, giving the developer immediate feedback on code quality.
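As an illustration of how small and fast such tests are, here is a minimal sketch using Python’s built-in `unittest` module. The `apply_discount` function is hypothetical and stands in for any small unit of code.

```python
import unittest


def apply_discount(price_cents: int, percentage: int) -> int:
    """Hypothetical unit under test: price (in cents) after a percentage discount."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return price_cents - price_cents * percentage // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_no_discount_keeps_the_price(self):
        self.assertEqual(apply_discount(1999, 0), 1999)

    def test_ten_percent_discount(self):
        self.assertEqual(apply_discount(2000, 10), 1800)

    def test_rejects_negative_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(2000, -5)
```

Saved in a module named, for example, `test_discount.py`, the suite runs in milliseconds with `python -m unittest test_discount`. If `apply_discount` breaks, the report names the failing test method and shows the traceback down to the offending line.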

Possible drawbacks of unit tests are that they mostly focus on small pieces of code (methods, classes) and are therefore unable to detect integration or system level bugs. Also, as they are written in code, they are written mostly by developers and not by testers. Ideally, unit tests should be written by someone other than the developer of the code that is being tested.

Top layer: user interface-level tests
Let’s skip the middle layer for a moment and go straight to the top of the pyramid, where the UI-level automated tests reside. Ideally, you would want to have as few of these as possible, as they are often the most brittle and take the longest both to develop and to execute. In my opinion, this form of test automation should only be used when the UI itself is actually being tested rather than the underlying system functionality, or when there is no viable alternative. Such an alternative is available more often than you’d think, by the way.

Middle layer: service and API tests
For tests that go beyond the scope of unit tests, it is strongly advisable to use tests that communicate with the application under test at the service or API level. Most modern applications offer some sort of API (either an actual programming interface or a web service exposing functionality to the outside world) that testers can use to test those applications. These tests are often far less brittle (as service interfaces and APIs tend to change far less often than user interfaces) and execute far more quickly, with fewer false negatives.
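For comparison with UI automation, here is a minimal sketch of an API-level check in Python that uses only the standard library. The order endpoint (`/orders/<id>`) and its JSON payload are hypothetical. A tiny in-process server stands in for the real application so that the example is self-contained. The check talks to the application over HTTP and inspects status codes and data directly, with no browser or page rendering involved.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OrderApiHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the application's API: GET /orders/<id> returns JSON."""

    ORDERS = {"42": {"id": "42", "status": "shipped"}}

    def do_GET(self) -> None:
        order = None
        if self.path.startswith("/orders/"):
            order = self.ORDERS.get(self.path[len("/orders/"):])
        body = json.dumps(order if order else {"error": "not found"}).encode()
        self.send_response(200 if order else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the check output free of request logging


def get_json(url: str):
    """Send a GET request and return (HTTP status code, decoded JSON body)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as error:
        return error.code, json.loads(error.read())


def check_order_api(base_url: str) -> None:
    """API-level checks: status codes and data, no browser involved."""
    status, order = get_json(f"{base_url}/orders/42")
    assert status == 200 and order["status"] == "shipped"
    status, _ = get_json(f"{base_url}/orders/999")
    assert status == 404


def run_api_checks() -> bool:
    server = HTTPServer(("127.0.0.1", 0), OrderApiHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        check_order_api(f"http://127.0.0.1:{server.server_address[1]}")
        return True
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("API checks passed:", run_api_checks())
```

Against a real application, only `check_order_api` and `get_json` would remain, pointed at the application’s actual base URL.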

The inverted test automation pyramid
It is no coincidence that Mike Cohn calls this middle layer the forgotten layer of test automation. All too often, test cases that cannot be covered by developers in unit tests are directly automated on the user interface level, resulting in big sets of UI level automated tests that take eons to execute and maintain. This phenomenon is represented by an inverted test automation pyramid:
[Figure: The inverted test automation pyramid]
In extreme cases, the middle layer doesn’t even exist in the overall test automation approach. Needless to say, in most test automation projects that resemble this inverted pyramid, a lot of money is wasted unnecessarily on developing and maintaining the automated test cases.

My advice
I’d therefore advise any test automation specialist who would like to make his or her project a success (and who doesn’t?) to do a couple of things:

  • Get familiar with unit testing, its benefits and the way the developers in your project use it. Try to understand what they test and what coverage is achieved in unit testing. Work with your developers to see where this coverage can be increased further and what you can do to achieve that.
  • For those tests that are not covered by unit tests, try to find out whether the application you are testing offers an API that you can use to base your automated tests on. Initially, testing ‘under the hood’ without a UI to drive tests might seem challenging, but the benefits are definitely worth it.

Following these two suggestions will help you greatly in getting your test automation pyramid in the right shape. What does your pyramid look like?

Final note: Although some articles on the Internet go so far as to suggest an ‘ideal’ mix of the pyramid layers (80% unit tests, 15% API tests and 5% UI tests, for example), I don’t think there is a single ‘ideal’ mix. It depends a lot on the type of application under test and on the skill sets of your developers and especially your testers. However, the shape of the test automation pyramid should be roughly the same in any case.
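Because the overall shape matters more than the exact mix, a small script can help you keep an eye on it. Below is a minimal Python sketch that takes the number of automated tests per layer (how you count them is up to you) and reports the percentage mix and a rough shape. The classification rules are assumptions made for this illustration, not ratios taken from Mike Cohn’s book. The 80/15/5 example simply reuses the figures mentioned above.

```python
from typing import Dict


def layer_mix(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of automated tests per layer, in percent (rounded to one decimal)."""
    total = sum(counts.values())
    return {layer: round(100 * n / total, 1) for layer, n in counts.items()}


def pyramid_shape(unit: int, api: int, ui: int) -> str:
    """Rough shape classification; the rules are illustrative assumptions, not fixed ratios."""
    if unit > api > ui:
        return "pyramid"
    if ui > unit:
        return "inverted pyramid" + (" without a middle layer" if api == 0 else "")
    return "irregular"


if __name__ == "__main__":
    healthy = {"unit": 800, "api": 150, "ui": 50}
    print(layer_mix(healthy), pyramid_shape(**healthy))
    print(pyramid_shape(unit=40, api=0, ui=300))
```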