Do you check your automated checks?

Quis custodiet ipsos custodes? (Who will guard the guards?)

– Juvenal (Roman poet), from Satires (Satire VI, lines 347–8)

You’ve been coding away on your automated tests for weeks or even months on end, creating a test suite that you think is up there with the best of them. Worthy of a place in the test automation Champions League, should such a competition exist. But is your work really that good? While you may think it is, do you have the proof to back it up? In this post, I would like to explain why you should make a habit of testing your tests, or, in light of the checking vs. testing debate, of checking your checks.

Why would you want to check your checks?
To make the case for checking your checks, let’s take a look at what might happen when you don’t do so regularly. Essentially, there are two things that can (and will) occur:

1. Your checks will go bad
Checks that are left unattended once created might – either right from the get-go or over time as the application under test evolves – start to:

  • Fail consistently. This will probably be detected and fixed as soon as your build monitor starts turning red. When you do Continuous Delivery or Continuous Deployment, this is even more likely, as consistently failing checks will stall the delivery process. And people will start to notice that!
  • Fail intermittently. Also known as ‘flakiness’. Incredibly annoying as it’s usually a lot harder to pinpoint the root cause compared to checks that fail all the time. Still, the value of flaky checks can be considered about the same as for consistently failing checks: zero.
  • Always return true. This one might turn out to be very hard to detect, because your runs will keep coming up green, and therefore chances are that nobody will feel the urge to review the checks that are performed. They are passing, so everything must be fine, right? Right? (See the sketch right after this list for what such a check can look like.)
  • Check the wrong thing. This one may or may not be easy to spot, depending on what it IS checking. If it fails and the root cause of the failure is analyzed, one of two things might happen: either the check is ‘fixed’ – for example by changing the expected value – but the wrong check keeps being executed, or deeper analysis reveals that the wrong check has been performed all along and the check itself is fixed (or even removed).
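A check that always returns true does not have to look obviously broken, by the way. Below is a minimal, hypothetical JUnit example (the Invoice class and the 21% VAT rate are made up purely for illustration): the assertion compares a computed value to itself, so it can never fail, no matter what the code under test does.

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class InvoiceTotalTest {

    // Hypothetical class under test: adds 21% VAT to a net amount
    static class Invoice {
        private final double netAmount;

        Invoice(double netAmount) {
            this.netAmount = netAmount;
        }

        double calculateTotal() {
            return netAmount * 1.21;
        }
    }

    @Test
    public void totalShouldIncludeVat() {
        Invoice invoice = new Invoice(100.00);

        // Intended check: the total equals 121.00 (net amount plus 21% VAT).
        // Actual check: the result is compared to itself, so this assertion
        // passes no matter what calculateTotal() returns. The build stays
        // green even if the VAT calculation is completely broken.
        assertTrue(invoice.calculateTotal() == invoice.calculateTotal());
    }
}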

No matter what the root cause is, the end result of any of these types of check defects is the same: your checks will go bad and will therefore be worthless.

High quality software or just bad checks?

And I bet that’s not what you had in mind when you created all of these automated scripts! Also, this problem isn’t solved by simply looking at the output report for your automated test runs, as doing so won’t tell you anything about the effectiveness of your checks. It will only tell you what percentage of your checks passed, which tells only half the story.

2. Your check coverage will go bad
Another problem with the effectiveness of your checks can be seen when we look at it from another perspective:

  • You may have implemented too many checks. Yes, this is actually possible! This increases maintenance efforts and test run (and therefore feedback) time. This is especially significant for UI-driven checks performed by tools such as Selenium WebDriver.
  • You may have implemented too few checks. This has a negative impact on coverage. Not that coverage is the be-all and end-all (far from it), but I’m sure we can agree that not doing enough checks can’t be a good thing when you want to release a quality product.

I think both of these reasons warrant a periodic check validation. In such a validation, all checks performed by your automated test suite should be evaluated on the characteristics mentioned above. Checks that are no longer useful should be rewritten or removed, and any shortcomings in your test coverage should be addressed.

Check your automated checks!

Automating the check validation
Of course, since we’re here to talk about test automation, it would be even better to perform such a validation in an automated manner. But is such a thing even possible?

Unit test level checks
It turns out that, for checks at the unit test level (which should make up a significant part of your checks anyway), checking your checks can be done using mutation testing. Even better, one of the main purposes of mutation testing IS to check the validity and usefulness of your checks. A perfect match, so to speak. I am still pretty new to the concept of mutation testing, but so far I have found it to be a highly fascinating and potentially very powerful technique (if applied correctly, of course).
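To give a rough idea of how this works, here is a small, hypothetical JUnit example (the DiscountCalculator class and the 100.00 boundary are made up). A mutation testing tool – PIT is one example for Java – generates small changes (‘mutants’) to your production code and reruns your checks against each mutant; a check only proves its worth if at least one mutant makes it fail.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class DiscountCalculatorTest {

    // Hypothetical class under test: orders of 100.00 or more get a discount
    static class DiscountCalculator {
        boolean isEligibleForDiscount(double orderTotal) {
            return orderTotal >= 100.00;
        }
    }

    // A mutation testing tool would generate mutants of the condition above,
    // for example:
    //   return orderTotal > 100.00;  (boundary changed)
    //   return orderTotal < 100.00;  (condition negated)
    // and rerun the checks against each mutant.

    @Test
    public void orderOfExactlyOneHundredGetsDiscount() {
        // Kills the '>' mutant: against that mutant this check fails,
        // which is exactly what you want to see.
        assertTrue(new DiscountCalculator().isEligibleForDiscount(100.00));
    }

    @Test
    public void smallerOrderGetsNoDiscount() {
        // Kills the '<' mutant, which would wrongly grant the discount here.
        assertFalse(new DiscountCalculator().isEligibleForDiscount(99.99));
    }

    // If the only check used, say, 150.00 as input, the '>' mutant would
    // survive, telling you the boundary is never actually being exercised.
}

The tool runs this cycle for you automatically and reports which mutants survived, so surviving mutants point you straight at checks that are missing or too weak.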

API and UI level checks
Unfortunately, mutation testing is mainly geared towards unit and (to a lesser extent) unit integration tests. I am not aware of any tools that actively support check validation for API and UI tests. If anyone reading this knows of a tool, please do let me know!

One reason such a tool might not exist is that it would be much harder to apply the basic concept of mutation testing to API and UI-level checks, if only because you would need to:

  1. Mutate the compiled application or the source code (and subsequently compile it) to create a mutant
  2. Deploy the mutant
  3. Run tests to see whether the mutant can be killed

for every possible mutant, or at least a reasonable subset of all possible mutants. As you can imagine, this would require a lot of time and computation power, which would have a negative effect on the basic principle of fast feedback for automated checks.

One possible approach that covers at least part of the objectives of mutation testing would be to negate all of your checks and see whether all of them fail. This might give you some interesting information about the defect-finding capabilities of your checks. However, it doesn’t tell you much about the coverage of your checks, so this very crude approach would only be part of the solution.
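As a hypothetical illustration of what such a negated run can reveal (the SearchResultsTest, the Item class and the getSearchResults() call are all made up): a check whose assertion never actually executes stays green both in the normal run and in the run with negated expectations, and that is exactly the kind of check you want to flush out.

import static org.junit.Assert.assertEquals;

import java.util.Collections;
import java.util.List;

import org.junit.Test;

public class SearchResultsTest {

    // Hypothetical response item
    static class Item {
        String getStatus() {
            return "IN_STOCK";
        }
    }

    // Imagine this wrapping a real API call; here it happens to return no results
    private List<Item> getSearchResults() {
        return Collections.emptyList();
    }

    @Test
    public void allItemsShouldBeInStock() {
        // The assertion sits inside a loop over the results. With an empty
        // response it never executes, so this check passes whether you expect
        // "IN_STOCK" or negate the expectation to something else. A negated
        // run is one (crude) way to expose checks like this one.
        for (Item item : getSearchResults()) {
            assertEquals("IN_STOCK", item.getStatus());
        }
    }
}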

The above does give yet another reason for implementing as many of your checks as possible as unit tests, though. Not only are they easier to create and faster to run, but their usefulness can also be proven much more precisely in an automated manner (i.e., using mutation testing).

So, as a wrap-up, do you check your automated checks? If so, how? I’m very curious to read new insights on this matter!

P.S.: Richard Bradshaw recently wrote a great blog post on the same topic; I highly suggest you read it as well over here.

"