Do you check your automated checks?

Quis custodiet ipsos custodes? (Who will guard the guards?)

– Juvenal (Roman poet), from Satires (Satire VI, lines 347–8)

You’ve been coding away on your automated tests for weeks or even months on end, creating a test suite that you think is up there with the best of them. Worthy of a place in the test automation Champions League, should such a competition exist. But is your work really that good? While you may think it is, do you have the proof to back it up? In this post, I would like to explain why you should make a habit of testing your tests, or, in light of the checking vs. testing debate, of checking your checks.

Why would you want to check your checks?
To argue the case for checking your checks, let’s take a look at what might happen when you don’t do so regularly. Essentially, there are two things that can (and will) occur:

1. Your checks will go bad
Checks that are left unattended once created might – either right from the get-go, or over time as the application under test evolves – start to:

  • Fail consistently. This will probably be detected and fixed as soon as your build monitor starts turning red. When you do Continuous Delivery or Continuous Deployment, this is even more likely, as consistently failing checks will stall the delivery process. And people will start to notice that!
  • Fail intermittently. Also known as ‘flakiness’. Incredibly annoying as it’s usually a lot harder to pinpoint the root cause compared to checks that fail all the time. Still, the value of flaky checks can be considered about the same as for consistently failing checks: zero.
  • Always return true. This one might turn out to be very hard to detect, because your runs will keep being green and therefore chances are that nobody will feel the urge to review the checks that are performed. They are passing, so everything must be fine, right? Right?
  • Check the wrong thing. This one may or may not be easy to spot, depending on what it IS checking. If it fails and the root cause of the failure is being analyzed, one of two things might happen: either the check is fixed – for example by changing the expected value – but the wrong check keeps being executed, or deeper analysis reveals that the wrong check has been performed all along and the check itself is fixed (or even removed). Both of these last two failure modes are sketched in the example right after this list.
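
To make this a bit more concrete, here is a minimal and entirely hypothetical TestNG sketch of a check that always passes and a check that checks the wrong thing. The ShoppingCart and Item classes do not appear anywhere else in this post; they are made up purely to illustrate the point:

import org.testng.Assert;
import org.testng.annotations.Test;

public class BadCheckExamples {

	@Test
	public void alwaysPasses() {

		ShoppingCart cart = new ShoppingCart();
		cart.add(new Item("book", 10));
		// This assertion can never fail: the total is compared to itself,
		// so the build stays green even if the total calculation is broken
		Assert.assertEquals(cart.getTotal(), cart.getTotal());
	}

	@Test
	public void checksTheWrongThing() {

		ShoppingCart cart = new ShoppingCart();
		cart.add(new Item("book", 10));
		// The intention was to verify the total price, but only the item count
		// is verified, so a defect in the price calculation slips through
		Assert.assertEquals(cart.getItemCount(), 1);
	}
}

Both tests will keep passing no matter what happens to the price calculation they were supposed to guard, which is exactly why they are worthless.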

No matter what the root cause is, the end result of any of these types of check defects is the same: your checks will go bad and will therefore be worthless.

High quality software or just bad checks?

And I bet that’s not what you had in mind when you created all of these automated scripts! Also, this problem isn’t solved by simply looking at the output report for your automated test runs, as doing so won’t tell you anything about the effectiveness of your checks. It will only tell you what percentage of your checks passed, which tells only half the story.

2. Your check coverage will go bad
Another problem with the effectiveness of your checks can be seen when we look at it from another perspective:

  • You may have implemented too many checks. Yes, this is actually possible! This increases maintenance efforts and test run (and therefore feedback) time. This is especially significant for UI-driven checks performed by tools such as Selenium WebDriver.
  • You may have implemented too few checks. This has a negative impact on coverage. Not that coverage is the be-all and end-all (far from it), but I’m sure we can agree that not doing enough checks can’t be a good thing when you want to release a quality product.

I think both of these reasons warrant a periodic check validation. In such a validation, all checks performed by your automated test suite should be evaluated on the characteristics mentioned above. Checks that are no longer useful should be rewritten or removed, and any shortcomings in your test coverage should be addressed.

Check your automated checks!

Automating the check validation
Of course, since we’re here to talk about test automation, it would be even better to perform such a validation in an automated manner. But is such a thing even possible?

Unit test level checks
It turns out that for checks at the unit test level (which should make up a significant part of your checks anyway), checking your checks can be done using mutation testing. Even better, one of the main purposes of mutation testing IS to check the validity and usefulness of your checks. A perfect match, so to speak. I am still pretty new to the concept of mutation testing, but so far I have found it to be a highly fascinating and potentially very powerful technique (if applied correctly, of course).

API and UI level checks
Unfortunately, mutation testing is mainly geared towards unit and (to a lesser extent) unit integration tests. I am not aware of any tools that actively support check validation for API and UI tests. If anyone reading this knows of a tool, please do let me know!

One reason such a tool might not exist is that it would be much harder to apply the basic concept of mutation testing to API and UI-level checks, if only because you would need to:

  1. Mutate the compiled application or the source code (and subsequently compile it) to create a mutant
  2. Deploy the mutant
  3. Run tests to see whether the mutant can be killed

for every possible mutant, or at least a reasonable subset of all possible mutants. As you can imagine, this would require a lot of time and computation power, which would have a negative effect on the basic principle of fast feedback for automated checks.

One possible approach that covers at least part of the objectives of mutation testing would be to negate all of your checks and see whether all of them fail. This might give you some interesting information about the defect finding capabilities of your checks. However, it doesn’t tell you much about the coverage of your checks, so this very crude approach would only be part of the solution.
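
As a very rough sketch of what that could look like for TestNG-based checks: route your assertions through a small helper that can be flipped into an ‘inverted’ mode, for example via a system property. The helper below is purely illustrative and the invertChecks property name is made up:

import org.testng.Assert;

public class CheckHelper {

	// In inverted mode (-DinvertChecks=true) every equality check is turned
	// into its opposite, so every single check in the run is expected to fail
	private static final boolean INVERTED = Boolean.getBoolean("invertChecks");

	public static void checkEquals(Object actual, Object expected) {
		if (INVERTED) {
			Assert.assertNotEquals(actual, expected);
		} else {
			Assert.assertEquals(actual, expected);
		}
	}
}

A check that still passes in such an inverted run is a check that apparently cannot fail at all, for example because the assertion is never reached or a failure is silently swallowed somewhere along the way.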

The above does give yet another reason for implementing as many of your checks as you can in unit tests, though. Not only would they be easier to create and faster to run, but their usefulness can also be proven much more precisely in an automated manner (i.e., using mutation testing).

So, as a wrap-up, do you check your automated checks? If so, how? I’m very curious to read new insights on this matter!

P.S.: Richard Bradshaw wrote a great blog post on the same topic recently, I highly suggest you read it as well over here.

An introduction to mutation testing and PIT

This blog post will cover the basics of the concept of mutation testing. I have been made aware of mutation testing only recently, but I have discovered it’s a very powerful and interesting technique for:

  • analysis and improvement of unit tests
  • detection of dead code in your application

Two things that are always worth taking a look at if you ask me. I will illustrate the mutation testing concept using a tool called PIT and a simple piece of code and accompanying set of unit tests.

What is mutation testing?
From Wikipedia:

Mutation testing is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways. Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill.

In other words, mutation testing is a technique that allows you to evaluate not only the percentage of code that is executed when running your tests (i.e., code coverage), but also the ability of your tests to detect any defects in the executed code. This makes mutation testing a very powerful and useful technique that I think anyone involved in software development and testing should at least be aware of.
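
As a minimal, made-up illustration of the concept (the Adder class and its test are not part of the example used later in this post), consider this piece of production code:

public class Adder {

	public int add(int a, int b) {
		// A mutation tool might turn this line into 'return a - b;'
		return a + b;
	}
}

and two equally made-up tests against it:

import org.testng.Assert;
import org.testng.annotations.Test;

public class AdderTest {

	@Test
	public void testAdd() {

		// Kills the 'a - b' mutant: this test passes against the original
		// code (2 + 3 == 5) but fails against the mutant (2 - 3 == -1)
		Assert.assertEquals(new Adder().add(2, 3), 5);
	}

	@Test
	public void testAddWithZeros() {

		// Lets the mutant survive: 0 + 0 and 0 - 0 both return 0, so this
		// test passes against both versions and detects nothing
		Assert.assertEquals(new Adder().add(0, 0), 0);
	}
}

A surviving mutant, like the one left alive by testAddWithZeros, points straight at a test that needs strengthening, which is exactly the kind of information mutation testing is meant to provide.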

Introducing PIT
I will try and illustrate the power of mutation testing using PIT, a Java mutation test tool which can be downloaded here. I chose PIT over other available mutation test tools mainly because of its ease of installation and use.

Assuming you’re also using Maven, you can configure your Java project for mutation testing using PIT by adding the following to your pom.xml:

<build>
	<plugins>
		<plugin>
			<groupId>org.pitest</groupId>
			<artifactId>pitest-maven</artifactId>
			<version>PIT-VERSION</version>
			<configuration>
				<targetClasses>
					<param>package.root.containing.classes.to.mutate*</param>
				</targetClasses>
				<targetTests>
					<param>package.root.containing.test.classes*</param>
				</targetTests>
			</configuration>
		</plugin>
	</plugins>
</build>

Simply replace the package locators with those appropriate for your project and be sure not to forget the asterisk at the end. Also replace PIT-VERSION with the PIT version you want to use (the latest is 1.1.4 at the moment of writing this blog post) and you’re good to go.

The code class and test class to be subjected to mutation testing
I created a very simple Calculator class that, you guessed it, performs simple arithmetic on integers. My calculator only does addition, subtraction and power calculations:

public class Calculator {

	int valueDisplayed;

	public Calculator() {
		this.valueDisplayed = 0;
	}
	
	public Calculator(int initialValue) {
		this.valueDisplayed = initialValue;
	}

	public void add(int x) {
		this.valueDisplayed += x;
	}
	
	public void subtract(int x) {
		this.valueDisplayed -= x;
	}
	
	public void power(int x) {
		this.valueDisplayed = (int) Math.pow(this.valueDisplayed, x);
	}

	public int getResult() {
		return this.valueDisplayed;
	}
	
	public void set(int x) {
		this.valueDisplayed = x;
	}

	public boolean setConditional(int x, boolean yesOrNo) {
		if(yesOrNo) {
			set(x);
			return true;
		} else {
			return false;
		}
	}
}

To test the calculator, I have created a couple of TestNG unit tests that call the various methods my calculator supports. Note that PIT supports both JUnit and TestNG.

import org.testng.Assert;
import org.testng.annotations.Test;

public class CalculatorTest {
	
	@Test
	public void testAddition() {
		
		Calculator calculator = new Calculator();
		calculator.add(2);
		Assert.assertEquals(calculator.getResult(), 2);
	}
	
	@Test
	public void testPower() {
		
		Calculator calculator = new Calculator(2);
		calculator.power(3);
		Assert.assertEquals(calculator.getResult(), 8);
	}
	
	@Test
	public void testConditionalSetTrue() {
		
		Calculator calculator = new Calculator();
		Assert.assertEquals(calculator.setConditional(2, true), true);
	}
	
	@Test
	public void testConditionalSetFalse() {
		
		Calculator calculator = new Calculator();
		Assert.assertEquals(calculator.setConditional(3, false), false);
	}
}

To illustrate the capabilities of PIT and mutation testing in general, I ‘forgot’ to include a test for the subtract() method. Also, I created what is known as a ‘weak test’: a test that passes but doesn’t check whether all code is actually called (in this case, no check is done to see whether set() is called when calling setConditional()). Now, when we run PIT on our code and test classes using:

mvn org.pitest:pitest-maven:mutationCoverage

an HTML report is generated displaying our mutation test results:

The report as generated by PIT

When we drill down to our Calculator class we can see the modifications that have been made by PIT and the effect it had on our tests:

Class-level details of the PIT mutation test results

This clearly shows that our unit test suite has room for improvement:

  • The fact that subtract() is never called in our test suite (i.e., our code coverage can be improved) is detected
  • The fact that the call to set() can be removed from the code without our test results being affected (i.e., our tests are lacking defect detection power) is detected

These holes in our test coverage and test effectiveness might go unnoticed for a long time, especially since all tests pass when run using TestNG. This goes especially for the second flaw, as a regular code coverage tool would not pick it up: the call to set() is made after all, but it does not have any effect on the outcome of our tests!
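
For completeness, this is roughly what the missing test and the strengthened test could look like when added to the CalculatorTest class above; both should kill the corresponding mutants in a subsequent PIT run:

	@Test
	public void testSubtraction() {

		Calculator calculator = new Calculator(5);
		calculator.subtract(2);
		// Exercises subtract(), so mutations in that method no longer go unnoticed
		Assert.assertEquals(calculator.getResult(), 3);
	}

	@Test
	public void testConditionalSetTrueChangesValue() {

		Calculator calculator = new Calculator();
		calculator.setConditional(2, true);
		// Verifies the side effect as well as the return value, so removing
		// the call to set() now makes a test fail and the mutant is killed
		Assert.assertEquals(calculator.getResult(), 2);
	}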

Additional PIT features
The PIT documentation discusses a lot of features that make your mutation testing efforts even more powerful. You can configure the set of mutators used to tailor the result set to your needs, you can use mutation filters to filter out any unwanted results, and much more. However, even in the default configuration, using PIT (or potentially any other mutation testing tool, such as those listed here) will tell you a lot about the quality of your unit testing efforts.
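
To give an idea of what configuring the set of mutators looks like (do double-check the exact element and mutator names against the PIT documentation for the version you are using), the plugin configuration shown earlier in this post could be extended with a <mutators> element:

<configuration>
	<targetClasses>
		<param>package.root.containing.classes.to.mutate*</param>
	</targetClasses>
	<targetTests>
		<param>package.root.containing.test.classes*</param>
	</targetTests>
	<mutators>
		<!-- Apply only these two mutators instead of the default set -->
		<mutator>NEGATE_CONDITIONALS</mutator>
		<mutator>VOID_METHOD_CALLS</mutator>
	</mutators>
</configuration>

VOID_METHOD_CALLS is, incidentally, the kind of mutation that removed the call to set() in the Calculator example above.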

Removing dead code from your codebase based on mutation test results
Apart from evaluating the quality of your unit tests, mutation test results can also give you insight into which parts of your application code are never executed (dead code). Consider the call to the set() method in the example above. The mutation test results indicated that this call could be removed without the results of the unit test being altered. Now, in our case it is pretty obvious that this indicates a lack of coverage in our unit tests (if you want to set the Calculator value, you’d better call the set() method), but it isn’t hard to imagine a situation where such a method call can be removed without any further consequences. In this case, the results of the mutation tests will point you to potentially dead code that might be a candidate for refactoring or removal. Thanks go to Markus Schirp for pointing out this huge advantage of mutation testing to me on Twitter.

Example project
The Maven project that was used to generate the results demonstrated in this post can be downloaded here. You can simply import this project and run

mvn org.pitest:pitest-maven:mutationCoverage

to recreate my test results and review the generated report. This will serve as a good starting point for your further exploration of the power and possibilities of mutation testing.