(Not so) useful metrics in test automation

In order to remain in control of test automation efforts, it’s a good idea to define metrics and track them, so that you can take action in case the metrics tell you your efforts aren’t yielding the right results. And even more importantly, it allows you to bask in glory if they do! But what exactly are good metrics when it comes to test automation? In this blog post, I’ll take a look at some metrics that I think are useful, and some that I think the automation world can easily do without. Note that I’m not even going to try and present a complete list of metrics that can be used to track your automation efforts, but hopefully the ones mentioned here can move you a little closer to the right path.

So, what do I think could be a useful metric when tracking the effectiveness and/or the results of your test automation efforts?

Feedback loop duration reduction
The first metric I’ll suggest here is not related to the quality of an application, but rather to the quality of the development process. At its heart, test automation is – or at least should be – meant to increase the effectiveness of testing efforts. One way to measure this is to track the time that elapses between the moment a developer commits a change and the moment (s)he is informed about the effects these changes have had on application quality. This time, also known as the feedback loop time, should ideally be as short as possible. If it takes two weeks before a developer hears about the negative (side) effects of a change, he or she will have long moved on to other things. Or projects. Or jobs. If, instead, feedback is given within minutes (or even seconds), it’s far easier to correct course directly. One way to shorten the feedback loop duration is by effective use of test automation, so use it wisely and track the effects that automation has on your feedback loop.
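
To make this concrete, here’s a minimal Python sketch of how you could compute such a metric from pairs of commit and feedback timestamps. The timestamps here are made up for illustration; in practice you’d pull them from your CI server’s API or build history:

```python
from datetime import datetime
from statistics import median

def feedback_loop_minutes(commit_times, feedback_times):
    """Median time (in minutes) between a commit and the moment
    the developer receives test feedback on it."""
    durations = [
        (feedback - commit).total_seconds() / 60
        for commit, feedback in zip(commit_times, feedback_times)
    ]
    return median(durations)

# Hypothetical data points, e.g. extracted from your CI server
commits = [datetime(2016, 12, 12, 9, 0), datetime(2016, 12, 12, 14, 30)]
feedback = [datetime(2016, 12, 12, 9, 8), datetime(2016, 12, 12, 14, 42)]

print(feedback_loop_minutes(commits, feedback))  # → 10.0
```

Tracking the trend of this number over time (rather than any single value) is what tells you whether your automation is actually shortening the loop.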

Customer journey completion rate
This might come as a surprise to some, but test automation, testing and software development in general is still an activity that serves a greater good: your customer base. In this light, it would make sense to have some metrics that relate directly to the extent to which a customer is able to use your application, right? A prime example of this would be the number of predefined critical customer journeys that can (still) be completed by means of an automated test after a certain change to the software has been developed and deployed. By critical, I mean journeys that relate directly to revenue generation, customer acquisition and other such trivialities. The easier and more often you can verify (using automation) that these journeys can still be completed, the more trust you’ll have deploying that shiny new application version into your production environment.
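
As a sketch of what tracking this could look like, the snippet below computes a completion rate from the outcomes of automated journey checks after a deployment. The journey names and results are invented for illustration:

```python
def journey_completion_rate(results):
    """Fraction of critical customer journeys whose automated
    check still passes after a deployment."""
    passed = sum(1 for outcome in results.values() if outcome)
    return passed / len(results)

# Hypothetical outcomes of automated journey checks after a deploy
results = {
    "search_and_order": True,
    "create_account": True,
    "checkout_with_coupon": False,  # regression: this journey is broken
    "cancel_order": True,
}

print(journey_completion_rate(results))  # → 0.75
```

Anything below 1.0 on your critical journeys is, of course, a deployment blocker rather than just a number on a dashboard.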

False positives and negatives
Automated tests are only truly valuable if you can trust them fully to give you the feedback you need. That is: whenever an automated test case fails, it should be due to a defect (or an unnoticed change) in your application, not because your automation is failing (you don’t want false negatives). The other way around, whenever an automated test case passes, you should have complete trust that the component or application under test indeed works as intended (you want false positives even less). False negatives are annoying, but at least they don’t go unnoticed. Fix their root cause and move on. If the root cause can’t be fixed, don’t be afraid to throw away the test, because if you can’t trust it, it’s worthless. False positives are the biggest pain in the backside, because they do go unnoticed. If all is green in the world of automation, it’s easy (and quite human) to trust the results, even when all you’re checking is that 1 equals 1 (see also below). One approach to detecting and fixing false positives, at least in the unit testing area, is the use of mutation testing. If this is not an option, be sure to regularly review your automated checks to see that they still have their desired defect detection dint (or 4D, coining my first ever useless automation acronym here!).

Where there are useful metrics, there are also ones that aren’t as valuable (or are downright worthless).

Code coverage
A metric that is often used to express the coverage of a suite of unit tests. The main problem I have with this metric is that in theory it all sounds perfectly sensible (‘every line of our code is executed at least once when we run our tests!’), but in practice, it doesn’t say anything about the quality and the effectiveness of the tests, nor does it say anything about actual application quality. For example, it’s perfectly possible to write unit tests that touch all lines of your code and then assert that 1 equals 1. Or that apples equal apples. These tests will run smoothly. They’ll pass every time, or at least until 1 does not equal 1 anymore (but I think we’re safe for the foreseeable future). Code coverage tools will show a nice and shiny ‘100%’. Yet it means nothing in terms of application quality.
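
To illustrate, here’s a hypothetical Python example (function and test names invented for the purpose): both tests below execute every line of `apply_discount`, so a coverage tool reports 100% for either one, but only one of them would ever catch a bug:

```python
def apply_discount(price, percentage):
    """Code under test: reduce a price by the given percentage."""
    discount = price * percentage / 100
    return price - discount

def test_full_coverage_no_value():
    # Executes every line of apply_discount, so line coverage is 100%,
    # but the assertion says nothing about the result.
    apply_discount(100, 20)
    assert 1 == 1

def test_actually_meaningful():
    # Exact same coverage, but this one fails if the logic breaks.
    assert apply_discount(100, 20) == 80
```

Both tests look identical on a coverage report; only a look at the assertions (or a mutation testing run) reveals that the first is dead weight.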

Percentage of overall number of test cases that are automated
A product of the ‘automate all the things!!’ phenomenon. In theory, it looks cool: ‘We automated 83.56% of our tests!’. But especially with exploratory testing (not my forte, so I might be wrong here), there is no such thing as a fixed and predetermined number of test cases anymore. As such, expressing the number of automated tests as a percentage of a variable or even nonexistent number is worthless. Or downright lying (you pick one). There’s only one metric that counts in this respect, quoting Alan Page:

You should automate 100% of the tests that should be automated

Reduction in numbers of testers
Yes, it’s 2016. And yes, some organizations still think this way. I’m not even going to spend time explaining why ‘if we automate our tests, we can do with X fewer testers’ is horrible thinking. However, in a blog post mentioning good, bad and downright ugly metrics related to test automation, it could not go unmentioned.

Anyway, wrapping all this up, I think my view on automation metrics can be summarized like this: automation metrics should tell you something useful about the quality of your application or your development process. Metrics related to automation itself, or to something that automation cannot be used for (actual testing being a prime example), might not be worth tracking or even thinking about.

8 thoughts on “(Not so) useful metrics in test automation”

  1. Thanks for the article.
    It really made me think about test automation metrics much more broadly (specifically about customer journeys).

    I think there are other metrics that are also crucial for test automation: the time needed to create and maintain tests, and the time needed to analyze test results.
    These metrics can show how maintainable the test code is and how understandable the test results are, not only to test automation engineers, but also to testers or even project managers.

    Because if tests can be supported only by “one magic engineer”, it is a sign that they will be abandoned as soon as their creator leaves the team / company.

    • Hey Alexander,

      Thanks for sharing your insights! That’s a great addition indeed. How easy does your selected tool or set of tools make it for you to create and update your tests?

      • On my current project we have a solution for cross-component integration testing. It provides us with fast and reliable tests.

        Test scenarios are written in the Gherkin language, using Cucumber as a BDD tool.

        So QAs can easily construct new scenarios from previously implemented steps. If a step is missing, the QA implements it him/herself or creates a Jira issue for an automation engineer / developer (if the required changes to the solution are substantial).

        We spend additional time only when a new system / flow is onboarded.


  5. Thanks Bas, nice article!
    Only on code coverage I don’t agree. You are right that good coverage is not a reliable indicator of good test quality. But that is just the inverse of the correct application of code coverage:
    bad code coverage is (often) an indicator of bad test quality. Or, in other words, good test quality should lead to good code coverage.
    So, if you use code coverage correctly, it can very well help you improve your code quality, for example by pointing you to missing test cases.

    Cheers,
    Lars

    • Hey Lars, thanks for the nice words and for sharing your insights!

I agree, especially after having discussed this a couple of times with some developers at work. Good code coverage does not mean good test quality, but bad code coverage definitely implies there’s work to be done in test creation. It’s using coverage as the be-all and end-all that bothers me. It could very well be that this wasn’t made clear in my post, so thanks for that!
