On asking ‘why?’ in test automation

This blog post is about something that’s been bothering me for a while, and that keeps coming up for some reason. Whenever I talk to clients, see a discussion on LinkedIn or StackOverflow, or read a blog post on something related to test automation, all too often it’s about ‘how can I solve problem X with tool Y’ (with tool Y suspiciously often being equal to Selenium). The word that bugs me most in this question is ‘how?’. My knee-jerk reaction to a lot of these ‘how?’ questions I see is ‘why?’. Or more specifically: ‘why the &%$^* would you want to do that in the first place?’.

About half a year ago, I wrote a blog post related to this frustration of mine on LinkedIn. So far, it hasn’t changed the world, since I still see a lot of ‘how?’ where I think ‘why?’ would be a far better question. But since, as it is so eloquently said in Latin (although to me, all Latin sounds pretty eloquent):

“Repetitio mater studiorum est” (“Repetition is the mother of all learning”),

I think it’s worth repeating here as well: With all questions related to test automation, first ask yourself ‘why?’ before even thinking about the ‘how?’.

‘Why?’ prevents you from automation for automation’s sake
Before asking ‘how can I implement test automation most effectively?’, ask ‘why do I want to implement test automation in the first place?’. Implementation of test automation should be a conscious decision, motivated by tangible and significant benefits to the overall software development process (and in the end, to the business objectives of the organization), not an activity that is adopted just because it sounds cool, or because *shudder* everybody else is doing it.

‘Why?’ steers your efforts in the right direction
Before asking ‘how can I automate this test?’, ask ‘why do I want to automate this test in the first place?’. Don’t become the world’s best automator of useless tests. Instead, become the world’s best selector of useful tests to automate. Selecting those tests that give you the most valuable information about the quality and the risk associated with the application you’re developing and delivering can only be done by asking ‘why?’ first. Only after you’ve decided on the best possible set of tests, start exploring how you can automate those tests in the most effective way.

‘Why?’ makes sure you use your tools in the best possible way
Before asking ‘how can I use tool X to automate this test?’, ask ‘why should I use tool X to automate this test?’. In a recent blog post, I talked (or ranted) about abusing Selenium for API tests. Even though the blog post was meant to be – at least partly – satirical, I see similar things happening on a regular basis. Another big example is the use of Cucumber or SpecFlow as an automation tool.
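To make the ‘right tool for the job’ point a little more concrete: if what you want to check is an API response, there’s no need to spin up a browser at all. Below is a minimal, hypothetical sketch in plain Java using the built-in HttpClient against a made-up endpoint; the URL, the expected status code and the assertion style are assumptions for illustration only, not a prescription.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApiStatusCheck {

    public static void main(String[] args) throws Exception {
        // Call the API directly instead of driving a browser with Selenium
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/users/1")) // hypothetical endpoint
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // A simple check on the status code; a real test would use a test framework and proper assertions
        if (response.statusCode() != 200) {
            throw new AssertionError("Expected HTTP 200 but got " + response.statusCode());
        }
        System.out.println("API responded with: " + response.body());
    }
}
```

In a real project you’d wrap this in a test framework and probably use a dedicated API testing library, but even this bare-bones version makes the point: no browser, no Selenium, just a direct call to the interface you actually want to test.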

In the surprisingly recent past, I’ve been guilty of the above mistakes myself too, by the way. No need to be elitist and pretend I know it all. I just hope that more people will start to think and ask the right questions before they automate.

On a final note, in test automation training too, a lot of attention is being paid to the ‘how?’, without the proper amount of focus on the ‘why?’, and subsequently on the ‘what?’. Since I’m a firm believer in ‘practice what you preach’, I’ve started to develop training material that I think will contribute to asking the right questions in test automation. I believe that doing so will always lead to better test automation in the end. I hope to be able to present this training material to you, and to the general test automation and testing community, somewhere early next year (creating course material takes time!).

May next year be the year of the ‘why?’ in test automation, not just of the ‘how?’.

What does good test automation training look like?

As I’m moving away from by-the-hour work and more towards a hybrid of consulting, training, writing and speaking, one of the things I’m working on is slowly building up a workshop and training portfolio around topics I think I’m qualified to talk about. I have created a couple of workshops already, but so far, they are centered around specific tools that I am interested in and enthusiastic about. This, however, has the downside that they’re probably targeted towards a relatively small group of interested people (not least because these tools are only available for a specific programming language, i.e., Java).

To extend my options with regard to delivering training and workshops, I am currently looking at developing workshops and training courses that cover higher-level, more generic material, while still offering practical insights and hands-on exercises. There are a lot of different approaches and possible routes that can be taken to achieve this, especially since there is no specific certification trajectory around test automation (nor do I think there should be, but that’s a wholly different discussion that I’ll probably cover in another blog post in time). So far, I haven’t figured out the ideal contents and delivery format, but ideas have been taking shape in my head recently.

Here are some subjects I think a decent test automation training trajectory should cover:

Test automation 101: the basics
Always a good approach: start with the basics. What is test automation? What is it not (here’s a quote I love from Jim Hazen)? What role does automation play in current development and testing processes and teams? Why is it attracting the interest it does? To what levels and what areas can you apply test automation and what is that test automation pyramid thing you keep hearing about?

Test automation implementation
So, now that you know what test automation (sorta kinda) is, how do you apply it to your software development process? How are you going to involve stakeholders? What information or knowledge do you want to derive from test automation? How does it fit into trends such as Agile software development, BDD, Continuous Integration and Continuous Delivery?

Test automation: the good, the bad and the ugly
It’s time to talk about patterns. Not about best practices, though; I don’t like that term. But there are definitely lessons to be learned from the past about what works and what doesn’t. Think data driven. Think maintainability. Think code review. Think (or rather, forget) code-free test automation. Think reporting. Think some more.
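To give a small taste of what ‘think data driven’ means in practice, here’s a minimal sketch of a data-driven check using JUnit 5 parameterized tests. The DiscountCalculationTest class and its discount rule are made up for illustration; the point is that the test logic is written once and the interesting variation lives in the data, not in duplicated test methods.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class DiscountCalculationTest {

    // Hypothetical logic under test, kept inline for the sake of the example
    private int calculateDiscount(int orderTotal) {
        return orderTotal >= 100 ? 10 : 0;
    }

    // One test method, many data rows: the variation lives in the data, not in the code
    @ParameterizedTest
    @CsvSource({
            "50, 0",
            "99, 0",
            "100, 10",
            "250, 10"
    })
    void discountIsAppliedCorrectly(int orderTotal, int expectedDiscount) {
        assertEquals(expectedDiscount, calculateDiscount(orderTotal));
    }
}
```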

Beyond functional test automation: what else could automation be used for?
Most of what we’ve seen so far covers functional test automation: automated checks that determine whether or not some part of the application under test functions as specified or desired (or both, if you’re lucky). However, there’s a lot more to testing than mere functional checks. Of course there’s performance testing, security testing, usability testing, accessibility testing, all kinds of testing where smart application of tools might help. But there’s more: how about automated parsing of logs generated during an exploratory testing session? Automated test data creation / generation / randomization? Automated report creation? All these are applications of test automation, or better put, automation in testing (thanks, Richard!), and all these are worth learning about.
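As a tiny illustration of the test data creation / generation / randomization angle, here’s a hypothetical sketch of a helper that generates unique, random test data in plain Java. The helper names and value ranges are made up; in a real project you might reach for a dedicated data generation library, but even something this simple removes a lot of manual fiddling (and a lot of ‘that test data already exists’ failures).

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class TestDataGenerator {

    // Generate a unique, random email address so tests don't collide with existing accounts
    public static String randomEmail() {
        return "user-" + UUID.randomUUID().toString().substring(0, 8) + "@example.com";
    }

    // Generate a random age within a given (inclusive) range, e.g. for boundary testing
    public static int randomAge(int min, int max) {
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }

    public static void main(String[] args) {
        System.out.println("Generated email: " + randomEmail());
        System.out.println("Generated age:   " + randomAge(18, 65));
    }
}
```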

Note that nowhere in the topics above am I focusing on specific tools. As far as I’m concerned, getting comfortable with one or more tools is one of the very last steps in becoming a good test automation engineer or consultant. I am of the opinion that it’s much more important to answer the ‘why?’ and the ‘what?’ of test automation before focusing on the ‘how?’. Unfortunately, most training offerings I’m seeing focus solely on a specific tool. I myself am quite guilty of doing the same, as I said in the first paragraph of this post.

One thing I’m still struggling with is how to make the attendees do the work. It’s quite easy to present the above subjects as a (series of) lecture(s), but there’s no better way to learn than by doing. Also, I think hosting workshops is much more fun than delivering talks, and there’s no ‘workshop’ without actual ‘work’. But it has to be meaningful, relevant to the subject covered, and if possible, fun.

So, now that I’ve shared my thoughts on what ingredients would make up a decent test automation education, I’d love to hear what you think. What am I missing (I’m pretty sure the list above isn’t complete)? Do you think there’s an audience for training as mentioned above? If not, why not? What would you do (or better, what are you doing) differently? This is a topic that’s very dear to me, so I’d love to hear your thoughts on the subject. Your input is, as always, much appreciated.

In the meantime, I’ve started working on a first draft of training sessions and workshops that cover the topics above, and I’m actively looking for opportunities to deliver these, be it at a conference or somewhere in-house. I’ve got a couple of interesting opportunities lined up already, which is why I’m looking forward to 2017 with great anticipation!

Managing test data in end-to-end test automation

One of the biggest challenges I’m facing in projects I’m contributing to is the proper handling of test data in automated tests, and especially in end-to-end test automation. For unit and integration testing, it is often a good idea to resort to mocking or stubbing the data layer to remain in control over the test data that is used in and required for the tests to be executed. When doing end-to-end tests, however, keeping all required test data in check in an automated manner is no easy task. I say ‘in an automated manner’ here, because once you start to rely on manual intervention for the preparation or cleaning up of test data, then you’re moving away from the ability to test on demand, which is generally not a good thing. If you want your tests to truly run on demand, having to rely on someone (or a third party process) to manage the test data can be a serious bottleneck. Even more so with distributed applications, where teams often do not have enough control over the dependencies they require in order to be able to do end-to-end (or even integration) testing.
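To make the unit and integration testing case a bit more tangible, here’s a minimal, hypothetical sketch of stubbing the data layer using Mockito and JUnit 5. The CustomerRepository and GreetingService types are made up for illustration; the point is that the test fully controls the ‘data’ without any database being involved.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class GreetingServiceTest {

    // Hypothetical data access interface and service, just to illustrate the idea
    interface CustomerRepository {
        String findNameById(long id);
    }

    static class GreetingService {
        private final CustomerRepository repository;

        GreetingService(CustomerRepository repository) {
            this.repository = repository;
        }

        String greet(long customerId) {
            return "Hello, " + repository.findNameById(customerId);
        }
    }

    @Test
    void greetingUsesCustomerNameFromDataLayer() {
        // Stub the data layer so the test fully controls the test data it relies on
        CustomerRepository stubRepository = mock(CustomerRepository.class);
        when(stubRepository.findNameById(42L)).thenReturn("Alice");

        GreetingService service = new GreetingService(stubRepository);

        assertEquals("Hello, Alice", service.greet(42L));
    }
}
```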

In this post, I’d like to consider a number of possible strategies for dealing with test data in end-to-end tests. I’ll take a look at their benefits and their drawbacks to see if there’s one strategy that trumps all others (spoiler alert: probably not…).

Creating test data during test execution
One approach is to start every test, suite or run with a set-up phase in which the test data required for that specific test, suite or run is created. This can be done by any technical means available: be it through direct INSERT statements in a database, a series of API calls that create new users, orders or any other type of test data object, or (if there really is no alternative) through the user interface. The main benefit of this approach is that there is a strong coupling between the created test data and the actual test, meaning that the right test data is always available. There are some rather big drawbacks as well, though (a minimal sketch of this approach follows the list below):

  • Setting up test data takes additional time, especially when doing so through the user interface.
  • Setting up test data requires additional code, which increases the maintenance burden of your automated tests.
  • If an error occurs during the test data setup phase of your test, your actual test result will be unpredictable and therefore cannot be trusted. That is, if your test isn’t simply aborted before the actual test steps are executed at all…
  • This approach potentially requires tests to depend on one another in terms of the sequence in which they’re executed, which is a definite anti-pattern of test automation.
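As announced above, here’s a minimal sketch of the ‘create test data during test execution’ approach, using a direct INSERT via JDBC in a JUnit 5 set-up method. The connection details, table layout and column names are made up for illustration; the same idea applies when you create the data through an API call instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class CustomerEndToEndTest {

    // Connection details and table layout are made up for this sketch
    private static final String JDBC_URL = "jdbc:postgresql://localhost:5432/testdb";
    private static final String DB_USER = "test";
    private static final String DB_PASSWORD = "test";

    @BeforeEach
    void createTestData() throws Exception {
        // Set-up phase: insert exactly the customer this test needs via a direct INSERT
        try (Connection connection = DriverManager.getConnection(JDBC_URL, DB_USER, DB_PASSWORD);
             PreparedStatement insert = connection.prepareStatement(
                     "INSERT INTO customers (name, city) VALUES (?, ?)")) {
            insert.setString(1, "Test Customer");
            insert.setString(2, "Nowhereville");
            insert.executeUpdate();
        }
    }

    @Test
    void customerCanPlaceAnOrder() {
        // The actual end-to-end test steps (e.g. driving the UI) would go here
    }
}
```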

I’ve used this approach several times in my projects, with mixed results. Sometimes it works just fine, sometimes a little less so. The latter is most often the case when the data model is really complex and there’s no other way than mimicking user interaction by means of tools such as Selenium to get the data prepared.

Querying test data prior to test execution
The second approach to dealing with test data around automated tests is to query the data before the actual test runs. This can be done either directly on the database, or possibly through an API (or even a user interface) that allows you to retrieve customers, articles or whatever type of data object you need for your test. The main benefit of this approach is that you’re not losing time creating test data when all you really care about is the test results, and that it results in less test automation code to maintain, especially when you can query the database directly. Here too, there are a couple of drawbacks that render this approach less than ideal (a sketch of this approach follows the list below):

  • There’s no guarantee that the exact data you require for a test case (especially with edge cases) is actually present in the database. For example, how many customers who have lived in Nowhereville for 17.5 years, together with their wife and their blue parrot, are actually in your database?
  • Sometimes getting the query right so that you’re 100% sure that you get the right test data is a daunting task, requiring very specific knowledge of the system. This might make this approach less than ideal for some teams.
  • Also, even when you get the query exactly right, does that really guarantee that you get results that you’re 100% sure will be a perfect fit for your test case?
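And here’s the corresponding sketch for the ‘query test data prior to test execution’ approach, again using JDBC and JUnit 5. The connection details, schema and query are assumptions for illustration; note how the test has to deal explicitly with the case where no suitable data is found.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ExistingCustomerTest {

    // Connection details and schema are made up for this sketch
    private static final String JDBC_URL = "jdbc:postgresql://localhost:5432/testdb";

    private long customerId;

    @BeforeEach
    void findSuitableTestData() throws Exception {
        // Query for a customer that matches the preconditions of this test
        try (Connection connection = DriverManager.getConnection(JDBC_URL, "test", "test");
             PreparedStatement query = connection.prepareStatement(
                     "SELECT id FROM customers WHERE city = ? AND active = true LIMIT 1")) {
            query.setString(1, "Nowhereville");
            try (ResultSet result = query.executeQuery()) {
                if (!result.next()) {
                    throw new IllegalStateException("No suitable test data found for this test");
                }
                customerId = result.getLong("id");
            }
        }
    }

    @Test
    void existingCustomerCanUpdateTheirProfile() {
        // Use customerId in the actual end-to-end test steps
    }
}
```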

Resetting the test data state before or after a test run
I think this is potentially the best approach when having to deal with test data in end-to-end tests: either resetting the test database to a known state by restoring a backup before the run, or cleaning up the test database afterwards. This guarantees that, test data-wise, you’re always in the exact same state before or after a test run, which massively improves the predictability and repeatability of your tests. The main drawback is that often, this is not an option, either because nobody knows enough about the data model to allow it, or because access to the database is restricted for reasons good or less than good. Also, when you’re dealing with large databases, a database reset or rollback might take hours, which slows down your feedback loop significantly, rendering it next to useless when your tests are part of a CD pipeline.
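For the clean-up variant of this approach, a minimal, hypothetical sketch could look like the one below: a JUnit 5 hook that removes everything a test run created, so the next run starts from a known state. The connection details, table names and the created_by marker column are made up for illustration; a full backup restore would typically be handled outside the test code itself, using your database’s native tooling.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import org.junit.jupiter.api.AfterAll;

class TestDatabaseCleanup {

    // Connection details and table names are made up for this sketch
    private static final String JDBC_URL = "jdbc:postgresql://localhost:5432/testdb";

    @AfterAll
    static void resetTestData() throws Exception {
        // Remove everything this test run created, so the next run starts from a known state
        try (Connection connection = DriverManager.getConnection(JDBC_URL, "test", "test");
             Statement statement = connection.createStatement()) {
            statement.executeUpdate("DELETE FROM orders WHERE created_by = 'e2e-test-run'");
            statement.executeUpdate("DELETE FROM customers WHERE created_by = 'e2e-test-run'");
        }
    }
}
```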

Virtualizing the data layer
Nowadays, there are several solutions on the market that allow you to effectively virtualize your data layer for testing purposes. A prime example of such a solution is Delphix, but there are several other tools on the market as well. I haven’t experimented with any of these for long enough to actually have formed an educated opinion, but one thing I don’t really like about this approach is that virtualizing the data layer (however efficient it may be) voids the concept of executing a true end-to-end test, since there’s no actual data layer involved anymore. Then again, for other types of testing, it may actually be a very good concept, just like service virtualization is for simulating the behavior of critical yet hard-to-access dependencies in test environments.

So, what’s your take on this?
In short, I haven’t found the ideal solution yet. I’d love to read about the approaches other people and teams are taking when it comes to managing test data in end-to-end automated tests, so feel free to send me an email, or even better, leave a comment on this post. Looking forward to seeing your replies!