More troubles with test data

If managing test data in complex end-to-end automated test scenarios is an art, I’m certainly not an artist (yet). If it is a science, I’m still looking for the solution to the problem. At this moment, I’m not even sure which of the two it is, really..

The project
Some time ago, I wrote a blog post on different strategies to manage test data in end-to-end test automation. A couple of months down the road, and we’re still struggling. We are faced with the task of writing automated user interface-driven tests for a complex application. The user interface in itself isn’t that complex, and our tool of choice handles it decently. So far, so good.

As with all test automation projects I work on, I’d like to keep the end goal in mind. For now, running the automated end-to-end tests once every fortnight (at the end of a sprint) is good enough. I know, don’t ask, but the client is satisfied with that at the moment. Still, I’d like to create a test automation solution that can be run on demand. If that’s once every two weeks, all right. It should also be possible to run the test suite ten times per day, though. Shorten the feedback loops and all that.

The test data challenge
The real challenge with this project, as with a number of other projects I’ve worked on in the past, is in ensuring that the test data required to successfully run the tests is present and in the right state at all times. There are a number of complicating factors that we need to deal (or live) with:

The data model is fairly complex, with a large number of data entities and relations between them. What makes it really tough is that there is nobody available that completely understands it. I don’t want to mess around assuming that the data model looks a certain way.
As of now, there is no on demand back-up restoration procedure. Database back-ups are made daily in the test environment, but restoring them is a manual task at the moment, blocking us from recreating a known test data state whenever we want to.
There is no API that makes it easy for us to inject and remove specific data entities. All we have is the user interface, which results in long setup times during test execution, and direct database access, which isn’t of real use since we don’t know the data model details.

Our current solution
Since we haven’t figured out a proper way to manage test data for this project yet, we’re dealing with it the easiest way available: by simply creating the test data we need for a given test at the start of that test. I’ve mentioned the downsides of this approach in my previous post on managing test data (again, here it is), but it’s all we can do for now. We’re still in the early stages of automation, so it’s not something that’s holding us back to much, but all parties involved realize that this is not a sustainable solution for the longer term.

The way forward
What we’re looking at now is an approach that looks roughly like this:

A database backup that contains all test data required is created with every new release.
We are given permission to restore that database backup on demand, a process that takes a couple of minutes and currently is not yet automated.
We are given access to a job that installs the latest data model configuration (this changes often, sometimes multiple times per day) to ensure that everything is up to date.
We recreate the test data database manually before each regression test run.

This looks like the best possible solution at the moment, given the available knowledge and resources. There are still some things I’d like to improve in the long run, though:

I’d like database recreation and configuration to be a fully automated process, so it can more easily be integrated into the testing and deployment process.
There’s still the part where we need to make sure that the test data set is up to date. As the application evolves, so do our test cases, and somebody needs to make sure that the backup we use for testing contains all the required test data.

As you can see, we’re making progress, but it is slow. It makes me realize that managing test data for these complex automation projects is possibly the hardest problem I’ve encountered so far in my career. There’s no one-stop solution for it, either. So much depends on the availability of technical hooks, domain knowledge and resources at the client side.

On the up side, last week I met with a couple of fellow engineers from a testing services and solutions provider, just to pick their brain on this test data issue. They said they have encountered the same problem with their clients as well, and were working on what could be a solution to this problem. They too realize that it’ll never be a 100% solution to all test data issues for all organizations, but they’re confident that they can provide them (and consultants like myself) with a big step forwards. I haven’t heard too many details, but I know they know what they’re talking about, so there might be some light at the end of the tunnel! We’re going to look into a way to collaborate on this solution, which I am pretty excited about, since I’d love to have something in my tool belt that helps my clients tackle their test data issues. To be continued!