More troubles with test data

If managing test data in complex end-to-end automated test scenarios is an art, I’m certainly not an artist (yet). If it is a science, I’m still looking for the solution to the problem. At this moment, I’m not even sure which of the two it is, really.

The project
Some time ago, I wrote a blog post on different strategies to manage test data in end-to-end test automation. A couple of months down the road, and we’re still struggling. We are faced with the task of writing automated user interface-driven tests for a complex application. The user interface in itself isn’t that complex, and our tool of choice handles it decently. So far, so good.

As with all test automation projects I work on, I’d like to keep the end goal in mind. For now, running the automated end-to-end tests once every fortnight (at the end of a sprint) is good enough. I know, don’t ask, but the client is satisfied with that at the moment. Still, I’d like to create a test automation solution that can be run on demand. If that’s once every two weeks, all right. It should also be possible to run the test suite ten times per day, though. Shorten the feedback loops and all that.

The test data challenge
The real challenge with this project, as with a number of other projects I’ve worked on in the past, is in ensuring that the test data required to successfully run the tests is present and in the right state at all times. There are a number of complicating factors that we need to deal (or live) with:

  • The data model is fairly complex, with a large number of data entities and relations between them. What makes it really tough is that there is nobody available who completely understands it. I don’t want to mess around assuming that the data model looks a certain way.
  • As of now, there is no on-demand backup restoration procedure. Database backups are made daily in the test environment, but restoring them is a manual task at the moment, which blocks us from recreating a known test data state whenever we want to.
  • There is no API that makes it easy for us to inject and remove specific data entities. All we have is the user interface, which results in long setup times during test execution, and direct database access, which isn’t of real use since we don’t know the data model details.

Our current solution
Since we haven’t figured out a proper way to manage test data for this project yet, we’re dealing with it the easiest way available: by simply creating the test data we need for a given test at the start of that test. I’ve mentioned the downsides of this approach in my previous post on managing test data (again, here it is), but it’s all we can do for now. We’re still in the early stages of automation, so it’s not something that’s holding us back too much, but all parties involved realize that this is not a sustainable solution for the longer term.
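To make the idea concrete, here is a minimal sketch of "create the data at the start of the test". Everything in it is an assumption for illustration: `InMemoryApp` is a hypothetical stand-in for the application (in our project the creation would go through the user interface), the "customer" entity is made up, and the clean-up step is an addition that our current setup does not necessarily have.

```python
from contextlib import contextmanager


class InMemoryApp:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self.customers = {}
        self._next_id = 1

    def create_customer(self, name):
        # In a real project this would be a (slow) user interface flow.
        customer_id = self._next_id
        self._next_id += 1
        self.customers[customer_id] = {"name": name}
        return customer_id

    def delete_customer(self, customer_id):
        self.customers.pop(customer_id, None)


@contextmanager
def customer(app, name):
    """Create a customer for one test and remove it afterwards."""
    customer_id = app.create_customer(name)
    try:
        yield customer_id
    finally:
        # Assumed clean-up, so the next test starts without leftovers.
        app.delete_customer(customer_id)


def check_customer_is_visible(app):
    # The test owns its data: it is created here, not assumed to exist.
    with customer(app, "Test Customer") as customer_id:
        assert app.customers[customer_id]["name"] == "Test Customer"
```

The upside is that every test is self-contained. The downside is the one mentioned above: when creation has to go through the user interface, this setup is paid for in execution time on every single test.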

The way forward
What we’re looking at now is an approach that looks roughly like this:

  1. A database backup that contains all test data required is created with every new release.
  2. We are given permission to restore that database backup on demand, a process that takes a couple of minutes and currently is not yet automated.
  3. We are given access to a job that installs the latest data model configuration (this changes often, sometimes multiple times per day) to ensure that everything is up to date.
  4. We recreate the test data database manually before each regression test run.
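Once the individual steps can be triggered from a script, the sequence above could be chained roughly like this. This is only a sketch under assumptions: the three step functions are hypothetical placeholders that just print a message, since the actual restore procedure and configuration job are not available to us in scriptable form yet.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first failure."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Later steps are skipped: testing on a half-prepared
            # database would only produce misleading results.
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
        completed.append(name)
    return completed


def restore_backup() -> None:
    # Placeholder: would restore the backup created for the release.
    print("restoring database backup")


def install_configuration() -> None:
    # Placeholder: would start the job that installs the latest
    # data model configuration.
    print("installing latest data model configuration")


def run_regression_tests() -> None:
    # Placeholder: would start the user interface-driven test suite.
    print("running regression tests")


PIPELINE: List[Step] = [
    ("restore backup", restore_backup),
    ("install configuration", install_configuration),
    ("run regression tests", run_regression_tests),
]

if __name__ == "__main__":
    run_steps(PIPELINE)
```

The order matters here: the configuration is installed after the restore, because the backup may have been made against an older configuration.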

This approach looks like the best possible solution at the moment, given the available knowledge and resources. There are still some things I’d like to improve in the long run, though:

  • I’d like database recreation and configuration to be a fully automated process, so it can more easily be integrated into the testing and deployment process.
  • There’s still the part where we need to make sure that the test data set is up to date. As the application evolves, so do our test cases, and somebody needs to make sure that the backup we use for testing contains all the required test data.
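One way to keep an eye on that second point would be a small check that runs right after the restore and reports which records the tests rely on are missing. The sketch below is an illustration only: it uses an in-memory SQLite database as a stand-in, and the table names, the `business_key` column and the keys themselves are all hypothetical. It also assumes we know where to look for each record, which, given our limited knowledge of the data model, is not a given.

```python
import sqlite3
from typing import Dict, Set

# Hypothetical manifest: the records our tests expect to find.
REQUIRED_DATA: Dict[str, Set[str]] = {
    "customer": {"CUST-001", "CUST-002"},
    "product": {"PROD-001"},
}


def find_missing(conn: sqlite3.Connection,
                 required: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per table, the required keys that are not present."""
    missing: Dict[str, Set[str]] = {}
    for table, keys in required.items():
        # Table names come from our own manifest, not from user input.
        rows = conn.execute(f"SELECT business_key FROM {table}").fetchall()
        present = {row[0] for row in rows}
        absent = keys - present
        if absent:
            missing[table] = absent
    return missing


def build_demo_database() -> sqlite3.Connection:
    """Create a tiny stand-in for a restored test database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (business_key TEXT)")
    conn.execute("CREATE TABLE product (business_key TEXT)")
    conn.execute("INSERT INTO customer VALUES ('CUST-001')")
    conn.execute("INSERT INTO product VALUES ('PROD-001')")
    return conn


if __name__ == "__main__":
    print(find_missing(build_demo_database(), REQUIRED_DATA))
```

A check like this does not keep the backup up to date by itself, but it would at least turn "the backup is stale" from a string of confusing test failures into a single clear message before the run starts.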

As you can see, we’re making progress, but it is slow. It makes me realize that managing test data for these complex automation projects is possibly the hardest problem I’ve encountered so far in my career. There’s no one-stop solution for it, either. So much depends on the availability of technical hooks, domain knowledge and resources at the client side.

On the upside, last week I met with a couple of fellow engineers from a testing services and solutions provider, just to pick their brains on this test data issue. They said they have encountered the same problem with their clients as well, and were working on what could be a solution to this problem. They too realize that it’ll never be a 100% solution to all test data issues for all organizations, but they’re confident that they can provide them (and consultants like myself) with a big step forwards. I haven’t heard too many details, but I know they know what they’re talking about, so there might be some light at the end of the tunnel! We’re going to look into a way to collaborate on this solution, which I am pretty excited about, since I’d love to have something in my tool belt that helps my clients tackle their test data issues. To be continued!

11 thoughts on “More troubles with test data”

  1. Hello Bas and thank you for this sequel to your interesting post about test data from last year. This has given me ideas that I’m eager to test out in the project I’m working on. Keep us posted regarding that cliffhanger ending of this post.

    • Thank you Lisa! I’m not sure what the end result is going to look like yet but I’ll surely do another post if there’s something interesting coming up!

      I’d love to read your experiences too!

  2. Pingback: Java Web Weekly, Issue 168 | Baeldung

  3. Pingback: Testing Bits – 3/12/17 – 3/18/17 | Testing Curator Blog

  4. Pingback: Java Testing Weekly 12 / 2017

  5. Pingback: Java Web Weekly, Issue 169 | Baeldung

  6. Hey Bas, thanks for sharing your experience in this post. I am also an automation test engineer, and I am exploring different data sources for test data. I have been using Excel files, but I am exploring whether a database could be a better option. I feel that as the size of the app increases and we build more automation scripts, maintaining data in Excel is a pain and will also affect performance. How is your experience using a database, is it better than Excel?
    Hope to hear back -shiv

    • Hey Shiv,

      I haven’t used an actual database as a source for test data that often (only on a really small basis). In terms of stability and ease of use I think it definitely beats Excel (pretty much everything does imo). At the moment I’m mostly using Cucumber/SpecFlow files for specification of data, but that’s because I’m working on automating end user journeys at the moment. For other types of tests, either a database, XML or JSON can work fine. I’d stay away from Excel as much as possible though. Too easy to get things wrong.

  7. Thanks Bas for the reply
    I have tried JSON as a data source but it’s not fitting the requirement, as the client needs it to be made easier for manual testers to use. I have tried Cucumber, and I feel it fits well in an Agile kind of project where all teams collaborate. I am not sure if it’s the right tool for a waterfall or V kind of model. What do you think?

    • Hi Shiv,

      I’m using Cucumber (and SpecFlow) a lot lately and yes, it works fine for me as well. Doesn’t matter what the development cycle looks like. Sure, Cucumber is a BDD tool, and BDD is often combined with Agile, but you don’t have to do BDD to profit from using Cucumber.

  8. Pingback: Java Web Weekly, Issue 168 – Baeldung
