Using data tables in Cucumber-JVM for better readable specifications
If you have ever worked in a team that practiced BDD and used Cucumber or SpecFlow to create executable specifications, you’ll know that writing scenarios that are readable is hard. As in: really, really hard.
In this blog post, I want to take a closer look at a feature of the Java Cucumber bindings that can help you write specs that are easy on the eye, and that I feel isn’t talked about often enough: using data tables.
Data tables are tables that you can pass as an argument to an individual step. The data in this data table can then be processed in the underlying step definition. As such, data tables should not be confused with example tables, which are tables containing examples for entire scenarios, to be used with scenario outlines. Data tables, on the other hand, allow you to use more complex data structures as an argument to an individual step.
Let’s take a look at a couple of examples of the use of different forms of data tables, and how they compare to specifying the same data in more verbose textual format.
Example 1 - Key-value pairs
As a first example, let’s consider the situation where you’ll need to specify some sort of initial state that consists of multiple objects (‘things’) that can be modeled as a simple key-value pair. Some examples could be:
- currencies and their associated currency code (USD, EUR, etc.)
- airports and their IATA location identifier (LAX, JFK, etc.)
- employee names and their associated employee number
or, in the example below, football clubs and the stadium they play their home games at. Often, I see teams model these in their specifications like this:
and the step definition associated with these steps:
While this works, technically, I believe there are two inherent problems with this example. First, it is tedious to read because of the repeated text. Second, we’re using three steps (a Given
and two And
s) to specify a single initial state, which to me feels counterintuitive. As a whole, this kind of specification reminds me a little too much of this iconic movie scene..
There’s a much better way to include the same data in your specification, and that is by using a data table. The same specification can be rewritten as
Please note that the first row does contain data, not table headers, even though the syntax highlighting suggests otherwise!
and the associated step definition might look like this:
As you can see, Cucumber is able to automatically convert the data table that is passed as an argument to the @Given
step to a Map
, which is essentially a collection of key-value pairs. You can then iterate over the key-value pairs in the Map
using a forEach()
construct and process each entry as required for your acceptance test.
Running this example yields the following output:
Juventus play their home games at Allianz Stadium
AC Milan play their home games at San Siro
AS Roma play their home games at Stadio Olimpico
Example 2 - Multi-column tables
What about the situations where you want to model data entities that have more attributes, and therefore cannot be simply represented as a key-value pair? Think of entities such as:
- Addresses (a combination of a street name, a house number, a zip code, a city, …)
- Flight information (a combination of a flight number, an airline, an airport of origin, a destination airport, …)
- Bank transfers (a combination of a date, an amount, a source bank account, a destination bank account, …)
or, in this example, data related to specific players of a football club. We could choose to specify this data as:
and implements these steps using a step definition method such as
This example, too, suffers from the same problems as the previous one. There’s a lot of repetition, which makes the specification boring to read, and we need multiple steps to create an initial state. Let’s do better!
The same state can be modeled with a data table that might look like this:
Note that I decided to include table headers here, as this makes it much clearer which column corresponds to which attribute for a player. Using table headers is also really helpful in the associated step definition code, as we can see in this example implementing the step:
Cucumber automatically converts the table structure we’ve seen in this example to an argument of type List<Map<String, String>>
, or, in plain English, a List
of Map
s, where each Map
represents a data entity (a football player, here), and each player has specific attributes, represented by the keys and their values in each Map
.
To iterate over the List
, we can use a forEach()
loop again, as can be seen in the code snippet above. Each property value is retrieved using its corresponding key, which is equal to the header of the respective data table column in our specification.
Running this example yields the following output:
Cristiano Ronaldo of Portugal, born on 05-02-1985, plays for Juventus since the 2018/2019 season
Matthijs de Ligt of the Netherlands, born on 12-08-1999, plays for Juventus since the 2019/2020 season
Giorgio Chiellini of Italy, born on 14-08-1984, plays for Juventus since the 2005/2006 season
Example 3 - Tables with both row and column headers
As a final example, let’s look at an even more intricate data structure: a table with both column and row headers. Example usages of such a table can be:
- Specifying availability per weekday and time of day for a delivery service
- Specifying train fare rates for combinations of seating class and age group for a railway company
- Specifying the distribution of gold, silver and bronze medals for a number of countries competing in the Olympics
or, in the example below, final scores for football matches. One way I’ve seen teams specify data similar to this:
implemented by
Once again, a lot of repetition in the specs, which makes them boring to read, you know the drill by now. Fortunately, there’s a way out of this mess, too:
Here, too, the syntax highlighting may be a little confusing: the first column in this table contains row headers.
As you can see, the aforementioned state can be modeled using a table with both row and column headers. Here, too, Cucumber is able to directly transform this into a suitable data structure:
The data we specified is transformed into a Map
, where the keys are the row headers (the dates on which the matches were played), and the values were Map
s, too, with keys corresponding to the column headers (club names) and values containing the number of goals scored by each team.
In this example, too, we iterate over the outer Map
to process each individual match using forEach()
, and we use the column headers to get the values (the scores for each team) we want from the inner Map
using get()
.
Running this example yields the following output:
The final score of the Derby d'Italia played on 17-01-2021 was Internazionale 2, Juventus 0
The final score of the Derby d'Italia played on 08-03-2020 was Internazionale 0, Juventus 2
The final score of the Derby d'Italia played on 06-10-2019 was Internazionale 1, Juventus 2
In this article, we’ve seen some examples of how to use data tables to specify complex data more efficiently in Cucumber .feature files, how Cucumber automatically transforms these tables into specific data structures, and how we can iterate over and process that data effectively.
Effectively, what we’ve done is removing the iterative nature from our specifications to make them more readable, and move the iterating over the data to our step definitions. In a follow-up article, I’ll dive deeper into using data table transformers to handle complex data structures in an even more efficient way.
All the code shown in this blog post can be found on GitHub.
"