How I would approach creating automated user interface-driven tests

One of the best, yet also most time-consuming parts of running a blog is answering questions from readers. Questions are asked here on the site (through comments), but also frequently via email. I love interacting with you, my readers, and I’ll do whatever I can to help you, but there’s one thing even better than answering questions: preventing them. Specifically, I get a lot of questions asking me how I’d approach creating a ‘framework’ (I prefer ‘solution’, but more on that at the end of this post) for automated end-to-end user interface-driven tests, or checks, as they should be called.

Also, I get a lot of questions related to blog posts I wrote a year or more ago, of which I do no longer support the solution or approach I wrote about at the time. I’ve put up an ‘old post’ notice on some of them (example here), but that doesn’t prevent readers from asking questions on that subject. Which is fine, but sometimes a little frustrating.

As a means of providing you all with the answers to some of the most frequently asked questions, I’ve spent some time creating a blueprint for a solution for automated user interface-driven tests for you to look at. It illustrates how I usually approach creating these tests, and how I deal with things like test specification and reporting. The entire solution is available for your reference on my GitHub page. Feel free to copy, share, read and maybe learn something from it.

Since I’m a strong believer in answering the ‘why?’ in test automation as well as the ‘how?’, I’ll explain my design and tool choices in this post too. That does not mean that you should simply copy my choices, or that they’re the right choices in the first place, but it might give you a little insight into the thought process behind them.

Tools included
First of all, the language. Even though almost all the technical posts I’ve written on this blog so far have covered Java tools, this solution is written in C#. Why? Because I’ve come to like the .NET ecosystem over the last year and a half, and because doing so gives me a chance to broaden my skill set a little more. Don’t worry, Java and C# are pretty close to one another, especially on this level, so creating a similar setup in Java shouldn’t be a problem for you.

Now then, the tools:

  • For browser automation, I’ve decided to go with the obvious option: Selenium WebDriver. Why? Because it’s what I know. Also, Selenium has been the go-to standard for browser automation for a number of years now, and it doesn’t look like that’s going to change anytime soon.
  • For test and test data specification, I’ve chosen SpecFlow, which is basically Cucumber for .NET. Why? Because it gives you a means of describing what your tests do in a business readable format. Also, it comes with native support for data driven testing (through scenario outlines), which is among the first features I look for when evaluating any test automation tool. So, as you can see, even if you’re not doing BDD (which I don’t), using SpecFlow has its benefits. I prefer using SpecFlow (or Cucumber for that matter) over approaches such as specifying your tests in Excel. Dealing with workbooks, sheets, rows and cells is just too cumbersome.
  • For running tests and performing assertions, I decided to go with NUnit. Why? It’s basically the .NET sibling of JUnit for Java, and it integrates very nicely with SpecFlow. I have no real experience with alternatives (such as MSTest), so it’s not really a well-argumented decision, but it works perfectly, and that’s what counts.
  • Finally, for reporting, I’ve added ExtentReports. Why? If you’ve been reading this blog for a while, you know that I’m a big fan of this tool, because it creates great-looking HTML reports with just a couple of lines of code. It has built-in support for including screenshots as well, which makes it perfectly suited for the user interface-driven tests I want to perform. Much better and easier than trying to build your own reports.

Top level: SpecFlow feature and scenarios
Since we’re using SpecFlow here, a test is specified as a number of scenarios written in the Gherkin Given-When-Then format. These scenarios are grouped together in features, with every feature describing a specific piece of functionality or behavior of our application under test. I’m running my example tests against the Parasoft demo application ParaBank, and the feature I’m using tests the login function of this application:

Feature: Login
	In order to access restricted site options
	As a registered ParaBank user
	I want to be able to log in

Scenario Outline: Login using valid credentials
	Given I have a registered user <firstname> with username <username> and password <password>
	And he is on the ParaBank home page
	When he logs in using his credentials
	Then he should land on the Accounts Overview page
	| firstname | username | password |
	| John      | john     | demo     |
	| Bob       | parasoft | demo     |
	| Alex      | alex     | demo     |

Scenario:  Login using incorrect password
	Given I have a registered user John with username john and password demo
	And he is on the ParaBank home page
	When he logs in using an invalid password
	Then he should see an error message stating that the login request was denied

Level 2: Step definitions
Not only should features and scenarios be business readable, I think it’s a good idea that your automation code is as clean and as readable as possible too. A step definition, which is basically the implementation of a single step (line) in a scenario, should convey what it does through the use of fluent code and descriptive method names. As an example, here’s the implementation of the When he logs in using his credentials step:

[When(@"he logs in using his credentials")]
public void WhenHeLogsInUsingHisCredentials()
    User user = ScenarioContext.Current.Get<User>();

    new LoginPage().

Quite readable, right? I retrieve a previously stored object of type User (the SpecFlow ScenarioContext and FeatureContext are really useful for this purpose) and use its Username and Password properties to perform a login action on the LoginPage. Voila: fluent code, clear method names and therefore readable, understandable and maintainable code.

Level 3: Page Objects
I’m using the Page Object pattern for reusability purposes. Every action I can perform on the page (setting the username, setting the password, etc.) has its own method. Furthermore, I have each method return a Page Object, which allows me to write the fluent code we’ve seen in the previous step. This is not always possible, though. The ClickLoginButton method, for example, does not return a Page Object, since clicking the button might redirect the user to either of two different pages: the Accounts Overview page (in case of a successful login) or the Login Error page (in case something goes wrong). Here’s the implementation of the SetUsername and SetPassword methods:

public LoginPage SetUsername(string username)
    OTAElements.SendKeys(_driver, textfieldUsername, username);
    return this;

public LoginPage SetPassword(string password)
    OTAElements.SendKeys(_driver, textfieldPassword, password);
    return this;

In the next step, we’ll see how I’ve implemented the SendKeys method as well as the reasoning behind using this wrapper method instead of calling the Selenium API directly. First, let’s take a look at how I identified objects. A commonly used method for this is the PageFactory pattern, but I chose not to use that. It’s not that useful to me, plus I don’t particularly like having to use two lines and possibly an additional third, blank line (for readability) for each element on a page. Instead, I simply declare the appropriate By locator at the top of the page so that I can pass it to my wrapper method:

private By textfieldUsername = By.Name("username");
private By textfieldPassword = By.Name("password");
private By buttonLogin = By.XPath("//input[@value='Log In']");

In this example, I also used the LoadableComponent pattern for better page loading and error handling. Again, whether or not you use it is entirely up to you, but I think it can be a very useful pattern in some cases. I don’t use it in my current project though, since there’s no need for it (yet). Feel free to copy it, or leave it out. Whatever works best for you.

Level 4: Wrapper methods
As I said above, I prefer writing wrapper methods around the Selenium API, instead of calling Selenium methods directly in my Page Objects. Why? Because in that way, I can do all error handling and additional logging in a single place, instead of having to think about exceptions and reporting in each individual page object method. In the example above, you could see I created a class OTAElements, which conveniently contains wrapper methods for individual element manipulation. My SendKeys wrapper implementation looks like this, for example:

public static void SendKeys(IWebDriver driver, By by, string valueToType, bool inputValidation = false)
        new WebDriverWait(driver, TimeSpan.FromSeconds(Constants.DefaultTimeout)).Until(ExpectedConditions.ElementIsVisible(by));
    catch (Exception ex) when (ex is NoSuchElementException || ex is WebDriverTimeoutException)
        ExtentTest test = ScenarioContext.Current.Get<ExtentTest>();
        test.Error("Could not perform SendKeys on element identified by " + by.ToString() + " after " + Constants.DefaultTimeout.ToString() + " seconds", MediaEntityBuilder.CreateScreenCaptureFromPath(ReportingMethods.CreateScreenshot(driver)).Build());
    catch (Exception ex) when (ex is StaleElementReferenceException)
        // find element again and retry
        new WebDriverWait(driver, TimeSpan.FromSeconds(Constants.DefaultTimeout)).Until(ExpectedConditions.ElementIsVisible(by));

These wrapper methods are a little more complex, but this allows me to keep the rest of the code squeaky clean. And those other parts (the page objects, the step definitions) is where 95% of the maintenance happens once the wrapper method implementation is finished. All exception handling and additional logging (more on that soon) is done inside the wrapper, but once you’ve got that covered, creating additional page objects and step definitions is really, really straightforward.

I have also created wrapper methods for the various checks performed in my tests, for exactly the same reasons: keeping the code as clean as possible and perform additional logging when desired:

public static void AssertTrue(IWebDriver driver, ExtentTest extentTest, bool assertedValue, string reportingMessage)
    catch (AssertionException)
        extentTest.Fail("Failure occurred when executing check '" + reportingMessage + "'", MediaEntityBuilder.CreateScreenCaptureFromPath(ReportingMethods.CreateScreenshot(driver)).Build());

Even though NUnit produces its own test reports, I’ve fallen in love with ExtentReports a while ago, and since then I’ve been using it in a lot of different projects. It integrates seamlessly with the solution setup described above, it’s highly configurable and its reports just look good. Especially the ability to include screenshots is a blessing for the user interface-driven tests we’re performing here. I’ve also included (through the LogTraceListener class, which is called in the project’s App.config) the option of adding the current SpecFlow scenario step to the ExtentReports report. This provides a valuable link between test results and test (or rather, application behavior) specification. You can see the report generated by executing the Login feature here.

Extending the solution
In the weeks to come, I’ll probably add some tweaks and improvements to the solution described here. One major thing I’d like to add is including an example of testing a RESTful API. So far, I haven’t found a really elegant solution for C# yet. I’m looking for something similar to what REST Assured does in Java. I’ve tried RestSharp, but that’s a generic HTTP client, not a library written for testing, resulting in a little less readable code if you do want to use it for writing tests. I guess I’ll have to keep looking, and any hints and tips are highly welcome!

I’d also like to add some more complex test scenarios, to show you how this solution evolves when the number of features, scenarios, step definitions and page objects grows, as they tend to do in real life projects. I’ve used pretty much the same setup in actual projects before, so I know it holds up well, but the proof of the pudding.. You know.

Wrapping up
As I’ve said at the beginning of this post, I’ve created this project to show you how I would approach the task of creating a solution for clients that want to use automated user interface-driven tests as part of their testing process. Your opinions on how I implemented things might differ, but that’s OK. I’d love to hear suggestions on how things can be improved and extended. I’d like to stress that this code is not a framework. Again, I can’t stand that word. This approach has proven to be a solution to real world problems for real world clients in the past. And it continues to do so in projects I’m currently involved in.

I wrote this code to show you (and I admit, my potential clients too) how I would approach creating these tests. I don’t claim it’s the best solution. But it works for me, and it has worked for my clients.

Again, you can find the project on my GitHub page. I hope it’ll prove useful to you.

How (not) to test RESTful APIs with Selenium WebDriver

I’ve seen the question of how to do RESTful API testing with Selenium WebDriver come up a lot in various places. I’ve seen it mainly on LinkedIn, but a couple of weeks ago I was asked this question in person as well. So, as your ever helpful consultant, I thought I’d be a good idea to show you how it’s done.

First, let’s create a new browser driver object. For the sake of simplicity, I’ll use Firefox, because it doesn’t require setting up a separate driver. And yes, I know implicit waits aren’t the best waiting strategy, but I didn’t feel like writing an ExpectedCondition.

driver = new FirefoxDriver();
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);

Now, instead of browsing to a web page, we’ll simply browse to the RESTful API endpoint for which we want to check the response. I’ll use as an example here to look up some data for the US zip code 90210:


This gives us the following output:

Response from the API in JSON format

Let’s grab the text we see on screen and convert it to a JSON object (using the org.json package):

WebElement element = driver.findElement(By.xpath("//pre"));
JSONObject jsonObject = new JSONObject(element.getText());

Now that we have a JSON object containing the API response, we can extract specific elements and check their value:

String valueToCheck = jsonObject.get("country").toString();
Assert.assertEquals(valueToCheck, "United States");

As a final step, throw everything away, never do this again and please forget everything you’ve seen so far in this post. Selenium is not an API testing tool. It has never been, and it will never be. So please don’t try and force it to be.

Please stick to simulating UI interaction when using Selenium and use a dedicated, fit for purpose tool for API testing. As an example, REST Assured can perform the exact same check as the above. With a single line of code. Which is far better readable to boot:

public void doRestTestProperly() {
		body("country", equalTo("United States"));

So, here’s to hoping I did my part in exterminating questions that feature ‘Selenium’ and ‘API testing’ in the same sentence.

Selenium and what it is not

Please note: this post is in no way meant as a rant towards some of my fellow testers. I’ve only written this to try and clear up some of the misconceptions with regards to Selenium WebDriver that I see popping up on various message boards and social media channels.

Selenium is hot in the test automation world. This probably isn’t news to you. If it is, you’re welcome, please remember you read it here first! In all seriousness, Selenium has been widely used for the past couple of years, and I don’t see its popularity fading any time soon. Due to its popularity, Selenium has been the go-to solution for everybody looking to write automated checks for browser-based applications, even though sometimes it might not be the actual best answer to the question at hand. In the past couple of years I have seen a lot of funny, strange and downright stupid questions being asked with regards to Selenium.

In this post, I would like to look at Selenium from a slightly different perspective by answering the question ‘what is Selenium NOT?’. All of the examples in the remainder of this blog post are based on actual questions or anecdotes I have read. I’m not going to link to them, since this post is not about blaming and shaming, it’s about educating those in need of some education.

Selenium is not a test tool
You read that right, Selenium is not a test tool. A lot of people (including a staggeringly large percentage of recruiters) get this wrong, unfortunately. To quote the Selenium web site:

Selenium automates browsers. That’s it! What you do with that power is entirely up to you.

My personal number one reason why Selenium is not a test tool: it does not provide a mechanism to perform assertions. For that, you’ll have to combine Selenium with JUnit or TestNG (for Java), NUnit (for C#) or any other actual (unit) testing framework. Selenium only performs predefined actions on a web site or other application running in a browser. It does this fairly well, but that doesn’t make it a test tool.

No, Selenium is NOT a test tool

No, Selenium is NOT a test tool

Off-topic: the same applies to tools that support Behaviour Driven Development, such as Cucumber and SpecFlow. Those aren’t test tools either. They can be used as an assistant in writing automated tests (checks), but that isn’t the same thing.

Selenium is not a tool to be used in API testing
I’ve seen this one come by a number of times, even as recently as a few days ago:

How can I perform tests on APIs using Selenium?

Again, Selenium automates browsers, so it operates on the user interface level of an application. Actions performed on a user interface might invoke API calls. Selenium is completely oblivious to this API interaction, however. There’s no way to have Selenium interact with APIs directly. If you want to perform automated checks on the API level, please use a tool that is specifically created for these types of checks, such as REST Assured. It might be wise to repeatedly ask yourself ‘Am I actually testing the user interface, or am I merely testing my application THROUGH that user interface?‘.

Selenium is not a performance test tool
Seeing how easy it is to write tests using Selenium (I’m not saying it’s easy to write GOOD tests, though), a lot of people seem to think that these tests can easily be leveraged to execute load and performance tests as well, for example using Selenium Grid. This is a pretty bad idea, though. Selenium Grid is a means to execute your functional tests in parallel, thereby shortening the execution time. It is not meant to be (ab)used as a performance testing platform, if only for these two simple reasons:

  • It doesn’t scale very well, so you’ll probably have a hard time generating anything but trivial loads
  • Selenium does not offer a mechanism to measure response times (at least not without taking into account the time needed at the client side to render a page or specific elements), and Selenium Grid isn’t able to gather and aggregate these response times for each individual note and present them in a concise and useful manner.

Both of these are essential if you’re serious about your performance testing, so please use a tool that is specifically written for that purpose, such as JMeter.

Selenium can’t handle your desktop applications
I feel that I’m starting to repeat myself here: Selenium automates browsers. Anything outside the scope of that browser cannot be handled by Selenium. This means that Selenium also can’t handle alerts and dialogs that are native to the operating system, such as the Windows file upload/download dialogs. To handle these, either use a tool such as AutoIt (if you’re on Windows) or, and this approach is much preferred, bypass the user interface altogether and use a solution such as the one presented in this blog post.

I sincerely hope this clears up some of the misconceptions around Selenium. If you have any other examples, feel free to post a comment or send me an email.