On designing a self-diagnosing and self-healing automated test framework – part 1

This post was published on July 8, 2015

This is the first post in a short series on designing robust, self-diagnosing and, if possible, even self-healing tests. You can read all articles in this series by clicking here.

I like automating tests. So much that I have even made my profession out of it. What I don’t like though is wasting a lot of time trying to find out why an automated test failed and whether the failure was a genuine bug, a flaky test or some sort of test environment issue.

A quick Google search showed me that there are a couple of commercial test tools that claim they come with some sort of self diagnosing and even a self healing mechanism, for example this one. I’m very interested to see how such a mechanism works. Skeptic as I am, I don’t really believe it until I’ve seen it. Nonetheless, I think it would be interesting to see what can be done to make existing automated test frameworks more robust by improving their ability to self-diagnose and possible even self-heal in case an unexpected error occurs. So that’s what I’ll try to do in the next couple of posts.

The step by step approach I want to follow and the considerations to be made along the way are (very) loosely inspired on a scientific article titled ‘A Self-Healing Approach for Object-Oriented Applications’, which can be downloaded from here. This article presents an approach and architecture for fault diagnosis and self-healing of interpreted object-oriented applications. As the development and maintenance of automated tests should be treated as any other software development project, I see no reason why the principles and suggestions presented in the article could not apply to a test automation framework, at least to some extent…

When is a system self-diagnosing and self-healing?
Let’s start with the end in mind, as Stephen Covey so eloquently put it. What are we aiming for? In order for a system to be self-diagnosing, it should:

Be self-aware, or more concretely, always have a deterministic state
Recognize the fact that an error has occurred
Have enough knowledge to stabilize itself
Be able to analyze the problem situation
Make a plan to heal itself
Suggest healing solutions to the system administrator (in this case, the person responsible for test automation)

If we want our system not only to be self-diagnosing but also self-healing, the system should also:

Heal itself without human intervention

In this post – and probably some future posts as well – I will try and see whether it is possible to design a generic approach for creating robust, self-diagnosing and self-healing test automation frameworks. I’ll try and include meaningful examples based on a relatively simple Selenium test framework wherever possible.

Self-aware tests
The most straightforward way to make any piece of software self-aware is to introduce the concept of state. A robust program, and therefore an automated test as well, should always be in a specific state when it is executing. For robustness, we will assume that the state model for an automated test is deterministic, i.e., a test can never be in more than one state, and an event that triggers a state transition should always result in the test ending up in a single new state. Let’s say we identify the following states that an automated test can be in:

Not running
Initialization
Running
Error
Teardown

The state model or state transition diagram could then look like this:

A sample implementation of this state model (also known as a finite state machine or FSM) can be created using a Java enum:

public enum State {
	
	NOT_RUNNING {
        @Override
        State doTransition(String input) {
            System.out.println("Going from State.NOT_RUNNING to State.INITIALIZATION");
            return INITIALIZATION;
        }
    },
    INITIALIZATION {
        @Override
        State doTransition(String input) {
        	if (input.equals("error")) {
        		System.out.println("Going from State.INITIALIZATION to State.ERROR");
        		return ERROR;
        	} else {
        		System.out.println("Going from State.INITIALIZATION to State.RUNNING");
        		return RUNNING;
        	}
        }
    },
    // The RUNNING and TEARDOWN states are implemented in the same way as INITIALIZATION state
    ERROR {
        @Override
        State doTransition(String input) {
        	if (input.equals("ok")) {
        		System.out.println("Going from State.ERROR to State.NOT_RUNNING");
        		return NOT_RUNNING;
        	} else {
        		System.out.println("Remaining in State.ERROR");
        		return this;
        	}
        }
    };
 
    abstract State doTransition(String input);
}

In a next post, I will show how we can apply this state model to a simple Selenium WebDriver test to make it more robust. I will also demonstrate how this state model helps us in letting our tests fail gracefully and in determining what exactly constitutes an error (versus a failed check, for example).