On Test Automation

I’m (re-)starting a newsletter

2026-05-11T00:00:00+00:00

Just a quick update to let those of you who bookmarked this blog or who have subscribed to my RSS feed know that I have (re-)started a newsletter.

As you might know (or not), while I’ve been pretty active on LinkedIn over the years, I do have a love-hate (or rather an appreciate-hate) relationship with that platform.

Lately, I’ve been noticing that the pendulum is swinging in the ‘hate’ direction more often, mainly because the ever-changing algorithm used by LinkedIn makes it incredibly hard to predict if people are even going to see what I write.

I’d rather publish my thoughts, ideas and other ramblings via a platform that I do control, and that platform will be a newsletter.

I’ve had a newsletter in the past, but that only lived for about three months. This time, I intend to keep writing and publishing a new issue every week. The first edition goes out a few hours after I write this, and a new issue will be sent to subscribers every Monday morning around 11 AM CET.

But what about the blog?

I’ll still publish to the blog, too, but that will be on a much less regular basis. Just like it has been for a while, really. The idea is to post the more ‘technical’ posts, that is, the ones including code, directly to my blog, whereas the ‘text-and-images-only’ posts go out through the newsletter first.

My priority is with the newsletter, though.

Subscribing is easy: just go to the subscription page, leave your email address, click the button in the confirmation email and you’re in.

I promise I won’t use the newsletter or your email to spam or sell to you. Ever.

My thoughts on ‘self-healing’ in test automation

2026-04-09T00:00:00+00:00

From what I’ve seen over the years, tests that exercise a system through the graphical user interface are disproportionately likely to fail for reasons other than an actual, genuine product failure. Call them false positives, call them flaky tests, call them whatever you want.

There are good reasons for this: graphical user interfaces are interfaces optimized for consumption by humans, not by machines or code. There is a lot of (often asynchronous) processing going on in your browser, all to create an application that looks good, is pleasant to work with, and generally provides a good end user experience. And remember, that end user is a human being, not a computer.

The problem

The dynamic nature of a modern GUI often leads to test failures, even when the behaviour of the application remains unchanged. For example, if the text on a button that submits a form for a loan application used to be Submit, but changes to Apply, and you have Playwright tests that first locate and then click the button using

await page.getByRole('button', { name: 'Submit' }).click();

your test is going to fail to locate the button, even when you’re using getByRole(), a Playwright-recommended locator.

Another example: let’s say you check that the table of accounts in an online banking platform is visible by using its HTML id attribute, like this:

await expect(page.locator('#accounts')).toBeVisible();

When the value of this id attribute changes, for example because a new version of the framework used to build the frontend is introduced, your test is going to fail to locate the table, and subsequently tell you the table is not visible, even when it is.

Self-healing algorithms - the solution?

To avoid unwanted rework, many tools these days offer a ‘self-healing’ feature: whenever a test fails to locate an element, it will try and identify the element that you intended to locate, often using a probabilistic algorithm that is sold to you as ‘AI’.

What this means is that it consults a large collection of training data, finds similar occurrences of changes in UI layout, either from your own history of changes or from other applications, checks those against what it sees on the screen, and from there it selects the candidate element that is most likely to be the one that you intended to locate. These tools will also often report on the results of the changes they made in a test run, so that a human being can review these changes afterwards and approve or reject them.

The bigger the size of the training database, the higher the quality of the training data, and the higher the sophistication of the algorithm, the higher the probability that the tool identifies the correct element, e.g., the button with the Apply text that replaced the Submit text. Still, trying to find the ‘right’ element is a probabilistic process, which means that mistakes will be made. That’s fine, as long as you’re aware of the risk, and you’re willing to accept it.
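
To make that probabilistic nature a bit more tangible, here is a deliberately naive sketch in Java. It is not how any particular tool works: the NaiveHealer class, the attribute names and the scoring rule are all made up for this illustration. It remembers the attributes of the element as it was last seen, scores every candidate on the current page by the fraction of those attributes it still matches, and picks the best one above a threshold.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, naive 'self-healing' heuristic. Real tools use far richer
// signals; the attribute names and the scoring rule are assumptions
// invented for this sketch.
final class NaiveHealer {

    // Fraction of the remembered attributes that a candidate still matches.
    static double score(Map<String, String> remembered, Map<String, String> candidate) {
        if (remembered.isEmpty()) {
            return 0.0;
        }
        long matches = remembered.entrySet().stream()
                .filter(e -> e.getValue().equals(candidate.get(e.getKey())))
                .count();
        return (double) matches / remembered.size();
    }

    // Picks the best-scoring candidate, or nothing when no candidate reaches the threshold.
    static Optional<Map<String, String>> heal(Map<String, String> remembered,
                                              List<Map<String, String>> candidates,
                                              double threshold) {
        return candidates.stream()
                .filter(c -> score(remembered, c) >= threshold)
                .max(Comparator.comparingDouble(c -> score(remembered, c)));
    }
}
```

With a remembered element of role button, text Submit and form loan, a button with the text Apply in the same form scores 2 out of 3 and would be selected at a threshold of 0.5. A Cancel button in that same form scores exactly the same, though, which is the kind of ambiguity that makes mistakes unavoidable with this sort of approach.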

Why I think self-healing is a bad idea

So, would I recommend using self-healing frameworks as a solution to the challenge of often-changing graphical user interfaces and the test code maintenance that is the result thereof?

Well, no.

Sure, using these tools might reduce the time required to maintain your tests in the case of changing HTML and the need to update the corresponding element locators. And I don’t even have a problem with the fact that the algorithms they use are probabilistic, which means they can make mistakes. I mean, there’s a lot of fairly useless and downright bad human-written test code out there, too, so I don’t think that problem is unique to LLM-powered test tools or LLM-generated test code.

No, the biggest problem I have with ‘self-healing’ test tools and frameworks is that they are, to me, nothing more than band-aids, hiding the real problem that is underneath.

That real problem, to me, is the fact that the people writing the tests weren’t aware that the locators changed in the first place. Why did the text on the button change from Submit to Apply? Who committed that change in the product code? And why didn’t they either change the corresponding test code, too, or inform someone in their team that the test might break because of this? Or, in the case of the second example, why didn’t we know that the framework update might lead to changing id values? Or, as is often the case, that the update happened in the first place?

That’s the problem that we need to address. We need to close the communication and collaboration gaps in our teams, instead of trying to patch them up with an algorithm. Test automation isn’t the deliberate chase of green checkmarks, it’s using tools to efficiently detect changes in the behaviour of our product. Self-healing, to me, feels like an attempt to sweep these changes under the RUG.

I don’t know about you, but I’d rather address the real problem than apply a band-aid and hope the problem will remain out of sight.

The ‘valuable’ in valuable feedback, fast

2026-04-01T00:00:00+00:00

When I talk about the goals and the purpose of test automation, I often use the phrase ‘valuable feedback, fast’: we use tools to support our testing to help us get valuable information about the state of our product in the most efficient manner possible.

The ‘fast’ part of ‘valuable feedback, fast’ is pretty self-explanatory for most people: as build and release cycles are becoming shorter, teams want to be informed in a timely manner about any unexpected changes in the behaviour of their product, often after every change they make to that product.

Tools can help them achieve that by running quick, focused tests automatically when a change is made or committed to version control. Of course, it takes plenty of hard work to write those tests to be fast, but that’s not what I wanted to talk about here.

The ‘valuable’ in ‘valuable feedback, fast’ is a much more ambiguous term, and one that deserves some more explanation. To me, there are multiple dimensions to what makes a test valuable, and in this post, I want to unpack and address them one by one.

Valuable = important to someone who matters

Borrowing from the classic definition of ‘quality’ as defined by Jerry Weinberg and further refined by James Bach and Michael Bolton, this is where it all starts. The information presented by a test should be important to someone who matters. That someone could be a member of the development team, a stakeholder such as a product owner or business analyst, the end user of the product, or a combination of those.

Without that importance, a test is meaningless, dead weight. It could be the most reliable, best-written test ever, but if the information that is provided by it is not important to someone who matters in the context of the product, why bother writing, running and maintaining the test?

Valuable = covering what matters

Test coverage is a tricky subject, and I want to steer clear of the discussion on what ‘coverage’ means exactly in this blog post. The only realistic answer is ‘it depends’, anyway, as there are so many ways to define coverage (line, branch, requirements, mutation, …).

Having said that, for the information provided by our tests to be valuable, teams should invest time in making sure that the tests cover the parts of the product behaviour that are deemed ‘important enough’ in a sufficient manner. What exactly constitutes ‘sufficient’ here depends on, you guessed it, the context.

Some products require deeper, more thorough coverage than others. The same applies to individual parts of the same product. It all depends on the acceptable amount of risk a team is willing to take before putting a product in the hands of their users. Teams would do well to have a continual discussion about these risks and the extent to which they are covered by the tests that accompany and scrutinize the product.

Valuable = trustworthy

The higher the degree of automation in the build and delivery process of a product, and that includes testing, the more teams will rely (and have to rely) on the results of the execution of that automation. Concerning tests, that means that teams need to be able to rely on the information presented by the tests, because they will make decisions based on that information.

The nature of that decision might range anywhere from ‘this build seems sufficiently stable to warrant deeper testing’ to ‘this change is ready to be put in the hands of our users’. No matter what the specific decision is, if teams make it based on the results of your test automation, even in part, they can only confidently do so if the information provided by the tests is trustworthy.

In practice, that means that when a test emits a signal indicating a problem with the product, the team can safely conclude that there is a problem with the product, not with the test, the data it uses or the environment it runs in (no false positives). It also means that when a test does not emit such a signal, the team can trust that the particular piece of behaviour exercised by the test is working according to expectations expressed in the test (no false negatives).

Valuable = actionable

Another dimension of the value of the feedback provided by a test is that it should be actionable. This applies specifically to those situations where a test ‘fails’, i.e., it indicates a problem with the product. I have put ‘fails’ in quotes here because the test didn’t fail; the product failed the test. There’s a difference.

Anyway, when a test result indicates a (potential) problem with the product, teams need to be able to act on that information as soon as possible, spending as little time digging deeper into the product or into the test as possible to identify the root cause of the problem. Some practices that might help here are:

Make your test scope as small as possible - the fewer moving parts your test has, the easier it will be to identify which of those parts made a move that was unexpected
Use good test names - A descriptive test name that tells you what part of the product’s behaviour the test verifies and what the expected behaviour is helps in finding out where exactly the problem might be found
Use custom assertion messages - Many test frameworks allow you to specify custom, descriptive error messages in case of assertion failures (something RestAssured.Net supports as of version 5.0.0, too)
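
As a small illustration of the last two practices, here is a sketch in plain Java. Everything in it is hypothetical: the Account class, the business rule and the check() helper are stand-ins invented for this example, and in a real suite the assertion-with-message facility of your test framework would take the place of check().

```java
// Hypothetical example: Account, its withdrawal rule and the check() helper
// are made up for this sketch and do not come from any real project.
final class Account {
    private double balance;

    Account(double balance) {
        this.balance = balance;
    }

    void withdraw(double amount) {
        if (amount > balance) {
            throw new IllegalArgumentException("Insufficient funds");
        }
        balance -= amount;
    }

    double balance() {
        return balance;
    }
}

final class WithdrawalChecks {

    // Fails with a message that states what was expected and what was observed.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Small scope, and a name that states both the action and the expected outcome.
    static void withdrawingLessThanBalance_reducesBalanceByAmount() {
        Account account = new Account(500.0);
        account.withdraw(200.0);
        check(account.balance() == 300.0,
                "Expected balance 300.0 after withdrawing 200.0 from 500.0, but was " + account.balance());
    }
}
```

When this check fails, the name of the method and the message together already tell you which behaviour is affected and what value was observed, before you have opened the product code or the test code.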

So, is this a complete and final definition of what ‘valuable’ means to me when I talk about ‘valuable feedback, fast’ as the goal of test automation? I don’t think so. I don’t know if it is complete, but it definitely is a good reflection of my current thoughts on ‘value’ in test automation. Those thoughts are definitely not ‘final’, and I would appreciate your takes on what I wrote here.

On increasing focus in my career

2026-03-13T00:00:00+00:00

Note: this is not the most well-structured blog post I’ve written, but rather a recording of my not (yet) completely organized thoughts.

Last week, I was at the Test Automation Days conference, both as a member of the program committee, and as a last-minute substitute workshop facilitator. As is typically the case with conferences, I had a lot of conversations, catching up with people I’ve met before, as well as getting to know people I haven’t. One of the questions that invariably comes up in these conversations is

“So, what are you up to these days?”

I typically start my answer with a lame attempt at humour by saying “well, do you have half an hour?”. Poor jokes aside, it is true, though: I am working on a lot of different things these days. So much so that I’m sometimes feeling like I am spreading myself a little thin. For example, at the moment of writing this I am

working part time with a consulting client here in the Netherlands
running workshops and training courses for several clients
in the final stages of a corporate mentoring engagement
wrapping up the creation of a video course for a new platform
talking about and preparing for several public speaking gigs
reviewing submissions for several conferences

And then there’s the writing, the reading, the watching videos, the conversations to help out fellow testers, and more. Now, I enjoy doing all of these things (well, except for editing videos of me talking, that is painful), but it is a lot. In fact, I think I would be better off doing fewer things, going deeper on those, spending more time on fewer things and ultimately, raising the quality of the work I deliver.

Which means that something, or rather several things, will have to go. First on the list: creating video courses. I’m absolutely going to finish the one I’m working on now, but that will be the last one I do. Creating video content takes a lot of time, and in all honesty, the return on investment isn’t great. Plus, while I enjoy seeing the result go live, I could do without the process of writing transcripts and recording and then editing video. Did I tell you I despise editing video footage of me talking already?

As of today, my main focus will be on further building my training business. Why training? Well, for several reasons. First of all, it’s the kind of work I like doing best. That in itself is reason enough, but there’s more… It also gives me the most flexibility, and that’s something I’ll really need in the near future, but more on that in a moment. Another reason for focusing on training is that it puts me in front of lots of different people, teams and companies, which is a great way to build my network. And finally, and as I’m running a business, not insignificantly, running training courses, especially in-company, also pays the best, by a long shot.

I will keep working on some of the other activities, but it looks like I’ll have to practice saying ‘no’ a little more often. For example, I enjoy reviewing conference submissions, but it takes time I could spend on other things, too.

I am going to continue my current consulting engagement, because it is interesting and there’s plenty of opportunity to run in-company workshops, but it’s not very likely I’ll take on additional consulting gigs in the near future.

If a corporate mentoring gig comes up, I’ll happily take it, but I am not going to go out of my way to find one.

The public speaking will stay as well, but here too, I will consider and take what comes my way, instead of actively pursuing speaking gigs at meetups and conferences, especially those abroad. I do have a few in-company speaking gigs coming up in the next couple of months, as well as a keynote at an online conference. I enjoy speaking, so I will continue doing talks, but again, pretty much only when I am invited to do so.

Finally, I will keep spending some time every week reading and writing, reading because I need to and want to stay on top of what is happening, writing because it is both a great way to organize my thoughts and to share those same thoughts with the world. Depending on opportunities, public speaking will be part of this, too.

In short, the only part of my business and service offerings that I will actively build and grow is my training business. What I’m looking for is spending most of my working time preparing for or running training sessions, with a few hours a week spent on outreach, coordination, and other logistical matters that come with running a training business. Ideally, running these sessions will also give me the opportunity to visit different places in the world, something I am able to do already, but I would love to go to more places, especially outside Europe (most training work I do is within the EU right now).

If all goes well, and that’s what I’m slowly working towards and have been working towards for a while now, that will leave a decent amount of time during the week for things outside of work. Most notably, for cycling.

I have always enjoyed cycling, but until recently, I only really brought out my racing bike every now and then. A few months ago, that changed, and I am now working my way towards riding ever longer distances (I’ll never be a fast rider, and that’s fine). I’ll ride my first century next month. In May, I hope to complete the cycling version of one of the most famous Dutch sporting events of all time. If that goes well, I have my eyes set on a much longer event in 2027.

The only problem with cycling, and training for long-distance cycling events in particular, is that it takes time, a LOT of time. I go out for a bike ride three times a week, with my shortest rides being around 2 hours. A typical weekend ride, right now, is 5-6 hours, and that will only grow as I get closer to the events I would like to complete.

I don’t want to spend all my hours outside of work on a bike, and neither does my family. So, I decided to slowly start changing the way I work and the way I spend my time to make room for spending hours on my bike. That, to me, is the ‘freedom’ I talk about when I tell others why I am an independent consultant.

Anyway, ramblings over. If you’re looking to bring in an experienced trainer to help your team grow their test automation skills, let’s talk. I’d be happy to discuss options.

Writing tests with Claude Code - part 1 - initial results

2026-03-09T00:00:00+00:00

In a recent post, I wrote about how I used Claude Code to analyze the code for RestAssured.Net and then perform a refactoring action, using handwritten tests as the safety net. In that post, I wrote that I didn’t want Claude to touch the tests themselves, and why. I was still curious, though, to find out for myself what Claude was capable of in terms of writing tests, and to what extent the trust that more and more people put into the tests written by LLMs is warranted.

In this blog post, I’ll share with you some first steps in doing exactly that, and you’ll read about my thoughts and my thought process along the way. You’ll see how I created an initial suite of tests for a small Spring Boot-based API that I wrote for use in several of my workshops, and how I assessed the results. In a follow-up blog post, I’ll show you how I improved the test suite based on my observations, again using Claude Code.

The starting point

As a starting point, I created a new repository containing the code for the API I wanted Claude to write tests for. Obviously, I removed the existing tests, and I also removed the README and the GitHub Actions build pipeline definition, as I wanted Claude to write tests based only on the product code itself, without being primed by other artifacts in the codebase. What I left in are the dependencies used to write and run the tests, in this case REST Assured and JUnit.

After installing and initializing Claude, I asked it to generate tests using this prompt:

“Add acceptance tests for the endpoints exposed by the AccountController to this project. Cover all the logic in the AccountService class. Use REST Assured as the tool to interact with the API. Use JUnit 5 as the test runner. Both libraries are already part of the project, see the pom.xml. Assert status codes and relevant response body elements as part of the tests. Extract common request properties into a RequestSpecification.”

After some deliberation, Claude added a new test file to the project, containing 23 tests, all of them passing. You can see these tests here. The code in this file is the raw output from the above prompt, I haven’t changed anything.

It took Claude only a minute or two to write these tests, which definitely is a lot faster than what I could have done myself. But how good are they, really?

A first look at the tests

Let’s look at the coding style first. I’m seeing people argue that code quality is not really all that important anymore once AI writes most of our code, but I beg to differ, especially when it concerns our tests. Tests are documentation of the intended behaviour of our code, and I would say that being able to read that documentation as a human being, without too much effort, remains very important. So, is the generated code easy to read?

There’s a @BeforeEach hook creating the RequestSpecification (an object in REST Assured containing shared HTTP request properties). There’s a helper method to create a new account passing in the AccountType and a predefined balance. There are the aforementioned 23 tests that, especially at first glance, seem to verify things that are valuable.

What Claude did not do, very likely because I didn’t explicitly ask for it, is add an abstraction layer to make the code easier to read, similar to the approach described here. We’ll see how Claude does in this area in the next blog post, as I want to stick to assessing the quality of the initial output from Claude in this one.

And I have to say, all in all, for a first try, I’m not unhappy with what I’m seeing. Yes, there’s room for improvement, but I have seen humans do far worse than this. Of course, this is only a small and simple API, but that hasn’t stopped people (including myself) in the past from writing ugly test code for it.

When we look at what the tests seem to cover, at first glance it looks like they hit all the endpoints defined in the API controller, and most, if not all paths in the business logic defined in the service layer.

I should note here that I was able to fairly quickly and confidently come to these conclusions only because:

I wrote the code for the API, so I have knowledge of the inner workings and the intent of the API, and
I have plenty of experience writing tests for APIs and writing tests in REST Assured, so I’d like to think I know what ‘good’ looks like

If you don’t have that prior knowledge and experience, it will be harder to draw meaningful conclusions from just looking at what Claude coughs up. And that comes with a significant risk: the risk of saying ‘looks good to me’ without actually understanding what you’re approving and what you’re putting your trust into, and then ending up with a safety net of tests full of holes.

Testing the generated tests with mutation testing

To further increase our understanding of the value of the tests that were generated for me, let’s see if these tests can fail. If they can’t, the fact that we have generated 23 passing tests in two minutes flat is nothing more than an example of productivity theater.

My preferred method of finding out whether a test can actually fail is to use a mutation testing tool. In this case, because we’re working with Java code, I’ll use PITest as my mutation testing tool of choice. I configured the tool to mutate all the code in the project and run all the tests, to get a complete overview of the quality of the test suite generated. Note that in a real life-sized project, you probably want to start by mutating only part of the code base and run part of the tests to get mutation testing feedback within a reasonable amount of time.

After about a minute, PITest reports back that the initial test suite achieves 95% line coverage. This looks impressive, but it doesn’t really tell me anything. The much more valuable metric here is the number of mutants that were killed by the test suite. PITest reports that this is 91%, which, again, is pretty good. In absolute numbers, out of 55 mutants generated by PITest, 50 were detected by the initial test suite.
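
For those who like to check the arithmetic: the mutation score is simply the number of killed mutants divided by the number of generated mutants. A minimal sketch, where the class and method names are made up and rounding to the nearest whole percentage is an assumption on my part (it does match the 91% quoted above):

```java
// Mutation score as a rounded percentage: killed mutants / generated mutants.
// The class name, method name and rounding rule are assumptions for this sketch.
final class MutationScore {

    static long percentage(int killed, int generated) {
        if (generated <= 0) {
            throw new IllegalArgumentException("generated must be positive");
        }
        return Math.round(100.0 * killed / generated);
    }
}
```

Calling MutationScore.percentage(50, 55) gives 91, which leaves 5 surviving mutants to investigate.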

From this number, two follow-up questions came up in my head immediately:

Which mutants were missed by the tests, and what is the impact of that? What are we not covering yet?
Could we have achieved the same amount of (line and mutation) coverage with fewer tests? In other words, do we have tests that are dead weight?

Looking at the surviving mutants

First, let’s have a look at the mutants that survived, i.e., changes in the API code that were not detected by any of the tests.

To start, in the CustomizedResponseEntityExceptionHandler, the HTTP 500 path isn’t covered in any of the tests, and that causes a surviving mutant. By design, the API returns an HTTP 500 when an Exception occurs that isn’t a ResourceNotFoundException (returning an HTTP 404) or a BadRequestException (returning an HTTP 400). This looks like a useful path to cover in a test.

Second, the API returns an HTTP 204 in response to a GET call to /accounts when there are no accounts in the database. That path isn’t covered in any of the tests. This, too, seems like a useful path to test, because it is part of the intentional behaviour of the API.

Finally, the tests that were written do not properly cover some of the boundary conditions, both in the logic that implements the business rule of ‘you cannot overdraw on a savings account’ and in the interest calculation logic. Once more, I would like to have these situations covered by tests.
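
To illustrate why a boundary condition can slip through an otherwise decent test suite, here is a sketch of what a conditional boundary mutant looks like. The rule below is a hypothetical version of ‘you cannot overdraw on a savings account’, written for this example only; the actual API code may express it differently.

```java
// Hypothetical savings-account rule, invented for this sketch.
final class SavingsRule {

    // Original rule: a withdrawal is allowed as long as it does not exceed the balance.
    static boolean canWithdraw(double balance, double amount) {
        return amount <= balance;
    }

    // What a conditional boundary mutant of the rule above would look like:
    // the comparison operator changes from <= to <.
    static boolean canWithdrawMutant(double balance, double amount) {
        return amount < balance;
    }
}
```

Withdrawing 200 from a balance of 500 is allowed by both versions, and withdrawing 600 is rejected by both, so tests using only those values cannot tell the original and the mutant apart. Only a test that withdraws exactly 500 from a balance of 500 sees a difference, and that is the test that kills the mutant.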

These results, to me, indicate that mutation testing is a useful method of assessing what is tested and what isn’t. It doesn’t matter if you wrote the tests yourself or had them written by an LLM.

Note: it definitely helped here that I can confidently and quickly perform this analysis of the signals produced by PITest, and of the quality of my tests, because I know that mutation testing as a technique exists, and because I know how it works.

More importantly, these results strengthen my belief that it is my moral obligation to closely watch the output of an LLM. I deeply value writing tests that test meaningful things and that are actually able to detect changes in product behaviour, and I don’t think that should change when I have a tool writing the tests for me. If all I cared about was having some tests to cover the API and declared, for example, 90% line coverage as ‘good enough’, I would be done by now.

However, I am not OK with this result yet, for the reasons I gave just now. In the next blog post, I want to return this feedback to Claude and see how well it does in updating the existing test suite based on my observations. I also want to see if I can add mutation testing to the test generation loop, and have Claude achieve better mutation coverage without my interfering, but that’s something for another time.

For now, I’ll conclude that when I ask Claude to generate tests in the way I have done, it produces pretty good results in terms of both line and mutation coverage, but that it missed certain key paths in my application code.

Identifying dead weight in our test suite

Next, I want to find out if the test suite that was generated by Claude contains dead weight, that is, do we have any tests that do not uniquely contribute to either line or mutation coverage? To do so, I asked PITest to generate a report in XML format next to the HTML report, as (for some reason) only the XML report contains information about which specific test killed a specific mutant.

Performing this analysis required a bit of elbow grease, as I had to manually search the XML test report for occurrences of the test name for every test in the test suite. This, too, is probably a process that can be automated, but for now, I’m OK with doing this the manual way. I don’t want to run the risk of hallucinations here, and besides, there are only 23 tests in the suite anyway.
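
For reference, this lookup can also be scripted deterministically, without involving an LLM. The following is a minimal sketch that assumes the XML report consists of mutation elements with a status attribute and a killingTest child element; check this against your own report before relying on it. The DeadWeightFinder class is a made-up name, and because the exact text inside killingTest can differ between versions, the sketch matches test names as substrings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; the report structure it expects is an assumption.
final class DeadWeightFinder {

    // Returns the test names that do not occur in the killingTest text of any KILLED mutant.
    static Set<String> testsThatKilledNothing(String reportXml, List<String> testNames) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(reportXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the mutation report", e);
        }

        Set<String> candidates = new LinkedHashSet<>(testNames);
        NodeList mutations = doc.getElementsByTagName("mutation");
        for (int i = 0; i < mutations.getLength(); i++) {
            Element mutation = (Element) mutations.item(i);
            if (!"KILLED".equals(mutation.getAttribute("status"))) {
                continue; // only killed mutants name a killing test
            }
            NodeList killers = mutation.getElementsByTagName("killingTest");
            for (int j = 0; j < killers.getLength(); j++) {
                String killer = killers.item(j).getTextContent();
                // Drop every test whose name appears in this killing test entry.
                candidates.removeIf(killer::contains);
            }
        }
        return candidates;
    }
}
```

Two caveats apply. Substring matching goes wrong when one test name is a prefix of another, so distinct names matter. Also, a test that is not listed as killing a mutant is only a candidate for removal: to my knowledge PITest stops at the first test that kills a mutant unless it is configured to build a full mutation matrix, so rerunning mutation testing without the candidate tests remains the real check.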

My search tells me that four tests that were generated by Claude were not mentioned as a test killing a mutant in the results file. In all four cases, the reason behind this is that the exact same code path was already exercised in another test. For example, one of the generated tests performs a withdrawal on a checking account and verifies that the balance is updated accordingly:

@Test
void withdraw_positiveAmount_fromCheckingAccount_updatesBalance() {
    long id = createAccount(AccountType.CHECKING, 500.0);

    given(requestSpec)
        .post("/{id}/withdraw/{amount}", id, 200.0)
    .then()
        .statusCode(200)
        .body("balance", equalTo(300.0f));
}

Another test in the suite, however, does the exact same thing, but for a savings account:

@Test
void withdraw_positiveAmount_fromSavingsAccount_withSufficientFunds_updatesBalance() {
    long id = createAccount(AccountType.SAVINGS, 500.0);

    given(requestSpec)
        .post("/{id}/withdraw/{amount}", id, 200.0)
    .then()
        .statusCode(200)
        .body("balance", equalTo(300.0f));
}

Since the logic for handling withdrawals in the case of a high enough balance is exactly the same, no matter what the account type is, one test is enough to properly cover this path in the API code.

After removing these four tests from the suite and running mutation testing again, as expected, I can see that the impact on both line and mutation coverage is 0, meaning that these four tests can indeed be classified as ‘dead weight’.

Why go through this exercise, you might think? You practically got those extra tests for free when Claude generated them, didn’t you?

Well, sort of, but tests take time to run and to maintain, and most importantly, it takes time to ingest and process the information produced by them, especially in case of potential problems. The less time I have to spend on that, the better. If I can safely remove tests from my test suite, and by safely, I mean without apparent adverse impact on coverage, I would say that doing so is the prudent thing to do.

Conclusions

So, after completing the analysis of the results of asking Claude Code to generate tests for a new code base, what do I think? Well, while I am impressed, I think a couple of words of warning are in order.

I am positively surprised by the quality and the coverage of the initial test suite. 95% line coverage and 91% mutation coverage are good numbers, and all that coverage was generated in a few minutes, definitely a lot less time than it would have taken me to write these tests myself.

There is some room for improvement in terms of readability of the tests, but that can probably be resolved by being more specific in my prompt and / or using dedicated Claude Code skills. I’ll explore that in more detail and write about that soon.

Now to the words of warning. While Claude achieved a pretty decent mutation coverage, it did overlook a few critical paths in the code. Maybe I was simply ‘unlucky’, and another attempt with the same prompt would have given better results. I don’t know, but these results do tell me not to simply accept what Claude gives me at face value.

The same applies to the tests that Claude did generate. 4 out of the 23 tests generated were dead weight, which equates to 17% of the test suite. Now, n = 1, and this is a small codebase and test suite, so the numbers might be skewed, but again, if you want your test suite to be as efficient and effective as possible, these are numbers that you probably don’t want to ignore.

Finally, there are of course many things that Claude did not do, mainly because I didn’t ask it to. An example of that would be telling me that since we’re working with a banking API, it probably would be a good idea to add some form of authentication to the endpoints. There’s a lot more to unpack about what Claude does and does not do, and I will probably write about that in more detail somewhere in the future, too.

First, in a follow-up blog post, I’ll document the process of improving the existing test suite that Claude generated, both in terms of coverage and of coding style. I will once again be using Claude and mutation testing to do that. You can find the code for the API that was used in this blog post, as well as the initial suite of tests generated by Claude, here.

Refactoring the RestAssured.Net code with Claude Code

2026-02-27T00:00:00+00:00

As some of you might know, the workshops and training courses I run, the talks I do and the blog posts I write tend to focus on fundamental software testing, software development and test automation skills, rather than focusing on the latest technology and trends. I just don’t ‘do’ trends very well. I do, however, read a lot of what others think about and write about with regards to these technologies, as I am an independent consultant and therefore I simply cannot afford not to stay on top of what is happening in tech.

Until now, I have pretty much avoided spending a lot of time using AI tools myself, other than using ChatGPT to help me build a custom training plan for my endurance cycling training. That is, until I heard more and more folks talking about Claude Code, and how good they thought it was, especially with the new Opus 4.6 model. That triggered me to find out for myself if it was truly useful, and this blog post is probably the first of a few where I document my thoughts and findings.

Of course, I could start with building (or ‘vibe coding’, as the cool kids say) something from scratch, but I don’t think that is a particularly good way to find out what AI can really do for me. After all, I am not in the business of building new things that didn’t exist before, I’m in the business of helping people perform existing and important tasks, in my case, testing and automation, in a better and more efficient way.

The goal

I’ve been working on RestAssured.Net for some 4 years now, and over time, the number of features has grown, as has the code base itself. As a result of tacking on more and more features, some of the classes in the project have grown to be very large, way too large, even, and those classes typically also have many different responsibilities.

I have been meaning to refactor them and extract different pieces of logic into their own classes to make the code easier to understand and to maintain, but in all honesty, I sometimes can’t see the forest for the trees anymore. So, it would be great if Claude Code could help me do that.

The guardrails

One thing I’ve learned from my limited use of AI and from a lot of reading about other people’s experience with AI is to have strict guardrails in place before you’re letting an AI agent loose on your codebase. Since I don’t have a lot of experience with AI under my belt yet, I reckon it is a good idea to take small steps and remain in control of the process. I mean, I am responsible for the RestAssured.Net codebase, so I want to know what happens to it, and make sure I understand everything that is happening to it.

So, as a first guardrail, I’m not letting Claude Code touch the tests. The acceptance tests for RestAssured.Net act as a safety net for me when I fix bugs and add new features, and I write them with intent and take good care of them. As the task I set out to do is a ‘pure’ refactoring task, that is, I only want to change the code structure, not its behaviour, the tests should remain intact to prove that the refactoring was successful.

Second, I am doing a thorough review of every change that is made by Claude before I bring it under version control. My goal with this experiment, and with using AI in general, is not to outsource my thinking (thank you, Fiona Charles), but rather to enhance my capabilities. As I said earlier, ultimately, I am the one responsible for the code and the changes made to it, not Claude. This is also why I don’t offload running the tests and bringing the code under version control to Claude.

Third and finally, I’m going to take small steps. I have seen plenty of (horror) stories of people letting LLMs go for hours writing code, without intermediate scrutinizing of the changes and suggestions made, with results that ranged from the mildly amusing to the outright horrifying. I don’t claim that this code base is as important as, for example, the code in an online banking platform, but it has grown to a decent user base over the years. I don’t want to let those people down. And who knows, some of them might use the library to test those online banking platforms, so it’s my responsibility to only publish versions of a product that I feel is fit for that job. There’s no place there for code I don’t understand and that might give people a false sense of security.

The first prompt

After purchasing a Pro subscription to Claude Code, setting it up and having it initialize the CLAUDE.md file for the project, I first asked Claude to analyze the ExecutableRequest class (a class that’s been bothering me for a while because of its size and complexity) and suggest improvements. I did that with this specific prompt:

The ExecutableRequest class is quite long, with a number of different responsibilities. Analyze it and suggest improvements to the code structure without changing the behaviour. List your top 5 recommendations, together with impact on code quality and reasons for your prioritization.

I don’t want Claude to make changes yet, I want it to suggest changes, and also, and more importantly perhaps, tell me why it thinks these improvements make sense. This should give me the information I need to make an informed decision on whether to proceed with the suggestion.

Opus 4.6 is slower than many other models out there, but the output is supposed to be much better. And indeed, after some deliberation, Claude produced a list of suggestions for improvement that, at first glance, made a lot of sense. Its #1 recommendation was to extract the logic to create a request body into a separate class called RequestBodyFactory. Not necessarily perfect as a class name, but as I couldn’t think of anything better at that moment, I asked Claude to proceed with doing the actual refactoring.

And so it did, and I have to say, it did not disappoint. It created the new class without problems, moved the logic in there, changed the ExecutableRequest class to use the methods in the new RequestBodyFactory class, and it even followed all the styling and formatting requirements set by StyleCop. That last point is very important, because I set the styling rules to level ‘nuclear’, as in, even the slightest infraction will make the code fail to compile.

The ultimate test of the work done by Claude, though, was running the tests, and those all passed, too. Which makes sense, as I asked for a change of the code structure, without changing the behaviour. My tests test the library for behaviour, not implementation, so this is exactly what I expected.

The only thing left for me to do was to review the changes made by Claude before committing them to version control. As I said at the start of this post, ultimately it is me who bears responsibility for the code, so I want to be able to read and understand it, even when I didn’t write it myself.

Overall, the changes made by Claude looked pretty good, but there was one thing I didn’t like. The newly created Create() method in the RequestBodyFactory class had a lot of arguments (9, if I remember correctly). So, naturally, I asked Claude if it could further improve that. It came back with a suggestion to group several properties related to request body settings in a custom RequestBodySettings type and then pass that object in as an argument. As I thought this did improve the readability of the code, I asked Claude to proceed with the change. Again, code compiled, tests passed, all good.
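For those who like to see what this kind of change looks like, here is a minimal sketch of the idea of grouping related arguments into a settings object. It is written in Python rather than C# and it is not the actual RestAssured.Net code: apart from the name RequestBodySettings, every function, field and default value below is hypothetical and only there for illustration.

```python
from dataclasses import dataclass


# Before: every body-related option is a separate argument (hypothetical names).
def create_body_before(body: object, content_type: str, encoding: str, strip_charset: bool) -> str:
    charset = "" if strip_charset else f"; charset={encoding}"
    return f"{content_type}{charset} -> {body}"


@dataclass(frozen=True)
class RequestBodySettings:
    """Groups the options that always travel together (hypothetical fields)."""

    content_type: str = "application/json"
    encoding: str = "utf-8"
    strip_charset: bool = False


# After: a single settings object replaces the long argument list.
def create_body_after(body: object, settings: RequestBodySettings) -> str:
    charset = "" if settings.strip_charset else f"; charset={settings.encoding}"
    return f"{settings.content_type}{charset} -> {body}"


before = create_body_before("{}", "application/json", "utf-8", False)
after = create_body_after("{}", RequestBodySettings())
print(before == after)
```

The structure changes, the behaviour does not: both versions produce the same result for the same input, which is exactly what tests that check behaviour rather than implementation should confirm.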

After that, I saw no further reasons to change or improve the work done by Claude, so I thought the code was safe to commit and push. My build pipeline then took care of verifying that all tests run and pass on all .NET versions that RestAssured.Net supports. Again, no issues here.

So, what did I learn?

Did this experiment teach me a lot of new things? Well, not necessarily. What it did, though, was reinforce my initial thoughts on what constitutes prudent use of AI in software development and software testing. It also showed me that Claude is both a really powerful and a user-friendly tool, at least when used on a codebase and for a task of this, admittedly small, size.

In short:

Software like Claude is a great tool for refactoring and code improvement tasks like the one described in this blog post
Having solid guardrails in place (linters, tests, review before commit) is essential if you want to retain control of the software that is being written by AI
At least initially, I would lean towards having AI support you in writing product code, and leave writing tests and reviewing the results to human beings

With regards to that last point, I know that not everybody will agree with this. I have seen a lot of examples of AI writing tests, and of AI taking care of your code reviews. Personally, I’m not ready yet to hand over that kind of control to a piece of software that I do not fully understand or trust. At least, not without keeping a human being (myself, for example) in the loop. If you decide otherwise, that’s fine with me, as long as you can handle the responsibility, and are ready to deal with the potential fallout, that comes along with it…

In the meantime, I will continue improving the RestAssured.Net code using Claude and other tools, as there’s a lot of room for improvement left. And I think I’ll stick to keeping the current guardrails in place, too. It might be slightly slower, but it will be a lot safer, too.

My LinkedIn break - six weeks in

2026-02-01T00:00:00+00:00

About six weeks ago, I decided to take some time away from LinkedIn. I won’t go into the reasons behind this decision again, you can read all about that in the post I just linked to, but I do want to take some time and look back on the past six weeks and the things that not spending so much time on LinkedIn has brought me.

First of all, moving away from spending an hour or two on LinkedIn wasn’t as easy as I thought. Especially early on, thoughts of ‘am I missing something?’ were in my head pretty much all the time, and yes, that led to me logging in and checking to see whether there was something I needed to address - DMs to answer, invites to accept - a few times.

After a few weeks, though, and after seeing that there wasn’t much of importance, or really anything at all, that I missed, that feeling slowly faded. It’s still there, sometimes, but I don’t feel the ‘need’ (it’s more of a ‘want’, really) to log in as often as I did in the beginning.

When I started my break, I was hoping for several positive side effects. It’s still early, too early to draw conclusions, but so far, things are looking pretty good:

The content for my brand new ‘Valuable Feedback, Fast’ course is coming together slowly but surely, and I’ve got 3-4 companies interested in booking me to deliver this course in 2026, with one of them confirmed. My target is to deliver it at least three times in 2026, so things are looking good.
My training business is off to a good start, too, with 5 full days of training already delivered in January. I was off to a much slower start in 2025, so this makes me happy. I’m also working on different ways to bring my training offerings to the attention of potential clients. One thing I’m thinking about is starting a YouTube channel with instructional videos, to give people an idea of my teaching style and the type of content they can expect when they book me to teach a course.
I’ve already published 4 blog posts, including this one, where I only wrote 13 in all of 2025. I really hope to keep up this pace.
I’ve definitely been reading more, too. Mostly fiction, including the fantastic Winter’s Bone by Daniel Woodrell, but I’m also slowly working my way through Taking Testing Seriously by James Bach and Michael Bolton.
Outside of work, I’m definitely taking my cycling much more seriously. Even though January wasn’t the best cycling month, because of snow, the flu and some other things that got in the way, I have a training plan in place, I have set some ambitious but not-too-crazy goals, and I hope those will lead to my first century as well as my first 200k and even a 300k before the end of the year.

Because of that last bullet point, and especially the time that training for long-distance cycling events takes, I have decided to pretty much entirely focus on my training business in 2026, as that is simply more flexible than consulting. This doesn’t mean I will not take on any consulting gigs at all, but the ones that I do will be of a very part-time nature. I don’t want to do all the cycling I want to do in the evenings and weekends only, if only because there’s no better feeling than going for a long bike ride at a time you know most other people are in the office ;)

Needless to say, I’ll remain absent from LinkedIn for the foreseeable future. Yes, I might log in once every other week to quickly check DMs and invites. You might see a very occasional post from me promoting my training services, but that will be once a month, tops, and probably even less than that.

I have zero interest at the moment in getting back to regular posting, commenting, sharing and liking. My brain relishes the quiet, the absence of that background noise that was present when I was spending a lot of time on LinkedIn. Plus, it has given me the time and attention it needs to do more valuable things, and I’m not ready to give that up again. Don’t expect me to write an update like this every month, either. As the months go by, I’ll probably think about LinkedIn even less, especially now that I’ve seen that I’m not really missing out on a lot.

As I said before, email is a much better way to get or stay in touch, so if you have a question, something to share, or you just want to catch up, email me at bas@ontestautomation.com, and I’ll happily talk to you. I’ve had some great conversations already, and I’m looking for many more of those.

On the first training sessions in 2026

2026-01-18T00:00:00+00:00

Earlier this week, I ran my first two training sessions of 2026. Running these sessions reminded me (once again) that training really is what I enjoy the most and what I do best. Working with a group of engineers, introducing them to new techniques, patterns, principles and tools, and exploring and discussing how these could help them improve their automation, testing and development efforts to get closer to the end goal of valuable feedback, fast is a very rewarding thing to do.

On Tuesday, I facilitated a full-day course covering the ‘textbook’ Behaviour-Driven Development process. In this course, we took a change to an existing feature for the world’s least safe online bank from idea to working software to automated acceptance tests. Along the way, we discussed and practiced Example Mapping, reviewed and wrote Gherkin scenarios, and took a quick look at the tools that support BDD, in this case Cucumber.

As a course topic, BDD is a bit of an outlier for me. Most of the workshops I run cover topics that are related to test automation, yet in this workshop we cover the BDD process and methodology as a whole, and not just talk about Cucumber and Gherkin. Tuesday reminded me how fun it can be to talk about and practice BDD, though, so I do hope I get to run this workshop a couple more times in 2026. I’ve got at least one more planned, so that’s a start.

The next day, I ran a full-day workshop on contract testing with Pact in Java. I quite recently redesigned this workshop, because I felt that the way I taught it before didn’t really do justice to the complex topic that is contract testing. Before, I often tried to cram it into a half-day session, which is just too short, and therefore had to limit myself mostly to writing and running consumer-driven contract generation tests on the consumer side and verifying contracts on the provider side. If there was time left, we talked a little bit about challenges of consumer-driven contract testing and how bidirectional contract testing tries to address those challenges, but that was about it.

In the new format, I’m taking more time, using more and smaller steps, to take people through the flow of both consumer-driven and bidirectional contract testing. There’s some coding involved, of course, but not an awful lot. The biggest change, I think, is that I now focus on the process flows rather than just the tools. For example, we now use a Pact Broker from the very start, and we use tools like can-i-deploy throughout the workshop, too. From the initial feedback I gathered, this helps people a lot in understanding what contract testing really is, how it works and how it tries to answer the question of

“Are all individual components and services able to communicate with one another?”
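The consumer-driven part of that flow can be sketched in a few lines of Python, without any tooling. To be clear, this is not the Pact API, just a simplified illustration of the idea: the consumer records what it needs, and the provider checks that it can deliver exactly that. The endpoint, the field names and the two stand-in providers are all hypothetical.

```python
from typing import Any, Callable, Dict

# Consumer side: the consumer's test records what it needs from the provider.
contract = {
    "request": {"method": "GET", "path": "/accounts/12345"},
    "response": {"status": 200, "body_fields": {"id": int, "balance": float}},
}


def provider_handle(method: str, path: str) -> Dict[str, Any]:
    """Stand-in for the real provider, returning one extra field the consumer ignores."""
    return {"status": 200, "body": {"id": 12345, "balance": 100.0, "type": "savings"}}


def broken_provider_handle(method: str, path: str) -> Dict[str, Any]:
    """Stand-in for a provider version that renamed a field the consumer relies on."""
    return {"status": 200, "body": {"id": 12345, "currentBalance": 100.0}}


def verify(contract: Dict[str, Any], handler: Callable[[str, str], Dict[str, Any]]) -> bool:
    """Provider side: replay the recorded request and check the response."""
    request, expected = contract["request"], contract["response"]
    actual = handler(request["method"], request["path"])
    if actual["status"] != expected["status"]:
        return False
    # Only the fields the consumer asked for are checked, extra fields are fine.
    return all(
        name in actual["body"] and isinstance(actual["body"][name], kind)
        for name, kind in expected["body_fields"].items()
    )


verified = verify(contract, provider_handle)
broken = verify(contract, broken_provider_handle)
print(verified, broken)
```

In a real setup, the contract and the verification results are shared through a broker, and a tool like can-i-deploy uses those recorded results to tell you whether a specific version of a component is safe to release.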

So, in short, 2026 is off to a good start when it comes to my training business. As my goal for the year is to grow that training business revenue by 20% compared to 2025, while staying away from LinkedIn and social media in general, I’m happy to see that next to the two I delivered this week, I have three more days of training scheduled for January, all three on Playwright. That’s a big improvement over last year, when my training sessions didn’t really start until early March. Needless to say, I hope that this trend will continue throughout the year. I’ll keep you updated.

I’ll be facilitating a tutorial at EuroSTAR 2026

2026-01-09T00:00:00+00:00

One of the decisions I made early last year was to pretty much stop contributing to conferences abroad. I’m not going to revisit my reasons for that decision here, if you’re interested in them, you can read the blog post I just linked to.

In that blog post, I did list a couple of exceptions, though, and one of those was when a conference is held in a country that I really want to go to, either for personal or for professional reasons. The list of countries that rule applies to is short, but Norway is on it, so when I heard that EuroSTAR 2026 -probably the largest software testing conference in Europe- was going to be held in Oslo, I knew I wanted to at least submit a proposal and see if I could get myself on the programme.

Now, when given the choice, I strongly prefer running a tutorial over doing a presentation, so the decision on what kind of session to submit was an easy one. The first day of EuroSTAR typically is tutorial day, so there are a few slots for tutorials available.

I decided to submit a tutorial around Playwright, as last year has shown me that there is a lot of demand for Playwright knowledge and experience, which would increase my chances of making it to the programme. I didn’t just want to make it an ‘Introduction to Playwright’ tutorial, though, as I don’t think that would really stand out from the other submissions.

So, the choice was a fairly straightforward one, and I decided on submitting a slightly modified version of my ‘Improving your Playwright code’ workshop, and I’m happy to say that it got accepted.

In this tutorial, I’d like to go beyond just teaching Playwright, although, of course, the tool will play an important role. What I’ll be doing in the tutorial is I’ll present participants with a Playwright code base for some end-to-end tests on a fictional application.

We’ll then start by reviewing this code to identify areas for improvement. I’ll then ask participants to advocate for the improvements. That is, don’t just say ‘we should do this’, but also explain why we want to do this (an important skill that is too often forgotten). Of course, there will be plenty of time to get hands-on and actually apply the improvements.

What the tutorial will look like exactly is still a work in progress, but what you’re reading just now is the general gist of it. More details about the contents of the tutorial can be found on the EuroSTAR conference page for this session.

In any case, I’m really, really looking forward to my visit to Oslo and to EuroSTAR 2026. I hope to see you there!

On my new ‘Valuable feedback, fast’ course

2026-01-07T00:00:00+00:00

A couple of weeks ago, I published the training page for a brand-new course that I’m looking to run at least a couple of times in 2026. The course is called ‘Valuable feedback, fast’, and in this blog post, I’d like to share a little more about why I created this course, why it is designed in the way I have in mind and what it will look like.

Why did I create this course in the first place?

I’ve been running workshops and training courses for close to a decade now, and I really, really enjoy it. There’s just something very rewarding in sharing your knowledge and experience and helping others learn something new. Until now, the most important part of most of the courses I have been offering so far has been the ‘hands-on’ part: learning how to use a specific tool, or how to start with implementing a new technique or approach.

Next to the how, I always try and teach participants in my course what to do and what not to do with that tool or technique, and why they should or should not use a tool or technique in a specific context or situation. In other words: I try to go beyond ‘teaching people tricks with tools’, and I think I’m doing a pretty good job in that regard. You should really ask the participants themselves if they’d agree, though.

I plan to keep offering this type of workshop or training course in the future. It’s useful, fun, there’s demand for it, and it pays the bills, so I don’t see a reason to stop. However, I think there’s a need for a different kind of course in the area of test automation, too.

A course that does not focus on a specific tool, technique or practice, but one that focuses on ‘the bigger picture’ of test automation. One that teaches teams and organizations how to be successful with test automation. How test automation fits into their organization, their tech stack and their way of working. What common pitfalls one encounters on the road to test automation success, and how to navigate those pitfalls. In other words, a course that teaches them how to reach the test automation goal of providing valuable feedback, fast.

I’ve been doing some market research, and I could not find a lot of courses that cover all of this out there, even when there are a few that cover parts of it. I think this is a real gap in what there is on offer, because if we want to be successful with test automation, we need to be able to talk about it and work on it with that bigger picture clearly defined for our specific context.

I’m saying this because I’ve seen it happen so often in the nearly 20 years I’ve been working in test automation. When teams struggle with their test automation, they look at the tool or the technique they’re using and come to the conclusion that those are the root cause of their struggles. Often, though, the tool or the technique is only a (small) part of the problem: the real problem is lack of a solid, holistic test automation strategy that is created for their specific context and purpose.

This is exactly what the ‘Valuable feedback, fast’ course is all about.

What’s the difference between this course and other courses ‘out there’?

Apart from the fact that the focus of this course is not on a single tool or technique, but on creating a holistic test automation strategy, there is another difference that I think makes this course stand out from other courses out there.

That difference is the context I present in the course, and within which the exercises are set. Most courses I run, and most courses I have attended, have squeaky clean contexts. The exercises are typically relatively small and simple (that doesn’t mean they’re easy, by the way), with the goal to achieve a single, very well-defined objective. Write a test that does this. Make this test code pass. Refactor that method.

Real life, though, is often anything but small and simple. There’s often a large, messy context in which you’re trying to do your work in the best possible way, and you have to navigate different kinds of impediments, work with existing applications and code bases that definitely do not look like what you’ve seen in the books, deal with conflicts of interest, work with different kinds of stakeholders, and so on.

In the ‘Valuable feedback, fast’ course, the context is messy, and that is by design. While it is impossible to cover everything that can happen in real life, the course attempts to recreate several challenges I have seen and have had to deal with in the past, and that I hear others talk about and struggle with, too.

The objective of doing this is to present a context that looks more like what you’ll have to deal with back at work after the course. Doing this addresses one of the most important pain points that I hear from participants in my workshops: they struggle to apply what they have learned in the course in their context, because their context is a lot messier and more complex than what they have seen during their time in the classroom.

What does an exercise in the course look like?

To illustrate what I mean when I talk about a messy context, let’s look at an example of what a typical exercise in the course looks like. One of the topics we will discuss in the course is that of ‘automating regression tests’, which is still considered a ‘holy grail’ in many organizations. I wrote down my thoughts on regression test automation a while ago, but it is a topic that keeps coming back, and that’s understandable. To some extent, at least.

Before I talk about the specific exercise, it’s good to know that the entire ‘Valuable feedback, fast’ course revolves around a single context, representing an online bank struggling with delivering valuable software to their customers at the pace that these customers require. One of the reasons that their delivery process is lagging is the fact that the teams spend a lot of time on regression testing, i.e., on verifying that core software functionality still works, for every new feature, improvement and bug fix.

The exercise I’ll present in the course is to formulate an approach for speeding up this process, given a combination of the following conditions and impediments:

Team leads strongly advocate for ‘automate all the regression testing’, which is motivated by demands from higher up to speed up the delivery process and shorten feedback loops
Developers are under near-constant pressure to deliver features, which means they claim they can’t spend a lot of time writing test automation code
Not every tester has the skills to contribute to writing test code that is easy to read, understand and maintain
The backend system involved is relatively old and wasn’t exactly written with testability and using modern test automation tools in mind

As you can see, participants will have their work cut out for them to come up with a feasible and acceptable approach. In a follow-up to this exercise, they will simulate an experiment where they actually implement (part of) their proposal, too, and determine if they made the right choice, or whether they will need to revise and adjust their strategy.

There’s still quite a bit of work to do in designing this exercise, and the other exercises in the course, but this should give you a good idea of what I have in mind: figure out a strategy to address a specific problem, implement it at a small scale, observe the outcome, report on it and learn from it.

Who’s the intended audience for this course?

As mentioned on the training page:

This course is for software development and testing practitioners, as well as tech and team leads, who want to learn how to be successful with test automation and how to achieve its goal of ‘valuable feedback, fast’

The ideal client for this course is a development team, or a small group of development teams (up to 20 people in total) from the same company, who are looking to dive deeper and get serious about their test automation strategy and learn how to deal with the real-life problems that tend to come up in the process. I’m looking for teams who want to move beyond learning a specific tool or technique (even when this course does include plenty of hands-on, technical work) and dive deeper into what decides whether test automation efforts succeed or fail.

That sounds pretty good!

Excellent! If you would like more information about the course, or if you want to have a conversation to see if we can bring this course to your company, let’s have a chat.

This blog post was written while listening to Lemon8 – The Inner Sanctuary Sessions CD2