How not to screw up your automated testing

Like all software, automated tests are easy to get wrong; much easier, in fact, than getting them right. Having seen (and made) the same mistakes time and again, I would like to talk about how to avoid them. Some of those mistakes can doom an automated testing project to failure, while others will merely cause your maintainers mild problems such as depression, hallucinations and/or homicidal tendencies. To help you reap the benefits of ATs (automated tests) and retain your sanity (in case you need to maintain them), I will lay out a few requirements to keep in mind when designing your tests, testing tools and helpers.

What ATs are we talking about?

Since people's definitions of automated tests vary greatly in scope and application, it is necessary to be explicit about which ATs we're talking about. The automated tests I have the most experience with range from automated component tests, where we fiercely attack a single component of an application while ignoring or mocking the rest of it, to automated system tests, where we drive the top-level interface and only observe the final, user-facing output. As a result, I am explicitly excluding unit tests from the discussion. My observations may be helpful with those as well, but YMMV.

The cost of failure

Software quality assurance and control, testing and development are, at their core, all about managing costs and balancing them against benefits. Nothing is undertaken (or should be, at any rate) unless the benefits outweigh the costs. We triage bugs and make sure the most important ones get fixed first. We order our backlogs so that the most important features get completed first. We build the products we believe will make us the most money. The problem is accurately determining the costs and the benefits. Since this is nearly impossible, we fall back on truisms and practices, preferably ones that have already proven useful, to estimate and guide our hands. We let our own experience and that of our peers guide us. When test automation goes wrong the cost can be high. The cost of not using automation can be high as well. In both cases we can see projects fail and careers, even companies, come to an end. Most of the time, though, we see a truckload of money disappear and an army of unhappy developers, testers and users. How big that cost gets obviously depends on the size and importance of the project.

What we need

Let's talk about what we need: our requirements for the tests and the framework. How to satisfy these requirements is not always clear and certainly not always easy. Sometimes it might not be possible at all; it depends on the project. Keeping them in mind from the start is the easiest way to make sure your test automation efforts are successful and rewarding.

For each test we must be able to:

  • Run it successfully (barring bugs in the software) before, after and in parallel with any test, including itself, as many times as desired, without performing an external cleanup action.

    I'll concede that the parallel requirement doesn't always make sense; it depends on the system you're working with. Being able to run each test before and/or after any other test is extremely important, though. If you can't, you have introduced dependencies between the tests, and for that you will suffer. Hidden dependencies are a frequent source of bugs and strange behavior, and unstable tests are completely unusable. Ideally no external cleanup action should be needed, as such actions tend to slow your tests down to a crawl.

    My first automation project consisted of writing simple Selenium scripts exercising an account management page. The tests depended on each other in strange ways, frequently failed because of small modifications in other parts of the suite, and suffered from terrible-to-debug timing issues. We had to rerun the tests several times a day to get "valid" results, very quickly lost faith in them and paid their errors little heed. The project got canceled and my company lost its most important client.

  • Differentiate between failures resulting from issues with the test and issues with the software.

    Being able to confidently assert that a failing test is revealing an issue with the system under test is fundamental to building faith in the tests. It means that decisive action can be taken as soon as a red flag is raised, whether that action is fixing the test or debugging the software.

    On one test automation project I had to take great care not to share test results immediately. The errors were cryptic, so I had to analyze the results thoroughly before making them available; unless I provided concrete evidence that the fault lay in the software under test, the failures were dismissed as problems with the tests.

  • Infer from the output which component failed.

    When an integration or system-level test fails, it is vitally important to know where to start looking for the fault. It can mean the difference between spending days and spending hours combing through logs and stepping through debuggers.

    This is obviously not always possible, but when it is, it is a godsend. The sheer amount of time I've spent combing through the logs of different components, monitoring them while running the tests and querying the various databases in order to identify the faulty component has on occasion made me question my sanity. Being able to limit your attention to a single part of the system is, for lack of a better word, wonderful.
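The first two requirements can be illustrated with a small Python sketch. It rests on assumptions of my own: `FakeAccountService` is a hypothetical in-memory stand-in for a real system under test, and `unique_name`, `SetupError` and `deposit_test` are illustrative names, not part of any framework. Each run creates its own uniquely named data, so the test can run before, after and in parallel with itself without any external cleanup, and a failure while preparing the test raises a distinct exception so it is not mistaken for a bug in the software.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class FakeAccountService:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self._accounts = {}
        self._lock = threading.Lock()

    def create_account(self, name):
        with self._lock:
            # This fake rejects duplicate names, so a hard-coded account name
            # would break repeated and parallel runs of the same test.
            if name in self._accounts:
                raise ValueError(f"account {name!r} already exists")
            self._accounts[name] = 0

    def deposit(self, name, amount):
        with self._lock:
            self._accounts[name] += amount
            return self._accounts[name]


class SetupError(Exception):
    """The test could not establish its preconditions: a test problem, not an SUT bug."""


def unique_name(prefix):
    # A fresh UUID per run means no two runs of the same test share data.
    return f"{prefix}-{uuid.uuid4().hex}"


def deposit_test(service):
    name = unique_name("at-deposit")
    try:
        service.create_account(name)
    except Exception as exc:
        # Failures while preparing the test are reported as such, not as product bugs.
        raise SetupError(f"could not create account {name!r}: {exc}") from exc

    # From here on, a failure points at the system under test.
    balance = service.deposit(name, 100)
    assert balance == 100, f"fresh account {name!r}: balance {balance} after depositing 100, expected 100"
    return name


if __name__ == "__main__":
    service = FakeAccountService()
    deposit_test(service)
    deposit_test(service)  # runs again with no cleanup in between
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Eight concurrent runs of the same test against the same service.
        names = list(pool.map(lambda _: deposit_test(service), range(8)))
    print(f"{len(set(names))} isolated parallel runs passed")
```

The unique suffix is what buys independence here; against a real system the same idea applies to user names, order numbers or any other key the test creates.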

For each test we should be able to:

  • Read which system-level actions the test performed and their result.

    This ties in with the previous point. Properly written asserts and log/output statements often allow you to skip the test-debugging phase entirely and immediately focus your attention on a misbehaving component. Note that this only matters for failing tests; the results of a successful test should not need analysis. "You do not need a transcript of successful tests unless you are vain." -Ferrix Hovi

    I once had to debug a test which only emitted a single cryptic error after performing a myriad of operations, each helpfully written so as not to fail under any circumstances. Being unfamiliar with the system, I of course had to consult the documentation (i.e. the developers) after analyzing the result of each step. Boy, was that fun.

  • Clean up precisely after that test, leaving all other data intact (not necessarily immediately after the test run).

    This greatly limits the amount of system data you have to review when analyzing issues. It also helps make tests "idempotent" in the sense of the first requirement above (not idempotent in the strictest sense, but close enough).

    Tests which load up databases or other data stores with faulty, similar-looking data have often been the source of erratic test failures, failures which have cost me untold hours looking for faults that simply were not there. Wild goose chases don't make for a happy tester.

  • Infer from the output how the failing component of the SUT (System Under Test) failed.

    This might be wishful thinking, but when it is possible it greatly narrows your search for faults. It is essentially a more granular version of the third point.

    Whenever I have been able to write tests with this property, it has paid off handsomely. The more dangerous bugs were weeded out very rapidly. Components covered by such tests rarely harbor serious bugs and are usually of uncommonly high quality.
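Readable system-level actions and precise cleanup can both come from simply recording what a test does. The Python sketch below is one way to do it, not a prescribed design: `FakeStore` is a hypothetical stand-in for the SUT's database, and `CreatedResources` is an illustrative name. Every insert is logged and its id remembered, so cleanup deletes exactly the rows this test created and nothing else.

```python
import logging
import uuid

log = logging.getLogger("at")


class FakeStore:
    """Hypothetical shared data store standing in for the SUT's database."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_id, data):
        self.rows[row_id] = data

    def delete(self, row_id):
        self.rows.pop(row_id, None)


class CreatedResources:
    """Remembers exactly what one test created, so cleanup touches nothing else."""

    def __init__(self, store):
        self._store = store
        self._ids = []

    def insert(self, data):
        row_id = f"at-{uuid.uuid4().hex}"
        # Log every system-level action so a failing run can be read rather than debugged.
        log.info("INSERT %s %r", row_id, data)
        self._store.insert(row_id, data)
        self._ids.append(row_id)
        return row_id

    def cleanup(self):
        # Delete in reverse creation order, in case later rows reference earlier ones.
        for row_id in reversed(self._ids):
            log.info("DELETE %s", row_id)
            self._store.delete(row_id)
        self._ids.clear()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    store = FakeStore()
    store.insert("customer-1", {"name": "pre-existing"})  # data the test must not touch

    created = CreatedResources(store)
    order = created.insert({"type": "order", "total": 600})
    created.insert({"type": "invoice", "order": order})
    # ... exercise the SUT and assert here ...
    created.cleanup()
    print(store.rows)  # {'customer-1': {'name': 'pre-existing'}}
```

Because the ids are recorded rather than deleted on the spot, `cleanup()` can also be deferred until a failure has been investigated, which matches the "not necessarily immediately after the test run" caveat.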

Other factors to consider are:

  • Readability issues specific to tests

    There are a few issues specific to writing automated tests that fly rather directly in the face of commonly accepted good practices in other parts of software development. An excellent example is code reuse. Factoring too much of the test code into helpers, and helpers of helpers, makes reading and understanding the tests an absolute nightmare. Analyzing, understanding or debugging such tests is often next to impossible, leading to maintenance hell.

    Using inheritance and deeply nested class hierarchies to expose system functionality via TestCase-style base classes or adapters makes understanding the actual flow and logic of the tests a Herculean task.

    Asserts and error messages should contain both a clear description of the problem and the relevant data being asserted on. This helps both users and maintainers of tests to investigate faults and understand the coverage of the test.

  • Performance issues specific to tests

    Component, integration and system-level tests tend to grow both in number and in complexity over time. If the suite takes too long to run, developers won't run it very often. As a result faults, some of them glaringly obvious, find their way into the main branch and often cause an immense amount of work for the testers and other developers trying to track them down. The more time passes between a fault being introduced and being found, the longer the original developer needs to fix it, if it isn't simply tossed onto the backlog. Performance is therefore always a factor to consider; here, in fact, there is no such thing as premature optimization.
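The advice on assert messages can be made concrete with a short Python sketch. The order-total check and its names are hypothetical, chosen only for illustration, and amounts are kept in integer cents to avoid floating-point comparisons. It is deliberately one flat function rather than a stack of helpers, in keeping with the warning about helpers of helpers; the point is that the message states both the problem and the data it was judged on.

```python
def check_order_total(order_id, items, reported_total):
    """Compare the SUT's reported order total with one computed from the line items.

    `items` is a list of (quantity, unit_price_in_cents) pairs.
    """
    expected_total = sum(quantity * unit_price for quantity, unit_price in items)
    # A bare `assert reported_total == expected_total` would report only "AssertionError".
    # This message names the order, both totals and the input items.
    assert reported_total == expected_total, (
        f"order {order_id}: SUT reported total {reported_total} cents, "
        f"expected {expected_total} cents from items {items}"
    )


if __name__ == "__main__":
    check_order_total("o-1", [(2, 150), (1, 300)], 600)  # passes silently
    try:
        check_order_total("o-2", [(2, 150)], 250)
    except AssertionError as err:
        # order o-2: SUT reported total 250 cents, expected 300 cents from items [(2, 150)]
        print(err)
```

Note that Python skips `assert` statements when run with `python -O`, so a suite relying on them should not be run that way.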


If you are having problems with your test automation, perhaps some answers and ideas for improvement can be gleaned from these requirements. They are a snapshot of what I currently believe after (a mere) 8 years of developing and testing software, the sum of my oftentimes painful experience testing and automating tests for applications and systems. I have surely omitted some requirements, and perhaps some are superfluous. Corrections and challenges are most welcome and encouraged, especially well-reasoned ones.