tl;dr This post is an attempt to codify my thoughts about how to succeed with end to end integration testing. A toned down version of this post is part of the Storyteller 3 documentation.
About six months ago the development teams at my shop came together in kind of a town hall to talk about the current state of our automated integration testing approach. We have a pretty deep investment in test automation and I think we can claim some significant success, but we also have had some problems with test instability, brittleness, performance, and the time it takes to author new tests or debug existing tests that have failed.
Some of the problems have since been ameliorated by tightening up on our practices — but that still left quite a bit of technical friction and that’s where this post comes in. Since that meeting, I’ve been essentially rewriting our old Storyteller testing tool in an attempt to address many of the technical issues in our automated testing. As part of the rollout of the new Storyteller 3 to our ecosystem, I thought it was worth a post on how I think teams can be more successful at automated end to end testing.
I’ve worked in far too many environments and codebases where the automated tests were “flakey” or unreliable:
- Teams that do all of their development against a single shared, development database such that the data setup is hard to control
- Web applications with a lot of asynchronous behavior are notoriously hard to test and the tests can be flakey with timing issues — even with all the “wait for this condition on the page to be true” discipline in the world.
- Distributed architectures can be difficult to test because you may need to control, coordinate, or observe multiple processes at one time.
- Deployment issues or technologies that tend to hang on to file locks, tie up ports, or generally lock up resources that your automated tests need to use
To be effective, automated tests have to be reliable and repeatable. Otherwise, you’re either going to spend all your time trying to discern if a test failure is “real” or not, or you’re most likely going to completely ignore your automated tests altogether as you lose faith in them.
I think you have several strategies to try to make your automated, end to end tests more reliable:
- Favor white box testing over black box testing (more on this below)
- Closely related to #1, replace hard to control infrastructure dependencies with stub services, even in functional testing. I know some folks absolutely hate this idea, but my shop is having a lot of success in using an IoC tool to swap out dependencies on external databases or web services in functional testing that are completely out of our control.
- Isolate infrastructure to the test harness. For example, if your system accesses a relational database, use an isolated schema for the testing that is only used by the test harness. Shared databases can be one of the worst impediments to successful test automation. It’s both important to be able to set up known state in your tests and to not get “false” failures because some other process happened to alter the state of your system while the test is running. Did I mention that I think shared databases are a bad idea yet?*
- Completely control system state setup in your tests or whatever build automation you have to deploy the system in testing.
- Collapse a distributed application down to a single process for automated functional testing rather than try to run the test harness in a different process than the application. In our functional tests, we will run the test harness, an embedded web server, and even an embedded database in the same process. For distributed applications, we have been using additional .Net AppDomain’s to load related services and using some infrastructure in our OSS projects to coordinate the setup, teardown, and even activity in these services during testing time.
- As a last resort for a test that is vulnerable to timing issues and race conditions, allow the test runner to retry the test
Failing all of those things, I definitely think that if a test that is so unstable and unreliable that it renders your automated build useless that you just delete that test. I think a reliable test suite with less coverage is more useful to a team than a more expansive test suite that is not reliable.
You Gotta Have Continuous Integration
This section isn’t the kind of pound on the table, Uncle Bob-style of “you must do this or you’re incompetent” kind of rant that causes the Rob Conery’s of the world have conniptions. Large scale automation testing simply does not work if the automated tests are not running regularly as the system continues to evolve.
Automated tests that are never or seldom executed can even be a burden on a development team that still try to keep that test code up to date with architectural changes. Even worse, automated tests that are not constantly executed are not trustworthy because you no longer know if test failures are real or just because the application structure changed.
Assuming that your automated tests are legitimately detecting regression problems, you need to determine what recent change introduced the problem — and it’s far easier to do that if you have a smaller list of possible changes and those changes are still fresh in the developer’s mind. If you are only occasionally running those automated tests, diagnosing failing tests can be a lot like finding the proverbial needle in the haystack.
I strongly prefer to have all of the automated tests running as part of a team’s continuous integration (CI) strategy — even the heavier, slower end to end kind of tests. If the test suite gets too slow (we have a suite that’s currently taking 40+ minutes), I like the “fast tests, slow tests” strategy of keeping one main build that executes the quicker tests (usually just unit tests) to give the team reasonable confidence that things are okay. The slower tests would be executed in a cascading build triggered whenever the main build completes successfully. Ideally, you’d like to have all the automated tests running against every push to source control, but even running the slower tests suites in a nightly or weekly scheduled build is better than nothing.
Make the Tests Easy to Run Locally
I think the section title is self-explanatory, but I’ve gotten this very wrong in the past in my own work. Ideally, you would have a task in your build script (I still prefer Rake, but substitute MSBuild, Fake, Make, Gulp, NAnt, whatever you like) that completely sets up the system under test on your machine and runs whatever the test harness. In a less perfect world a developer has to jump through hoops to find hidden dependencies and take several poorly described steps in order to run the automated tests. I think this issue is much less problematic than it was earlier in my career as we’ve adopted much more project build automation and moved to technologies that are easier to automate in deployment. I haven’t gotten to use container technologies like Docker myself yet, but I sure hope that those tools will make doing the environment setup for automating tests easier in the future.
Whitebox vs. Blackbox Testing
I strongly believe that teams should generally invest much more time and effort into whitebox tests than blackbox tests. Throughout my career, I have found that whitebox tests are frequently more effective in finding problems in your system – especially for functional testing – because they tend to be much more focused in scope and are usually much faster to execute than the corresponding black box test. White box tests can also be much easier to write because there’s simply far less technical stuff (databases, external web services, service buses, you name it) to configure or set up.
I do believe that there is value in having some blackbox tests, but I think that these blackbox tests should be focused on finding problems in technical integrations and infrastructure whereas the whitebox tests should be used to verify the desired functionality.
Especially at the beginning of my career, I frequently worked with software testers and developers who just did not believe that any test was truly useful unless the testing deployment was exactly the same as production. I think that attitude is inefficient. My philosophy is that you write automated tests to find and remove problems from your system, but not to prove that the system is perfect. Adopting that philosophy, favoring white box over black box testing makes much more sense.
Choose the Quickest, Useful Feedback Mechanism
Automating tests against a user interface has to be one of the most difficult and complex undertakings in all of software development. While teams have been successful with test automation using tools like WebDriver, I very strongly recommend that you do not test business logic and rules through your UI if you don’t have to. For that matter, try hard to avoid testing business logic without using the database. What does this mean? For example:
- Test complex logic by calling into a service layer instead of the UI. That’s a big issue for one of the teams I work with who really needs to replace a subsystem behind http json services without necessarily changing the user interface that consumes those services. Today the only integration testing involving that subsystem is done completely end to end against the full stack. We have plenty of unit test coverage on the internals of that subsystem, but I’m pretty certain that those unit tests are too coupled to the implementation to be useful as regression or characterization tests when that team tries to improve or replace that subsystem. I’m strongly recommending that that team write a new suite of tests against the gateway facade service to that subsystem for faster feedback than the end to end tests could ever possibly be.
- Use Subcutaneous Tests even to test some UI behavior if your application architecture supports that
- Make HTTP calls directly against the endpoints in a web application instead of trying to automate the browser if that can be useful to test out the backend.
- Consider testing user interface behavior with tightly controlled stub services instead of the real backend
The general rule we encourage in test automation is to use the “quickest feedback cycle that tells you something useful about your code” — and user interface testing can easily be much slower and more brittle than other types of automated testing. Remember too that we’re trying to find problems in our system with our tests instead of trying to prove that the system is perfect.
Setting up State in Automated Tests
I wrote a lot about this topic a couple years ago in My Opinions on Data Setup for Functional Tests, and I don’t have anything new to say since then;) To sum it up:
- Use self-contained tests that set up all the state that a test needs.
- Be very cautious using shared test data
- Use the application services to set up state rather than some kind of “shadow data access” layer
- Don’t couple test data setup to implementation details. I.e., I’d really rather not see gobs of SQL statements in my automated test code
- Try to make the test data setup declarative and as terse as possible
Test Automation has to be a factor in Architecture
I once had an interview for a company that makes development tools. I knew going in that their product had some serious deficiencies in their automated testing strategy. When I told my interviewer that I was confident that I could help that company make their automated testing support much better, I was told that testing was just a “process issue.” Last I knew, it is still weak for its support for automating tests against systems that use that tool.
Automated testing is not merely a “process issue,” but should be a first class citizen in selecting technologies and shaping your system architecture. I feel like my shop is far above average for our test automation and that is in no small part because we have purposely architected our applications in such a way to make functional, automated testing easier. The work I described in sections above to collapse a distributed system into one process for easier testing, using a compositional architecture effectively composed by an IoC tool, and isolatating business rules from the database in our systems has been vital to what success we have had with automated testing. In other places we have purposely added logging infrastructure or hooks in our application code for no other reason than to make it easier for test automation infrastructure to observe or control the application.
Other Stuff for later…
I don’t think that in 10 years of blogging I’ve ever finished a blog series, but I might get around to blogging about how we coordinate multiple services in distributed messaging architectures during automated tests or how we’re integrating much more diagnostics in our automated functional tests to spot and prevent performance problems from creeping into application.
* There are some strategies to use in testing if you absolutely have no other choice in using a shared database, but I’m not a fan. The one approach that I want to pursue in the future is utilizing multi-tenancy data access designs to create a fake tenant on each test run to keep the data isolated for the test even if the damn database is shared. I’d still rather smack the DBA types around until they get their project automation act together so we could all get isolated databases.