A brain dump on automated integration testing

I’m strictly talking about automated testing in this post. I’m more or less leading an effort at work to improve our test automation and Test Driven Development practices, so I’ll be trying to blog quite a bit about related topics in the next couple of months. After reviewing quite a bit of in-flight code, I think I’ll try to revisit some of my old blog posts on testability design from the CodeBetter days and update those old lessons from the early days of TDD to what we’re building now.

My company builds and maintains several long running software systems with a healthy backlog of feature requests, performance improvements, and stories to retire technical debt. All that is to say that we’re constantly adding to or improving existing code, which implies we’re always running some non-zero risk of creating regression defects. To keep everybody’s stress levels down, we’re taking incremental steps toward a true continuous delivery model where we can smoothly and consistently build and deploy fully tested features while being confident that we aren’t introducing regression defects.

As you’d likely guess, we’re very interested in improving our automated testing practices as a safety net to enable continuous delivery while also improving our quality in general. That leads to the next question: what kind of automated testing should we be doing? Followed closely by: is there automated testing we’re doing today that isn’t delivering enough bang for the buck?

To that point, let’s take a look at the classic idea of the testing pyramid, as shown below:

From Unit test: sociable or solitary

The thinking behind the testing pyramid is that there’s a certain, healthy mix of different sorts of automated tests that efficiently lead to better results. I say a “mix” here because unit tests, though relatively cheap compared to other tests, cannot detect many defects that only come out during integration between components, code modules, or systems. From a quick search, I found worlds of memes along the lines of this one:

No integration tests, but all the unit tests pass!

To address exactly what kind of automated tests we should be writing, here’s a stream of consciousness brain dump that I’ve since dressed up with a patina of organization:

On End to End User Interface Tests

Any kind of test that uses a tool like Selenium to do end to end, black-box testing is going to run slowly. These tests are also frequently unstable because of asynchronous timing issues in modern browser applications. There’s an unhealthy tendency in many shops that adopt Selenium as a test automation solution to use it as their golden hammer, to the exclusion of other testing techniques that can be much more efficient in certain circumstances. To put it bluntly, it’s very difficult to successfully author and maintain test automation suites based on Selenium against complicated applications. In my experience in shops that have attempted large scale Selenium usage, I do not believe that the benefits of those tests have ever outweighed the costs.

I would still recommend using some small number of end to end tests with a tool like Selenium, but those tests should be focused on proving out the integration mechanisms between a user interface and the backing server side code. For example, I’ve been working on a new integration of OpenID Connect (OIDC) authentication into our web services and web applications. I’ve used Playwright to automate browser testing to prove out the interactions and redirects between the OIDC service and our applications or services.
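
To make that concrete, here’s roughly what one of those Playwright checks looks like in C# with the Microsoft.Playwright library. This is only a sketch: the URLs, form selectors, and credentials are hypothetical stand-ins for whatever the real applications and OIDC service use.

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;
using Xunit;

public class OidcLoginSmokeTest
{
    [Fact]
    public async Task login_redirects_through_the_oidc_service_and_back()
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();

        // Hitting a protected page should bounce us over to the OIDC service
        await page.GotoAsync("https://localhost:5001/protected");
        Assert.StartsWith("https://localhost:5002", page.Url);

        // Fill in the login form on the OIDC service and submit it
        await page.FillAsync("#Username", "test-user");
        await page.FillAsync("#Password", "test-password");
        await page.ClickAsync("button[type='submit']");

        // The redirect dance should land us back on the original application
        await page.WaitForURLAsync("https://localhost:5001/protected");
    }
}
```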

I also find Cypress.io interesting, but more for doing integration testing of our Angular applications by themselves against a dummy backend. For true end to end testing of .Net-backed web applications, I’m interested in replacing Selenium from here on out with Playwright, as I think (and hope) it does much more for you to make automated tests performant and reliable than Selenium does.

Driving a browser should not be used to automate functional testing of business logic, data services, or any kind of data analysis that could possibly be tested without using the full browser.

If you insist on trying to do a lot of browser automation testing, you’d better invest in collecting diagnostic information from test runs that developers can use to debug test failures. Ideally, I like to have the application’s log output correlated to the test run somehow. I’ve worked with teams that were able to pipe the console.log() tracing from the JavaScript code running in the browser into the test results, and that was extremely helpful. Taking screenshots as part of the test can certainly help too. As another ideal, I very strongly recommend that any kind of browser automation tests be executable by developers on their local machines on demand for easier debugging. More on this later.
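
As a rough sketch of what that wiring can look like with Playwright and xUnit.Net, something along these lines works; the helper class, file paths, and naming here are all hypothetical, and the exact plumbing depends on your test harness.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;
using Xunit.Abstractions;

// Hypothetical helper for correlating browser diagnostics with a test run
public class BrowserDiagnostics
{
    private readonly ITestOutputHelper _output;

    public BrowserDiagnostics(ITestOutputHelper output) => _output = output;

    public async Task<IPage> OpenPageWithLogging(IBrowser browser)
    {
        var page = await browser.NewPageAsync();

        // Pipe every console.log()/warn()/error() from the JavaScript code
        // straight into the xUnit test output for this test
        page.Console += (_, msg) => _output.WriteLine($"[browser {msg.Type}] {msg.Text}");

        return page;
    }

    public async Task CaptureFailure(IPage page, string testName)
    {
        // A full page screenshot named after the failing test is cheap insurance
        await page.ScreenshotAsync(new PageScreenshotOptions
        {
            Path = $"screenshots/{testName}-{DateTime.UtcNow:yyyyMMddHHmmss}.png",
            FullPage = true
        });
    }
}
```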

Again, if you absolutely have to write Selenium/Playwright/Cypress tests despite all of my warnings, I strongly recommend you write those tests in the same programming language as the real application. That statement is going to be controversial if any real test automation engineers stumble into this post, but I think it’s important to make it as easy as possible for developers to collaborate with test automation engineers. Moreover, I despise the kind of shadow data access layer you can get from test automation code doing its own thing to write to and read from the underlying data store of the system under test. I think you’re less likely to get that kind of insidious, hidden code duplication if the test code is written in the same programming language and even uses the system’s own data access code to set up or verify database state as part of the automated tests.
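
Here’s a small, hypothetical illustration of what I mean. The Order and OrderContext types are stand-ins for the application’s own EF Core entities and DbContext, and the point is that the test helpers lean on them rather than growing a parallel data access layer:

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Stand-ins for the application's own entity and DbContext; in a real test project
// these would be referenced from the production assembly, not redefined here
public class Order
{
    public int Id { get; set; }
    public string CustomerId { get; set; } = "";
    public string Status { get; set; } = "";
}

public class OrderContext : DbContext
{
    public OrderContext(DbContextOptions<OrderContext> options) : base(options) { }
    public DbSet<Order> Orders => Set<Order>();
}

public static class OrderTestData
{
    // Arrange database state through the application's own mappings...
    public static async Task SeedOrder(DbContextOptions<OrderContext> options, Order order)
    {
        await using var context = new OrderContext(options);
        context.Orders.Add(order);
        await context.SaveChangesAsync();
    }

    // ...and verify it the same way, instead of hand-rolled SQL living only in the test code
    public static async Task<Order?> FindOrder(DbContextOptions<OrderContext> options, string customerId)
    {
        await using var context = new OrderContext(options);
        return await context.Orders.SingleOrDefaultAsync(o => o.CustomerId == customerId);
    }
}
```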

Choosing Solitary/Unit Tests or Sociable/Integration Tests

I missed out on this when it was first published, but I think I like the nomenclature of solitary vs sociable tests better than thinking about unit vs integration tests. I also encourage folks to think of that as a continuum rather than a hard categorization. Moreover, I recommend that you switch between solitary and sociable tests even within the same test library where one or the other is more effective.

I would recommend organizing tests by functional area first, and only consider separating out integration tests into a separate testing project when it’s advantageous to use a “fast test, slow test” division for more efficient development.
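
One way (certainly not the only way) to get that “fast test, slow test” split without splitting projects is to tag the slower tests with an xUnit.Net trait, as in this hypothetical example, and then filter in the build scripts with something like dotnet test --filter "Category=Integration" (or Category!=Integration for the fast loop).

```csharp
using Xunit;

public class InvoiceTests
{
    // Fast, solitary test that should run on every local build
    [Fact]
    public void applies_volume_discount_to_the_invoice_total()
    {
        // ...
    }

    // Slower, sociable test that touches real infrastructure, tagged so it can
    // be filtered in or out by the "Category" trait
    [Fact, Trait("Category", "Integration")]
    public void persists_the_invoice_to_the_database()
    {
        // ...
    }
}
```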

We have formal requirements for test coverage metrics in our continuous integration builds, so I’d definitely make sure that any integration tests count toward that coverage number.

I think an emphasis on always writing classical unit tests can easily create a strong coupling between the production code and your testing code. That can and will reduce your ability to evolve your code, add new functionality, or do performance optimizations without rewriting your tests.

In many cases, integration tests that start at a natural sub-system facade or a logical controller/conductor entry point will serve you much better as a regression safety net, letting you refactor your code for new behavior or make important performance optimizations.
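
To illustrate the idea with a deliberately tiny, hypothetical example: the QuotingFacade below might orchestrate any number of smaller services internally, but the tests only know about its public contract, so the internals can be reshuffled without breaking them.

```csharp
using Xunit;

// Hypothetical stand-ins for a small sub-system; in real code the facade and its
// collaborators live in the production project and the tests reference only the
// facade's public contract
public record QuoteRequest(string CustomerId, decimal Amount);
public record QuoteResult(bool Approved, decimal Premium);

public class QuotingFacade
{
    // Internally this might coordinate several services, repositories, and rules;
    // the tests below don't care how that's factored
    public QuoteResult CalculateQuote(QuoteRequest request)
    {
        if (request.Amount <= 0) return new QuoteResult(false, 0m);
        return new QuoteResult(true, request.Amount * 0.05m);
    }
}

public class QuotingFacadeTests
{
    [Fact]
    public void rejects_quotes_for_non_positive_amounts()
    {
        var facade = new QuotingFacade();
        Assert.False(facade.CalculateQuote(new QuoteRequest("acme", 0m)).Approved);
    }

    [Fact]
    public void approves_and_prices_a_valid_quote()
    {
        var result = new QuotingFacade().CalculateQuote(new QuoteRequest("acme", 1000m));
        Assert.True(result.Approved);
        Assert.Equal(50m, result.Premium);
    }
}
```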

Case in point: I relied strictly on fine grained unit tests in my early work on StructureMap, and I definitely felt the negative consequences of that approach (I gave a talk about it in 2008 that’s still relevant). With later releases of StructureMap and now with Lamar, I lean much more heavily on integrated acceptance tests (let’s go ahead and call it Behavior Driven Development) that test from the entry point of the library down and focus on user-centric scenarios. I feel like that testing approach has led to much better results: in the ease of adding new features, in detecting regression defects in automated builds, and in allowing me to evolve the functionality of the library.

On the other hand, integration tests can be harder to troubleshoot when they fail because you have more ground to cover. They also run slower of course. If you find that your feedback cycles feel too slow to efficiently run the tests continuously or especially if you find yourself doing long, marathon sessions in your debugger, stop and consider introducing more fine-grained tests first.

Running the Tests Locally vs Remotely

To the previous point, I think it’s critical that developers should be able to easily spin up and run automated integration tests on demand on their local development boxes. Tests will fail, and being able to easily troubleshoot a failing test is a prerequisite for successful test automation. If you can run an integration test locally, you’re much more likely to be able to iterate and try potential fixes quickly. There’s also the very real possibility of attaching a debugger to the testing process.

I think that our current technology set makes it much easier to do integration testing than it was when I was first getting started and the strict Michael Feathers definition of a unit test was in vogue. Just speaking from my own experience, the current .Net 5 generation is very easy to spin up and down in process for automated testing. Docker has been a great way to stand up development environments using SQL Server, PostgreSQL, RabbitMQ, and other infrastructure tools.
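
As a small sketch of that in-process style with xUnit.Net: the fixture below spins up an IHost once for the test class and tears it down afterwards. The service registrations are left as a placeholder, and in a real suite they’d come from the application’s own bootstrapping, possibly pointed at Docker-hosted SQL Server, PostgreSQL, or RabbitMQ containers.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Xunit;

public class AppFixture : IAsyncLifetime
{
    public IHost Host { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        Host = Microsoft.Extensions.Hosting.Host
            .CreateDefaultBuilder()
            .ConfigureServices(services =>
            {
                // Register the application's real services here, ideally by reusing
                // the same bootstrapping code the production application uses
            })
            .Build();

        await Host.StartAsync();
    }

    public async Task DisposeAsync()
    {
        await Host.StopAsync();
        Host.Dispose();
    }
}

public class SystemTests : IClassFixture<AppFixture>
{
    private readonly AppFixture _fixture;

    public SystemTests(AppFixture fixture) => _fixture = fixture;

    [Fact]
    public void can_resolve_services_from_the_running_host()
    {
        Assert.NotNull(_fixture.Host.Services.GetService<IHostEnvironment>());
    }
}
```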

As a follow up to the previous section, it’s also advantageous for automated tests to be able to run in process with the test harness code. For instance, I’d much rather do Alba testing of HTTP endpoints in .Net 5, where I’m able to quickly spin up an actual web service in memory and shut it down from my testing project, as opposed to the old, full .Net Framework days where you’d have to run the web service project in IIS or IIS Express first, then use HttpClient to address the service from your unit tests. The first, .Net 5/Alba approach is a much faster iteration and feedback cycle to support a Test Driven Development workflow than the second.
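
For a rough picture of what that looks like, here’s an approximation of an Alba test driving a standard .Net 5 Program.CreateHostBuilder() entry point completely in memory. The exact bootstrapping call differs between Alba versions, so treat this as a sketch of the shape rather than something to copy verbatim.

```csharp
using System;
using System.Threading.Tasks;
using Alba;
using Xunit;

public class HomeEndpointTests
{
    [Fact]
    public async Task the_home_page_returns_200()
    {
        // Spin the real application up in memory (no IIS / IIS Express required),
        // starting from the application's own .Net 5 HostBuilder
        await using var host = await Program.CreateHostBuilder(Array.Empty<string>())
            .StartAlbaAsync();

        // Run an HTTP scenario against the in-memory host
        await host.Scenario(_ =>
        {
            _.Get.Url("/");
            _.StatusCodeShouldBeOk();
        });
    }
}
```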

Likewise, when given a choice between tests that can be run locally versus tests that can only be executed in a remote server location, give me the local tests every time. When and if you hit a scenario where you really need to run tests remotely *cough* serverless *cough*, that’s the one and only exception I can think of to my “never deploy from your local development box” rule. If you do depend on remote execution of tests, I’d at least want the ability to send my local development branch to the remote server at will. Just having a CI server build out pull request branches might get you there of course, but then you might be dependent upon being able to run that test suite in parallel with other CI builds. That’s not a show stopper, but it might make you have to invest more in your build automation to spin up isolated environments on demand.

Apollo Testing!

There’s plenty of debate over what the actual ratio of UI tests to integration tests to unit tests should be, and plenty of folks have metaphors other than a “pyramid” to describe what they think the ratio should be. I happen to like the integration test heavy ratio described in The Testing Trophy and Testing Classifications with the graphic below:

From The Testing Trophy and Testing Classifications

I think his image of the “testing trophy” looks a lot like the command module from the Apollo missions to the moon in the late 60’s and early 70’s:

The Apollo Command Module

So from now on, I’m calling our intended testing approach the Apollo Testing Method!

What is the purpose of testing?

I’m just barely old enough that I started my official software development career in old fashioned waterfall models. In those days we did some unit testing with ad hoc testing tools to troubleshoot new code, but it wasn’t anywhere close to what developers do today with Test Driven Development and xUnit tools. As developers, we mostly ran the complete application on our development boxes and stepped through things manually to check out new code locally before throwing things over the wall to QA at the end of the project.

Regardless of whatever ad hoc testing developers did of their own code, the only official testing that actually counted was the purely manual testing done by our testers in a testing environment that was supposed to exactly mimic the production environment (it never quite did, but that’s a story for another day). The QA team strictly used black-box testing, with some direct access to the underlying database.

That old black-box testing approach at the end of the project was a much slower feedback cycle than we’re accustomed to today after the advent of Agile Software Development. The killer problem was that the testing feedback cycles were too slow to consider evolutionary design approaches because of the very real fear of regression failures. It was also harder as a developer to address defects found in the testing cycle because you frequently needed to work with code you hadn’t touched in many months and that certainly wasn’t fresh in your mind. That frequently led to marathon debugging sessions while a helpful project manager came by your cubicle several times a day to cheerfully ask for updates and crank up the pressure. As a developer, you were also completely at the mercy of QA for information about what was really happening in the system when they found bugs.

In my mind, the most important element of Agile software development overall was the emphasis on improving feedback cycles. Faster feedback allowed teams to find and address problems while the code was still fresh in everybody’s mind and changes were still cheap to make, and it made evolutionary approaches to design feasible in a way they simply weren’t when testing feedback arrived months after the code was written.

Testing is not about proving that our code works perfectly so much as it is a way to find and remove enough problems from the code that it can be deployed to production. I think this is an important distinction because it allows us to use faster, finer grained testing approaches like isolated unit testing or intermediate level white-box integration tests that are generally faster running and cheaper to build than classic black-box, end to end tests.

What’s Next?

My organization has started an effort to introduce much more integration testing into our development processes as a way of improving quality and throughput. To help out on that, I’m going to attempt to write a series of blog posts going into specific areas about tools and techniques, but for right now I’m just jotting down this stream of consciousness brain dump to get started.

Based on what I think we need to establish at work, I’m thinking to cover:

  • .Net IHost bootstrapping and lifecycle within xUnit.Net or NUnit. I’m much more familiar with xUnit.Net from recent development, but we mostly use NUnit at work so I’ll be trying to cover that base as well.
  • HTTP API testing, which will inevitably feature Alba
  • Dealing with databases in tests, and that’s gonna have to cover both RDBMS databases and probably Mongo Db for now
  • Message handler testing with MassTransit and NServiceBus (we use both in different products)
  • Another brain dump on doing end to end testing after a meeting today about the CI of one of our big systems.
  • How should automated testing be integrated into the development cycle, who should be responsible for these tests, and why is the obvious answer a very close collaboration between testers, developers, and even business experts?
  • This will be a bigger stretch for me, but maybe get into how to do some semi-integration testing of an Angular front end with NgRX.
  • After talking through some of our issues with test automation at work, I think I’d like to blog about some of the positive things we did with Storyteller. I’ve been increasingly frustrated with xUnit.Net (and don’t think NUnit would be much better) for integration testing, so I’ve got quite a few notes, which I wouldn’t mind publishing, about what an alternative tool optimized for integration tests could look like.

One thought on “A brain dump on automated integration testing”

  1. Great article! Thank you for taking the time to write it.
    I also find the Solitary & Sociable nomenclature for tests more useful. I agree with the tradeoffs that one has to make when choosing between Solitary and Sociable tests. The one thing that I still have issues with is the “Testing Trophy” or “Apollo” approach. I’m a big fan of the fast feedback cycle that Solitary tests give me, so whenever I can get away with it, I’ll go for the quick and fast micro test. The fact that, as you mentioned, Sociable tests are more difficult to diagnose when they fail also drives me towards small Solitary tests. The tradeoff is indeed the increased coupling with the production code, although in my book I’ve made an attempt to somewhat mitigate this effect.

    It’s also perfectly valid to drive the design with Solitary tests and fall back on a couple of Sociable tests, deleting the Solitary tests whenever the coupling starts to hurt. I use a similar approach with test doubles, where I sometimes start out by using them and completely refactor them out once the design comes to fruition.

    Anyway, the Test Pyramid served me well over the years and it still does. But I do recognise that there might be different concerns at stake when testing/driving the design of a library like StructureMap or Lamar, as they might be more infrastructural in nature compared to modern day LOB applications.
