Imagining a Better Integration Testing Tool

I might be building out a new integration testing library named “Bobcat”, but mostly I just think that bobcats are cool. They're scary looking when you see them in the wild, though. They came back to the area where I grew up when the wild turkey population recovered in the '80s and '90s.

Let’s say that you’re good at writing testable code, and you’re able to write isolated unit tests for the vast majority of your domain logic and much of your coordination workflow type of logic as well. Great, but that still leaves you with the need to write some kind of integration test suite and maybe even a modicum of end-to-end tests through *shudder* Playwright, Selenium, or Cypress.io — not that I’m saying those tools are bad per se, but the technical challenge and overhead of succeeding with them can be through the roof.

Right now I’m in a spot where the vast majority of the testing for Marten and Wolverine is integration testing of one sort or another that involves databases and message brokers. Most of these tests behave just fine, as Marten especially is well suited for integration testing and I’ve invested in integration test helpers built directly into Wolverine itself. Cool, but, um, there are, embarrassingly enough, a significant number of “blinking” tests (unreliable automated tests) in Marten and far more in Wolverine. Moreover, there are a number of other tests that behave perfectly on my high powered development box, but blink in continuous integration runs.

And I know what you’re thinking: in this case it’s not at all because of shared, static state, which is the typical root cause of many blinking test issues. There might very well be some race conditions we haven’t perfectly ironed out yet, but basically all of these ill behaved tests exercise asynchronous processing and run into timeout failures inside a larger test suite run, yet generally work reliably when run one at a time.

At this point, I don’t feel like the tooling (mostly xUnit.Net) we’re currently using for automating integration testing is perfectly appropriate for what we’re trying to do, and I’m ready to consider doing something a little bit custom — especially because much of the development coming my way soon is going to involve the exact kind of asynchronous behavior that’s already giving us trouble.

I am also currently helping a JasperFx Software client formulate an integration testing approach across multiple, collaborating microservices, so integration testing is top of mind for me right now.

Integration Testing Challenges

  • Understanding what’s happening inside your system in the case of failures
  • Testing timeouts, especially if you’re needing to test asynchronous processing
  • “Knowing” when asynchronous work is complete and delaying the *assertions* until those asynchronous actions are really complete (there’s a small sketch of what I mean just after this list)
  • Having a sense for whether a long running integration test is proceeding, or hung
  • Data setup, especially in problem domains that require quite a bit of data in tests
  • Making the expression of the test as declarative as possible to make the test clear in its intentions
  • Preventing the test from being too tightly coupled to the internals of the system so the test isn’t too brittle when the system internals change
  • Being able to make the test suite *fail fast* when the system is detected to be in an invalid state — don’t blow this off, this can be a huge problem if you’re not careful
  • Selectively and intelligently retrying “blinking” tests — and yeah, you should try really hard not to need this capability, but you might need it no matter how hard you try
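
To make the “knowing when asynchronous work is complete” item a bit more concrete, here’s a minimal sketch of the kind of polling helper I have in mind. Everything here is hypothetical — the `WaitForAsync` name, the polling interval, the timeout handling — none of it is an existing Wolverine or Marten API.

```csharp
using System;
using System.Threading.Tasks;

// A minimal sketch of a "wait until the asynchronous work is observable, then assert"
// helper. The WaitForAsync name, interval, and timeout behavior are all hypothetical
public static class AsyncAssertions
{
    public static async Task<T> WaitForAsync<T>(
        Func<Task<T?>> probe,
        TimeSpan timeout,
        TimeSpan? pollingInterval = null)
    {
        var interval = pollingInterval ?? TimeSpan.FromMilliseconds(250);
        var deadline = DateTimeOffset.UtcNow + timeout;

        while (DateTimeOffset.UtcNow < deadline)
        {
            // Probe some externally observable state: a database row, a queue depth,
            // a projection document, whatever the asynchronous work should produce
            var result = await probe();
            if (result is not null) return result;

            await Task.Delay(interval);
        }

        throw new TimeoutException(
            $"The expected state was not observed within {timeout}. " +
            "This is where the tooling should dump diagnostics about the system state.");
    }
}
```

In a test body you’d kick off the action, then do something like `var order = await AsyncAssertions.WaitForAsync(() => session.LoadAsync<Order>(orderId), TimeSpan.FromSeconds(10));` before making any assertions on `order`. The point is that the timeout failure becomes explicit and can carry diagnostics, rather than the test just hanging.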

Scattered Thoughts about Possible Approaches

In the case of Wolverine and Marten’s “async daemon” testing, I think a large part of our problems are due to thread pool exhaustion as we rapidly spin up and tear down different IHost applications. My thought is not just to retry tests when we detect timeout failures, but also to use some level of exponential backoff to pause before starting the next test, to let things simmer down in the runtime and let the thread pool snap back. In the worst case, I’d also like to consider a test runner implementation where a separate test manager process could trash and restart the actual test running process in certain cases (Storyteller could actually do this somewhat).
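
As a rough sketch of what I mean by pairing retries with backoff, something like the following could wrap a test execution. None of this exists yet; the names are invented, and a real runner would hook this into its own execution pipeline instead of wrapping individual test bodies.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: retry a test on timeout failures, but back off between
// attempts so the thread pool has a chance to recover before the next run
public static class TimeoutRetry
{
    public static async Task RunWithBackoffAsync(
        Func<Task> runTest,
        int maxAttempts = 3,
        TimeSpan? initialDelay = null)
    {
        var delay = initialDelay ?? TimeSpan.FromSeconds(2);

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await runTest();
                return; // the test passed, nothing more to do
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Pause before the next attempt, doubling the delay each time
                // (exponential backoff) to let the runtime settle down
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```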

As far as “knowing” when asynchronous work is complete across multiple running processes, I want to kick the tires on a distributed version of Wolverine’s in-process message tracking support for integration testing that can tell you when all outstanding work is complete across processes. I definitely want this built into Wolverine itself someday, but I need to help a client do this in an architecture that doesn’t include Wolverine. With a tip from Martin Thwaites, maybe some kind of tracking using ActivitySource?
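
For the ActivitySource idea, the rough shape might be something like this: a listener that tracks activities as they start and stop so a test can wait for everything to go quiet. ActivitySource and ActivityListener are real System.Diagnostics types, but the tracker itself is purely hypothetical, and as written it only sees activities in the test’s own process; a distributed version would need the collaborating services to export their activity data somewhere the test harness can observe it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch: track in-flight Activities from a named ActivitySource so
// a test can wait until all observed asynchronous work has completed
public sealed class ActivityTracker : IDisposable
{
    private readonly ConcurrentDictionary<string, Activity> _inFlight = new();
    private readonly ActivityListener _listener;

    public ActivityTracker(string sourceName)
    {
        _listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStarted = activity => _inFlight[activity.Id!] = activity,
            ActivityStopped = activity => _inFlight.TryRemove(activity.Id!, out _)
        };

        ActivitySource.AddActivityListener(_listener);
    }

    // Poll until everything that started has also stopped, or give up with a timeout
    public async Task WaitForQuietAsync(TimeSpan timeout)
    {
        var deadline = DateTimeOffset.UtcNow + timeout;

        while (!_inFlight.IsEmpty)
        {
            if (DateTimeOffset.UtcNow > deadline)
            {
                throw new TimeoutException(
                    $"{_inFlight.Count} activities were still in flight after {timeout}");
            }

            await Task.Delay(100);
        }
    }

    public void Dispose() => _listener.Dispose();
}
```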

As far as tooling is concerned, I’ve contemplated forking xUnit.Net for a hot second, writing a more “robust” test runner that can work with xUnit.Net types, resurrecting Storyteller (very unlikely), or building something new (“Bobcat”?) and dogfooding it the whole way, mostly on Wolverine tests. I’m not ready to talk about that much yet — and I’m well aware of the challenges in trying to tackle something like that after the time I invested in Storyteller in the mid to late 2010s.

For distributed testing, I am intrigued right now by the new Project Aspire from Microsoft as a way to bootstrap and monitor a local or testing environment for integration testing.

I am certainly considering using SpecFlow for a completely different client where they could really use business person readable BDD specifications, but I don’t see SpecFlow by itself doing much to help us out with Wolverine development.

So, stay tuned if you’re interested in the journey here, or have some solid suggestions for us!

3 thoughts on “Imagining a Better Integration Testing Tool”

    1. It’s my theory. Even running tests single file, the runtime seems “tired”. There might be something smarter to do with the underlying TPL Dataflow usage, but I haven’t tried anything yet.

      1. Well, I can confirm for my application that spinning up a new host for every test did make everything hang. Using a collection shared between tests fixed that. But running a few collections in parallel works fine as well. So your theory could hold.
