Integration Testing: Lessons from Storyteller and Other Thoughts

Continuing my series about automated testing. I’m wrestling quite a bit right now with the tooling for integration testing, both at work and in some of my more complicated OSS projects like Marten or Jasper. I happily use xUnit.Net for unit testing on side projects, and we generally use NUnit at work. Both tools are well suited to unit testing, but leave something to be desired for integration testing, in my opinion. At the same time, one of my long-running OSS projects is Storyteller, which was originally meant as a Behavior Driven Development (BDD) tool (think Gherkin tools or the much older FitNesse tooling), but really turned out to be most advantageous for big, data-intensive integration testing.

To paraphrase Miracle Max, Storyteller isn’t dead, it’s just mostly dead. Storyteller never caught on, the React UI is a bazillion versions out of date and buggy as hell, and probably isn’t worth updating. I’m busy with other things. Gherkin/Cucumber tools suck all the oxygen out of the room for alternative approaches to BDD, or really even just integration testing. Plus our industry’s zombie-like fixation on trying to automate all testing through Selenium is absolutely killing off the enthusiasm for any other automated testing techniques (again, just my personal opinion).

However, I still want something better than what we have today. There were some useful things that Storyteller provided that I still want in automated testing tools, and of course there are also some areas where Storyteller fell short that I’d want to improve on.

Things that I thought were valuable in Storyteller:

  • Rendering the test results as HTML with failures inline helps relate test failures to test inputs, expectations, or actions
  • BDD tests are generally expressed through a mix of “flow based” expressions (think of the English-like Given/When/Then nomenclature of Gherkin specifications) and “table based” testing that was very common in the older FIT/FitNesse style of BDD. A lot of data-intensive problem domains are best tested through declarative “table-driven” tests. For example, consider insurance quoting logic, or data transformations, or banking transactions where you’re pushing around a lot of data and also verifying matrices of expected results. I still very strongly believe that Storyteller’s focus on example tables and set verifications is the right approach for these kinds of problem domains. The example tables make the tests both very declarative and readily reviewable by domain experts. The set verification approach in Storyteller is valuable because it will show you exactly which values are missing or unexpected for easy test failure diagnosis (see the first sketch after this list). Table-driven testing is somewhat supported in the Gherkin specification, but I don’t think that’s well utilized in that space.
  • Being able to embed system diagnostics and logging directly into the test results, such that the application logs are perfectly correlated to the integration test being run, has been very advantageous (the second sketch below shows the idea). As an example, in Jasper testing I pipe in all the system events around messages being sent, received, and executed so that you can use that data to understand how any given test behaved. I’ve found this feature to be incredibly helpful in solving the inevitable test failures.
  • Like many integration testing tools, Storyteller generally allows you to evaluate all the assertions of a long-running test even after some condition or action has failed. This is the opposite of what we’re taught for fine-grained unit testing, but very advantageous in integration testing where scenarios run very slowly and you want to maximize your feedback cycle. That approach can also easily lead to runaway test runs, though. Picking on Selenium testing yet again, imagine you have an integration test that uses Selenium to pull open a certain page in a web application and then verify the expected values of several screen elements. Following Selenium best practices, you have some built-in “waiters” that try to slow down the test until the expected elements are visible on the screen. If the web application itself chokes completely on the web page, your Selenium-heavy test might very easily loop through and time out on the visibility of every single expected element. That’s slow. Even worse (and I’ve seen this happen in multiple shops), what if the whole web application is down, but each and every Selenium-heavy test still runs and repeatedly times out on all the various “waiters” in the test? That’s a hung build that can run all night (again, been there, done that). To that end, Storyteller has the concepts of “critical” and “catastrophic” errors in the test engine (the third sketch below models this). For example, if you were using Storyteller to drive Selenium, you could teach Storyteller to treat the failure to load a web page as a “critical” exception that stops the current test from continuing so you can “fail fast”. Likewise, if the very home page of the web application can’t be loaded (status code 500, y’all), you can have Storyteller throw a “catastrophic” exception that pulls the cord on the whole test suite run and stops all test executions regardless of any kind of test retry policy. Fail fast FTW!
  • Storyteller did a lot to make test data setup as declarative as possible. I think this is hugely valuable for data-intensive problem domains like basically any enterprise application.
  • Targeted test retries in CI. You have to opt into retries on a test-by-test basis in Storyteller, and I think that was the right decision as opposed to letting every test failure get retried in CI (the fourth sketch below shows the opt-in idea).
  • In an idea I stole years ago from a former colleague at ThoughtWorks, Storyteller differentiates between tests that are “Acceptance” tests in the development lifecycle and tests that are labeled as “Regression”, meaning that those tests absolutely have to pass in CI while “Acceptance” tests are known to be a work in progress. I think that workflow is valuable for doing BDD-style testing where the tests are specifications about how the system *should* behave.
  • Storyteller had a “step through” mode where you could manually step through a specification at execution time, much like you would step through regular code in a debugger. That was hugely valuable.
  • Storyteller kept performance data about individual test steps as part of the test results, and even allowed you to specify performance thresholds that could fail a test if the performance did not satisfy that threshold (the last sketch below illustrates this). That’s not to say Storyteller specifications were anything more than a good complement to real load or performance testing, but the data was still very helpful for diagnosing long-running tests and very frequently identified real application performance bottlenecks.
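
To make the set verification idea concrete, here’s a minimal C# sketch of the mechanic I have in mind. Everything here is invented for illustration; it’s not Storyteller’s actual API, just the flavor of diagnostics it gives you:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical set verification: rather than a blunt pass/fail, the comparison
// reports exactly which expected rows are missing and which actual rows were
// never expected
public static class SetVerification
{
    public static string Verify<T>(IEnumerable<T> expected, IEnumerable<T> actual)
    {
        var actualSet = actual.ToList();

        // Any expected row that can't be matched (and removed) is "missing";
        // whatever survives in actualSet afterwards is "unexpected"
        var missing = expected.Where(e => !actualSet.Remove(e)).ToList();

        if (!missing.Any() && !actualSet.Any()) return "Success";

        return $"Missing: [{string.Join(", ", missing)}], " +
               $"Unexpected: [{string.Join(", ", actualSet)}]";
    }
}

public static class Example
{
    public static void Main()
    {
        // Declarative, table-like expectations for some quoting logic
        var expected = new[] { ("GoldPlan", 120.00m), ("SilverPlan", 80.00m) };
        var actual   = new[] { ("GoldPlan", 120.00m), ("BronzePlan", 40.00m) };

        // Prints: Missing: [(SilverPlan, 80.00)], Unexpected: [(BronzePlan, 40.00)]
        Console.WriteLine(SetVerification.Verify(expected, actual));
    }
}
```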
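
And here’s a rough sketch of the log correlation trick, using xUnit.Net’s real ITestOutputHelper as a stand-in for Storyteller’s HTML results. The RecordingSink type and the Jasper-flavored messages are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using Xunit;
using Xunit.Abstractions;

// Hypothetical per-test log sink; the application under test writes here
// while the scenario runs, so the output is correlated to exactly one test
public class RecordingSink
{
    private readonly List<string> _events = new List<string>();

    public void Record(string message) =>
        _events.Add($"{DateTime.UtcNow:HH:mm:ss.fff} {message}");

    public IEnumerable<string> Events => _events;
}

public class MessagingScenario
{
    private readonly ITestOutputHelper _output;

    public MessagingScenario(ITestOutputHelper output) => _output = output;

    [Fact]
    public void sent_message_is_executed()
    {
        var sink = new RecordingSink();

        // Imagine Jasper-style message events being piped into the sink
        sink.Record("Sent CreateInvoice to queue 'invoices'");
        sink.Record("Received CreateInvoice");
        sink.Record("Executed CreateInvoice handler in 14ms");

        // Flush the correlated log into the test output so a failure report
        // carries exactly what the application did during *this* test
        foreach (var line in sink.Events) _output.WriteLine(line);
    }
}
```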
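
The “critical” versus “catastrophic” distinction is easy to model. This sketch is purely illustrative (these are not Storyteller’s real types), but it shows the two escalation levels:

```csharp
using System;
using System.Collections.Generic;

// A "critical" failure stops the current spec immediately
public class CriticalException : Exception
{
    public CriticalException(string message) : base(message) { }
}

// A "catastrophic" failure stops the entire suite run
public class CatastrophicException : Exception
{
    public CatastrophicException(string message) : base(message) { }
}

public static class Engine
{
    public static void RunAll(IEnumerable<Action> specs)
    {
        foreach (var spec in specs)
        {
            try
            {
                spec();
                Console.WriteLine("Spec passed");
            }
            catch (CatastrophicException ex)
            {
                // Pull the cord on the whole run: no retries, no further specs
                Console.WriteLine($"Suite aborted: {ex.Message}");
                return;
            }
            catch (CriticalException ex)
            {
                // Fail fast on this spec only, then move on to the next one
                Console.WriteLine($"Spec failed fast: {ex.Message}");
            }
            catch (Exception ex)
            {
                // An ordinary failure: record it and keep going
                Console.WriteLine($"Spec failed: {ex.Message}");
            }
        }
    }
}
```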
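
Opt-in retries can be as simple as a wrapper that only the specs asking for it go through. A hypothetical sketch:

```csharp
using System;

// Hypothetical opt-in retry wrapper: only specs that explicitly declare a
// retry count get re-run in CI; everything else fails on the first attempt
public static class Retry
{
    public static void WithAttempts(int attempts, Action spec)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                spec();
                return; // passed, no retry needed
            }
            catch (Exception) when (attempt < attempts)
            {
                // Swallow the failure and try again; the final attempt's
                // exception escapes and fails the test for real
                Console.WriteLine($"Attempt {attempt} failed, retrying...");
            }
        }
    }
}

// Usage inside a test body, opting this one spec into up to 3 attempts
// (RunTheFlakyScenario is a hypothetical stand-in for the real scenario):
// Retry.WithAttempts(3, () => RunTheFlakyScenario());
```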
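
Finally, per-step performance thresholds amount to timing each step and treating a blown budget as a failure. Again, the RunStep helper here is invented for the sketch:

```csharp
using System;
using System.Diagnostics;

// Hypothetical per-step performance budget: the harness times each step,
// records the timing for the results, and fails the test if a declared
// threshold is exceeded
public static class PerfSteps
{
    public static void RunStep(string name, long thresholdMs, Action step)
    {
        var stopwatch = Stopwatch.StartNew();
        step();
        stopwatch.Stop();

        // In a real harness this timing would flow into the HTML results
        Console.WriteLine(
            $"{name}: {stopwatch.ElapsedMilliseconds}ms (threshold {thresholdMs}ms)");

        if (stopwatch.ElapsedMilliseconds > thresholdMs)
            throw new Exception(
                $"Step '{name}' took {stopwatch.ElapsedMilliseconds}ms, " +
                $"over its {thresholdMs}ms threshold");
    }
}

// Usage: fail the test if the (hypothetical) quote screen takes over 2 seconds
// PerfSteps.RunStep("Load quote screen", 2000, () => OpenQuoteScreen());
```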

Things that weren’t good about Storyteller, and that any successor should do better:

  • The IDE integration was nonexistent. At a bare minimum you need easy ways to run individual specifications or groups of specifications at will from your IDE for a quick feedback cycle. The best I ever did with Storyteller was an xUnit.Net bridge that was only so-so. Maybe the new “dotnet test” specification could help here.
  • It’s a Gherkin world and the rest of us just live in it. If there were a serious attempt to reboot Storyteller, it would absolutely have to be Gherkin compatible. I think there would be Storyteller-specific customizations as well though, because I despise the silly Given/When/Then handcuffs. Any new Storyteller absolutely has to allow devs to efficiently build and run new specifications without having to jump into an external tool of some sort.
  • It’s got to be friendlier to asynchronous code, because that’s the world we live in now.

I’ve got a lot more notes about specific use cases I’ll have to dig out later.

What am I going to do?

For Jasper & Marten, where there’s no quick win to replace Storyteller, I think I might cut a quickie Storyteller v6 that updates everything to .Net 5/6 and makes the testing framework much friendlier to asynchronous code as a short-term measure.

Longer term, I don’t know. At work, I think the best approach would be to convince myself to like SpecFlow and possibly fill in some of its missing functionality with Storyteller-style add-ons. I’ve got worlds of notes about a rebooted Storyteller, and also about a completely different approach I’ve nicknamed “Bobcat” that would be an xUnit-like tool that’s just very optimized for integration testing (shared setup data, teardown, smarter heuristics for parallelizing tests). I’m also somewhat curious to see if the rebooted Fixie could be molded into a better tool for integration testing.

Honestly, if a newly rebooted Storyteller happens, it’ll be an opportunity for me to refresh my UI/UX skills.