Storyteller 3: Executable Specifications and Living Documentation for .Net

tl;dr: The open source Storyteller 3 is an all new version of an old tool that my shop (and others) use for customer facing acceptance tests, large scale test automation, and “living documentation” generation for code-centric systems.

A week from today I’m giving a talk at .Net Unboxed on Storyteller 3, an open source tool largely built by myself and my colleagues for creating, running, and managing Executable Specifications against .Net projects based on what we feel are the best practices for automated testing based on over a decade of working with automated integration testing. As I’ll try to argue in my talk and subsequent blog posts, the most complete approach in the .Net ecosystem for reliably and economically writing large scale automated integration tests.

My company and a couple other early adopters have been using Storyteller for daily work since June and the feedback has been pleasantly positive so far. Now is as good of time as any to make a public beta release for the express purpose of getting more feedback on the tool so we can continue to improve the tool prior to an official 3.0 release in January.

If you’re interested in kicking the tires on Storyteller, the latest beta as of now is available on For help getting started, see our tutorial and getting started pages.

Some highlights:

It’s improved a lot since then, but I gave a talk at work in March previewing Storyteller 3 that at least discusses the goals and philosophy behind the tool and Storyteller’s approach to acceptance tests and integration tests.

A Brief History

I had a great time at Codemash this year catching up with old friends. I was pleasantly surprised when I was there to be asked several times about the state of Storyteller, an OSS project others had originally built in 2008 as a replacement for FitNesse as our primary means of expressing and executing automated customer facing acceptance tests. Frankly, I always thought that Storyteller 1 and the incrementally better Storyteller 2 were failures in terms of usability and I was so burnt out on working with it that I had largely given up on it and ignored it for years.

Unfortunately, my shop has a large investment in Storyteller tests and our largest and most active project was suffering with heinously slow and unreliable Storyteller regression test suites that probably caused more harm than good with their support costs. After a big town hall meeting to decide whether to scrap and replace Storyteller with something else, we instead decided to try to improve Storyteller to avoid having to rewrite all of our tests. The result has been an effective rewrite of Storyteller with an all new client. While trying very hard to mostly preserve backward compatibility with the previous version in its public API’s, the .Net engine is also a near rewrite in order to squeeze out as much performance and responsiveness as we could.


The official 3.0 release is going to happen in early January to give us a chance to possibly get more early user feedback and maybe to get some more improvements in place. You can see the currently open issue list on GitHub. The biggest things outstanding on our roadmap are:

  • Modernize the client technology to React.js v14 and introduce Redux and possibly RxJS as a precursor to doing any big improvements to the user interface and trying to improve the performance of the user interface with big specification suites
  • A “step through” mode in the interactive specification running so users can step through a specification like you would in a debugger
  • The big one, allow users to author the actual specification language in the user interface editor with some mechanics to attach that language to actual test support code later

Succeeding with Automated Integration Tests

tl;dr This post is an attempt to codify my thoughts about how to succeed with end to end integration testing. A toned down version of this post is part of the Storyteller 3 documentation

About six months ago the development teams at my shop came together in kind of a town hall to talk about the current state of our automated integration testing approach. We have a pretty deep investment in test automation and I think we can claim some significant success, but we also have had some problems with test instability, brittleness, performance, and the time it takes to author new tests or debug existing tests that have failed.

Some of the problems have since been ameliorated by tightening up on our practices — but that still left quite a bit of technical friction and that’s where this post comes in. Since that meeting, I’ve been essentially rewriting our old Storyteller testing tool in an attempt to address many of the technical issues in our automated testing. As part of the rollout of the new Storyteller 3 to our ecosystem, I thought it was worth a post on how I think teams can be more successful at automated end to end testing.

Test Stability

I’ve worked in far too many environments and codebases where the automated tests were “flakey” or unreliable:

  • Teams that do all of their development against a single shared, development database such that the data setup is hard to control
  • Web applications with a lot of asynchronous behavior are notoriously hard to test and the tests can be flakey with timing issues — even with all the “wait for this condition on the page to be true” discipline in the world.
  • Distributed architectures can be difficult to test because you may need to control, coordinate, or observe multiple processes at one time.
  • Deployment issues or technologies that tend to hang on to file locks, tie up ports, or generally lock up resources that your automated tests need to use

To be effective, automated tests have to be reliable and repeatable. Otherwise, you’re either going to spend all your time trying to discern if a test failure is “real” or not, or you’re most likely going to completely ignore your automated tests altogether as you lose faith in them.

I think you have several strategies to try to make your automated, end to end tests more reliable:

  1. Favor white box testing over black box testing (more on this below)
  2. Closely related to #1, replace hard to control infrastructure dependencies with stub services, even in functional testing. I know some folks absolutely hate this idea, but my shop is having a lot of success in using an IoC tool to swap out dependencies on external databases or web services in functional testing that are completely out of our control.
  3. Isolate infrastructure to the test harness. For example, if your system accesses a relational database, use an isolated schema for the testing that is only used by the test harness. Shared databases can be one of the worst impediments to successful test automation. It’s both important to be able to set up known state in your tests and to not get “false” failures because some other process happened to alter the state of your system while the test is running. Did I mention that I think shared databases are a bad idea yet?*
  4. Completely control system state setup in your tests or whatever build automation you have to deploy the system in testing.
  5. Collapse a distributed application down to a single process for automated functional testing rather than try to run the test harness in a different process than the application. In our functional tests, we will run the test harness, an embedded web server, and even an embedded database in the same process. For distributed applications, we have been using additional .Net AppDomain’s to load related services and using some infrastructure in our OSS projects to coordinate the setup, teardown, and even activity in these services during testing time.
  6. As a last resort for a test that is vulnerable to timing issues and race conditions, allow the test runner to retry the test

Failing all of those things, I definitely think that if a test that is so unstable and unreliable that it renders your automated build useless that you just delete that test. I think a reliable test suite with less coverage is more useful to a team than a more expansive test suite that is not reliable.

You Gotta Have Continuous Integration

This section isn’t the kind of pound on the table, Uncle Bob-style of “you must do this or you’re incompetent” kind of rant that causes the Rob Conery’s of the world have conniptions. Large scale automation testing simply does not work if the automated tests are not running regularly as the system continues to evolve.

Automated tests that are never or seldom executed can even be a burden on a development team that still try to keep that test code up to date with architectural changes. Even worse, automated tests that are not constantly executed are not trustworthy because you no longer know if test failures are real or just because the application structure changed.

Assuming that your automated tests are legitimately detecting regression problems, you need to determine what recent change introduced the problem — and it’s far easier to do that if you have a smaller list of possible changes and those changes are still fresh in the developer’s mind. If you are only occasionally running those automated tests, diagnosing failing tests can be a lot like finding the proverbial needle in the haystack.

I strongly prefer to have all of the automated tests running as part of a team’s continuous integration (CI) strategy — even the heavier, slower end to end kind of tests. If the test suite gets too slow (we have a suite that’s currently taking 40+ minutes), I like the “fast tests, slow tests” strategy of keeping one main build that executes the quicker tests (usually just unit tests) to give the team reasonable confidence that things are okay. The slower tests would be executed in a cascading build triggered whenever the main build completes successfully. Ideally, you’d like to have all the automated tests running against every push to source control, but even running the slower tests suites in a nightly or weekly scheduled build is better than nothing.

Make the Tests Easy to Run Locally

I think the section title is self-explanatory, but I’ve gotten this very wrong in the past in my own work. Ideally, you would have a task in your build script (I still prefer Rake, but substitute MSBuild, Fake, Make, Gulp, NAnt, whatever you like) that completely sets up the system under test on your machine and runs whatever the test harness. In a less perfect world a developer has to jump through hoops to find hidden dependencies and take several poorly described steps in order to run the automated tests. I think this issue is much less problematic than it was earlier in my career as we’ve adopted much more project build automation and moved to technologies that are easier to automate in deployment. I haven’t gotten to use container technologies like Docker myself yet, but I sure hope that those tools will make doing the environment setup for automating tests easier in the future.

Whitebox vs. Blackbox Testing

I strongly believe that teams should generally invest much more time and effort into whitebox tests than blackbox tests. Throughout my career, I have found that whitebox tests are frequently more effective in finding problems in your system – especially for functional testing – because they tend to be much more focused in scope and are usually much faster to execute than the corresponding black box test. White box tests can also be much easier to write because there’s simply far less technical stuff (databases, external web services, service buses, you name it) to configure or set up.

I do believe that there is value in having some blackbox tests, but I think that these blackbox tests should be focused on finding problems in technical integrations and infrastructure whereas the whitebox tests should be used to verify the desired functionality.

Especially at the beginning of my career, I frequently worked with software testers and developers who just did not believe that any test was truly useful unless the testing deployment was exactly the same as production. I think that attitude is inefficient. My philosophy is that you write automated tests to find and remove problems from your system, but not to prove that the system is perfect. Adopting that philosophy, favoring white box over black box testing makes much more sense.

Choose the Quickest, Useful Feedback Mechanism

Automating tests against a user interface has to be one of the most difficult and complex undertakings in all of software development. While teams have been successful with test automation using tools like WebDriver, I very strongly recommend that you do not test business logic and rules through your UI if you don’t have to. For that matter, try hard to avoid testing business logic without using the database. What does this mean? For example:

  • Test complex logic by calling into a service layer instead of the UI. That’s a big issue for one of the teams I work with who really needs to replace a subsystem behind http json services without necessarily changing the user interface that consumes those services. Today the only integration testing involving that subsystem is done completely end to end against the full stack. We have plenty of unit test coverage on the internals of that subsystem, but I’m pretty certain that those unit tests are too coupled to the implementation to be useful as regression or characterization tests when that team tries to improve or replace that subsystem. I’m strongly recommending that that team write a new suite of tests against the gateway facade service to that subsystem for faster feedback than the end to end tests could ever possibly be.
  • Use Subcutaneous Tests even to test some UI behavior if your application architecture supports that
  • Make HTTP calls directly against the endpoints in a web application instead of trying to automate the browser if that can be useful to test out the backend.
  • Consider testing user interface behavior with tightly controlled stub services instead of the real backend

The general rule we encourage in test automation is to use the “quickest feedback cycle that tells you something useful about your code” — and user interface testing can easily be much slower and more brittle than other types of automated testing. Remember too that we’re trying to find problems in our system with our tests instead of trying to prove that the system is perfect.

Setting up State in Automated Tests

I wrote a lot about this topic a couple years ago in My Opinions on Data Setup for Functional Tests, and I don’t have anything new to say since then;) To sum it up:

  • Use self-contained tests that set up all the state that a test needs.
  • Be very cautious using shared test data
  • Use the application services to set up state rather than some kind of “shadow data access” layer
  • Don’t couple test data setup to implementation details. I.e., I’d really rather not see gobs of SQL statements in my automated test code
  • Try to make the test data setup declarative and as terse as possible

Test Automation has to be a factor in Architecture

I once had an interview for a company that makes development tools. I knew going in that their product had some serious deficiencies in their automated testing strategy. When I told my interviewer that I was confident that I could help that company make their automated testing support much better, I was told that testing was just a “process issue.” Last I knew, it is still weak for its support for automating tests against systems that use that tool.

Automated testing is not merely a “process issue,” but should be a first class citizen in selecting technologies and shaping your system architecture. I feel like my shop is far above average for our test automation and that is in no small part because we have purposely architected our applications in such a way to make functional, automated testing easier. The work I described in sections above to collapse a distributed system into one process for easier testing, using a compositional architecture effectively composed by an IoC tool, and isolatating business rules from the database in our systems has been vital to what success we have had with automated testing. In other places we have purposely added logging infrastructure or hooks in our application code for no other reason than to make it easier for test automation infrastructure to observe or control the application.

Other Stuff for later…

I don’t think that in 10 years of blogging I’ve ever finished a blog series, but I might get around to blogging about how we coordinate multiple services in distributed messaging architectures during automated tests or how we’re integrating much more diagnostics in our automated functional tests to spot and prevent performance problems from creeping into application.

* There are some strategies to use in testing if you absolutely have no other choice in using a shared database, but I’m not a fan. The one approach that I want to pursue in the future is utilizing multi-tenancy data access designs to create a fake tenant on each test run to keep the data isolated for the test even if the damn database is shared. I’d still rather smack the DBA types around until they get their project automation act together so we could all get isolated databases.

Integration Testing with FubuMVC and OWIN

tl;dr: Having an OWIN host to run FubuMVC applications in process made it much easier to write integration tests and I think OWIN will end up being a very positive thing for the .Net development community.

FubuMVC is still dead-ish as an active OSS project, but the things we did might still be useful to other folks so I’m continuing my series of FubuMVC lessons learned retrospectives — and besides, I’ve needed to walk a couple other developers through writing integration tests against FubuMVC applications in the past week so writing this post is clearly worth my time right now.

One of the primary design goals of FubuMVC was testability of application code, and while I think we succeeded in terms of simpler, cleaner unit tests from the very beginning (especially compared to other web frameworks in .Net), there are so many things that can only be usefully tested and verified by integration tests that execute the entire HTTP stack.

Quite admittedly, FubuMVC was initially very weak in this area — partially because the framework itself only ran on top of ASP.Net hosting. While we had good unit test coverage from day one, our integration testing story had to evolve as we went in roughly these steps:

  1. Haphazardly build sample usages of features in an ASP.Net application that had to run hosted in either IIS or IISExpress that had to be executed manually. Obviously not a great solution for regression testing or for setting up test scenarios either.
  2. A not completely successful attempt to run FubuMVC on the early “Delegate of Doom” version of the OWIN specification and the old Kayak HTTP server. It worked just well enough to get my hopes up but didn’t really fly (there were some scenarios where it would just hang). If you follow me on Twitter and seen me blast OWIN as a “mystery meat” API, it’s largely because of lingering negativity from our first attempt.
  3. Running FubuMVC on top of Web API’s Self Host libraries. This was a nice win as it enabled us to embed FubuMVC applications in process for the first time, and our integration test coverage improved quickly. The Self Host option was noticeably slow and never worked on Mono* for us.
  4. Just in time for the 1.0 release, we finally made a successful attempt at OWIN hosting with the less insane OWIN 1.0 specification, and we’ve ran most of our integration tests and acceptance tests on Katana ever since. Moving to Katana and OWIN was much faster than the Web API Self Host infrastructure and at least in early versions, worked on Mono (the Katana team has periodically broken Mono support in later versions).
  5. For the forthcoming 2.0 release I built a new “InMemoryHost” model that can execute a single HTTP request at a time using our OWIN support without needing any kind of HTTP server.

I cannot overstate how valuable it has been to have an embedded hosting model has been for automated testing. Debugging test failures, using the application services exactly as the real application is configured, and setting up test data in databases or injecting in fake services is much easier with the embedded hosting models.

Here’s a real problem that I hit early this year. FubuMVC’s content negotiation (conneg) logic for selecting readers and writers was originally based only on the HTTP accepts and content-type headers, which is great and all except for how many ill behaved clients there are out there that don’t play nice with the HTTP specification. I finally went into our conneg support earlier this year and added support for overriding the accepts header, with an in the box default that looked for a query string on the request like “?format=json” or “format=xml.” While there are some unit tests for the internals of this feature, this is exactly the kind of feature that really has to be tested through an entire HTTP request and response to verify correctness.

If you’re having any issues with the formatting of the code samples, you can find the real code on GitHub at the bottom of this page.


I started by building a simple GET action that returned a simple payload:

    public class OverriddenConnegEndpoint
        public OverriddenResponse get_conneg_override_Name(OverriddenResponse response)
            return response;

    public class OverriddenResponse
        public string Name { get; set; }

Out of the box, the new “conneg/override/{Name}” route up above would respond with either a json or xml serialization of the output model based on the value of the accepts header, with “application/json” being the default representation in the case of wild cards for the accepts header. In the functionality, content negotiation needs to also look out for the new query string rules.

The next step is to define our FubuMVC application with a simple IApplicationSource class in the same assembly:

    public class SampleApplication : IApplicationSource
        public FubuApplication BuildApplication()
            return FubuApplication.DefaultPolicies().StructureMap();

The role of the IApplicationSource class in a FubuMVC application is to bootstrap the IoC container for the application and define any custom policies or configuration of a FubuMVC application. By using an IApplicationSource class, you’re establishing a reusable configuration that can be quickly applied to automated tests to make your tests as close to the production deployment of your application as possible for more realistic testing. This is crucial for FubuMVC subsystems like validation or authorization that mix in behavior to routes and endpoints off of conventions determined by type scanning or configured policies.


Using Katana and EndpointDriver with FubuMVC 1.0+

First up, let’s write a simple NUnit test for overriding the conneg behavior with the new “?format=json” trick, but with Katana and the FubuMVC 1.0 era EndpointDriver object:

        public void with_Katana_and_EndpointDriver()
            using (var server = EmbeddedFubuMvcServer
                server.Endpoints.Get("conneg/override/Foo?format=json", acceptType: "text/html")

Behind the scenes, the EmbeddedFubuMvcServer class spins up the application and a new instance of a Katana server to host it. The “Endpoints” object exposes a fluent interface to define and execute an HTTP request that will be executed with .Net’s built in WebClient object. The EndpointDriver fluent interface was originally built specifically to test the conneg support in the FubuMVC codebase itself, but is usable in a more general way for testing your own application code written on top of FubuMVC.


Using the InMemoryHost and Scenarios in FubuMVC 2.0

EndpointDriver was somewhat limited in its coverage of common HTTP usage, so I hoped to completely replace it in FubuMVC 2.0 with a new “scenario” model somewhat based on the Play Framework’s Play Specification tooling in Scala. I knew that I also wanted a purely in memory hosting model for integration tests to avoid the extra time it takes to spin up an instance of Katana and sidestep potential port contention issues.

The result is the same test as above, but written in the new “Scenario” style concept:

        public void with_in_memory_host()
            // The 'Scenario' testing API was not completed,
            // so I never got around to creating more convenience
            // methods for common things like deserializing JSON
            // into .Net objects from the response body
            using (var host = InMemoryHost.For<SampleApplication>())
                host.Scenario(_ => {



The newer Scenario concept was an attempt to make HTTP centric testing be more declarative. C# is not as expressive as Scala is, but I was still trying to make the test expression as clean and readable as possible and the syntax above probably would have evolved after more usage. The newer Scenario concept also has complete access to FubuMVC 2.0’s raw HTTP abstractions so that you’re not limited at all in what kinds of things you can express in the integration tests.

If you want to see more examples of both EmbeddedFubuMvcServer/EndpointServer and InMemoryHost/Scenario in action, please see the FubuMVC.IntegrationTesting project on GitHub.


Last Thoughts

If you choose to use either of these tools in your own FubuMVC application testing, I’d highly recommend doing something at the testing assembly level to cache the InMemoryHost or EmbeddedFubuMvcServer so that they can be used across test fixtures to avoid the nontrivial cost of repeatedly initializing your application.

While I’m focusing on HTTP centric testing here, using either tool above also has the advantage of building your application’s IoC container out exactly the way it should be in production for more accurate integration testing of underlying application services.


If I had to do it all over again…

We would have had an embedded hosting model from the very beginning, even if it had been on a fake, “only one HTTP request at a time” model. Moreover, if we were to ever reboot a new FubuMVC like web framework in the future KVM, I would vote for wrapping any new framework completely around the OWIN signature as our one and only model of an HTTP request. In any theoretical future effort, I’d invest time from the very beginning in something like FubuMVC 2.0’s InMemoryHost model early to make integration and acceptance testing easier and faster.

With the recent release of StructureMap 3, I’d also opt for a new child container model such that you can fake out application services on a test by test basis and rollback to the previous container state without having to re-initialize the entire application each time for faster testing.


* Mono support was a massive time sink for the FubuMVC project and never really paid off. Maybe having Xamarin involved much earlier in the new KVM .Net runtime and the emphasis on PCL supportwill make that situation much better in the future.

Adventures in Custom Testing Infrastructure

tl;dr: Sometimes the overhead of writing custom testing infrastructure can lead to easier development


Quick Feedback Cycles are Key

It’d be nice if someday I could write all my code perfectly in both structure and function the first time through, but for now I have to rely on feedback mechanisms to tell me when the code isn’t working correctly. That being said, I feel the most productive when I have the tightest feedback cycle between making a change in code and knowing how it’s actually working — and by “quick” I mean both the time it takes for me to setup the feedback cycle and how long the feedback cycle itself takes.

While I definitely like using quick twitch feedback tools like REPL’s or auto-reloading/refreshing web tools like our own fubu run or Mimosa.js’s “watch” command, my primary feedback mechanism for code centric tasks is usually automated tests. That being said, it helps when the tests are mechanically easy to write and run quickly enough that you can get into a nice “red/green/refactor” cycle. For whatever reasons, I’ve hit several problem domains in the last couple years where it was laborious in my time to set up the preconditions and testing inputs and also to measure and assert on the expected outcomes.


Maybe Invest in Some Custom Testing Infrastructure?

In some cases I knew right away that testing a feature was going to be a problem, so I started by asking myself “how do I wish I could express the test setup and assertions.” If it seems feasible, I’ll write custom ObjectMother if that’s possible or Test Data Builder‘s for the data setup in more complex cases. I’ve occasionally resorted to building little interpreters that read text and create data structures or files (I do this more often for hierarchical data than anything else I think) or perform assertions on the final state.

You can see an example of this in my old Storyteller2 codebase. Storyteller is a tool for automated acceptance tests and includes a tree view pane in the UI with the inevitable hierarchy of tests organized by suites in an n-deep hierarchy like:

Top Level Suite
  - Suite 1
    -Suite 2
    -Suite 3
      - Test 1
      - Test 2

In the course of building the Storyteller client, I needed to write a series of tests on the tree view state that had to start with a known hierarchy of suites and test files as inputs. After performing actions like filtering or receiving state updates within the UI, I needed to assert on the expected display in this test explorer pane (which tests and suites were visible and were they marked as running, failed, successful, or unknown).

First, to deal with the setup of the hierarchical data I created a little custom class that read flat text data and turned that into the desired hierarchy:

            hierarchy =

Then in the “assertion” part of the test I created a custom specification class that could again read its expectations expressed as flat text and assert that the resulting tree view exactly matched the specified state:

        public void the_child_nodes_are_constructed_with_the_empty_suite()
            var spec =
                new TreeNodeSpecification(


As I recall, writing the simple text parsing classes just to make the expression of the automated tests made it pretty easy to add new behavior quickly. In this case, the time investment upfront for the custom testing infrastructure paid off.


FubuMVC’s View Engine Support

A couple months ago I finally got to carve off some time to finally go overhaul the view engine support code in FubuMVC. My main goals were to cut the unnecessarily complex internal code down to something more manageable as a precursor to optimizing both runtime performance and FubuMVC’s time to initialize an application. Since I was about to start monkeying around quite a bit with the internals of code that many of our users depend on, it’s a good thing that we had an existing suite of integration tests that acted as acceptance tests (think layouts, partials, HTML helpers, and our conventional attachment of views to routes) so that in theory I could safely make the restructuring changes without breaking existing behavior.

Going in though, I knew that there was some significant drawbacks to using our existing mechanism for testing the view engine support and I wasn’t looking forward to the inevitable test failures or formulating new integration tests.


Problems with the Existing Test Suite

In order to write end to end tests against the view engine support we had been effectively writing little mini FubuMVC applications inside our integration test libraries. Quite naturally, that often meant adding several view files and folders to simulate all the different permutations for layout rendering, using partials, sharing views from external Bottles (a superset of Area’s for you ASP.Net MVC folks), and view profiles (mobile vs. desktop for example). In the test fixtures we would spin up a FubuMVC application with Katana, run HTTP requests, and make assertions against the content that should or should not be present in the HTTP response body.

It wasn’t terrible, but it came with a serious drawbacks:

  1. It wasn’t complete and I’d need to add additional tests
  2. It was expensive in mechanical effort to create those little mini FubuMVC applications that had to be spread over so many different files and even folders
  3. Understanding the tests when something went wrong could be difficult because the expression of the test was effectively split over so many files


The New Approach

Before going too far into the code changes against the view engine support, I built a new test harness that would allow me to express in one testing class file:

  1. What all the views and layouts were in the entire system including the content of the views
  2. What the views were in external Bottles loaded into the application
  3. If necessary, configure a complete FubuMVC application if the defaults weren’t sufficient for the test
  4. Declare what content should and should not be rendered when certain routes were executed

The end result was a base class I called ViewIntegrationContext. Mechanically, I made TestFixture classes deriving from this abstract class. In the constructor function of the test fixture classes I would specify the location, content, and view model of any number of Spark or Razor views. When the test fixture class was first executed, it would:

  1. Create a brand new folder using a guid as the name to host the new “application” to avoid collisions with existing test runs (while the new test harness does try to clean up after itself, I’ve learned not to be very trusting of the file system during automated tests)
  2. Write out the Spark and Razor files based on the data specified in the constructor function to the new application folder
  3. Optionally load content Bottles and FubuMVC configurations inside the test harness (ignore that for now if you would, but it was a huge win for me)
  4. Load a new FubuMVC application in memory with the root directory pointing to our new folder for just this test

For each test, the ViewIntegrationContext object uses FubuMVC 2.0’s brand new in memory test harness (somewhat inspired by PlaySpecification from Scala) to execute a “Scenario” where I could declaratively specify what url to render and assert what content should or should not be present in the HTML output.

To make this concrete, the very simplest test to check that FubuMVC really can render a Spark view looks like this:

    public class Simple_rendering : ViewIntegrationContext
        public Simple_rendering()
<p>This is real output</p>

        public void can_render()
            Scenario.Get.Input(new AirInputModel{TakeABreath = true});
            Scenario.ContentShouldContain("<h2>Breathe in!</h2>");

    public class AirEndpoint
        public AirViewModel TakeABreath(AirRequest request)
            return new AirViewModel { Text = "Take a {0} breath?".ToFormat(request.Type) };

        public BreatheViewModel get_breathe_TakeABreath(AirInputModel model)
            var result = model.TakeABreath
                ? new BreatheViewModel { Text = "Breathe in!" }
                : new BreatheViewModel { Text = "Exhale!" };

            return result;

    public class AirRequest
        public AirRequest()
            Type = "deep";

        public string Type { get; set; }

    public class AirInputModel
        public bool TakeABreath { get; set; }

    public class AirViewModel
        public string Text { get; set; }

    public class BreatheViewModel : AirViewModel



So did this payoff? Heck yeah it did, especially for scenarios where I needed to build out multiple views and layouts. The biggest win for me was that the tests were completely self-contained instead of spread out over so many files and folders. Even better yet, the new in memory Scenario support in FubuMVC made the actual tests very declarative with decently descriptive failure messages.


It’s Not All Rainbows and Unicorns

I cherry picked some examples that I felt went well, but there have been some other times when I’ve gone down a rabbit hole of building custom testing infrastructure only to see it be a giant boondoggle. There’s a definite bit of overhead to writing this kind of tooling and you always have to consider whether you’ll save time in the whole compared to writing more crude or repetitive testing code. While I tend to be aggressive about building custom test harnesses, you might accurately call it a speculative exercise and hold off until you feel some pain in your testing.

Moreover, any kind of custom test harness where you decouple the expression of the test (inputs, actions, and assertions) from the actual code that’s being exercised obfuscates your traceability back to the actual code. I’ve seen plenty of cases where the “goodness” of making the expression of the test prettier and more declarative was more than offset by how hard it was to debug test failures because of the extra mental overhead of connecting the meaning of the test to the code that should be implementing it. It’s for that reason that I’ve never been a big fan of most Behavior Driven Development tools for testing that isn’t customer facing.




A Simple Example of a Table Driven Executable Specification

My shop is starting to go down the path of executable specifications (using Storyteller2 as the tooling, but that’s not what this post is about).  As an engineering practice, executable specifications* involves specifying the expected behavior of a user story with concrete examples of exactly how the system should behave before coding.  Those examples will hopefully become automated tests that live on as regression tests.

What are we hoping to achieve?

  • Remove ambiguity from the requirements with concrete examples.  Ambiguity and misunderstandings from prose based requirements and analysis has consistently been a huge time waste and source of errors throughout my career.
  • Faster feedback in development.  It’s awfully nice to just run the executable specs in a local branch before pushing anything to the testers
  • Find flaws in domain logic or screen behavior faster, and this has been the biggest gain for us so far
  • Creating living documentation about the expected behavior of the system by making the specifications human readable
  • Building up a suite of regression tests to make later development in the system more efficient and safer

Quick Example

While executable specifications are certainly a very challenging practice from the technical side of things, in the past week or so I’m aware of 3-4 scenarios where the act of writing the specification tests has flushed out problems with our domain logic or screen behavior a lot faster than we could have done otherwise.

Part of our application logic involves fuzzy matching against people in our system against some, ahem, not quite trustworthy data from external partners. Our domain expert explained the matching logic that he wanted was to match a person’s social security number, birth date, first name, and last name — but the name matching should be case insensitive and it’s valid to match on the initial of the first name.  Since this logic can be expressed as a set number of inputs and the one output with a great number of permutations, I chose to express this specification as a table with Storyteller (conceptually identical to the old ColumnFixture in FitNesse).  The final version of the spec is shown below  (click the image to get a more readable version):


The image above is our final, approved version of this functionality that now lives as both documentation and a regression test.  Before that though, I wrote the spec and got our domain expert to look at it, and wouldn’t you know it, I had misunderstood a couple assumptions and he gave me very concrete feedback about exactly what the spec should have been.

To make this just a little bit more concrete, our Storyteller test harness connects the table inputs to the system under test with this little bit of adapter code:

The code behind the executable spec
  1.     public class PersonFixture : Fixture
  2.     {
  3.         public PersonFixture()
  4.         {
  5.             Title = “Person Matching Logic”;
  6.         }
  7.         [ExposeAsTable(“Person Matching Examples”)]
  8.         [return:AliasAs(“Matches”)]
  9.         public bool PersonMatches(
  10.             string Description,
  11.             [Default(“555-55-5555”)]SocialSecurityNumber SSN1,
  12.             [Default(“Hank”)]string FirstName1,
  13.             [Default(“Aaron”)]string LastName1,
  14.             [Default(“01/01/1974”)]DateCandidate BirthDate1,
  15.                                   [Default(“555-55-5555”)]SocialSecurityNumber SSN2,
  16.             [Default(“Hank”)]string FirstName2,
  17.             [Default(“Aaron”)]string LastName2,
  18.             [Default(“01/01/1974”)]DateCandidate BirthDate2)
  19.         {
  20.             var person1 = new Person
  21.             {
  22.                 SSN = SSN1,
  23.                 FirstName = FirstName1,
  24.                 LastName = LastName1,
  25.                 BirthDate = BirthDate1
  26.             };
  27.             var person2 = new Person
  28.             {
  29.                 SSN = SSN2,
  30.                 FirstName = FirstName2,
  31.                 LastName = LastName2,
  32.                 BirthDate = BirthDate2
  33.             };
  34.             return person1.Equals(person2);
  35.         }
  36.     }

* Jeremy, is this really just Behavior Driven Development (BDD)?  Or the older idea of Acceptance Test Driven Development (ATDD)?  This is some folks’ definition of BDD, but BDD is so overloaded and means so many different things to different people that I hate using the term.  ATDD never took off, and “executable specifications” just sounds cooler to me, so that’s what I’m going to call it.

My Opinions on Data Setup for Functional Tests

I read Jim Holmes’s post Data Driven Testing: What’s a Good Dataset Size? with some interest because it’s very relevant to my work.  I’ve been heavily involved in test automation efforts over the past decade, and I’ve developed a few opinions about how best to handle test data input for functional tests (as opposed to load/scalability/performance tests).  First though, here’s a couple concerns I have:

  • Automated tests need to be reliable.  Tests that require external, environmental tests can be brittle in unexpected ways.  I hate that.
  • Your tests will fail from time to time with regression bugs.  It’s important that your tests are expressed in a way that makes it easy to understand the cause and effect relationship between the “known inputs” and the “expected outcomes.”  I can’t tell you how many times I’ve struggled to fix a failing test because I couldn’t even understand what exactly it was supposed to be testing.
  • My experience says loudly that smaller, more focused automated tests are far easier to diagnose and fix when they fail than very large, multi-step automated tests.  Moreover, large tests that drive user interfaces are much more likely to be unstable and unreliable.  Regardless of platform, problem domain, and team, I know that I’m far more productive when I’m working with quicker feedback cycles.  If any of my colleagues are reading this, now you know why I’m so adamant about having smaller, focused tests rather than large scripted scenarios.
  • Automated tests should enable your team to change or evolve the internals of your system with more confidence.

Be Very Cautious with Shared Test Data

If I have my druthers, I would not share any test setup data between automated tests except for very fundamental things like the inevitable lookup data and maybe some default user credentials or client information that can safely be considered to be static.  Unlike “real” production coding where “Don’t Repeat Yourself” is crucial for maintainability, in testing code I’m much more concerned with making the test as self-explanatory as possible and completely isolated from one another.  If I share test setup data between tests, there’s frequently going to be a reason why you’ll want to add a little bit more data for a new test which ends up breaking the assertions in the existing tests.  Besides that, using a shared test data set means that you probably have more data than any single test really needs — making the diagnosis of test failures harder.  For all of those reasons and more, I strongly prefer that my teams copy and paste bits of test data sets to keep them isolated by test rather than shared.

Self-Contained Tests are Best

I’ve been interested in the idea of executable specifications for a long time.  In order to make the tests have some value as living documentation about the desired behavior of the system, I think it needs to be as clear as possible what the relationship is between the germane data inputs and the observed behavior of the system.  Plus, automated tests are completely useless if you cannot reliably run them on demand or inside a Continuous Integration build.  In the past I’ve also found that understanding or fixing a test is much harder if I have to constantly ALT-TAB between windows or even just swivel my head between a SQL script or some other external file and the body of the rest of a script.

I’ve found that both the comprehensibility and reliability of an automated test are improved by making each automated test self-contained.  What I mean by that is that every part of the test is expressed in one readable document including the data setup, exercising the system, and verifying the expected outcome.  That way the test can be executed at almost any time because it takes care of its own test data setup rather than being dependent on some sort of external action.  To pull that off you need to be able to very concisely describe the initial state of the system for the test, and shared data sets and/or raw SQL scripts, Xml, Json, or raw calls to your system’s API can easily be noisy.  Which leads me to say that I think you should…

Decouple Test Input from the Implementation Details

I’m a very large believer in the importance of reversibility to the long term success of a system.  With that in mind, we write automated tests to pin down the desired behavior of the system and spend a lot of energy towards designing the structure of our code to more readily accept changes later.  All too frequently, I’ve seen systems become harder to change over time specifically because of tight coupling between the automated tests and the implementation details of a system.  In this case, the automated test suite will actually retard or flat out prevent changes to the system instead of enabling you to more confidently change the system.  Maybe even worse, that tight coupling means that the team will have to eliminate or rewrite the automated tests in order to make a desired change to the system.

With that in mind, I somewhat strongly recommend against expressing your test data input in some form of interpreted format rather than as SQL statements or direct API calls.  My team uses Storyteller2 where all test input is expressed in logical tables or “sentences” that are not tightly coupled to the structure of our persisted documents.  I think that simple textual formats or interpreted Domain Specific Language’s are also viable alternatives.  Despite the extra work to write and maintain a testing DSL, I think there are some big advantages to doing it this way:

  • You’re much more able to make additions to the underlying data storage without having to change the tests.  With an interpreted data approach, you can simply add fake data defaults for new columns or fields
  • You can express only the data that is germane to the functionality that your test is targeting.  More on this in below when I talk about my current project.
  • You can frequently make the test data setup be much more mechanically cheaper per test by simply reducing the amount of data the test author will have to write per test with sensible default values behind the scenes.  I think this topic is probably worth a blog post on its own someday.

This goes far beyond just the test data setup.  I think it’s very advantageous in general to express your functional tests in a way that is independent of implementation details of your application — especially if you’re going to drive a user interface in your testing.

Go in through the Front Door

Very closely related to my concerns about decoupling tests from the implementation details is to avoid using “backdoor” ways to set up test scenarios.  My opinion is that you should set up test scenarios by using the real services your application uses itself to persist data.  While this does risk making the tests run slower by going through extra runtime hoops, I think it has a couple advantages:

  • It’s easier to keep your test automation code synchronized with your production code as you refactor or evolve the production code and data storage
  • It should result in writing less code period
  • It reduces logical duplication between the testing code and the production code — think database schema changes
  • When you write raw data to the underlying storage mechanisms you can very easily get the application into an invalid state that doesn’t happen in production

Case in point, I met with another shop a couple years ago that was struggling with their test automation efforts.  They were writing a Silverlight client with a .Net backend, but using Ruby scripts with ActiveRecord to setup the initial data sets for automated tests.  I know from all of my ex-.Net/now Ruby friends that everything in Ruby is perfect, but in this case, it caused the team a lot of headaches because the tests were very brittle anytime the database schema changed with all the duplication between their C# production code and the Ruby test automation code.

Topics for later…

It’s Saturday and my son and I need to go to the park while the weather’s nice, so I’m cutting this short.  In a later post I’ll try to get more concrete with examples and maybe an additional post that applies all this theory to the project I’m doing at work.

Clean Database per Automated Test Run? Yes, please.

TL;DR We’re able to utilize RavenDb‘s support for embedded databases, some IoC trickery, and our FubuMVC.RavenDb library to make automated testing far simpler by quickly spinning up a brand new database for each individual automated test to have complete control over the state of our system.  Oh, and removing ASP.Net and relational databases out of the equation makes automated functional testing far easier too.

Known inputs and expected outcomes is the mantra of successful automated testing.  This is generally pretty simple with unit tests and more granular integration tests, but sooner or later you’re going to want to exercise your application stack with a persistent database.  You cannot sustain your sanity, much less be successful, while doing automated testing if you cannot easily put your system in a known state before you try to exercise the system.  Stateful elements of your application architecture includes things like queues, the file system, and in memory caches, but for this post I’m only concerned with controlling the state of the application database.

On my last several projects we’ve used some sort of common test setup action to roll back our database to a near empty state before a test adds the exact data to the database that it needs as part of the test execution (the “arrange” part of arrange, act, and assert). You can read more about the ugly stuff I’ve tried in the past at the bottom of this post, but I think we’ve finally arrived at a solution for this problem that I think is succeeding.

Our Solution

First, we’re using RavenDb as a schema-less document database.  We also use StructureMap to compose the services in our system, and RavenDb’s IDocumentStore is built and scoped as a singleton.  In functional testing scenarios, we run our entire application (FubuMVC website hosted with an embedded web server, RavenDb, our backend service) in the same AppDomain as our testing harness, so it’s very simple for us to directly alter the state of the application.  Before each test, we:

  1. Eject and dispose any preexisting instance of IDocumentStore from our main StructureMap container
  2. Replace the default registration of IDocumentStore with a new, completely empty instance of RavenDb’s EmbeddedDocumentStore
  3. Write a little bit of initial state into the new database (a couple pre-canned logins and tenants).
  4. Continue to the rest of the test that will generally start by adding test specific data using our normal repository classes helpfully composed by StructureMap to use the new embedded database

I’m very happy with this solution for a couple different reasons.  First, it’s lightning fast compared with other mechanics I’ve used and describe at the bottom of this post.  Secondly, using a schema-less database means that we don’t have much maintenance work to do to keep this database cleansing mechanism up to date with new additions to our persistent domain model and event store — and I think this is a significant source of friction when testing against relational databases.

Show me some code!

I won’t get into too much detail, but we use StoryTeller2 as our test harness for functional testing.  The “arrange” part of any of our functional tests gets expressed like this taken from one of our tests for our multi-tenancy support:

|If the system state is |

|The users are                                |
|Username |Password |Clients                  |
|User1    |Password1|ClientA, ClientB, ClientC|
|User2    |Password2|ClientA, ClientB         |

In the test expressed above, the only state in the system is exactly what I put into the “arrange” section of the test itself.  The “If the system state is” DSL is implemented by a Fixture class that runs this little bit of code in its setup:

Code Snippet
  1.         public override void SetUp(ITestContext context)
  2.         {
  3.             // There’s a bit more than this going on here, but the service below
  4.             // is part of our FubuPersistence library as a testing hook to
  5.             // wipe the slate clean in a running application
  6.             _reset = Retrieve<ICompleteReset>();
  7.             _reset.ResetState();
  8.         }

As long as my team is using our “If the system state is” fixture to setup the testing state, the application database will be set back to a known state before every single test run — making the automated tests far more reliable than other mechanisms I’ve used in the past.

The ICompleteReset interface originates from the FubuPersistence project that was designed in no small part to make it simpler to completely wipe out the state of your running application.  The ResetState() method looks like this:

Code Snippet
  1.         public void ResetState()
  2.         {
  3.             // Shutdown any type of background process in the application
  4.             // that is stateful or polling before resetting the database
  5.             _serviceResets.Each(x => {
  6.                 trace(“Stopping services with {0}”, x.GetType().Name);
  7.                 x.Stop();
  8.             });
  9.             // The call to replace the database
  10.             trace(“Clearing persisted state”);
  11.             _persistence.ClearPersistedState();
  12.             // Load any basic state that has to exist for all tests.  
  13.             // I’m thinking that this is nothing but a couple default
  14.             // login credentials and maybe some static lookup list
  15.             // data
  16.             trace(“Loading initial data”);
  17.             _initialState.Load();
  18.             // Restart any and all background processes to run against the newly
  19.             // created database
  20.             _serviceResets.Each(x => {
  21.                 trace(“Starting services with {0}”, x.GetType().Name);
  22.                 x.Start();
  23.             });
  24.         }

The method _persistence.ClearPersistedState() called above to rollback all persistence is implemented by our RavenDbPersistedState class.  That method does this:

Code Snippet
  1.         public void ClearPersistedState()
  2.         {
  3.             // _container is the main StructureMap IoC container for the
  4.             // running application.  The line below will
  5.             // eject any existing IDocumentStore from the container
  6.             // and dispose it
  7.             _container.Model.For<IDocumentStore>().Default.EjectObject();
  8.             // RavenDbSettings is another class from FubuPersistence
  9.             // that just controls the very intial creation of a
  10.             // RavenDb IDocumentStore object.  In this case, we’re
  11.             // overriding the normal project configuration from
  12.             // the App.config with instructions to use an
  13.             // EmbeddedDocumentStore running completely
  14.             // in memory.
  15.             _container.Inject(new RavenDbSettings
  16.             {
  17.                 RunInMemory = true
  18.             });
  19.         }

The code above doesn’t necessarily create a new database, but we’ve set ourselves up to use a brand new embedded, in memory database whenever something does request a running database from the StructureMap container.  I’m not going to show this code for the sake of brevity, but I think it’s important to note that the RavenDb database construction will use your normal mechanisms for bootstrapping and configuring an IDocumentStore including all the hundred RavenDb switches and pre-canned indices.

All the code shown here is from the FubuPersistence repository on GitHub.


I’m generally happy with this solution.  So far, it’s quick in execution and we haven’t required much maintenance as we’ve progressed other than more default data.  Hopefully, this solution will be applicable and reusable in future projects out of the box.  I would happily recommend a similar approach to other teams.

But, but, but…

If you did read this carefully, I think you’ll find some things to take exception with:

  1. I’m assuming that you really are able to test functionality with bare minimum data sets to keep the setup work to a minimum and the performance at an acceptable level.  This technique isn’t going to be useful for anything involving performance or load testing — but are you really all that concerned about functionality testing when you do that type of testing?
  2. We’re not running our application in its deployed configuration when we collapse everything down to the same AppDomain.  Why I think this is a good idea, the benefits, and how we do it are a topic for another blog post.  Promise.
  3. RavenDb is schema-less and that turns out to make a huge difference in how long it takes to spin up a new database from scratch compared to relational databases.  Yes, there may be some pre-canned indices that need to get built up when you spin up the new embedded database, but with an empty database I don’t see that as a show stopper.

Other, less successful ways of controlling state I’ve used in the past

Over the years I’ve done automated testing against persisted databases with varying degrees of frustration.  The worst possible thing you can do is to have everybody testing against a shared relational database in the development and testing environments.   You either expect the database to be in a certain state at the start of the test, or you ran a stored procedure to set up the tables you wanted to test against.  I can’t even begin to tell you how unreliable this turns out to be when more than one person is running tests at the same time and fouling up the test runs.  Unfortunately, many shops still try to do this and it’s a significant hurdle to clear when doing automated testing.  Yes, you can try to play tricks with transactions to isolate the test data or try to use randomized data, but I’m not a believer in either approach.

Having an isolated relational database per developer, preferably on their own development box, was a marked improvement, but it adds a great deal of overhead to your project automation.  Realistically, you need a developer to be able to build out the latest database on the fly from the latest source on their own box.  That’s not a deal breaker with modern database migration tools, but it’s still a significant about of work for your team.  The bigger problem to me is how you tear down the existing state in a relational database to put it into a known state before running an automated test.  You’ve got a couple choices:

  1. Destroy the schema completely and rebuild it from scratch.  Don’t laugh, I’ve seen people do this and the tests were as painfully slow as you can probably imagine.  I suppose you could also script the database to rollback to a checkpoint or reattach a backed up copy of the database, but again, I’m never going to recommend that if you have other options.
  2. Execute a set of commands that wipes most if not all of the data in a database before each test.  I’ve done this before, and while it definitely helped create a known state in the system, this strategy performed very poorly and it took quite a bit of work to maintain the “clean database” script as the project progressed.  As a project grows, the runtime of your automated test runs becomes very important to keep the feedback loop useful.  Slow tests hamper the usefulness of automated testing.
  3. Selectively clean out and write data to only the tables affected by a test.  This is probably much faster performance wise, but I think it will require more coding inside of the testing code to do the one off, set up the state code.

* As an aside, I really suggest keeping the project database data definition language scripts and/or migrations in the same source control system as the code so that it’s very easy to trace the version of the code running against which version of the database schema.  The harsh reality in my experience is that the same software engineering rigor we generally use for our application code (source control, unit testing, TDD, continuous integration) is very often missing in regards to the relational database DDL and environment. If you’re a database guy talking to me at a conference, you better have your stuff together on this front before you dare tell me that “my developers can’t be trusted to access my database.”