Subcutaneous Testing against React + .Net Applications

Everything in this post is from a proof of concept project we did for the technique described here. We have not used this tooling on a real project yet, but we have a team starting a project where this might be useful, so I promised a write up for them.

In my previous post I laid out how I see the testing pyramid and test tool and technique choices against my company’s typical web application technology stack. As a reminder, our recommended stack for new development on web applications or API’s looks like this (plus a backing database):

Last week I talked through how we might test the React components and Redux store setup, including the interaction between Redux and React. I also talked about how we could go about testing the .Net backend both at a unit level and through integration tests through to the backing database. Lastly, I said we’d use a modicum of end to end, Selenium-based tests, but said that we should avoid depending on too many of those kinds of tests. That leaves us with a pretty big hole in coverage against the interaction between the Javascript code running in the browser and the .Net code and database interactions running server side.

As a possible solution for this gap, my team at work did a proof of concept for using Storyteller to do subcutaneous testing against the full application stack, but minus the actual React component “view layer.” The general idea is to use Storyteller with its Storyteller.Redux extension to host the ASP.Net Core application so that it can easily drive both test data input through the real data layer of the .Net code and then turn around and use the real system services to verify the state of the application and the backing database as the “assert” stage of the tests. The basic value proposition here is that this mechanism could be far more efficient in terms of developer time against its benefits compared to end to end, Selenium based testing. We’re also theorizing that the feedback cycles would be much tighter through faster tests and definitely more reliable tests than the equivalent tests against the browser every could be.

A couple things to note or argue:

This technique would be most useful if your React components are generally dumb and only communicate with the external world by dispatching well defined actions to the Redux store (I’m assuming that you’re utilizing Redux middleware like redux-thunk or redux-saga here).
Why Storyteller as the driver for this instead of another test runner? I’m obviously biased, but I think Storyteller has the very best story in test automation tooling for declarative set up and verification of system state. Plus, unlike any of the xUnit tools I’m aware of, Storyteller is built specifically with integration testing in mind (think configurable retries, bailing out on runaway tests, better control over the lifecycle of the test harness)
Storyteller has support for declarative assertions against a JSON document that should be handy for making assertions against the Redux store state
We’re theorizing that it’ll be vastly easier to make assertions against the Redux store state than it would to hunt down DOM elements with Selenium
The Storyteller.Redux extension subscribes to any changes to the store state and exposes that to the Storyteller test engine. The big win here is that it gives you a single mechanism to handle the dreaded “everything is asynchronous so how does the test harness know when it’s time to check the expected outcomes” problem that makes Selenium testing so dad gum hard in the real world.
The Storyteller.Redux extension can capture any logged messages to console.log or console.error in the running browser. Add that to any server side logging that you can also pipe into the Storyteller results

The general topology in these tests would look like this:

The test harness would consist of:

A Storyteller project that bootstraps the ASP.Net Core application and runs it within the Storyteller test engine. You can use the Storyteller.AspNetCore extension to make that easier (or you could after I update it for ASP.Net Core 2 and its breaking changes).
The Storyteller.Redux extension for Storyteller provides the Websockets glue to communicate between the launched browser with your Redux store and the running Storyteller engine
The Storyteller ISystem in this project has to have some way to launch a web browser to the page that hosts the Javascript bundle. In the proof of concept project, I just built out a static HTML page that included the bundle Javascript and directly launched the browser to the file location, but you could always use Selenium just to open the brower and navigate to the right Url.
Storyteller Fixtures for setting up system state for tests, sending Redux actions directly to the running Redux store to simulate user interactions, asserting on the expected system state on the backend, and checking the expected Redux store state
An alternative Javascript bundle that includes all the reducer and middleware code in your application, along with some “special sauce” code shown in a section down below that enables Storyteller to send messages and retrieve the current state of the running Redux store via Websockets.

The Special Sauce in the Javascript Bundle

Your custom bundle for the subcutaneous testing would need to have this code in its Webpack entry point file (the full file is on GitHub here):

// "store" is your configured Redux store object. 
// "transformState" is just a hook to convert your Redux
// store state to something that Storyteller could consume
function ReduxHarness(store, transformState){
    if (!transformState){
        transformState = s => s;
    }

    function getQueryVariable(variable)
    {
       var query = window.location.search.substring(1);
       var vars = query.split("&");
       for (var i=0;i<vars.length;i++) {                var pair = vars[i].split("=");                if(pair[0] == variable){return pair[1];}        }        return(false);     }     var revision = 1;     var port = getQueryVariable('StorytellerPort');     var wsAddress = "ws://127.0.0.1:5250";     var socket = new WebSocket(wsAddress); 	socket.onclose = function(){ 		console.log('The socket closed'); 	}; 	socket.onerror = function(evt){ 		console.error(JSON.stringify(evt)); 	}     socket.onmessage = function(evt){         if (evt.data == 'REFRESH'){             window.location.reload();             return;         }         if (evt.data == 'CLOSE'){             window.close();             return;         } 		var message = JSON.parse(evt.data); 		console.log('Got: ' + JSON.stringify(message) + ' with topic ' + message.type); 	 		store.dispatch(message); 	};     store.subscribe(() => {
        var state = store.getState();

        revision = revision + 1;
        var message = {
            type: 'redux-state',
            revision: revision,
            state: transformState(state)
        }

		if (socket.readyState == 1){
            var json = JSON.stringify(message);
            console.log('Sending to engine: ' + json);
			socket.send(json);
		}
    });

    // Capturing any kind of client side logging
    // and piping that into the Storyteller test results
    var originalLog = console.log;
    console.log = function(msg){
        originalLog(msg);

        var message = {
            type: 'console.log',
            text: msg
        }

        var json = JSON.stringify(message);
        socket.send(json);
    }

    // Capture any logged errors in the JS code
    // and pipe that into the Storyteller results
    var originalError = console.error;
    console.error = function(e){
        originalError(e);

        var message = {
            type: 'console.error',
            error: e
        }

        var json = JSON.stringify(message);
        socket.send(json);
    }
}


ReduxHarness(store, s => s.toJS())

The Storyteller System

In my proof of concept, I connected Storyteller to the Redux testing bundle like this (the real code is here):

    public class Program
    {
        public static void Main(string[] args)
        {
            StorytellerAgent.Run(args, new ReduxSampleSystem());
        }
    }

    public class ReduxSampleSystem : SimpleSystem
    {
        protected override void configureCellHandling(CellHandling handling)
        {
            // The code below is just to generate the static file I'm 
            // using to host the reducer + websockets code
            var directory = AppContext.BaseDirectory;
            while (Path.GetFileName(directory) != "ReduxSamples")
            {
                directory = directory.ParentDirectory();
            }

            var jsFile = directory.AppendPath("reduxharness.js");
            Console.WriteLine("Copying the reduxharness.js file to " + directory);
            var source = directory.AppendPath("..", "StorytellerRunner", "reduxharness.js");


            File.Copy(source, jsFile, true);

            var harnessPath = directory.AppendPath("harness.htm");
            if (!File.Exists(harnessPath))
            {
                var doc = new HtmlDocument();

                var href = "file://" + jsFile;

                doc.Head.Add("script").Attr("src", href);

                Console.WriteLine("Writing the harness file to " + harnessPath);
                doc.WriteToFile(harnessPath);
            }

            var url = "file://" + harnessPath;

            // Add the ReduxSagaExtension and point it at your view
            handling.Extensions.Add(new ReduxSagaExtension(url));
        }
    }

The static HTML file generation above isn’t mandatory. You *could* do that by running the real page from the instance of the application hosted within Storyteller as long as the ReduxHarness function shown above is applied to your Redux store at some point.

Storyteller Fixtures that Drive or Check the Redux Store

For driving and checking the Redux store, we created a helper class called ReduxFixture that enables you to do simple actions and value checks in a declarative way as shown below:

    public class CalculatorFixture : ReduxFixture
    {
        // There's a little bit of magic here. This would send a JSON action
        // to the Redux store like {"type": "multiply", "operand": "5"}
        [SendJson("multiply")]
        public void Multiply(int operand)
        {

        }

        // Does an assertion against a single value within the current state
        // of the redux store using a JSONPath expression
        public IGrammar CheckValue()
        {
            return CheckJsonValue("$.number", "The current number should be {number}");
        }

    }

You can of course skip the built in helpers and send JSON actions directly to the running browser or write your own assertions against the current state of the Redux store. There’s also some built in functionality in the ReduxFixture class to track Redux store revisions and to wait for any change to the Redux store before performing assertions.

Thoughts on Agile Database Development

I’m flying out to our main office next week and one of the big things on my agenda is talking over our practices around databases in our software projects. This blog post is just me getting my thoughts and talking points together beforehand. There are two general themes here, how I’d do things in a perfect world and how to make things better within the constraints of the organization and software architecture that have now.

I’ve been a big proponent of Agile development processes and practices going back to the early days of Extreme Programming (before Scrum came along and ruined everything about the way that Scrappy ruined Scooby Doo cartoons for me as a child). If I’m working in an Agile way, I want:

Strong project and testing automation as feedback cycles that run against all changes to the system
Some kind of easy traceability from a built or deployed system to exactly the version of the code and its dependencies , preferably automated through your source control processes
Technologies, tools, and frameworks that provide high reversibility to ease the cost of doing evolutionary software design.

From the get go, relational databases have been one of the biggest challenges in the usage of Agile software practices. They’re laborious to use in automated testing, often expensive in time or money to install or deploy, the change management is a bit harder because you can’t just replace the existing database objects the way we can with other code, and I absolutely think it’s reduces reversibility in your system architecture compared to other options. That being said, there are some practices and processes I think you should adopt so that your Agile development process doesn’t crash and burn when a relational database is involved.

Keep Business Logic out of the Database, Period.

I’m strongly against having any business logic tightly coupled to the underlying database, but not everyone feels the same way. For one reason, stored procedure languages (tSQL, PL/SQL, etc.) are very limited in their constructs and tooling compared to the languages we use in our application code (basically anything else). Mostly though, I avoid coupling business logic to the database because having to test through the database is almost inevitably more expensive both in developer effort and test run times than it would be otherwise.

Some folks will suggest that you might want to change out your database later, but to be honest, the only time I’ve ever done that in real life is when we moved from RavenDb to Marten where it had little impact on the existing structure of the code.

In practice this means that I try to:

Eschew usage of stored procedures. Yes, I think there are still some valid reasons to use sprocs, but I think that they are a “guilty until proven innocent” choice in almost any scenario
Pull business logic away from the database persistence altogether whenever possible. I think I’ll be going back over some of my old designing for testability blog posts from the Codebetter/ALT.Net days to try to explain to our teams that “wrap the database in an interface and mock it” isn’t always the best solution in every case for testability
Favor persistence tools that invert the control between the business logic and the database over tooling like Active Record that creates a tight coupling to the database. What this means is that instead of having business logic code directly reading and writing to the database, something else (Dapper if we can, EF if we absolutely have to) is responsible for loading and persisting application state back and forth between the domain in code and the underlying database. The point is to be able to completely test your business logic in complete isolation from the database.

I would make exceptions for use cases where using the database engine to do set based logic in a stored procedure is a more efficient way to solve the problem, but I haven’t been involved in systems like that for a long time.

Database per Developer/Tester/Environment

My very strong preference and recommendation is to have each developer, tester, and automated testing environment using a completely separate database. The key reason is to isolate each thread of team activity to avoid simultaneous operations or database changes from interfering with each other. Sharing the database makes automated testing much less effective because you often get false negatives or false positives from database activity going on somewhere else at the same time — and yes, this really does happen and I’ve got the scars to prove it.

Additionally, it’s really important for automated testing to be able to tightly control the inputs to a test. While there are some techniques you can use to do this in a shared database (multi-tenancy usage, randomized data), it’s far easier mechanically to just have an isolated database that you can easily control.

Lastly, I really like being able to look through the state of the database after a failed test. That’s certainly possible with a shared database, but it’s much easier in my opinion to look through an isolated database where it’s much more obvious how your code and tests changed the database state.

I should say that I’m concerned here with logical separation between different threads of activity. If you do that with truly separate databases or separate schemas in the same database, it serves the same goal.

“The” Database vs. Application Persistence

There are two basic development paradigms to how we think about databases as part of a software system:

The database is the system and any other code is just a conduit to get data back and forth from the database and its consumers
The database is merely the state persistence subsystem of the application

I strongly prefer and recommend the 2nd way of looking at that, and act accordingly. That’s a admittedly a major shift in thinking from traditional software development or database centric teams.

In practice, this generally means that I very strongly favor the concept of an application database that is only accessed by one application and can be considered to be just part of the application. In this case, I would opt to have all of the database DDL scripts and migrations in the source control repository for the application. This has a lot of benefits for development teams:

It makes it dirt simple to correlate the database schema changes to the rest of the application code because they’re all versioned together
Automated testing is easier within continuous integration builds becomes easier because you know exactly what scripts to apply to the database before running the tests
No need for elaborate cascading builds in your continuous integration setup because it’s just all together

In contrast, a shared database that’s accessed by multiple applications is a lot more potential friction. The version tracking between the two moving parts is harder to understand and it harms your ability to do effective automated testing. Moreover, it’s wretchedly nasty to allow lots of different applications to float on top of the same database in what I call the “pond scum anti-pattern” because it inevitably causes nasty coupling issues that will almost result in regression bugs due to it being so much harder to understand how changes in the database will ripple out to the applications sharing the database. A much, much younger version of myself walked into a meeting and asked our “operational data store” folks to add a column to a single view and got screamed at for 30 minutes straight on why that was going to be impossible and do you know how much work it’s going to be to test everything that uses that view young man?

Assuming that you absolutely have to continue to use a shared database like my shop does, I’d at least try to ameliorate that by:

Make damn sure that all changes to that shared database schema are captured in source control somewhere so that you have a chance at effective change tracking
Having a continuous integration build for the shared database that runs some level of regression tests and then subsequently cascades to all of the applications that touch that database being automatically updated and tested against the latest version of the shared database. I’m expecting some screaming when I recommend that in the office next week;-)
At the least, have some mechanism for standing up a local copy of the up to date database schema with any necessary baseline data on demand for isolated testing
Some way to know when I’m running or testing the dependent applications exactly what version of the database schema repository I’m currently using. Git submodules? Distribute the DB via Nuget? Finally do something useful with Docker, distribute the DB as a versioned Docker image, and brag about that to any developer we meet?

The key here is that I want automated builds constantly running as feedback mechanisms to know when and what database changes potentially break (or fix too!) one of our applications. Because of some bad experiences in the past, I’m hesitant to use cascading builds between separate repositories, but it’s definitely warranted in this case until we can get the big central database split up.

At the end of the day, I still think that the shared database architecture is a huge anti-pattern that most shops should try to avoid and I’d certainly like to see us start moving away from that model more and more.

Document Databases over Relational Databases

I’ve definitely put my money where my mouth is on this (RavenDb early on, and now Marten). In my mind, evolutionary or incremental software design is much easier with document databases for a couple reasons:

Far fewer changes in the application code result in database schema changes
It’s much less work to keep the application and database in sync because the storage just reflects the application model
Less work in the application code to transform the database storage to structures that are more appropriate for the business logic. I.e., relational databases really aren’t great when your domain model is logically hierarchical rather than flat
It’s a lot less work to tear down and set up known test input states in document databases. With a relational database you frequently end up having to deal with extraneous data you don’t really care about just to satisfy relational integrity concerns. Likewise, tearing down relational database state takes more care and thought than it does with a document database.

I would still opt to use a relational database for reporting or if there’s a lot of set based logic in your application. For simpler CRUD applications, I think you’re fine with just about any model and I don’t object to relational databases in those cases either.

It sounds trivial, but it does help tremendously if your relational database tables are configured to use cascading deletes when you’re trying to set a database into a known state for tests.

Team Organization

My strong preference is to have a completely self-contained team that has the ability and authority to make any and all changes to their application database, and that’s most definitely been valid in my experience. Have the database managed and owned separately from the development team is a frequent source of friction and definitely a major hit to your reversibility that forces you to do more potentially wrong, upfront design work. It’s much worse when that separate team does not share your priorities or simply works on a very different release schedule. I think it’s far better for a team to own their database — or at the very worst, have someone who is allowed to touch the database in the team room and team standup’s.

If I had full control over an organization, I would not have a separate database team. Keeping developers and database folks on separate team makes your team have to spend more time on inter-team coordination, takes away from the team’s flexibility in deciding what they can deliver, and almost inevitably causes a bottleneck constraint for projects. Even worse in my mind is when neither the developers nor the database team really understand how their work impacts the other team.

Even if we say that we have a matrix organization, I want the project teams to have primacy over functional teams. To go farther, I’d opt to make functional teams (developers, testers, DBA’s) be virtual teams solely for the purpose of skill acquisition, knowledge sharing, and career growth. My early work experience was being an engineer within large petrochemical project teams, and the project team dominant matrix organization worked a helluva lot better than it did at my next job in enterprise IT that focused more on functional teams.

As an architect now rather than a front line programmer, I constantly worry about not being able to feel the “pain” that my decisions and shared libraries cause developers because that pain is an important feedback mechanism to improve the usability of our shared infrastructure or application architecture. Likewise, I worry that having a separate database team creates a situation where they’re not very aware of the impact of their decisions on developers or vice versa. One of the very important lessons I was taught as an engineer was that it was very important to understand how other engineering disciplines work and what they needed so that we could work better with them.

Now though, I do work in a shop that has historically centralized the control of the database in a centralized database team. To mitigate the problems that naturally arise from this organizational model, we’re trying to have much more bilateral conversations with that team. If we can get away with this, I’d really like to see members of that team spend more time in the project team rooms. I’d also love it if we could steal a page from my original engineering job (Bechtel) and suggest some temporary rotations between the database and developer teams to better appreciate how the other half of that relationship works and what their needs are.

Testing HTTP Handlers with No Web Server in Sight

FubuMVC 2.0 and 3.0 introduced some tooling I called “Scenarios” that allow users to write mostly declarative integration tests against the entire HTTP pipeline in memory without having to host the application in a web server. I promised a coworker that I would write a blog post about using Scenarios for an internal team that wants to start using it much more in their work. A week of procrastination later and here you go:

NOTE: All samples are using FubuMVC 3.0

Why Integration Tests?

From the very beginning, we tried very hard to make unit testing FubuMVC action methods in isolation as easy as possible. I think we largely succeeded in that goal. However, within the context of a handling an HTTP request, FubuMVC like most web frameworks will potentially wrap those action methods with various middleware strategies for cross cutting technical things like authentication, authorization, logging, transaction management, and content negotiation. At some point, to truly exercise an HTTP endpoint you really do need to write an integration test that exercises the entire chain of HTTP handlers for an HTTP request exactly the way it will be configured inside the running application.

Toward that end, I built a class called EndpointDriver in early versions of FubuMVC that you could use to write integration tests against a FubuMVC application hosted with an embedded Katana server. This early tooling just wrapped WebClient with a FubuMVC specific fluent interface for resolving url’s, setting common options like the content-type and accepts headers, and verifying parts of the HTTP response. Below is a sample from our content negotiation support integration tests in FubuMVC 1.3 (“endpoints” is a reference to the EndpointDriver object for the running application):

[Test]
public void force_to_json_with_querystring()
{
    endpoints.Get("conneg/override/Foo?Format=Json", acceptType: "text/html")
        .ContentTypeShouldBe(MimeType.Json)
        .ReadAsJson<OverriddenResponse>()
        .Name.ShouldEqual("Foo");
}

EndpointDriver was fine at first, but our test library started getting slower as we added more and more tests and the fluent interface just never kept up with everything we needed for HTTP testing (plus I think that WebClient is awkward to use).

Using OWIN for HTTP “Scenarios”

As part of my FubuMVC 2.0 effort last year, I knew that I wanted a much better mechanism than the older EndpointDriver for doing integration testing of HTTP endpoints. Specifically, I wanted:

To be able to run HTTP requests and verify the response without having to take the performance hit of a web server
To run a FubuMVC application as it would be configured in production
To completely configure any part of an HTTP request
To be able to declaratively express multiple assertions against the expected response
To utilize FubuMVC’s support for “reverse URL resolution” for more traceable tests
Access to the raw HTTP request and response for anything unusual you would need to do that didn’t have a specific helper

The end result was a mechanism I called “Scenario’s” that exploited FubuMVC’s OWIN support to run HTTP requests in memory using this signature off of the new FubuRuntime object I explained in an earlier blog post:

OwinHttpResponse Scenario(Action<Scenario> configuration)

The Scenario object models both the HTTP request provides a way to specify expectations about the HTTP response for commonly used things like HTTP status codes, header values, and checking for the presence of string values in the HTTP response body. If need be, you also have access to FubuMVC’s abstractions for the entire HTTP request and response (more on this later).

To make this concrete, let’s say that you’re working through a “Hello, World” exercise with FubuMVC with this class and action method that just returns the text “Hello, World” when you issue a GET to the root “/” url of an application:

public class HomeEndpoint
{
    public string Index()
    {
        return "Hello, World";
    }
}

A scenario test for the action above would look like this code below:

using (var runtime = FubuRuntime.Basic())
{
    // Execute the home route and verify
    // the response
    runtime.Scenario(_ =>
    {
        _.Get.Url("/");

        _.StatusCodeShouldBeOk();
        _.ContentShouldBe("Hello, World");
        _.ContentTypeShouldBe("text/plain");
    });
}

In the scenario above, I’m issuing a GET request to the “/” url of the application and specifying that the resulting status code should be HTTP 200, “content-type” response header should be “text/plain”, and the exact contents of the response body should be “Hello, World.” When a Scenario is executed, it will run every single assertion instead of quitting on the first failure and report on every failed expectation in the specification output. This behavior is valuable when you have to author specifications with slower running scenario setup.

Specifying Url’s

FubuMVC has a model for reverse URL lookup from any endpoint method or the input model that we exploited in Scenario’s for traceable tests:

host.Scenario(_ =>
{
    // Specify a GET request to the Url that runs an endpoint method:
    _.Get.Action<InMemoryEndpoint>(e => e.get_memory_hello());

    // Or specify a POST to the Url that would handle an input message:
    _.Post

        // This call serializes the input object to Json using the 
        // application's configured JSON serializer and setting
        // the contents on the Request body
        .Json(new HeaderInput {Key = "Foo", Value1 = "Bar"});

    // Or specify a GET by an input object to get the route parameters
    _.Get.Input(new InMemoryInput { Color = "Red" });
});

I like the reverse url lookup instead of specifying Url’s directly in the scenarios because:

It makes your scenario tests traceable to the actual handling code
It insulates your scenarios from changes to the Url structures later

Checking the Response Body

For the 3.0 work I did a couple months ago, I fleshed out the Scenario support with more mechanisms to analyze the HTTP response body:

host.Scenario(_ =>
{
    // set up a request here

    // Read the response body as text
    var bodyText = _.Response.Body.ReadAsText();

    // Read the response body by deserializing Json
    // into a .net type with the application's
    // configured Json serializer
    var output = _.Response.Body.ReadAsJson<MyResponse>();

    // If you absolutely have to work with Xml...
    var xml = _.Response.Body.ReadAsXml();
});

Some Other Things…

I’ll happily explain the details of this list on request, but here are some other attributes of Scenario’s that FubuMVC supports right now:

You can specify expected values for HTTP response headers
You can assert on status codes and descriptions
There are helpers to send Json or Xml serialized data based on an input object message
There is a mechanism that allows you to disable all security middleware in the application for a single Scenario that has been frequently helpful in testing
You have access to the underlying IoC container for the running application from the Scenario if you need to resolve and use application services
FubuMVC is now StructureMap 4.0-only for its IoC usage, so we’re able to rely on StructureMap’s child container feature to resolve services during a Scenario execution from a unique child container per run. This allows you to replace services in your application with fakes, mocks, and stubs in a way that prevents your fake services from impacting more than one test.

Scenarios in Jasper

If you didn’t see my blog post earlier this year, FubuMVC is getting a complete reboot into a new project called Jasper late this year/early next year. I absolutely plan on bringing the Scenario support forward into Jasper very early, but this time around we’re completely dropping all of FubuMVC’s HTTP abstractions in favor of directly using the OWIN environment dictionary as the single model of HTTP requests and responses. My thought right now is that we’ll invest heavily in extension methods hanging off of IDictionary<string, object> for commonly used operations against that OWIN dictionary.

To some extent, we’re hoping as well that there will be a good ecosystem of OWIN helpers from other people and projects that will be usable from within Jasper.

Other Reading

You can see many more samples of Scenario usage in the integration tests in the FubuMVC codebase.
At some point during the development of FubuMVC 2.0 one of my coworkers pointed me at the way that the Scala Play framework supports integration testing that was part of the inspiration for Play.
Damian Hickey has a similar approach and project for testing OWIN-based application code in memory.

Storyteller 3: Executable Specifications and Living Documentation for .Net

tl;dr: The open source Storyteller 3 is an all new version of an old tool that my shop (and others) use for customer facing acceptance tests, large scale test automation, and “living documentation” generation for code-centric systems.

A week from today I’m giving a talk at .Net Unboxed on Storyteller 3, an open source tool largely built by myself and my colleagues for creating, running, and managing Executable Specifications against .Net projects based on what we feel are the best practices for automated testing based on over a decade of working with automated integration testing. As I’ll try to argue in my talk and subsequent blog posts, the most complete approach in the .Net ecosystem for reliably and economically writing large scale automated integration tests.

My company and a couple other early adopters have been using Storyteller for daily work since June and the feedback has been pleasantly positive so far. Now is as good of time as any to make a public beta release for the express purpose of getting more feedback on the tool so we can continue to improve the tool prior to an official 3.0 release in January.

If you’re interested in kicking the tires on Storyteller, the latest beta as of now is 3.0.0.279-alpha available on Nuget.org. For help getting started, see our tutorial and getting started pages.

Some highlights:

Shockingly for something I work on, the documentation site for Storyteller is fairly comprehensive – but please tell us about anything you think is missing or confusing
Storyteller allows you to express automated tests in language that can be easily read and reviewed by your business partners, testers, and analysts
Storyteller 3 includes a completely new user interface for interactively authoring and executing specifications written as a self-hosted web application with vastly better usability than the previous incarnations (admittedly a very low bar)
Built in and extensible instrumentation and performance diagnostics that are giving our users quite a bit of insight into how their systems are behaving and why complicated tests may be failing.
It’s not sexy at all, but Storyteller’s secondary feature is tooling inspired by readthedocs to efficiently author and maintain “living documentation” for code-centric systems. The Storyteller documentation site and the new StructureMap 3/4 website were both authored with Storyteller (you don’t have to use the same theme that I did for the documentation, I’m just lazy and copied it).

It’s improved a lot since then, but I gave a talk at work in March previewing Storyteller 3 that at least discusses the goals and philosophy behind the tool and Storyteller’s approach to acceptance tests and integration tests.

A Brief History

I had a great time at Codemash this year catching up with old friends. I was pleasantly surprised when I was there to be asked several times about the state of Storyteller, an OSS project others had originally built in 2008 as a replacement for FitNesse as our primary means of expressing and executing automated customer facing acceptance tests. Frankly, I always thought that Storyteller 1 and the incrementally better Storyteller 2 were failures in terms of usability and I was so burnt out on working with it that I had largely given up on it and ignored it for years.

Unfortunately, my shop has a large investment in Storyteller tests and our largest and most active project was suffering with heinously slow and unreliable Storyteller regression test suites that probably caused more harm than good with their support costs. After a big town hall meeting to decide whether to scrap and replace Storyteller with something else, we instead decided to try to improve Storyteller to avoid having to rewrite all of our tests. The result has been an effective rewrite of Storyteller with an all new client. While trying very hard to mostly preserve backward compatibility with the previous version in its public API’s, the .Net engine is also a near rewrite in order to squeeze out as much performance and responsiveness as we could.

Roadmap

The official 3.0 release is going to happen in early January to give us a chance to possibly get more early user feedback and maybe to get some more improvements in place. You can see the currently open issue list on GitHub. The biggest things outstanding on our roadmap are:

Modernize the client technology to React.js v14 and introduce Redux and possibly RxJS as a precursor to doing any big improvements to the user interface and trying to improve the performance of the user interface with big specification suites
A “step through” mode in the interactive specification running so users can step through a specification like you would in a debugger
The big one, allow users to author the actual specification language in the user interface editor with some mechanics to attach that language to actual test support code later

Succeeding with Automated Integration Tests

tl;dr This post is an attempt to codify my thoughts about how to succeed with end to end integration testing. A toned down version of this post is part of the Storyteller 3 documentation.

About six months ago the development teams at my shop came together in kind of a town hall to talk about the current state of our automated integration testing approach. We have a pretty deep investment in test automation and I think we can claim some significant success, but we also have had some problems with test instability, brittleness, performance, and the time it takes to author new tests or debug existing tests that have failed.

Some of the problems have since been ameliorated by tightening up on our practices — but that still left quite a bit of technical friction and that’s where this post comes in. Since that meeting, I’ve been essentially rewriting our old Storyteller testing tool in an attempt to address many of the technical issues in our automated testing. As part of the rollout of the new Storyteller 3 to our ecosystem, I thought it was worth a post on how I think teams can be more successful at automated end to end testing.

Test Stability

I’ve worked in far too many environments and codebases where the automated tests were “flakey” or unreliable:

Teams that do all of their development against a single shared, development database such that the data setup is hard to control
Web applications with a lot of asynchronous behavior are notoriously hard to test and the tests can be flakey with timing issues — even with all the “wait for this condition on the page to be true” discipline in the world.
Distributed architectures can be difficult to test because you may need to control, coordinate, or observe multiple processes at one time.
Deployment issues or technologies that tend to hang on to file locks, tie up ports, or generally lock up resources that your automated tests need to use

To be effective, automated tests have to be reliable and repeatable. Otherwise, you’re either going to spend all your time trying to discern if a test failure is “real” or not, or you’re most likely going to completely ignore your automated tests altogether as you lose faith in them.

I think you have several strategies to try to make your automated, end to end tests more reliable:

Favor white box testing over black box testing (more on this below)
Closely related to #1, replace hard to control infrastructure dependencies with stub services, even in functional testing. I know some folks absolutely hate this idea, but my shop is having a lot of success in using an IoC tool to swap out dependencies on external databases or web services in functional testing that are completely out of our control.
Isolate infrastructure to the test harness. For example, if your system accesses a relational database, use an isolated schema for the testing that is only used by the test harness. Shared databases can be one of the worst impediments to successful test automation. It’s both important to be able to set up known state in your tests and to not get “false” failures because some other process happened to alter the state of your system while the test is running. Did I mention that I think shared databases are a bad idea yet?*
Completely control system state setup in your tests or whatever build automation you have to deploy the system in testing.
Collapse a distributed application down to a single process for automated functional testing rather than try to run the test harness in a different process than the application. In our functional tests, we will run the test harness, an embedded web server, and even an embedded database in the same process. For distributed applications, we have been using additional .Net AppDomain’s to load related services and using some infrastructure in our OSS projects to coordinate the setup, teardown, and even activity in these services during testing time.
As a last resort for a test that is vulnerable to timing issues and race conditions, allow the test runner to retry the test

Failing all of those things, I definitely think that if a test that is so unstable and unreliable that it renders your automated build useless that you just delete that test. I think a reliable test suite with less coverage is more useful to a team than a more expansive test suite that is not reliable.

You Gotta Have Continuous Integration

This section isn’t the kind of pound on the table, Uncle Bob-style of “you must do this or you’re incompetent” kind of rant that causes the Rob Conery’s of the world have conniptions. Large scale automation testing simply does not work if the automated tests are not running regularly as the system continues to evolve.

Automated tests that are never or seldom executed can even be a burden on a development team that still try to keep that test code up to date with architectural changes. Even worse, automated tests that are not constantly executed are not trustworthy because you no longer know if test failures are real or just because the application structure changed.

Assuming that your automated tests are legitimately detecting regression problems, you need to determine what recent change introduced the problem — and it’s far easier to do that if you have a smaller list of possible changes and those changes are still fresh in the developer’s mind. If you are only occasionally running those automated tests, diagnosing failing tests can be a lot like finding the proverbial needle in the haystack.

I strongly prefer to have all of the automated tests running as part of a team’s continuous integration (CI) strategy — even the heavier, slower end to end kind of tests. If the test suite gets too slow (we have a suite that’s currently taking 40+ minutes), I like the “fast tests, slow tests” strategy of keeping one main build that executes the quicker tests (usually just unit tests) to give the team reasonable confidence that things are okay. The slower tests would be executed in a cascading build triggered whenever the main build completes successfully. Ideally, you’d like to have all the automated tests running against every push to source control, but even running the slower tests suites in a nightly or weekly scheduled build is better than nothing.

Make the Tests Easy to Run Locally

I think the section title is self-explanatory, but I’ve gotten this very wrong in the past in my own work. Ideally, you would have a task in your build script (I still prefer Rake, but substitute MSBuild, Fake, Make, Gulp, NAnt, whatever you like) that completely sets up the system under test on your machine and runs whatever the test harness. In a less perfect world a developer has to jump through hoops to find hidden dependencies and take several poorly described steps in order to run the automated tests. I think this issue is much less problematic than it was earlier in my career as we’ve adopted much more project build automation and moved to technologies that are easier to automate in deployment. I haven’t gotten to use container technologies like Docker myself yet, but I sure hope that those tools will make doing the environment setup for automating tests easier in the future.

Whitebox vs. Blackbox Testing

I strongly believe that teams should generally invest much more time and effort into whitebox tests than blackbox tests. Throughout my career, I have found that whitebox tests are frequently more effective in finding problems in your system – especially for functional testing – because they tend to be much more focused in scope and are usually much faster to execute than the corresponding black box test. White box tests can also be much easier to write because there’s simply far less technical stuff (databases, external web services, service buses, you name it) to configure or set up.

I do believe that there is value in having some blackbox tests, but I think that these blackbox tests should be focused on finding problems in technical integrations and infrastructure whereas the whitebox tests should be used to verify the desired functionality.

Especially at the beginning of my career, I frequently worked with software testers and developers who just did not believe that any test was truly useful unless the testing deployment was exactly the same as production. I think that attitude is inefficient. My philosophy is that you write automated tests to find and remove problems from your system, but not to prove that the system is perfect. Adopting that philosophy, favoring white box over black box testing makes much more sense.

Choose the Quickest, Useful Feedback Mechanism

Automating tests against a user interface has to be one of the most difficult and complex undertakings in all of software development. While teams have been successful with test automation using tools like WebDriver, I very strongly recommend that you do not test business logic and rules through your UI if you don’t have to. For that matter, try hard to avoid testing business logic without using the database. What does this mean? For example:

Test complex logic by calling into a service layer instead of the UI. That’s a big issue for one of the teams I work with who really needs to replace a subsystem behind http json services without necessarily changing the user interface that consumes those services. Today the only integration testing involving that subsystem is done completely end to end against the full stack. We have plenty of unit test coverage on the internals of that subsystem, but I’m pretty certain that those unit tests are too coupled to the implementation to be useful as regression or characterization tests when that team tries to improve or replace that subsystem. I’m strongly recommending that that team write a new suite of tests against the gateway facade service to that subsystem for faster feedback than the end to end tests could ever possibly be.
Use Subcutaneous Tests even to test some UI behavior if your application architecture supports that
Make HTTP calls directly against the endpoints in a web application instead of trying to automate the browser if that can be useful to test out the backend.
Consider testing user interface behavior with tightly controlled stub services instead of the real backend

The general rule we encourage in test automation is to use the “quickest feedback cycle that tells you something useful about your code” — and user interface testing can easily be much slower and more brittle than other types of automated testing. Remember too that we’re trying to find problems in our system with our tests instead of trying to prove that the system is perfect.

Setting up State in Automated Tests

I wrote a lot about this topic a couple years ago in My Opinions on Data Setup for Functional Tests, and I don’t have anything new to say since then;) To sum it up:

Use self-contained tests that set up all the state that a test needs.
Be very cautious using shared test data
Use the application services to set up state rather than some kind of “shadow data access” layer
Don’t couple test data setup to implementation details. I.e., I’d really rather not see gobs of SQL statements in my automated test code
Try to make the test data setup declarative and as terse as possible

Test Automation has to be a factor in Architecture

I once had an interview for a company that makes development tools. I knew going in that their product had some serious deficiencies in their automated testing strategy. When I told my interviewer that I was confident that I could help that company make their automated testing support much better, I was told that testing was just a “process issue.” Last I knew, it is still weak for its support for automating tests against systems that use that tool.

Automated testing is not merely a “process issue,” but should be a first class citizen in selecting technologies and shaping your system architecture. I feel like my shop is far above average for our test automation and that is in no small part because we have purposely architected our applications in such a way to make functional, automated testing easier. The work I described in sections above to collapse a distributed system into one process for easier testing, using a compositional architecture effectively composed by an IoC tool, and isolatating business rules from the database in our systems has been vital to what success we have had with automated testing. In other places we have purposely added logging infrastructure or hooks in our application code for no other reason than to make it easier for test automation infrastructure to observe or control the application.

Other Stuff for later…

I don’t think that in 10 years of blogging I’ve ever finished a blog series, but I might get around to blogging about how we coordinate multiple services in distributed messaging architectures during automated tests or how we’re integrating much more diagnostics in our automated functional tests to spot and prevent performance problems from creeping into application.

* There are some strategies to use in testing if you absolutely have no other choice in using a shared database, but I’m not a fan. The one approach that I want to pursue in the future is utilizing multi-tenancy data access designs to create a fake tenant on each test run to keep the data isolated for the test even if the damn database is shared. I’d still rather smack the DBA types around until they get their project automation act together so we could all get isolated databases.

Integration Testing with FubuMVC and OWIN

tl;dr: Having an OWIN host to run FubuMVC applications in process made it much easier to write integration tests and I think OWIN will end up being a very positive thing for the .Net development community.

FubuMVC is still dead-ish as an active OSS project, but the things we did might still be useful to other folks so I’m continuing my series of FubuMVC lessons learned retrospectives — and besides, I’ve needed to walk a couple other developers through writing integration tests against FubuMVC applications in the past week so writing this post is clearly worth my time right now.

One of the primary design goals of FubuMVC was testability of application code, and while I think we succeeded in terms of simpler, cleaner unit tests from the very beginning (especially compared to other web frameworks in .Net), there are so many things that can only be usefully tested and verified by integration tests that execute the entire HTTP stack.

Quite admittedly, FubuMVC was initially very weak in this area — partially because the framework itself only ran on top of ASP.Net hosting. While we had good unit test coverage from day one, our integration testing story had to evolve as we went in roughly these steps:

Haphazardly build sample usages of features in an ASP.Net application that had to run hosted in either IIS or IISExpress that had to be executed manually. Obviously not a great solution for regression testing or for setting up test scenarios either.
A not completely successful attempt to run FubuMVC on the early “Delegate of Doom” version of the OWIN specification and the old Kayak HTTP server. It worked just well enough to get my hopes up but didn’t really fly (there were some scenarios where it would just hang). If you follow me on Twitter and seen me blast OWIN as a “mystery meat” API, it’s largely because of lingering negativity from our first attempt.
Running FubuMVC on top of Web API’s Self Host libraries. This was a nice win as it enabled us to embed FubuMVC applications in process for the first time, and our integration test coverage improved quickly. The Self Host option was noticeably slow and never worked on Mono* for us.
Just in time for the 1.0 release, we finally made a successful attempt at OWIN hosting with the less insane OWIN 1.0 specification, and we’ve ran most of our integration tests and acceptance tests on Katana ever since. Moving to Katana and OWIN was much faster than the Web API Self Host infrastructure and at least in early versions, worked on Mono (the Katana team has periodically broken Mono support in later versions).
For the forthcoming 2.0 release I built a new “InMemoryHost” model that can execute a single HTTP request at a time using our OWIN support without needing any kind of HTTP server.

I cannot overstate how valuable it has been to have an embedded hosting model has been for automated testing. Debugging test failures, using the application services exactly as the real application is configured, and setting up test data in databases or injecting in fake services is much easier with the embedded hosting models.

Here’s a real problem that I hit early this year. FubuMVC’s content negotiation (conneg) logic for selecting readers and writers was originally based only on the HTTP accepts and content-type headers, which is great and all except for how many ill behaved clients there are out there that don’t play nice with the HTTP specification. I finally went into our conneg support earlier this year and added support for overriding the accepts header, with an in the box default that looked for a query string on the request like “?format=json” or “format=xml.” While there are some unit tests for the internals of this feature, this is exactly the kind of feature that really has to be tested through an entire HTTP request and response to verify correctness.

If you’re having any issues with the formatting of the code samples, you can find the real code on GitHub at the bottom of this page.

I started by building a simple GET action that returned a simple payload:

    public class OverriddenConnegEndpoint
    {
        public OverriddenResponse get_conneg_override_Name(OverriddenResponse response)
        {
            return response;
        }
    }

    public class OverriddenResponse
    {
        public string Name { get; set; }
    }

Out of the box, the new “conneg/override/{Name}” route up above would respond with either a json or xml serialization of the output model based on the value of the accepts header, with “application/json” being the default representation in the case of wild cards for the accepts header. In the functionality, content negotiation needs to also look out for the new query string rules.

The next step is to define our FubuMVC application with a simple IApplicationSource class in the same assembly:

    public class SampleApplication : IApplicationSource
    {
        public FubuApplication BuildApplication()
        {
            return FubuApplication.DefaultPolicies().StructureMap();
        }
    }

The role of the IApplicationSource class in a FubuMVC application is to bootstrap the IoC container for the application and define any custom policies or configuration of a FubuMVC application. By using an IApplicationSource class, you’re establishing a reusable configuration that can be quickly applied to automated tests to make your tests as close to the production deployment of your application as possible for more realistic testing. This is crucial for FubuMVC subsystems like validation or authorization that mix in behavior to routes and endpoints off of conventions determined by type scanning or configured policies.

Using Katana and EndpointDriver with FubuMVC 1.0+

First up, let’s write a simple NUnit test for overriding the conneg behavior with the new “?format=json” trick, but with Katana and the FubuMVC 1.0 era EndpointDriver object:

        [Test]
        public void with_Katana_and_EndpointDriver()
        {
            using (var server = EmbeddedFubuMvcServer
                .For<SampleApplication>())
            {
                server.Endpoints.Get("conneg/override/Foo?format=json", acceptType: "text/html")
                    .ContentTypeShouldBe(MimeType.Json)
                    .ReadAsJson<OverriddenResponse>()
                    .Name.ShouldEqual("Foo");
            }
        }

Behind the scenes, the EmbeddedFubuMvcServer class spins up the application and a new instance of a Katana server to host it. The “Endpoints” object exposes a fluent interface to define and execute an HTTP request that will be executed with .Net’s built in WebClient object. The EndpointDriver fluent interface was originally built specifically to test the conneg support in the FubuMVC codebase itself, but is usable in a more general way for testing your own application code written on top of FubuMVC.

Using the InMemoryHost and Scenarios in FubuMVC 2.0

EndpointDriver was somewhat limited in its coverage of common HTTP usage, so I hoped to completely replace it in FubuMVC 2.0 with a new “scenario” model somewhat based on the Play Framework’s Play Specification tooling in Scala. I knew that I also wanted a purely in memory hosting model for integration tests to avoid the extra time it takes to spin up an instance of Katana and sidestep potential port contention issues.

The result is the same test as above, but written in the new “Scenario” style concept:

        [Test]
        public void with_in_memory_host()
        {
            // The 'Scenario' testing API was not completed,
            // so I never got around to creating more convenience
            // methods for common things like deserializing JSON
            // into .Net objects from the response body
            using (var host = InMemoryHost.For<SampleApplication>())
            {
                host.Scenario(_ => {
                    _.Get.Url("conneg/override/Foo?format=json");
                    _.Request.Accepts("text/html");

                    _.ContentTypeShouldBe(MimeType.Json);
                    _.Response.Body.ReadAsText()
                        .ShouldEqual("{\"Name\":\"Foo\"}");
                });
            }
        }

The newer Scenario concept was an attempt to make HTTP centric testing be more declarative. C# is not as expressive as Scala is, but I was still trying to make the test expression as clean and readable as possible and the syntax above probably would have evolved after more usage. The newer Scenario concept also has complete access to FubuMVC 2.0’s raw HTTP abstractions so that you’re not limited at all in what kinds of things you can express in the integration tests.

If you want to see more examples of both EmbeddedFubuMvcServer/EndpointServer and InMemoryHost/Scenario in action, please see the FubuMVC.IntegrationTesting project on GitHub.

Last Thoughts

If you choose to use either of these tools in your own FubuMVC application testing, I’d highly recommend doing something at the testing assembly level to cache the InMemoryHost or EmbeddedFubuMvcServer so that they can be used across test fixtures to avoid the nontrivial cost of repeatedly initializing your application.

While I’m focusing on HTTP centric testing here, using either tool above also has the advantage of building your application’s IoC container out exactly the way it should be in production for more accurate integration testing of underlying application services.

If I had to do it all over again…

We would have had an embedded hosting model from the very beginning, even if it had been on a fake, “only one HTTP request at a time” model. Moreover, if we were to ever reboot a new FubuMVC like web framework in the future KVM, I would vote for wrapping any new framework completely around the OWIN signature as our one and only model of an HTTP request. In any theoretical future effort, I’d invest time from the very beginning in something like FubuMVC 2.0’s InMemoryHost model early to make integration and acceptance testing easier and faster.

With the recent release of StructureMap 3, I’d also opt for a new child container model such that you can fake out application services on a test by test basis and rollback to the previous container state without having to re-initialize the entire application each time for faster testing.

* Mono support was a massive time sink for the FubuMVC project and never really paid off. Maybe having Xamarin involved much earlier in the new KVM .Net runtime and the emphasis on PCL supportwill make that situation much better in the future.

Adventures in Custom Testing Infrastructure

tl;dr: Sometimes the overhead of writing custom testing infrastructure can lead to easier development

Quick Feedback Cycles are Key

It’d be nice if someday I could write all my code perfectly in both structure and function the first time through, but for now I have to rely on feedback mechanisms to tell me when the code isn’t working correctly. That being said, I feel the most productive when I have the tightest feedback cycle between making a change in code and knowing how it’s actually working — and by “quick” I mean both the time it takes for me to setup the feedback cycle and how long the feedback cycle itself takes.

While I definitely like using quick twitch feedback tools like REPL’s or auto-reloading/refreshing web tools like our own fubu run or Mimosa.js’s “watch” command, my primary feedback mechanism for code centric tasks is usually automated tests. That being said, it helps when the tests are mechanically easy to write and run quickly enough that you can get into a nice “red/green/refactor” cycle. For whatever reasons, I’ve hit several problem domains in the last couple years where it was laborious in my time to set up the preconditions and testing inputs and also to measure and assert on the expected outcomes.

Maybe Invest in Some Custom Testing Infrastructure?

In some cases I knew right away that testing a feature was going to be a problem, so I started by asking myself “how do I wish I could express the test setup and assertions.” If it seems feasible, I’ll write custom ObjectMother if that’s possible or Test Data Builder‘s for the data setup in more complex cases. I’ve occasionally resorted to building little interpreters that read text and create data structures or files (I do this more often for hierarchical data than anything else I think) or perform assertions on the final state.

You can see an example of this in my old Storyteller2 codebase. Storyteller is a tool for automated acceptance tests and includes a tree view pane in the UI with the inevitable hierarchy of tests organized by suites in an n-deep hierarchy like:

Top Level Suite
  - Suite 1
    -Suite 2
    -Suite 3
      - Test 1
      - Test 2

In the course of building the Storyteller client, I needed to write a series of tests on the tree view state that had to start with a known hierarchy of suites and test files as inputs. After performing actions like filtering or receiving state updates within the UI, I needed to assert on the expected display in this test explorer pane (which tests and suites were visible and were they marked as running, failed, successful, or unknown).

First, to deal with the setup of the hierarchical data I created a little custom class that read flat text data and turned that into the desired hierarchy:

            hierarchy =
                StoryTeller.Testing.DataMother.BuildHierarchy(
                    @"
t1,Success
t2,Failure
t3,Success
s1/t4,Success
s1/t5,Success
s1/t6,Failure
s1/s2/t7,Success
s1/s2/t7,Success
");

Then in the “assertion” part of the test I created a custom specification class that could again read its expectations expressed as flat text and assert that the resulting tree view exactly matched the specified state:

        [Test]
        public void the_child_nodes_are_constructed_with_the_empty_suite()
        {
            var spec =
                new TreeNodeSpecification(
                    @"
suite:Empty
suite:s1
test:s1/t4
test:s1/t5
test:s1/t6
test:t1
test:t2
test:t3
");

            spec.AssertMatch(view.TestNode);
        }

As I recall, writing the simple text parsing classes just to make the expression of the automated tests made it pretty easy to add new behavior quickly. In this case, the time investment upfront for the custom testing infrastructure paid off.

FubuMVC’s View Engine Support

A couple months ago I finally got to carve off some time to finally go overhaul the view engine support code in FubuMVC. My main goals were to cut the unnecessarily complex internal code down to something more manageable as a precursor to optimizing both runtime performance and FubuMVC’s time to initialize an application. Since I was about to start monkeying around quite a bit with the internals of code that many of our users depend on, it’s a good thing that we had an existing suite of integration tests that acted as acceptance tests (think layouts, partials, HTML helpers, and our conventional attachment of views to routes) so that in theory I could safely make the restructuring changes without breaking existing behavior.

Going in though, I knew that there was some significant drawbacks to using our existing mechanism for testing the view engine support and I wasn’t looking forward to the inevitable test failures or formulating new integration tests.

Problems with the Existing Test Suite

In order to write end to end tests against the view engine support we had been effectively writing little mini FubuMVC applications inside our integration test libraries. Quite naturally, that often meant adding several view files and folders to simulate all the different permutations for layout rendering, using partials, sharing views from external Bottles (a superset of Area’s for you ASP.Net MVC folks), and view profiles (mobile vs. desktop for example). In the test fixtures we would spin up a FubuMVC application with Katana, run HTTP requests, and make assertions against the content that should or should not be present in the HTTP response body.

It wasn’t terrible, but it came with a serious drawbacks:

It wasn’t complete and I’d need to add additional tests
It was expensive in mechanical effort to create those little mini FubuMVC applications that had to be spread over so many different files and even folders
Understanding the tests when something went wrong could be difficult because the expression of the test was effectively split over so many files

The New Approach

Before going too far into the code changes against the view engine support, I built a new test harness that would allow me to express in one testing class file:

What all the views and layouts were in the entire system including the content of the views
What the views were in external Bottles loaded into the application
If necessary, configure a complete FubuMVC application if the defaults weren’t sufficient for the test
Declare what content should and should not be rendered when certain routes were executed

The end result was a base class I called ViewIntegrationContext. Mechanically, I made TestFixture classes deriving from this abstract class. In the constructor function of the test fixture classes I would specify the location, content, and view model of any number of Spark or Razor views. When the test fixture class was first executed, it would:

Create a brand new folder using a guid as the name to host the new “application” to avoid collisions with existing test runs (while the new test harness does try to clean up after itself, I’ve learned not to be very trusting of the file system during automated tests)
Write out the Spark and Razor files based on the data specified in the constructor function to the new application folder
Optionally load content Bottles and FubuMVC configurations inside the test harness (ignore that for now if you would, but it was a huge win for me)
Load a new FubuMVC application in memory with the root directory pointing to our new folder for just this test

For each test, the ViewIntegrationContext object uses FubuMVC 2.0’s brand new in memory test harness (somewhat inspired by PlaySpecification from Scala) to execute a “Scenario” where I could declaratively specify what url to render and assert what content should or should not be present in the HTML output.

To make this concrete, the very simplest test to check that FubuMVC really can render a Spark view looks like this:

    [TestFixture]
    public class Simple_rendering : ViewIntegrationContext
    {
        public Simple_rendering()
        {
            SparkView<BreatheViewModel>("Breathe")
                .Write(@"
<p>This is real output</p>
<h2>${Model.Text}</h2>");
        }

        [Test]
        public void can_render()
        {
            Scenario.Get.Input(new AirInputModel{TakeABreath = true});
            Scenario.ContentShouldContain("<h2>Breathe in!</h2>");
        }
    }

    public class AirEndpoint
    {
        public AirViewModel TakeABreath(AirRequest request)
        {
            return new AirViewModel { Text = "Take a {0} breath?".ToFormat(request.Type) };
        }

        public BreatheViewModel get_breathe_TakeABreath(AirInputModel model)
        {
            var result = model.TakeABreath
                ? new BreatheViewModel { Text = "Breathe in!" }
                : new BreatheViewModel { Text = "Exhale!" };

            return result;
        }
    }

    public class AirRequest
    {
        public AirRequest()
        {
            Type = "deep";
        }

        public string Type { get; set; }
    }

    public class AirInputModel
    {
        public bool TakeABreath { get; set; }
    }

    public class AirViewModel
    {
        public string Text { get; set; }
    }

    public class BreatheViewModel : AirViewModel
    {

    }

So did this payoff? Heck yeah it did, especially for scenarios where I needed to build out multiple views and layouts. The biggest win for me was that the tests were completely self-contained instead of spread out over so many files and folders. Even better yet, the new in memory Scenario support in FubuMVC made the actual tests very declarative with decently descriptive failure messages.

It’s Not All Rainbows and Unicorns

I cherry picked some examples that I felt went well, but there have been some other times when I’ve gone down a rabbit hole of building custom testing infrastructure only to see it be a giant boondoggle. There’s a definite bit of overhead to writing this kind of tooling and you always have to consider whether you’ll save time in the whole compared to writing more crude or repetitive testing code. While I tend to be aggressive about building custom test harnesses, you might accurately call it a speculative exercise and hold off until you feel some pain in your testing.

Moreover, any kind of custom test harness where you decouple the expression of the test (inputs, actions, and assertions) from the actual code that’s being exercised obfuscates your traceability back to the actual code. I’ve seen plenty of cases where the “goodness” of making the expression of the test prettier and more declarative was more than offset by how hard it was to debug test failures because of the extra mental overhead of connecting the meaning of the test to the code that should be implementing it. It’s for that reason that I’ve never been a big fan of most Behavior Driven Development tools for testing that isn’t customer facing.

A Simple Example of a Table Driven Executable Specification

My shop is starting to go down the path of executable specifications (using Storyteller2 as the tooling, but that’s not what this post is about). As an engineering practice, executable specifications* involves specifying the expected behavior of a user story with concrete examples of exactly how the system should behave before coding. Those examples will hopefully become automated tests that live on as regression tests.

What are we hoping to achieve?

Remove ambiguity from the requirements with concrete examples. Ambiguity and misunderstandings from prose based requirements and analysis has consistently been a huge time waste and source of errors throughout my career.
Faster feedback in development. It’s awfully nice to just run the executable specs in a local branch before pushing anything to the testers
Find flaws in domain logic or screen behavior faster, and this has been the biggest gain for us so far
Creating living documentation about the expected behavior of the system by making the specifications human readable
Building up a suite of regression tests to make later development in the system more efficient and safer

Quick Example

While executable specifications are certainly a very challenging practice from the technical side of things, in the past week or so I’m aware of 3-4 scenarios where the act of writing the specification tests has flushed out problems with our domain logic or screen behavior a lot faster than we could have done otherwise.

Part of our application logic involves fuzzy matching against people in our system against some, ahem, not quite trustworthy data from external partners. Our domain expert explained the matching logic that he wanted was to match a person’s social security number, birth date, first name, and last name — but the name matching should be case insensitive and it’s valid to match on the initial of the first name. Since this logic can be expressed as a set number of inputs and the one output with a great number of permutations, I chose to express this specification as a table with Storyteller (conceptually identical to the old ColumnFixture in FitNesse). The final version of the spec is shown below (click the image to get a more readable version):

The image above is our final, approved version of this functionality that now lives as both documentation and a regression test. Before that though, I wrote the spec and got our domain expert to look at it, and wouldn’t you know it, I had misunderstood a couple assumptions and he gave me very concrete feedback about exactly what the spec should have been.

To make this just a little bit more concrete, our Storyteller test harness connects the table inputs to the system under test with this little bit of adapter code:

The code behind the executable spec

    public class PersonFixture : Fixture
    {
        public PersonFixture()
        {
            Title = “Person Matching Logic”;
        }
        [ExposeAsTable(“Person Matching Examples”)]
        [return:AliasAs(“Matches”)]
        public bool PersonMatches(
            string Description,
            [Default(“555-55-5555”)]SocialSecurityNumber SSN1,
            [Default(“Hank”)]string FirstName1,
            [Default(“Aaron”)]string LastName1,
            [Default(“01/01/1974”)]DateCandidate BirthDate1,
                                  [Default(“555-55-5555”)]SocialSecurityNumber SSN2,
            [Default(“Hank”)]string FirstName2,
            [Default(“Aaron”)]string LastName2,
            [Default(“01/01/1974”)]DateCandidate BirthDate2)
        {
            var person1 = new Person
            {
                SSN = SSN1,
                FirstName = FirstName1,
                LastName = LastName1,
                BirthDate = BirthDate1
            };
            var person2 = new Person
            {
                SSN = SSN2,
                FirstName = FirstName2,
                LastName = LastName2,
                BirthDate = BirthDate2
            };
            return person1.Equals(person2);
        }
    }

* Jeremy, is this really just Behavior Driven Development (BDD)? Or the older idea of Acceptance Test Driven Development (ATDD)? This is some folks’ definition of BDD, but BDD is so overloaded and means so many different things to different people that I hate using the term. ATDD never took off, and “executable specifications” just sounds cooler to me, so that’s what I’m going to call it.

My Opinions on Data Setup for Functional Tests

I read Jim Holmes’s post Data Driven Testing: What’s a Good Dataset Size? with some interest because it’s very relevant to my work. I’ve been heavily involved in test automation efforts over the past decade, and I’ve developed a few opinions about how best to handle test data input for functional tests (as opposed to load/scalability/performance tests). First though, here’s a couple concerns I have:

Automated tests need to be reliable. Tests that require external, environmental tests can be brittle in unexpected ways. I hate that.
Your tests will fail from time to time with regression bugs. It’s important that your tests are expressed in a way that makes it easy to understand the cause and effect relationship between the “known inputs” and the “expected outcomes.” I can’t tell you how many times I’ve struggled to fix a failing test because I couldn’t even understand what exactly it was supposed to be testing.
My experience says loudly that smaller, more focused automated tests are far easier to diagnose and fix when they fail than very large, multi-step automated tests. Moreover, large tests that drive user interfaces are much more likely to be unstable and unreliable. Regardless of platform, problem domain, and team, I know that I’m far more productive when I’m working with quicker feedback cycles. If any of my colleagues are reading this, now you know why I’m so adamant about having smaller, focused tests rather than large scripted scenarios.
Automated tests should enable your team to change or evolve the internals of your system with more confidence.

Be Very Cautious with Shared Test Data

If I have my druthers, I would not share any test setup data between automated tests except for very fundamental things like the inevitable lookup data and maybe some default user credentials or client information that can safely be considered to be static. Unlike “real” production coding where “Don’t Repeat Yourself” is crucial for maintainability, in testing code I’m much more concerned with making the test as self-explanatory as possible and completely isolated from one another. If I share test setup data between tests, there’s frequently going to be a reason why you’ll want to add a little bit more data for a new test which ends up breaking the assertions in the existing tests. Besides that, using a shared test data set means that you probably have more data than any single test really needs — making the diagnosis of test failures harder. For all of those reasons and more, I strongly prefer that my teams copy and paste bits of test data sets to keep them isolated by test rather than shared.

Self-Contained Tests are Best

I’ve been interested in the idea of executable specifications for a long time. In order to make the tests have some value as living documentation about the desired behavior of the system, I think it needs to be as clear as possible what the relationship is between the germane data inputs and the observed behavior of the system. Plus, automated tests are completely useless if you cannot reliably run them on demand or inside a Continuous Integration build. In the past I’ve also found that understanding or fixing a test is much harder if I have to constantly ALT-TAB between windows or even just swivel my head between a SQL script or some other external file and the body of the rest of a script.

I’ve found that both the comprehensibility and reliability of an automated test are improved by making each automated test self-contained. What I mean by that is that every part of the test is expressed in one readable document including the data setup, exercising the system, and verifying the expected outcome. That way the test can be executed at almost any time because it takes care of its own test data setup rather than being dependent on some sort of external action. To pull that off you need to be able to very concisely describe the initial state of the system for the test, and shared data sets and/or raw SQL scripts, Xml, Json, or raw calls to your system’s API can easily be noisy. Which leads me to say that I think you should…

Decouple Test Input from the Implementation Details

I’m a very large believer in the importance of reversibility to the long term success of a system. With that in mind, we write automated tests to pin down the desired behavior of the system and spend a lot of energy towards designing the structure of our code to more readily accept changes later. All too frequently, I’ve seen systems become harder to change over time specifically because of tight coupling between the automated tests and the implementation details of a system. In this case, the automated test suite will actually retard or flat out prevent changes to the system instead of enabling you to more confidently change the system. Maybe even worse, that tight coupling means that the team will have to eliminate or rewrite the automated tests in order to make a desired change to the system.

With that in mind, I somewhat strongly recommend against expressing your test data input in some form of interpreted format rather than as SQL statements or direct API calls. My team uses Storyteller2 where all test input is expressed in logical tables or “sentences” that are not tightly coupled to the structure of our persisted documents. I think that simple textual formats or interpreted Domain Specific Language’s are also viable alternatives. Despite the extra work to write and maintain a testing DSL, I think there are some big advantages to doing it this way:

You’re much more able to make additions to the underlying data storage without having to change the tests. With an interpreted data approach, you can simply add fake data defaults for new columns or fields
You can express only the data that is germane to the functionality that your test is targeting. More on this in below when I talk about my current project.
You can frequently make the test data setup be much more mechanically cheaper per test by simply reducing the amount of data the test author will have to write per test with sensible default values behind the scenes. I think this topic is probably worth a blog post on its own someday.

This goes far beyond just the test data setup. I think it’s very advantageous in general to express your functional tests in a way that is independent of implementation details of your application — especially if you’re going to drive a user interface in your testing.

Go in through the Front Door

Very closely related to my concerns about decoupling tests from the implementation details is to avoid using “backdoor” ways to set up test scenarios. My opinion is that you should set up test scenarios by using the real services your application uses itself to persist data. While this does risk making the tests run slower by going through extra runtime hoops, I think it has a couple advantages:

It’s easier to keep your test automation code synchronized with your production code as you refactor or evolve the production code and data storage
It should result in writing less code period
It reduces logical duplication between the testing code and the production code — think database schema changes
When you write raw data to the underlying storage mechanisms you can very easily get the application into an invalid state that doesn’t happen in production

Case in point, I met with another shop a couple years ago that was struggling with their test automation efforts. They were writing a Silverlight client with a .Net backend, but using Ruby scripts with ActiveRecord to setup the initial data sets for automated tests. I know from all of my ex-.Net/now Ruby friends that everything in Ruby is perfect, but in this case, it caused the team a lot of headaches because the tests were very brittle anytime the database schema changed with all the duplication between their C# production code and the Ruby test automation code.

Topics for later…

It’s Saturday and my son and I need to go to the park while the weather’s nice, so I’m cutting this short. In a later post I’ll try to get more concrete with examples and maybe an additional post that applies all this theory to the project I’m doing at work.

Clean Database per Automated Test Run? Yes, please.

TL;DR We’re able to utilize RavenDb‘s support for embedded databases, some IoC trickery, and our FubuMVC.RavenDb library to make automated testing far simpler by quickly spinning up a brand new database for each individual automated test to have complete control over the state of our system. Oh, and removing ASP.Net and relational databases out of the equation makes automated functional testing far easier too.

Known inputs and expected outcomes is the mantra of successful automated testing. This is generally pretty simple with unit tests and more granular integration tests, but sooner or later you’re going to want to exercise your application stack with a persistent database. You cannot sustain your sanity, much less be successful, while doing automated testing if you cannot easily put your system in a known state before you try to exercise the system. Stateful elements of your application architecture includes things like queues, the file system, and in memory caches, but for this post I’m only concerned with controlling the state of the application database.

On my last several projects we’ve used some sort of common test setup action to roll back our database to a near empty state before a test adds the exact data to the database that it needs as part of the test execution (the “arrange” part of arrange, act, and assert). You can read more about the ugly stuff I’ve tried in the past at the bottom of this post, but I think we’ve finally arrived at a solution for this problem that I think is succeeding.

Our Solution

First, we’re using RavenDb as a schema-less document database. We also use StructureMap to compose the services in our system, and RavenDb’s IDocumentStore is built and scoped as a singleton. In functional testing scenarios, we run our entire application (FubuMVC website hosted with an embedded web server, RavenDb, our backend service) in the same AppDomain as our testing harness, so it’s very simple for us to directly alter the state of the application. Before each test, we:

Eject and dispose any preexisting instance of IDocumentStore from our main StructureMap container
Replace the default registration of IDocumentStore with a new, completely empty instance of RavenDb’s EmbeddedDocumentStore
Write a little bit of initial state into the new database (a couple pre-canned logins and tenants).
Continue to the rest of the test that will generally start by adding test specific data using our normal repository classes helpfully composed by StructureMap to use the new embedded database

I’m very happy with this solution for a couple different reasons. First, it’s lightning fast compared with other mechanics I’ve used and describe at the bottom of this post. Secondly, using a schema-less database means that we don’t have much maintenance work to do to keep this database cleansing mechanism up to date with new additions to our persistent domain model and event store — and I think this is a significant source of friction when testing against relational databases.

Show me some code!

I won’t get into too much detail, but we use StoryTeller2 as our test harness for functional testing. The “arrange” part of any of our functional tests gets expressed like this taken from one of our tests for our multi-tenancy support:

----------------------------------------------
|If the system state is |

|The users are                                |
|Username |Password |Clients                  |
|User1    |Password1|ClientA, ClientB, ClientC|
|User2    |Password2|ClientA, ClientB         |
----------------------------------------------

In the test expressed above, the only state in the system is exactly what I put into the “arrange” section of the test itself. The “If the system state is” DSL is implemented by a Fixture class that runs this little bit of code in its setup:

Code Snippet

        public override void SetUp(ITestContext context)
        {
            // There’s a bit more than this going on here, but the service below
            // is part of our FubuPersistence library as a testing hook to
            // wipe the slate clean in a running application
            _reset = Retrieve<ICompleteReset>();
            _reset.ResetState();
        }

As long as my team is using our “If the system state is” fixture to setup the testing state, the application database will be set back to a known state before every single test run — making the automated tests far more reliable than other mechanisms I’ve used in the past.

The ICompleteReset interface originates from the FubuPersistence project that was designed in no small part to make it simpler to completely wipe out the state of your running application. The ResetState() method looks like this:

Code Snippet

        public void ResetState()
        {
            // Shutdown any type of background process in the application 
            // that is stateful or polling before resetting the database
            _serviceResets.Each(x => {
                trace(“Stopping services with {0}”, x.GetType().Name);
                x.Stop();
            });
            // The call to replace the database
            trace(“Clearing persisted state”);
            _persistence.ClearPersistedState();
            // Load any basic state that has to exist for all tests.  
            // I’m thinking that this is nothing but a couple default 
            // login credentials and maybe some static lookup list
            // data
            trace(“Loading initial data”);
            _initialState.Load();
            // Restart any and all background processes to run against the newly
            // created database
            _serviceResets.Each(x => {
                trace(“Starting services with {0}”, x.GetType().Name);
                x.Start();
            });
        }

The method _persistence.ClearPersistedState() called above to rollback all persistence is implemented by our RavenDbPersistedState class. That method does this:

Code Snippet

        public void ClearPersistedState()
        {
            // _container is the main StructureMap IoC container for the
            // running application.  The line below will
            // eject any existing IDocumentStore from the container
            // and dispose it
            _container.Model.For<IDocumentStore>().Default.EjectObject();
            // RavenDbSettings is another class from FubuPersistence
            // that just controls the very intial creation of a
            // RavenDb IDocumentStore object.  In this case, we’re
            // overriding the normal project configuration from
            // the App.config with instructions to use an
            // EmbeddedDocumentStore running completely
            // in memory. 
            _container.Inject(new RavenDbSettings
            {
                RunInMemory = true
            });
        }

The code above doesn’t necessarily create a new database, but we’ve set ourselves up to use a brand new embedded, in memory database whenever something does request a running database from the StructureMap container. I’m not going to show this code for the sake of brevity, but I think it’s important to note that the RavenDb database construction will use your normal mechanisms for bootstrapping and configuring an IDocumentStore including all the hundred RavenDb switches and pre-canned indices.

All the code shown here is from the FubuPersistence repository on GitHub.

Conclusion

I’m generally happy with this solution. So far, it’s quick in execution and we haven’t required much maintenance as we’ve progressed other than more default data. Hopefully, this solution will be applicable and reusable in future projects out of the box. I would happily recommend a similar approach to other teams.

But, but, but…

If you did read this carefully, I think you’ll find some things to take exception with:

I’m assuming that you really are able to test functionality with bare minimum data sets to keep the setup work to a minimum and the performance at an acceptable level. This technique isn’t going to be useful for anything involving performance or load testing — but are you really all that concerned about functionality testing when you do that type of testing?
We’re not running our application in its deployed configuration when we collapse everything down to the same AppDomain. Why I think this is a good idea, the benefits, and how we do it are a topic for another blog post. Promise.
RavenDb is schema-less and that turns out to make a huge difference in how long it takes to spin up a new database from scratch compared to relational databases. Yes, there may be some pre-canned indices that need to get built up when you spin up the new embedded database, but with an empty database I don’t see that as a show stopper.

Other, less successful ways of controlling state I’ve used in the past

Over the years I’ve done automated testing against persisted databases with varying degrees of frustration. The worst possible thing you can do is to have everybody testing against a shared relational database in the development and testing environments. You either expect the database to be in a certain state at the start of the test, or you ran a stored procedure to set up the tables you wanted to test against. I can’t even begin to tell you how unreliable this turns out to be when more than one person is running tests at the same time and fouling up the test runs. Unfortunately, many shops still try to do this and it’s a significant hurdle to clear when doing automated testing. Yes, you can try to play tricks with transactions to isolate the test data or try to use randomized data, but I’m not a believer in either approach.

Having an isolated relational database per developer, preferably on their own development box, was a marked improvement, but it adds a great deal of overhead to your project automation. Realistically, you need a developer to be able to build out the latest database on the fly from the latest source on their own box. That’s not a deal breaker with modern database migration tools, but it’s still a significant about of work for your team. The bigger problem to me is how you tear down the existing state in a relational database to put it into a known state before running an automated test. You’ve got a couple choices:

Destroy the schema completely and rebuild it from scratch. Don’t laugh, I’ve seen people do this and the tests were as painfully slow as you can probably imagine. I suppose you could also script the database to rollback to a checkpoint or reattach a backed up copy of the database, but again, I’m never going to recommend that if you have other options.
Execute a set of commands that wipes most if not all of the data in a database before each test. I’ve done this before, and while it definitely helped create a known state in the system, this strategy performed very poorly and it took quite a bit of work to maintain the “clean database” script as the project progressed. As a project grows, the runtime of your automated test runs becomes very important to keep the feedback loop useful. Slow tests hamper the usefulness of automated testing.
Selectively clean out and write data to only the tables affected by a test. This is probably much faster performance wise, but I think it will require more coding inside of the testing code to do the one off, set up the state code.

* As an aside, I really suggest keeping the project database data definition language scripts and/or migrations in the same source control system as the code so that it’s very easy to trace the version of the code running against which version of the database schema. The harsh reality in my experience is that the same software engineering rigor we generally use for our application code (source control, unit testing, TDD, continuous integration) is very often missing in regards to the relational database DDL and environment. If you’re a database guy talking to me at a conference, you better have your stuff together on this front before you dare tell me that “my developers can’t be trusted to access my database.”