Imagining a Better Integration Testing Tool

I might be building out a new integration testing library named “Bobcat”, but mostly I just think that bobcats are cool. Scary looking though when you see them in the wild. They came back to the area where I grew up when the wild turkey population recovered in the 80’s/90’s.

Let’s say that you’re good at writing testable code, and you’re able to write isolated unit tests for the vast majority of your domain logic and much of your coordination workflow type of logic as well. Great, but that still leaves you with the need to write some type of integration test suite and maybe even some modicum of end to end tests through *shudder* Playwright, Selenium, or Cypress.io — not that I’m saying that those tools are bad per se, but the technical challenge and overhead for succeeding with those tools can be through the roof.

Right now I’m in a spot where the vast majority of the testing for Marten and Wolverine is integration tests of one sort or another that involve databases and message brokers. Most of these tests behave just fine, as Marten especially is well suited for integration testing and I’ve invested in some helpful integration test support helpers built directly into Wolverine itself. Cool, but, um, there’s embarrassingly enough a significant number of “blinking” tests (unreliable automated tests) in Marten and far more in Wolverine. Moreover, there are a number of other tests that behave perfectly on my high powered development box, but blink in continuous integration test runs.

And I know what you’re thinking: in this case it’s not at all because of shared, static state, which is the typical root cause of many blinking test issues. There might very well be some race conditions we haven’t perfectly ironed out yet, but basically all of these ill behaved tests are testing asynchronous processing and will run into timeout failures inside a larger test suite run, but generally work reliably when running one at a time.

At this point, I don’t feel like the tooling (mostly xUnit.Net) we’re currently using for automating integration testing is perfectly appropriate for what we’re trying to do, and I’m ready to consider doing something a little bit custom — especially because much of the development coming my way soon is going to involve the exact kind of asynchronous behavior that’s already giving us trouble.

I am also currently helping a JasperFx Software client formulate an integration testing approach across multiple, collaborating micro services, so integration testing is top of mind for me right now.

Integration Testing Challenges

  • Understanding what’s happening inside your system in the case of failures
  • Testing timeouts, especially if you’re needing to test asynchronous processing
  • “Knowing” when asynchronous work is complete and delaying the *assertions* until those asynchronous actions are really complete (see the polling sketch just after this list)
  • Having a sense for whether a long running integration test is proceeding, or hung
  • Data setup, especially in problem domains that require quite a bit of data in tests
  • Making the expression of the test as declarative as possible to make the test clear in its intentions
  • Preventing the test from being too tightly coupled to the internals of the system so the test isn’t too brittle when the system internals change
  • Being able to make the test suite *fail fast* when the system is detected to be in an invalid state — don’t blow this off, this can be a huge problem if you’re not careful
  • Selectively and intelligently retrying “blinking” tests — and yeah, you should try really hard to not need this capability, but you might no matter how hard you try
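
As a concrete sketch of that “delaying the assertions” bullet, here’s the kind of polling helper I tend to reach for. The names here are hypothetical and not from any particular library:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Eventually
{
    // Repeatedly run the assertion until it stops throwing or the timeout expires.
    // On timeout, the last assertion failure propagates so the test output stays useful.
    public static async Task ShouldSucceed(Action assertion, TimeSpan? timeout = null)
    {
        var stopwatch = Stopwatch.StartNew();
        var max = timeout ?? TimeSpan.FromSeconds(30);

        while (true)
        {
            try
            {
                assertion();
                return; // the asynchronous work finished and the assertion passes
            }
            catch (Exception) when (stopwatch.Elapsed < max)
            {
                // not there yet, wait a beat and try again
                await Task.Delay(250);
            }
        }
    }
}

// Usage in a test body, with a hypothetical repository and order:
// await Eventually.ShouldSucceed(() => theOrders.Load(orderId).Status.ShouldBe("Shipped"));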

Scattered Thoughts about Possible Approaches

In the case of Wolverine and Marten’s “async daemon” testing, I think a large part of our problems is due to thread pool exhaustion as we rapidly spin up and down different IHost applications. My thought is not just to do test retries when we detect timeout failures, but also to do some level of exponential backoff to pause before starting the next test to let things simmer down in the runtime and let the thread pool snap back. In the worst case, I’d also like to consider a test runner implementation where a separate test manager process could trash and restart the actual test running process in certain cases (Storyteller could actually do this somewhat).
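
Purely as a sketch of that idea (none of this exists yet, and every name below is made up), a retry-with-backoff wrapper around a flaky asynchronous test might look something like this:

using System;
using System.Threading.Tasks;

public static class TestRetry
{
    // Run a test body up to maxAttempts times, pausing with exponential backoff
    // between attempts to give the thread pool a chance to recover
    public static async Task RunWithBackoff(Func<Task> testBody, int maxAttempts = 3)
    {
        var delay = TimeSpan.FromSeconds(2);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await testBody();
                return;
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Only retry on timeout-style failures, and let things simmer
                // down a little longer after each failed attempt
                await Task.Delay(delay);
                delay = TimeSpan.FromSeconds(delay.TotalSeconds * 2);
            }
        }
    }
}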

As far as “Knowing” when asynchronous work is complete across multiple running processes, I want to kick the tires on a distributed version of Wolverine’s in-process message tracking integration testing support that can tell you when all outstanding work is complete across threads. I definitely want this built into Wolverine itself some day, but I need to help a client do this in an architecture that doesn’t include Wolverine. With a tip from Martin Thwaites, maybe some kind of tracking using ActivitySource?
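
I haven’t built any of this yet, but as a rough sketch of the ActivitySource idea, the ActivityListener that’s already built into System.Diagnostics can at least count work starting and stopping within one process. A distributed version would have to aggregate these counts across processes:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public class ActivityTracker : IDisposable
{
    private readonly ActivityListener _listener;
    private int _outstanding;

    public ActivityTracker(string sourceName)
    {
        _listener = new ActivityListener
        {
            // Only track activities from the source we care about
            ShouldListenTo = source => source.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
            ActivityStarted = _ => Interlocked.Increment(ref _outstanding),
            ActivityStopped = _ => Interlocked.Decrement(ref _outstanding)
        };

        ActivitySource.AddActivityListener(_listener);
    }

    // Poll until every started activity has stopped, or give up after the timeout
    public async Task WaitForCompletion(TimeSpan timeout)
    {
        var watch = Stopwatch.StartNew();
        while (Volatile.Read(ref _outstanding) > 0)
        {
            if (watch.Elapsed > timeout)
                throw new TimeoutException("Outstanding activities never completed");

            await Task.Delay(100);
        }
    }

    public void Dispose() => _listener.Dispose();
}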

As far as tooling is concerned, I’ve contemplated forking xUnit.Net for a hot second, or writing a more “robust” test runner that can work with xUnit.Net types, resurrecting Storyteller (very unlikely), or building something new (“bobcat”?) and dogfooding it the whole way on mostly Wolverine tests. I’m not ready to talk about that much yet — and I’m well aware of the challenges in trying to tackle something like that after the time I invested in Storyteller in the mid to late 2010’s.

For distributed testing, I am intrigued right now by the new Project Aspire from Microsoft as a way to bootstrap and monitor a local or testing environment for integration testing.

I am certainly considering using SpecFlow for a completely different client where they could really use business person readable BDD specifications, but I don’t see SpecFlow by itself doing much to help us out with Wolverine development.

So, stay tuned if you’re interested in the journey here, or have some solid suggestions for us!

Using Context/Specification to better express complicated tests

I’m trying to help one of our teams at work that constantly modifies a very large, very complex, 12-15 year old managed workflow system. Like many shops, we’re working to improve our testing practices, and our developers are pretty diligent about adding tests for new code.

Great, but the next step in my opinion is to adopt some different approaches for structuring the code to make unit testing easier and lead toward smaller, more focused unit tests when possible (see my post on a Real Life TDD Example for some of my thinking on that).

All that being said, it’s a very complicated system with data elements out the wazoo that coordinates work across a bevy of internal and external services. Sometimes there’s a single operation that necessarily does a lot of things in one unit of work (almost inevitably an NServiceBus message handler in this case) like:

  • Changing state in business entities based on incoming commands — and the system will frequently change more than one entity at a time
  • Sending out additional command or event messages based on the inputs and existing state of the system

To deal with the complexity of testing these kinds of message handlers, I’m suggesting that we dust off the old BDD-ish “Context/Specification” style of tests. If you think of automated tests generally following some sort of arrange/act/assertion structure, the Context/Specification style in an OO language is going to follow this structure:

  1. A class with a name that describes the scenario being tested
  2. A single scenario set up that performs both the “arrange” and “act” parts of the logical test group
  3. Multiple, granular tests with descriptive names that make a single, logical assertion against the expectations of the desired behavior

Jumping into a simple example, here’s a test class from the built-in OpenTelemetry instrumentation in Wolverine:

public class when_creating_an_execution_activity
{
    private readonly Activity theActivity;
    private readonly Envelope theEnvelope;

    public when_creating_an_execution_activity()
    {
        // In BDD terms....
        // Given a message envelope
        // When creating a new Otel activity for processing a message
        // Then the activity uses the envelope conversation id as the otel messaging conversation id
        // And [a bunch of other things]
        theEnvelope = ObjectMother.Envelope();
        theEnvelope.ConversationId = Guid.NewGuid();

        theEnvelope.MessageType = "FooMessage";
        theEnvelope.CorrelationId = Guid.NewGuid().ToString();
        theEnvelope.Destination = new Uri("tcp://localhost:6666");

        theActivity = new Activity("process");
        theEnvelope.WriteTags(theActivity);
    }

    [Fact]
    public void should_set_the_otel_conversation_id_to_correlation_id()
    {
        theActivity.GetTagItem(WolverineTracing.MessagingConversationId)
            .ShouldBe(theEnvelope.ConversationId);
    }

    [Fact]
    public void tags_the_message_id()
    {
        theActivity.GetTagItem(WolverineTracing.MessagingMessageId)
            .ShouldBe(theEnvelope.Id);
    }

    [Fact]
    public void sets_the_message_system_to_destination_uri_scheme()
    {
        theActivity.GetTagItem(WolverineTracing.MessagingSystem)
            .ShouldBe("tcp");
    }

    [Fact]
    public void sets_the_message_type_name()
    {
        theActivity.GetTagItem(WolverineTracing.MessageType)
            .ShouldBe(theEnvelope.MessageType);
    }

    [Fact]
    public void the_destination_should_be_the_envelope_destination()
    {
        theActivity.GetTagItem(WolverineTracing.MessagingDestination)
            .ShouldBe(theEnvelope.Destination);
    }

    [Fact]
    public void should_set_the_payload_size_bytes_when_it_exists()
    {
        theActivity.GetTagItem(WolverineTracing.PayloadSizeBytes)
            .ShouldBe(theEnvelope.Data!.Length);
    }

    [Fact]
    public void trace_the_conversation_id()
    {
        theActivity.GetTagItem(WolverineTracing.MessagingConversationId)
            .ShouldBe(theEnvelope.ConversationId);
    }
}

In the case above, the constructor is doing the “arrange” and “act” part of the group of tests, but each individual [Fact] is a logical assertion on the expected outcomes.

Here’s some takeaways from this style and when and where it might be useful:

  • It’s long been a truism that unit tests should have a single logical assertion. That’s just a rule of thumb, but I still find it to be useful in making tests readable and “digestible”
  • With that testing style, I find it easier to work on one assertion at a time in a red/green/refactor cycle than it can be to specify all the related assertions in one bigger test
  • Arguably, that style can at least sometimes do a much better job of making the tests act as useful documentation about how the system should behave than more monolithic tests
  • This style doesn’t require the usage of specialized Gherkin style tools, but at some point when you’re dealing with data intensive tests a Gherkin-based tool becomes much more attractive
  • This style is verbose, and it’s not my default test structure for everything by any means

For grouping, you might structure these tests like this:

// Some people like to use the outer class to group the tests
// in IDE test runners. It's not necessary, but it might be
// advantageous
public class SomeHandlerSpecs
{
    // A single scenario
    public class when_some_description_of_the_specific_scenario1
    {
        public when_some_description_of_the_specific_scenario1()
        {
            // shared context setup
            // the logical "arrange" and "act"
        }

        [Fact]
        public void then_some_kind_of_descriptive_name_for_a_single_logical_assertion()
        {
            // do an assertion
        }
        
        [Fact]
        public void then_some_kind_of_descriptive_name_for_a_single_logical_assertion_2()
        {
            // do an assertion
        }
    }
    
    // A second scenario
    public class when_some_description_of_the_second_scenario1
    {
        public when_some_description_of_the_second_scenario1()
        {
            // shared context setup
            // the logical "arrange" and "act"
        }

        [Fact]
        public void then_some_kind_of_descriptive_name_for_a_single_logical_assertion()
        {
            // do an assertion
        }
        
        [Fact]
        public void then_some_kind_of_descriptive_name_for_a_single_logical_assertion_2()
        {
            // do an assertion
        }
    }
}

Admittedly, I frequently end up doing quite a bit of copy/paste between different scenarios when I use this style. I’m going to say that’s mostly okay because test code should be optimized for readability rather than for eliminating duplication as we would in production code (see the discussion about DAMP vs DRY in this post for more context).

To be honest, I couldn’t remember what this style of test was even called until I spent some time googling for better examples today. I remember this being a major topic of discussion in the late 00’s, but not really since. I think it’s maybe a shame that Behavior Driven Development (BDD) became too synonymous with Cucumber tooling, because there was definitely some very useful thinking going on with BDD approaches. Way too many “how many angels can dance on the head of a pin” arguments as well, of course.

Here’s an old talk from Philip Japikse that’s the best resource I could find this morning on this idea.

Integration Testing: Lessons from Storyteller and Other Thoughts

Continuing my series about automated testing. I’m wrestling quite a bit right now with the tooling for integration testing both at work and in some of my more complicated OSS projects like Marten or Jasper. I happily use xUnit.Net for unit testing on side projects, and we generally use NUnit at work. Both tools are well suited for unit testing, but leave something to be desired for integration testing in my opinion. At the same time, one of my long running OSS projects is Storyteller, which was originally meant as a Behavior Driven Development (BDD) tool (think Gherkin tools or the much older FitNesse tooling), but really turned out to be mostly advantageous for big, data intensive integration testing.

To paraphrase Miracle Max, Storyteller isn’t dead, it’s just mostly dead. Storyteller never caught on, the React UI is a bazillion versions out of date, buggy as hell, and probably isn’t worth updating. I’m busy with other things. Gherkin/Cucumber tools suck all the oxygen out of the room for alternative approaches to BDD or really even just integration testing. Plus our industry’s zombie-like fixation on trying to automate all testing through Selenium is absolutely killing off all the enthusiasm for any other automated testing techniques — again, just my personal opinion.

However, I still want something better than what we have today, and there were some useful things that Storyteller provided for automated testing that I still want in automated testing tools. And of course there are also some things that I’d like to have where Storyteller fell short.

Things that I thought were valuable in Storyteller:

  • Rendering the test results as HTML with failures inline helps relate test failures to test inputs, expectations, or actions
  • BDD tests are generally expressed through a mix of “flow based” expressions (think about the English-like Given/When/Then nomenclature of Gherkin specifications) and “table based” testing that was very common in the older FIT/FitNesse style of BDD. A lot of data intensive problem domains can be best tested through declarative “table-driven” tests. For example, consider insurance quoting logic, or data transformations, or banking transactions where you’re pushing around a lot of data and also verifying matrices of expected results. I still very strongly believe that Storyteller’s focus on example tables and set verifications is the right approach for these kinds of problem domains. The example tables make the expression of the tests both very declarative and readily reviewable by domain experts. The set verification approach in Storyteller is valuable because it will show you which values are missing or unexpected for easy test failure diagnosis. Table-driven testing is somewhat supported in the Gherkin specification, but I don’t think that’s well utilized in that space. (See the small xUnit.Net sketch just after this list for the plain-code flavor of an example table.)
  • Being able to embed system diagnostics and logging directly into the test results such that the application logs are perfectly correlated to the integration test running has been very advantageous. As an example, in Jasper testing I pipe in all the system events around messages being sent, received, and executed so that you can use that data to understand how any given test behaved. I’ve found this feature to be incredibly helpful in solving the inevitable test failures.
  • Like many integration testing tools, Storyteller generally allows you to evaluate all the assertions of a long running test even after some condition or action has failed. This is the opposite of what we’re taught for fine-grained unit testing, but very advantageous in integration testing where scenarios run very slowly and you want to maximize your feedback cycle. That approach can also easily lead to runaway test runs. Picking on Selenium testing yet again, imagine you have an integration test that uses Selenium to pull open a certain page in a web application and then verify the expected values of several screen elements. Following Selenium best practices, you have some built-in “waiters” that try to slow down the test until the expected elements are visible on the screen. If the web application itself chokes completely on the web page, your Selenium heavy test might very easily loop through and time out on the visibility of every single expected element. That’s slow. Even worse — and I’ve seen this happen in multiple shops — what if the whole web application is down, but each and every Selenium heavy test still runs and repeatedly times out all the various “waiters” in the test? That’s a hung build that can run all night (again, been there, done that). To that end, Storyteller has the concepts of “critical” or “catastrophic” errors in the test engine. For example, if you were using Storyteller to drive Selenium, you could teach Storyteller to treat the failure to load a web page as a “critical” exception that will stop the current test from continuing so you can “fail fast”. Likewise, if the very home page of the web application can’t be loaded (status code 500 y’all), you can have Storyteller throw a “catastrophic” exception that will pull the cord on the whole test suite run and stop all test executions regardless of any kind of test retry policy. Fail fast FTW!
  • Storyteller did a lot to make test data set up be as declarative as possible. I think this is hugely valuable for data-intensive problem domains like basically any enterprise application.
  • Targeted test retries in CI. You have to opt into retries on a test by test basis in Storyteller, and I think that was the right decision as opposed to letting every test failure get retried in CI.
  • Something I stole years ago from a former colleague at ThoughtWorks, Storyteller differentiates between tests that are “Acceptance” tests in the development lifecycle and tests that are labeled as “Regression” — meaning that those tests absolutely have to pass in CI while “Acceptance” tests are known to be a work in progress. I think that workflow is valuable for doing BDD style testing where the tests are specifications about how the system *should* behave.
  • Storyteller had a “step through” mode where you could manually step through the specification at execution time much like you would regular code in a debugger. That was hugely valuable.
  • Storyteller kept performance data about individual test steps as part of the test results, and even allowed you to specify performance thresholds that could fail a test if the performance did not satisfy that threshold. That’s not to say Storyteller specifications were anything more than a good complement to real load or performance testing, but it was still very helpful to diagnose long running tests and very frequently identified real application performance bottlenecks.
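
For anyone who never saw Storyteller or FitNesse, here’s roughly the shape of the “example table” idea approximated with plain xUnit.Net [Theory] rows. It’s a trivial arithmetic stand-in rather than real domain logic, but each InlineData row reads like one row of an example table:

using System.Linq;
using Shouldly;
using Xunit;

public class example_table_tests
{
    // Each InlineData row is one "example": inputs on the left, expected results on the right.
    // Domain experts can review the rows without reading the surrounding code.
    [Theory]
    [InlineData(new int[] { 2, 3, 4 }, 9, 24)]
    [InlineData(new int[] { 1, 1, 1 }, 3, 1)]
    [InlineData(new int[] { 5, 0 }, 5, 0)]
    public void sum_and_product_of_the_values(int[] values, int expectedSum, int expectedProduct)
    {
        values.Sum().ShouldBe(expectedSum);
        values.Aggregate(1, (product, value) => product * value).ShouldBe(expectedProduct);
    }
}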

Things that weren’t good about Storyteller that any successor should do better:

  • The IDE integration was nonexistent. At a bare minimum you need easy ways to run individual or groups of specifications at will from your IDE for a quick feedback cycle. The best I ever did with Storyteller was an xUnit.Net bridge that was so-so. Maybe the new “dotnet test” specification could help here.
  • It’s a Gherkin world and the rest of us just live in it. If there was a serious attempt to reboot Storyteller, it absolutely has to be Gherkin compatible. I think there would be Storyteller customizations as well though, and I despise the silly Given/When/Then handcuffs. Any new Storyteller absolutely has to allow devs to efficiently build and run new specifications without having to jump into an external tool of some sort.
  • It’s got to be more asynchronous code friendly because that’s the world we live in

I’ve got a lot more notes about specific use cases I’ll have to dig out later.

What am I going to do?

For Jasper & Marten where there’s no quick win to replace Storyteller, I think I might cut a quickie Storyteller v6 that updates everything to .Net 5/6 and makes the testing framework much more asynchronous code friendly as a short term thing.

Longer term, I don’t know. At work I think the best approach would be to convince myself to like SpecFlow and possibly add some Storyteller functionality as add-ons to cover SpecFlow’s gaps. I’ve got worlds of notes about a rebooted Storyteller or maybe a completely different approach I’ve nicknamed “Bobcat” that would be an xUnit-like tool that’s just very optimized for integration testing (shared setup data, teardown, smarter heuristics for parallelizing tests). I’m also somewhat curious to see if the rebooted Fixie could be molded into a better tool for integration testing.

Honestly, if a newly rebooted Storyteller happens, it’ll be an opportunity for me to refresh my UI/UX skills.

Integration Testing: IHost Lifecycle with NUnit

Starting yesterday, all of my content about automated testing is curated under the new Automated Testing page on this site.

I kicked off a new blog series yesterday with Integration Testing: IHost Lifecycle with xUnit.Net. I started by just discussing how to manage the lifecycle of a .Net IHost inside of an xUnit.Net testing library. I used xUnit.Net because I’m much more familiar with that library, but we mostly use NUnit for our testing at MedeAnalytics, so I’m going to see how the IHost lifecycle I discussed and demonstrated last time in xUnit.Net could work in NUnit.

To catch you up from the previous post, I have two projects:

  1. An ASP.Net Core web service creatively named WebApplication. This web service has a small endpoint that allows you to post an array of numbers that returns a response telling you the sum and product of those numbers. The code for that controller action is shown in my previous post.
  2. A second testing project using NUnit that references WebApplication. The testing project is going to use Alba for integration testing at the HTTP layer.

With NUnit, I chose to use the SetupFixture construct to manage and share the IHost for the test suite like this:

    [SetUpFixture]
    public class Application
    {
        // Make this lazy so you don't build it out
        // when you don't need it.
        private static readonly Lazy<IAlbaHost> _host;

        static Application()
        {
            _host = new Lazy<IAlbaHost>(() => Program
                .CreateHostBuilder(Array.Empty<string>())
                .StartAlba());
        }

        public static IAlbaHost AlbaHost => _host.Value;

        // I want to expose the underlying Lamar container for some later
        // usage
        public static IContainer Container => (IContainer)_host.Value.Services;

        // Make sure that NUnit will shut down the AlbaHost when
        // all the projects are finished
        [OneTimeTearDown]
        public void Teardown()
        {
            if (_host.IsValueCreated)
            {
                _host.Value.Dispose();
            }
        }
    }

With the IHost instance managed by the Application static class above, I can consume the Alba host in an NUnit test like this:

    public class sample_integration_fixture
    {
        [Test]
        public async Task happy_path_arithmetic()
        {
            // Building the input body
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await Application.AlbaHost.Scenario(x =>
            {
                // Alba deals with Json serialization for us
                x.Post.Json(input).ToUrl("/math");
                
                // Enforce that the HTTP Status Code is 200 Ok
                x.StatusCodeShouldBeOk();
            });

            var output = response.ReadAsJson<Result>();
            output.Sum.ShouldBe(9);
            output.Product.ShouldBe(24);
        }
    }

And now a couple notes about what I did in Application:

  1. I think it’s important to create the IHost lazily, so that you don’t incur the cost of spinning up the IHost when you might be running other tests in your suite that don’t need the IHost. Rapid developer feedback is important, and that’s an awfully easy optimization that could pay off.
  2. The Teardown() method is decorated with the `[OneTimeTearDown]` attribute to direct NUnit to call that method after all the tests are executed. I cannot stress enough how important it is to clean up resources in your test harness to ensure your ability to quickly iterate through subsequent test runs.
  3. NUnit has a very different model for parallelization than xUnit.Net, and it’s completely “opt in”, so I think there’s less to worry about on that front with NUnit.

At this point I don’t think I have a hard opinion about xUnit.Net vs. NUnit, and I certainly wouldn’t bother switching an existing project from one to the other (even though I’ve certainly done that plenty of times in the past). I haven’t thought this one through enough, but I still think that xUnit.Net is a little bit cleaner for unit testing, but NUnit might be better for integration testing because it gives you finer grained control over fixture lifecycle and has some built in support for test timeouts and retries. At one point I had high hopes for Fixie as another alternative, and that project has become active again, but it would have a long road to challenge either of the two now mainstream tools.
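
For example, NUnit has attribute-driven support for both of those concerns. A minimal sketch (and to the earlier point, retries are something to use sparingly):

    using System.Threading.Tasks;
    using NUnit.Framework;

    public class MessageProcessingTests
    {
        // Timeout fails the test if it runs longer than 30 seconds (the exact behavior
        // varies a bit by platform and NUnit version), and Retry re-runs a test that
        // fails an assertion, up to 3 total attempts
        [Test, Timeout(30_000), Retry(3)]
        public async Task eventually_processes_the_incoming_message()
        {
            // arrange, act, and poll for the expected outcome here
            await Task.CompletedTask;
        }
    }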

What’s Next?

This series is meant to support my colleagues at MedeAnalytics, so it’s driven by what we just happen to be talking about at any given point. Tomorrow I plan to put out a little post on some Lamar-specific tricks that are helpful in integration testing. Beyond that, I think dealing with database state is the most important thing we’re missing at work, so that needs to be a priority.

Integration Testing: IHost Lifecycle with xUnit.Net

I’m part of an initiative at work to analyze and ultimately improve our test automation practices. As part of that work, I’ll be blogging quite a bit about test automation starting with my brain dump on test automation last week and my most recent post on mocks and stubs last month. From here on out, I’m curating all of my posts and selected writings from other folks on my new Automated Testing page.

I’m already on record as saying that the generic host (IHost) in recent versions of .Net is one of the best things that’s ever happened to the .Net ecosystem. In my previous post I stated that I strongly prefer having the system under test running in process with the test harness for faster feedback cycles and easier debugging. The generic host builder introduced in .Net Core turns out to be a very effective way to bootstrap your system within automated test harnesses.

Before I dive into how to use the IHost in automated testing, here’s a couple issues I think you have to address in your integration testing strategy before we go willy nilly spinning up an IHost:

  • You ideally want to test against your code running in a realistic way, so the way code is bootstrapped and configured should be relatively close to how that code is started up in the real application.
  • There will inevitably need to be at least some configuration that needs to be different in testing or some services — usually accessing resources external to your system — that need to be replaced with stubs or some other kind of fake implementation.
  • It’s important to cleanly dispose or shutdown any IHost object you create in memory to avoid potential locks of resources like database connections, files, or ports. Failing to clean up resources in tests can easily make it harder to iterate through test fixes if you find yourself needing to manually kill processes or restart your IDE to release locked resources (been there, done that).
  • The IHost can be expensive to build up, and sometimes there’s going to be some serious benefit in reusing the IHost between tests to make the test suite run faster.
  • But the IHost is stateful, and there could easily be resources (singleton scoped services, databases, and whatnot) that could impact later test runs in the suite.

Before I jump into solutions, let’s assume that I have two projects:

  1. WebApplication is an ASP.Net Core web service project. WebApplication uses Lamar as its underlying DI container.
  2. A test project that references WebApplication

xUnit.Net Mechanics

I’m more comfortable with xUnit.Net, so I’m going to use that first. My typical usage is to share the IHost through xUnit.Net’s CollectionFixture mechanism (and if you think the usage of this thing is confusing, welcome to the club). First up, I’ll build out a new class I usually call AppFixture to manage the lifecycle of the IHost. The example project I’ve built here is an ASP.Net Core web service project, so I’m going to use Alba to wrap the host inside of AppFixture as shown below:

    public class AppFixture : IDisposable, IAsyncLifetime
    {
        public IAlbaHost Host { get; private set; }
        public async Task InitializeAsync()
        {
            // Program.CreateHostBuilder() is the code from the WebApplication
            // that configures the HostBuilder for the system
            Host = await Program
                .CreateHostBuilder(Array.Empty<string>())
                
                // This extension method starts up the underlying IHost,
                // but Alba replaces Kestrel with a TestServer and
                // wraps the IHost
                .StartAlbaAsync();
        }

        public Task DisposeAsync()
        {
            return Host.StopAsync();
        }

        public void Dispose()
        {
            Host?.Dispose();
        }
    }

A couple things to note in that code above:

  • As we’ll set up next, that class above will be constructed once in memory by xUnit and shared between test fixture classes
  • The Dispose() and DisposeAsync() methods both dispose the IHost. By normal .Net mechanics, that will also dispose the underlying Lamar IoC container, which will in turn dispose any services created by Lamar at runtime that implement IDisposable. Disposing the IHost also stops any registered IHostedService services that your application may be using for long running tasks (for my colleagues who may be reading this, both NServiceBus and MassTransit start and stop their message listeners in an IHostedService, so that might be in use even if you don’t explicitly use that technique).

Next, we’ll set up AppFixture to be shared between our integration test classes by using the [CollectionDefinition] attribute on a marker class:

    [CollectionDefinition("Integration")]
    public class AppFixtureCollection : ICollectionFixture<AppFixture>
    {
        
    }

Lastly, I like to build out a base class for integration tests like this one:

    [Collection("Integration")]
    public abstract class IntegrationContext
    {
        protected IntegrationContext(AppFixture fixture)
        {
            theHost = fixture.Host;
            
            // I am using Lamar as the underlying DI container
            // and want some Lamar specific things later on
            // in the tests
            Container = (IContainer)fixture.Host.Services;
        }

        public IAlbaHost theHost { get; }
        
        public IContainer Container { get; }
    }

The [Collection] attribute is meaningful here because that makes xUnit.Net run all the tests that are contained in test fixture classes that inherit from IntegrationContext in a single thread so we don’t have to worry about concurrent test runs.*

And finally to bring this all together, let’s say that WebApplication has this simplistic web service code to do some arithmetic:

    public class Result
    {
        public int Sum { get; set; }
        public int Product { get; set; }
    }

    public class Numbers
    {
        public int[] Values { get; set; }
    }
    
    public class ArithmeticController : ControllerBase
    {
        [HttpPost("/math")]
        public Result DoMath([FromBody] Numbers input)
        {
            var product = 1;
            foreach (var value in input.Values)
            {
                product *= value;
            }

            return new Result
            {
                Sum = input.Values.Sum(),
                Product = product
            };
        }
    }

In the next code block, let’s finally see a test fixture class that uses the new IntegrationContext as a base class and tests the HTTP endpoint shown in the block above.

    public class ArithmeticApiTests : IntegrationContext
    {
        public ArithmeticApiTests(AppFixture fixture) : base(fixture)
        {
        }

        [Fact]
        public async Task happy_path_arithmetic()
        {
            // Building the input body
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await theHost.Scenario(x =>
            {
                // Alba deals with Json serialization for us
                x.Post.Json(input).ToUrl("/math");
                
                // Enforce that the HTTP Status Code is 200 Ok
                x.StatusCodeShouldBeOk();
            });

            var output = response.ReadAsJson<Result>();
            output.Sum.ShouldBe(9);
            output.Product.ShouldBe(24);
        }
    }

Alright, at this point we’ve got a way to share the system’s IHost in tests for better efficiency, and we’re making sure that all the resources in the IHost are cleaned up when the test suite is done. We’re using the WebApplication’s exact configuration for the IHost, but we still might need to alter that in testing. And there’s also the issue of needing to roll back state in our system between tests. I’ll pick up those subjects in my next couple posts, as well as using NUnit instead of xUnit.Net because that’s what the majority of code at my work uses for testing.
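
I’ll dig into the configuration override question properly in a later post, but as a quick preview, the IHostBuilder model makes test-only overrides pretty painless. Here’s a sketch against the same AppFixture from above, where IExternalThing and StubbedExternalThing are hypothetical stand-ins for whatever external dependency you need to fake out:

    public async Task InitializeAsync()
    {
        Host = await Program
            .CreateHostBuilder(Array.Empty<string>())

            // Layer test-specific registrations on top of the application's
            // own service configuration. The last registration wins when the
            // container resolves a single IExternalThing.
            .ConfigureServices(services =>
            {
                services.AddSingleton<IExternalThing, StubbedExternalThing>();
            })
            .StartAlbaAsync();
    }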

* It would be nice to be able to run parallel tests using our shared IHost, but that can often be problematic because of shared state, so I generally bypass test parallelization in integrated tests. The subject of parallelizing integration tests is worthy of a later blog post of some thoughts I haven’t quite elucidated yet.

A brain dump on automated integration testing

I’m strictly talking about automated testing in this post. I’m more or less leading an effort at work to improve our test automation and Test Driven Development practices, so I’ll be trying to blog quite a bit about related topics in the next couple months. After reviewing quite a bit of in flight code, I think I’ll try to revisit some of my old blog posts on testability design from the CodeBetter days and update those old lessons from the early days of TDD to what we’re building now.

My company builds and maintains several long running software systems with a healthy back log of feature requests, performance improvements, and stories to retire technical debt. All that is to say that we’re constantly adding to or improving existing code — which implies we’re always running some non-zero risk of creating regression defects. To keep everybody’s stress levels down, we’re taking incremental steps toward a true continuous delivery model where we can smoothly and consistently build and deploy fully tested features while being confident that we aren’t introducing regression defects.

As you’d likely guess, we’re very interested in improving our automated testing practices as a safety net to enable continuous delivery while also improving our quality in general. That leads to the next question, what kind of automated testing should we be doing? Followed by, is there automated testing we’re doing today that isn’t delivering enough bang for the buck?

To that point, let’s take a look at the classic idea of the testing pyramid, as shown below:

From Unit test: sociable or solitary

The thinking behind the testing pyramid is that there’s a certain, healthy mix of different sorts of automated tests that efficiently lead to better results. I say a “mix” here because unit tests, though relatively cheap compared to other tests, cannot detect many defects that only come out during integration between components, code modules, or systems. From a quick search, I found worlds of memes along the lines of this one:

No integration tests, but all the unit tests pass!

To address exactly what kind of automated tests we should be writing, here’s a stream of consciousness brain dump that I later gave a patina of organization:

On End to End User Interface Tests

Any kind of test that uses a tool like Selenium to do end to end, black-box testing is going to run slowly. These tests are also frequently unstable because of asynchronous timing issues in modern browser applications. There’s an unhealthy tendency in many shops that adopt Selenium as a test automation solution to use it as their golden hammer to the exclusion of other testing techniques that can be much more efficient in certain circumstances. To put it bluntly, it’s very difficult to successfully author and maintain test automation suites based on Selenium against complicated applications. In my experience in shops that have attempted large scale Selenium usage, I do not believe that the benefits of those tests have ever outweighed the costs.

I would still recommend using some small number of end to end tests with a tool like Selenium, but those tests should be focused on proving out integration mechanisms between a user interface and backing server side code. For example, I’ve been working on a new integration of Open Id Connect (OIDC) authentication into our web services and web applications. I’ve used Playwright to automate browser testing to prove out the interactions and redirects between the OIDC service and our applications or services.

I also find Cypress.io interesting, but more for doing integration testing of our Angular applications by themselves with a dummy backend. For true end to end testing of .Net-backed web applications, I think I’m interested in replacing Selenium from here on out with Playwright as I think and hope it just does much more for you to make performant and reliable automated tests compared to Selenium.

Driving a browser should not be used to automate functional testing of business logic or data services or any kind of data analysis that could possibly be tested without using the full browser.

If you insist on trying to do a lot of browser automation testing, you better invest in collecting diagnostic information in test runs that can be used by developers to debug test failures. Ideally, I like to have the application’s log output correlated to the test run somehow. I’ve worked with teams that were able to pipe the console.log() tracing from the JavaScript code running in the browser to the test results and that was extremely helpful. Taking screenshots as part of the test can certainly help. As another ideal, I very strongly recommend that any kind of browser automation tests be executable by developers on their local machine on demand for easier debugging. More on this later.
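
As a small illustration of the console piping idea, Playwright’s .Net API makes that capture pretty straightforward. A sketch, where the URL and the way you surface the captured lines are placeholders:

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Playwright;

    public static class BrowserDiagnostics
    {
        // Collects every console message emitted by the page so it can be
        // attached to the test output when a browser-driven test fails
        public static async Task<IReadOnlyList<string>> CaptureConsole(string url)
        {
            var logs = new List<string>();

            using var playwright = await Playwright.CreateAsync();
            await using var browser = await playwright.Chromium.LaunchAsync();
            var page = await browser.NewPageAsync();

            // Relay console.log()/warn()/error() from the browser into the test output
            page.Console += (_, message) => logs.Add($"{message.Type}: {message.Text}");

            await page.GotoAsync(url);

            // drive the page and make assertions here...

            return logs;
        }
    }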

Again, if you absolutely have to write Selenium/Playwright/Cypress tests despite all of my warnings, I strongly recommend you write those tests in the same programming language as the real application. That statement is going to be controversial if any real test automation engineers stumble into this post, but I think it’s important to make it as easy as possible for developers to collaborate with test automation engineers. Moreover, I despise the kind of shadow data access layers you can get from test automation code doing their own thing to write to and read from the underlying data store of the system under test. I think it’s less likely to get that kind of insidious, hidden code duplication if the test code is written in the same programming language and even uses the system’s own data access code to set up or verify database state as part of the automated tests.

Choosing Solitary/Unit Tests or Sociable/Integration Tests

I missed out on this when it was first published, but I think I like the nomenclature of solitary vs sociable tests better than thinking about unit vs integration tests. I also encourage folks to think of that as a continuum rather than a hard categorization. Moreover, I recommend that you switch between solitary and sociable tests even within the same test library where one or the other is more effective.

I would recommend organizing tests by functional area first, and only consider separating out integration tests into a separate testing project when it’s advantageous to use a “fast test, slow test” division for more efficient development.

We have formal requirements for test coverage metrics in our continuous integration builds, so I’d definitely make sure that any integration tests count toward that coverage number.

I think an emphasis on always writing classical unit tests can easily create a strong coupling between the production code and your testing code. That can and will reduce your ability to evolve your code, add new functionality, or do performance optimizations without rewriting your tests.

In many cases, integration tests that start at a natural sub-system facade or a logical controller/conductor entry point will do much better for you as a regression safety net to allow you to refactor your code to allow for new behavior or do important performance optimizations.

Case in point, I relied strictly on fine grained unit tests in my early work in StructureMap, and I definitely felt the negative consequences of that approach (I gave a talk about it in 2008 that’s still relevant). With later releases of StructureMap and now with Lamar, I lean much more heavily on integrated acceptance tests (let’s go ahead and call it Behavior Driven Development) that test from the entry point of the library down and focus on user-centric scenarios. I feel like that testing approach has led to much better results — both in the ease of adding new features, detecting regression defects in automated builds, and allowing me to evolve the functionality of the library.

On the other hand, integration tests can be harder to troubleshoot when they fail because you have more ground to cover. They also run slower of course. If you find that your feedback cycles feel too slow to efficiently run the tests continuously or especially if you find yourself doing long, marathon sessions in your debugger, stop and consider introducing more fine-grained tests first.

Running the Tests Locally vs Remotely

To the previous point, I think it’s critical that developers should be able to easily spin up and run automated integration tests on demand on their local development boxes. Tests will fail, and being able to easily troubleshoot a failing test is a prerequisite for successful test automation. If you can run an integration test locally, you’re much more likely to be able to iterate and try potential fixes quickly. There’s also the very real possibility of attaching a debugger to the testing process.

I think that our current technology set makes it much easier to do integration testing than it was when I was first getting started and the strict Michael Feathers definition of a unit test was in vogue. Just speaking from my own experience, the current .Net 5 generation is very easy to spin up and down in process for automated testing. Docker has been a great way to stand up development environments using Sql Server, Postgresql, Rabbit MQ, and other infrastructural tools.

As a follow up to the previous section, it’s also advantageous for automated tests to be able to run in process with the test harness code. For instance, I’d much rather do Alba testing of HTTP endpoints in .Net 5 where I’m able to quickly spin up an actual web service in memory and shut it down from my testing project, as opposed to the old, full .Net framework where you’d have to run the web service project in IIS or IISExpress first, then use HttpClient to address the service from your unit tests. The first, .Net 5/Alba approach is a much faster iteration and feedback cycle to support a Test Driven Development workflow than the second approach.

Likewise, when given a choice between tests that can be run locally versus tests that can only be executed in a remote server location, give me the local tests every time. When and if you hit a scenario where you really need to run tests remotely *cough* serverless *cough*, that’s the one and only exception I can think of to my “never deploy from your local development box” rule. If depending on remote execution of tests, I’d at least want the ability to send my local development branch to the remote server at will. Just having a CI server build out pull request branches might get you there of course, but then you might be dependent upon being able to run that test suite in parallel with other CI builds. That’s not a show stopper, but it might make you have to invest more in your build automation to spin up isolated environments on demand.

Apollo Testing!

There’s plenty of debate over what the actual ratio of UI tests to integration tests to unit tests should be, and plenty of folks have different metaphors than a “pyramid” to describe what they think the ratio should be. I happen to like the integration test heavy ratio described in The Testing Trophy and Testing Classifications with the graphic below:

From The Testing Trophy and Testing Classifications

I think his image of the “testing trophy” looks a lot like the command module from the Apollo missions to the moon in the 70’s:

The Apollo Command Module

So from now on, I’m calling our intended testing approach the Apollo Testing Method!

What is the purpose of testing?

I’m just barely old enough that I started my official software development career in old fashioned waterfall models. In those days we did some unit testing with ad hoc testing tools to troubleshoot new code, but it wasn’t anywhere close to what developers do today with Test Driven Development and xUnit tools. As developers, we mostly ran the complete application on our development boxes and stepped through things manually to check out new code locally before throwing things over the wall to QA at the end of the project.

Regardless of whatever ad hoc, local testing developers did of their code locally, the only official testing that actually counted was the purely manual testing done by our testers in the testing environment that was supposed to exactly mimic the production environment (it never quite did, but that’s a story for another day). The QA team strictly used black-box testing with some direct access to the underlying database.

That old black-box testing approach at the end of the project was a much slower feedback cycle than we’re accustomed to today after the advent of Agile Software Development. The killer problem was that the testing feedback cycles were too slow to consider evolutionary design approaches because of the real fear of regression failures. It was also harder as a developer to address defects found in the testing cycle because you were frequently needing to work with code you hadn’t touched in many months that certainly wasn’t fresh in your mind. That frequently led to marathon debugging sessions while a helpful project manager came by your cubicle to cheerfully ask for any updates several times a day and crank up the pressure. As a developer you were also completely at the mercy of QA for information about what was really happening in the system when they found bugs.

In my mind, the most important element of Agile software development overall was the emphasis on improving feedback cycles. Faster feedback allowed teams to find and fix problems while the code was still fresh in everybody’s mind, and made evolutionary design approaches feasible without the old fear of regression failures.

Testing is not about proving that our code works perfectly so much as a way to find and remove enough problems from the code that it can be deployed to production. I think this is an important approach because it allows us to use faster, finer grained testing approaches like isolated unit testing or intermediate level white-box integration tests that are generally faster running and cheaper to build than classic black-box, end to end tests.

What’s Next?

My organization has started an effort to introduce much more integration testing into our development processes as a way of improving quality and throughput. To help out on that, I’m going to attempt to write a series of blog posts going into specific areas about tools and techniques, but for right now I’m just jotting down this stream of consciousness brain dump to get started.

Based on what I think we need to establish at work, I’m thinking to cover:

  • .Net IHost bootstrapping and lifecycle within xUnit.Net or NUnit. I’m much more familiar with xUnit.Net from recent development, but we mostly use NUnit at work so I’ll be trying to cover that base as well.
  • HTTP API testing, which will inevitably feature Alba
  • Dealing with databases in tests, and that’s gonna have to cover both RDBMS databases and probably Mongo Db for now
  • Message handler testing with MassTransit and NServiceBus (we use both in different products)
  • Another brain dump on doing end to end testing after a meeting today about the CI of one of our big systems.
  • How should automated testing be integrated into the development cycle, and who should be responsible for these tests, and why is the obvious answer a very close collaboration between testers, developers, and even business experts
  • This will be a bigger stretch for me, but maybe get into how to do some semi-integration testing of an Angular front end with NgRX.
  • After talking through some of our issues with test automation at work, I think I’d like to blog about some of the positive things we did with Storyteller. I’ve been increasingly frustrated with xUnit.Net (and don’t think NUnit would be much better) for integration testing, so I’ve got quite a bit of notes about what an alternative tool optimized for integration tests could look like that I wouldn’t mind publishing.

Testing effectively — with or without mocks or stubs

My team at MedeAnalytics is working with our development teams on a long term initiative to improve the effectiveness of our developer testing practices and encourage more Test Driven Development in daily development. As part of that effort, it’s time for my organization to talk about how — and when — we use test doubles like mock objects or stubs in our daily work.

I think most developers have probably heard the terms mock or stub. Mocks and stubs are examples of “test doubles”: the umbrella term for any kind of testing object or function that is substituted for a production dependency while testing, or in early development before the real dependency is available. At the moment, I’m concerned that we may be overusing mock objects and especially dynamic mocking tools when other types of test doubles could be easier to use, or when we should be switching to integration testing against the real dependencies.

Let’s first talk about the different kinds of tests that developers write, both as part of a Test Driven Development workflow and tests that you may add later to act as regression tests. When I was learning about developer testing, we talked about a pretty strict taxonomy of test types using the old Michael Feathers definition of what constituted a unit test. Unit tests typically meant that we tested one class at a time with all of its dependencies replaced with some kind of test double so we could isolate the functionality of just that one class. We’d also write some integration tests that ran part or all of the application stack, but that wasn’t emphasized very much at the time.

Truth be told, many of the mock-heavy unit tests I wrote back then didn’t provide a lot of value compared to the effort I put into authoring them, and I learned the hard way in longer lived codebases like StructureMap that I was better off in many cases breaking the “one class” rule of unit tests and writing far coarser grained tests that were focused on usage scenarios, because the fine-grained unit tests actually made it much harder to evolve the internal structure of the code. In my later efforts, I switched to mostly testing through the public APIs all the way down the stack and got much better results.

Flash forward to today, and we talk a lot about the testing pyramid concept where code really needs to be covered by different types of tests for maximum effective test coverage – lots of small unit tests, a medium amount of a nebulously defined middle ground of integration tests, and a handful of full blown, end to end black box tests. The way that I personally over-simplify this concept is to say:

Test with the finest grained mechanism that tells you something important

Jeremy Miller (me!)

For any given scenario, I want developers to consciously choose the proper mix of testing techniques that really tells you that the code fulfills its requirements correctly. At the end of the day, the code passing all of its integration tests should go a long way toward making us confident that the code is ready to ship to production.

I also want developers to be aware that unit tests that become tightly coupled to the implementation details can make it quite difficult to refactor that code later. In one ongoing project, one of my team members is doing a lot of work to optimize an expensive process. We’ve already talked about the need to do the majority of his testing from outside-in with integration tests so he will have more freedom to iterate on completely different internal mechanisms while he pursues performance optimization.

I think I would recommend that we all maybe stop thinking so much about unit vs integration tests and think more about tests being on a continuous spectrum between “sociable” vs “solitary” tests, as Martin Fowler discusses in On the Diverse And Fantastical Shapes of Testing.

To sum up this section, I think there are two basic questions I want our developers to constantly ask themselves in regards to using any kind of test double like a mock object or a stub:

  1. Should we be using an integration or “sociable” test instead of a “solitary” unit test that uses test doubles?
  2. When writing a “solitary” unit test with a test double, which type of test double is easiest in this particular test?

In the past, we strongly preferred writing “solitary” tests with or without mock objects or other fakes because those tests were reliable and ran fast. That’s still a valid consideration, but I think these days it’s much easier to author more “sociable” tests that might even be using infrastructure like databases or the file system than it was when I originally learned TDD and developer testing. Especially if a team is able to use something like Docker containers to quickly spin up local development environments, I would very strongly recommend writing tests that work through the data layer or call HTTP endpoints in place of pure, Feathers-compliant unit tests.

Black box or UI tests through tools like Selenium are just a different ball game altogether and I’m going to just say that’s out of the scope of this post and get on with things.

As for when mock objects or stubs or any other kind of test double are or are not appropriate, I’m going to stand pat on what I’ve written in the past.

After all of that, I’d finally like to talk about all the different kinds of test doubles, but first I’d like to make a short digression into…

The Mechanics of Automated Tests

I think that most developers writing any kind of tests today are familiar with the Arrange-Act-Assert pattern and nomenclature. To review, most tests will follow a structure of:

  1. “Arrange” any initial state and inputs to the test. One of the hallmarks of a good test is being repeatable and creating a clear relationship between known inputs and expected outcomes.
  2. “Act” by executing a function or a method on a class or taking some kind of action in code
  3. “Assert” that the expected outcome has happened

In the event sourcing support in Marten V4, we have a subsystem called the “projection daemon” that constantly updates projected views based on incoming events. In one of its simpler modes, there is a class called SoloCoordinator that is simply responsible for starting up every single configured projected view agent when the projection daemon is initialized. To test the start up code of that class, we have this small test:

    public class SoloCoordinatorTests
    {
        [Fact]
        public async Task start_starts_them_all()
        {
            // "Arrange" in this case is creating a test double object
            // as an input to the method we're going to call below
            var daemon = Substitute.For<IProjectionDaemon>();
            var coordinator = new SoloCoordinator();

            // This is the "Act" part of the test
            await coordinator.Start(daemon, CancellationToken.None);

            // This is the "Assert" part of the test
            await daemon.Received().StartAllShards();
        }
    }

and the little bit of code it’s testing:

        public Task Start(IProjectionDaemon daemon, CancellationToken token)
        {
            _daemon = daemon;
            return daemon.StartAllShards();
        }

In the code above, the real production code for the IProjectionDaemon interface is very complicated and setting up a real one would require a lot more code. To short circuit that set up in the “arrange” part of the test, I create a “test double” for that interface using the NSubstitute library, my dynamic mock/stub/spy library of choice.

In the “assert” phase of the test I needed to verify that all of the known projected views were started up, and I did that by asserting through the mock object that the IProjectionDaemon.StartAllShards() method was called. I don’t necessarily care here that that specific method was called so much as that the SoloCoordinator sent a logical message to start all the projections.

In the “assert” part of a test you can either verify some expected change of state in the system or return value (state-based testing), or use what’s known as “interaction testing” to verify that the code being tested sent the expected messages or invoked the proper actions to its dependencies.

See an old post of mine from 2005 called TDD Design Starter Kit – State vs. Interaction Testing for more discussion on this subject. The old code samples and formatting are laughable, but I think the discussion about the concepts is still valid.
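To make that distinction a little more concrete, here’s a minimal sketch of a purely state-based test. The ShoppingCart and CartItem types below are invented just for this example and aren’t from Marten or any other codebase mentioned in this post:

    using Xunit;

    // Hypothetical domain types, just for illustration
    public record CartItem(string Name, decimal Price);

    public class ShoppingCart
    {
        public decimal Total { get; private set; }

        public void Add(CartItem item) => Total += item.Price;
    }

    public class ShoppingCartTests
    {
        [Fact]
        public void adding_an_item_increases_the_total()
        {
            // "Arrange" -- no test doubles in sight
            var cart = new ShoppingCart();

            // "Act"
            cart.Add(new CartItem("pliers", 10.50m));

            // "Assert" against the observable state of the object itself
            // rather than against the calls it made to its dependencies
            Assert.Equal(10.50m, cart.Total);
        }
    }

Both styles have their place; the interaction test for SoloCoordinator earlier was only necessary because its observable outcome is a call to another service rather than a state change.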

As an aside, you might ask why I bothered writing a test for such a simple piece of code. I honestly won’t bother writing unit tests in every case like this, but a piece of advice I read from (I think) Kent Beck was to write a test for any piece of code that could possibly break. Another rule of thumb is to write a test for any vitally important code, regardless of how simple it is, in order to remove project risk by locking it down through tests that could fail in CI if the code is changed. And lastly, I’d argue that it was worthwhile to write that test as documentation about what the code should be doing for later developers.

Mocks or Spies

Now that we’ve established the basic elements of automated testing and reviewed the difference between state-based and interaction-based testing, let’s go review the different types of test doubles. I’m going to be using the nomenclature describing different types of test doubles from the xUnit Patterns book by Gerard Meszaros (I was one of the original technical reviewers of that book and used the stipend to buy a 30GB iPod that I had for years). You can find his table explaining the different kinds of test doubles here.

As I explain the differences between these concepts, I recommend that you focus more on the role of a test double within a test than get too worked up about how things happen to be implemented and if we are or are not using a dynamic mocking tool like NSubstitute.

The most commonly used test double term is a “mock.” Some people use this term to mean “a stand in object created dynamically by a mocking tool.” The original definition was an object that can be used to verify the calls or interactions that the code under test makes to its dependencies.

To illustrate the usage of mock objects, let’s start with a small feature in Marten to seed baseline data in the database whenever a new Marten DocumentStore is created. That feature allows users to register objects implementing this interface to the configuration of a new DocumentStore:

    public interface IInitialData
    {
        Task Populate(IDocumentStore store);
    }

Here’s a simple unit test from the Marten codebase that uses NSubstitute to do interaction testing with what I’d still call a “mock object” to stand in for IInitialData objects in the test:

        [Fact]
        public void runs_all_the_initial_data_sets_on_startup()
        {

            // These three objects are mocks or spies
            var data1 = Substitute.For<IInitialData>();
            var data2 = Substitute.For<IInitialData>();
            var data3 = Substitute.For<IInitialData>();

            // This is part of a custom integration test harness
            // we use in the Marten codebase. It's configuring
            // and spinning up a new DocumentStore. As part of the
            // DocumentStore getting initialized, we expect it to
            // execute all the registered IInitialData objects
            StoreOptions(_ =>
            {
                _.InitialData.Add(data1);
                _.InitialData.Add(data2);
                _.InitialData.Add(data3);
            });

            theStore.ShouldNotBeNull();

            // Verifying that the expected interactions
            // with the three mocks happened as expected
            data1.Received().Populate(theStore);
            data2.Received().Populate(theStore);
            data3.Received().Populate(theStore);
        }

In the test above, the “assertions” are just that at some point when a new Marten DocumentStore is initialized it will call the three registered IInitialData objects to seed data. This test would fail if the IInitialData.Populate(IDocumentStore) method was not called on any of the mock objects. Mock objects are used specifically to do assertions about the interactions between the code under test and its dependencies.

In the original xUnit Patterns book, Meszaros also identified a slightly different test double called a “spy” that records the inputs passed to it so they can be interrogated in the “Assert” part of a test. That differentiation made more sense years ago when early mock tools like RhinoMocks or NMock worked very differently than today’s tools like NSubstitute or FakeItEasy.

I used NSubstitute in the sample above to build a mock dynamically, but at other times I’ll roll a mock object by hand when it’s more convenient. Consider the common case of needing to verify that an important message or exception was logged (I honestly won’t always write tests for this myself, but it makes for a good example here).

Using a dynamic mocking tool (Moq in this case) to mock the ILogger<T> interface from the core .Net abstractions just to verify that an exception was logged could result in code like this:

_loggerMock.Verify(
    x => x.Log(
        LogLevel.Error,
        It.IsAny<EventId>(),
        It.IsAny<It.IsAnyType>(),
        It.IsAny<Exception>(),
        It.IsAny<Func<It.IsAnyType, Exception, string>>()),
    Times.AtLeastOnce);

In this case you have to use an advanced feature of mocking libraries called argument matchers as a kind of wildcard to match the expected call against data generated at runtime that you don’t really care about. As you can see, this code is a mess to write and to read. I wouldn’t say to never use argument matchers, but to me it’s a “guilty until proven innocent” kind of technique.

Instead, let’s write our own mock object for logging that will more easily handle the kind of assertions we need to do later (this wasn’t a contrived example, I really have used this):

    public class RecordingLogger<T> : ILogger<T>
    {
        public IDisposable BeginScope<TState>(TState state)
        {
            throw new NotImplementedException();
        }

        public bool IsEnabled(LogLevel logLevel)
        {
            return true;
        }

        public void Log<TState>(
            LogLevel logLevel, 
            EventId eventId, 
            TState state, 
            Exception exception, 
            Func<TState, Exception, string> formatter)
        {
            // Just add this object to the list of messages
            // received
            var message = new LoggedMessage
            {
                LogLevel = logLevel,
                EventId = eventId,
                State = state,
                Exception = exception,
                Message = formatter?.Invoke(state, exception)
            };

            Messages.Add(message);
        }

        public IList<LoggedMessage> Messages { get; } = new List<RecordingLogger<T>.LoggedMessage>();

        public class LoggedMessage
        {
            public LogLevel LogLevel { get; set; }

            public EventId EventId { get; set; }

            public object State { get; set; }

            public Exception Exception { get; set; }

            public string Message { get; set; }
        }

        public void AssertExceptionWasLogged()
        {
            // This uses Fluent Assertions to blow up if there
            // are no recorded errors logged
            Messages.Any(x => x.LogLevel == LogLevel.Error)
                .Should().BeTrue("No exceptions were logged");
        }
    }

With this hand-rolled mock object, the ugly code above that uses argument matchers just becomes this assertion in the tests:

// _logger is a RecordingLogger<T> that 
// was used as an input to the code under
// test
_logger.AssertExceptionWasLogged();

I’d argue that that’s much simpler to read and most certainly simpler to write. I didn’t show it here, but you could have just interrogated the calls made to a dynamically generated mock object without using argument matchers; the syntax for that can be very ugly as well though, and I don’t recommend it in most cases.

To sum up, “mock objects” are used in the “Assert” portion of your test to verify that expected interactions were made with the dependencies of the code under test. You also don’t have to use a mocking tool like NSubstitute, and sometimes a hand-rolled mock class might be easier to consume and lead to easier to read tests.

Pre-Canned Data with Stubs

“Stubs” are just a way to replace real services with some kind of stand in that supplies pre-canned data as test inputs. Whereas “mocks” are about interaction testing, “stubs” are used to provide inputs for state-based testing. I won’t go into too much detail because I think this concept is pretty well understood, but here’s an example of using NSubstitute to whip up a stub in place of a full blown Marten query session in a test:

        [Fact]
        public async Task using_a_stub()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(session);
            
            // carry out the Act and Assert parts of the test
        }

Again, “stub” refers to a role within the test and not how it was built. In-memory database stand-ins like Entity Framework Core’s in-memory provider are another common example of using stubs.
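Just as a sketch of that idea, here’s roughly what leaning on EF Core’s in-memory provider as a stub might look like. The UserDbContext, the User entity, and the database name are all hypothetical stand-ins invented for this example:

    using System;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;
    using Xunit;

    // Hypothetical entity and EF Core context, just for illustration
    public class User
    {
        public Guid Id { get; set; }
        public string UserName { get; set; }
    }

    public class UserDbContext : DbContext
    {
        public UserDbContext(DbContextOptions<UserDbContext> options) : base(options) { }

        public DbSet<User> Users => Set<User>();
    }

    public class in_memory_database_as_a_stub
    {
        [Fact]
        public async Task supplies_pre_canned_user_data()
        {
            var options = new DbContextOptionsBuilder<UserDbContext>()
                .UseInMemoryDatabase("user-lookup-tests")
                .Options;

            // "Arrange" by seeding the in-memory database with known data
            await using (var context = new UserDbContext(options))
            {
                context.Users.Add(new User { Id = Guid.NewGuid(), UserName = "jmiller" });
                await context.SaveChangesAsync();
            }

            // The code under test would get a fresh context pointed at the same
            // in-memory store, which plays the "stub" role of a real database
            await using (var context = new UserDbContext(options))
            {
                var user = await context.Users.SingleAsync(x => x.UserName == "jmiller");
                Assert.NotNull(user);
            }
        }
    }

Whether the in-memory provider behaves closely enough to the real database engine is a separate question, and it’s part of why I lean toward the Docker-based approach mentioned earlier for the more “sociable” tests.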

Dummy Objects

A “dummy” is a test double whose only real purpose is to act as a stand in service that does nothing, but still allows your test to run without constant NullReferenceException problems. Going back to the ServiceThatLooksUpUsers in the previous section, let’s say that the service also depends on the .Net ILogger<T> abstraction for tracing within the service. We may not care about the log messages happening in some of our tests, but ServiceThatLooksUpUsers will blow up if it doesn’t have a logger, so we’ll use the built in NullLogger<T> that’s part of the .Net logging abstractions as a “dummy” like so:

        [Fact]
        public async Task using_a_dummy()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(
                session,
                
                // Using a dummy logger
                new NullLogger<ServiceThatLooksUpUsers>());

            // carry out the Act and Assert parts of the test
        }

Summing it all up

I tried to cover a lot of ground, and to be honest, this was meant to be the first cut at a new “developer testing best practices” guide at work so it meanders a bit.

There’s a few things I would hope folks would get out of this post:

  • Mocks or stubs can sometimes be very helpful in writing tests, but can also cause plenty of heartburn in other cases.
  • Don’t hesitate to skip mock-heavy unit tests when some sort of integration test might be easier to write or do more to ascertain that the code actually works
  • Use the quickest feedback cycle you can get away with when trying to decide what kind of testing to do for any given scenario — and sometimes a judicious usage of a stub or mock object helps write tests that run faster and are easier to set up than integration tests with the real dependencies
  • Don’t get tunnel vision on using mocking libraries and forget that hand-rolled mocks or stubs can sometimes be easier to use within some tests

A Small Case Study in Test Automation (and other things)

I’m trying to walk a line here in this post between avoiding specifics about a client project for obvious reasons, but providing enough detail to make this post worthwhile for that client. One of our client’s development managers is interested in speeding up their testing, and I’m hoping to use this post to lay out some ideas and approaches to improve the testing procedures in this system.

I’ve been part of an integration project for the past couple years that validates, routes, and processes financial transactions coming from an external partner of our client’s all the way to a very large 3rd party application hosted in our client’s environment. We’re in the middle of some significant changes in the integration to that 3rd party application that are going to trigger a round of regression testing of the entire system — and that’s where this post comes in. Testing this application has been very challenging and extremely time consuming, so any opportunity to make regression testing quicker and more effective is going to make everyone’s jobs easier.

It’s not just that the testing itself is slower than desired. Because the testing is slow and not easily repeatable, the development team can’t really do much technical improvement through refactoring as they learn more about the system behavior and how the code structure is working out over time. That’s been a definite negative for code and architectural quality.

Before I get into the details of the existing system, know that what I’m showing and discussing here is a bit of an idealized version of how I wish we had architected the system and what we’ve recommended to the client for the longer term. The real system is a bit messier and significantly harder to test than what I’m presenting here — but there’s a lesson for you, testability should be a first class architectural goal in many cases (and Conway’s Law is legitimately something to work around).

From a 10,000 foot level, here’s the entire system:

[Diagram: TestAutomationScenario - High Level]

The workflow is:

  1. A couple times a day, a new flat file containing new transactions will be dropped into a file share
  2. The File Reader console application is executed to find this file, parse it into little transaction messages, and publish those messages to Rabbit MQ. There’s a little bit of database tracking going on for reporting and just general activity tracking.
  3. Rabbit MQ publishes the transaction messages to the subscribing Transaction Processor application (an ASP.Net Core application with an active subscriber for these incoming messages).
  4. The Transaction Processor handles each transaction message by:
    1. Pulling in a helluva lot of information from the 3rd Party Application and other information from a Configuration DB related to the account number in the transaction message
    2. Using the information from the previous step to validate whether the transaction can be processed normally, or has to go into a queue for manual resolution
    3. For the valid transactions, use the information from step #1 to decide how the money in the incoming transaction will be applied (routing to sub-transactions)
    4. Send the routed sub-transactions from the previous step to the 3rd Party Application through its externally facing API.

While there are some unit tests and intermediate level integration tests today on some of the subsystems, the overall official testing effort to date has relied strictly on end to end, manual testing of the entire system. Some of the emphasis on black box, end to end testing is due to our client’s mandatory regulatory auditing requirements and that can’t completely go away. However, there’s worlds of opportunity and new willingness to explore other alternatives like white box testing techniques or new processes for testing as a complement to the formal audit-style testing, so let’s jump into some ideas for making things work more efficiently.

Some Necessary Shifts in Testing Philosophy

First off, there’s an important shift from trying to prove that the system is working perfectly with strictly black box testing to thinking about testing as a feedback mechanism to identify and remove problems in the code so that the code can be deployed to production. If you look at testing as more of a feedback cycle, you can utilize the testing pyramid idea to maximize feedback about how your system functions with more efficient testing techniques.

Secondly, I think you have to have collaboration between testers, developers, and architects to make white box testing more effective. Part of that is increasing the testability of the system architecture, and another part of that is trying to avoid duplication in effort between tests written by developers and other tests performed by the testers. Moreover, if developers are actively engaged in writing tests — and they should be in my world view — it’s very helpful to have the testers involved in the content of those developer-written tests. In other words, I think that having strict separation between testers and development can be very inefficient. I know there are folks who strongly believe that strict independence for the testers from the developers is necessary, but I think that does more harm than good.

For more information on whitebox vs. blackbox testing and improving test feedback, also see:

If you’ll buy either of the two previous paragraphs, or you’re at least open-minded enough to continue, let’s see how some of this Test Pyramid thinking would play out in our big integration system.

At a high level, I would want the testing strategy to focus on:

  • Some kind of Behavioral Driven Development approach for all the business rules for validating and routing the transactions.
  • Mid-level integration tests on all the code that acts as a gateway or service proxy to the 3rd Party Application. This would include both the code that sends commands to the 3rd Party Application and the code that queries or reads the 3rd Party Application.
  • Mid-level integration tests on the File Reader that probably stubs the outgoing Rabbit MQ and just measures how the File Reader parses the incoming files, writes tracking information, and what messages it publishes.
  • A handful of fully end to end tests through the entire system to prove out all the integration points — but by and large you use finer grained tests to test out business rules and the integration with the 3rd Party Application.

 

The Transaction Processor

Most of the meat of the bigger transaction processing project is within what I’m calling the Transaction Processor shown below in a little more detail:

[Diagram: TestAutomationScenario - Transaction Processor]

There’s a couple big responsibilities here:

  • Querying data from the 3rd Party Application with its heinously unusable, custom Xml query language to use inside the business rules
  • Looking up some configuration parameters about accounts from a second Configuration DB
  • Carrying out validation rules against incoming transactions
  • Routing the incoming transactions into sub-transactions based on business rules
  • Posting the sub-transactions to the 3rd Party Application with its, shall we say, interesting XML API.

Channeling some Domain-Driven Design thinking here, let’s go straight into the business rules for validation and routing. The business rules required a lot of input parameters, there were a lot of permutations to build and test, and the developers new to the problem domain had plenty of misunderstandings early on about the desired behavior.

From an architectural standpoint, I think it is extremely important to completely isolate these business rules from the 3rd Party Application, the configuration databases, and even the incoming flat file format because:

  • It was very difficult to set up test scenario inputs in the 3rd Party Application
  • There’s a tremendous number of test cases because of the permutations on account state and transaction parameters involved, so there would be a large benefit to tests being quick to author and execute
  • This logic is key to the business and has already evolved significantly since this project started. It’s imperative that this logic be safe to change over time, and that happens most effectively when it’s cheap to write new tests and quick to execute the existing test coverage.
  • I probably shouldn’t say this too loudly, but I think this client should reconsider coupling their ecosystem to the 3rd Party Application

To that end, the business rules should only depend on a domain model that’s internal to the Transaction Processor. We’ll use the A-Frame Architecture idea from Jim Shore’s Testing Without Mocks paper to isolate the business rule behavior from the infrastructure. The domain model objects that implement all the business logic will have no dependency whatsoever on the external dependencies. Instead, we’ll effectively write our own mapping layer to take the data returned from the 3rd Party Application and the Configuration DB and build all the state the domain model needs, then hand that to the business logic code in the domain model.
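To make that shape a little more concrete, here’s a very rough sketch of what I’m describing. Every type below is made up for illustration; the point is only that the business rules depend on a TransactionContext assembled by the mapping layer, and never on the 3rd Party Application, the Configuration DB, Rabbit MQ, or the flat file format:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Everything the validation/routing rules need, built up front by the mapping layer
    public record TransactionContext(
        string AccountNumber,
        decimal Amount,
        AccountStatus Status,
        IReadOnlyList<AllocationRule> AllocationRules);

    public enum AccountStatus { Active, Frozen, Closed }

    public record AllocationRule(string SubAccount, decimal Percentage);

    public record SubTransaction(string SubAccount, decimal Amount);

    // Pure domain logic with zero infrastructure dependencies
    public static class TransactionRouter
    {
        public static IReadOnlyList<SubTransaction> Route(TransactionContext context)
        {
            // A stand-in rule: anything but an active account goes to manual resolution
            if (context.Status != AccountStatus.Active)
            {
                return Array.Empty<SubTransaction>();
            }

            // Split the incoming amount across sub-accounts per the allocation rules
            return context.AllocationRules
                .Select(rule => new SubTransaction(rule.SubAccount, context.Amount * rule.Percentage))
                .ToList();
        }
    }

Tests for the business rules then just construct a TransactionContext in memory and assert on the returned sub-transactions, which keeps them cheap to author and fast to run even as the number of permutations grows.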

From the perspective of testing, there’s a lot of opportunity to get the business rules wrong. Rather than depend solely on design or requirements documents, I strongly recommend using Behavior Driven Development (BDD) techniques here to author executable specifications that are readable and reviewed (if not written) by the business domain experts and testers. What I largely recommend here is that the developers mostly write the test harness code, but business domain experts and more likely testers will own the content and meaning of the BDD specifications. Working in this manner, we should be able to treat the BDD specifications as the official tests for the business rule behavior even though this doesn’t run the entire process.

So that handles the business rules, now on to the rest of the Transaction Processor. The “controller” code in the diagram is playing a coordination role to mediate between the business rules and the code that interacts directly with the 3rd Party Application’s external API endpoints. I’d mostly use unit tests and maybe even *gasp* interaction testing with mock objects to test out the workflow and error handling of this code.

The service gateway code that interacts with the 3rd Party Application was extremely problematic in both development and testing. In retrospect, I wish we’d hammered at this code in isolation much more before even bothering trying to run end to end tests. The big issue we never pushed through (yet) was how to establish known system state in the 3rd Party Application so that we could write reliable automated tests around just the service gateway code in the Transaction Processor. I think it would be worthwhile for domain experts and/or testers to be involved in this step as well to verify the expected results are really happening in the 3rd Party Application.

Lastly, I’d opt to do some bigger tests for just the Transaction Processor where you directly enqueue the transaction messages in Rabbit MQ and test the entire Transaction Processor stack all the way down to the external dependencies. The point of these tests is to prove out the integrations and configuration; you don’t try to recreate all the business rules functionality already covered by the smaller, faster unit tests.
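For the “directly enqueue the transaction messages” part, the test harness doesn’t need anything fancy. As a rough sketch, and assuming the pre-7.0 RabbitMQ.Client API with a made-up queue name and message shape, it could be as simple as:

    using System.Text.Json;
    using RabbitMQ.Client;

    public static class TransactionTestPublisher
    {
        // Pushes a test transaction message straight onto the queue that the
        // Transaction Processor subscribes to, bypassing the File Reader entirely
        public static void PublishTestTransaction(string accountNumber, decimal amount)
        {
            var factory = new ConnectionFactory { HostName = "localhost" };

            using var connection = factory.CreateConnection();
            using var channel = connection.CreateModel();

            // "incoming-transactions" is a placeholder; use whatever queue the
            // real subscriber is actually bound to
            channel.QueueDeclare("incoming-transactions", durable: true, exclusive: false, autoDelete: false);

            var body = JsonSerializer.SerializeToUtf8Bytes(new
            {
                AccountNumber = accountNumber,
                Amount = amount
            });

            channel.BasicPublish(exchange: "", routingKey: "incoming-transactions", basicProperties: null, body: body);
        }
    }

The interesting part of these tests isn’t the publishing, it’s detecting when the Transaction Processor has finished its asynchronous work, which is the same “knowing when to assert” problem discussed in the end to end section below.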

 

“Some” End to End Tests

There are absolutely some issues that can only be tested through true, end to end tests. Integrations, configuration, environments, and security are examples. We’ll still write and perform some end to end tests, but we won’t try to recreate the business functionality coverage that’s already handled by the finer grained tests.

No matter what though, the tests need to be as easily repeatable as possible so there’s still going to be a level of automation to speed things along. Here’s my thoughts on what that might look like:

  • The flat file format was originally used by mainframe applications, so as you can imagine, it’s not remotely user friendly to edit or read. I’d suggest using some custom code that can transform a much simpler format to the mainframe-friendly format so the testers can write new test cases more efficiently and everyone else can actually read and understand the test inputs
  • The undeniable, cardinal rule of automated testing is that you have to have known inputs and expected outcomes. In this system, that means being able to set up the 3rd Party Application in a known state for each end to end testing scenario. The failure to do that (not a technical impossibility, but it’s a long story) is my single biggest regret from this project. See My Opinions on Data Setup for Functional Tests for more on what I recommend for test input data.
  • Automating the testing of asynchronous workflows like this system can be very challenging. The biggest issue is making an automated test harness understand when the work is really done across multiple systems so it can proceed to the “assert” part of the standard “arrange, act, assert” test workflow. I’ve had some success with this in the past by making the test harness listen to the application logging or poll for some kind of visible side effect, like data being written to a database, to “know” when the work is complete (see the sketch just after this list).
  • Tests do fail from time to time, so I’d actually try to have the end to end test harness able to gather up the relevant logs for all the systems active in the test. That’s even more valuable if you can somehow manage to correlate the logging activity with only the active test run.
  • Finally, the big expensive end to end tests my client has to follow for official certification and auditing? Yeah, you have to do that, but my very strong recommendation and where I think they’re starting to head is to use finer-grained and more efficient testing techniques to remove problems first. Then come back and do the laboriously slow audit tests when you can justifiably expect success with few iterations.
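As referenced in the list above, here’s a bare-bones sketch of the kind of polling helper I mean when I talk about waiting on a visible side effect. It assumes nothing about the client’s system beyond a caller-supplied condition, so treat all the names here as placeholders:

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    public static class Wait
    {
        // Polls a caller-supplied condition (a database query, an HTTP check,
        // scanning captured log output...) until it returns true or the timeout
        // expires, so the test only moves on to its assertions when the
        // asynchronous work is actually done
        public static async Task UntilAsync(
            Func<Task<bool>> condition,
            TimeSpan? timeout = null,
            TimeSpan? pollingInterval = null)
        {
            var limit = timeout ?? TimeSpan.FromSeconds(30);
            var interval = pollingInterval ?? TimeSpan.FromMilliseconds(250);
            var stopwatch = Stopwatch.StartNew();

            while (stopwatch.Elapsed < limit)
            {
                if (await condition())
                {
                    return;
                }

                await Task.Delay(interval);
            }

            throw new TimeoutException(
                $"The expected condition was not met within {limit.TotalSeconds} seconds");
        }
    }

A test would then call something like Wait.UntilAsync(() => SubTransactionsExistFor(accountNumber)) before running its assertions, where that condition method is whatever hypothetical database or API check proves the work really completed.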

 

 

Summary

There’s a couple big points I wanted to drive home in this post:

  • Embrace the test pyramid idea, and try to get over any aversion to white box testing because of its advantages for efficiency
  • Treat testing as a feedback mechanism more than a certification process
  • Tests of all types need to be repeatable to be effective feedback. Manual testing, and especially manual testing where it’s time consuming to set up the necessary system state first, is not very repeatable
  • I think you need to embrace the Agile idea of blurring the lines between roles. Developers and architects need to be involved in the automated testing for a better chance of success. Testers may need to get their hands dirty directly in the code or at least exploit their knowledge of the coding internals in order to make the testing more efficient
  • Developers, testers, and architects need to collaborate to be truly successful in testing. Waterfall style testing where all testing happens at the end is just not the way to be successful
  • Try to avoid duplicating effort between developer written tests and the tester activity, which might be just yet another way of saying the testers and developers need to be collaborating as the project goes on
  • Feedback cycles of all kinds are valuable for quality software

My integration testing challenges this week

EDIT 3/12: All these tests are fixed, and it wasn’t as bad as I’d thought it was going to be, but the extra logging and the faster cycle of changing code and re-running tests under the debugger definitely helped.

This was meant to be a short post, or at least an easy to write post on my part, but it spilled out from integration testing and into some Storyteller mechanics, semi-advanced xUnit.Net usage, ASP.Net Core logging integration, and even a tepid defense of compositional architectures wrapped around an IoC container.

I’ve been working feverishly the past couple of months to push Jasper to a 1.0 release. As (knock on wood) the last big epic, I’ve been working on a large overhaul and redesign of the message persistence behind Jasper’s durable messaging. The existing code was fairly well covered by integration tests, so I felt confident that I could make the large scale changes and use the big bang integration tests to ensure the intended functionality still worked as designed.

I assume that y’all can guess how this has turned out. After a week of code changes and fixing any and all unit test and intermediate integration test failures, I got to the point where I was ready to run the big bang integration tests and, get this, they didn’t pass on the first attempt! I know, shocking right? Moreover, the tests involve all kinds of background processing and even multiple logical applications (think ASP.Net Core IWebHost objects) being started up and shut down during the tests, so it wasn’t exactly easy to spot the source of my test failures.

I thought it’d be worth talking about how I first stopped and invested in improving my test harness, both to make it faster to launch and to capture a lot more information about what’s going on in all the multithreading/asynchronous madness, but…

First, an aside on Debugger Hell

You probably want to have some kind of debugger in your IDE of choice. You also want to be reasonably good with that tool, know how to use its features, and be conversant in its keyboard shortcuts. At the same time, you want to try really hard to avoid needing your debugger too often, because that’s just not an efficient way to get things done. Many folks in the early days of Agile development, including me, described debugger usage as a code or testing “smell.” And while “smell” is mostly used today as a pejorative to put down something that you just don’t like, it was originally just meant as a warning sign that something in your code or approach deserves more attention.

In the case of debugger usage, it might be telling you that your testing approach needs to be more fine grained or that you are missing crucial test coverage at lower levels. In my case, I’ll be looking for places where I’m missing smaller tests on elements of the bigger system and fill in those gaps before getting too worked up trying to solve the big integration test failures.

Storyteller Test Harness

For these big integration tests, I’m using Storyteller as my testing tool (think Cucumber, but much more optimized for integration testing as opposed to being cute). With Storyteller, I’ve created a specification language that lets me script out message failover scenarios like the one shown below (which is currently failing as I write this):

[Screenshot: JasperFailoverSpec]

In the specification above, I’m starting and stopping Jasper applications to prove out Jasper’s ability to fail over and recover pending work from one running node to another using its Marten message persistence option. At runtime, Jasper has a background “durability agent” constantly running that is polling the database to determine if there is any unclaimed work to do or known application nodes are down (using advisory locks through Postgresql if anybody would ever be interested in a blog post about just that). Hopefully that’s enough description just to know that this isn’t particularly an easy scenario to test or code.

In my initial attempts to diagnose the failing Storyteller tests I bumped into a couple problems and sources of friction:

  1. It was slow and awkward mechanically to get the tests running under the debugger (that’s been a long standing problem with Storyteller and I’m finally happy with the approach shown later in this post)
  2. I could tell quickly that exceptions were being thrown and logged in the background processing, but I wasn’t capturing that log output in any kind of usable way

Since I’m pretty sure that the tests weren’t going to get resolved quickly and that I’d probably want to write even more of these damn tests, I believed that I first needed to invest in better visibility into what was happening inside the code and a much quicker way to cycle into the debugger. To that end, I took a little detour and worked on some Storyteller improvements that I’ve been meaning to do for quite a while.

Incorporating Logging into Storyteller Results

Jasper uses the ASP.Net Core logging abstractions for its own internal logging, but I didn’t have anything configured except for Debug and Console tracing to capture the logs being generated at runtime. Even with the console output, what I really wanted was all the log information correlated with both the individual test execution and which receiver or sender application the logging was from.

Fortunately, Storyteller has an extensibility model to capture custom logging and instrumentation directly into its test results. It turned out to be very simple to whip together an adapter for ASP.Net Core logging that captured the logging information in a way that can be exposed by Storyteller.

You can see the results in the image below. The table below is just showing all the logging messages received by ILogger within the “Receiver1” application during one test execution. The yellow row is an exception that was logged during the execution that I might not have been able to sense otherwise.

[Screenshot: AspNetLoggingInStoryteller]

For the implementation, ASP.Net Core exposes the ILoggerProvider service such that you can happily plug in as many logging strategies as you want into an application in a combinatorial way. On the Storyteller side of things, you have the Report interface that lets you plug in custom logging that can expose HTML output into Storyteller’s results.

Implementing that crudely came out as a single class that implements both adapter interfaces (here’s a gist of the whole thing):

public class StorytellerAspNetCoreLogger : Report, ILoggerProvider

The actual logging just tracks all the calls to ILogger.Log() as little model objects in memory:

public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception exception, Func<TState, Exception, string> formatter)
{
    var logRecord = new LogRecord
    {
        Category = _categoryName,
        Level = logLevel.ToString(),
        Message = formatter(state, exception),
        ExceptionText = exception?.ToString()
    };

    // Just keep all the log records in an in memory list
    _parent.Records.Add(logRecord);
}

Fortunately enough, in the Storyteller Fixture code for the test harness I bootstrap the receiver and sender applications per test execution, so it’s really easy to just add the new StorytellerAspNetCoreLogger to both the Jasper applications and the Storyteller test engine:

var registry = new ReceiverApp();
registry.Services.AddSingleton<IMessageLogger>(_messageLogger);

var logger = new StorytellerAspNetCoreLogger(key);

// Tell Storyteller about the new logger so that it'll be
// rendered as part of Storyteller's results
Context.Reporting.Log(logger);

// This is bootstrapping a Jasper application through the 
// normal ASP.Net Core IWebHostBuilder
return JasperHost
    .CreateDefaultBuilder()
    .ConfigureLogging(x =>
    {
        x.SetMinimumLevel(LogLevel.Debug);
        x.AddDebug();
        x.AddConsole();
        
        // Add the logger to the new Jasper app
        // being built up
        x.AddProvider(logger);
    })

    .UseJasper(registry)
    .StartJasper();

And voila, the logging information is now part of the test results in a useful way so I can see a lot more information about what’s happening during the test execution.

It sucks that my code is throwing exceptions instead of just working, but at least I can see what the hell is going wrong now.

Get the debugger going quickly

To be honest, the friction of getting Storyteller tests running under a debugger has always been a drawback to Storyteller — especially compared to how fast that workflow is with tools like xUnit.Net that integrate seamlessly into your IDE. You’ve always been able to just attach your debugger to the running Storyteller process, but I’ve always found that to be clumsy and slow — especially when you’re trying to quickly cycle between attempted fixes and re-running the tests.

I made some attempts in Storyteller 5 to improve the situation (after we gave up on building a dotnet test adapter because that model is bonkers), but that still takes some set up time to make it work and even I have to always run to the documentation to remember how to do it. Sometime this weekend the solution for a quick xUnit.Net execution wrapper around Storyteller popped into my head and it honestly took about 15 minutes flat to get things working so that I could kick off individual Storyteller specifications from xUnit.Net as shown below:

[Screenshot: StorytellerWithinXUnit]

Maybe that’s not super exciting, but the end result is that I can rerun a specification after making changes with or without debugging with nothing but a simple keyboard shortcut in the IDE. That’s a dramatically faster feedback cycle than what I had to begin with.

Implementation wise, I just took advantage of xUnit.Net’s [MemberData] feature for parameterized tests and Storyteller’s StorytellerRunner class that was built to allow users to run specifications from their own code. After adding a new xUnit.Net test project and referencing the original Storyteller specification project named “StorytellerSpecs”, I added the code file shown below in its entirety:

// This only exists as a hook to dispose the static
// StorytellerRunner that is hosting the underlying
// system under test at the end of all the spec
// executions
public class StorytellerFixture : IDisposable
{
    public void Dispose()
    {
        Runner.SpecRunner.Dispose();
    }
}

public class Runner : IClassFixture<StorytellerFixture>
{
    internal static readonly StoryTeller.StorytellerRunner SpecRunner;

    static Runner()
    {
        // I'll admit this is ugly, but this establishes where the specification
        // files live in the real StorytellerSpecs project
        var directory = AppContext.BaseDirectory
            .ParentDirectory()
            .ParentDirectory()
            .ParentDirectory()
            .ParentDirectory()
            .AppendPath("StorytellerSpecs")
            .AppendPath("Specs");

        SpecRunner = new StoryTeller.StorytellerRunner(new SpecSystem(), directory);
    }

    // Discover all the known Storyteller specifications
    public static IEnumerable<object[]> GetFiles()
    {
        var specifications = SpecRunner.Hierarchy.Specifications.GetAll();
        return specifications.Select(x => new object[] {x.path}).ToArray();
    }

    // Use a touch of xUnit.Net magic to be able to kick off and
    // run any Storyteller specification through xUnit
    [Theory]
    [MemberData(nameof(GetFiles))]
    public void run_specification(string path)
    {
        var results = SpecRunner.Run(path);
        if (!results.Counts.WasSuccessful())
        {
            SpecRunner.OpenResultsInBrowser();
            throw new Exception(results.Counts.ToString());
        }
    }
}

And that’s that. Something I’ve wanted to have for ages and failed to build, done in 15 minutes because I happened to remember something similar we’d done at work and realized how absurdly easy xUnit.Net made this effort.

Summary

  • Sometimes it’s worthwhile to take a step back from trying to solve a problem through debugging and invest in better instrumentation or write some automation scripts to make the debugging cycles faster rather than just trying to force your way through the solution
  • Inspiration happens at random times
  • Listen to that niggling voice in your head sometimes that’s telling you that you should be doing things differently in your code or tests
  • Using a modular architecture that’s composed by an IoC container the way that ASP.Net Core does can sometimes be advantageous in integration testing scenarios. Case in point: how easy it was for me to toss in an all new logging provider that captured the log information directly into the test results for easier test failure resolution

And when I get around to it, these little Storyteller improvements will end up in Storyteller itself.

Subcutaneous Testing against React + .Net Applications

Everything in this post is from a proof of concept project we did for the technique described here. We have not used this tooling on a real project yet, but we have a team starting a project where this might be useful, so I promised a write up for them.

In my previous post I laid out how I see the testing pyramid and test tool and technique choices against my company’s typical web application technology stack. As a reminder, our recommended stack for new development on web applications or API’s looks like this (plus a backing database):

[Diagram: Slide1 - recommended web application stack]

Last week I talked through how we might test the React components and Redux store setup, including the interaction between Redux and React. I also talked about how we could go about testing the .Net backend both at a unit level and through integration tests through to the backing database. Lastly, I said we’d use a modicum of end to end, Selenium-based tests, but said that we should avoid depending on too many of those kinds of tests. That leaves us with a pretty big hole in coverage against the interaction between the Javascript code running in the browser and the .Net code and database interactions running server side.

As a possible solution for this gap, my team at work did a proof of concept for using Storyteller to do subcutaneous testing against the full application stack, but minus the actual React component “view layer.” The general idea is to use Storyteller with its Storyteller.Redux extension to host the ASP.Net Core application so that it can easily drive test data input through the real data layer of the .Net code, then turn around and use the real system services to verify the state of the application and the backing database in the “assert” stage of the tests. The basic value proposition here is that this mechanism could be far more efficient in terms of developer time, relative to its benefits, than end to end, Selenium based testing. We’re also theorizing that the feedback cycles would be much tighter through faster and more reliable tests than the equivalent tests against the browser ever could be.

A couple things to note or argue:

  • This technique would be most useful if your React components are generally dumb and only communicate with the external world by dispatching well defined actions to the Redux store (I’m assuming that you’re utilizing Redux middleware like redux-thunk or redux-saga here).
  • Why Storyteller as the driver for this instead of another test runner? I’m obviously biased, but I think Storyteller has the very best story in test automation tooling for declarative set up and verification of system state. Plus, unlike any of the xUnit tools I’m aware of, Storyteller is built specifically with integration testing in mind (think configurable retries, bailing out on runaway tests, better control over the lifecycle of the test harness)
  • Storyteller has support for declarative assertions against a JSON document that should be handy for making assertions against the Redux store state
  • We’re theorizing that it’ll be vastly easier to make assertions against the Redux store state than it would to hunt down DOM elements with Selenium
  • The Storyteller.Redux extension subscribes to any changes to the store state and exposes that to the Storyteller test engine. The big win here is that it gives you a single mechanism to handle the dreaded “everything is asynchronous so how does the test harness know when it’s time to check the expected outcomes” problem that makes Selenium testing so dad gum hard in the real world.
  • The Storyteller.Redux extension can capture any logged messages to console.log or console.error in the running browser. Add that to any server side logging that you can also pipe into the Storyteller results

The general topology in these tests would look like this:

[Diagram: Slide2 - subcutaneous test topology]

The test harness would consist of:

  1. A Storyteller project that bootstraps the ASP.Net Core application and runs it within the Storyteller test engine. You can use the Storyteller.AspNetCore extension to make that easier (or you could after I update it for ASP.Net Core 2 and its breaking changes).
  2. The Storyteller.Redux extension for Storyteller provides the Websockets glue to communicate between the launched browser with your Redux store and the running Storyteller engine
  3. The Storyteller ISystem in this project has to have some way to launch a web browser to the page that hosts the Javascript bundle. In the proof of concept project, I just built out a static HTML page that included the bundle Javascript and directly launched the browser to the file location, but you could always use Selenium just to open the browser and navigate to the right Url.
  4. Storyteller Fixtures for setting up system state for tests, sending Redux actions directly to the running Redux store to simulate user interactions, asserting on the expected system state on the backend, and checking the expected Redux store state
  5. An alternative Javascript bundle that includes all the reducer and middleware code in your application, along with some “special sauce” code shown in a section down below that enables Storyteller to send messages and retrieve the current state of the running Redux store via Websockets.

The Special Sauce in the Javascript Bundle

Your custom bundle for the subcutaneous testing would need to have this code in its Webpack entry point file (the full file is on GitHub here):

// "store" is your configured Redux store object. 
// "transformState" is just a hook to convert your Redux
// store state to something that Storyteller could consume
function ReduxHarness(store, transformState){
    if (!transformState){
        transformState = s => s;
    }

    function getQueryVariable(variable)
    {
       var query = window.location.search.substring(1);
       var vars = query.split("&");
       for (var i=0;i<vars.length;i++) {
           var pair = vars[i].split("=");
           if(pair[0] == variable){return pair[1];}
       }
       return(false);
    }

    var revision = 1;
    var port = getQueryVariable('StorytellerPort');
    var wsAddress = "ws://127.0.0.1:5250";
    var socket = new WebSocket(wsAddress);

    socket.onclose = function(){
        console.log('The socket closed');
    };

    socket.onerror = function(evt){
        console.error(JSON.stringify(evt));
    }

    socket.onmessage = function(evt){
        if (evt.data == 'REFRESH'){
            window.location.reload();
            return;
        }

        if (evt.data == 'CLOSE'){
            window.close();
            return;
        }

        var message = JSON.parse(evt.data);
        console.log('Got: ' + JSON.stringify(message) + ' with topic ' + message.type);

        store.dispatch(message);
    };

    store.subscribe(() => {
        var state = store.getState();

        revision = revision + 1;
        var message = {
            type: 'redux-state',
            revision: revision,
            state: transformState(state)
        }

		if (socket.readyState == 1){
            var json = JSON.stringify(message);
            console.log('Sending to engine: ' + json);
			socket.send(json);
		}
    });

    // Capturing any kind of client side logging
    // and piping that into the Storyteller test results
    var originalLog = console.log;
    console.log = function(msg){
        originalLog(msg);

        var message = {
            type: 'console.log',
            text: msg
        }

        var json = JSON.stringify(message);
        socket.send(json);
    }

    // Capture any logged errors in the JS code
    // and pipe that into the Storyteller results
    var originalError = console.error;
    console.error = function(e){
        originalError(e);

        var message = {
            type: 'console.error',
            error: e
        }

        var json = JSON.stringify(message);
        socket.send(json);
    }
}


ReduxHarness(store, s => s.toJS())

The Storyteller System

In my proof of concept, I connected Storyteller to the Redux testing bundle like this (the real code is here):

    public class Program
    {
        public static void Main(string[] args)
        {
            StorytellerAgent.Run(args, new ReduxSampleSystem());
        }
    }

    public class ReduxSampleSystem : SimpleSystem
    {
        protected override void configureCellHandling(CellHandling handling)
        {
            // The code below is just to generate the static file I'm 
            // using to host the reducer + websockets code
            var directory = AppContext.BaseDirectory;
            while (Path.GetFileName(directory) != "ReduxSamples")
            {
                directory = directory.ParentDirectory();
            }

            var jsFile = directory.AppendPath("reduxharness.js");
            Console.WriteLine("Copying the reduxharness.js file to " + directory);
            var source = directory.AppendPath("..", "StorytellerRunner", "reduxharness.js");


            File.Copy(source, jsFile, true);

            var harnessPath = directory.AppendPath("harness.htm");
            if (!File.Exists(harnessPath))
            {
                var doc = new HtmlDocument();

                var href = "file://" + jsFile;

                doc.Head.Add("script").Attr("src", href);

                Console.WriteLine("Writing the harness file to " + harnessPath);
                doc.WriteToFile(harnessPath);
            }

            var url = "file://" + harnessPath;

            // Add the ReduxSagaExtension and point it at your view
            handling.Extensions.Add(new ReduxSagaExtension(url));
        }
    }

The static HTML file generation above isn’t mandatory. You *could* do that by running the real page from the instance of the application hosted within Storyteller as long as the ReduxHarness function shown above is applied to your Redux store at some point.

Storyteller Fixtures that Drive or Check the Redux Store

For driving and checking the Redux store, we created a helper class called ReduxFixture that enables you to do simple actions and value checks in a declarative way as shown below:

    public class CalculatorFixture : ReduxFixture
    {
        // There's a little bit of magic here. This would send a JSON action
        // to the Redux store like {"type": "multiply", "operand": "5"}
        [SendJson("multiply")]
        public void Multiply(int operand)
        {

        }

        // Does an assertion against a single value within the current state
        // of the redux store using a JSONPath expression
        public IGrammar CheckValue()
        {
            return CheckJsonValue("$.number", "The current number should be {number}");
        }

    }

You can of course skip the built in helpers and send JSON actions directly to the running browser or write your own assertions against the current state of the Redux store. There’s also some built in functionality in the ReduxFixture class to track Redux store revisions and to wait for any change to the Redux store before performing assertions.