Automating Integration Tests using the “Critter Stack”

This builds on the previous blog posts in this list:

Integration Testing, but How?

Some time over the holidays Jim Shore released an updated version of his excellent paper Testing Without Mocks: A Pattern Language. He also posted a truly massive thread with some provocative opinions about test automation strategies.

I think it’s a great thread overall, and the paper is chock full of provocative thoughts about designing for testability. Moreover, some of the older content in that paper is influencing the direction of my own work with Wolverine. I’ve also made it recommended reading for the developers in my own company.

All that being said, I strongly disagree with the approach he describes for integration testing with “nullable infrastructure” and with eschewing DI/IoC for composition in favor of just willy-nilly hard coding things because “DI is scary” or whatever. My strong preference, and also where I’ve had the most success, is to purposely choose development technologies that lend themselves to low friction, reliable, and productive integration testing.

And as it just so happens, the “critter stack” tools (Marten and Wolverine) that I work on are purposely designed for testability and include several features specifically to make integration testing more effective for applications using these tools.

Integration Testing with the Critter Stack

In my previous blog posts linked above, I’ve been showing a very simplistic banking system to demonstrate the usage of Wolverine with Marten. For a testing scenario, let’s go back to part of this message handler for a WithdrawFromAccount message that will effect changes on an Account document entity and potentially send out other messages to perform other actions:

    [Transactional] 
    public static async Task Handle(
        WithdrawFromAccount command, 
        Account account, 
        IDocumentSession session, 
        IMessageContext messaging)
    {
        account.Balance -= command.Amount;
     
        // This just marks the account as changed, but
        // doesn't actually commit changes to the database
        // yet. That actually matters as I hopefully explain
        session.Store(account);
 
        // Conditionally trigger other, cascading messages
        if (account.Balance > 0 && account.Balance < account.MinimumThreshold)
        {
            await messaging.SendAsync(new LowBalanceDetected(account.Id));
        }
        else if (account.Balance < 0)
        {
            await messaging.SendAsync(new AccountOverdrawn(account.Id), new DeliveryOptions{DeliverWithin = 1.Hours()});
         
            // Give the customer 10 days to deal with the overdrawn account
            await messaging.ScheduleAsync(new EnforceAccountOverdrawnDeadline(account.Id), 10.Days());
        }
        
        // "messaging" is a Wolverine IMessageContext or IMessageBus service 
        // Do the deliver within rule on individual messages
        await messaging.SendAsync(new AccountUpdated(account.Id, account.Balance),
            new DeliveryOptions { DeliverWithin = 5.Seconds() });
    }

For a little more context, I’ve set up a Minimal API endpoint to delegate to this command like so:

// One Minimal API endpoint that just delegates directly to Wolverine
app.MapPost("/accounts/withdraw", (WithdrawFromAccount command, IMessageBus bus) => bus.InvokeAsync(command));

In the end here, I want a set of integration tests that works through the /accounts/withdraw endpoint, through all ASP.NET Core middleware, all configured Wolverine middleware or policies that wrap around that handler above, and verifies the expected state changes in the underlying Marten Postgresql database as well as any messages that I would expect to go out. And oh, yeah, I’d like those tests to be completely deterministic.

First, a Shared Test Harness

I’m starting to be interested in moving back to NUnit for the first time in years strictly for integration testing because I’m starting to suspect it would give you more control over the test fixture lifecycle in ways that are frequently valuable in integration testing.

Now, before writing the actual tests, I’m going to build an integration test harness for this system. I prefer to use xUnit.Net these days as my test runner, so we’re going to start with building what will be a shared fixture to run our application within integration tests. To be able to test through HTTP endpoints, I’m also going to add another JasperFx project named Alba to the testing project (See Alba for Effective ASP.Net Core Integration Testing for more information):

public class AppFixture : IAsyncLifetime
{
    public async Task InitializeAsync()
    {
        // Workaround for Oakton with WebApplicationBuilder
        // lifecycle issues. Doesn't matter to you w/o Oakton
        OaktonEnvironment.AutoStartHost = true;
        
        // This is bootstrapping the actual application using
        // its implied Program.Main() set up
        Host = await AlbaHost.For<Program>(x =>
        {
            // I'm overriding some service registrations
            // just for the test run
            x.ConfigureServices(services =>
            {
                // Let's just take any pesky message brokers out of
                // our integration tests for now so we can work in
                // isolation
                services.DisableAllExternalWolverineTransports();
                
                // Just putting in some baseline data for our database
                // There's usually *some* sort of reference data in 
                // enterprise-y systems
                services.InitializeMartenWith<InitialAccountData>();
            });
        });
    }

    public IAlbaHost Host { get; private set; }

    public Task DisposeAsync()
    {
        return Host.DisposeAsync().AsTask();
    }
}

There’s a bit to unpack in that class above, so let’s start:

  • A .NET IHost can be expensive to set up in memory, so in any kind of sizable system I will try to share one single instance of that between integration tests.
  • The AlbaHost mechanism is using WebApplicationFactory to bootstrap our application. This mechanism allows us to make some modifications to the application’s normal bootstrapping for test specific setup, and I’m exploiting that here.
  • The `DisableAllExternalWolverineTransports()` method is a built in extension method in Wolverine that disables all sending to and listening on external transports like Rabbit MQ. That’s not to say that Rabbit MQ itself is necessarily impossible to use within automated tests (Wolverine even comes with some help for that in testing as well), but it’s certainly easier to create our tests without having to worry about messages coming and going from outside. Don’t worry though, because we’ll still be able to verify the messages that should be sent out later.
  • I’m using Marten’s “initial data” functionality, which is a way of establishing baseline data (usually reference data, but for testing you might also include a baseline set of test user data). For more context, `InitialAccountData` is shown below:

public class InitialAccountData : IInitialData
{
    public static Guid Account1 = Guid.NewGuid();
    public static Guid Account2 = Guid.NewGuid();
    public static Guid Account3 = Guid.NewGuid();
    
    public Task Populate(IDocumentStore store, CancellationToken cancellation)
    {
        return store.BulkInsertAsync(accounts().ToArray());
    }

    private IEnumerable<Account> accounts()
    {
        yield return new Account
        {
            Id = Account1,
            Balance = 1000,
            MinimumThreshold = 500
        };
        
        yield return new Account
        {
            Id = Account2,
            Balance = 1200
        };

        yield return new Account
        {
            Id = Account3,
            Balance = 2500,
            MinimumThreshold = 100
        };
    }
}

Next, just a little more xUnit.Net overhead. To make a shared fixture across multiple test classes with xUnit.Net, I add this little marker class:

[CollectionDefinition("integration")]
public class ScenarioCollection : ICollectionFixture<AppFixture>
{
    
}

I have to look this up every single time I use this functionality.
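
Since this is the part I always forget, here is a minimal, self-contained sketch of how the two halves of an xUnit.Net collection fixture fit together. The names `SharedFixture`, `SharedCollection`, and `first_tests` are hypothetical and only there for illustration. The point is that the marker class declares the collection, and each test class has to opt into it by name with `[Collection]` before xUnit.Net will inject the shared fixture into its constructor:

```csharp
using System;
using Xunit;

// Hypothetical fixture: xUnit.Net creates one instance and
// shares it across every test class in the collection
public class SharedFixture : IDisposable
{
    public Guid InstanceId { get; } = Guid.NewGuid();

    public void Dispose()
    {
        // Tear down whatever expensive resource the fixture owns
    }
}

// The marker class that names the collection
[CollectionDefinition("shared")]
public class SharedCollection : ICollectionFixture<SharedFixture>
{
}

// The consuming side: the name has to match the definition above
[Collection("shared")]
public class first_tests
{
    private readonly SharedFixture _fixture;

    public first_tests(SharedFixture fixture)
    {
        _fixture = fixture;
    }

    [Fact]
    public void receives_the_shared_fixture()
    {
        Assert.NotEqual(Guid.Empty, _fixture.InstanceId);
    }
}
```

Without the `[Collection("shared")]` attribute, xUnit.Net has no way to resolve the constructor argument and the test class fails before any test runs.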

For integration testing, I like to have a slim base class that I tend to quite originally call “IntegrationContext” like this one:

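// Joins the "integration" collection defined above so that
// xUnit.Net injects the shared AppFixture into the constructor
[Collection("integration")]
public abstract class IntegrationContext : IAsyncLifetime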
{
    public IntegrationContext(AppFixture fixture)
    {
        Host = fixture.Host;
        Store = Host.Services.GetRequiredService<IDocumentStore>();
    }
    
    public IAlbaHost Host { get; }
    public IDocumentStore Store { get; }
    
    public async Task InitializeAsync()
    {
        // Using Marten, wipe out all data and reset the state
        // back to exactly what we described in InitialAccountData
        await Store.Advanced.ResetAllData();
    }

    // This is required because of the IAsyncLifetime 
    // interface. Note that I do *not* tear down database
    // state after the test. That's purposeful
    public Task DisposeAsync()
    {
        return Task.CompletedTask;
    }
}

Other than simply connecting real test fixtures to the ASP.Net Core system under test (the IAlbaHost), this IntegrationContext utilizes another bit of Marten functionality to completely reset the database state back to only the data defined by the InitialAccountData so that we always have known data in the database before tests execute.

By and large, I find NoSQL databases to be more easily usable in automated testing than purely relational databases because it’s generally easier to tear down and rebuild databases with NoSQL. When I’m having to use a relational database in tests, I opt for Jimmy Bogard’s Respawn library to do the same kind of reset, but it’s substantially more work to use than Marten’s built in functionality.

In the case of Marten, we very purposely designed in the ability to reset the database state for integration testing scenarios from the very beginning. Add this functionality to the easy ability to run the underlying Postgresql database in a local Docker container for isolated testing, and I’ll claim that Marten is very usable within test automation scenarios with no real need to try to stub out the database or use some kind of low fidelity fake in memory database in testing.

See My Opinions on Data Setup for Functional Tests for more explanation of why I’m doing the database state reset before all tests, but never immediately afterward. And also why I think it’s important to place test data setup directly into tests rather than trying to rely on any kind of external, expected data set (when possible).

From my first pass at writing the sample test that’s coming in the next section, I discovered the need for one more helper method on IntegrationContext to make HTTP calls to the system while also tracking background Wolverine activity as shown below:

    // This method allows us to make HTTP calls into our system
    // in memory with Alba, but do so within Wolverine's test support
    // for message tracking to both record outgoing messages and to ensure
    // that any cascaded work spawned by the initial command is completed
    // before passing control back to the calling test
    protected async Task<(ITrackedSession, IScenarioResult)> TrackedHttpCall(Action<Scenario> configuration)
    {
        IScenarioResult result = null;
        
        // The outer part is tying into Wolverine's test support
        // to "wait" for all detected message activity to complete
        var tracked = await Host.ExecuteAndWaitAsync(async () =>
        {
            // The inner part here is actually making an HTTP request
            // to the system under test with Alba
            result = await Host.Scenario(configuration);
        });

        return (tracked, result);
    }

The method above gives me access to the complete history of Wolverine messages during the activity including all outgoing messages spawned by the HTTP call. It also delegates to Alba to run HTTP requests in memory and gives me access to the Alba wrapped response for easy interrogation of the response later (which I don’t need in the following test, but would frequently in other tests).

See Test Automation Support from the Wolverine documentation for more information on the integration testing support baked into Wolverine.

Writing the first integration test

The first “happy path” test verifies that a call to the web service runs all the way through to the Wolverine message handler for withdrawing from an account, without tripping any of the low balance conditions. It might look like this:

public class when_debiting_an_account : IntegrationContext
{
    public when_debiting_an_account(AppFixture fixture) : base(fixture)
    {
    }

    [Fact]
    public async Task should_decrease_the_account_balance_happy_path()
    {
        // Drive in known data, so this is the "Arrange"
        var account = new Account
        {
            Balance = 2500,
            MinimumThreshold = 200
        };

        await using (var session = Store.LightweightSession())
        {
            session.Store(account);
            await session.SaveChangesAsync();
        }

        // The "Act" part of the test.
        var (tracked, _) = await TrackedHttpCall(x =>
        {
            // Send a JSON post with the WithdrawFromAccount command through the HTTP endpoint
            // BUT, it's all running in process
            x.Post.Json(new WithdrawFromAccount(account.Id, 1200)).ToUrl("/accounts/withdraw");

            // This is the default behavior anyway, but still good to show it here
            x.StatusCodeShouldBeOk();
        });
        
        // Finally, let's do the "assert"
        await using (var session = Store.LightweightSession())
        {
            // Load the newly persisted copy of the data from Marten
            var persisted = await session.LoadAsync<Account>(account.Id);
            persisted.Balance.ShouldBe(1300); // Started with 2500, debited 1200
        }

        // And also assert that an AccountUpdated message was published as well
        var updated = tracked.Sent.SingleMessage<AccountUpdated>();
        updated.AccountId.ShouldBe(account.Id);
        updated.Balance.ShouldBe(1300);

    }
}

The test above follows the basic “arrange, act, assert” model. In order, the test:

  1. Writes a brand new Account document to the Marten database
  2. Makes an HTTP call to the system to POST a WithdrawFromAccount command to our system using our TrackedHttpCall method that also tracks Wolverine activity during the HTTP call
  3. Verifies that the Account data was changed in the database the way we expected
  4. Verifies that an expected outgoing message was published as part of the activity

It was a lot of initial set up to get to the point where we could write tests, but I’m going to argue in the next section that we’ve done a lot to reduce the friction in writing additional integration tests for our system in a reliable way.
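
To make that a little more concrete, here is a sketch of what a second test for the low balance branch of the handler might look like when it reuses the same harness. This is an illustration rather than code from the sample project: the class and method names are hypothetical, the using directives for the application's own types are omitted, and it assumes `SingleMessage<T>()` fails the test unless exactly one message of that type was recorded:

```csharp
using System.Threading.Tasks;
using Alba;
using Marten;
using Shouldly;
using Wolverine.Tracking;
using Xunit;

// Hypothetical second test class. The attribute is redundant
// if the IntegrationContext base class already declares it
[Collection("integration")]
public class when_withdrawing_below_the_minimum_threshold : IntegrationContext
{
    public when_withdrawing_below_the_minimum_threshold(AppFixture fixture) : base(fixture)
    {
    }

    [Fact]
    public async Task should_publish_a_low_balance_message()
    {
        // Arrange: known data written directly by the test
        var account = new Account
        {
            Balance = 1000,
            MinimumThreshold = 500
        };

        await using (var session = Store.LightweightSession())
        {
            session.Store(account);
            await session.SaveChangesAsync();
        }

        // Act: withdrawing 600 leaves 400, which is above zero
        // but below the minimum threshold of 500
        var (tracked, _) = await TrackedHttpCall(x =>
        {
            x.Post.Json(new WithdrawFromAccount(account.Id, 600)).ToUrl("/accounts/withdraw");
        });

        // Assert: the new balance was persisted
        await using (var session = Store.LightweightSession())
        {
            var persisted = await session.LoadAsync<Account>(account.Id);
            persisted.Balance.ShouldBe(400);
        }

        // Assert: both expected messages went out
        tracked.Sent.SingleMessage<LowBalanceDetected>();
        tracked.Sent.SingleMessage<AccountUpdated>().Balance.ShouldBe(400);
    }
}
```

Everything specific to this scenario is the starting data, the amount withdrawn, and the assertions. The bootstrapping, database reset, and message tracking all come from the shared harness.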

Avoiding the Selenium as Golden Hammer Anti-Pattern

Playwright or Cypress.io may prove to be better options than Selenium over time (I’m bullish on Playwright myself), but the main point is really that only depending on end to end tests through the browser can easily be problematic and inefficient.

Before I go back to defending why I think the testing approach and tooling shown in this post is very effective, let’s build up an all too real strawman of inefficient and maybe even ineffective test automation:

  • All your integration tests are blackbox, end to end tests that use Selenium to drive a web browser
  • These tests can only be executed externally to the application when the application is deployed to a development or testing environment. In the worst case scenario — which is also unfortunately common — the Selenium tests cannot be easily executed locally on demand
  • The tests are prone to failures due to UI changes
  • The tests are prone to intermittent “blinking” failures due to asynchronous behavior in the UI where test assertions happen before actions are completed in the application. This is a source of major friction and poor results in large scale Selenium testing that has been endemic in every single shop or project where I’ve used or seen Selenium used over the past decade — including in my current role.
  • The end to end tests are slow compared to finer grained unit tests or smaller whitebox integration tests that do not have to use the browser
  • Test failures are often difficult to diagnose since the tests are running out of process without direct access to the actual application. Some folks try to alleviate this issue with screenshots of the browser or in more advanced usages, trying to correlate the application logs to the test runs
  • Test failures often happen because related test databases are not in the expected state

I’m laying it on pretty thick here, but I think that I’m getting my point across that only relying on Selenium based browser testing is potentially very inefficient and sometimes ineffective. Now, let’s consider how the “critter stack” tools and the testing approach I used up above solve some of the issues I raised just above:

  • Postgresql itself is very easy to run in Docker containers or if you have to, to deploy locally. That makes it friendly for automated testing where you really, really want to have isolated testing infrastructure and avoid sharing any kind of stateful resource between testing processes
  • Marten in particular has built in support for setting up known database states going into automated tests. This is invaluable for integration testing
  • Executing directly against HTTP API endpoints is much faster than browser testing with something like Selenium. Faster executing tests == faster feedback cycles == better development throughput and delivery period
  • Running the tests completely in process with the application such as we did with Alba makes debugging test failures much easier for developers than trying to solve Selenium failures in a CI environment
  • Using the Alba + xUnit.Net (or NUnit etc) approach means that the integration tests can live with the application code and can be executed on demand whenever. That shifts the testing “left” in the development cycle compared to the slower Selenium running on CI only cycle. It also helps developers quickly spot check potential issues.
  • By embedding the integration tests directly in the codebase, you’re much less likely to get the drift between the application itself and automated tests that frequently arises from Selenium centric approaches.
  • This approach forces developers to be involved with the test automation efforts. I strongly believe that it’s impossible for large scale test automation to work whatsoever without developer involvement
  • Whitebox tests are simply much more efficient than the blackbox model. This statement is likely to get me yelled at by real testing professionals, but it’s still true

This post took way, way too long to write compared to how I thought it would go. I’m going to make a little bonus followup on using Lamar of all things for other random test state resets.
