Useful Tricks with Lamar for Integration Testing

Earlier this week I started a new blog series on Wolverine & Marten capabilities.

Today I’m taking a left turn in Albuquerque to talk about how to inject fake services for external service gateways (or really just anything that turns out to be difficult to deal with in automated tests) in integration test scenarios for Wolverine applications, using some tricks in the underlying Lamar IoC container.

Since this is a headless service, I’m not too keen on introducing Alba or WebApplicationFactory and their humongous tail of ASP.Net Core dependencies. Instead, I made a mild change to the Program file of the main application to revert to the “old” pre-.NET 6 style of bootstrapping instead of the newer, implied Program.Main() style, strictly to facilitate integration testing:

public static class Program
{
    public static Task<int> Main(string[] args)
    {
        return CreateHostBuilder().RunOaktonCommands(args);
    }

    // This method is a really easy way to bootstrap the application
    // in testing later
    public static IHostBuilder CreateHostBuilder()
    {
        return Host.CreateDefaultBuilder()
            .UseWolverine((context, opts) =>
            {
                // And a lot of necessary configuration here....
            });
    }
}

Now, I’m going to start a new xUnit.Net project to test the main application (NUnit or MSTest would certainly be viable as well). In the testing project, I want to test the payment ingestion service from the prior blog posts with basically the exact same set up as the main application, with the exception of replacing the service gateway for the “very unreliable 3rd party service” with a stub that we can control at will during testing. That stub could look like this:

// More on this later...
public interface IStatefulStub
{
    void ClearState();
}

public class ThirdPartyServiceStub : IThirdPartyServiceGateway, IStatefulStub
{
    public Dictionary<Guid, LoanInformation> LoanInformation { get; } = new();
    
    public Task<LoanInformation> FindLoanInformationAsync(Guid loanId, CancellationToken cancellation)
    {
        if (LoanInformation.TryGetValue(loanId, out var information))
        {
            return Task.FromResult(information);
        }

        // I suppose you'd throw a more specific exception type, but I'm lazy, so....
        throw new ArgumentOutOfRangeException(nameof(loanId), "Unknown loan id");
    }

    public Task PostPaymentScheduleAsync(PaymentSchedule schedule, CancellationToken cancellation)
    {
        PostedSchedules.Add(schedule);
        return Task.CompletedTask;
    }

    public List<PaymentSchedule> PostedSchedules { get; } = new();
    public void ClearState()
    {
        PostedSchedules.Clear();
        LoanInformation.Clear();
    }
}

Now that we have a usable stub for later, let’s build up a test harness for our application. Right off the bat, I’m going to say that we won’t even try to run integration tests in parallel, so I’m going for a shared context that bootstraps the application’s IHost:

public class AppFixture : IAsyncLifetime
{
    public async Task InitializeAsync()
    {
        // This is bootstrapping the actual application using
        // its implied Program.Main() set up
        Host = await Program.CreateHostBuilder()
            // This is from Lamar, this will override the service registrations
            // no matter what order registrations are done. This was specifically
            // intended for automated testing scenarios
            .OverrideServices(services =>
            {
                // Override the existing application's registration with a stub
                // for the third party service gateway
                services.AddSingleton<IThirdPartyServiceGateway>(ThirdPartyService);
            }).StartAsync();

    }

    // Just a convenient way to get at this later
    public ThirdPartyServiceStub ThirdPartyService { get; } = new();

    public IHost Host { get; private set; }
 
    public Task DisposeAsync()
    {
        return Host.StopAsync();
    }
}

So a couple comments about the code up above:

  • I’m delegating to the Program.CreateHostBuilder() method from our real application to create an IHostBuilder that is exactly the application itself. I think it’s important to do integration tests as close to the real application as possible so you don’t get false positives or false negatives from some sort of different bootstrapping or configuration of the application.
  • That being said, it’s absolutely going to be a pain in the ass to use the real “unreliable 3rd party service” in integration testing, so it would be very convenient to have a nice, easily controlled stub or “spy” we can use to capture data sent to the 3rd party or to set up responses from the 3rd party service
  • And no, we don’t know if the application actually works end to end if we use the whitebox testing approach, and there are very likely going to be unforeseen issues when we integrate with the real 3rd party service. All that being said, it’s very helpful to first know that our code works exactly the way we intended it to before we tackle fully end to end tests.
  • But if this were a real project, I’d spike the actual 3rd party gateway code ASAP because that’s likely where the major project risk is. In the real life project this was based on, that gateway code was not under my purview at first and I might have gotten myself temporarily banned from the client site after finally snapping at the developer “responsible” for that after about a year of misery. Moving on!
  • Lamar is StructureMap’s descendant, but it’s nowhere near as loosey-goosey flexible about runtime service overrides as StructureMap. That was very purposeful on my part as that led to Lamar having vastly better (1-3 orders of magnitude improvement) performance, and also to reduce my stress level by simplifying the Lamar usage over StructureMap’s endlessly complicated rules for service overrides. Long story short, that requires you to think through in advance a little bit about what services are going to be overridden in tests and to frankly use that sparingly compared to what was easy in StructureMap years ago.

Next, I’ll add the necessary xUnit ICollectionFixture type that I almost always forget to do at first unless I’m copy/pasting code from somewhere else:

[CollectionDefinition("integration")]
public class ScenarioCollection : ICollectionFixture<AppFixture>
{
     
}

Now, I like to have a base class for integration tests that just adds a tiny bit of reusable helpers and lifecycle methods to clean up the system state before all tests:

public abstract class IntegrationContext : IAsyncLifetime
{
    public IntegrationContext(AppFixture fixture)
    {
        Host = fixture.Host;
        Store = Host.Services.GetRequiredService<IDocumentStore>();
        ThirdPartyService = fixture.ThirdPartyService;
    }

    public ThirdPartyServiceStub ThirdPartyService { get; set; }

    public IHost Host { get; }
    public IDocumentStore Store { get; }

    async Task IAsyncLifetime.InitializeAsync()
    {
        // Using Marten, wipe out all data and reset the database
        // state back to a known baseline between tests
        await Store.Advanced.ResetAllData();
        
        // Clear out all the stateful stub state too!
        // First, I'm getting at the broader Lamar service
        // signature to do Lamar-specific things...
        var container = (IContainer)Host.Services;

        // Find every possible service that's registered in Lamar that implements
        // the IStatefulStub interface, resolve them, and loop though them 
        // like so
        foreach (var stub in container.Model.GetAllPossible<IStatefulStub>())
        {
            stub.ClearState();
        }
    }
 
    // This is required because of the IAsyncLifetime 
    // interface. Note that I do *not* tear down database
    // state after the test. That's purposeful
    public Task DisposeAsync()
    {
        return Task.CompletedTask;
    }

}

And now, some comments about that bit of code. You generally want a clean slate of system state going into each test, and our stub for the 3rd party system is stateful, so we’d want to clear it out between tests to keep from polluting the next test. That’s what the `IStatefulStub` interface and the call to GetAllPossible() are helping us do with the Lamar container. If the system grows and we use more stubs, we can use that mechanism as a one stop shop to clear out any stateful objects in the container between tests.
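To make that concrete, here’s what a second stateful stub might look like. The IEmailSender abstraction and EmailSenderStub class are purely hypothetical here, just to show that any stub registered in the container gets picked up by that same cleanup loop as long as it implements IStatefulStub:

// Purely hypothetical -- IEmailSender isn't part of the real system, it's
// just standing in for "some other external dependency we stub out in tests"
public class EmailSenderStub : IEmailSender, IStatefulStub
{
    public List<string> SentEmails { get; } = new();

    public Task SendAsync(string to, string body)
    {
        SentEmails.Add($"{to}: {body}");
        return Task.CompletedTask;
    }

    // GetAllPossible<IStatefulStub>() finds this registration too, so the
    // reset in IntegrationContext clears it out between tests automatically
    public void ClearState() => SentEmails.Clear();
}

Register that through the same OverrideServices() call in AppFixture and nothing else in the test harness has to change.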

Lastly, here’s a taste of how the full test harness might be used:

[Collection("integration")]
public class ASampleTestHarness : IntegrationContext
{
    public ASampleTestHarness(AppFixture fixture) : base(fixture)
    {
    }

    [Fact]
    public async Task how_the_test_might_work()
    {
        // Do the Arrange and Act part of the tests....
        await Host.InvokeMessageAndWaitAsync(new PaymentValidated(new Payment()));

        // Our test *should* have posted a single payment schedule
        // within the larger workflow, and this will blow up if there's
        // none or many
        var schedule = ThirdPartyService.PostedSchedules.Single();
        
        // Write assertions against the expected data for the schedule maybe?
    }
}

The InvokeMessageAndWaitAsync() is baked into Wolverine’s test automation support.
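If you also want to assert on the messages that were published while handling that command, my understanding is that InvokeMessageAndWaitAsync() hands back Wolverine’s tracked session of message activity, so the test could be extended along these lines (PaymentScheduleCreated is a made up message type just for illustration):

        // Capture the tracked session from Wolverine's test support...
        var session = await Host.InvokeMessageAndWaitAsync(new PaymentValidated(new Payment()));

        // ...and interrogate the messages that were sent during the activity
        // (PaymentScheduleCreated is hypothetical -- use whatever message your
        // workflow is actually expected to publish)
        session.Sent.SingleMessage<PaymentScheduleCreated>()
            .ShouldNotBeNull();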

Summary and next time…

I don’t like piecing together special application bootstrapping in the test automation projects, as that tends to drift apart from the actual application over time. Instead, I’d rather use the application’s own bootstrapping — in this case how it builds up an IHostBuilder — then apply some limited number of testing overrides.

Lamar has a couple helpers for test automation, including the OverrideServices() method and the GetAllPossible() helper that can be useful for clearing out state between tests in stubs or caches or who knows what else in a systematic way.

So far I’ve probably mostly blogged about things that Wolverine does that other tools like NServiceBus, MassTransit, or MediatR do as well. Next time out, I want to go completely off road where those tools can’t follow and into Wolverine’s “compound handler” strategy for maximum testability using Jim Shore’s A-Frame Architecture approach.

Automating Integration Tests using the “Critter Stack”

This builds on the previous blog posts in this series.

Integration Testing, but How?

Some time over the holidays Jim Shore released an updated version of his excellent paper Testing Without Mocks: A Pattern Language. He also posted a truly massive thread with some provocative opinions about test automation strategies.

I think it’s a great thread over all, and the paper is chock full of provocative thoughts about designing for testability. Moreover, some of the older content in that paper is influencing the direction of my own work with Wolverine. I’ve also made it recommended reading for the developers in my own company.

All that being said, I strongly disagree with the approach he describes for integration testing with “nullable infrastructure” and eschewing DI/IoC for composition in favor of just willy nilly hard coding things because “DI is scary” or whatever. My strong preference, and also where I’ve had the most success, is to purposely choose to rely on development technologies that lend themselves to low friction, reliable, and productive integration testing.

And as it just so happens, the “critter stack” tools (Marten and Wolverine) that I work on are purposely designed for testability and include several features specifically to make integration testing more effective for applications using these tools.

Integration Testing with the Critter Stack

From my previous blog posts linked up above, I’ve been showing a very simplistic banking system to demonstrate the usage of Wolverine with Marten. For a testing scenario, let’s go back to part of this message handler for a WithdrawFromAccount message that will effect changes on an Account document entity and potentially send out other messages to perform other actions:

    [Transactional] 
    public static async Task Handle(
        WithdrawFromAccount command, 
        Account account, 
        IDocumentSession session, 
        IMessageContext messaging)
    {
        account.Balance -= command.Amount;
     
        // This just marks the account as changed, but
        // doesn't actually commit changes to the database
        // yet. That actually matters as I hopefully explain
        session.Store(account);
 
        // Conditionally trigger other, cascading messages
        if (account.Balance > 0 && account.Balance < account.MinimumThreshold)
        {
            await messaging.SendAsync(new LowBalanceDetected(account.Id));
        }
        else if (account.Balance < 0)
        {
            await messaging.SendAsync(new AccountOverdrawn(account.Id), new DeliveryOptions{DeliverWithin = 1.Hours()});
         
            // Give the customer 10 days to deal with the overdrawn account
            await messaging.ScheduleAsync(new EnforceAccountOverdrawnDeadline(account.Id), 10.Days());
        }
        
        // "messaging" is a Wolverine IMessageContext or IMessageBus service 
        // Do the deliver within rule on individual messages
        await messaging.SendAsync(new AccountUpdated(account.Id, account.Balance),
            new DeliveryOptions { DeliverWithin = 5.Seconds() });
    }

For a little more context, I’ve set up a Minimal API endpoint to delegate to this command like so:

// One Minimal API endpoint that just delegates directly to Wolverine
app.MapPost("/accounts/withdraw", (WithdrawFromAccount command, IMessageBus bus) => bus.InvokeAsync(command));

In the end here, I want a set of integration tests that work through the /accounts/withdraw endpoint, through all ASP.NET Core middleware, all configured Wolverine middleware or policies that wrap around that handler above, and verify the expected state changes in the underlying Marten Postgresql database as well as any messages that I would expect to go out. And oh, yeah, I’d like those tests to be completely deterministic.

First, a Shared Test Harness

I’m starting to be interested in moving back to NUnit for the first time in years strictly for integration testing because I suspect it would give you more control over the test fixture lifecycle in ways that are frequently valuable in integration testing.

Now, before writing the actual tests, I’m going to build an integration test harness for this system. I prefer to use xUnit.Net these days as my test runner, so we’re going to start with building what will be a shared fixture to run our application within integration tests. To be able to test through HTTP endpoints, I’m also going to add another JasperFx project named Alba to the testing project (See Alba for Effective ASP.Net Core Integration Testing for more information):

public class AppFixture : IAsyncLifetime
{
    public async Task InitializeAsync()
    {
        // Workaround for Oakton with WebApplicationBuilder
        // lifecycle issues. Doesn't matter to you w/o Oakton
        OaktonEnvironment.AutoStartHost = true;
        
        // This is bootstrapping the actual application using
        // its implied Program.Main() set up
        Host = await AlbaHost.For<Program>(x =>
        {
            // I'm overriding 
            x.ConfigureServices(services =>
            {
                // Let's just take any pesky message brokers out of
                // our integration tests for now so we can work in
                // isolation
                services.DisableAllExternalWolverineTransports();
                
                // Just putting in some baseline data for our database
                // There's usually *some* sort of reference data in 
                // enterprise-y systems
                services.InitializeMartenWith<InitialAccountData>();
            });
        });
    }

    public IAlbaHost Host { get; private set; }

    public Task DisposeAsync()
    {
        return Host.DisposeAsync().AsTask();
    }
}

There’s a bit to unpack in that class above, so let’s start:

  • A .NET IHost can be expensive to set up in memory, so in any kind of sizable system I will try to share one single instance of that between integration tests.
  • The AlbaHost mechanism is using WebApplicationFactory to bootstrap our application. This mechanism allows us to make some modifications to the application’s normal bootstrapping for test specific setup, and I’m exploiting that here.
  • The `DisableAllExternalWolverineTransports()` method is a built in extension method in Wolverine that will disable all external sending or listening to external transport options like Rabbit MQ. That’s not to say that Rabbit MQ itself is necessarily impossible to use within automated tests — and Wolverine even comes with some help for that in testing as well — but it’s certainly easier to create our tests without having to worry about messages coming and going from outside. Don’t worry though, because we’ll still be able to verify the messages that should be sent out later.
  • I’m using Marten’s “initial data” functionality that’s a way of establishing baseline data (reference data usually, but for testing you might also include a baseline set of test user data). For more context, `InitialAccountData` is shown below:
public class InitialAccountData : IInitialData
{
    public static Guid Account1 = Guid.NewGuid();
    public static Guid Account2 = Guid.NewGuid();
    public static Guid Account3 = Guid.NewGuid();
    
    public Task Populate(IDocumentStore store, CancellationToken cancellation)
    {
        return store.BulkInsertAsync(accounts().ToArray());
    }

    private IEnumerable<Account> accounts()
    {
        yield return new Account
        {
            Id = Account1,
            Balance = 1000,
            MinimumThreshold = 500
        };
        
        yield return new Account
        {
            Id = Account2,
            Balance = 1200
        };

        yield return new Account
        {
            Id = Account3,
            Balance = 2500,
            MinimumThreshold = 100
        };
    }
}

Next, just a little more xUnit.Net overhead. To make a shared fixture across multiple test classes with xUnit.Net, I add this little marker class:

[CollectionDefinition("integration")]
public class ScenarioCollection : ICollectionFixture<AppFixture>
{
    
}

I have to look this up every single time I use this functionality.

For integration testing, I like to have a slim base class that I tend to quite originally call “IntegrationContext” like this one:

public abstract class IntegrationContext : IAsyncLifetime
{
    public IntegrationContext(AppFixture fixture)
    {
        Host = fixture.Host;
        Store = Host.Services.GetRequiredService<IDocumentStore>();
    }
    
    public IAlbaHost Host { get; }
    public IDocumentStore Store { get; }
    
    public async Task InitializeAsync()
    {
        // Using Marten, wipe out all data and reset the state
        // back to exactly what we described in InitialAccountData
        await Store.Advanced.ResetAllData();
    }

    // This is required because of the IAsyncLifetime 
    // interface. Note that I do *not* tear down database
    // state after the test. That's purposeful
    public Task DisposeAsync()
    {
        return Task.CompletedTask;
    }
}

Other than simply connecting real test fixtures to the ASP.Net Core system under test (the IAlbaHost), this IntegrationContext utilizes another bit of Marten functionality to completely reset the database state back to only the data defined by the InitialAccountData so that we always have known data in the database before tests execute.

By and large, I find NoSQL databases to be more easily usable in automated testing than purely relational databases because it’s generally easier to tear down and rebuild databases with NoSQL. When I’m having to use a relational database in tests, I opt for Jimmy Bogard’s Respawn library to do the same kind of reset, but it’s substantially more work to use than Marten’s built in functionality.
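For reference, a Respawn-based reset against Postgresql looks roughly like this. This is just a minimal sketch using Respawn’s newer Respawner API and Npgsql, not anything from the application above, and the connection string is obviously made up:

// A minimal sketch of resetting a Postgresql database with Respawn between tests.
// The connection string here is purely illustrative
await using var conn = new NpgsqlConnection("Host=localhost;Database=app;Username=postgres;Password=postgres");
await conn.OpenAsync();

var respawner = await Respawner.CreateAsync(conn, new RespawnerOptions
{
    DbAdapter = DbAdapter.Postgres
});

// Deletes the data from every table (respecting foreign keys), but leaves the schema alone
await respawner.ResetAsync(conn);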

In the case of Marten, we very purposely designed in the ability to reset the database state for integration testing scenarios from the very beginning. Add this functionality to the easy ability to run the underlying Postgresql database in a local Docker container for isolated testing, and I’ll claim that Marten is very usable within test automation scenarios with no real need to try to stub out the database or use some kind of low fidelity fake in memory database in testing.

See My Opinions on Data Setup for Functional Tests for more explanation of why I’m doing the database state reset before all tests, but never immediately afterward. And also why I think it’s important to place test data setup directly into tests rather than trying to rely on any kind of external, expected data set (when possible).

From my first pass at writing the sample test that’s coming in the next section, I discovered the need for one more helper method on IntegrationContext to make HTTP calls to the system while also tracking background Wolverine activity as shown below:

    // This method allows us to make HTTP calls into our system
    // in memory with Alba, but do so within Wolverine's test support
    // for message tracking to both record outgoing messages and to ensure
    // that any cascaded work spawned by the initial command is completed
    // before passing control back to the calling test
    protected async Task<(ITrackedSession, IScenarioResult)> TrackedHttpCall(Action<Scenario> configuration)
    {
        IScenarioResult result = null;
        
        // The outer part is tying into Wolverine's test support
        // to "wait" for all detected message activity to complete
        var tracked = await Host.ExecuteAndWaitAsync(async () =>
        {
            // The inner part here is actually making an HTTP request
            // to the system under test with Alba
            result = await Host.Scenario(configuration);
        });

        return (tracked, result);
    }

The method above gives me access to the complete history of Wolverine messages during the activity including all outgoing messages spawned by the HTTP call. It also delegates to Alba to run HTTP requests in memory and gives me access to the Alba wrapped response for easy interrogation of the response later (which I don’t need in the following test, but would frequently in other tests).

See Test Automation Support from the Wolverine documentation for more information on the integration testing support baked into Wolverine.

Writing the first integration test

The first “happy path” test verifies that a call to the web service flows through to the Wolverine message handler and withdraws from an account without hitting any kind of low balance conditions. It might look like this:

[Collection("integration")]
public class when_debiting_an_account : IntegrationContext
{
    public when_debiting_an_account(AppFixture fixture) : base(fixture)
    {
    }

    [Fact]
    public async Task should_decrease_the_account_balance_happy_path()
    {
        // Drive in known data, so this is the "Arrange" part of the test
        var account = new Account
        {
            Balance = 2500,
            MinimumThreshold = 200
        };

        await using (var session = Store.LightweightSession())
        {
            session.Store(account);
            await session.SaveChangesAsync();
        }

        // The "Act" part of the test.
        var (tracked, _) = await TrackedHttpCall(x =>
        {
            // Send a JSON post with the WithdrawFromAccount command through the HTTP endpoint
            // BUT, it's all running in process
            x.Post.Json(new WithdrawFromAccount(account.Id, 1200)).ToUrl("/accounts/withdraw");

            // This is the default behavior anyway, but still good to show it here
            x.StatusCodeShouldBeOk();
        });
        
        // Finally, let's do the "assert"
        await using (var session = Store.LightweightSession())
        {
            // Load the newly persisted copy of the data from Marten
            var persisted = await session.LoadAsync<Account>(account.Id);
            persisted.Balance.ShouldBe(1300); // Started with 2500, debited 1200
        }

        // And also assert that an AccountUpdated message was published as well
        var updated = tracked.Sent.SingleMessage<AccountUpdated>();
        updated.AccountId.ShouldBe(account.Id);
        updated.Balance.ShouldBe(1300);

    }
}

The test above follows the basic “arrange, act, assert” model. In order, the test:

  1. Writes a brand new Account document to the Marten database
  2. Makes an HTTP call to the system to POST a WithdrawFromAccount command to our system using our TrackedHttpCall method that also tracks Wolverine activity during the HTTP call
  3. Verifies that the Account data was changed in the database the way we expected
  4. Verifies that an expected outgoing message was published as part of the activity

It was a lot of initial set up to get to the point where we could write tests, but I’m going to argue in the next section that we’ve done a lot to reduce the friction in writing additional integration tests for our system in a reliable way.

Avoiding the Selenium as Golden Hammer Anti-Pattern

Playwright or Cypress.io may prove to be better options than Selenium over time (I’m bullish on Playwright myself), but the main point is really that only depending on end to end tests through the browser can easily be problematic and inefficient.

Before I go back to defending why I think the testing approach and tooling shown in this post is very effective, let’s build up an all too real strawman of inefficient and maybe even ineffective test automation:

  • All your integration tests are blackbox, end to end tests that use Selenium to drive a web browser
  • These tests can only be executed externally to the application when the application is deployed to a development or testing environment. In the worst case scenario — which is also unfortunately common — the Selenium tests cannot be easily executed locally on demand
  • The tests are prone to failures due to UI changes
  • The tests are prone to intermittent “blinking” failures due to asynchronous behavior in the UI where test assertions happen before actions are completed in the application. This is a source of major friction and poor results in large scale Selenium testing that has been endemic in every single shop or project where I’ve used or seen Selenium used over the past decade — including in my current role.
  • The end to end tests are slow compared to finer grained unit tests or smaller whitebox integration tests that do not have to use the browser
  • Test failures are often difficult to diagnose since the tests are running out of process without direct access to the actual application. Some folks try to alleviate this issue with screenshots of the browser or in more advanced usages, trying to correlate the application logs to the test runs
  • Test failures often happen because related test databases are not in the expected state

I’m laying it on pretty thick here, but I think that I’m getting my point across that only relying on Selenium based browser testing is potentially very inefficient and sometimes ineffective. Now, let’s consider how the “critter stack” tools and the testing approach I used up above solve some of the issues I raised just above:

  • Postgresql itself is very easy to run in Docker containers or if you have to, to deploy locally. That makes it friendly for automated testing where you really, really want to have isolated testing infrastructure and avoid sharing any kind of stateful resource between testing processes
  • Marten in particular has built in support for setting up known database states going into automated tests. This is invaluable for integration testing
  • Executing directly against HTTP API endpoints is much faster than browser testing with something like Selenium. Faster executing tests == faster feedback cycles == better development throughput and delivery. Period.
  • Running the tests completely in process with the application such as we did with Alba makes debugging test failures much easier for developers than trying to solve Selenium failures in a CI environment
  • Using the Alba + xUnit.Net (or NUnit etc) approach means that the integration tests can live with the application code and can be executed on demand whenever. That shifts the testing “left” in the development cycle compared to the slower Selenium running on CI only cycle. It also helps developers quickly spot check potential issues.
  • By embedding the integration tests directly in the codebase, you’re much less likely to get the drift between the application itself and automated tests that frequently arises from Selenium centric approaches.
  • This approach gets developers directly involved with the test automation efforts. I strongly believe that it’s impossible for large scale test automation to work whatsoever without developer involvement
  • Whitebox tests are simply much more efficient than the blackbox model. This statement is likely to get me yelled at by real testing professionals, but it’s still true

This post took way, way too long to write compared to how I thought it would go. I’m going to make a little bonus followup on using Lamar of all things for other random test state resets.

Alba for Effective ASP.Net Core Integration Testing

Alba is a small library that enables easy integration testing of ASP.Net Core routes completely in process within an NUnit/xUnit.Net/MSTest project. Alba 7.1 just dropped today with .NET 7 support, improved JSON handling for Minimal API endpoints, and multipart form support.

Quickstart with Minimal API

Keeping things almost absurdly simple, let’s say that you have a Minimal API route (taken from the Alba tests) like so:

app.MapPost("/go", (PostedMessage input) => new OutputMessage(input.Id));

Now, over in your testing project, you could write a crude test for the route above like so:

    [Fact]
    public async Task sample_test()
    {
        // This line only matters if you use Oakton for the command line
        // processing
        OaktonEnvironment.AutoStartHost = true;
        
        // I'm doing this inline to make the sample easier to understand,
        // but you'd want to share the AlbaHost between tests because
        // this is expensive
        await using var host = await AlbaHost.For<MinimalApiWithOakton.Program>();
        
        var guid = Guid.NewGuid();
        
        var result = await host.PostJson(new PostedMessage(guid), "/go")
            .Receive<OutputMessage>();

        result.Id.ShouldBe(guid);
    }

A couple notes about the code above:

  • The test is bootstrapping your actual application using its configuration, but using the TestServer in place of Kestrel as the web server.
  • The call to PostJson() is using the application’s JSON serialization configuration, just in case you’ve customized the JSON serialization. Likewise, the call to Receive<T>() is also using the application’s JSON serialization mechanism to be consistent. This functionality was improved in Alba 7 to “know” whether to use MVC Core or Minimal API style JSON serialization (but you can explicitly override that in mixed applications on a case by case basis)
  • When the test executes, it’s running through your entire application’s ASP.Net Core pipeline including any and all registered middleware

If you choose to use Alba with >= .NET 6 style application bootstrapping inside of an inferred Program.Main() method, be aware that you will need to grant your test project visibility to the internals of your main project, something like this:

  <ItemGroup>
    <InternalsVisibleTo Include="ProjectName.Tests" />
  </ItemGroup>
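Alternatively, if you’d rather not open up internals, another route that works with the implicit Program class is to add a public partial class declaration at the bottom of the main project’s Program.cs so the test project can see it directly:

// At the very bottom of Program.cs in the main application project
public partial class Program { }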

How does Alba fit into projects?

I think most people by now are somewhat familiar with the testing pyramid idea (or testing trophy or any other number of shapes). Just to review, it’s the idea that a software system is best served by being backed by a mix of automated tests between solitary unit tests, intermediate integration tests, and some number of end to end, black box tests.

We can debate what the exact composition of your test pyramid should be on a particular project until the cows come home. For my part, I want more fast running, easier to write tests and fewer potentially nasty Selenium/Playwright/Cypress.io tests that tend towards being slow and brittle. I like Alba in particular because it allows our teams at work to test at the HTTP web service layer through to the database completely within process — meaning the tests can be executed on demand without any kind of deployment. In short, Alba sits in the middle of that testing pyramid and makes those very valuable kinds of tests easier to write, execute, and debug for the developers working on the system.

Real Life TDD Example

Continuing a new blog series that I started yesterday on the application and usage of Test Driven Development.

In this post I’m going to walk through how I used TDD myself to build a feature and try to explain why I wrote the tests I did, and why I sequenced things as I did. Along the way, I dropped in short descriptions of ideas or techniques to best use TDD that I’ll hopefully revisit in longer form later in subsequent posts.

I do generally use Test Driven Development (TDD) in the course of my own coding work, but these days the vast majority of my coding work is in open source projects off to the side. One of the open source projects I’m actively contributing to is a tool named “Wolverine” that is going to be a new command bus / mediator / messaging tool for .NET (it’s “Jasper” rebranded with a lot of improvements). I’ll be using Wolverine code for the code samples in this post going forward.

TDD’ing “Back Pressure” in Wolverine

One of the optional features of Wolverine is to buffer incoming messages from an external queue like Rabbit MQ in a local, in-process queue (through TPL Dataflow if you’re curious) before these messages are processed by the application’s message handlers. That’s often great because it can speed up processing throughput quite a bit. It can also be bad if the local queue gets backed up and there are too many messages floating around that create memory pressure in your application.

To alleviate that concern, Wolverine uses the idea of “back pressure” to temporarily shut off local message listening from external message brokers if the local queue gets too big, and turn message listening back on only when the local queues get smaller as messages are successfully handled.

Here’s more information about “back pressure” from Derek Comartin.

Here’s a little diagram of the final structure of that back pressure subsystem and where it sits in the greater scope of things:

The diagram above reflects the final product only after I used Test Driven Development along the way to help shape the code. Rewinding a little bit, let me talk about the intermediate steps I took to get to this final, fully tested structure by going through some of my internal rules for TDD.

The first, most important step though is to just commit to actually doing TDD as you work. Everything else follows from that.

Writing that First Test

Like a lot of other things in life, coding is sometimes a matter of momentum or lack thereof. Developers can easily psych themselves into a state of analysis paralysis if they can’t immediately decide on exactly how the code should be designed from end to end. TDD can help here by letting you concentrate on a small area of the code you do know how to build, and verify that the new code works before you set it aside to work on the next step.

When I started the back pressure work, the very first test I wrote was to simply verify the ability for users to configure thresholds for when the messaging listener should be stopped and restarted on an endpoint by endpoint basis (think Rabbit MQ queue or a named, local in memory queue). I also wrote a test for default thresholds (which I made up on the spot) in cases when there was no explicit override.

Here’s the “Arrange” part of the first test suite:

public class configuring_endpoints : IDisposable
{
    private readonly IHost _host;
    private WolverineOptions theOptions;
    private readonly IWolverineRuntime theRuntime;

    public configuring_endpoints()
    {
        // This bootstraps a simple Wolverine system
        _host = Host.CreateDefaultBuilder().UseWolverine(x =>
        {
            // I'm configuring some known endpoints in the system. This is the "Arrange"
            // part of the system
            x.ListenForMessagesFrom("local://one").Sequential().Named("one");
            x.ListenForMessagesFrom("local://two").MaximumParallelMessages(11);
            x.ListenForMessagesFrom("local://three").UseDurableInbox();
            x.ListenForMessagesFrom("local://four").UseDurableInbox().BufferedInMemory();
            x.ListenForMessagesFrom("local://five").ProcessInline();

            x.ListenForMessagesFrom("local://durable1").UseDurableInbox(new BufferingLimits(500, 250));
            x.ListenForMessagesFrom("local://buffered1").BufferedInMemory(new BufferingLimits(250, 100));
        }).Build();

        theOptions = _host.Get<WolverineOptions>();
        theRuntime = _host.Get<IWolverineRuntime>();
    }

I’m a very long-term user of ReSharper and now Rider from JetBrains, so I happily added the new BufferingLimits argument to the previously existing BufferedInMemory() method in the unit test and let Rider add the argument to the method based on the inferred usage within the unit test. It’s not really the point of this post, but absolutely lean on your IDE when writing code “test first” to generate stub methods or change existing methods based on the inferred usage from your test code. It’s frequently a way to go a little faster when doing TDD.
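The BufferingLimits type itself isn’t shown in this post, but judging from the usage above and the default/override tests below, it’s presumably nothing more than a small value type along these lines:

// Presumably something like this -- Maximum is the local queue size that
// pauses external listening, Restart is the size at which listening resumes
public record BufferingLimits(int Maximum, int Restart);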

And next, here’s some of the little tests that I used to verify both the buffering limit defaults and overrides based on the new syntax above:

    [Fact]
    public void has_default_buffering_options_on_buffered()
    {
        var queue = localQueue("four");
        queue.BufferingLimits.Maximum.ShouldBe(1000);
        queue.BufferingLimits.Restart.ShouldBe(500);
    }

    [Fact]
    public void override_buffering_limits_on_buffered()
    {
        var queue = localQueue("buffered1");
        queue.BufferingLimits.Maximum.ShouldBe(250);
        queue.BufferingLimits.Restart.ShouldBe(100);
    }

It’s just a couple simple tests with a little bit of admittedly non-trivial setup code, but you have to start somewhere. A few notes about why I started with those particular tests and how I decided to test that way:

  • Test Small before Testing Big — One of my old rules of doing TDD is to start by testing the building blocks of a new user story/feature/bug fix before trying to attempt to write a test that spans the entire flow of the new code. In this case, I want to prove out that just the configuration element of this complicated new functionality works before I even think about running the full stack. Using this rule should help you keep your debugger on the sidelines. More on this in later posts
  • Bottom Up or Top Down — You can either start by trying to code the controlling workflow and create method stubs or interface stubs for dependencies as you discover exactly what’s necessary. That’s working top down. In contrast, I frequently work “bottom up” when I understand some of the individual tasks within the larger feature, but maybe don’t yet understand how the entire workflow should be yet. More on this in a later post, but the key is always to start with what you already understand.
  • Sociable vs solitary tests — The tests above are “sociable” in that they use a “full” Wolverine application to test the new configuration code within the full cycle of the application bootstrapping process. This is opposed to being a “solitary” test that tests a very small, isolated part of the code. My decision to do this was based on my feeling that that test would be simple enough to write, and that a more isolated test in this particular case wasn’t really useful anyway.

The code tested by these first couple tests was pretty trivial, but it has to work before the whole feature can work, so it deserves a test. By and large, I like the advice that you write tests for any code that could conceivably be wrong.

I should also note that I did not in this case do a full design upfront of how this entire back pressure feature would be structured before I wrote that first couple tests.

One of the advantages of working in a TDD style is that it forces you (or should) to work incrementally in smaller pieces of code, which can hopefully be rearranged later when your initial ideas about how the code should be structured turn out to be wrong.

Using Responsibility Driven Design

I don’t always do this in a formal way, but by and large my first step in developing a new feature is to just think through the responsibilities within the new feature. To help discover those responsibilities I like to use object role stereotypes to quickly suggest splitting up the feature into different elements of the code by responsibility in order to make the code easier to test and proceed from there.

Back to building the back pressure feature, from experience I knew that it’s often helpful to separate the responsibility for deciding to take an action from actually performing that action. To that end I chose to separate out a small, separate class called BackPressureAgent that will be responsible for deciding when to pause or restart listening based on the conditions of the current endpoint (how many messages are queued locally, and is the listener actively pulling in new messages from the external resource).
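The real tests for that decision logic are coming up below, but conceptually what BackPressureAgent has to do boils down to something like this sketch. The field names are assumptions on my part and the actual Wolverine internals certainly differ in the details:

// A conceptual sketch only -- not the literal Wolverine implementation.
// _listeningAgent and _endpoint stand in for whatever BackPressureAgent holds onto
public async ValueTask CheckNowAsync()
{
    var limits = _endpoint.BufferingLimits;

    // Too many messages backed up in the local queue? Pause external listening
    if (_listeningAgent.Status == ListeningStatus.Accepting && _listeningAgent.QueueCount > limits.Maximum)
    {
        await _listeningAgent.MarkAsTooBusyAndStopReceivingAsync();
    }
    // Drained back down below the restart threshold? Resume listening
    else if (_listeningAgent.Status == ListeningStatus.TooBusy && _listeningAgent.QueueCount < limits.Restart)
    {
        await _listeningAgent.StartAsync();
    }
}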

In object role stereotype terms, BackPressureAgent becomes a “controller” that controls and directs the actions of other objects and decides what those other objects should be doing. In this case, BackPressureAgent is telling an IListeningAgent object whether to pause or restart, as shown in this “happy path, all is good, do nothing” test case below:

    [Fact]
    public void do_nothing_when_accepting_and_under_the_threshold()
    {
        theListeningAgent.Status
            .Returns(ListeningStatus.Accepting);
        theListeningAgent.QueueCount
            .Returns(theEndpoint.BufferingLimits.Maximum - 1);
        
        // Evaluate whether or not the listening should be paused
        // based on the current queued item count, the current status
        // of the listening agent, and the configured buffering limits
        // for the endpoint
        theBackPressureAgent.CheckNowAsync();

        // Should decide NOT to do anything in this particular case
        theListeningAgent.DidNotReceive().MarkAsTooBusyAndStopReceivingAsync();
        theListeningAgent.DidNotReceive().StartAsync();
    }

In the test above, I’m using a dynamic mock built with NSubstitute for the listening agent just to simulate the current queue size and status, then evaluating whether or not the code under test decided to stop the listening. In the case above, the listening agent is running fine, and no action should take place.

Some notes on the test above:

  • In object role stereotype terms, the IListeningAgent is both an “interfacer” that we can use to provide information about the local queue and a “service provider” that can in this case “mark a listening endpoint as too busy and stop receiving external messages” and also restart the message listening later
  • The test above is an example of “interaction-based testing” that I’ll expound on and contrast with “state-based testing” in the following section
  • IListeningAgent already existed at this point, but I added new elements for QueueCount and the clumsily named `MarkAsTooBusyAndStopReceivingAsync()` method while writing the test. Again, I defined the new method and property names within the test itself, then let Rider generate the methods behind the scenes. We’ll come back to those later.
  • Isolate the Ugly Stuff — Early on I decided that I’d probably have BackPressureAgent use a background timer to occasionally sample the state of the listening agent and take action accordingly. Writing tests against code that uses a timer or really any asynchronous code is frequently a pain, so I bypassed that for now by isolating the logic on deciding to stop or restart external message listening away from the background timer, the active message broker infrastructure (again, think Rabbit MQ or AWS SNS or Azure Service Bus).
  • Keep a Short Tail — Again, the decision making logic is easy to test without having to pull in the background timer, the local queue infrastructure, or any kind of external infrastructure. Another way to think about that I learned years ago was this simple test of your code’s testability: “if I try to write a test for your code/method/function, what else do I have to pull off the shelf in order to run that test?” You ideally want that answer to be “not very much” or at least “nothing that’s hard to set up or control.”
  • Mocks are a red pepper flake test ingredient. Just like cooking with red pepper flakes, some judicious usage of dynamic mock objects can sometimes be a good thing, but using too many mock objects is pretty much always going to ruin the test in terms of readability, test setup work, and harmful coupling between the test and the implementation details

I highly recommend Rebecca Wirfs-Brock’s online A Brief Tour of Responsibility-Driven Design for more background on this.

I didn’t test this

I needed to add an actual implementation of IListeningAgent.QueueCount that just reflected the current state of a listening endpoint based on the local queue within that endpoint like so:

    public int QueueCount => _receiver is ILocalQueue q ? q.QueueCount : 0;

I made the judgement call that the code above was simple enough — and also too much trouble to test anyway — that it was low risk not to write any test whatsoever.

Making a required code coverage number is not a first class goal. Neither is using pure, unadulterated TDD for every line of code you write (but definitely test as you work rather than waiting until the very end to test, no matter how you work). The real goal is being able to use TDD as a very rapid feedback cycle and as a way to arrive at code that exhibits the desirable qualities of high cohesion and low coupling.

Introducing the first integration test

Earlier I said that one of my rules was “test small before testing big.” At this point I still wasn’t ready to try to just code the rest of the back pressure and try to run it all because I hadn’t yet coded the functionality to actually pause listening to external messages. That new method in ListeningAgent is shown below:

    public async ValueTask MarkAsTooBusyAndStopReceivingAsync()
    {
        if (Status != ListeningStatus.Accepting || _listener == null) return;
        await _listener.StopAsync();
        await _listener.DisposeAsync();
        _listener = null;
        
        Status = ListeningStatus.TooBusy;
        _runtime.ListenerTracker.Publish(new ListenerState(Uri, Endpoint.Name, Status));

        _logger.LogInformation("Marked listener at {Uri} as too busy and stopped receiving", Uri);
    }

It’s not very much code, and to be honest, I sketched out the code without first writing a test. Now, I could have written a unit test for this method, but my ultimate “zeroth rule” of testing is:

Test with the finest grained mechanism that tells you something important

Me!

I did not believe that a “solitary” unit test — probably using mock objects? — would provide the slightest bit of value and would simply replicate the implementation of the method in mock object expectations. Instead, I wrote an integration test in Wolverine’s “transport compliance” test suite like so:

[Fact]
public async Task can_stop_receiving_when_too_busy_and_restart_listeners()
{
    var receiving = (theReceiver ?? theSender);
    var runtime = receiving.Get<IWolverineRuntime>();

    foreach (var listener in runtime.Endpoints.ActiveListeners().Where(x => x.Endpoint.Role == EndpointRole.Application))
    {
        await listener.MarkAsTooBusyAndStopReceivingAsync();

        listener.Status.ShouldBe(ListeningStatus.TooBusy);
    }

    foreach (var listener in runtime.Endpoints.ActiveListeners().Where(x => x.Endpoint.Role == EndpointRole.Application))
    {
        await listener.StartAsync();

        listener.Status.ShouldBe(ListeningStatus.Accepting);
    }

    var session = await theSender.TrackActivity(Fixture.DefaultTimeout)
        .AlsoTrack(theReceiver)
        .DoNotAssertOnExceptionsDetected()
        .ExecuteAndWaitAsync(c => c.SendAsync(theOutboundAddress, new Message1()));


    session.FindSingleTrackedMessageOfType<Message1>(EventType.MessageSucceeded)
        .ShouldNotBeNull();
}

The test above reaches into the listening endpoints within a receiving Wolverine application:

  1. Pauses the external message listening
  2. Restarts the external message listening
  3. Publishes a new message from a sender to a receiving application
  4. Verifies that, yep, that message really got to where it was supposed to go

As the test above is applied to every current transport type in Wolverine (Rabbit MQ, Pulsar, TCP), I had to then run a whole bunch of integration tests against external infrastructure (running locally in Docker containers, isn’t it a great time to be alive?).

Once that test passed for all transports — and I felt that was important because there had been previous issues making a similar circuit breaker feature work without “losing” in flight messages — I was able to move on.

Almost there, but when should back pressure be applied?

At this point I was so close to being ready to make that last step and finish it all off by running end to end with everything! But then I remembered that back pressure should only be checked for certain types of messaging endpoints, with what ultimately became these rules:

  • It’s not a local queue. I know this might be a touch confusing, but Wolverine lets you use named, local queues as well as using local queues internally for the listening endpoint from external message brokers like Rabbit MQ queues. If the endpoint is a named, local queue, there’s no point in using back pressure (at least in its current incarnation).
  • The listening endpoint is configured to be in what Wolverine calls “buffered” mode, as opposed to “inline” mode where a message has to be completely processed inline with being delivered by external message brokers before you acknowledge the receipt to the message broker
  • Or the listening endpoint is enrolled in Wolverine’s durable inbox

After fiddling with the logic to make that determination inline inside of ListeningAgent or BufferingAgent, I decided for a variety of reasons that that little bit of logic really belonged in its own method on Wolverine’s Endpoint class that is the configuration model for all communication endpoints. The base method is just this:

    public virtual bool ShouldEnforceBackPressure()
    {
        return Mode != EndpointMode.Inline;
    }

In this particular case, I probably jumped right into the code, but immediately wrote tests for the code for Rabbit MQ endpoints:

        [Theory]
        [InlineData(EndpointMode.BufferedInMemory, true)]
        [InlineData(EndpointMode.Durable, true)]
        [InlineData(EndpointMode.Inline, false)]
        public void should_enforce_back_pressure(EndpointMode mode, bool shouldEnforce)
        {
            var endpoint = new RabbitMqEndpoint(new RabbitMqTransport());
            endpoint.Mode = mode;
            endpoint.ShouldEnforceBackPressure().ShouldBe(shouldEnforce);
        }

and also for endpoints that model local queue endpoints that should of course never have back pressure applied in the current model:

    [Theory]
    [InlineData(EndpointMode.Durable)]
    [InlineData(EndpointMode.Inline)]
    [InlineData(EndpointMode.BufferedInMemory)]
    public void should_not_enforce_back_pressure_no_matter_what(EndpointMode mode)
    {
        var endpoint = new LocalQueueSettings("foo")
        {
            Mode = mode
        };
        
        endpoint.ShouldEnforceBackPressure().ShouldBeFalse();
    }
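The matching override on the local queue endpoint type isn’t shown in this post, but given the test above it presumably amounts to nothing more than this:

    // Presumably the LocalQueueSettings override is just this --
    // local queues never get back pressure in the current model
    public override bool ShouldEnforceBackPressure()
    {
        return false;
    }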

That’s nearly trivial code, and I wasn’t that worried about the code not working. I did write tests for that code — even if later — because the test made a statement about how the code should work and keeps someone else from accidentally breaking the back pressure subsystem by changing that method. In a way, putting that test in the code acts as documentation for later developers.

Before wrapping up with a giant integration test, let’s talk about…

State vs Interaction Testing

One way or another, most automated tests are going to fall into the rough structure of Arrange-Act-Assert where you connect known inputs to expected outcomes for some kind of action or determination within your codebase. Focusing on assertions, most of the time developers are using state-based testing where the tests are validating the expected value of:

  • A return value from a method or function
  • The state of an object
  • Changes to a database or file

Here’s a simple example from Wolverine that tests some exception handling code with a state-based test:

    [Fact]
    public void type_match()
    {
        var match = new TypeMatch<BadImageFormatException>();
        match.Matches(new BadImageFormatException()).ShouldBeTrue();
        match.Matches(new DivideByZeroException()).ShouldBeFalse();
    }

In contrast, interaction-based testing involves asserting on the expected signals or messages passed between two or more elements of code. You probably already know this from mock library usage. Here’s an example from Wolverine code that I’ll explain and discuss more below:

    [Fact]
    public async Task do_not_actually_send_outgoing_batched_when_the_system_is_trying_to_shut_down()
    {
        // This is a cancellation token for the subsystem being tested
        theCancellation.Cancel();

        // This is the "action"
        await theSender.SendBatchAsync(theBatch);

        // Do not send on the batch of messages if the
        // underlying cancellation token has been marked
        // as cancelled
        await theProtocol.DidNotReceive()
            .SendBatchAsync(theSenderCallback, theBatch);
    }

Part of Wolverine’s mission is to be a messaging tool between two or more processes. The code being tested above takes part in sending outgoing messages in a background process. When the application has signaled that it is shutting down through the usage of a CancellationToken, the BatchSender class being tested above should not send any more outgoing messages. I’m asserting that behavior by checking that BatchSender never called the raw socket handling class with new messages, and that therefore no outgoing messages were sent.

A common criticism of the testing technique I used above is something to the effect of “why do I care whether or not a method was called, I only care about the actual impact of the code!” This is a bit semantic, but my advice here is to say (and think to yourself) that you are asserting on the decision whether or not to send outgoing messages when the system itself is trying to shut down.

As to whether or not to use state-based vs interaction-based testing, I’d say that is a case by case decision. If you can easily verify the expected change of state or expected result of an action, definitely opt for state-based testing. I’d also use state-based testing anytime that the necessary interactions are unclear or confusing, even if that means opting for a bigger more “sociable” test or a full blown integration test.

However, to repeat an earlier theme, there are plenty of times when it’s easiest to separate the decision to take an action from actually performing that action, and to test that decision on its own. The back pressure protection I added to Wolverine’s message listening subsystem just last week, described above, is an example of exactly that from my own work.

Summary of Test Driven Development So Far

My goal with this post was to introduce a lot of the ideas and concepts I like to use with TDD in the context of a non-trivial, but still not too big, real life feature that was built with TDD.

I briefly mentioned some of my old “Jeremy’s Rules of Test Driven Development” that really just amount to some heuristic tools to think through separation of concerns through the lens of what makes unit testing easier or at least possible:

  • Test Small before Testing Big
  • Isolate the Ugly Stuff
  • Keep a Short Tail
  • Push, don’t Pull — I didn’t have an example for this in the back pressure work, but I’ll introduce this in its own post some day soon

I also discussed state-based vs interaction-based testing. I think you need both in your mental toolbox and should have some idea of when to apply each.

I also introduced responsibility driven design with an eye toward how that can help TDD efforts.

In my next post I think I’ll revisit the back pressure feature from Wolverine and show how I ultimately created an end to end integration test that got cut from this post because it’s big, hugely complicated, and worthy of its own little post.

After that, I’ll do some deeper dives on some of the design techniques and testing concepts that I touched on in this post.

Until later, Jeremy out…

Self Diagnosing Deployments with Oakton and Lamar

So here’s the deal: sometimes, somehow, you deploy a new version of a system into a testing, staging, or production environment and it just doesn’t work. Shocking and sometimes distressing when that happens, right?

There are any number of problems that could be the cause. Maybe a database is in an invalid state, maybe a file got missed in the deployment, maybe some bit of configuration is wrong, maybe a downstream or upstream collaborating system is down or unreachable. Who knows, right? Well, what if we could make our systems self-diagnosing so they can do a quick rundown of how they’re configured and running right at start up time and fail fast if something is detected to be wrong with the deployment? And what if the system could also tell us exactly what is wrong at the first cause, without us having to trace runtime failures back to the ultimate root cause later?

To that end, let’s talk about some mechanisms in both the Oakton and Lamar libraries to quickly add in what I like to call “environment checks”, meaning self diagnosing checks on system startup to test out system configuration and validity. Oakton has a facility to run environment checks either through a separate diagnostic command directly in your application, or at system start up time, where it makes a detailed report of any failures and stops the process with an error code. In turn, a continuous deployment script should be able to detect the failure to start the system and roll back to a previously known, good state.

In the past, I’ve worked in teams where we’ve embedded environment checks to:

  • Verify that a configured database is reachable
  • Ping an external web service dependency
  • Check for the existence of required files
  • Verify that an expected COM dependency was registered (shudder, there’s some bad memories in that one)
  • Test that security features are correctly configured and usable — and I think that’s a big one
  • Assert that the system has the ability to read and write configured file system directories — also some serious scar tissue

The point here is to make your deployments fail fast anytime there’s an environmental misconfiguration, and do so in a way that makes it easy for you to spot exactly what’s wrong. Environment checks can help teams avoid system down time and keep testers from blowing up the defect lists with false negatives from misconfigured systems.

Getting Started with Oakton

Just to get started, I spun up a new .Net 5 web service with dotnet new webapi. That template gives you this code in the Program.Main() method:

        public static void Main(string[] args)
        {
            CreateHostBuilder(args).Build().Run();
        }

I’m going to add a Nuget reference to Oakton, then change that method up above so that Oakton is handling the command line parsing and system startup like this:

        // It's important to return Task<int> so that
        // Oakton can signal failures with non-zero
        // return codes
        public static Task<int> Main(string[] args)
        {
            return CreateHostBuilder(args).RunOaktonCommands(args);
        }

There’s still nothing going on in our web service, but from the command line we could run dotnet run -- check-env just to start up our application’s IHost and run all the environment checks in a test mode. Nothing there yet, so you’d get output like this:

   ___            _      _
  / _ \    __ _  | | __ | |_    ___    _ __
 | | | |  / _` | | |/ / | __|  / _ \  | '_ \
 | |_| | | (_| | |   <  | |_  | (_) | | | | |
  \___/   \__,_| |_|\_\  \__|  \___/  |_| |_|

No environment checks.
All environment checks are good!

We can add environment checks directly to our application with Oakton’s built in mechanisms like this example that just validates that there’s an appsettings.json file in the application path:

        public void ConfigureServices(IServiceCollection services)
        {
            // Literally just proving out that appsettings.json is available
            services.CheckThatFileExists("appsettings.json");
            
            // Other registrations
        }

And now that same dotnet run -- check-env command gives us this output:

   ___            _      _
  / _ \    __ _  | | __ | |_    ___    _ __
 | | | |  / _` | | |/ / | __|  / _ \  | '_ \
 | |_| | | (_| | |   <  | |_  | (_) | | | | |
  \___/   \__,_| |_|\_\  \__|  \___/  |_| |_|

   1.) Success: File 'appsettings.json' exists

Running Environment Checks ---------------------------------------- 100%

All environment checks are good!
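
The built in file check is handy, but most of the items on my earlier checklist (reachable databases, external service dependencies, writable directories) call for ad hoc checks. Here’s a hedged sketch of what one of those might look like, assuming Oakton’s lambda based CheckEnvironment() registration (check the Oakton documentation for the exact overloads available); the “Exports:Path” configuration key is made up purely for the example:

        public void ConfigureServices(IServiceCollection services)
        {
            services.CheckThatFileExists("appsettings.json");

            // Hypothetical ad hoc check: prove that the configured export
            // directory is actually writable by this process at start up time
            services.CheckEnvironment("Export directory is writable", s =>
            {
                var path = s.GetRequiredService<IConfiguration>()["Exports:Path"];
                var probe = Path.Combine(path, ".oakton-probe");
                File.WriteAllText(probe, "ok");
                File.Delete(probe);
            });

            // Other registrations
        }

If the check throws, Oakton should report it as a failure just like the built in checks do.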

One of the other things that Oakton does is intercept the basic dotnet run command with extra options, so we can use the --check flag like so:

dotnet run -- --check

That command is going to:

  1. Bootstrap and start the IHost for your application
  2. Load and execute all the registered environment checks for your application
  3. Report the status of each environment check to the console
  4. Fail the executable if any environment checks fail

In my tiny sample app I’m building here, the output of that call is this:

C:\code\JasperSamples\EnvironmentChecks\WebApplication\WebApplication>dotnet run -- --check
Building...
   1.) Success: File 'appsettings.json' exists

Running Environment Checks ---------------------------------------- 100%

info: Microsoft.Hosting.Lifetime[0]
      Now listening on: https://localhost:5001
info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://localhost:5000
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Development
info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\code\JasperSamples\EnvironmentChecks\WebApplication\WebApplication

At deployment time, that call is probably just [your application name] --check to start the application.

Environment Checks with Lamar

Now that we’ve got the basics for environment checks built into our new web service with Oakton, I want to introduce Lamar as the DI/IoC tool for the application. I’ll add a Nuget reference to Lamar.Microsoft.DependencyInjection to my project. First, to make Lamar the IoC container for my new service, I’ll add this line of code to the Program.CreateHostBuilder() method:

        public static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                
                // Make Lamar the application container
                .UseLamar()
                .ConfigureWebHostDefaults(webBuilder =>
                {
                    webBuilder.UseStartup<Startup>();
                });

Now, I’m going to go a little farther and add a reference to the Lamar.Diagnostics Nuget as well. That adds some Lamar specific diagnostic commands to our command line options, but also allows us to add Lamar container checks at startup like this:

        public void ConfigureServices(IServiceCollection services)
        {
            // Literally just proving out that appsettings.json is available
            services.CheckThatFileExists("appsettings.json");
            
            // Do a full check of the Lamar configuration
            // and run Lamar environment checks too!
            services.CheckLamarConfiguration();
        }

Running our dotnet run -- check-env command again gives us:

   ___            _      _
  / _ \    __ _  | | __ | |_    ___    _ __
 | | | |  / _` | | |/ / | __|  / _ \  | '_ \
 | |_| | | (_| | |   <  | |_  | (_) | | | | |
  \___/   \__,_| |_|\_\  \__|  \___/  |_| |_|

   1.) Success: File 'appsettings.json' exists
   2.) Success: Lamar IoC Service Registrations
   3.) Success: Lamar IoC Type Scanning

Running Environment Checks ---------------------------------------- 100%

All environment checks are good!

The Lamar container checks are a little heavyweight, so watch out for that and be ready to dial them back. Those checks run through every single service registration in Lamar, verify that all the dependencies exist, and try to build every registration at least once.
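
If you’d rather exercise that same validation from a test instead of waiting for application start up, Lamar also exposes it directly on the container. This is only a hedged sketch that assumes Lamar still surfaces the AssertConfigurationIsValid() method it inherited from StructureMap:

    [Fact]
    public void lamar_container_configuration_is_valid()
    {
        // Build a container with the same registrations the application uses
        using var container = new Container(services =>
        {
            // mirror or, better yet, reuse the application's registrations here
        });

        // Throws with a detailed report if any registration cannot be built
        container.AssertConfigurationIsValid();
    }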

Next though, let’s look at how Lamar lets us plug in its own form of environment checks. On any concrete class that is built by Lamar, you can embed environment checks directly with methods like this fake service:

    public class SometimesMisconfiguredService
    {
        private readonly IConfiguration _configuration;

        public SometimesMisconfiguredService(IConfiguration configuration)
        {
            _configuration = configuration;
        }

        [ValidationMethod]
        public void Validate() // The method name does not matter
        {
            var connectionString = _configuration.GetConnectionString("database");
            using var conn = new SqlConnection(connectionString);
            
            // Just try to connect to the configured database
            conn.Open();
        }
    }

The method name above doesn’t matter. All that Lamar needs to see is the [ValidationMethod] attribute. See Lamar’s Environment Tests for a little more information about using this feature. And I don’t really need to “do” anything other than throw an exception if the check logically fails.

And I’ll register that service in the container in the Startup.ConfigureServices() method like so:

            services.AddSingleton<SometimesMisconfiguredService>();

Now, going back to the application, I’ll try to start it up with the environment check flag active — but I haven’t configured a database connection string or even attempted to spin up a database at all, so this should fail fast. And it does with a call to dotnet run -- --check:

   1.) Success: File 'appsettings.json' exists
   2.) Failed: Lamar IoC Service Registrations

ERROR: Lamar.IoC.ContainerValidationException: Error in WebApplication.SometimesMisconfiguredService.Validate()

AND A LOT OF STACK TRACE VERBIAGE FROM THE FAILURES

To get at just the Lamar container validation, you can also use dotnet run -- lamar-validate. IoC container tools are a dime a dozen in .Net, and many of them are perfectly competent. When I’m asked “why use Lamar?”, my stock answer is to use Lamar for its diagnostic capabilities.


Integration Testing: IHost Lifecycle with NUnit

Starting yesterday, all of my content about automated testing is curated under the new Automated Testing page on this site.

I kicked off a new blog series yesterday with Integration Testing: IHost Lifecycle with xUnit.Net. I started by just discussing how to manage the lifecycle of a .Net IHost inside of an xUnit.Net testing library. I used xUnit.Net because I’m much more familiar with that library, but we mostly use NUnit for our testing at MedeAnalytics, so I’m going to see how the IHost lifecycle I discussed and demonstrated last time in xUnit.Net could work in NUnit.

To catch you up from the previous post, I have two projects:

  1. An ASP.Net Core web service creatively named WebApplication. This web service has a small endpoint that allows you to post an array of numbers and get back a response telling you the sum and product of those numbers. The code for that controller action is shown in my previous post.
  2. A second testing project using NUnit that references WebApplication. The testing project is going to use Alba for integration testing at the HTTP layer.

With NUnit, I chose to use the SetupFixture construct to manage and share the IHost for the test suite like this:

    [SetUpFixture]
    public class Application
    {
        // Make this lazy so you don't build it out
        // when you don't need it.
        private static readonly Lazy<IAlbaHost> _host;

        static Application()
        {
            _host = new Lazy<IAlbaHost>(() => Program
                .CreateHostBuilder(Array.Empty<string>())
                .StartAlba());
        }

        public static IAlbaHost AlbaHost => _host.Value;

        // I want to expose the underlying Lamar container for some later
        // usage
        public static IContainer Container => (IContainer)_host.Value.Services;

        // Make sure that NUnit will shut down the AlbaHost when
        // all the projects are finished
        [OneTimeTearDown]
        public void Teardown()
        {
            if (_host.IsValueCreated)
            {
                _host.Value.Dispose();
            }
        }
    }

With the IHost instance managed by the Application static class above, I can consume the Alba host in an NUnit test like this:

    public class sample_integration_fixture
    {
        [Test]
        public async Task happy_path_arithmetic()
        {
            // Building the input body
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await Application.AlbaHost.Scenario(x =>
            {
                // Alba deals with Json serialization for us
                x.Post.Json(input).ToUrl("/math");
                
                // Enforce that the HTTP Status Code is 200 Ok
                x.StatusCodeShouldBeOk();
            });

            var output = response.ReadAsJson<Result>();
            output.Sum.ShouldBe(9);
            output.Product.ShouldBe(24);
        }
    }
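
The Container property on Application isn’t used in the test above, but the “later usage” I mentioned could look something like this hedged sketch, where IClock is purely a hypothetical registration in WebApplication:

    public class container_usage
    {
        [Test]
        public void can_resolve_services_from_the_shared_container()
        {
            // IClock stands in for any service registered by WebApplication
            var clock = Application.Container.GetInstance<IClock>();
            clock.ShouldNotBeNull();
        }
    }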

And now a couple notes about what I did in Application:

  1. I think it’s important to create the IHost lazily, so that you don’t incur the cost of spinning up the IHost when you might be running other tests in your suite that don’t need the IHost. Rapid developer feedback is important, and that’s an awfully easy optimization that could pay off.
  2. The static Teardown() method is decorated with the `[OneTimeTearDown]` attribute to direct NUnit to call that method after all the tests are executed. I cannot stress enough how important it is to clean up resources in your test harness to ensure your ability to quickly iterate through subsequent test runs.
  3. NUnit has a very different model for parallelization than xUnit.Net, and it’s completely “opt in”, so I think there’s less to worry about on that front with NUnit.

At this point I don’t think I have a hard opinion about xUnit.Net vs. NUnit, and I certainly wouldn’t bother switching an existing project from one to the other (even though I’ve done that plenty of times in the past). I haven’t thought this one through enough, but I still think that xUnit.Net is a little bit cleaner for unit testing, while NUnit might be better for integration testing because it gives you finer grained control over fixture lifecycle and has some built in support for test timeouts and retries. At one point I had high hopes for Fixie as another alternative, and that project has become active again, but it would have a long road to challenge either of the two now mainstream tools.
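
Since I name checked NUnit’s opt in parallelization and its built in support for timeouts and retries just now, here’s a hedged sketch of the knobs I’m referring to; the fixture and test names are placeholders and the test body is elided:

    using NUnit.Framework;

    // Parallelization in NUnit is opt in at the assembly or fixture level
    [assembly: Parallelizable(ParallelScope.Fixtures)]

    // Keep the integration fixtures themselves single threaded
    [NonParallelizable]
    public class integration_smoke_tests
    {
        // Retry and Timeout (in milliseconds) are built into NUnit
        [Test, Retry(3), Timeout(5000)]
        public void eventually_consistent_check()
        {
            // exercise real infrastructure here...
            Assert.Pass();
        }
    }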

What’s Next?

This series is meant to support my colleagues at MedeAnalytics, so it’s driven by what we just happen to be talking about at any given point. Tomorrow I plan to put out a little post on some Lamar-specific tricks that are helpful in integration testing. Beyond that, I think dealing with database state is the most important thing we’re missing at work, so that needs to be a priority.

Integration Testing: IHost Lifecycle with xUnit.Net

I’m part of an initiative at work to analyze and ultimately improve our test automation practices. As part of that work, I’ll be blogging quite a bit about test automation starting with my brain dump on test automation last week and my most recent post on mocks and stubs last month. From here on out, I’m curating all of my posts and selected writings from other folks on my new Automated Testing page.

I’m already on record as saying that the generic host (IHost) in recent versions of .Net is one of the best things that’s ever happened to the .Net ecosystem. In my previous post I stated that I strongly prefer having the system under test running in process with the test harness for faster feedback cycles and easier debugging. The generic host builder introduced in .Net Core turns out to be a very effective way to bootstrap your system within automated test harnesses.

Before I dive into how to use the IHost in automated testing, here are a couple of issues I think you have to address in your integration testing strategy before we go willy nilly spinning up an IHost:

  • You ideally want to test against your code running in a realistic way, so the way code is bootstrapped and configured should be relatively close to how that code is started up in the real application.
  • There will inevitably need to be at least some configuration that needs to be different in testing or some services — usually accessing resources external to your system — that need to be replaced with stubs or some other kind of fake implementation.
  • It’s important to cleanly dispose or shutdown any IHost object you create in memory to avoid potential locks of resources like database connections, files, or ports. Failing to clean up resources in tests can easily make it harder to iterate through test fixes if you find yourself needing to manually kill processes or restart your IDE to release locked resources (been there, done that).
  • The IHost can be expensive to build up, and sometimes there’s going to be some serious benefit in reusing the IHost between tests to make the test suite run faster.
  • But the IHost is stateful, and there could easily be resources (singleton scoped services, databases, and whatnot) that could impact later test runs in the suite.

Before I jump into solutions, let’s assume that I have two projects:

  1. WebApplication is an ASP.Net Core web service project. WebApplication uses Lamar as its underlying DI container.
  2. A test project that references WebApplication

xUnit.Net Mechanics

I’m more comfortable with xUnit.Net, so I’m going to use that first. My typical usage is to share the IHost through xUnit.Net’s CollectionFixture mechanism (and if you think the usage of this thing is confusing, welcome to the club). First up, I’ll build out a new class I usually call AppFixture to manage the lifecycle of the IHost. The example project I’ve built here is an ASP.Net Core web service project, so I’m going to use Alba to wrap the host inside of AppFixture as shown below:

    public class AppFixture : IDisposable, IAsyncLifetime
    {
        public IAlbaHost Host { get; private set; }
        public async Task InitializeAsync()
        {
            // Program.CreateHostBuilder() is the code from the WebApplication
            // that configures the HostBuilder for the system
            Host = await Program
                .CreateHostBuilder(Array.Empty<string>())
                
                // This extension method starts up the underlying IHost,
                // but Alba replaces Kestrel with a TestServer and
                // wraps the IHost
                .StartAlbaAsync();
        }

        public Task DisposeAsync()
        {
            return Host.StopAsync();
        }

        public void Dispose()
        {
            Host?.Dispose();
        }
    }

A couple things to note in that code above:

  • As we’ll set up next, that class above will be constructed once in memory by xUnit and shared between test fixture classes
  • The Dispose() and DisposeAsync() methods both dispose the IHost. By normal .Net mechanics, that will also dispose the underlying Lamar IoC container, which will in turn dispose any services created by Lamar at runtime that implement IDisposable. Disposing the IHost also stops any registered IHostedService services that your application may be using for long running tasks (for my colleagues who may be reading this, both NServiceBus and MassTransit start and stop their message listeners in an IHostedService, so that might be in use even if you don’t explicitly use that technique).

Next, we’ll set up AppFixture to be shared between our integration test classes by using the [CollectionDefinition] attribute on a marker class:

    [CollectionDefinition("Integration")]
    public class AppFixtureCollection : ICollectionFixture<AppFixture>
    {
        
    }

Lastly, I like to build out a base class for integration tests like this one:

    [Collection("Integration")]
    public abstract class IntegrationContext
    {
        protected IntegrationContext(AppFixture fixture)
        {
            theHost = fixture.Host;
            
            // I am using Lamar as the underlying DI container
            // and want some Lamar specific things later on
            // in the tests
            Container = (IContainer)fixture.Host.Services;
        }

        public IAlbaHost theHost { get; }
        
        public IContainer Container { get; }
    }

The [Collection] attribute is meaningful here because that makes xUnit.Net run all the tests that are contained in test fixture classes that inherit from IntegrationContext in a single thread so we don’t have to worry about concurrent test runs.*

And finally to bring this all together, let’s say that WebApplication has this simplistic web service code to do some arithmetic:

    public class Result
    {
        public int Sum { get; set; }
        public int Product { get; set; }
    }

    public class Numbers
    {
        public int[] Values { get; set; }
    }
    
    public class ArithmeticController : ControllerBase
    {
        [HttpPost("/math")]
        public Result DoMath([FromBody] Numbers input)
        {
            var product = 1;
            foreach (var value in input.Values)
            {
                product *= value;
            }

            return new Result
            {
                Sum = input.Values.Sum(),
                Product = product
            };
        }
    }

In the next code block, let’s finally see a test fixture class that uses the new IntegrationContext as a base class and tests the HTTP endpoint shown in the block above.

    public class ArithmeticApiTests : IntegrationContext
    {
        public ArithmeticApiTests(AppFixture fixture) : base(fixture)
        {
        }

        [Fact]
        public async Task happy_path_arithmetic()
        {
            // Building the input body
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await theHost.Scenario(x =>
            {
                // Alba deals with Json serialization for us
                x.Post.Json(input).ToUrl("/math");
                
                // Enforce that the HTTP Status Code is 200 Ok
                x.StatusCodeShouldBeOk();
            });

            var output = response.ReadAsJson<Result>();
            output.Sum.ShouldBe(9);
            output.Product.ShouldBe(24);
        }
    }

Alright, at this point we’ve got a way to share the system’s IHost in tests for better efficiency, and we’re making sure that all the resources in the IHost are cleaned up when the test suite is done. We’re using the WebApplication’s exact configuration for the IHost, but we still might need to alter that in testing. And there’s also the issue of needing to roll back state in our system between tests. I’ll pick up those subjects in my next couple posts, as well as using NUnit instead of xUnit.Net because that’s what the majority of code at my work uses for testing.

* It would be nice to be able to run parallel tests using our shared IHost, but that can often be problematic because of shared state, so I generally bypass test parallelization in integrated tests. The subject of parallelizing integration tests is worthy of a later blog post of some thoughts I haven’t quite elucidated yet.
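
For completeness, if you want to shut off xUnit.Net’s parallel test collections wholesale rather than (or in addition to) funneling everything through the shared collection, there’s an assembly level switch. A hedged sketch:

    using Xunit;

    // One blunt instrument: turn off parallel test collections
    // for the entire test assembly
    [assembly: CollectionBehavior(DisableTestParallelization = true)]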

A brain dump on automated integration testing

I’m strictly talking about automated testing in this post. I’m more or less leading an effort at work to improve our test automation and Test Driven Development practices at work, so I’ll be trying to blog quite a bit about related topics in the next couple months. After reviewing quite a bit of in flight code, I think I’ll try to revisit some of my old blog posts on testability design from the CodeBetter days and update those old lessons from the early days of TDD to what we’re building now.

My company builds and maintains several long running software systems with a healthy back log of feature requests, performance improvements, and stories to retire technical debt. All that is to say that we’re constantly adding to or improving existing code — which implies we’re always running some non-zero risk of creating regression defects. To keep everybody’s stress levels down, we’re taking incremental steps toward a true continuous delivery model where we can smoothly and consistently build and deploy fully tested features while being confident that we aren’t introducing regression defects.

As you’d likely guess, we’re very interested in improving our automated testing practices as a safety net to enable continuous delivery while also improving our quality in general. That leads to the next question, what kind of automated testing should we be doing? Followed by, is there automated testing we’re doing today that isn’t delivering enough bang for the buck?

To that point, let’s take a look at the classic idea of the testing pyramid, as shown below:

From Unit test: sociable or solitary

The thinking behind the testing pyramid is that there’s a certain, healthy mix of different sorts of automated tests that efficiently lead to better results. I say a “mix” here because unit tests, though relatively cheap compared to other tests, cannot detect many defects that only come out during integration between components, code modules, or systems. From a quick search, I found worlds of memes along the lines of this one:

No integration tests, but all the unit tests pass!

To address exactly what kind of automated tests we should be writing, here’s a stream of consciousness brain dump that I’ve since organized with a thin patina of structure:

On End to End User Interface Tests

Any kind of test that uses a tool like Selenium to do end to end, black-box testing is going to run slowly. These tests are also frequently unstable because of asynchronous timing issues in modern browser applications. There’s an unhealthy tendency in many shops that adopt Selenium as a test automation solution to use it as their golden hammer to the exclusion of other testing techniques that can be much more efficient in certain circumstances. To put it bluntly, it’s very difficult to successfully author and maintain test automation suites based on Selenium against complicated applications. In my experience in shops that have attempted large scale Selenium usage, I do not believe that the benefits of those tests have ever outweighed the costs.

I would still recommend using some small number of end to end tests with a tool like Selenium, but those tests should be focused on proving out integration mechanisms between a user interface and backing server side code. For example, I’ve been working on a new integration of Open Id Connect (OIDC) authentication into our web services and web applications. I’ve used Playwright to automate browser testing to prove out the interactions and redirects between the OIDC service and our applications or services.

I also find Cypress.io interesting, but more for doing integration testing of our Angular applications by themselves with a dummy backend. For true end to end testing of .Net-backed web applications, I think I’m interested in replacing Selenium from here on out with Playwright, as I think (and hope) it does much more for you out of the box to make automated tests performant and reliable compared to Selenium.

Driving a browser should not be used to automate functional testing of business logic or data services or any kind of data analysis that could possibly be tested without using the full browser.

If you insist on trying to do a lot of browser automation testing, you better invest in collecting diagnostic information in test runs that can be used by developers to debug test failures. Ideally, I like to have the application’s log output correlated to the test run somehow. I’ve worked with teams that were able to pipe the console.log() tracing from the JavaScript code running in the browser to the test results and that was extremely helpful. Taking screenshots as part of the test can certainly help. As another ideal, I very strongly recommend that any kind of browser automation tests be executable by developers on their local machine on demand for easier debugging. More on this later.
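
To make that concrete, here’s a hedged sketch with Microsoft.Playwright of the kind of diagnostics I mean: piping the browser’s console output into the test log and grabbing a screenshot when a test fails. The helper name and the screenshot path are made up for the example:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Playwright;

    public static class BrowserDiagnostics
    {
        public static async Task RunWithDiagnosticsAsync(Func<IPage, Task> test)
        {
            using var playwright = await Playwright.CreateAsync();
            await using var browser = await playwright.Chromium.LaunchAsync();
            var page = await browser.NewPageAsync();

            // Correlate the browser's console.log() output with the test run
            page.Console += (_, msg) => Console.WriteLine($"[browser] {msg.Type}: {msg.Text}");

            try
            {
                await test(page);
            }
            catch
            {
                // Screenshots make browser test failures much easier to debug later
                await page.ScreenshotAsync(new() { Path = "test-failure.png" });
                throw;
            }
        }
    }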

Again, if you absolutely have to write Selenium/Playwright/Cypress tests despite all of my warnings, I strongly recommend you write those tests in the same programming language as the real application. That statement is going to be controversial if any real test automation engineers stumble into this post, but I think it’s important to make it as easy as possible for developers to collaborate with test automation engineers. Moreover, I despise the kind of shadow data access layers you can get from test automation code doing their own thing to write to and read from the underlying data store of the system under test. I think it’s less likely to get that kind of insidious, hidden code duplication if the test code is written in the same programming language and even uses the system’s own data access code to set up or verify database state as part of the automated tests.

Choosing Solitary/Unit Tests or Sociable/Integration Tests

I missed out on this when it was first published, but I think I like the nomenclature of solitary vs sociable tests better than thinking about unit vs integration tests. I also encourage folks to think of that as a continuum rather than a hard categorization. Moreover, I recommend that you switch between solitary and sociable tests even within the same test library where one or the other is more effective.

I would recommend organizing tests by functional area first, and only consider separating out integration tests into a separate testing project when it’s advantageous to use a “fast test, slow test” division for more efficient development.

We have formal requirements for test coverage metrics in our continuous integration builds, so I’d definitely make sure that any integration tests count toward that coverage number.

I think an emphasis on always writing classical unit tests can easily create a strong coupling between the production code and your testing code. That can and will reduce your ability to evolve your code, add new functionality, or do performance optimizations without rewriting your tests.

In many cases, integration tests that start at a natural sub-system facade or a logical controller/conductor entry point will do much better for you as a regression safety net to allow you to refactor your code to allow for new behavior or do important performance optimizations.

Case in point, I relied strictly on fine grained unit tests in my early work in StructureMap, and I definitely felt the negative consequences of that approach (I gave a talk about it in 2008 that’s still relevant). With later releases of StructureMap and now with Lamar, I lean much more heavily on integrated acceptance tests (let’s go ahead and call it Behavior Driven Development) that test from the entry point of the library down and focus on user-centric scenarios. I feel like that testing approach has led to much better results — both in the ease of adding new features, detecting regression defects in automated builds, and allowing me to evolve the functionality of the library.

On the other hand, integration tests can be harder to troubleshoot when they fail because you have more ground to cover. They also run slower of course. If you find that your feedback cycles feel too slow to efficiently run the tests continuously or especially if you find yourself doing long, marathon sessions in your debugger, stop and consider introducing more fine-grained tests first.

Running the Tests Locally vs Remotely

To the previous point, I think it’s critical that developers should be able to easily spin up and run automated integration tests on demand on their local development boxes. Tests will fail, and being able to easily troubleshoot a failing test is a prerequisite for successful test automation. If you can run an integration test locally, you’re much more likely to be able to iterate and try potential fixes quickly. There’s also the very real possibility of attaching a debugger to the testing process.

I think that our current technology set makes it much easier to do integration testing than it was when I was first getting started and the strict Michael Feathers definition of a unit test was in vogue. Just speaking from my own experience, the current .Net 5 generation is very easy to spin up and down in process for automated testing. Docker has been a great way to stand up development environments using Sql Server, Postgresql, Rabbit MQ, and other infrastructural tools.

As a follow up to the previous section, it’s also advantageous for automated tests to be able to run in process with the test harness code. For instance, I’d much rather do Alba testing of HTTP endpoints in .Net 5 where I’m able to quickly spin up an actual web service in memory and shut it down from my testing project. As opposed to the old, full .Net framework where you’d have to run the web service project in IIS or IISExpress first, then use HttpClient to address the service from your unit tests. The first, .Net 5/Alba approach is a much faster iteration and feedback cycle to support a Test Driven Development workflow than the 2nd approach.

Likewise, when given a choice between tests that can be run locally versus tests that can only be executed in a remote server location, give me the local tests every time. When and if you hit a scenario where you really need to run tests remotely *cough* serverless *cough*, that’s the one and only exception I can think of to my “never deploy from your local development box” rule. If depending on remote execution of tests, I’d at least want the ability to send my local development branch to the remote server at will. Just having a CI server build out pull request branches might get you there of course, but then you might be dependent upon being able to run that test suite in parallel with other CI builds. That’s not a show stopper, but it might make you have to invest more in your build automation to spin up isolated environments on demand.

Apollo Testing!

There’s plenty of debate over what the actual ratio of UI tests to integration tests to unit tests should be, and plenty of folks have different metaphors than a “pyramid” to describe what they think the ratio should be. I happen to like the integration test heavy ratio described in The Testing Trophy and Testing Classifications with the graphic below:

From The Testing Trophy and Testing Classifications

I think his image of the “testing trophy” looks a lot like the command module from the Apollo missions to the moon in the 70’s:

The Apollo Command Module

So from now on, I’m calling our intended testing approach the Apollo Testing Method!

What is the purpose of testing?

I’m just barely old enough that I started my official software development career in old fashioned waterfall models. In those days we did some unit testing with ad hoc testing tools to troubleshoot new code, but it wasn’t anywhere close to what developers do today with Test Driven Development and xUnit tools. As developers, we mostly ran the complete application on our development boxes and stepped through things manually to check out new code locally before throwing things over the wall to QA at the end of the project.

Regardless of whatever ad hoc, local testing developers did of their code locally, the only official testing that actually counted was the purely manual testing done by our testers in the testing environment that was supposed to exactly mimic the production environment (it never quite did, but that’s a story for another day). The QA team strictly used black-box testing with some direct access to the underlying database.

That old black-box testing approach at the end of the project was a much slower feedback cycle than we’re accustomed to today after the advent of Agile Software Development. The killer problem was that the testing feedback cycles were too slow to consider evolutionary design approaches because of the real fear of regression failures. It was also harder as a developer to address defects found in the testing cycle because you were frequently needing to work with code you hadn’t touched in many months that certainly wasn’t fresh in your mind. That frequently led to marathon debugging sessions while a helpful project manager came by your cubicle to cheerfully ask for any updates several times a day and crank up the pressure. As a developer you were also completely at the mercy of QA for information about what was really happening in the system when they found bugs.

In my mind, the most important element of Agile software development overall was the emphasis on improving feedback cycles. Faster feedback allowed teams to evolve their designs incrementally and to catch and fix problems while the code was still fresh in everyone’s mind.

Testing is not about proving that our code works perfectly so much as a way to find and remove enough problems from the code that it can be deployed to production. I think this is an important approach because it allows us to use faster, finer grained testing approaches like isolated unit testing or intermediate level white-box integration tests that are generally faster running and cheaper to build than classic black-box, end to end tests.

What’s Next?

My organization has started an effort to introduce much more integration testing into our development processes as a way of improving quality and throughput. To help out on that, I’m going to attempt to write a series of blog posts going into specific areas about tools and techniques, but for right now I’m just jotting down this stream of consciousness brain dump to get started.

Based on what I think we need to establish at work, I’m thinking to cover:

  • .Net IHost bootstrapping and lifecycle within xUnit.Net or NUnit. I’m much more familiar with xUnit.Net from recent development, but we mostly use NUnit at work so I’ll be trying to cover that base as well.
  • HTTP API testing, which will inevitably feature Alba
  • Dealing with databases in tests, and that’s gonna have to cover both RDBMS databases and probably Mongo Db for now
  • Message handler testing with MassTransit and NServiceBus (we use both in different products)
  • Another brain dump on doing end to end testing after a meeting today about the CI of one of our big systems.
  • How should automated testing be integrated into the development cycle, and who should be responsible for these tests, and why is the obvious answer a very close collaboration between testers, developers, and even business experts
  • This will be a bigger stretch for me, but maybe get into how to do some semi-integration testing of an Angular front end with NgRX.
  • After talking through some of our issues with test automation at work, I think I’d like to blog about some of the positive things we did with Storyteller. I’ve been increasingly frustrated with xUnit.Net (and don’t think NUnit would be much better) for integration testing, so I’ve got quite a few notes about what an alternative tool optimized for integration tests could look like that I wouldn’t mind publishing.

Testing web services secured by JWT tokens with Alba v5

We’re working toward standing up a new OIDC infrastructure built around Identity Server 5, with a couple gigantic legacy monolith applications and potentially dozens of newer microservices needing to use this new identity server for authentication. We’ll have to have a good story for running our big web applications that will have this new identity server dependency at development time, but for right now I just want to focus on an automated testing strategy for our newer ASP.Net Core web services using the Alba library.

First off, Alba is a helper library for integration testing HTTP API endpoints in .Net Core systems. Alba wraps the ASP.Net Core TestServer while providing quite a few convenient helpers for setting up and verifying HTTP calls against your ASP.Net Core services. We will shortly be introducing Alba into my organization at MedeAnalytics as a way of doing much more integration testing at the API layer (think the middle layer of any kind of testing pyramid concept).

In my previous post I laid out some plans and proposals for a quickly forthcoming Alba v5 release, with the biggest improvement being a new model for being able to stub out OIDC authentication for APIs that are secured by JWT bearer tokens (I think I win programming bingo for that sentence!).

Before I show code, I should say that all of this code is in the v5 branch of Alba on GitHub, but not yet released as it’s very heavily in flight.

To start, I’m assuming that you have a web service project, then a testing library for that web service project. In your web application, bearer token authentication is set up something like this inside your Startup.ConfigureServices() method:

services.AddAuthentication("Bearer")
    .AddJwtBearer("Bearer", options =>
    {
        // A real application would pull all this information from configuration
        // of course, but I'm hardcoding it in testing
        options.Audience = "jwtsample";
        options.ClaimsIssuer = "myapp";
        
        // don't worry about this, our JwtSecurityStub is gonna switch it off in
        // tests
        options.Authority = "https://localhost:5001";
            

        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateAudience = false,
            IssuerSigningKey = new SymmetricSecurityKey(Encoding.UTF8.GetBytes("some really big key that should work"))
        };
    });

And of course, you also have these lines of code in your Startup.Configure() method to add in ASP.Net Core middleware for authentication and authorization:

app.UseAuthentication();
app.UseAuthorization();

With these lines of setup code, you will not be able to hit any secured HTTP endpoint in your web service unless there is a valid JWT token in the Authorization header of the incoming HTTP request. Moreover, with this configuration your service would need to make calls to the configured bearer token authority (https://localhost:5001 above). It’s going to be awkward and probably very brittle to depend on having the identity server spun up and running locally when our developers try to run API tests. It would obviously be helpful if there was a quick way to stub out the bearer token authentication in testing to automatically supply known claims so our developers can focus on developing their individual service’s functionality.

That’s where Alba v5 comes in with its new JwtSecurityStub extension that will:

  1. Disable any validation interactions with an external OIDC authority
  2. Automatically add a valid JWT token to any request being sent through Alba
  3. Give developers fine-grained control over the claims attached to any specific request if there is logic that will vary by claim values

To demonstrate this new Alba functionality, let’s assume that you have a testing project that has a direct reference to the web service project. The direct project reference is important because you’ll want to spin up the “system under test” in a test fixture like this:

    public class web_api_authentication : IDisposable
    {
        private readonly IAlbaHost theHost;

        public web_api_authentication()
        {
            // This is calling your real web service's configuration
            var hostBuilder = Program.CreateHostBuilder(new string[0]);

            // This is a new Alba v5 extension that can "stub" out
            // JWT token authentication
            var jwtSecurityStub = new JwtSecurityStub()
                .With("foo", "bar")
                .With(JwtRegisteredClaimNames.Email, "guy@company.com");

            // AlbaHost was "SystemUnderTest" in previous versions of
            // Alba
            theHost = new AlbaHost(hostBuilder, jwtSecurityStub);
        }

        // Clean up the AlbaHost (and the application it wraps) when the tests are done
        public void Dispose()
        {
            theHost?.Dispose();
        }
    }

I was using xUnit.Net in this sample, but Alba is agnostic about the actual testing library and we’ll use both NUnit and xUnit.Net at work.

In the code above I’ve bootstrapped the web service with Alba and attached the JwtSecurityStub. I’ve also established some baseline claims that will be added to every JWT token on all Alba scenario requests. The AlbaHost extends the IHost interface you’re already used to in .Net Core, but adds the important Scenario() method that you can use to run HTTP requests all the way through your entire application stack, like this test:

        [Fact]
        public async Task post_to_a_secured_endpoint_with_jwt_from_extension()
        {
            // Building the input body
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await theHost.Scenario(x =>
            {
                // Alba deals with Json serialization for us
                x.Post.Json(input).ToUrl("/math");
                
                // Enforce that the HTTP Status Code is 200 Ok
                x.StatusCodeShouldBeOk();
            });

            var output = response.ResponseBody.ReadAsJson<Result>();
            output.Sum.ShouldBe(9);
            output.Product.ShouldBe(24);
        }

You’ll notice that I did absolutely nothing in regard to JWT set up or claims or anything. That’s because the JwtSecurityStub is taking care of everything for you. It’s:

  1. Reaching into your application’s bootstrapping to pluck out the right signing key so that it builds JWT token strings that can be validated with the right signature
  2. Turning off any token validation against an external OIDC authority
  3. Placing a unique, unexpired JWT token on each request that matches the issuer and authority configuration of your application

Now, to further control the claims used on any individual scenario request, you can use this new method in Scenario tests:

        [Fact]
        public async Task can_modify_claims_per_scenario()
        {
            var input = new Numbers
            {
                Values = new[] {2, 3, 4}
            };

            var response = await theHost.Scenario(x =>
            {
                // This is a custom claim that would only be used for the 
                // JWT token in this individual test
                x.WithClaim(new Claim("color", "green"));
                x.Post.Json(input).ToUrl("/math");
                x.StatusCodeShouldBeOk();
            });

            var principal = response.Context.User;
            principal.ShouldNotBeNull();
            
            principal.Claims.Single(x => x.Type == "color")
                .Value.ShouldBe("green");
        }

I’ve got plenty of more ground to cover in how we’ll develop locally with our new identity server strategy, but I’m feeling pretty good about having a decent API testing strategy. All of this code is just barely written, so any feedback you might have would be very timely. Thanks for reading about some brand new code!

Testing effectively — with or without mocks or stubs

My team at MedeAnalytics is working with our development teams on a long term initiative to improve the effectiveness of our developer testing practices and to encourage more Test Driven Development in daily development. As part of that effort, it’s time for my organization to talk about how — and when — we use test doubles like mock objects or stubs in our daily work.

I think most developers have probably heard the terms mock or stub. Mocks and stubs are examples of “test doubles”, which refer to any kind of testing object or function that is substituted for a production dependency while testing, or in early development before the real dependency is available. At the moment, I’m concerned that we may be overusing mock objects and especially dynamic mocking tools when other types of test doubles could be easier to use, or when we should be switching to integration testing against the real dependencies.

Let’s first talk about the different kinds of tests that developers write, both as part of a Test Driven Development workflow and tests that you may add later to act as regression tests. When I was learning about developer testing, we talked about a pretty strict taxonomy of test types using the old Michael Feathers definition of what constituted a unit test. Unit tests typically meant that we tested one class at a time with all of its dependencies replaced with some kind of test double so we could isolate the functionality of just that one class. We’d also write some integration tests that ran part or all of the application stack, but that wasn’t emphasized very much at the time.

Truth be told, many of the mock-heavy unit tests I wrote back then didn’t provide a lot of value compared to the effort I put into authoring them, and I learned the hard way in longer lived codebases like StructureMap that I was better off in many cases breaking the “one class” rule of unit tests and writing far coarser grained tests that were focused on usage scenarios, because the fine-grained unit tests actually made it much harder to evolve the internal structure of the code. In my later efforts, I switched to mostly testing through the public APIs all the way down the stack and got much better results.

Flash forward to today, and we talk a lot about the testing pyramid concept where code really needs to be covered by different types of tests for maximum effective test coverage – lots of small unit tests, a medium amount of a nebulously defined middle ground of integration tests, and a handful of full blown, end to end black box tests. The way that I personally over-simplify this concept is to say:

Test with the finest grained mechanism that tells you something important

Jeremy Miller (me!)

For any given scenario, I want developers to consciously choose the proper mix of testing techniques that really tells them that the code fulfills its requirements correctly. At the end of the day, the code passing all of its integration tests should go a long way toward making us confident that the code is ready to ship to production.

I also want developers to be aware that unit tests that become tightly coupled to the implementation details can make it quite difficult to refactor that code later. In one ongoing project, one of my team members is doing a lot of work to optimize an expensive process. We’ve already talked about the need to do the majority of his testing from outside-in with integration tests so he will have more freedom to iterate on completely different internal mechanisms while he pursues performance optimization.

I think I would recommend that we all maybe stop thinking so much about unit vs integration tests and think more about tests being on a continuous spectrum between the “sociable” and “solitary” tests that Martin Fowler discusses in On the Diverse And Fantastical Shapes of Testing.

To sum up this section, I think there are two basic questions I want our developers to constantly ask themselves in regards to using any kind of test double like a mock object or a stub:

  1. Should we be using an integration or “sociable” test instead of a “solitary” unit test that uses test doubles?
  2. When writing a “solitary” unit test with a test double, which type of test double is easiest in this particular test?

In the past, we strongly preferred writing “solitary” tests with or without mock objects or other fakes because those tests were reliable and ran fast. That’s still a valid consideration, but I think these days it’s much easier to author more “sociable” tests that might even be using infrastructure like databases or the file system than it was when I originally learned TDD and developer testing. Especially if a team is able to use something like Docker containers to quickly spin up local development environments, I would very strongly recommend writing tests that work through the data layer or call HTTP endpoints in place of pure, Feathers-compliant unit tests.

Black box or UI tests through tools like Selenium are just a different ball game altogether and I’m going to just say that’s out of the scope of this post and get on with things.

As for when mock objects or stubs or any other kind of test double are or are not appropriate, I’m going to stand pat on what I’ve written in the past.

After all of that, I’d finally like to talk about all the different kinds of test doubles, but first I’d like to make a short digression into…

The Mechanics of Automated Tests

I think that most developers writing any kind of tests today are familiar with the Arrange-Act-Assert pattern and nomenclature. To review, most tests will follow a structure of:

  1. “Arrange” any initial state and inputs to the test. One of the hallmarks of a good test is being repeatable and creating a clear relationship between known inputs and expected outcomes.
  2. “Act” by executing a function or a method on a class or taking some kind of action in code
  3. “Assert” that the expected outcome has happened

In the event sourcing support in Marten V4, we have a subsystem called the “projection daemon” that constantly updates projected views based on incoming events. In one of its simpler modes, there is a class called SoloCoordinator that is simply responsible for starting up every single configured projected view agent when the projection daemon is initialized. To test the start up code of that class, we have this small test:

    public class SoloCoordinatorTests
    {
        [Fact]
        public async Task start_starts_them_all()
        {
            // "Arrange" in this case is creating a test double object
            // as an input to the method we're going to call below
            var daemon = Substitute.For<IProjectionDaemon>();
            var coordinator = new SoloCoordinator();

            // This is the "Act" part of the test
            await coordinator.Start(daemon, CancellationToken.None);

            // This is the "Assert" part of the test
            await daemon.Received().StartAllShards();
        }
    }

and the little bit of code it’s testing:

        public Task Start(IProjectionDaemon daemon, CancellationToken token)
        {
            _daemon = daemon;
            return daemon.StartAllShards();
        }

In the code above, the real production code for the IProjectionDaemon interface is very complicated and setting up a real one would require a lot more code. To short circuit that set up in the “arrange” part of the test, I create a “test double” for that interface using the NSubstitute library, my dynamic mock/stub/spy library of choice.

In the “assert” phase of the test I needed to verify that all of the known projected views were started up, and I did that by asserting through the mock object that the IProjectionDaemon.StartAllShards() method was called. I don’t necessarily care here that that specific method was called so much as that the SoloCoordinator sent a logical message to start all the projections.

In the “assert” part of a test you can either verify some expected change of state in the system or return value (state-based testing), or use what’s known as “interaction testing” to verify that the code being tested sent the expected messages or invoked the proper actions to its dependencies.
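The SoloCoordinator test earlier is an example of interaction testing. For contrast, here’s a minimal sketch of a state-based test against a made-up ShoppingCart class (nothing from Marten, purely an illustration), where the assertion is about the resulting state of the object itself rather than any interactions with dependencies:

    public class ShoppingCart
    {
        private readonly List<decimal> _prices = new();

        public void AddItem(decimal price) => _prices.Add(price);

        public decimal Total => _prices.Sum();
    }

    public class ShoppingCartTests
    {
        [Fact]
        public void adding_items_updates_the_total()
        {
            // "Arrange" -- build the object under test, no test doubles needed
            var cart = new ShoppingCart();

            // "Act" -- exercise the behavior
            cart.AddItem(10m);
            cart.AddItem(5.50m);

            // "Assert" -- verify the resulting state of the object
            Assert.Equal(15.50m, cart.Total);
        }
    }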

See an old post of mine from 2005 called TDD Design Starter Kit – State vs. Interaction Testing for more discussion on this subject. The old code samples and formatting are laughable, but I think the discussion of the concepts is still valid.

As an aside, you might ask why I bothered writing a test for such a simple piece of code? I honestly won’t bother writing unit tests in every case like this, but a piece of advice I read from (I think) Kent Beck was to write a test for any piece of code that could possibly break. Another rule of thumb is to write a test for any vitally important code regardless of how simple it is, in order to remove project risk by locking it down through tests that could fail in CI if the code is changed. And lastly, I’d argue that it was worthwhile to write that test as documentation about what the code should be doing for later developers.

Mocks or Spies

Now that we’ve established the basic elements of automated testing and reviewed the difference between state-based and interaction-based testing, let’s go review the different types of test doubles. I’m going to be using the nomenclature describing different types of test doubles from the xUnit Patterns book by Gerard Meszaros (I was one of the original technical reviewers of that book and used the stipend to buy a 30GB iPod that I had for years). You can find his table explaining the different kinds of test doubles here.

As I explain the differences between these concepts, I recommend that you focus more on the role a test double plays within a test than on how it happens to be implemented or whether we are using a dynamic mocking tool like NSubstitute.

The most commonly used test double term is a “mock.” Some people use this term to mean “a stand-in object created dynamically by a mocking tool,” but the original definition was an object that can be used to verify the calls or interactions that the code under test makes against its dependencies.

To illustrate the usage of mock objects, let’s start with a small feature in Marten to seed baseline data in the database whenever a new Marten DocumentStore is created. That feature allows users to register objects implementing this interface to the configuration of a new DocumentStore:

    public interface IInitialData
    {
        Task Populate(IDocumentStore store);
    }
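Just to make that interface a little more concrete, an implementation might look something like the sketch below. The InitialAdminUser class is purely hypothetical (it’s not part of Marten or any codebase mentioned here); it simply stores a baseline User document through a Marten session:

    // Hypothetical example: seed a baseline "admin" user document
    // whenever the DocumentStore is initialized
    public class InitialAdminUser : IInitialData
    {
        public async Task Populate(IDocumentStore store)
        {
            // Open a lightweight session, register the baseline document,
            // and commit the pending changes
            using var session = store.LightweightSession();
            session.Store(new User { UserName = "admin" });
            await session.SaveChangesAsync();
        }
    }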

Here’s a simple unit test from the Marten codebase that uses NSubstitute to do interaction testing with what I’d still call a “mock object” to stand in for IInitialData objects in the test:

        [Fact]
        public void runs_all_the_initial_data_sets_on_startup()
        {

            // These three objects are mocks or spies
            var data1 = Substitute.For<IInitialData>();
            var data2 = Substitute.For<IInitialData>();
            var data3 = Substitute.For<IInitialData>();

            // This is part of a custom integration test harness
            // we use in the Marten codebase. It's configuring
            // and spinning up a new DocumentStore. As part of the
            // DocumentStore getting initialized, we expect it to
            // execute all the registered IInitialData objects
            StoreOptions(_ =>
            {
                _.InitialData.Add(data1);
                _.InitialData.Add(data2);
                _.InitialData.Add(data3);
            });

            theStore.ShouldNotBeNull();

            // Verifying that the expected interactions
            // with the three mocks happened as expected
            data1.Received().Populate(theStore);
            data2.Received().Populate(theStore);
            data3.Received().Populate(theStore);
        }

In the test above, the “assertions” are simply that, at some point when a new Marten DocumentStore is initialized, it will call the three registered IInitialData objects to seed data. This test would fail if the IInitialData.Populate(IDocumentStore) method was not called on any of the mock objects. Mock objects are used specifically to make assertions about the interactions between the code under test and its dependencies.

In the original xUnit Patterns book, the author also identified a slightly different test double called a “spy” that recorded the inputs to itself so they could be interrogated in the “Assert” part of a test. That differentiation made more sense years ago when early mock tools like RhinoMocks or NMock worked very differently from today’s tools like NSubstitute or FakeItEasy.

I used NSubstitute in the sample above to build a mock dynamically, but at other times I’ll roll a mock object by hand when it’s more convenient. Consider the common case of needing to verify that an important message or exception was logged (I honestly won’t always write tests for this myself, but it makes for a good example here).

Using a dynamic mocking tool (Moq in this case) to mock the ILogger<T> interface from the core .Net abstractions just to verify that an exception was logged could result in code like this:

_loggerMock.Verify(
    x => x.Log(
        LogLevel.Error,
        It.IsAny<EventId>(),
        It.IsAny<It.IsAnyType>(),
        It.IsAny<Exception>(),
        It.IsAny<Func<It.IsAnyType, Exception, string>>()),
    Times.AtLeastOnce);

In this case you have to use an advanced feature of mocking libraries called argument matchers as a kind of wild card to match the expected call against data generated at runtime that you don’t really care about. As you can see, this code is a mess to write and to read. I wouldn’t say to never use argument matchers, but it’s a “guilty until proven innocent” kind of technique to me.
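For reference, NSubstitute’s equivalent of Moq’s It.IsAny<T>() wildcard is Arg.Any<T>(). Here’s a quick sketch on a simpler, completely made-up example (INotificationService and OrderHandler are hypothetical, just to show the matcher in isolation):

    public interface INotificationService
    {
        Task SendAsync(string userId, string message);
    }

    // Hypothetical code under test that builds part of the message at runtime
    public class OrderHandler
    {
        private readonly INotificationService _notifications;

        public OrderHandler(INotificationService notifications)
            => _notifications = notifications;

        public Task CompleteOrder(string userId)
            => _notifications.SendAsync(userId, $"Order completed at {DateTime.UtcNow}");
    }

    [Fact]
    public async Task notifies_the_user_after_completing_an_order()
    {
        var notifications = Substitute.For<INotificationService>();
        var handler = new OrderHandler(notifications);

        await handler.CompleteOrder("user-1");

        // Arg.Any<string>() is the wildcard -- we only care that *some*
        // message was sent to this user, not about its exact text
        await notifications.Received().SendAsync("user-1", Arg.Any<string>());
    }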

Instead, let’s write our own mock object for logging that will more easily handle the kind of assertions we need to do later (this wasn’t a contrived example, I really have used this):

    public class RecordingLogger<T> : ILogger<T>
    {
        public IDisposable BeginScope<TState>(TState state)
        {
            throw new NotImplementedException();
        }

        public bool IsEnabled(LogLevel logLevel)
        {
            return true;
        }

        public void Log<TState>(
            LogLevel logLevel, 
            EventId eventId, 
            TState state, 
            Exception exception, 
            Func<TState, Exception, string> formatter)
        {
            // Just add this object to the list of messages
            // received
            var message = new LoggedMessage
            {
                LogLevel = logLevel,
                EventId = eventId,
                State = state,
                Exception = exception,
                Message = formatter?.Invoke(state, exception)
            };

            Messages.Add(message);
        }

        public IList<LoggedMessage> Messages { get; } = new List<RecordingLogger<T>.LoggedMessage>();

        public class LoggedMessage
        {
            public LogLevel LogLevel { get; set; }

            public EventId EventId { get; set; }

            public object State { get; set; }

            public Exception Exception { get; set; }

            public string Message { get; set; }
        }

        public void AssertExceptionWasLogged()
        {
            // This uses Fluent Assertions to blow up if there
            // are no recorded errors logged
            Messages.Any(x => x.LogLevel == LogLevel.Error)
                .Should().BeTrue("No exceptions were logged");
        }
    }

With this hand-rolled mock object, the ugly code above that uses argument matchers just becomes this assertion in the tests:

// _logger is a RecordingLogger<T> that 
// was used as an input to the code under
// test
_logger.AssertExceptionWasLogged();

I’d argue that that’s much simpler to read and most certainly to write. You could instead interrogate the calls made to a dynamically generated mock object without using argument matchers, but the syntax for that can be very ugly as well and I don’t recommend it in most cases.
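For the curious, here’s roughly what that call interrogation looks like with NSubstitute’s ReceivedCalls() — a sketch that assumes _logger is a Substitute.For<ILogger<MyService>>() (MyService is hypothetical) rather than the hand-rolled RecordingLogger, and it’s exactly the kind of code I’d rather not maintain:

// Dig through every call recorded on the ILogger<T> substitute and
// check whether any of them was logged at the Error level
var loggedAnError = _logger.ReceivedCalls()
    .Where(call => call.GetMethodInfo().Name == nameof(ILogger.Log))
    .Any(call => call.GetArguments().OfType<LogLevel>().Contains(LogLevel.Error));

loggedAnError.Should().BeTrue("No exceptions were logged");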

To sum up, “mock objects” are used in the “Assert” portion of your test to verify that the expected interactions were made with the dependencies of the code under test. You also don’t have to use a mocking tool like NSubstitute; sometimes a hand-rolled mock class is easier to consume and leads to easier-to-read tests.

Pre-Canned Data with Stubs

“Stubs” are just a way to replace real services with some kind of stand-in that supplies pre-canned data as test inputs. Whereas “mocks” are about interaction testing, “stubs” are used to provide inputs for state-based testing. I won’t go into too much detail because I think this concept is pretty well understood, but here’s an example of using NSubstitute to whip up a stub in place of a full blown Marten query session in a test:

        [Fact]
        public async Task using_a_stub()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(session);
            
            // carry out the Act and Assert parts of the test
        }

Again, “stub” refers to a role within the test and not how it was built. In-memory database stand-ins in tools like Entity Framework Core are another common example of using stubs.
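To underline that point, a stub doesn’t have to come from a mocking library at all. Here’s a hand-rolled stub for a hypothetical IClock abstraction (a made-up interface for this example) that feeds a fixed, known time into the code under test so that date-sensitive logic stays repeatable:

    // Hypothetical abstraction over the system clock
    public interface IClock
    {
        DateTimeOffset UtcNow { get; }
    }

    // A hand-rolled stub playing the same "supply pre-canned inputs" role
    // as the NSubstitute-generated IQuerySession above
    public class StubClock : IClock
    {
        public DateTimeOffset UtcNow { get; set; }
            = new DateTimeOffset(2021, 1, 1, 0, 0, 0, TimeSpan.Zero);
    }

A test can set StubClock.UtcNow to whatever date the scenario calls for in the “Arrange” step and get completely deterministic behavior from the code under test.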

Dummy Objects

A “dummy” is a test double whose only real purpose is to act as a stand-in service that does nothing, but allows your test to run without constant NullReferenceException problems. Going back to the ServiceThatLooksUpUsers in the previous section, let’s say that the service also depends on the .Net ILogger<T> abstraction for tracing within the service. We may not care about the log messages happening in some of our tests, but ServiceThatLooksUpUsers will blow up if it doesn’t have a logger, so we’ll use the built in NullLogger<T> from the .Net logging abstractions as a “dummy” like so:

        [Fact]
        public async Task using_a_dummy()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(
                session,
                
                // Using a dummy logger
                new NullLogger<ServiceThatLooksUpUsers>());

            // carry out the Act and Assert parts of the test
        }

Summing it all up

I tried to cover a lot of ground, and to be honest, this was meant to be the first cut at a new “developer testing best practices” guide at work so it meanders a bit.

There are a few things I would hope folks would get out of this post:

  • Mocks or stubs can sometimes be very helpful in writing tests, but can also cause plenty of heartburn in other cases.
  • Don’t hesitate to skip mock-heavy unit tests when some sort of integration test would be easier to write or would do more to ascertain that the code actually works
  • Use the quickest feedback cycle you can get away with when trying to decide what kind of testing to do for any given scenario — and sometimes a judicious usage of a stub or mock object helps write tests that run faster and are easier to set up than integration tests with the real dependencies
  • Don’t get tunnel vision on using mocking libraries and forget that hand-rolled mocks or stubs can sometimes be easier to use within some tests