Resiliency with Wolverine

Yesterday I started a new series of blog posts about Wolverine capabilities with:

To review, I was describing a project I worked on years ago that involved some interactions with a very unreliable 3rd party web service system to handle payments originating from a flat file:

Just based on that diagram above, and admittedly some bad experiences in the shakedown cruise of the historical system that diagram was based on, here are some of the things that can go wrong:

  • The system blows up and dies while the payments from a particular file are only halfway processed
  • Transient errors from database connectivity. Network hiccups
  • File IO errors from reading the flat files (I tend to treat direct file system access a lot like a poisonous snake due to very bad experiences early in my career)
  • HTTP errors from timeouts calling the web service
  • The 3rd party system is under distress and performing very poorly, such that a high percentage of requests are timing out
  • The 3rd party system can be misconfigured after code migrations on its system so that it’s technically “up” and responsive, but nothing actually works
  • The 3rd party system is completely down

Man, it’s a scary world sometimes!

Let’s say right now that our goal is, as much as possible, to have a system that:

  1. Can recover from errors without losing any ongoing work
  2. Never gets permanently stuck in an inconsistent state, e.g. a file is marked as completely read, but somehow some of the payments from that file got lost along the way
  3. Rarely needs manual intervention from production support to recover work or restart work
  4. Notifies production support when, heavens forbid, something does happen that the system can’t recover from

Now let’s go onto how to utilize Wolverine features to satisfy those goals in the face of all the potential problems I identified.

What if the system dies halfway through a file?

If you read through the last post, I used the local queueing mechanism in Wolverine to effectively create a producer/consumer workflow. Great! But what if the current process manages to die before all the ongoing work is completed? That’s where the durable inbox support in Wolverine comes in.

Pulling Marten in as our persistence strategy (but EF Core with either Postgresql or Sql Server is fully supported for this use case as well), I’m going to set up the application to opt into durable inbox mechanics for all locally queued messages like so (after adding the WolverineFx.Marten Nuget):

using Marten;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Oakton;
using Wolverine;
using Wolverine.Marten;
using WolverineIngestionService;

return await Host.CreateDefaultBuilder()
    .UseWolverine((context, opts) =>
    {
        // There'd obviously be a LOT more set up and service registrations
        // to be a real application

        var connectionString = context.Configuration.GetConnectionString("marten");
        opts.Services
            .AddMarten(connectionString)
            .IntegrateWithWolverine();

        // I want all local queues in the application to be durable
        opts.Policies.UseDurableLocalQueues();
        opts.LocalQueueFor<PaymentValidated>().Sequential();
        opts.LocalQueueFor<PostPaymentSchedule>().Sequential();
    }).RunOaktonCommands(args);

And with those changes, all in flight messages in the local queues are also stored durably in the backing database. If the application process happens to fail in flight, the persisted messages will fail over to either another running node or be picked up by restarting the system process.
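
To make the idea concrete, here is a minimal in-memory sketch of what a durable inbox buys you. This is not how Wolverine implements it: `DurableQueue` and its members are hypothetical names, and a plain dictionary stands in for the database table that would outlive any one process.

```csharp
using System;
using System.Collections.Generic;

// The dictionary stands in for a database table that survives process restarts
var inboxTable = new Dictionary<Guid, string>();

// The first "process" accepts three messages but dies after handling only one
var first = new DurableQueue(inboxTable);
first.Enqueue("payment-1");
first.Enqueue("payment-2");
first.Enqueue("payment-3");
first.ProcessNext(m => Console.WriteLine($"first process handled {m}"));

// A restarted "process" (or another node) picks up whatever is still persisted
var second = new DurableQueue(inboxTable);
Console.WriteLine(second.RecoverPersisted()); // 2

// Hypothetical illustration only, NOT Wolverine's implementation
public class DurableQueue
{
    private readonly Dictionary<Guid, string> _table;
    private readonly Queue<(Guid Id, string Body)> _inMemory = new();

    public DurableQueue(Dictionary<Guid, string> table) => _table = table;

    public void Enqueue(string body)
    {
        var id = Guid.NewGuid();

        // Persist first, then hand the message to the in-memory queue
        _table[id] = body;
        _inMemory.Enqueue((id, body));
    }

    public bool ProcessNext(Action<string> handler)
    {
        if (!_inMemory.TryDequeue(out var message)) return false;

        handler(message.Body);

        // Only delete the persisted copy once the handler has succeeded
        _table.Remove(message.Id);
        return true;
    }

    // Reload everything that was persisted but never completed
    public int RecoverPersisted()
    {
        foreach (var (id, body) in _table) _inMemory.Enqueue((id, body));
        return _inMemory.Count;
    }
}
```

The part that matters is the ordering: persist before accepting the work, and delete only after the handler succeeds, so a crash leaves work recoverable instead of lost.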

So far, so good? Onward…

Getting over transient hiccups

Sometimes database interactions will fail with transient errors and will very well succeed if retried later. This is especially common when the database is under stress. Wolverine’s error handling policies easily accommodate that, and in this case I’m going to add some retry capabilities for basic database exceptions like so:

        // Retry on basic database exceptions with some cooldown time in
        // between retries
        opts
            .Policies
            .OnException<NpgsqlException>()
            .Or<MartenCommandException>()
            .RetryWithCooldown(100.Milliseconds(), 250.Milliseconds(), 500.Milliseconds());

        opts
            .OnException<TimeoutException>()
            .RetryWithCooldown(250.Milliseconds(), 500.Milliseconds());

Notice how I’ve specified some “cooldown” times for subsequent failures. This is more or less an example of exponential back off error handling that’s meant to effectively throttle a distressed subsystem to allow it to catch up and recover.
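
For intuition about what a retry-with-cooldown policy amounts to, here’s a hand-rolled sketch in plain C#. The `Retry.WithCooldownAsync` helper is hypothetical and only reacts to TimeoutException; it is not Wolverine’s implementation, which applies its policies inside its own message handling pipeline.

```csharp
using System;
using System.Threading.Tasks;

// Usage: an operation that fails twice with a transient error, then succeeds
var attempts = 0;
await Retry.WithCooldownAsync(
    () =>
    {
        attempts++;
        if (attempts < 3) throw new TimeoutException("transient failure");
        return Task.CompletedTask;
    },
    TimeSpan.FromMilliseconds(100),
    TimeSpan.FromMilliseconds(250),
    TimeSpan.FromMilliseconds(500));

Console.WriteLine($"Succeeded after {attempts} attempts"); // Succeeded after 3 attempts

// Hypothetical helper for illustration, NOT part of Wolverine
public static class Retry
{
    public static async Task WithCooldownAsync(Func<Task> action, params TimeSpan[] cooldowns)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                await action();
                return;
            }
            catch (TimeoutException) when (attempt < cooldowns.Length)
            {
                // Wait a little longer after each failure so a distressed
                // dependency gets some room to recover
                await Task.Delay(cooldowns[attempt]);
            }
        }
    }
}
```

Once the cooldowns run out, the exception filter no longer matches and the last failure propagates to the caller, so the sketch makes one initial attempt plus one retry per cooldown.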

Now though, not every exception implies that the message may magically succeed at a later time, so in that case…

Walk away from bad apples

Over time we can recognize exceptions that pretty well mean that the message can never succeed. In that case we should just throw out the message instead of allowing it to suck down resources by being retried multiple times. Wolverine happily supports that as well. Let’s say that a payment message can never work if it refers to an account that cannot be found, so let’s do this:

        // Just get out of there if the account referenced by a message
        // does not exist!
        opts
            .OnException<UnknownAccountException>()
            .Discard();

I should also note that Wolverine is writing to your application log when this happens.

Circuit Breakers to give the 3rd party system a timeout

As I’ve repeatedly said in this blog series so far, the “very unreliable 3rd party system” was somewhat less than reliable. What we found in practice was that the service would fail in bunches when it fell behind, but could recover over time. However, what would happen — even with the exponential back off policy — was that when the system was distressed it still couldn’t recover in time and continuing to pound it with retries just led to everything ending up in dead letter queues where it eventually required manual intervention to recover. That was exhausting and led to much teeth gnashing (and fingers pointed at me in angry meetings). In response to that, Wolverine comes with circuit breaker support as shown below:

        // These are the queues that handle calls to the 3rd party web service
        opts.LocalQueueFor<PaymentValidated>()
            .Sequential()
            
            // Add the circuit breaker
            .CircuitBreaker(cb =>
            {
                // If the conditions are met to stop processing messages,
                // Wolverine will attempt to restart in 5 minutes
                cb.PauseTime = 5.Minutes();

                // Stop listening if there are more than 20% failures 
                // over the tracking period
                cb.FailurePercentageThreshold = 20;

                // Consider the failures over the past minute
                cb.TrackingPeriod = 1.Minutes();
                
                // Get specific about what exceptions should
                // be considered as part of the circuit breaker
                // criteria
                cb.Include<TimeoutException>();

                // This is our fault, so don't shut off the listener
                // when this happens
                cb.Exclude<InvalidRequestException>();
            });
        
        opts.LocalQueueFor<PostPaymentSchedule>()
            .Sequential()
            
            // Or the defaults might be just fine
            .CircuitBreaker();

With the set up above, if Wolverine detects too high a rate of message failures in a given time, it will completely stop message processing for that particular local queue. Since we’ve isolated the message processing for the two types of calls to the 3rd party web service, we’re allowing everything else to continue when the circuit breaker stops message processing. Do note that the circuit breaker functionality will try to restart message processing later after the designated pause time. Hopefully the pause time allows for the 3rd party system to recover — or for production support to make it recover. All of this without making all the backed up messages continuously fail and end up landing in the dead letter queues where it will take manual intervention to recover the work in progress.
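
The bookkeeping behind a failure-percentage circuit breaker can be sketched in a few lines. This is an illustration only, not Wolverine’s implementation: `SimpleCircuitBreaker` is a made-up class, and its `minimumSamples` parameter is an assumption of mine to keep a single early failure from tripping the breaker.

```csharp
using System;
using System.Collections.Generic;

var breaker = new SimpleCircuitBreaker(
    failurePercentageThreshold: 20,
    trackingPeriod: TimeSpan.FromMinutes(1),
    pauseTime: TimeSpan.FromMinutes(5),
    minimumSamples: 10);

var now = DateTimeOffset.UtcNow;

// 7 successes and 3 failures inside the tracking period is a 30% failure rate
for (var i = 0; i < 7; i++) breaker.Record(true, now);
for (var i = 0; i < 3; i++) breaker.Record(false, now);

Console.WriteLine(breaker.IsPaused(now));               // True
Console.WriteLine(breaker.IsPaused(now.AddMinutes(6))); // False

// Hypothetical illustration only, NOT Wolverine's implementation
public class SimpleCircuitBreaker
{
    private readonly Queue<(DateTimeOffset At, bool Succeeded)> _results = new();
    private readonly double _failurePercentageThreshold;
    private readonly TimeSpan _trackingPeriod;
    private readonly TimeSpan _pauseTime;
    private readonly int _minimumSamples;
    private DateTimeOffset? _pausedUntil;

    public SimpleCircuitBreaker(double failurePercentageThreshold, TimeSpan trackingPeriod,
        TimeSpan pauseTime, int minimumSamples)
    {
        _failurePercentageThreshold = failurePercentageThreshold;
        _trackingPeriod = trackingPeriod;
        _pauseTime = pauseTime;
        _minimumSamples = minimumSamples;
    }

    public void Record(bool succeeded, DateTimeOffset now)
    {
        _results.Enqueue((now, succeeded));

        // Forget results that have aged out of the tracking period
        while (_results.Count > 0 && now - _results.Peek().At > _trackingPeriod)
        {
            _results.Dequeue();
        }

        if (_results.Count < _minimumSamples) return;

        var failures = 0;
        foreach (var result in _results)
        {
            if (!result.Succeeded) failures++;
        }

        // Trip the breaker: pause, and start counting fresh after the pause
        if (100.0 * failures / _results.Count > _failurePercentageThreshold)
        {
            _pausedUntil = now + _pauseTime;
            _results.Clear();
        }
    }

    public bool IsPaused(DateTimeOffset now) => _pausedUntil.HasValue && now < _pausedUntil.Value;
}
```

Passing the clock in as a parameter keeps the sketch deterministic, which is also how I’d want to unit test this kind of time-based logic.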

Hold the line, the 3rd party system is broken!

On top of everything else, the “very unreliable 3rd party system” was easily misconfigured at the drop of a hat such that it would become completely nonfunctional even though it appeared to be responsive. When this happened, every single message to that service would fail. So again, instead of letting all our pending work end up in the dead letter queue, let’s instead completely pause all message handling on the current local queue (wherever the error happened) if we can tell from the exception that the 3rd party system is nonfunctional like so:

        // If we encounter this specific exception with this particular error code,
        // it means that the 3rd party system is 100% nonfunctional even though it appears
        // to be up, so let's pause all processing for 10 minutes
        opts.OnException<ThirdPartyOperationException>(e => e.ErrorCode == 235)
            .Requeue().AndPauseProcessing(10.Minutes());

Summary and next time!

It’s helpful to assign work within message handlers in such a way to maximize your error handling. Think hard about what actions in your system are prone to failure and may deserve to be their own individual message handler and messaging endpoint to allow for exact error handling policies like the way I used a circuit breaker on the queues that handled calls to the unreliable 3rd party service.

For my next post in this series, I think I want to make a diversion into integration testing using a stand in stub for the 3rd party service using the application setup with Lamar.

Wolverine’s New HTTP Endpoint Model

UPDATE: If you pull down the sample code, it’s not quite working with Swashbuckle yet. It *does* publish the metadata and the actual endpoints work, but it’s not showing up in the OpenAPI spec. Always something.

I just published Wolverine 0.9.10 to Nuget (after a much bigger 0.9.9 yesterday). There are several bug fixes, some admitted breaking changes to advanced configuration items, and one significant change to the “mediator” behavior that’s described in the section at the very bottom of this post.

The big addition is a new library that enables Wolverine’s runtime model directly for HTTP endpoints in ASP.Net Core services without having to jump through the typical sequence of delegating from a Minimal API method to Wolverine’s mediator functionality like this:

app.MapPost("/items/create", (CreateItemCommand cmd, IMessageBus bus) => bus.InvokeAsync(cmd));

app.MapPost("/items/create2", (CreateItemCommand cmd, IMessageBus bus) => bus.InvokeAsync<ItemCreated>(cmd));

Instead, Wolverine now has the WolverineFx.Http library to use Wolverine’s runtime model (including its unique middleware approach) directly from HTTP endpoints.

Shamelessly stealing the Todo sample application from the Minimal API documentation, let’s build a similar service with WolverineFx.Http, but I’m also going to switch to Marten for persistence just out of personal preference.

To bootstrap the application, I used the dotnet new webapi model, then added the WolverineFx.Marten and WolverineFx.Http Nugets. The application bootstrapping for basic integration of Wolverine, Marten, and the new Wolverine HTTP model becomes:

using Marten;
using Oakton;
using Wolverine;
using Wolverine.Http;
using Wolverine.Marten;

var builder = WebApplication.CreateBuilder(args);

// Adding Marten for persistence
builder.Services.AddMarten(opts =>
    {
        opts.Connection(builder.Configuration.GetConnectionString("Marten"));
        opts.DatabaseSchemaName = "todo";
    })
    .IntegrateWithWolverine()
    .ApplyAllDatabaseChangesOnStartup();

// Wolverine usage is required for WolverineFx.Http
builder.Host.UseWolverine(opts =>
{
    // This middleware will apply to the HTTP
    // endpoints as well
    opts.Policies.AutoApplyTransactions();
    
    // Setting up the outbox on all locally handled
    // background tasks
    opts.Policies.UseDurableLocalQueues();
});

// Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

// Let's add in Wolverine HTTP endpoints to the routing tree
app.MapWolverineEndpoints();

return await app.RunOaktonCommands(args);

Do note that the only thing in that sample that pertains to WolverineFx.Http itself is the call to IEndpointRouteBuilder.MapWolverineEndpoints().

Let’s move on to “Hello, World” with a new Wolverine HTTP endpoint from this class we’ll add to the sample project:

public class HelloEndpoint
{
    [WolverineGet("/")]
    public string Get() => "Hello.";
}

At application startup, WolverineFx.Http will find the HelloEndpoint.Get() method and treat it as a Wolverine HTTP endpoint with the route pattern GET: / specified in the [WolverineGet] attribute.

As you’d expect, that route will write the return value back to the HTTP response and behave as specified by this Alba specification:

[Fact]
public async Task hello_world()
{
    var result = await _host.Scenario(x =>
    {
        x.Get.Url("/");
        x.Header("content-type").SingleValueShouldEqual("text/plain");
    });
    
    result.ReadAsText().ShouldBe("Hello.");
}

Moving on to the actual Todo problem domain, let’s assume we’ve got a class like this:

public class Todo
{
    public int Id { get; set; }
    public string? Name { get; set; }
    public bool IsComplete { get; set; }
}

In a sample class called TodoEndpoints let’s add an HTTP service endpoint for listing all the known Todo documents:

[WolverineGet("/todoitems")]
public static Task<IReadOnlyList<Todo>> Get(IQuerySession session) 
    => session.Query<Todo>().ToListAsync();

As you’d guess, this method will serialize all the known Todo documents from the database into the HTTP response and return a 200 status code. In this particular case the code is a little bit noisier than the Minimal API equivalent, but that’s okay, because you can happily use Minimal API and WolverineFx.Http together in the same project. WolverineFx.Http, however, will shine in more complicated endpoints.

Consider this endpoint just to return the data for a single Todo document:

// Wolverine can infer the 200/404 status codes for you here
// so there's no code noise just to satisfy OpenAPI tooling
[WolverineGet("/todoitems/{id}")]
public static Task<Todo?> GetTodo(int id, IQuerySession session, CancellationToken cancellation) 
    => session.LoadAsync<Todo>(id, cancellation);

At this point it’s effectively de rigueur for any web service to support OpenAPI documentation directly in the service. Fortunately, WolverineFx.Http is able to glean most of the necessary metadata to support OpenAPI documentation with Swashbuckle from the method signature up above. The method up above will also cleanly set a status code of 404 if the requested Todo document does not exist.

Now, the bread and butter for WolverineFx.Http is using it in conjunction with Wolverine itself. In this sample, let’s create a new Todo based on submitted data, but also publish a new event message with Wolverine to do some background processing after the HTTP call succeeds. And, oh, yeah, let’s make sure this endpoint is actively using Wolverine’s transactional outbox support for consistency:

[WolverinePost("/todoitems")]
public static async Task<IResult> Create(CreateTodo command, IDocumentSession session, IMessageBus bus)
{
    var todo = new Todo { Name = command.Name };
    session.Store(todo);

    // Going to raise an event within our system to be processed later
    await bus.PublishAsync(new TodoCreated(todo.Id));
    
    return Results.Created($"/todoitems/{todo.Id}", todo);
}

The endpoint code above is automatically enrolled in the Marten transactional middleware by simple virtue of having a dependency on Marten’s IDocumentSession. By also taking in the IMessageBus dependency, WolverineFx.Http is wrapping the transactional outbox behavior around the method so that the TodoCreated message is only sent after the database transaction succeeds.
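
If the outbox idea is new to you, the ordering guarantee can be sketched without any framework at all. Everything here is hypothetical (the `Outbox` class, its method names, and the string messages), and it only models “send after a successful commit.” It does not model persisting the outgoing messages alongside the data.

```csharp
using System;
using System.Collections.Generic;

var sent = new List<object>();

// Happy path: the commit succeeds, so the queued message goes out
var outbox = new Outbox(sent.Add);
outbox.Enqueue("TodoCreated:1");
outbox.CommitThenFlush(commit: () => { /* pretend the database commit succeeded */ });
Console.WriteLine(sent.Count); // 1

// Failure path: the commit throws, so nothing new is sent
var failing = new Outbox(sent.Add);
failing.Enqueue("TodoCreated:2");
try
{
    failing.CommitThenFlush(commit: () => throw new InvalidOperationException("commit failed"));
}
catch (InvalidOperationException)
{
    // The caller still sees the failure; the message just never left
}
Console.WriteLine(sent.Count); // 1

// Hypothetical illustration only, NOT Wolverine's implementation
public class Outbox
{
    private readonly List<object> _pending = new();
    private readonly Action<object> _send;

    public Outbox(Action<object> send) => _send = send;

    // Messages are only queued up while the handler is running
    public void Enqueue(object message) => _pending.Add(message);

    public void CommitThenFlush(Action commit)
    {
        // If the commit throws, the lines below never run
        commit();

        foreach (var message in _pending) _send(message);
        _pending.Clear();
    }
}
```

The payoff is that a background handler never reacts to a TodoCreated message for a Todo that was rolled back.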

Lastly for this page, consider the need to update a Todo from a PUT call. Your HTTP endpoint may vary its handling and response by whether or not the document actually exists. Just to show off Wolverine’s “composite handler” functionality and also how WolverineFx.Http supports middleware, consider this more complex endpoint:

public static class UpdateTodoEndpoint
{
    public static async Task<(Todo? todo, IResult result)> LoadAsync(UpdateTodo command, IDocumentSession session)
    {
        var todo = await session.LoadAsync<Todo>(command.Id);
        return todo != null 
            ? (todo, new WolverineContinue()) 
            : (todo, Results.NotFound());
    }

    [WolverinePut("/todoitems")]
    public static void Put(UpdateTodo command, Todo todo, IDocumentSession session)
    {
        // Copy the submitted values from the command onto the loaded document
        todo.Name = command.Name;
        todo.IsComplete = command.IsComplete;
        session.Store(todo);
    }
}

In the WolverineFx.Http model, the generated code checks any IResult object returned from middleware. If that object is anything other than the built in WolverineContinue type, Wolverine executes it and stops all further processing. This is intended to enable validation or authorization type middleware where you may need to filter calls to the inner HTTP handler.
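
Here’s a rough, framework-free sketch of that “continue or stop” check. The names (`IOutcome`, `Continue`, `NotFound`, `Pipeline`) and the status codes are stand-ins I made up for illustration; this is not the code Wolverine actually generates.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var todos = new Dictionary<int, string> { [1] = "Write blog post" };

Console.WriteLine(Pipeline.Handle(1, todos));  // 200 Write blog post
Console.WriteLine(Pipeline.Handle(42, todos)); // 404

// Hypothetical stand-ins for IResult and WolverineContinue
public interface IOutcome { int StatusCode { get; } }
public sealed class Continue : IOutcome { public int StatusCode => 0; }
public sealed class NotFound : IOutcome { public int StatusCode => 404; }

public static class Pipeline
{
    // The "before" step: load the data and decide whether to keep going
    public static (string? Todo, IOutcome Outcome) Load(int id, IReadOnlyDictionary<int, string> store)
    {
        if (store.TryGetValue(id, out var todo)) return (todo, new Continue());
        return (null, new NotFound());
    }

    // Roughly the shape of the wrapper a framework would put around your handler
    public static string Handle(int id, IReadOnlyDictionary<int, string> store)
    {
        var (todo, outcome) = Load(id, store);

        // Anything other than the "continue" sentinel ends the request here
        if (outcome is not Continue) return outcome.StatusCode.ToString();

        // The main handler only runs when the middleware said to continue
        return $"200 {todo}";
    }
}
```

The sentinel type keeps the middleware signature uniform: it always returns a result, and the wrapper decides whether that result is the response or a green light.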

With the sample application out of the way, here’s a rundown of the significant things about this library:

  • It’s actually a pretty small library in the greater scheme of things and all it really does is connect ASP.Net Core’s endpoint routing to the Wolverine runtime model, and Wolverine’s runtime model is likely going to be somewhat more efficient than Minimal API and much more efficient than MVC Core
  • It can be happily combined with Minimal API, MVC Core, or any other ASP.Net Core model that exploits endpoint routing, even within the same application
  • Wolverine is allowing you to use the Minimal API IResult model
  • The JSON serialization is strictly System.Text.Json and uses the same options as Minimal API within an ASP.Net Core application
  • It’s possible to use Wolverine middleware strategy with the HTTP endpoints
  • Wolverine is trying to glean necessary metadata from the method signatures to feed OpenAPI usage within ASP.Net Core without developers having to jump through hoops adding attributes or goofy TypedResult noise code just for Swashbuckle
  • This model plays nicely with Wolverine’s transactional outbox model for common cases where you need to both make database changes and publish additional messages for background processing in the same HTTP call. That’s a bit of important functionality that I feel is missing or is clumsy at best in many leading .NET server side technologies.

For the handful of you reading this that still remember FubuMVC, Wolverine’s HTTP model retains some of FubuMVC’s old strengths in terms of still not ramming framework concerns into your application code, but learned some hard lessons from FubuMVC’s ultimate failure:

  • FubuMVC was an ambitious, sprawling framework that was trying to be its own ecosystem with its own bootstrapping model, logging abstractions, and even IoC abstractions. WolverineFx.Http is just a citizen within the greater ASP.Net Core ecosystem and uses common .NET abstractions, concepts, and idiomatic naming conventions at every possible turn
  • FubuMVC relied too much on conventions, which was great when the convention was exactly what you needed, and kinda hurtful when you needed something besides the exact conventions. Not to worry, WolverineFx.Http lets you drop right down to the HttpContext level at will or use any of the IResult objects in existing ASP.Net Core whenever the Wolverine conventions don’t fit.
  • FubuMVC could technically be used with old ASP.Net MVC, but it was a Frankenstein’s monster to pull off. Wolverine can be mixed and matched at will with either Minimal API, MVC Core, or even other OSS projects that exploit ASP.Net Core endpoint routing.
  • Wolverine is trying to play nicely in terms of OpenAPI metadata and security related metadata for usage of standard ASP.Net Core middleware like the authorization or authentication middleware
  • FubuMVC’s “Behavior” model gave you a very powerful “Russian Doll” middleware ability that was maximally flexible — and also maximally inefficient in runtime. Wolverine’s runtime model takes a very different approach to still allow for the “Russian Doll” flexibility, but to do so in a way that is more efficient at runtime than basically every other commonly used framework today in the .NET community.
  • When things went boom in FubuMVC, you got monumentally huge stack traces that could overwhelm developers who hadn’t had a week’s worth of good nights’ sleep. It sounds minor, but Wolverine is valuable in the sense that the stack traces from HTTP (or message handler) failures will have very minimal Wolverine related framework noise for easier readability by developers.

Big Change to In Memory Mediator Model

I’ve been caught off guard a bit by how folks have mostly been interested in Wolverine as an alternative to MediatR with typical usage like this where users just delegate to Wolverine in memory within a Minimal API route:

app.MapPost("/items/create2", (CreateItemCommand cmd, IMessageBus bus) => bus.InvokeAsync<ItemCreated>(cmd));

With the corresponding message handler being this:

public class ItemHandler
{
    // This attribute applies Wolverine's EF Core transactional
    // middleware
    [Transactional]
    public static ItemCreated Handle(
        // This would be the message
        CreateItemCommand command,

        // Any other arguments are assumed
        // to be service dependencies
        ItemsDbContext db)
    {
        // Create a new Item entity
        var item = new Item
        {
            Name = command.Name
        };

        // Add the item to the current
        // DbContext unit of work
        db.Items.Add(item);

        // This event being returned
        // by the handler will be automatically sent
        // out as a "cascading" message
        return new ItemCreated
        {
            Id = item.Id
        };
    }
}

Prior to the latest release, the ItemCreated event returned by the handler above was not published as a message when the handler was called through IMessageBus.InvokeAsync<ItemCreated>(), because my original assumption was that in that case you were using the return value explicitly as a return value. Early users have been surprised by that, so I changed the behavior: the response is now also published as a cascading message, which is more consistent and is what folks seem to actually want.

New Wolverine Release & Future Plans

After plenty of Keystone Cops shenanigans with CI automation today that made me question my own basic technical competency, there’s a new Wolverine 0.9.8 release on Nuget with a variety of fixes and some new features. The documentation website was also re-published.

First, some thanks:

  • Wojtek Suwala made several fixes and improvements to the EF Core integration
  • Ivan Milosavljevic helped fix several hanging tests on CI, built the MemoryPack integration, and improved the FluentValidation integration
  • Anthony made his first OSS contribution (?) to help fix quite a few issues with the documentation
  • My boss and colleague Denys Grozenok for all his support with reviewing docs and reporting issues
  • Kebin for improving the dead letter queue mechanics

The highlights:

Dogfooding baby!

Conveniently enough, I’m part of a little officially sanctioned skunkworks team at work experimenting with converting a massive distributed monolithic application to the full Marten + Wolverine “critter stack.” I’m very encouraged by the effort so far, and it’s driven some recent features in Wolverine’s execution model to handle complexity in enterprise systems. More on that soon.

It’s also pushing the story for interoperability with NServiceBus on the other end of Rabbit MQ queues. Strangely enough, no one is interested in trying to convert a humongous distributed system to Wolverine in one round of work. Go figure.

When will Wolverine hit 1.0?

There’s a little bit of awkwardness in that Marten V6.0 (don’t worry, that’s a much smaller release than 4/5) needs to be released first and I haven’t been helping Oskar & Babu with that recently, but I think we’ll be able to clear that soon.

My “official” plan is to finish the documentation website by the end of February and make the 1.0 release by March 1st. Right now, Wolverine is having its tires kicked by plenty of early users and there’s plenty of feedback (read: bugs or usability issues) coming in that I’m trying to address quickly. Feature wise, the only things I’m hoping to have done by 1.0 are:

  • Using more native capabilities of Azure Service Bus, Rabbit MQ, and AWS SQS for dead letter queues and delayed messaging. That’s mostly to solidify some internal abstractions.
  • It’s a stretch goal, but have Wolverine support Marten’s multi-tenancy through a database per tenant strategy. We’ll want that for internal MedeAnalytics usage, so it might end up being a priority
  • Some better integration with ASP.Net Core Minimal API

Automating Integration Tests using the “Critter Stack”

This builds on the previous blog posts in this list:

Integration Testing, but How?

Some time over the holidays Jim Shore released an updated version of his excellent paper Testing Without Mocks: A Pattern Language. He also posted this truly massive thread with some provocative opinions about test automation strategies:

I think it’s a great thread over all, and the paper is chock full of provocative thoughts about designing for testability. Moreover, some of the older content in that paper is influencing the direction of my own work with Wolverine. I’ve also made it recommended reading for the developers in my own company.

All that being said, I strongly disagree with the approach he describes for integration testing with “nullable infrastructure” and eschewing DI/IoC for composition in favor of just willy nilly hard coding things because “DI is scary” or whatever. My strong preference and also where I’ve had the most success is to purposely choose to rely on development technologies that lend themselves to low friction, reliable, and productive integration testing.

And as it just so happens, the “critter stack” tools (Marten and Wolverine) that I work on are purposely designed for testability and include several features specifically to make integration testing more effective for applications using these tools.

Integration Testing with the Critter Stack

From my previous blog posts linked up above, I’ve been showing a very simplistic banking system to demonstrate the usage of Wolverine with Marten. For a testing scenario, let’s go back to part of this message handler for a WithdrawFromAccount message that will effect changes on an Account document entity and potentially send out other messages to perform other actions:

    [Transactional] 
    public static async Task Handle(
        WithdrawFromAccount command, 
        Account account, 
        IDocumentSession session, 
        IMessageContext messaging)
    {
        account.Balance -= command.Amount;
     
        // This just marks the account as changed, but
        // doesn't actually commit changes to the database
        // yet. That actually matters as I hopefully explain
        session.Store(account);
 
        // Conditionally trigger other, cascading messages
        if (account.Balance > 0 && account.Balance < account.MinimumThreshold)
        {
            await messaging.SendAsync(new LowBalanceDetected(account.Id));
        }
        else if (account.Balance < 0)
        {
            await messaging.SendAsync(new AccountOverdrawn(account.Id), new DeliveryOptions{DeliverWithin = 1.Hours()});
         
            // Give the customer 10 days to deal with the overdrawn account
            await messaging.ScheduleAsync(new EnforceAccountOverdrawnDeadline(account.Id), 10.Days());
        }
        
        // "messaging" is a Wolverine IMessageContext or IMessageBus service 
        // Do the deliver within rule on individual messages
        await messaging.SendAsync(new AccountUpdated(account.Id, account.Balance),
            new DeliveryOptions { DeliverWithin = 5.Seconds() });
    }

For a little more context, I’ve set up a Minimal API endpoint to delegate to this command like so:

// One Minimal API endpoint that just delegates directly to Wolverine
app.MapPost("/accounts/withdraw", (WithdrawFromAccount command, IMessageBus bus) => bus.InvokeAsync(command));

In the end here, I want a set of integration tests that works through the /accounts/withdraw endpoint, through all ASP.NET Core middleware, all configured Wolverine middleware or policies that wrap around that handler above, and verifies the expected state changes in the underlying Marten Postgresql database as well as any messages that I would expect to go out. And oh, yeah, I’d like those tests to be completely deterministic.

First, a Shared Test Harness

I’m starting to be interested in moving back to NUnit for the first time in years strictly for integration testing because I’m starting to suspect it would give you more control over the test fixture lifecycle in ways that are frequently valuable in integration testing.

Now, before writing the actual tests, I’m going to build an integration test harness for this system. I prefer to use xUnit.Net these days as my test runner, so we’re going to start with building what will be a shared fixture to run our application within integration tests. To be able to test through HTTP endpoints, I’m also going to add another JasperFx project named Alba to the testing project (See Alba for Effective ASP.Net Core Integration Testing for more information):

public class AppFixture : IAsyncLifetime
{
    public async Task InitializeAsync()
    {
        // Workaround for Oakton with WebApplicationBuilder
        // lifecycle issues. Doesn't matter to you w/o Oakton
        OaktonEnvironment.AutoStartHost = true;
        
        // This is bootstrapping the actual application using
        // its implied Program.Main() set up
        Host = await AlbaHost.For<Program>(x =>
        {
            // I'm overriding some of the application's service registrations
            x.ConfigureServices(services =>
            {
                // Let's just take any pesky message brokers out of
                // our integration tests for now so we can work in
                // isolation
                services.DisableAllExternalWolverineTransports();
                
                // Just putting in some baseline data for our database
                // There's usually *some* sort of reference data in 
                // enterprise-y systems
                services.InitializeMartenWith<InitialAccountData>();
            });
        });
    }

    public IAlbaHost Host { get; private set; }

    public Task DisposeAsync()
    {
        return Host.DisposeAsync().AsTask();
    }
}

There’s a bit to unpack in that class above, so let’s start:

  • A .NET IHost can be expensive to set up in memory, so in any kind of sizable system I will try to share one single instance of that between integration tests.
  • The AlbaHost mechanism is using WebApplicationFactory to bootstrap our application. This mechanism allows us to make some modifications to the application’s normal bootstrapping for test specific setup, and I’m exploiting that here.
  • The `DisableAllExternalWolverineTransports()` method is a built in extension method in Wolverine that will disable all external sending or listening to external transport options like Rabbit MQ. That’s not to say that Rabbit MQ itself is necessarily impossible to use within automated tests — and Wolverine even comes with some help for that in testing as well — but it’s certainly easier to create our tests without having to worry about messages coming and going from outside. Don’t worry though, because we’ll still be able to verify the messages that should be sent out later.
  • I’m using Marten’s “initial data” functionality, which is a way of establishing baseline data (usually reference data, but for testing you might also include a baseline set of test user data). For more context, `InitialAccountData` is shown below:

public class InitialAccountData : IInitialData
{
    public static Guid Account1 = Guid.NewGuid();
    public static Guid Account2 = Guid.NewGuid();
    public static Guid Account3 = Guid.NewGuid();
    
    public Task Populate(IDocumentStore store, CancellationToken cancellation)
    {
        return store.BulkInsertAsync(accounts().ToArray());
    }

    private IEnumerable<Account> accounts()
    {
        yield return new Account
        {
            Id = Account1,
            Balance = 1000,
            MinimumThreshold = 500
        };
        
        yield return new Account
        {
            Id = Account2,
            Balance = 1200
        };

        yield return new Account
        {
            Id = Account3,
            Balance = 2500,
            MinimumThreshold = 100
        };
    }
}

Next, just a little more xUnit.Net overhead. To make a shared fixture across multiple test classes with xUnit.Net, I add this little marker class:

[CollectionDefinition("integration")]
public class ScenarioCollection : ICollectionFixture<AppFixture>
{
    
}

I have to look this up every single time I use this functionality.

For integration testing, I like to have a slim base class that I tend to quite originally call “IntegrationContext” like this one:

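// The [Collection] attribute ties every subclass to the shared
// AppFixture registered by ScenarioCollection above
[Collection("integration")]
public abstract class IntegrationContext : IAsyncLifetime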
{
    public IntegrationContext(AppFixture fixture)
    {
        Host = fixture.Host;
        Store = Host.Services.GetRequiredService<IDocumentStore>();
    }
    
    public IAlbaHost Host { get; }
    public IDocumentStore Store { get; }
    
    public async Task InitializeAsync()
    {
        // Using Marten, wipe out all data and reset the state
        // back to exactly what we described in InitialAccountData
        await Store.Advanced.ResetAllData();
    }

    // This is required because of the IAsyncLifetime 
    // interface. Note that I do *not* tear down database
    // state after the test. That's purposeful
    public Task DisposeAsync()
    {
        return Task.CompletedTask;
    }
}

Other than simply connecting real test fixtures to the ASP.Net Core system under test (the IAlbaHost), this IntegrationContext utilizes another bit of Marten functionality to completely reset the database state back to only the data defined by the InitialAccountData so that we always have known data in the database before tests execute.

By and large, I find NoSQL databases to be more easily usable in automated testing than purely relational databases because it’s generally easier to tear down and rebuild databases with NoSQL. When I’m having to use a relational database in tests, I opt for Jimmy Bogard’s Respawn library to do the same kind of reset, but it’s substantially more work to use than Marten’s built in functionality.

In the case of Marten, we very purposely designed in the ability to reset the database state for integration testing scenarios from the very beginning. Add this functionality to the easy ability to run the underlying Postgresql database in a local Docker container for isolated testing, and I’ll claim that Marten is very usable within test automation scenarios with no real need to try to stub out the database or use some kind of low fidelity fake in memory database in testing.

See My Opinions on Data Setup for Functional Tests for more explanation of why I’m doing the database state reset before all tests, but never immediately afterward. And also why I think it’s important to place test data setup directly into tests rather than trying to rely on any kind of external, expected data set (when possible).

From my first pass at writing the sample test that’s coming in the next section, I discovered the need for one more helper method on IntegrationContext to make HTTP calls to the system while also tracking background Wolverine activity as shown below:

    // This method allows us to make HTTP calls into our system
    // in memory with Alba, but do so within Wolverine's test support
    // for message tracking to both record outgoing messages and to ensure
    // that any cascaded work spawned by the initial command is completed
    // before passing control back to the calling test
    protected async Task<(ITrackedSession, IScenarioResult)> TrackedHttpCall(Action<Scenario> configuration)
    {
        IScenarioResult result = null;
        
        // The outer part is tying into Wolverine's test support
        // to "wait" for all detected message activity to complete
        var tracked = await Host.ExecuteAndWaitAsync(async () =>
        {
            // The inner part here is actually making an HTTP request
            // to the system under test with Alba
            result = await Host.Scenario(configuration);
        });

        return (tracked, result);
    }

The method above gives me access to the complete history of Wolverine messages during the activity including all outgoing messages spawned by the HTTP call. It also delegates to Alba to run HTTP requests in memory and gives me access to the Alba wrapped response for easy interrogation of the response later (which I don’t need in the following test, but would frequently in other tests).

See Test Automation Support from the Wolverine documentation for more information on the integration testing support baked into Wolverine.
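
To make the "wait for all detected message activity" idea a little more concrete, here is a minimal, self-contained sketch of one way such tracking could work. To be clear, this is not how Wolverine implements its tracked sessions. The `ActivityTracker` type and its `Started()` / `Finished()` methods are hypothetical names invented for this illustration, and the sketch assumes .NET 6 or later for `Task.WaitAsync()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Wolverine: counts in-flight work items
// and completes a task once the count drops back to zero. One-shot usage only.
public sealed class ActivityTracker
{
    private readonly TaskCompletionSource _idle =
        new(TaskCreationOptions.RunContinuationsAsynchronously);
    private int _inFlight;

    // Call when a message (or any other unit of work) starts
    public void Started() => Interlocked.Increment(ref _inFlight);

    // Call when that unit of work finishes, successfully or not
    public void Finished()
    {
        if (Interlocked.Decrement(ref _inFlight) == 0)
        {
            _idle.TrySetResult();
        }
    }

    // Run the action, then wait until all tracked work is done
    // or throw a TimeoutException when the timeout expires
    public async Task ExecuteAndWaitAsync(Func<Task> action, TimeSpan timeout)
    {
        // The triggering action itself counts as one unit of work
        Started();
        try
        {
            await action();
        }
        finally
        {
            Finished();
        }

        await _idle.Task.WaitAsync(timeout);
    }
}
```

The sketch only works if any cascaded work calls `Started()` before the work that spawned it calls `Finished()`, otherwise the counter can touch zero too early. The timeout matters too: a test that hangs forever on lost work is worse than a test that fails loudly.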

Writing the first integration test

The first “happy path” test verifies the call through the web service to the Wolverine message handler for withdrawing from an account, without triggering any kind of low balance condition. It might look like this:

public class when_debiting_an_account : IntegrationContext
{
    public when_debiting_an_account(AppFixture fixture) : base(fixture)
    {
    }

    [Fact]
    public async Task should_decrease_the_account_balance_happy_path()
    {
        // Drive in known data, so the "Arrange"
        var account = new Account
        {
            Balance = 2500,
            MinimumThreshold = 200
        };

        await using (var session = Store.LightweightSession())
        {
            session.Store(account);
            await session.SaveChangesAsync();
        }

        // The "Act" part of the test.
        var (tracked, _) = await TrackedHttpCall(x =>
        {
            // Send a JSON post with the WithdrawFromAccount command through the HTTP endpoint
            // BUT, it's all running in process
            x.Post.Json(new WithdrawFromAccount(account.Id, 1200)).ToUrl("/accounts/debit");

            // This is the default behavior anyway, but still good to show it here
            x.StatusCodeShouldBeOk();
        });
        
        // Finally, let's do the "assert"
        await using (var session = Store.LightweightSession())
        {
            // Load the newly persisted copy of the data from Marten
            var persisted = await session.LoadAsync<Account>(account.Id);
            persisted.Balance.ShouldBe(1300); // Started with 2500, debited 1200
        }

        // And also assert that an AccountUpdated message was published as well
        var updated = tracked.Sent.SingleMessage<AccountUpdated>();
        updated.AccountId.ShouldBe(account.Id);
        updated.Balance.ShouldBe(1300);

    }
}

The test above follows the basic “arrange, act, assert” model. In order, the test:

  1. Writes a brand new Account document to the Marten database
  2. Makes an HTTP call to the system to POST a WithdrawFromAccount command to our system using our TrackedHttpCall method that also tracks Wolverine activity during the HTTP call
  3. Verifies that the Account data was changed in the database the way we expected
  4. Verifies that an expected outgoing message was published as part of the activity

It was a lot of initial set up to get to the point where we could write tests, but I’m going to argue in the next section that we’ve done a lot to reduce the friction in writing additional integration tests for our system in a reliable way.

Avoiding the Selenium as Golden Hammer Anti-Pattern

Playwright or Cypress.io may prove to be better options than Selenium over time (I’m bullish on Playwright myself), but the main point is really that only depending on end to end tests through the browser can easily be problematic and inefficient.

Before I go back to defending why I think the testing approach and tooling shown in this post is very effective, let’s build up an all too real strawman of inefficient and maybe even ineffective test automation:

  • All your integration tests are blackbox, end to end tests that use Selenium to drive a web browser
  • These tests can only be executed externally to the application when the application is deployed to a development or testing environment. In the worst case scenario — which is also unfortunately common — the Selenium tests cannot be easily executed locally on demand
  • The tests are prone to failures due to UI changes
  • The tests are prone to intermittent “blinking” failures due to asynchronous behavior in the UI where test assertions happen before actions are completed in the application. This is a source of major friction and poor results in large scale Selenium testing that has been endemic in every single shop or project where I’ve used or seen Selenium used over the past decade — including in my current role.
  • The end to end tests are slow compared to finer grained unit tests or smaller whitebox integration tests that do not have to use the browser
  • Test failures are often difficult to diagnose since the tests are running out of process without direct access to the actual application. Some folks try to alleviate this issue with screenshots of the browser or in more advanced usages, trying to correlate the application logs to the test runs
  • Test failures often happen because related test databases are not in the expected state

I’m laying it on pretty thick here, but I think that I’m getting my point across that only relying on Selenium based browser testing is potentially very inefficient and sometimes ineffective. Now, let’s consider how the “critter stack” tools and the testing approach I used up above solve some of the issues I raised just above:

  • Postgresql itself is very easy to run in Docker containers or if you have to, to deploy locally. That makes it friendly for automated testing where you really, really want to have isolated testing infrastructure and avoid sharing any kind of stateful resource between testing processes
  • Marten in particular has built in support for setting up known database states going into automated tests. This is invaluable for integration testing
  • Executing directly against HTTP API endpoints is much faster than browser testing with something like Selenium. Faster executing tests == faster feedback cycles == better development throughput and delivery period
  • Running the tests completely in process with the application such as we did with Alba makes debugging test failures much easier for developers than trying to solve Selenium failures in a CI environment
  • Using the Alba + xUnit.Net (or NUnit etc) approach means that the integration tests can live with the application code and can be executed on demand whenever. That shifts the testing “left” in the development cycle compared to the slower Selenium running on CI only cycle. It also helps developers quickly spot check potential issues.
  • By embedding the integration tests directly in the codebase, you’re much less likely to get the drift between the application itself and automated tests that frequently arises from Selenium centric approaches.
  • This approach makes developers be involved with the test automation efforts. I strongly believe that it’s impossible for large scale test automation to work whatsoever without developer involvement
  • Whitebox tests are simply much more efficient than the blackbox model. This statement is likely to get me yelled at by real testing professionals, but it’s still true

This post took way, way too long to write compared to how I thought it would go. I’m going to make a little bonus followup on using Lamar of all things for other random test state resets.

My OSS Plans for 2023

Before I start, I am lucky to be part of a great group of OSS collaborators across the board. In particular, thanks to Oskar, Babu, Khalid, Hawxy, and Eric Smith for helping make 2022 a hugely productive and satisfying year in OSS work for me. I’m looking forward to working with y’all more in the times ahead.

In recent years I’ve kicked off my side project work with an overly optimistic and hopelessly unrealistic list of ambitions for my OSS projects. You can find the 2022 and 2021 versions still hanging around, only somewhat fulfilled. I’m going to put down my markers for what I hope to accomplish in 2023 — and because I’m the kind of person who obsesses more about the list of things to do rather than looking back at accomplishments, I’ll take some time to review what was done in many of these projects in 2022. Onward.

Marten is going gang busters, and 2022 was a very encouraging year for the Marten core team & me. The sizable V5.0 release dropped in March with some significant usability improvements, multi-tenancy with a database per tenant(s) support, and other goodness specifically to deal with apparent flaws in the gigantic V4.0 release from late 2021.

For 2023, the V6 release will come soon, mostly with changes to underlying dependencies.

Beyond that, I think that V7 will be a massively ambitious release in terms of important new features — hopefully in time for Event Sourcing Live 2023. If I had a magic wand that would magically give us all enough bandwidth to pull it off, my big hopes for Marten V7 are:

  • The capability to massively scale the Event Store functionality in Marten to much, much larger systems
  • Improved throughput and capacity with asynchronous projections
  • A formal, in the box subscription model
  • The ability to shard document database entities
  • Dive into the Linq support again, but this time use Postgresql V15 specific functionality to make the generated queries more efficient — especially for any possible query that goes through child collections. I haven’t done the slightest bit of detailed analysis on that one yet though
  • The ability to rebuild projections with zero downtime and/or faster projection rebuilds

Marten will also be impacted by the work being done with…

After a couple years of having almost given up on it, I restarted work pretty heavily on what had been called Jasper. While building a sample application for a conference talk, Oskar & I realized there was some serious opportunity for combining Marten and the then-Jasper for very low ceremony CQRS architectures. Now, what’s the best way to revitalize an OSS project that was otherwise languishing and basically a failure in terms of adoption? You guessed it, rename the project with an obvious theme related to an already successful OSS project and get some new, spiffier graphics and better website! And basically all new internals, new features, quite a few performance improvements, better instrumentation capabilities, more robust error handling, and a unique runtime model that I very sincerely believe will lead to better developer productivity and better application performance than existing tools in the .NET space.

Hence, Wolverine is the new, improved message bus and local mediator (I like to call that a “command bus” so as to not suffer the obvious comparisons to MediatR which I feel shortchanges Wolverine’s much greater ambitions). Right now I’m very happy with the early feedback from Wolverine’s JetBrains webinar (careful, the API changed a bit since then) and its DotNetRocks episode.

Right now the goal is to make it to 1.0 by the end of January — with the proviso that Marten V6 has to go first. The remaining work is mostly to finish the documentation website and a handful of tactical feature items mostly to prove out some of the core abstractions before minting 1.0.

Luckily for me, a small group of us at work have started a proof of concept for rebuilding/converting/migrating a very large system currently using NHibernate, Sql Server, and NServiceBus to Wolverine + Marten. That’s going to be an absolutely invaluable learning experience that will undoubtedly shape the short term work in both tools.

Beyond 1.0, I’m hoping to effectively use Wolverine to level up on a lot of technologies by adding:

  • Some other transport options (Kafka? Kinesis? EventBridge?)
  • Additional persistence options with Cosmos Db and Dynamo Db being the likely candidates so far
  • A SignalR transport
  • First class serverless support using Wolverine’s runtime model, with some way of optimizing the cold start
  • An option to use Wolverine’s runtime model for ASP.Net Core API endpoints. I think there’s some opportunity to allow for a low ceremony, high performance alternative for HTTP API creation while still being completely within the ASP.Net Core ecosystem

I hope that Wolverine is successful by itself, but the real goal of Wolverine is to allow folks to combine it with Marten to form the….

“Critter Stack”

The hope with Marten + Wolverine is to create a very effective platform for server-side .NET development in general. More specifically, the goal of the “critter stack” combination is to become the acknowledged industry leader for building systems with a CQRS plus Event Sourcing architectural model. And I mean across all development platforms and programming languages.

Pride goeth before destruction, and an haughty spirit before a fall.

Proverbs 16:18 KJV

And let me just more humbly say that there’s a ways to go to get there, but I’m feeling optimistic right now and want to set our sights pretty high. I especially feel good about having unintentionally made a huge career bet on Postgresql.

Lamar recently got its 10.0 release to add first class .NET 7.0 support (while also dropping anything < .NET 6) and a couple performance improvements and bug fixes. There hasn’t been any new functionality added in the last year except for finally getting first class support for IAsyncDisposable. It’s unlikely that there will be much development in the new year for Lamar, but we use it at work, I still think it has advantages over the built in DI container from .NET, and it’s vital for Wolverine. Lamar is here to stay.

Alba

Alba 7.0 (and a couple minor releases afterward) added first class .NET 7 support, much better support for testing Minimal API routes that accept and/or return JSON, and other tactical fixes (mostly by Hawxy).

See Alba for Effective ASP.Net Core Integration Testing for more information on how Alba improved this year.

I don’t have any specific plans for Alba this year, but I use Alba to test pieces of Marten and Wolverine and we use it at work. If I manage to get my way, we’ll be converting as many slow, unreliable Selenium based tests to fast running Alba tests against HTTP endpoints in 2023 at work. Alba is here to stay.

Not that this is germane to this post, but the very lightly traveled road behind that sign has a straightaway section where you can see for a couple miles at a time. I may or may not have tried to find out exactly how fast my first car could really go on that stretch of road at one point.

Oakton had a significant new feature set around the idea of “stateful resources” added in 2022, specifically meant for supporting both Marten and Wolverine. We also cleaned up the documentation website. The latest version 6.0 brought Oakton up to .NET 7 while also using shared dependencies with the greater JasperFx family (Marten, Wolverine, Lamar, etc.). I don’t exactly remember when, but it also got better “help” presentation by leveraging Spectre.Console more.

I don’t have any specific plans for Oakton, but it’s the primary command line parser and command line utility library for all of Marten, Wolverine, and Lamar, so it’s going to be actively maintained.

And finally, I’ve registered my own company called “Jasper Fx Software.” It’s going much slower than I’d hoped, but at some point early in 2023 I’ll have my shingle out to provide support contracts, consulting, and custom development with the tools above. It’s just a side hustle for now, but we’ll see if that can become something viable over time.

To be clear about this, the Marten core team & I are very serious about building a paid, add-on model to Marten + Wolverine and some of the new features I described up above are likely to fall under that umbrella. I’m sneaking that in at the end of this, but that’s probably the main ambition for me personally in the new year.

What about?…

If it’s not addressed in this post, it’s either dead (StructureMap) or something I consider just to be a supporting player (Weasel). Storyteller alas, is likely not coming back. Unless it does as something renamed to “Bobcat” as a tool specifically designed to help automate tests for Marten or Wolverine where xUnit.Net by itself doesn’t do so hot. And if Bobcat does end up existing, it’ll leverage existing tools as much as possible.

Transactional Outbox/Inbox with Wolverine and why you care

I’ve been able to talk and write a bit about Wolverine in the last couple weeks. This builds on the last two blog posts in this list:

Alright, back to the sample message handler from my previous two blog posts. Here’s the shorthand version:

    [Transactional] 
    public static async Task Handle(
        DebitAccount command, 
        Account account, 
        IDocumentSession session, 
        IMessageContext messaging)
    {
        account.Balance -= command.Amount;
     
        // This just marks the account as changed, but
        // doesn't actually commit changes to the database
        // yet. That actually matters as I hopefully explain
        session.Store(account);
 
        // Conditionally trigger other, cascading messages
        if (account.Balance > 0 && account.Balance < account.MinimumThreshold)
        {
            await messaging.SendAsync(new LowBalanceDetected(account.Id));
        }
        else if (account.Balance < 0)
        {
            await messaging.SendAsync(new AccountOverdrawn(account.Id));
         
            // Give the customer 10 days to deal with the overdrawn account
            await messaging.ScheduleAsync(new EnforceAccountOverdrawnDeadline(account.Id), 10.Days());
        }
    }

and just for the sake of completion, here is a longer hand, completely equivalent version of the same handler:

[Transactional] 
public static async Task Handle(
    DebitAccount command, 
    Account account, 
    IDocumentSession session, 
    IMessageContext messaging)
{
    account.Balance -= command.Amount;
     
    // This just marks the account as changed, but
    // doesn't actually commit changes to the database
    // yet. That actually matters as I hopefully explain
    session.Store(account);
 
    if (account.Balance > 0 && account.Balance < account.MinimumThreshold)
    {
        await messaging.SendAsync(new LowBalanceDetected(account.Id));
    }
    else if (account.Balance < 0)
    {
        await messaging.SendAsync(new AccountOverdrawn(account.Id));
         
        // Give the customer 10 days to deal with the overdrawn account
        await messaging.ScheduleAsync(new EnforceAccountOverdrawnDeadline(account.Id), 10.Days());
    }
}
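
As a side note, the branching rules in that handler are easy to check in isolation if you pull them into a pure function. The sketch below is only an illustration under some loud assumptions: the `Account` record and the message records are simplified stand-ins for the real types, `DebitRules.Debit()` is a hypothetical helper that does not exist in the sample application, and the scheduled `EnforceAccountOverdrawnDeadline` message is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the types used in the handler above
public record Account(Guid Id, decimal Balance, decimal MinimumThreshold);
public record LowBalanceDetected(Guid AccountId);
public record AccountOverdrawn(Guid AccountId);

public static class DebitRules
{
    // Hypothetical helper: returns the debited account plus the messages
    // the handler would send, without touching Marten or Wolverine
    public static (Account Updated, IReadOnlyList<object> Messages) Debit(
        Account account, decimal amount)
    {
        var updated = account with { Balance = account.Balance - amount };
        var messages = new List<object>();

        // Same conditions as the handler: positive but under the threshold
        if (updated.Balance > 0 && updated.Balance < updated.MinimumThreshold)
        {
            messages.Add(new LowBalanceDetected(updated.Id));
        }
        else if (updated.Balance < 0)
        {
            messages.Add(new AccountOverdrawn(updated.Id));
        }

        return (updated, messages);
    }
}
```

With that shape, debiting 600 from an account holding 1000 with a minimum threshold of 500 should leave a balance of 400 and exactly one `LowBalanceDetected` message, and no database or message broker is needed to prove it.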

To review just a little bit, that Wolverine style message handler at runtime is committing changes to an Account in the underlying database and potentially sending out additional messages based on the state of the Account. For folks who are experienced with asynchronous messaging systems who hear me say that Wolverine does not support any kind of 2 phase commits between the database and message brokers, you’re probably already concerned with some potential problems in that code above:

  • Maybe the database changes fail, but there are “ghost” messages already queued that pertain to data changes that never actually happened
  • Maybe the messages actually manage to get through to their downstream handlers and are applied erroneously because the related database changes have not yet been applied. That’s a race condition that absolutely happens if you’re not careful (ask me how I know 😦 )
  • Maybe the database changes succeed, but the messages fail to be sent because of a network hiccup or who knows what problem happens with the message broker

Needless to say, there’s genuinely a lot of potential problems from that handful of lines of code up above. Some of you reading this have probably already said to yourself that this calls for some sort of transactional outbox, and Wolverine thinks so too!

The general idea of an “outbox” is to work around the lack of true 2 phase commits by ensuring that outgoing messages are held until the database transaction is successful, then somehow guaranteeing that the messages will be sent out afterward. In the case of Wolverine and its integration with Marten, the order of operations in the message handler (in either version) shown above is to:

  1. Tell Marten that the Account document needs to be persisted. Nothing happens at this point other than marking the document as changed
  2. The handler creates messages that are registered with the current IMessageContext. Again, the messages do not actually go out here, instead they are routed by Wolverine to know exactly how and where they should be sent later
  3. The Wolverine + Marten [Transactional] middleware calls the Marten IDocumentSession.SaveChangesAsync() method, which makes the changes to the Account document and also creates new database records to persist any outgoing messages in the underlying Postgresql application database, all in one single, native database transaction. Even better, with the Marten integration, all the database operations happen in one single batched database call for maximum efficiency.
  4. When Marten successfully commits the database transaction, it tells Wolverine to “flush” the outgoing messages to the sending agents in Wolverine (depending on configuration and exact transport type, the messages might be sent “inline” or batched up with other messages to go out later).
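
The order of operations above can be sketched with a tiny in-memory stand-in. This is purely an illustration of the "hold, commit, then flush" sequence and not Wolverine's implementation: the `OutboxSketch` class is a hypothetical name, the "transaction" is just a delegate, and nothing is persisted, so unlike the real outbox it would not survive a process crash between the commit and the send.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, in-memory illustration of the outbox idea. It is not
// Wolverine's implementation and persists nothing to a database.
public sealed class OutboxSketch
{
    private readonly List<object> _pending = new();
    private readonly Action<object> _send;

    public OutboxSketch(Action<object> send) => _send = send;

    // Steps 1 and 2: register outgoing messages, but do not send them yet
    public void Enqueue(object message) => _pending.Add(message);

    // Steps 3 and 4: run the "commit", and only flush messages if it succeeds
    public void Commit(Action commitTransaction)
    {
        try
        {
            commitTransaction();
        }
        catch
        {
            // The data changes never happened, so drop the messages
            // rather than letting "ghost" messages escape
            _pending.Clear();
            throw;
        }

        foreach (var message in _pending)
        {
            _send(message);
        }

        _pending.Clear();
    }
}
```

The piece this sketch skips is the important one in production: the real outbox writes the outgoing messages in the same database transaction as the document changes, which is what lets them be recovered later if the sending step never gets to run.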

To be clear, Wolverine also supports a transactional outbox with EF Core against either Sql Server or Postgresql. I’ll blog and/or document that soon.

The integration with Marten that’s in the WolverineFx.Marten Nuget isn’t that bad (I hope). First off, in my application bootstrapping I chain the IntegrateWithWolverine() call to the standard Marten bootstrapping like this:

using Wolverine.Marten;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMarten(opts =>
{
    // This would be from your configuration file in typical usage
    opts.Connection(Servers.PostgresConnectionString);
    opts.DatabaseSchemaName = "wolverine_middleware";
})
    // This is the wolverine integration for the outbox/inbox,
    // transactional middleware, saga persistence we don't care about
    // yet
    .IntegrateWithWolverine()
    
    // Just letting Marten build out known database schema elements upfront
    // Helps with Wolverine integration in development
    .ApplyAllDatabaseChangesOnStartup();

For the moment, I’m going to say that all the “cascading messages” from the DebitAccount message handler are being handled by local, in memory queues. At this point — and I’d love to have feedback on the applicability or usability of this approach — each endpoint has to be explicitly enrolled into the durable outbox or inbox (for incoming, listening endpoints) mechanics. Knowing both of those things, I’m going to add a little bit of configuration to make every local queue durable:

builder.Host.UseWolverine(opts =>
{
    // Middleware introduced in previous posts
    opts.Handlers.AddMiddlewareByMessageType(typeof(AccountLookupMiddleware));
    opts.UseFluentValidation();
    
    // The nomenclature might be inconsistent here, but the key
    // point is to make the local queues durable
    opts.Policies
        .AllLocalQueues(x => x.UseDurableInbox());
});

If instead I chose to publish some of the outgoing messages with Rabbit MQ to other processes (or just want the messages queued), I can add the WolverineFx.RabbitMQ Nuget and change the bootstrapping to this:

builder.Host.UseWolverine(opts =>
{
    // Middleware introduced in previous posts
    opts.Handlers.AddMiddlewareByMessageType(typeof(AccountLookupMiddleware));
    opts.UseFluentValidation();

    var rabbitUri = builder.Configuration.GetValue<Uri>("rabbitmq-broker-uri");
    opts.UseRabbitMq(rabbitUri)
        // Just do the routing off of conventions, more or less
        // queue and/or exchange based on the Wolverine message type name
        .UseConventionalRouting()
        .ConfigureSenders(x => x.UseDurableOutbox());
});

I just threw a bunch of details at you all, so let me try to anticipate a couple questions you might have and also try to answer them:

  • Do the messages get delivered before the transaction completes? No, they’re held in memory until the transaction completes, then get sent
  • What happens if the message delivery fails? The Wolverine sending agents run in a hosted service within your application. When message delivery fails, the sending agent will try it again up to a configurable number of times (100 is the default). Read the next question though before the “100” number bugs you:
  • What happens if the whole message broker is down? Wolverine’s sending agents have a crude circuit breaker and will stop trying to send message batches if there are too many failures in a period of time, then resume sending after a periodic “ping” message gets through. Long story short, Wolverine will buffer outgoing messages in the application database until Wolverine is able to reach the message broker.
  • What happens if the application process fails between the transaction succeeding and the message getting to the broker? The message will be recovered and sent by either another active node of the application if running in a cluster, or by restarting the single application process.
  • So you can do this in a cluster without sending the message multiple times? Yep.
  • What if you have zillions of stored messages and you restart the application, will it overwhelm the process and cause harm? It’s paged, distributes a bit between nodes, and there’s some back pressure to keep it from having too many outgoing messages in memory.
  • Can I use Sql Server instead? Yes. But for the moment, it’s like the scene in Blues Brothers when Elwood asks what kinds of music they have and the waitress replies “we have both kinds, Country and Western.”
  • Can I tell Wolverine to throw away a message that’s old and maybe out of date if it still hasn’t been processed? Yes, and I’ll show a bit of that in the next post.
  • What about messages that are routed to a non-durable endpoint as part of an outbox’d transaction? Good question! Wolverine is still holding those messages in memory until the message being processed successfully finishes, then kicks them out to in-memory sending agents. Those sending agents have their own internal queues and retry loops for maximum resiliency. And actually for that matter, Wolverine has a built-in, in-memory outbox to at least deal with ordering between the message processing and actually sending outgoing messages.
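
To make the “crude circuit breaker” idea from the list above a little more concrete, here is a minimal sketch in plain C#. This is not Wolverine’s actual implementation, and every name in it (CrudeCircuitBreaker, TrySend, Ping) is hypothetical. It also simplifies “too many failures in a period of time” down to a count of consecutive failures:

```csharp
using System;

// Illustrative only: none of these names come from Wolverine itself.
public class CrudeCircuitBreaker
{
    private readonly int _failureThreshold;
    private int _consecutiveFailures;

    public CrudeCircuitBreaker(int failureThreshold)
    {
        _failureThreshold = failureThreshold;
    }

    // True while normal sending is paused and only pings should go out
    public bool IsOpen { get; private set; }

    // Try to send one batch. Returns false when the batch was not sent,
    // meaning it should stay in durable storage for a later attempt.
    public bool TrySend(Func<bool> sendBatch)
    {
        if (IsOpen) return false;

        if (sendBatch())
        {
            _consecutiveFailures = 0;
            return true;
        }

        // Assumption: trip on consecutive failures rather than a time window
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failureThreshold) IsOpen = true;
        return false;
    }

    // Called periodically while the breaker is open. A successful ping
    // closes the breaker so that normal sending can resume.
    public void Ping(Func<bool> ping)
    {
        if (!IsOpen) return;

        if (ping())
        {
            IsOpen = false;
            _consecutiveFailures = 0;
        }
    }
}
```

The point of the sketch is only that a failed or skipped send leaves the message where it already is, in the durable outbox storage, so nothing is lost while the breaker is open.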

Next Time

WordPress just cut off the last section, so I’ll write a short follow up on mixing in non-durable message queues with message expirations. Next week I’ll keep on this sample application by discussing how Wolverine & its friends try really hard for a “clone n’go” developer workflow where you can be up and running in mere minutes with all the database & message broker infrastructure up and going after a fresh clone of the codebase.

Marten and Friend’s (Hopefully) Big Future!

Marten was conceived and launched way back in 2016 as an attempt to quickly improve the performance and stability of a mission critical web application by utilizing Postgresql and its new JSON capabilities as a replacement for a 3rd party document database – and do that in a hurry before the next busy season. My former colleagues and I did succeed in that endeavor, but more importantly for the longer run, Marten was also launched as an open source project on GitHub and quickly attracted attention from other developers. The addition of an originally small feature set for event sourcing dramatically increased interest and participation in Marten. 

Fast forward to today, and we have a vibrant community of engaged users and a core team of contributors that are constantly improving the tool and discussing ideas about how to make it even better. The giant V4 release last year brought an overhaul of almost all the library internals and plenty of new capabilities. V5 followed early in 2022 with more multi-tenancy options and better tooling for development lifecycles and database management based on early issues with V4. 

At this point, I’d list the strong points of Marten that we’ve already achieved as:

  • A very useful document database option that provides the powerful developer productivity you expect from NoSQL solutions while also supporting a strong consistency model that’s usually missing from NoSQL databases. 
  • A wide range of viable hosting options by virtue of being on top of Postgresql. No cloud vendor lock-in with Marten!
  • Quite possibly the easiest way to build an application using Event Sourcing in .NET with both event storage and user defined view projections in the box
  • A great local development story through the simple ability to run Postgresql in a Docker container and Marten’s focus on an “it just works” style database schema management subsystem
  • The aforementioned core team and active user base makes Marten a viable OSS tool for teams wanting some reassurance that Marten is going to be well supported in the future

Great! But now it’s time to talk about the next steps we’re planning to take Marten to even greater heights in the forthcoming Marten V6 that’s being planned now. The overarching theme is to remove the most common hurdles to choosing Marten. By and large, I think the biggest themes for Marten are:

  1. Scalability, so Marten can be used for much larger data sets. From user feedback, Marten is able to handle data sets of 10 million events today, but there’s opportunities to go far, far larger than that.
  2. Improvements to operational support. Database migrations when documents change, rebuilding projections without downtime, usage metrics, and better support for using multiple databases for multi-tenancy
  3. Marten is in good shape as a purely storage option for Event Sourcing, but users are very often asking for an array of subscription options to propagate events captured by Marten
  4. More powerful options for aggregating event data into more complex projected views
  5. Improving the Linq and other querying support is a seemingly never-ending battle
  6. The lack of professional support for Marten. Obviously a lot of shops and teams are perfectly comfortable with using FOSS tools knowing that they may have to roll up their sleeves and pitch in with support, but other shops are not comfortable with this at all and will not allow FOSS usage for critical functions. More on this later.

First though, Marten is getting a new “critter” friend in the larger JasperFx project family:

Wolverine is a new/old OSS command bus and messaging tool for .NET. It’s what was formerly being developed as Jasper, but the Marten team decided to rebrand the tool as a natural partner with Marten (both animals plus Weasel are members of the Mustelidae family). While both Marten and Wolverine are happily usable without each other, we think that the integration of these tools gives us the opportunity to build a full fledged platform for building applications in .NET using a CQRS architecture with Event Sourcing. Moreover, we think there’s a significant gap in .NET for this kind of tooling and we hope to fill that. 

So, onto future plans…

There’s a couple immediate ways to improve the scalability of Marten we’re planning to build in Marten V6. The first idea is to utilize Postgresql table sharding in a couple different ways. 

First, we can enable sharding on document tables based on user defined criteria through Marten configuration. The big challenge there is to provide a good migration strategy for doing this as it requires at least a 3 step process of copying the existing table data off to the side before creating the new tables. 

The next idea is to shard the event storage tables as well, with the immediate idea being to shard off of archived status to effectively create a “hot” storage of recent events and a “cold” storage of older events that are much less frequently accessed. This would allow Marten users to keep the active “hot” event storage to a much smaller size and therefore greatly improve potential performance even as the database continues to grow.

We’re not done “sharding” yet, but this time we need to shift to the asynchronous projection support in Marten. The core team has some ideas to improve the throughput of the asynchronous projection code as it is, but today it’s limited to only running on one single application node with “hot/cold” rollover support. With some help from Wolverine, we’re hoping to build a “sharded” asynchronous projection that can shard the processing of single projections and distribute the projection work across potentially many nodes as shown in the following diagram:

The asynchronous projection sharding is going to be a big deal for Marten all by itself, but there’s some other potentially big wins for Marten V6 with better tooling for projection rebuilds and asynchronous projections in general:

  1. Some kind of user interface to monitor and manage the asynchronous projections
  2. Faster projection rebuilds
  3. Zero downtime projection rebuilds

Marten + Wolverine == “Critter Stack” 

Again, both Marten and Wolverine will be completely usable independently, but we think there’s some potential synergy through the combination. One of the potential advantages of combining the tools is to use Wolverine’s messaging to give Marten a full fledged subscription model for Marten events. All told we’re planning three different mechanisms for propagating Marten events to the rest of your system:

  1. Through Wolverine’s transactional outbox right at the point of event capture when you care more about immediate delivery than strict ordering (this is already working)
  2. Through Marten’s asynchronous daemon when you do need strict ordering
  3. If this works out, through CDC event streaming straight from the database to Kafka/Pulsar/Kinesis

That brings me to the last topic I wanted to talk about in this post. Marten and Wolverine in their current form will remain FOSS under the MIT license, but it’s past time to make a real business out of these tools.

I don’t know how this is exactly going to work out yet, but the core Marten team is actively planning on building a business around Marten and now Wolverine. I’m not sure if this will be the front company, but I personally have formed a new company named “Jasper Fx Software” for my own activity – but that’s going to be limited to just being side work for at least a while.

The general idea – so far – is to offer:

  • Support contracts for Marten 
  • Consulting services, especially for help modeling and maximizing the usage of the event sourcing support
  • Training workshops
  • Add on products that add the advanced features I described earlier in this post

Maybe success leads us to offering a SaaS model for Marten, but I see that as a long way down the road.

What think you gentle reader? Does any of this sound attractive? Should we be focusing on something else altogether?

Projecting Marten Events to a Flat Table

Marten 5.8 dropped over the weekend with mostly bug fixes, but one potentially useful new feature for projecting event data to plain old SQL tables. One of the strengths of Marten that we’ve touted from the beginning was the ability to mix document database features with event sourcing and old fashioned relational tables all with one database in a single application as your needs dictate.

Let’s dive right into a sample usage of this. If you’re a software developer long enough and move around just a little bit, you’re going to get sucked into building a workflow for importing flat files of dubious quality from external partners or customers. I’m going to claim that event sourcing is a good fit for this problem domain (and I’m also suggesting this pretty strongly at work). That being said, here’s what the event types might look like that are recording the progress of a file import:

public record ImportStarted(
    DateTimeOffset Started,
    string ActivityType,
    string CustomerId,
    int PlannedSteps);

public record ImportProgress(
    string StepName,
    int Records,
    int Invalids);

public record ImportFinished(DateTimeOffset Finished);

public record ImportFailed;

At some point, we’re going to want to apply some metrics to the execution history to understand the average size of the incoming files, what times of the day have more or less traffic, and performance information broken down by file size, file type, and who knows what. This sounds to me like a perfect use case for SQL queries against a flat table.

Enter Marten 5.8’s new functionality. First off, let’s do this simply by writing some explicit SQL in a new projection that we can replay against the existing events when we’re ready. I’m going to use Marten’s EventProjection as a base class in this case:

public class ImportSqlProjection: EventProjection
{
    public ImportSqlProjection()
    {
        // Define the table structure here so that 
        // Marten can manage this for us in its schema
        // management
        var table = new Table("import_history");
        table.AddColumn<Guid>("id").AsPrimaryKey();
        table.AddColumn<string>("activity_type").NotNull();
        table.AddColumn<DateTimeOffset>("started").NotNull();
        table.AddColumn<DateTimeOffset>("finished");

        SchemaObjects.Add(table);

        // Telling Marten to delete the table data as the 
        // first step in rebuilding this projection
        Options.DeleteDataInTableOnTeardown(table.Identifier);
    }

    public void Project(IEvent<ImportStarted> e, IDocumentOperations ops)
    {
        ops.QueueSqlCommand("insert into import_history (id, activity_type, started) values (?, ?, ?)",
            e.StreamId, e.Data.ActivityType, e.Data.Started
        );
    }

    public void Project(IEvent<ImportFinished> e, IDocumentOperations ops)
    {
        ops.QueueSqlCommand("update import_history set finished = ? where id = ?",
            e.Data.Finished, e.StreamId
        );
    }

    public void Project(IEvent<ImportFailed> e, IDocumentOperations ops)
    {
        ops.QueueSqlCommand("delete from import_history where id = ?", e.StreamId);
    }
}

A couple notes about the code above:

  • We’ve invested a huge amount of time in Marten and the related Weasel library building in robust schema management. The Table model I’m using up above comes from Weasel, and this allows a Marten application using this projection to manage the table creation in the underlying database for us. This new table would be part of all Marten’s built in schema management functionality.
  • The QueueSqlCommand() functionality came in a couple minor releases ago, and gives you the ability to add raw SQL commands to be executed as part of a Marten unit of work transaction. It’s important to note that the QueueSqlCommand() method doesn’t execute inline, rather it adds the SQL you enqueue to be executed in a batch query when you eventually call the holding IDocumentSession.SaveChangesAsync(). I can’t stress this enough, it has consistently been a big performance gain in Marten to batch up queries to the database server and reduce the number of network round trips.
  • The Project() methods are a naming convention with Marten’s EventProjection. The first argument is always assumed to be the event type. In this case though, it’s legal to use Marten’s IEvent<T> envelope type to allow you access to event metadata like timestamps, version information, and the containing stream identity.
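
For completeness, the projection has to be registered with Marten before its table can be managed or its events replayed. Here is a rough sketch of what that registration might look like, reusing the opts.Projections.Add() and AddAsyncDaemon() calls that show up in Marten’s own sample configuration. The connection string name “Marten”, the choice of the asynchronous lifecycle, and the exact using directives are my assumptions here, so check them against the Marten version you’re on:

```csharp
using Marten;
using Marten.Events.Daemon.Resiliency;
using Marten.Events.Projections;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMarten(opts =>
{
    // Assumption: a connection string named "Marten" exists in configuration
    opts.Connection(builder.Configuration.GetConnectionString("Marten"));

    // Register the projection so Marten knows about the import_history
    // table and can run the projection in the async daemon
    opts.Projections.Add(new ImportSqlProjection(), ProjectionLifecycle.Async);
}).AddAsyncDaemon(DaemonMode.Solo);

var app = builder.Build();
app.Run();
```

Running the projection asynchronously is just one option. The same Add() call accepts ProjectionLifecycle.Inline if you’d rather have the table updated in the same transaction that captures the events.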

Now, let’s use Marten’s brand new FlatTableProjection recipe to do a little more advanced version of the earlier projection:

public class FlatImportProjection: FlatTableProjection
{
    // I'm telling Marten to use the same database schema as the events from
    // the Marten configuration in this application
    public FlatImportProjection() : base("import_history", SchemaNameSource.EventSchema)
    {
        // We need to explicitly add a primary key
        Table.AddColumn<Guid>("id").AsPrimaryKey();

        TeardownDataOnRebuild = true;

        Project<ImportStarted>(map =>
        {
            // Set values in the table from the event
            map.Map(x => x.ActivityType).NotNull();
            map.Map(x => x.CustomerId);
            map.Map(x => x.PlannedSteps, "total_steps")
                .DefaultValue(0);
            
            map.Map(x => x.Started);

            // Initial values
            map.SetValue("status", "started");
            map.SetValue("step_number", 0);
            map.SetValue("records", 0);
        });

        Project<ImportProgress>(map =>
        {
            // Add 1 to this column when this event is encountered
            map.Increment("step_number");

            // Update a running sum of records progressed
            // by the number of records on this event
            map.Increment(x => x.Records);

            map.SetValue("status", "working");
        });

        Project<ImportFinished>(map =>
        {
            map.Map(x => x.Finished);
            map.SetValue("status", "completed");
        });

        // Just gonna delete the record of any failures
        Delete<ImportFailed>();

    }
}

A couple notes on this version of the code:

  • FlatTableProjection is adding columns to its table based on the designated column mappings. You can happily customize the FlatTableProjection.Table object to add indexes, constraints, or defaults.
  • Marten is able to apply schema migrations and manage the table from the FlatTableProjection as long as it’s registered with Marten
  • When you call Map(x => x.ActivityType), Marten is by default mapping that to a snake_cased derivation of the member name for the column, so “activity_type”. You can explicitly map the column name yourself, as I did with “total_steps” above.
  • The call to Map(expression) chains a fluent builder for the table column if you want to further customize the table column with default values or constraints like the NotNull()
  • In this case, I’m building a database row per event stream. The FlatTableProjection can also map to arbitrary members of each event type
  • The Project<T>(lambda) configuration leads to runtime code generation of a Postgresql upsert command so as to not be completely dependent upon events being captured in the exact right order. I think this will be more robust in real life usage than the first, more explicit version.
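
If the default column naming mentioned above isn’t obvious, here’s a tiny sketch of the kind of member name to column name derivation I’m describing. To be clear, this is a hypothetical helper written for illustration (ColumnNaming.ToSnakeCase is not a Marten API), and Marten’s real rules may differ around edge cases like acronyms:

```csharp
using System.Text;

// Hypothetical helper, not part of Marten
public static class ColumnNaming
{
    // Lower-cases the member name and puts an underscore in front of
    // every upper case letter except the first character
    public static string ToSnakeCase(string memberName)
    {
        var builder = new StringBuilder();

        for (var i = 0; i < memberName.Length; i++)
        {
            var c = memberName[i];
            if (char.IsUpper(c) && i > 0) builder.Append('_');
            builder.Append(char.ToLowerInvariant(c));
        }

        return builder.ToString();
    }
}
```

With that rule, ActivityType becomes activity_type and CustomerId becomes customer_id, which lines up with the column names in the hand-written SQL of the first projection.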

The FlatTableProjection in its first incarnation is not yet able to use event metadata because I got impatient to finish up 5.8 and punted on that for now. I think it’s safe to say this feature will evolve when it hits some real world usage.

Command Line Support for Marten Projections

Marten 5.7 was published earlier this week with mostly bug fixes. The one, big new piece of functionality was an improved version of the command line support for event store projections. Specifically, Marten added support for multi-tenancy through multiple databases and the ability to use separate document stores in one application as part of our V5 release earlier this year, but the projections command didn’t really catch up and support that — but now it can with Marten v5.7.0.

From a sample project in Marten we use to test this functionality, here’s part of the Marten setup that has a mix of asynchronous and inline projections, as well as uses the database per tenant strategy:

services.AddMarten(opts =>
{
    opts.AutoCreateSchemaObjects = AutoCreate.All;
    opts.DatabaseSchemaName = "cli";

    // Note this app uses multiple databases for multi-tenancy
    opts.MultiTenantedWithSingleServer(ConnectionSource.ConnectionString)
        .WithTenants("tenant1", "tenant2", "tenant3");

    // Register all event store projections ahead of time
    opts.Projections
        .Add(new TripAggregationWithCustomName(), ProjectionLifecycle.Async);
    
    opts.Projections
        .Add(new DayProjection(), ProjectionLifecycle.Async);
    
    opts.Projections
        .Add(new DistanceProjection(), ProjectionLifecycle.Async);

    opts.Projections
        .Add(new SimpleAggregate(), ProjectionLifecycle.Inline);

    // This is actually important to register "live" aggregations too for the code generation
    opts.Projections.SelfAggregate<SelfAggregatingTrip>(ProjectionLifecycle.Live);
}).AddAsyncDaemon(DaemonMode.Solo);

At this point, let’s introduce the Marten.CommandLine Nuget dependency to the system just to add Marten related command line options directly to our application for typical database management utilities. Marten.CommandLine brings with it a dependency on Oakton that we’ll actually use as the command line parser for our built in tooling. Using the now “old-fashioned” pre-.NET 6 manner of running a console application, I add Oakton to the system like this:

public static Task<int> Main(string[] args)
{
    // Use Oakton for running the command line
    return CreateHostBuilder(args).RunOaktonCommands(args);
}

When you use the dotnet command line options, just keep in mind that the “--” separator you’re seeing me use here separates options for the dotnet executable itself on the left from arguments being passed to the application on the right of the “--” separator.

Now, turning to the command line at the root of our project, I’m going to type out this command to see the Oakton options for our application:

dotnet run -- help

Which gives us this output:

If you’re wondering, the commands db-apply and marten-apply are synonyms. Both are there so as not to break older users when we introduced the now more generic “db” commands.

And next I’m going to see the usage for the projections command with dotnet run -- help projections, which gives me this output:

For the simplest usage, I’m just going to list off the known projections for the entire system with dotnet run -- projections --list:

Which will show us the four registered projections in the main IDocumentStore, and tells us that there are no registered projections in the separate IOtherStore.

Now, I’m just going to continuously run the asynchronous projections for the entire application — while another process is constantly pumping random events into the system so there’s always new work to be doing — with dotnet run -- projections, which will spit out this continuously updating table (with an assist from Spectre.Console):

What I hope you can tell here is that every asynchronous projection is actively running for each separate tenant database. The blue “High Water Mark” is telling us where the current event store for each database is at.

And finally, for the main reason why I tackled the projections command line overhaul last week, folks needed a way to rebuild projections for every database when using a database per tenant strategy.

While the new projections command will happily let you rebuild any combination of database, store, and projection name by flags or even an interactive mode, we can quickly trigger a full rebuild of all the asynchronous projections with dotnet run -- projections --rebuild, which is going to loop through every store and database like so:

For the moment, the rebuild works on all the projections for a single database at a time. I’m sure we’ll attempt some optimizations of the rebuilding process and try to understand how much we can really parallelize more, but for right now, our users have an out of the box way to rebuild projections across separate databases or separate stores.

This *might* be a YouTube video soon just to kick off my new channel for Marten/Jasper/Oakton/Alba/Lamar content.

Low Code Ceremony Sagas with Jasper & Marten

You’ll need at least Jasper v2.0.0-alpha-4 if you want to recreate the saga support in this post. All the sample code for this post is in an executable sample on GitHub. Jasper does support sagas with EF Core and Sql Server or Postgresql, but Marten is where most of the effort is going just at the moment.

The Saga pattern is a way to solve the issue of logical, long-running transactions that necessarily need to span over multiple operations. In the approaches I’ve encountered throughout my career, this has generally meant persisting a “saga state” of some sort in a database that is used within a message handling framework to “know” what steps have been completed, and what’s outstanding.

Jumping right into an example, consider a very simple order management service that will have steps to:

  1. Create a new order
  2. Complete the order
  3. Or alternatively, delete new orders if they have not been completed within 1 minute

For the moment, I’m going to ignore the underlying persistence and just focus on the Jasper message handlers to implement the order saga workflow with this simplistic saga code:

using Baseline.Dates;
using Jasper;

namespace OrderSagaSample;

public record StartOrder(string Id);

public record CompleteOrder(string Id);

public record OrderTimeout(string Id) : TimeoutMessage(1.Minutes());

public class Order : Saga
{
    public string? Id { get; set; }

    // By returning the OrderTimeout, we're triggering a "timeout"
    // condition that will process the OrderTimeout message at least
    // one minute after an order is started
    public OrderTimeout Start(StartOrder order, ILogger<Order> logger)
    {
        Id = order.Id; // defining the Saga Id.

        logger.LogInformation("Got a new order with id {Id}", order.Id);
        // creating a timeout message for the saga
        return new OrderTimeout(order.Id);
    }

    public void Handle(CompleteOrder complete, ILogger<Order> logger)
    {
        logger.LogInformation("Completing order {Id}", complete.Id);

        // That's it, we're done. This directs Jasper to delete the
        // persisted saga state after the message is done.
        MarkCompleted();
    }

    public void Handle(OrderTimeout timeout, ILogger<Order> logger)
    {
        logger.LogInformation("Applying timeout to order {Id}", timeout.Id);

        // That's it, we're done. Delete the saga state after the message is done.
        MarkCompleted();
    }
}

I’m just aiming for a quick sample rather than exhaustive documentation here, but a few notes:

  • Jasper leans a bit on type and naming conventions to discover message handlers and to “know” how to call these message handlers. Some folks will definitely not like the magic, but this approach leads to substantially less code and arguably less complexity compared to existing .Net tools
  • Jasper supports the idea of scheduled messages, and the new TimeoutMessage base class up there is just a way to utilize that support for “saga timeout” conditions
  • Jasper generally tries to adapt to your application code rather than force a lot of mandatory framework artifacts into your message handler code

Now let’s move over to the service bootstrapping and add Marten in as our persistence mechanism in the Program file:

using Jasper;
using Jasper.Persistence.Marten;
using Marten;
using Oakton;
using Oakton.Resources;
using OrderSagaSample;

var builder = WebApplication.CreateBuilder(args);

// Not 100% necessary, but enables some extra command line diagnostics
builder.Host.ApplyOaktonExtensions();

// Adding Marten
builder.Services.AddMarten(opts =>
    {
        var connectionString = builder.Configuration.GetConnectionString("Marten");
        opts.Connection(connectionString);
        opts.DatabaseSchemaName = "orders";
    })

    // Adding the Jasper integration for Marten.
    .IntegrateWithJasper();


builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

// Do all necessary database setup on startup
builder.Services.AddResourceSetupOnStartup();

// The defaults are good enough here
builder.Host.UseJasper();

var app = builder.Build();

// Just delegating to Jasper's local command bus for all
app.MapPost("/start", (StartOrder start, ICommandBus bus) => bus.InvokeAsync(start));
app.MapPost("/complete", (CompleteOrder complete, ICommandBus bus) => bus.InvokeAsync(complete));
app.MapGet("/all", (IQuerySession session) => session.Query<Order>().ToListAsync());
app.MapGet("/", (HttpResponse response) =>
{
    response.Headers.Add("Location", "/swagger");
    response.StatusCode = 301;
});

app.UseSwagger();
app.UseSwaggerUI();

return await app.RunOaktonCommands(args);

Off screen, I’ve started up a docker container for Postgresql to get a blank database. With that running, I’ll start the application up with the usual dotnet run command and open up the Swagger page:

You’ll get a lot of SQL in your terminal on the first run as Marten sets up the database for you, that’s perfectly normal.

I’m going to first create a new order for “Shoes” and execute the /start endpoint:

And verify that it’s persisted by checking the /all endpoint:

If I’m quick enough, I’ll post {"Id": "Shoes"} to /complete, and then verify through the /all endpoint that the “Shoes” order has been completed.

Otherwise, if I’m too slow to complete the order, the timeout message will be applied to our order and you’ll see evidence of that in the logging output like so:

And that’s it, one working saga implementation with database backed persistence through Marten. The goal of Jasper is to make this kind of server side development as low ceremony and easy to use as possible, so any feedback about what you do or don’t like in this sample would be very helpful.
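
If you’d rather script the steps above than click through Swagger, here’s a small sketch using HttpClient against the /start, /complete, and /all endpoints from the Program file. The OrderSagaClient class is hypothetical, the two records are redeclared only to keep the sample self-contained, and the base address in the usage note is an assumption about where the service happens to be listening:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Redeclared here only so the sample stands alone
public record StartOrder(string Id);
public record CompleteOrder(string Id);

// Hypothetical helper for exercising the sample service
public static class OrderSagaClient
{
    // Produces the same shape of JSON body used above, e.g. {"Id":"Shoes"}
    public static string ToJson<T>(T message) => JsonSerializer.Serialize(message);

    private static StringContent JsonBody<T>(T message) =>
        new StringContent(ToJson(message), Encoding.UTF8, "application/json");

    // Starts an order, completes it right away, then prints whatever
    // saga state the /all endpoint returns
    public static async Task StartThenCompleteAsync(HttpClient client, string orderId)
    {
        var started = await client.PostAsync("/start", JsonBody(new StartOrder(orderId)));
        started.EnsureSuccessStatusCode();

        var completed = await client.PostAsync("/complete", JsonBody(new CompleteOrder(orderId)));
        completed.EnsureSuccessStatusCode();

        Console.WriteLine(await client.GetStringAsync("/all"));
    }
}
```

To use it, create an HttpClient with its BaseAddress pointed at the running service (something like http://localhost:5000, but use whatever address dotnet run reports) and call OrderSagaClient.StartThenCompleteAsync(client, "Shoes"). Skip the /complete call and wait a minute if you want to watch the timeout path instead.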

Related Posts

I’ve spit out quite a bit of blogging content the past several weeks on both Marten and Jasper: