Improved Command Line Tooling with Oakton

I know, command line parsing libraries are about the least exciting tooling in the entire software universe, and there are dozens of perfectly competent ones out there. Oakton, though, is heavily used throughout the entire “Critter Stack” (Marten, Weasel, and Wolverine plus other tools) to provide command line utilities directly to any old .NET Core application that happens to be bootstrapped with one of the many ways to arrive at an IHost. Oakton’s key advantage over other command line parsing tools is its ability to easily add extension commands to a .NET application in external assemblies. And of course, as part of the entire JasperFx / Critter Stack philosophy of developer tooling, Oakton was originally conceived to enhance the testability of custom command line tooling. Unlike some other tools *cough* System.CommandLine *cough*.

Oakton also has some direct framework-ish elements for environment checks and the stateful resource model that is used very heavily all the way through Marten and Wolverine to provide the very best development time experience possible when using our tools.

Today the extended JasperFx / Critter Stack community released Oakton 6.2 with some new, hopefully important use cases. First off, the stateful resource model that we use to set up, tear down, or just check “configured stateful resources” in our system like database schemas or message broker queues just got the concept of dependencies between resources, such that you can control which resources are set up first.

Next, Oakton finally got a couple of easy-to-use recipes for utilizing IoC services in Oakton commands (it was possible before, just with a little more ceremony than some folks prefer). The first way assumes that you’re running Oakton from one of the many flavors of IHostBuilder or IHost like so:

// This would be the last line in your Program.Main() method
// "app" in this case is a WebApplication object, but there
// are other extension methods for headless services
return await app.RunOaktonCommands(args);

You can build an Oakton command class that uses “setter injection” to get IoC services like so:

public class MyDbCommand : OaktonAsyncCommand<MyInput>
{
    // Just assume maybe that this is an EF Core DbContext
    [InjectService]
    public MyDbContext DbContext { get; set; }
    
    public override Task<bool> Execute(MyInput input)
    {
        // do stuff with DbContext from up above
        return Task.FromResult(true);
    }
}

Just know that when you do this and execute a command that has decorated properties for services, Oakton is:

  1. Building your system’s IHost
  2. Creating a new IServiceScope from your application’s DI container, or in other words, a scoped container
  3. Building your command object and setting all the dependencies on your command object by resolving each dependency from the scoped container created in the previous step
  4. Executing the command as normal
  5. Disposing the scoped container and the IHost, effectively in a try/finally so that Oakton is always cleaning up after the application

In other words, Oakton is largely taking care of annoying issues like object disposal cleanup, scoping, and actually building the IHost if necessary.
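To make that list a bit more concrete, here’s a rough sketch of the equivalent flow written directly against the standard Microsoft.Extensions.Hosting and DependencyInjection APIs. This is not Oakton’s actual implementation, and hostBuilder and input are just placeholders:

// 1. Build the application's IHost (Oakton only does this if the command actually needs it)
using var host = hostBuilder.Build();

// 2. Create a scoped container from the application's DI container
await using var scope = host.Services.CreateAsyncScope();

// 3. Build the command and resolve each [InjectService] property from the scoped container
var command = new MyDbCommand
{
    DbContext = scope.ServiceProvider.GetRequiredService<MyDbContext>()
};

// 4. Execute the command as normal
var success = await command.Execute(input);

// 5. The scope and the host are disposed as this code exits, even if the command throws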

Oakton’s Future

The Critter Stack Core team & I are charting a course for our entire ecosystem I’m calling “Critter Stack 2025” that’s hoping to greatly reduce the technical challenges in adopting our tool set. As part of that, what’s now Oakton is likely to move into a new shared library (I think it’s just going to be called “JasperFx”) between the various critters (and hopefully new critters for 2025!). Oakton itself will probably get a temporary life as a shim to the new location as a way to ease the transition for existing users. There’s a balance between actively improving your toolset for potential new users and not disturbing existing users too much. We’re still working on whatever that balance ends up being.

Multi-Tenancy in Wolverine Messaging

Building and maintaining a large, hosted system that requires multi-tenancy comes with a fair number of technical challenges. JasperFx Software has helped several of our clients achieve better results with their particular multi-tenancy challenges with Marten and Wolverine, and we’re available to do the same for your shop! Drop us a message on our Discord server or email us at sales@jasperfx.net to start a conversation.

This is continuing a series about multi-tenancy with Marten, Wolverine, and ASP.NET Core:

  1. What is it and why do you care?
  2. Marten’s “Conjoined” Model
  3. Database per Tenant with Marten
  4. Multi-Tenancy in Wolverine Messaging (this post)
  5. Multi-Tenancy in Wolverine Web Services (future)
  6. Using Partitioning for Better Performance with Multi-Tenancy and Marten (future)
  7. Multi-Tenancy in Wolverine with EF Core & Sql Server (future, and honestly, future functionality as part of Wolverine 4.0)
  8. Dynamic Tenant Creation and Retirement in Marten and Wolverine (definitely in the future)

Let’s say that you’re using the Marten + PostgreSQL combination for your system’s persistence needs in a web service application. Let’s also say that you want to keep the customer data within your system in completely different databases per customer company (or whatever makes sense in your system). Lastly, let’s say that you’re using Wolverine for asynchronous messaging and as a local “mediator” tool. Fortunately, Wolverine by itself has some important built in support for multi-tenancy with Marten that’s going to make your system a lot easier to build.

Let’s get started by just showing a way to opt into multi-tenancy with separate databases using Marten and its integration with Wolverine for middleware, saga support, and the all important transactional outbox support:

// Adding Marten for persistence
builder.Services.AddMarten(m =>
    {
        // With multi-tenancy through a database per tenant
        m.MultiTenantedDatabases(tenancy =>
        {
            // You would probably be pulling the connection strings out of configuration,
            // but it's late in the afternoon and I'm being lazy building out this sample!
            tenancy.AddSingleTenantDatabase("Host=localhost;Port=5433;Database=tenant1;Username=postgres;password=postgres", "tenant1");
            tenancy.AddSingleTenantDatabase("Host=localhost;Port=5433;Database=tenant2;Username=postgres;password=postgres", "tenant2");
            tenancy.AddSingleTenantDatabase("Host=localhost;Port=5433;Database=tenant3;Username=postgres;password=postgres", "tenant3");
        });

        m.DatabaseSchemaName = "mttodo";
    })
    .IntegrateWithWolverine(masterDatabaseConnectionString:connectionString);

Just for the sake of completeness, here’s some sample Wolverine configuration that pairs up with the above:

// Wolverine usage is required for WolverineFx.Http
builder.Host.UseWolverine(opts =>
{
    // This middleware will apply to the HTTP
    // endpoints as well
    opts.Policies.AutoApplyTransactions();

    // Setting up the outbox on all locally handled
    // background tasks
    opts.Policies.UseDurableLocalQueues();
});

Now that we’ve got that basic setup for Marten and Wolverine, let’s move on to the first issue: how the heck does Wolverine “know” which tenant should be used? In a later post I’ll show how Wolverine.HTTP has built-in tenant id detection, but for now, let’s pretend that you’re already taking care of tenant id detection from incoming HTTP requests somehow within your ASP.NET Core pipeline and you just need to pass that into a Wolverine message handler that is being executed from within an MVC Core controller (“Wolverine as Mediator”):

[HttpDelete("/todoitems/{tenant}/longhand")]
public async Task Delete(
    string tenant,
    DeleteTodo command,
    IMessageBus bus)
{
    // Invoke inline for the specified tenant
    await bus.InvokeForTenantAsync(tenant, command);
}

By using the IMessageBus.InvokeForTenantAsync() method, we’re invoking a command inline, but telling Wolverine what the tenant id is. The command handler might look something like this:

// Keep in mind that we set up the automatic
// transactional middleware usage with Marten & Wolverine
// up above, so there's just not much to do here
public static class DeleteTodoHandler
{
    public static void Handle(DeleteTodo command, IDocumentSession session)
    {
        session.Delete<Todo>(command.Id);
    }
}

Not much going on there in our code, but Wolverine is helping us out here by:

  1. Seeing the tenant id value that we passed in, which Wolverine tracks in its own Envelope structure (Wolverine’s version of the Envelope Wrapper from the venerable EIP book)
  2. Creating the Marten IDocumentSession for that tenant id value, which will be reading and writing to the correct tenant database underneath Marten (roughly as sketched below)
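That second step is roughly equivalent to the following Marten call, which Wolverine’s integration makes for you using the tenant id from the message envelope (store and tenantId are placeholders here, and this is just a sketch of the behavior, not Wolverine’s actual code):

// With the database-per-tenant setup above, asking Marten for a session with
// an explicit tenant id routes all reads and writes to that tenant's database
using var session = store.LightweightSession(tenantId);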

Now, let’s make this a little more complex by also publishing an event message in that message handler for the DeleteTodo message:

public static class DeleteTodoHandler
{
    public static TodoDeleted Handle(DeleteTodo command, IDocumentSession session)
    {
        session.Delete<Todo>(command.Id);
        
        // This returned event message will be cascaded by Wolverine
        // to any subscribers of TodoDeleted
        return new TodoDeleted(command.Id);
    }
}

public record TodoDeleted(int TodoId);

Assuming that the TodoDeleted message is being published to a “durable” endpoint, Wolverine is using its transactional outbox integration with Marten to persist the outgoing message in the same tenant database and same transaction as the deletion we’re doing in that command handler. In other words, Wolverine is able to use the tenant databases for its outbox support with no other configuration necessary than what we did up above in the calls to AddMarten() and UseWolverine().
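Just to be explicit about that assumption, a hypothetical publishing rule for TodoDeleted inside the UseWolverine() block from earlier might look like the following. The queue name is made up, and it’s the UseDurableLocalQueues() policy shown before that makes the endpoint durable:

// Route the cascaded TodoDeleted messages to a local queue. Because of the
// UseDurableLocalQueues() policy above, this endpoint is durable, so the outgoing
// message rides along in the same tenant database transaction via the outbox
opts.PublishMessage<TodoDeleted>().ToLocalQueue("todo-deleted");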

Moreover, Wolverine is even able to use its “durability agent” against all the tenant databases to ensure that any work that is somehow stranded by crashed processes is eventually recovered and completed.

Lastly, the TodoDeleted event message cascaded above from our message handler would be tracked throughout Wolverine with the tenant id of the original DeleteTodo command message, so that you can do multi-part workflows through Wolverine while it tracks the tenant id and utilizes the correct tenant database through Marten all along the way.
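To illustrate that last point, a hypothetical downstream handler for TodoDeleted needs nothing special to stay tenant-aware:

public static class TodoDeletedHandler
{
    public static void Handle(TodoDeleted message, IDocumentSession session)
    {
        // Wolverine opened this session for the same tenant id it tracked
        // from the original DeleteTodo command, so any reads or writes here
        // hit the correct tenant database
    }
}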

Summary

Building solutions with multi-tenancy can be complicated, but the Wolverine + Marten combination can make it a lot easier.

New Goodies in Marten 7.28

Hey, did you know that JasperFx Software offers both consulting services and support plans for the “Critter Stack” tools? Or for architectural or test automation help with any old server side .NET application. One of the other things we do is to build out custom features that our customers need in the “Critter Stack” — like the Marten-managed table partitioning for improved scaling and performance in this release!

A fairly sizable Marten 7.28 release just went live (or will at least be available on Nuget by the time you read this) with a mix of new features and usability improvements. The biggest new feature is “Marten-Managed Table Partitioning by Tenant.” Lots of words! Consider this scenario:

  • You have a system with a huge number of events
  • You also need to use Marten’s support for multi-tenancy
  • For historical reasons and for the ease of deployment and management, you are using Marten’s “conjoined” multi-tenancy model and keeping all of your tenant data in the same database (this might have some very large cloud hosting cost saving benefits as well)
  • You want to be able to scale the database performance for all the normal reasons

PostgreSQL table partitioning to the rescue! In recent Marten releases, we’ve added support to take advantage of postgres table sharding as a way to improve performance in many operations — with one of the obvious first usages using table sharding per tenant id for Marten’s “conjoined” tenancy model. Great! Just tell Marten exactly what the tenant ids are and the matching partition configuration and go!

But wait, what if you have a very large number of tenants and might need to even add new tenants at runtime and without incurring any kind of system downtime? Marten now has a partitioning feature for multi-tenancy that can dynamically create per-tenant shards at runtime and manage the list of tenants in its own database storage like so:

var builder = Host.CreateApplicationBuilder();
builder.Services.AddMarten(opts =>
{
    opts.Connection(builder.Configuration.GetConnectionString("marten"));

    // Make all document types use "conjoined" multi-tenancy -- unless explicitly marked with
    // [SingleTenanted] or explicitly configured via the fluent interface
    // to be single-tenanted
    opts.Policies.AllDocumentsAreMultiTenanted();

    // It's required to explicitly tell Marten which database schema to put
    // the mt_tenant_partitions table in
    opts.Policies.PartitionMultiTenantedDocumentsUsingMartenManagement("tenants");
});

With some management helpers of course:

await theStore
    .Advanced
    // This is ensuring that there are tenant id partitions for all multi-tenanted documents
    // with the named tenant ids
    .AddMartenManagedTenantsAsync(CancellationToken.None, "a1", "a2", "a3");

If you’re familiar with the pg_partman tool, this was absolutely meant to fulfill a similar role within Marten for per-tenant table partitioning.

Aggregation Projections with Explicit Code

This is probably long overdue, but the other highlight that’s probably much more globally applicable is the ability to write Marten event aggregation projections with strictly explicit code for folks who don’t care for Marten’s conventional method approaches — or just want a more complicated workflow than what the conventional approaches can do.

You still need to use the CustomProjection<TDoc, TId> base class for your logic, but now there are simpler methods that can be overridden to express explicit “left fold over events to create an aggregated document” logic as shown below:

public class ExplicitCounter: CustomProjection<SimpleAggregate, Guid>
{
    public override SimpleAggregate Apply(SimpleAggregate snapshot, IReadOnlyList<IEvent> events)
    {
        snapshot ??= new SimpleAggregate();
        foreach (var e in events.Select(x => x.Data))
        {
            if (e is AEvent) snapshot.ACount++;
            if (e is BEvent) snapshot.BCount++;
            if (e is CEvent) snapshot.CCount++;
            if (e is DEvent) snapshot.DCount++;
        }

        // You have to explicitly return the new value
        // of the aggregated document no matter what!
        return snapshot;
    }
}

The explicitly coded projections can also be used for live aggregations (AggregateStreamAsync()) and within FetchForWriting() as well. This has been a longstanding request, and will receive even stronger support in Marten 8.
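For a quick illustration, here’s roughly how the ExplicitCounter above could be registered and then used for a live aggregation. Treat this as a sketch with connectionString and streamId as placeholders, and double check the Marten documentation for the exact registration overload:

builder.Services.AddMarten(opts =>
{
    // "connectionString" is a placeholder here
    opts.Connection(connectionString);

    // Register the explicitly coded projection with the Live lifecycle
    opts.Projections.Add(new ExplicitCounter(), ProjectionLifecycle.Live);
});

// Later, a live aggregation of a single stream runs the same Apply() logic on the fly
var aggregate = await session.Events.AggregateStreamAsync<SimpleAggregate>(streamId);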

LINQ Improvements

Supporting a LINQ provider is the gift that never stops giving. There’s some small improvements this time around for some minor things:

// string.Trim()
var docs = await session.Query<SomeDoc>()
    .Where(x => x.Description.Trim() == "something")
    .ToListAsync();

// Select to TimeSpan out of a document
var durations = await session.Query<SomeDoc>().Select(x => x.Duration).ToListAsync();

// Query the raw event data by event types
var raw = await theSession.Events.QueryAllRawEvents()
    .Where(x => x.EventTypesAre(typeof(CEvent), typeof(DEvent)))
    .ToListAsync();

Wolverine meets RavenDb

Hey, did you know that JasperFx Software offers both consulting services and support plans for the “Critter Stack” tools? Or for architectural or test automation help with any old server side .NET application. One of the other things we do is to build out custom features that our customers need in the “Critter Stack” — like the RavenDb integration from this post!

In coordination with a JasperFx Software client, the latest Wolverine release (3.0 RC 1) adds support for using RavenDb with Wolverine applications. The full details are documented here.

To get started, add this package to your code:

dotnet add package WolverineFx.RavenDb

Wolverine will depend on having RavenDb integrated with your application’s DI container, so make sure you’re also using RavenDB.DependencyInjection. With those two dependencies, the code setup is just this:

var builder = Host.CreateApplicationBuilder();

// You'll need a reference to RavenDB.DependencyInjection
// for this one
builder.Services.AddRavenDbDocStore(raven =>
{
    // configure your RavenDb connection here
});

builder.UseWolverine(opts =>
{
    // That's it, nothing more to see here
    opts.UseRavenDbPersistence();
    
    // The RavenDb integration supports basic transactional
    // middleware just fine
    opts.Policies.AutoApplyTransactions();
});

// continue with your bootstrapping...

And that’s it. Adding that one line of UseRavenDbPersistence() to the Wolverine setup adds support for Wolverine to use RavenDb as its backing persistence.

This also includes a RavenDb-specific set of Wolverine “side effects” you can use to build synchronous, pure function handlers using RavenDb like so:

public record RecordTeam(string Team, int Year);

public static class RecordTeamHandler
{
    public static IRavenDbOp Handle(RecordTeam command)
    {
        return RavenOps.Store(new Team { Id = command.Team, YearFounded = command.Year });
    }
}

This code is of course in early stages and will surely be adapted after some load testing and intended production usage by our client, but the RavenDb integration with Wolverine is now “officially” supported.

I can’t speak to any kind of timing, but there will be more options for database integration with Wolverine in the somewhat near future as well. This effort helped us break off some reusable “compliance” tests that should help speed up the development of future database integrations with Wolverine.

CQRS Command Handlers with Marten

Hey, did you know that JasperFx Software offers both consulting services and support plans for the “Critter Stack” tools? One of the common areas where we’ve helped our clients is in using Marten or Wolverine when the usage involves quite a bit of potential concerns about concurrency. As I write this, I’m currently working with a JasperFx client to implement the FetchForWriting API shown in this post as a way of improving their system’s resiliency to concurrency problems.

You’ve decided to use event sourcing as your persistence strategy, so that your persisted system of record is the actual business events, segregated by streams that represent changes in state to some kind of logical business entity (an invoice? an order? an incident? a project?). Of course there will have to be some way of resolving or “projecting” the raw events into a usable view of the system state, but we’ll get to that.

You’ve also decided to organize your system around a CQRS architectural style (Command Query Responsibility Segregation). With a CQRS approach, the backend code is mostly organized around the “verbs” of your system, meaning the “command” messages (this could be HTTP services, and I’m not implying that there automatically has to be any asynchronous messaging) that are handled to capture changes to the system (events in our case), and “query” endpoints or APIs that strictly serve up information about your system.

While it’s certainly possible to do either Event Sourcing or CQRS without the other, the two things do go together as Forrest Gump would say, like peas and carrots. Marten is certainly valuable as part of a CQRS with Event Sourcing approach within a range of .NET messaging or web frameworks, but there is quite a bit of synergy between Marten and its “Critter Stack” stable mate Wolverine (see the details about the integration here).

And lastly of course, you’ve quite logically decided to use Marten as the persistence mechanism for the events. Marten is also a strong fit because it comes with some important functionality that we’ll need for CQRS command handlers:

  • Marten’s event projection support can give us a representation of the current state of the raw event data in a usable way that we’ll need within our command handlers to both validate requested actions and to “decide” what additional events should be persisted to our system
  • The FetchForWriting API in Marten will not only give us access to the projected event data, but it provides an easy mechanism for both optimistic and pessimistic concurrency protections in our system
  • Marten allows for a couple different options of projection lifecycle that can be valuable for performance optimization with differing system needs

As a sample application problem domain, I got to be part of a successful effort during the worst of the pandemic to stand up a new “telehealth” web portal using event sourcing. One of the concepts we needed to track in that system was the activity of a health care provider (nurse, doctor, nurse practitioner), with events for when they were available and what they were doing at any particular time during the day for later decision making:

public record ProviderAssigned(Guid AppointmentId);

public record ProviderJoined(Guid BoardId, Guid ProviderId);

public record ProviderReady;

public record ProviderPaused;

public record ProviderSignedOff;

// "Charting" is basically just whatever
// paperwork they need to do after
// an appointment, and it was important
// for us to track that time as part
// of their availability and future
// planning
public record ChartingFinished;

public record ChartingStarted;

public enum ProviderStatus
{
    Ready,
    Assigned,
    Charting,
    Paused
}

But of course, at several points, you do actually need to know what the actual state of the provider’s current shift is to be able to make decisions within the command handlers, so we had a “write” model something like this:

// I'm sticking the Marten "projection" logic for updating
// state from the events directly into this "write" model,
// but you could separate that into a different class if you
// prefer
public class ProviderShift
{
    public Guid Id { get; set; }

    // This is important, this would be set by Marten to the 
    // current event number or revision of the ProviderShift
    // aggregate. This is going to be important later for
    // concurrency protections
    public int Version { get; set; }
    public Guid BoardId { get; private set; }
    public Guid ProviderId { get; init; }
    public ProviderStatus Status { get; private set; }
    public string Name { get; init; }
    public Guid? AppointmentId { get; set; }
    
    // The Create & Apply methods are conventional targets
    // for Marten's "projection" capabilities
    // But don't worry, you would never *have* to take a reference
    // to Marten itself like I did below just out of laziness
    public static ProviderShift Create(
        ProviderJoined joined)
    {
        return new ProviderShift
        {
            Status = ProviderStatus.Ready,
            ProviderId = joined.ProviderId,
            BoardId = joined.BoardId
        };
    }

    public void Apply(ProviderReady ready)
    {
        AppointmentId = null;
        Status = ProviderStatus.Ready;
    }

    public void Apply(ProviderAssigned assigned)
    {
        Status = ProviderStatus.Assigned;
        AppointmentId = assigned.AppointmentId;
    }

    public void Apply(ProviderPaused paused)
    {
        Status = ProviderStatus.Paused;
        AppointmentId = null;
    }

    // This is kind of a catch all for any paperwork the
    // provider has to do after an appointment has ended
    // for the just concluded appointment
    public void Apply(ChartingStarted charting)
    {
        Status = ProviderStatus.Charting;
    }
}

The whole purpose of the ProviderShift type above is to be a “write” model that contains enough information for the command handlers to “decide” what should be done — as opposed to a “read” model that might have richer information like the provider’s name that would be more suitable or usable for using within a user interface. “Write” or “read” in this case is just a role within the system, and at different times it might be valuable to have separate models for different consumers of the information and in other times be able to happily get by with a single model.

Alright, so let’s finally look at a very simple command handler related to providers that tries to mark the provider as being finished charting:

// Since we're focusing on Marten, I'm using an MVC Core
// controller just because it's commonly used and understood
public class CompleteChartingController : ControllerBase
{
    [HttpPost("/provider/charting/complete")]
    public async Task Post(
        [FromBody] CompleteCharting charting,
        [FromServices] IDocumentSession session)
    {
        // We're looking up the current state of the ProviderShift aggregate
        // for the designated provider
        var stream = await session
            .Events
            .FetchForWriting<ProviderShift>(charting.ProviderShiftId, HttpContext.RequestAborted);

        // The current state
        var shift = stream.Aggregate;
        
        if (shift.Status != ProviderStatus.Charting)
        {
            // Obviously do something smarter in your app, but you 
            // get the point
            throw new Exception("The shift is not currently charting");
        }
        
        // Append a single new event just to say 
        // "charting is finished". I'm relying on 
        // Marten's automatic metadata to capture
        // the timestamp of this event for me
        stream.AppendOne(new ChartingFinished());

        // Commit the transaction
        await session.SaveChangesAsync();
    }
}

I’m using the Marten FetchForWriting() API to get at the current state of the event stream for the designated provider shift (a provider’s activity during a single day). I’m also using this API to capture a new event marking the provider as being finished with charting. FetchForWriting() is doing two important things for us:

  1. Executes or finds the projected data for ProviderShift from the raw events. More on this a little later
  2. Provides a little bit of optimistic concurrency protection for our provider shift stream

Building on the theme of concurrency first, the command above will “remember” the current state of the ProviderShift at the point that FetchForWriting() is called. Upon SaveChangesAsync(), Marten will reject the transaction and throw a ConcurrencyException if somehow, some way, some other request magically came through and committed changes against that same ProviderShift stream between the call to FetchForWriting() and SaveChangesAsync().
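If you’d rather handle that failure explicitly than let the exception bubble up, a minimal sketch inside the controller action might look like this (ConcurrencyException lives in the Marten.Exceptions namespace):

try
{
    // Commit the transaction
    await session.SaveChangesAsync();
}
catch (ConcurrencyException)
{
    // Some other request wrote to this ProviderShift stream after our
    // FetchForWriting() call. Surface that however makes sense for your
    // API, maybe as an HTTP 409 Conflict response
}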

That level of concurrency is baked in, but we can do a little bit better. Remember that the ProviderShift has this property:

    // This is important, this would be set by Marten to the 
    // current event number or revision of the ProviderShift
    // aggregate. This is going to be important later for
    // concurrency protections
    public int Version { get; set; }

The projection capability of Marten makes it easy for us to “know” and track the current version of the ProviderShift stream so that we can feed it back to command handlers later. Here’s the full definition of the CompleteCharting command:

public record CompleteCharting(
    Guid ProviderShiftId,
    
    // This version is meant to mean "I was issued
    // assuming that the ProviderShift is currently
    // at this version in the server, and if the version
    // has shifted since, then this command is now invalid"
    int Version
);

Let’s tighten up the optimistic concurrency protection so that Marten shuts down the command handling faster, before we waste system resources doing unnecessary work, by passing the command version right into this overload of FetchForWriting():

// Since we're focusing on Marten, I'm using an MVC Core
// controller just because it's commonly used and understood
public class CompleteChartingController : ControllerBase
{
    [HttpPost("/provider/charting/complete")]
    public async Task Post(
        [FromBody] CompleteCharting charting,
        [FromServices] IDocumentSession session)
    {
        // We're looking up the current state of the ProviderShift aggregate
        // for the designated provider
        var stream = await session
            .Events
            .FetchForWriting<ProviderShift>(
                charting.ProviderShiftId, 
                
                // Passing the expected, starting version of ProviderShift
                // into Marten
                charting.Version,
                HttpContext.RequestAborted);

        // And the rest of the controller stays the same as
        // before....
    }
}

In the usage above, Marten will do a version check both at the point of FetchForWriting() using the version we passed in, and again during the call to SaveChangesAsync() to reject any changes made if there was a concurrent update to that same stream.

Lastly, Marten gives you the ability to opt into heavier, exclusive access to the ProviderShift with this option:

// We're looking up the current state of the ProviderShift aggregate
// for the designated provider
var stream = await session
    .Events
    .FetchForExclusiveWriting<ProviderShift>(
        charting.ProviderShiftId, 
        HttpContext.RequestAborted);

In that last usage, we’re relying on the underlying PostgreSQL database to get us an exclusive row lock on the ProviderShift event stream such that only our current operation is allowed to write to that event stream while we have the lock. This is heavier and comes with some risk of database locking problems, but solves the concurrency issue.

So that’s concurrency protection in FetchForWriting(), but I mostly skipped over when and how that API will execute the projection logic to go from the raw events like ProviderJoined, ProviderReady, or ChartingStarted to the projected ProviderShift.

Any projection in Marten can be calculated or executed with three different “projection lifecycles”:

  1. Live — in this case, a projection is calculated on the fly by loading the raw events in memory and calculating the current state right then and there. In the absence of any other configuration, this is the default lifecycle for the ProviderShift per stream aggregation.
  2. Inline — a projection is updated at the time any events are appended by Marten and persisted by Marten as a document in the PostgreSQL database.
  3. Async — a projection is updated in a background process as events are captured by Marten across the system. The projected state is persisted as a Marten document to the underlying PostgreSQL database

The first two options give you strong consistency models where the projection will always reflect the current state of the events captured to the database. Live is probably a little more optimal in the case where you have many writes, but few reads, and you want to optimize the “write” side. Inline is optimal for cases where you have few writes, but many reads, and you want to optimize the “read” side (or need to query against the projected data rather than just load by id). The Async model gives you the ability to take the work of projecting events into the aggregated state out of both the “write” and “read” side of things. This might easily be advantageous for performance and very frequently necessary for ordering or concurrency concerns.
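For reference, opting the ProviderShift aggregation into a particular lifecycle happens inside the AddMarten() configuration. The sketch below picks Inline just as an example (connectionString is a placeholder, and double check the exact registration API against the Marten documentation for your version):

builder.Services.AddMarten(opts =>
{
    // "connectionString" is a placeholder
    opts.Connection(connectionString);

    // Build the ProviderShift aggregate Inline, meaning it's updated in the
    // same transaction that appends the events. Use SnapshotLifecycle.Async
    // to push that work to the background projection daemon instead
    opts.Projections.Snapshot<ProviderShift>(SnapshotLifecycle.Inline);
});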

In the case of the FetchForWriting() API, you will always have a strongly consistent view of the raw events because that API is happily wallpapering over the lifecycle for you. Live aggregation works as you’d expect, Inline aggregation works by just loading the expected document directly from the database, and Async aggregation is a hybrid model that starts from the last known persisted value for the aggregate and applies any missing events right on top of that (the async behavior was a big feature added in Marten 7).

By hiding the actual lifecycle behavior behind the FetchForWriting() signature, teams are able to experiment with different approaches to optimize their application without breaking existing code.

Summary

FetchForWriting() was built specifically to ease the usage of Marten within CQRS command handlers after seeing how much boilerplate code teams were having to use before with Marten. At this point, this is our strongly recommended approach for command handlers. Also note that this API is utilized within the Wolverine + Marten “aggregate handler workflow” usage that does even more to remove code ceremony from CQRS command handler code. To some degree, what is now Wolverine was purposely rebooted and saved from the scrap heap specifically because of that combination with Marten and the FetchForWriting API.

Personally, I’m opposed to any kind of IAggregateRepository or approach where the “write” model itself tracks the events that are applied or uncommitted. I’m trying hard to discourage folks using Marten away from this still somewhat popular old approach in favor of a more Functional Programming-ish approach.

FetchForWriting could be used as part of a homegrown “Decider” pattern usage if that’s something you prefer (I think the “decider” pattern in real life usage ends up devolving into brute force procedural code through massive switch statements personally).

The “telehealth” system I mentioned before was built in real life with a hand-rolled Node.js event sourcing implementation, but that experience has had plenty of influence over later Marten work including a feature that just went into Marten over the weekend for a JasperFx client to be able to emit “side effect” actions and messages during projection updates.

I was deeply unimpressed with the existing Node.js tooling for event sourcing at that time (~2020), but I would hope it’s much better now. Marten has absolutely grown in capability in the past couple years.

Wolverine’s New Message Batching

Hey, did you know that JasperFx Software offers both consulting services and support plans for the “Critter Stack” tools? The new feature shown in this post was done at the behest of a JasperFx support customer. And of course, we’re also more than happy to help you with any kind of .NET backend development:)

Wolverine‘s new 3.0.0-beta-1 release adds a long requested feature set for batching up message handling. What does that mean? Well, sometimes you might want to process a stream of incoming messages in batches rather than one at a time. This might be for performance reasons, or maybe there’s some kind of business logic that makes more sense to calculate for batches, or maybe you want a logical “debounce” in how your system responds to the incoming messages so you don’t update some resource on every single message received by your system.

To that end, Wolverine 3.0 introduces a new feature for batching up message handling within a Wolverine system. Let’s say that you have a message type in your system like this:

public record Item(string Name);

And for whatever reason, we need to process these messages in batches. To do that, we first need to have a message handler for an array of Item like so:

public static class ItemHandler
{
    public static void Handle(Item[] items)
    {
        // Handle this just like a normal message handler,
        // just that the message type is Item[]
    }
}

And yes, before you ask, so far Wolverine only understands an array of the batched message type as the input message for the batch handler.

With that in our system, now we need to tell Wolverine to group Item messages, and we do that with the following syntax:

theHost = await Host.CreateDefaultBuilder()
    .UseWolverine(opts =>
    {
        opts.BatchMessagesOf<Item>(batching =>
        {
            // Really the maximum batch size
            batching.BatchSize = 500;
            
            // You can alternatively override the local queue
            // for the batch publishing. 
            batching.LocalExecutionQueueName = "items";

            // We can tell Wolverine to wait longer for incoming
            // messages before kicking out a batch if there
            // are fewer waiting messages than the maximum
            // batch size
            batching.TriggerTime = 1.Seconds();
            
        })
            
            // The object returned here is the local queue configuration that
            // will handle the batched messages. This may be useful for fine
            // tuning the behavior of the batch processing
            .Sequential();
    }).StartAsync();

A particularly lazy and hopefully effective technique in OSS project documentation is to take code snippets directly out of test code, and that’s what you see above. Two birds with one stone. Sometimes that works out well.

And that’s that! Just to bring this a little more into focus, here’s an end to end test from the Wolverine codebase — which just for clarity, is using Wolverine’s built in test automation support for end to end testing:

[Fact]
public async Task send_end_to_end_with_batch()
{
    // Items to publish
    var item1 = new Item("one");
    var item2 = new Item("two");
    var item3 = new Item("three");
    var item4 = new Item("four");

    Func<IMessageContext, Task> publish = async c =>
    {
        // I'm publishing the 4 items in sequence
        await c.PublishAsync(item1);
        await c.PublishAsync(item2);
        await c.PublishAsync(item3);
        await c.PublishAsync(item4);
    };

    // This is the "act" part of the test
    var session = await theHost.TrackActivity()
        
        // Wolverine testing helper to "wait" until
        // the tracking receives a message of Item[]
        .WaitForMessageToBeReceivedAt<Item[]>(theHost)
        .ExecuteAndWaitAsync(publish);

    // The four Item messages should be processed as a single 
    // batch message
    var items = session.Executed.SingleMessage<Item[]>();

    items.Length.ShouldBe(4);
    items.ShouldContain(item1);
    items.ShouldContain(item2);
    items.ShouldContain(item3);
    items.ShouldContain(item4);
}

Alright, with all that being said, here are a few more facts about the batch messaging support:

  1. There is absolutely no need to create a specific message handler for the Item message, and in fact, you should not do so
  2. The message batching is able to group the message batches by tenant id if your Wolverine system uses multi-tenancy
  3. If you are using a durable inbox in your system, Wolverine is not marking the incoming envelopes as handled until the messages are successfully handled inside a batch message
  4. Likewise, if a batch message fails to the point where it triggers a move to the dead letter queue, each individual message that was part of that original batch is moved to the dead letter queue separately

Summary

Hey, that’s actually all I had to say about that! Wolverine 3.0 will hopefully go RC later this week or next, with the official release *knock on wood* happening before I leave for Swetugg and a visit in Copenhagen with a JasperFx client in a couple weeks.

Time is a Flat Circle

This conceptual idea is apparently known to philosophers as “Eternal return” or “eternal recurrence.” Funny story for me, I worked with a business analyst one time who was a former philosopher. To purposely antagonize him, I gave a super amateurish explanation of Descartes’s writings and how I thought they applied to engineering, right where I knew he could hear, just to torture him while I butchered the subject. Coincidentally, this project had a bi-weekly mandatory team morale meeting. Go figure.

I can’t find the source of this quote today, but I’ve frequently heard and repeatedly used the following phrase to other folks (maybe from here?):

Software is never finished, only abandoned

I predominantly work with long lived codebases for software tools used by other developers. A very common source of frustration for me is making a bug fix release to burn down the open issue list, only to have all new issues get raised in the next couple hours. To some degree, any complicated software project is naturally going to feel like a Sisyphean task if you think of time as a straight line to being “finished” with the project for good.

Maybe this is a special problem for development tools (and I’ve worked on enough long lived business systems to say that nope, this is a pretty common issue for any long lived codebase), but no matter how hard I and the rest of the Critter Stack community try, folks will continuously come out of the woodwork to:

  • Report previously unknown bugs
  • Stumble on unanticipated usages that aren’t well supported
  • Hit performance or concurrency problems that haven’t come up before
  • Simply make compelling suggestions about some new kind of use case for the tooling
  • Need to integrate your tool with some different sort of infrastructure or even a different runtime
  • Point out gaps in the documentation
  • Describe content in the documentation that isn’t clear enough, or flat out misstated
  • Express frustration about information they want to find in the documentation, but cannot — even if it is there, but just not in a way that made sense for the user to find

It’s admittedly exhausting sometimes trying to be “done” with long lived projects. I think my advice — not that I always live this myself — is to think of these projects as more of a cycle and a continuous process of slow, steady improvement rather than any kind of concrete project to be completed. For me especially, I need to constantly remind myself that technical documentation has to be constantly pruned and improved in the face of user feedback.

My other scattered pieces of advice for dealing with the ownership of long lived codebases are to:

  • Remember to celebrate the good things you’ve done once in a while instead of always being focused on what’s not good yet
  • Grant yourself some grace and not let it weigh on you just because there are some open bug reports or open questions at the moment
  • Give yourself permission to unplug from message boards, chat rooms, or social media when you just need some mental rest when off work or even when you just need to focus during a work day. I sometimes have to completely switch off Discord some days when I’m at my limit of incoming questions or problems
  • Have reasonable expectations for how fast you should be dealing with user issues, bugs, and feature requests — with the caveat for me that it’s a very different thing when those come in from paying clients or maybe even just from other contributors
  • And lastly, if nothing else, unplug immediately and walk away if you find yourself either being or wanting to be snappish or brusque or sarcastic or rude to people asking for help online. Again, be better than I am at this one:)

I’ve worked over the past year with a couple clients who were building greenfield systems with “Critter Stack” tools, and let me tell you, that’s been a blessing for my mental health. Whenever you get to do greenfield work, appreciate that time.

What I use for interacting with Git

A friend of mine asked me at lunch this week what I was using to interact or manipulate Git, and I think he might have been disappointed with my lack of sophistication. At this point, I’m not willing to cast aspersions on whatever folks use as long as they’re being effective in their own environment, and there are definitely folks with way more Git-fu than I have.

Offhand though, I tend to use:

  • JetBrains Rider for making commits while I’m working, just because it’s really easy when that’s the window you’re already in. A quick “CMD-K” in my keyboard shortcut setup, type up the message, and hit return if nothing looks out of place. Honestly, I do far more amend commits just because of how easy that is with a user interface, and that might have actually changed my workflow a little bit from what it was 5 years ago.
  • With a codebase that has a fast build, I might instead do it at the command line with a [run build] && git commit -a -m "commit message"
  • If I really need to look closer at an ongoing commit or look closely at the recent change history, I’ll use the GitHub Desktop user interface.
  • I pretty much only use GitHub Desktop for cherry picking or squashing commits — which I rarely do, but I will just happen to need here in the next hour
  • For everything else, including pulls, pushes, and creating or deleting branches, I just do the 4-finger swipe over to an open terminal window and do it directly in the command line. Yes, you can certainly open a terminal window directly in Rider, but I just don’t have muscle memory for that
  • Just to add yet another tool, I really like using VS Code for merge conflicts. For whatever reason, that feels the easiest to me

There you go, what I do here and there. Not particularly advanced, but I don’t feel like I have to spend much time at all with Git.

Why and How Marten is a Great Document Database

Just a reminder, JasperFx Software offers support contracts and consulting services to help you get the most out of the “Critter Stack” tools (Marten and Wolverine). If you’re building server side applications on .NET, the Critter Stack is the most feature rich tool set for Event Sourcing and Event Driven Architectures around. And as I hope to prove to you in this post, Marten is a great option as a document database too!

Marten as a project started as an ultimately successful attempt to replace my then company’s usage of an early commercial “document database” with the open source PostgreSQL database — but with a small, nascent event store functionality bolted onto the side. With the exception of LINQ provider related issues, most of my attention these days is focused on the event sourcing side of things with the document database features in Marten just being a perfect complement for event projections.

This week and last though, I’ve had cause to work with a different document database option and it served to remind me that hey, Marten has a very strong technical story as a document database option. With that being said, let me get on with lionizing Marten by starting with a quick start.

Let’s say that you are building a server side .NET application with some kind of customer data and you at least start by modeling that data like so:

public class Customer
{
    public Guid Id { get; set; }

    // We'll use this later for some "logic" about how incidents
    // can be automatically prioritized
    public Dictionary<IncidentCategory, IncidentPriority> Priorities { get; set; }
        = new();

    public string? Region { get; set; }

    public ContractDuration Duration { get; set; }
}

public record ContractDuration(DateOnly Start, DateOnly End);

public enum IncidentCategory
{
    Software,
    Hardware,
    Network,
    Database
}

public enum IncidentPriority
{
    Critical,
    High,
    Medium,
    Low
}

And once you have those types, you’d like to have that customer data saved to a database in a way that makes it easy to persist, query, and load that data with minimal development cost while still being as robust as need be. Assuming that you have access to a running instance of PostgreSQL (it’s very Docker friendly and I tend to use that as a development default), bring in Marten by first adding a reference to the “Marten” Nuget. Next, write the following code in a simple console application that also contains the C# code from above:

using Marten;
using Newtonsoft.Json;

// Bootstrap Marten itself with default behaviors
await using var store = DocumentStore
    .For("Host=localhost;Port=5432;Database=marten_testing;Username=postgres;password=postgres");

// Build a Customer object to save
var customer = new Customer
{
    Duration = new ContractDuration(new DateOnly(2023, 12, 1), new DateOnly(2024, 12, 1)),
    Region = "West Coast",
    Priorities = new Dictionary<IncidentCategory, IncidentPriority>
    {
        { IncidentCategory.Database, IncidentPriority.High }
    }
};

// IDocumentSession is Marten's unit of work 
await using var session = store.LightweightSession();
session.Store(customer);
await session.SaveChangesAsync();

// Marten assigned an identity for us on Store(), so 
// we'll use that to load another copy of what was 
// just saved
var customer2 = await session.LoadAsync<Customer>(customer.Id);

// Just making a pretty JSON printout
Console.WriteLine(JsonConvert.SerializeObject(customer2, Formatting.Indented));

And that’s that, we’ve got a working usage of Marten to save, then load Customer data to the underlying PostgreSQL database. Right off the bat I’d like to point out a couple things about the code samples above:

  • We didn’t have to do any kind of mapping from our Customer type to a database structure. Marten is using JSON serialization to persist the data to the database, and as long as the Customer type can be bi-directionally serialized to and from JSON, Marten is going to be able to persist and load the type.
  • We didn’t specify or do anything about the actual database structure. In its default “just get things done” settings, Marten is able to happily detect that the necessary database objects for Customer are missing in the database, and build those out for us on demand

So that’s the easiest possible quick start, but what about integrating Marten into a real .NET application? Assuming you have a reference to the Marten nuget package, it’s just an IServiceCollection.AddMarten() call as shown below from a sample web application:

builder.Services.AddMarten(opts =>
    {
        // You always have to tell Marten what the connection string to the underlying
        // PostgreSQL database is, but this is the only mandatory piece of 
        // configuration
        var connectionString = builder.Configuration.GetConnectionString("postgres");
        opts.Connection(connectionString);
    })
    // This is a mild performance optimization
    .UseLightweightSessions();

At this point in the .NET ecosystem, it’s more or less idiomatic to use an Add[Tool]() method to integrate tools with your application’s IHost, and Marten tries to play within the typical .NET rules here.

I think this idiom and the generic host builder tooling has been a huge boon to OSS tool development in the .NET space compared to the old wild, wild west days. I do wish it would stop changing from .NET version to version though.

So that’s all a bunch of simple stuff, so let’s dive into something that shows off how Marten — really PostgreSQL — has a much stronger transactional model than many document databases that only support eventual consistency:

public static async Task manipulate_customer_data(IDocumentSession session)
{
    var customer = new Customer
    {
        Name = "Acme",
        Region = "North America",
        Class = "first"
    };
    
    // Marten has "upsert", insert, and update semantics
    session.Insert(customer);
    
    // Partial updates to a range of Customer documents
    // by a LINQ filter
    session.Patch<Customer>(x => x.Region == "EMEA")
        .Set(x => x.Class, "First");

    // Both the above operations happen in one 
    // ACID transaction
    await session.SaveChangesAsync();

    // Because Marten is ACID compliant, this query would
    // immediately work as expected even though we made that 
    // broad patch up above and inserted a new document.
    var customers = await session.Query<Customer>()
        .Where(x => x.Class == "First")
        .Take(100)
        .ToListAsync();
}

That’s a completely contrived example, but the point is, because Marten is completely ACID-compliant, you can make a range of operations within transactional boundaries and not have to worry about eventual consistency issues in immediate queries that other document databases suffer from.

So what else does Marten do? Marten has a significantly richer built-in feature set than many other low level document databases: LINQ querying, partial document updates through its patching API, full ACID transactions, automatic schema management, and multi-tenancy in both its “conjoined” and database-per-tenant flavors.

And quite a bit more than that, including some test automation support I really need to better document:/

And on top of everything else, because Marten is really just a fancy library on top of PostgreSQL — the most widely used database engine in the world — Marten instantly comes with a wide array of solid cloud hosting options as well as being deployable to local infrastructure on premise. PostgreSQL is also very Docker-friendly, making it a great technical choice for local development.

What’s a Document Database?

If you’re not familiar with the term “document database,” it refers to a type of NoSQL database where data is almost inevitably stored as JSON. The database allows you to quickly marshal objects in code to the database, then query that data later right back into the same object structures. The huge benefit of document databases at development time is being able to code much more productively, because you just don’t have nearly as much friction as you do with any kind of object-relational mapping, whether with an ORM tool or with SQL and object mapping code written by hand.

Low Ceremony Sagas with Wolverine

Wolverine puts a very high emphasis on reducing code ceremony and tries really hard to keep itself out of your application code. Wolverine is also built with testability in mind. If you’d be interested in learning more about how Wolverine could simplify your existing application code or set you up with a solid foundation for sustainable productive development for new systems, JasperFx Software is happy to work with you!

Before I get into the nuts and bolts of Wolverine sagas, let me come right out and say that I think that compared to other .NET frameworks, the Wolverine implementation of sagas requires much less code ceremony and therefore results in code that’s easier to reason about. Wolverine also requires less configuration and explicit code to integrate your custom saga with Wolverine’s saga persistence. Lastly, Wolverine makes the development experience better by building in so much support for automatically configuring development environment resources like database schema objects or message broker objects. I do not believe that any other .NET tooling comes close to the developer experience that Wolverine and its “Critter Stack” buddy Marten can provide.

Let’s say that you have some kind of multi-step process in your application that might have some mix of:

  • Callouts to 3rd party services
  • Some logical steps that can be parallelized
  • Possibly some conditional workflow based on the results of some of the steps
  • A need to enforce “timeout” conditions if the workflow is taking too long — think maybe of some kind of service level agreement for your workflow

This kind of workflow might be a great opportunity to use Wolverine’s version of Sagas. Conceptually speaking, a “saga” in Wolverine is just a special message handler that needs to inherit from Wolverine’s Saga class and modify itself to track state between messages that impact the saga.

Below is a simple version from the documentation called Order:

public record StartOrder(string OrderId);

public record CompleteOrder(string Id);

public class Order : Saga
{
    // You do need this for the identity
    public string? Id { get; set; }

    // This method would be called when a StartOrder message arrives
    // to start a new Order
    public static (Order, OrderTimeout) Start(StartOrder order, ILogger<Order> logger)
    {
        logger.LogInformation("Got a new order with id {Id}", order.OrderId);

        // creating a timeout message for the saga
        return (new Order{Id = order.OrderId}, new OrderTimeout(order.OrderId));
    }

    // Apply the CompleteOrder to the saga
    public void Handle(CompleteOrder complete, ILogger logger)
    {
        logger.LogInformation("Completing order {Id}", complete.Id);

        // That's it, we're done. Delete the saga state after the message is done.
        MarkCompleted();
    }

    // Delete this order if it has not already been deleted to enforce a "timeout"
    // condition
    public void Handle(OrderTimeout timeout, ILogger<Order> logger)
    {
        logger.LogInformation("Applying timeout to order {Id}", timeout.Id);

        // That's it, we're done. Delete the saga state after the message is done.
        MarkCompleted();
    }

    public static void NotFound(CompleteOrder complete, ILogger logger)
    {
        logger.LogInformation("Tried to complete order {Id}, but it cannot be found", complete.Id);
    }
}

Order is really meant to just be a state machine where it modifies its own state in response to incoming messages and returns cascading messages (you could also use IMessageBus directly as a method argument if you prefer, but my advice is to use simple pure functions) that tell Wolverine what to do next in the multi-step process.

A new Order saga can be created by any old message handler by simply returning a type that inherits from the Saga type in Wolverine. Wolverine is going to automatically discover any public types inheriting from Saga and utilize any public instance methods following certain naming conventions (or static Create() methods) as message handlers that are assumed to modify the state of the saga objects. Wolverine itself is handling everything to do with loading and persisting the Order saga object between commands around the call to the message handler methods on the saga types.
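For instance, here’s a sketch of starting the Order saga from some other handler, with a hypothetical PlaceOrder message:

public record PlaceOrder(string OrderId);

public static class PlaceOrderHandler
{
    // Returning a new Saga-derived object from any handler tells Wolverine
    // to start and persist a brand new Order saga
    public static Order Handle(PlaceOrder command)
    {
        return new Order { Id = command.OrderId };
    }
}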

If you’ll notice the Handle(CompleteOrder) method above, the Order is calling MarkCompleted() on itself. That will tell Wolverine that the saga is now complete, and direct Wolverine to delete the current Order saga from the underlying persistence.

As for tracking the saga id between message calls, there are naming conventions about the messages that Wolverine can use to pluck the identity of the saga, but if you’re strictly exchanging messages between a Wolverine saga and other Wolverine message handlers, Wolverine will automatically track metadata about the active saga back and forth.
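As I understand those conventions, a message like this hypothetical one can identify its saga purely by property name, since Wolverine looks for an obvious identity member such as Id or OrderId on the incoming message:

// Hypothetical follow-up message for the Order saga. The "OrderId" member is
// what Wolverine would use by naming convention to find the right Order saga
public record ShipOrder(string OrderId);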

I’d also ask you to notice the OrderTimeout message that the Order saga returns as it starts. That message type is shown below:

// This message will always be scheduled to be delivered after
// a one minute delay because I guess we want our customers to be
// rushed? Goofy example code:)
public record OrderTimeout(string Id) : TimeoutMessage(1.Minutes());

Wolverine’s cascading message support allows you to return an outgoing message with a time delay — or a particular scheduled time or any other number of options — by just returning a message object. Admittedly this ties you into a little more of Wolverine, but the key takeaway I want you to notice here is that every handler method is a “pure function” with no service dependencies. Every bit of the state change and workflow logic can be tested with simple unit tests that merely work on the before and after state of the Order objects as well as the cascaded messages returned by the message handler functions. No mock objects, no fakes, no custom test harnesses, just simple unit tests. No other saga implementation in the .NET ecosystem can do that for you anywhere nearly as cleanly.
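To make that concrete, a unit test against the Start() method above can be as simple as this sketch (xUnit and Shouldly as in the earlier test, plus NullLogger from Microsoft.Extensions.Logging.Abstractions):

[Fact]
public void starting_an_order_returns_the_new_saga_and_a_timeout_message()
{
    // Pure function in, pure data out. No Wolverine runtime, no mocks
    var (order, timeout) = Order.Start(new StartOrder("order-1"), NullLogger<Order>.Instance);

    order.Id.ShouldBe("order-1");
    timeout.Id.ShouldBe("order-1");
}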

So far I’ve only focused on the logical state machine part of sagas, so let’s jump to persistence. Wolverine has long had a simplistic saga storage mechanism with its integration with Marten, and that’s still one of the easiest and most powerful options. You can also use EF Core for saga persistence, but ick, that means having to use EF Core.

Wolverine 3.0 added a new lightweight saga persistence option for either Sql Server or PostgreSQL (without Marten or EF Core) that just stands up a little table for just a single Saga type and uses JSON serialization to persist the saga. Here’s an example:

using var host = await Host.CreateDefaultBuilder()
    .UseWolverine(opts =>
    {
        // This isn't actually mandatory, but you'll
        // need to do it just to make Wolverine set up
        // the table storage as part of the resource setup.
        // Otherwise, Wolverine is quite capable of standing
        // up the tables as necessary at runtime if they
        // are missing in its default configuration
        opts.AddSagaType<RedSaga>("red");
        opts.AddSagaType(typeof(BlueSaga),"blue");
       
        
        // This part is absolutely necessary just to have the
        // normal transactional inbox/outbox support and the new
        // default, lightweight saga persistence
        opts.PersistMessagesWithSqlServer(Servers.SqlServerConnectionString, "color_sagas");
        opts.Services.AddResourceSetupOnStartup();
    }).StartAsync();

Just as with the integration with Marten, Wolverine’s lightweight saga implementation is able to build the necessary database table storage on the fly at runtime if it’s missing. The “critter stack” philosophy is to optimize the all important “time to first pull request” metric — meaning that you can get a Wolverine application up fast on your local development box because it’s able to take care of quite a bit of environment setup for you.

Lastly, Wolverine 3.0 is adding optimistic concurrency checks for the Marten saga storage and the new lightweight saga persistence. That’s been an important missing piece of the Wolverine saga story.

Just for some comparison, check out some other saga implementations in .NET: