What I use for interacting with Git

A friend of mine asked me at lunch this week what I was using to interact with or manipulate Git, and I think he might have been disappointed with my lack of sophistication. At this point, I’m not willing to cast aspersions on whatever folks use as long as they’re being effective in their own environment, and there are definitely folks with way more Git-fu than I have.

Offhand though, I tend to use:

  • JetBrains Rider for making commits while I’m working, just because it’s so easy when that’s the window you’re already in. A quick “CMD-K” in my keyboard shortcut setup, type up the message, and hit return if nothing looks out of place. Honestly, I do far more amend commits just because of how easy that is with a user interface, and that might have actually changed my workflow a little bit from what it was 5 years ago.
  • With a codebase that has a fast build, I might instead do it at the command line with a [run build] && git commit -a -m "commit message"
  • If I really need to look more closely at an in-progress commit or at the recent change history, I’ll use the GitHub Desktop user interface.
  • I pretty much only use GitHub Desktop for cherry picking or squashing commits — which I rarely do, but will just happen to need here in the next hour.
  • For everything else, including pulls, pushes, and creating or deleting branches, I just do the 4-finger swipe over to an open terminal window and run it directly at the command line. Yes, you can certainly open a terminal window directly in Rider, but I just don’t have the muscle memory for that.
  • Just to add yet another tool, I really like using VS Code for merge conflicts. For whatever reason, that feels the easiest to me.

There you go, what I do here and there. Not particularly advanced, but I don’t feel like I have to spend much time at all with Git.

Why and How Marten is a Great Document Database

Just a reminder, JasperFx Software offers support contracts and consulting services to help you get the most out of the “Critter Stack” tools (Marten and Wolverine). If you’re building server side applications on .NET, the Critter Stack is the most feature rich tool set for Event Sourcing and Event Driven Architectures around. And as I hope to prove to you in this post, Marten is a great option as a document database too!

Marten as a project started as an ultimately successful attempt to replace my then company’s usage of an early commercial “document database” with the open source PostgreSQL database — but with a small, nascent event store bolted onto the side. With the exception of LINQ provider related issues, most of my attention these days is focused on the event sourcing side of things, with the document database features in Marten being a perfect complement for event projections.

This week and last though, I’ve had cause to work with a different document database option and it served to remind me that hey, Marten has a very strong technical story as a document database option. With that being said, let me get on with lionizing Marten by starting with a quick start.

Let’s say that you are building a server side .NET application with some kind of customer data and you at least start by modeling that data like so:

public class Customer
{
    public Guid Id { get; set; }

    // We'll use this later for some "logic" about how incidents
    // can be automatically prioritized
    public Dictionary<IncidentCategory, IncidentPriority> Priorities { get; set; }
        = new();

    public string? Region { get; set; }

    public ContractDuration Duration { get; set; }
}

public record ContractDuration(DateOnly Start, DateOnly End);

public enum IncidentCategory
{
    Software,
    Hardware,
    Network,
    Database
}

public enum IncidentPriority
{
    Critical,
    High,
    Medium,
    Low
}

And once you have those types, you’d like to have that customer data saved to a database in a way that makes it easy to persist, query, and load that data with minimal development cost while still being as robust as need be. Assuming that you have access to a running instance of PostgreSQL (it’s very Docker friendly and I tend to use that as a development default), bring in Marten by first adding a reference to the “Marten” NuGet package. Next, write the following code in a simple console application that also contains the C# code from above:

using Marten;
using Newtonsoft.Json;

// Bootstrap Marten itself with default behaviors
await using var store = DocumentStore
    .For("Host=localhost;Port=5432;Database=marten_testing;Username=postgres;password=postgres");

// Build a Customer object to save
var customer = new Customer
{
    Duration = new ContractDuration(new DateOnly(2023, 12, 1), new DateOnly(2024, 12, 1)),
    Region = "West Coast",
    Priorities = new Dictionary<IncidentCategory, IncidentPriority>
    {
        { IncidentCategory.Database, IncidentPriority.High }
    }
};

// IDocumentSession is Marten's unit of work 
await using var session = store.LightweightSession();
session.Store(customer);
await session.SaveChangesAsync();

// Marten assigned an identity for us on Store(), so 
// we'll use that to load another copy of what was 
// just saved
var customer2 = await session.LoadAsync<Customer>(customer.Id);

// Just making a pretty JSON printout
Console.WriteLine(JsonConvert.SerializeObject(customer2, Formatting.Indented));

And that’s that, we’ve got a working usage of Marten to save, then load Customer data to the underlying PostgreSQL database. Right off the bat I’d like to point out a couple things about the code samples above:

  • We didn’t have to do any kind of mapping from our Customer type to a database structure. Marten is using JSON serialization to persist the data to the database, and as long as the Customer type can be bi-directionally serialized to and from JSON, Marten is going to be able to persist and load the type.
  • We didn’t specify or do anything about the actual database structure. In its default “just get things done” settings, Marten is able to happily detect that the necessary database objects for Customer are missing in the database, and build those out for us on demand.
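
Just to round out the quick start, querying is also done with LINQ directly against the document type. Here’s a minimal sketch that could be appended to the console application above; the Region filter is simply reusing the sample data we just saved:

// Query for Customer documents with LINQ. Marten translates this
// expression into SQL against the underlying JSONB storage
var westCoastCustomers = await session.Query<Customer>()
    .Where(x => x.Region == "West Coast")
    .ToListAsync();

Console.WriteLine($"Found {westCoastCustomers.Count} West Coast customer(s)");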

So that’s the easiest possible quick start, but what about integrating Marten into a real .NET application? Assuming you have a reference to the Marten nuget package, it’s just an IServiceCollection.AddMarten() call as shown below from a sample web application:

builder.Services.AddMarten(opts =>
    {
        // You always have to tell Marten what the connection string to the underlying
        // PostgreSQL database is, but this is the only mandatory piece of 
        // configuration
        var connectionString = builder.Configuration.GetConnectionString("postgres");
        opts.Connection(connectionString);
    })
    // This is a mild performance optimization
    .UseLightweightSessions();

At this point in the .NET ecosystem, it’s more or less idiomatic to use an Add[Tool]() method to integrate tools with your application’s IHost, and Marten tries to play within the typical .NET rules here.

I think this idiom and the generic host builder tooling has been a huge boon to OSS tool development in the .NET space compared to the old wild, wild west days. I do wish it would stop changing from .NET version to version though.
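
Once AddMarten() is in place, Marten registers its IDocumentStore, IDocumentSession, and IQuerySession services in the container, so application code can just take a dependency on them. As a minimal sketch (the route and the reuse of the Customer document here are purely illustrative), a minimal API endpoint in that same sample web application might look like this:

app.MapGet("/customers/{id:guid}", async (Guid id, IQuerySession session) =>
{
    // IQuerySession is Marten's read-only session, resolved from the container
    var customer = await session.LoadAsync<Customer>(id);
    return customer is null ? Results.NotFound() : Results.Ok(customer);
});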

That’s all a bunch of simple stuff, so let’s dive into something that shows off how Marten — really PostgreSQL — has a much stronger transactional model than many document databases that only support eventual consistency:

public static async Task manipulate_customer_data(IDocumentSession session)
{
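    // Note: assume for this contrived sample that Customer also has
    // Name and Class properties beyond the earlier definition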
    var customer = new Customer
    {
        Name = "Acme",
        Region = "North America",
        Class = "first"
    };
    
    // Marten has "upsert", insert, and update semantics
    session.Insert(customer);
    
    // Partial updates to a range of Customer documents
    // by a LINQ filter
    session.Patch<Customer>(x => x.Region == "EMEA")
        .Set(x => x.Class, "First");

    // Both the above operations happen in one 
    // ACID transaction
    await session.SaveChangesAsync();

    // Because Marten is ACID compliant, this query would
    // immediately work as expected even though we made that 
    // broad patch up above and inserted a new document.
    var customers = await session.Query<Customer>()
        .Where(x => x.Class == "First")
        .Take(100)
        .ToListAsync();
}

That’s a completely contrived example, but the point is, because Marten is completely ACID-compliant, you can make a range of operations within transactional boundaries and not have to worry about eventual consistency issues in immediate queries that other document databases suffer from.

So what else does Marten do? Here’s a bit of a rundown, because Marten has a significantly richer built-in feature set than many other low-level document databases:

And quite a bit more than that, including some test automation support I really need to better document:/

And on top of everything else, because Marten is really just a fancy library on top of PostgreSQL — the most widely used database engine in the world — Marten instantly comes with a wide array of solid cloud hosting options, as well as being deployable to on-premises infrastructure. PostgreSQL is also very Docker-friendly, making it a great technical choice for local development.

What’s a Document Database?

If you’re not familiar with the term “document database,” it refers to a type of NoSQL database where data is almost always stored as JSON, and the database lets you quickly marshal objects in code to the database, then later query that data right back into the same object structures. The huge benefit of document databases at development time is being able to code much more productively, because you just don’t have nearly as much friction as you do with any kind of object-relational mapping, whether with an ORM tool or by writing SQL and object mapping code by hand.

Low Ceremony Sagas with Wolverine

Wolverine puts a very high emphasis on reducing code ceremony and tries really hard to keep itself out of your application code. Wolverine is also built with testability in mind. If you’d be interested in learning more about how Wolverine could simplify your existing application code or set you up with a solid foundation for sustainable productive development for new systems, JasperFx Software is happy to work with you!

Before I get into the nuts and bolts of Wolverine sagas, let me come right out and say that I think that, compared to other .NET frameworks, the Wolverine implementation of sagas requires much less code ceremony and therefore produces code that is easier to reason about. Wolverine also requires less configuration and explicit code to integrate your custom saga with Wolverine’s saga persistence. Lastly, Wolverine makes the development experience better by building in so much support for automatically configuring development environment resources like database schema objects or message broker objects. I do not believe that any other .NET tooling comes close to the developer experience that Wolverine and its “Critter Stack” buddy Marten can provide.

Let’s say that you have some kind of multi-step process in your application that might have some mix of:

  • Callouts to 3rd party services
  • Some logical steps that can be parallelized
  • Possibly some conditional workflow based on the results of some of the steps
  • A need to enforce “timeout” conditions if the workflow is taking too long — think maybe of some kind of service level agreement for your workflow

This kind of workflow might be a great opportunity to use Wolverine’s version of Sagas. Conceptually speaking, a “saga” in Wolverine is just a special message handler that needs to inherit from Wolverine’s Saga class and modify itself to track state between messages that impact the saga.

Below is a simple version from the documentation called Order:

public record StartOrder(string OrderId);

public record CompleteOrder(string Id);

public class Order : Saga
{
    // You do need this for the identity
    public string? Id { get; set; }

    // This method would be called when a StartOrder message arrives
    // to start a new Order
    public static (Order, OrderTimeout) Start(StartOrder order, ILogger<Order> logger)
    {
        logger.LogInformation("Got a new order with id {Id}", order.OrderId);

        // creating a timeout message for the saga
        return (new Order{Id = order.OrderId}, new OrderTimeout(order.OrderId));
    }

    // Apply the CompleteOrder to the saga
    public void Handle(CompleteOrder complete, ILogger logger)
    {
        logger.LogInformation("Completing order {Id}", complete.Id);

        // That's it, we're done. Delete the saga state after the message is done.
        MarkCompleted();
    }

    // Delete this order if it has not already been deleted to enforce a "timeout"
    // condition
    public void Handle(OrderTimeout timeout, ILogger<Order> logger)
    {
        logger.LogInformation("Applying timeout to order {Id}", timeout.Id);

        // That's it, we're done. Delete the saga state after the message is done.
        MarkCompleted();
    }

    public static void NotFound(CompleteOrder complete, ILogger logger)
    {
        logger.LogInformation("Tried to complete order {Id}, but it cannot be found", complete.Id);
    }
}

Order is really meant to just be a state machine where it modifies its own state in response to incoming messages and returns cascading messages (you could also use IMessageBus directly as a method argument if you prefer, but my advice is to use simple pure functions) that tell Wolverine what to do next in the multi-step process.

A new Order saga can be created by any old message handler by simply returning a type that inherits from the Saga type in Wolverine. Wolverine is going to automatically discover any public types inheriting from Saga and utilize any public instance methods following certain naming conventions (or static Create() methods) as message handlers that are assumed to modify the state of the saga objects. Wolverine itself is handling everything to do with loading and persisting the Order saga object between commands around the call to the message handler methods on the saga types.

In the Handle(CompleteOrder) method above, notice that the Order calls MarkCompleted() on itself. That tells Wolverine that the saga is now complete, and directs Wolverine to delete the current Order saga from the underlying persistence.

As for tracking the saga id between message calls, there are naming conventions about the messages that Wolverine can use to pluck the identity of the saga, but if you’re strictly exchanging messages between a Wolverine saga and other Wolverine message handlers, Wolverine will automatically track metadata about the active saga back and forth.
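
As a quick, hypothetical sketch of what I mean by those conventions (this message type isn’t part of the documentation sample), a message aimed at the Order saga just needs to expose the saga identity through a property Wolverine recognizes, like Id or OrderId:

// As I understand the convention, Wolverine can pluck the saga identity
// from a property named "Id" or "[saga type name]Id" -- so "OrderId"
// here would be matched to the Order saga
public record CancelOrder(string OrderId);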

I’d also ask you to notice the OrderTimeout message that the Order saga returns as it starts. That message type is shown below:

// This message will always be scheduled to be delivered after
// a one minute delay because I guess we want our customers to be
// rushed? Goofy example code:)
public record OrderTimeout(string Id) : TimeoutMessage(1.Minutes());

Wolverine’s cascading message support allows you to return an outgoing message with a time delay — or a particular scheduled time, or any number of other options — by just returning a message object. Admittedly this ties you into a little more of Wolverine, but the key takeaway I want you to notice here is that every handler method is a “pure function” with no service dependencies. Every bit of the state change and workflow logic can be tested with simple unit tests that merely work on the before and after state of the Order objects as well as the cascaded messages returned by the message handler functions. No mock objects, no fakes, no custom test harnesses, just simple unit tests. No other saga implementation in the .NET ecosystem can do that for you anywhere near as cleanly.
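
To make that claim a little more concrete, here’s a minimal sketch of the kind of test I mean. I’m assuming xUnit and the built-in NullLogger here, and also assuming that Wolverine’s Saga base type exposes an IsCompleted() accessor for the completion flag:

using Microsoft.Extensions.Logging.Abstractions;
using Xunit;

public class OrderSagaTests
{
    [Fact]
    public void starting_an_order_creates_saga_state_and_a_timeout()
    {
        // Pure function: pass in the message, assert on the returned
        // saga state and the cascaded timeout message
        var (order, timeout) = Order.Start(new StartOrder("111"), NullLogger<Order>.Instance);

        Assert.Equal("111", order.Id);
        Assert.Equal("111", timeout.Id);
    }

    [Fact]
    public void completing_an_order_marks_the_saga_as_completed()
    {
        var order = new Order { Id = "111" };
        order.Handle(new CompleteOrder("111"), NullLogger.Instance);

        // Assumes the Saga base type exposes IsCompleted()
        Assert.True(order.IsCompleted());
    }
}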

So far I’ve only focused on the logical state machine part of sagas, so let’s jump to persistence. Wolverine has long had a simplistic saga storage mechanism with its integration with Marten, and that’s still one of the easiest and most powerful options. You can also use EF Core for saga persistence, but ick, that means having to use EF Core.

Wolverine 3.0 added a new lightweight saga persistence option for either Sql Server or PostgreSQL (without Marten or EF Core) that stands up a little table for a single Saga type and uses JSON serialization to persist the saga. Here’s an example:

using var host = await Host.CreateDefaultBuilder()
    .UseWolverine(opts =>
    {
        // This isn't actually mandatory, but you'll need to do it
        // to make Wolverine set up the table storage as part of the
        // resource setup. Otherwise, Wolverine is quite capable of
        // standing up the tables as necessary at runtime if they
        // are missing in its default configuration
        opts.AddSagaType<RedSaga>("red");
        opts.AddSagaType(typeof(BlueSaga), "blue");

        // This part is absolutely necessary to have the
        // normal transactional inbox/outbox support and the new
        // default, lightweight saga persistence
        opts.PersistMessagesWithSqlServer(Servers.SqlServerConnectionString, "color_sagas");
        opts.Services.AddResourceSetupOnStartup();
    }).StartAsync();

Just as with the integration with Marten, Wolverine’s lightweight saga implementation is able to build the necessary database table storage on the fly at runtime if it’s missing. The “critter stack” philosophy is to optimize the all important “time to first pull request” metric — meaning that you can get a Wolverine application up fast on your local development box because it’s able to take care of quite a bit of environment setup for you.

Lastly, Wolverine 3.0 is adding optimistic concurrency checks for the Marten saga storage and the new lightweight saga persistence. That’s been an important missing piece of the Wolverine saga story.

Just for some comparison, check out some other saga implementations in .NET:

Wolverine is taking a leap forward with the first 3.0 Alpha Release

Why should you care about this Wolverine tool anyway? I’d say that if you’re building just about any kind of server side .NET application, Wolverine will do more than any other existing server side .NET framework in the mediator, background processing, HTTP, or asynchronous messaging space to simplify your code, maximize the testability of your system code, and do so while still helping you write robust, well performing systems.

The first Wolverine 3.0-alpha-1 release just landed on NuGet for anybody who is either waiting on 3.0 features or willing to try out the new bits. Just to rewind, here were the theoretical plans for Wolverine 3.0, with the Aspire integration having fallen off a bit.

See the Wolverine migration guide for more details.

There are some additive changes to address some previous limitations of the Wolverine tooling I’ll get to below, but the two big ticket items in 3.0 are:

  • Wolverine is now completely decoupled from Lamar and happily able to run with the built in ServiceProvider. Before, Wolverine was quietly replacing your IoC container with Lamar because it heavily relied on Lamar internal behavior for its runtime code generation. 3.0 ended that particular limitation. Not everyone cared, but the folks who did care were particularly loud about their unhappiness about that, and Lamar is probably heading into the sunset anyway in the future. I felt like this was a very important limitation of Wolverine to address in this release. It’s also a precursor to further usage of .NET Aspire and enabling Wolverine to play nicely with just about any common recipe for bootstrapping .NET applications (Blazor, WPF, Orleans, you name it).
  • The leader election subsystem in Wolverine was pretty close to 100% rewritten to a much simpler and, so far as the internal testing shows, far more reliable and performant mechanism. This subsystem has been way too problematic in real usage, and I’m beyond relieved that there are some serious improvements coming for this.

As for smaller things so far, some other highlights are:

  • Being able to utilize multiple handlers for the same message type in the same application, and even determine which handlers execute for particular external listeners
  • The stateful saga support in Wolverine got some necessary optimistic concurrency protection at the behest of a JasperFx Software client
  • New “lightweight” saga options to utilize either PostgreSQL or Sql Server as JSON storage mechanisms so you don’t have to suffer the pain of EF Core mapping just to persist sagas if you aren’t using Marten
  • The Rabbit MQ integration is using the new version of the Rabbit MQ client that is finally async all the way through to prevent deadlock issues. There is also some significant improvement to the Rabbit MQ transport for header exchanges and more control over messaging conventions
  • There is a new NuGet package of compliance tests for Wolverine to hopefully speed up the construction of new saga persistence providers, messaging transports, or message storage options that I hope will unlock new functionality in the coming months

I’m actually hopeful that the final 3.0 release goes out early next week. I’m not sure how much of the remaining work will make it in, but I’m wanting to tackle:

  • Message batching, because that comes up fairly often
  • A round of enhancements to the EF Core integration with Wolverine to try to increase Wolverine utilization for folks who don’t use Marten for some bizarre reason

Update on Wolverine 3.0 and Aspire

I had earlier said that “full” support for .NET Aspire would be a key part of the Wolverine 3.0 plans. After kicking the tires more on .NET Aspire and seeing where user priorities and our own road map are, I’m going to back off that statement quite a bit. Here’s what’s definitely in and actually ready to go for 3.0 as it pertains to Wolverine + Aspire:

  • Wolverine was decoupled from Lamar such that it can run with the built in ServiceProvider. We’ll add an add-in adapter to still use Lamar as well so folks don’t have to switch out IoC tools for the new Wolverine (Lamar is much more forgiving and supports a lot of use cases that ServiceProvider does not, so you may not want to switch)
  • That last point was important because the changes to the internals also made it possible for Wolverine to use any flavor of application bootstrapping like Host.CreateApplicationBuilder(), whereas before Wolverine was limited to IHostBuilder (there were internal reasons for that). Some of the .NET Aspire client libraries depend on different versions of the application builders, so Wolverine needed to adapt. And folks wanted that anyway, so there we go (see the sketch just below)
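
As a quick sketch of what that enables (assuming the 3.0 bits expose the same UseWolverine() extension against the newer HostApplicationBuilder, which is my reading of the alpha), bootstrapping could now look something like this:

var builder = Host.CreateApplicationBuilder();

// With 3.0, Wolverine no longer swaps in Lamar, so this runs
// happily on the built in ServiceProvider
builder.UseWolverine(opts =>
{
    opts.ServiceName = "ping-service";
});

using var host = builder.Build();
await host.RunAsync();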

Now, as to what else Wolverine will support, it’s perfectly possible to use Aspire to launch Wolverine systems and Wolverine (and Marten) can happily export their Open Telemetry tracing and metrics to the .NET Aspire dashboard at runtime. You can see an example of that in my earlier post Marten, Metrics, and Open Telemetry Support.

Now on to the trickier parts. One of the things that .NET Aspire does is act as a replacement for docker compose for infrastructural concerns like SQL Server, PostgreSQL, Rabbit MQ, or Kafka and acts as a global configuration element for other infrastructure things like Azure Service Bus or AWS SQS. Somebody might have to correct me, but more or less, Aspire is launching the various applications and poking through environment variables for the configuration data that Aspire itself is defining and controlling (like PostgreSQL connection strings for example). To make that information easier to consume, the Aspire team and community have built a bunch of client adapter libraries like Aspire.RabbitMQ.Client or Aspire.Npgsql that are meant to hook your application to the resources configured by Aspire by adding service registrations to the application’s underlying IoC container.

After some research earlier this week as I get to work toward the Wolverine 3.0 release, I think that:

  • Aspire.Npgsql can already be used as-is with Marten at least, and with the Wolverine + Marten integration (see the sketch after this list). A little more work could enable Aspire.Npgsql to be used with PostgreSQL storage within Wolverine for use by itself or with EF Core. There’s no need for us to take a direct dependency on this library though
  • Aspire.RabbitMQ.Client creates a version conflict with the RabbitMQ.Client library for us right now, so that’s out. I’m leery of taking on the potential diamond dependency issue anyway, so we’ll probably never take a dependency on it
  • Aspire.Microsoft.Data.SqlClient registers a scoped dependency for SqlConnection, but doesn’t expose the connection information any other way. This would require quite a few changes to the Wolverine internals that I don’t think would pay off. We won’t use this, again partially because of the fear of diamond dependencies
  • Aspire.Azure.Messaging.ServiceBus just isn’t usable. It precludes several options for authentication to Azure Service Bus, and using it would knock out Wolverine’s ability to set up or tear down resources on the fly — which I think is a competitive advantage of Wolverine over other alternatives, so I’m not enthusiastic about this one either
  • Aspire.Confluent.Kafka doesn’t fit Wolverine at all where we want to have the broker connection information upfront, and where Wolverine is completely responsible for setting up consumers and producers
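
For the Aspire.Npgsql case, here’s a minimal sketch of what I mean, leaning on Aspire’s AddNpgsqlDataSource() registration and Marten’s existing ability to use an NpgsqlDataSource from the container. The “postgres” resource name is just a placeholder for whatever your Aspire app host defines:

var builder = WebApplication.CreateBuilder(args);

// Aspire.Npgsql registers an NpgsqlDataSource built from the connection
// information that the Aspire app host controls
builder.AddNpgsqlDataSource("postgres");

builder.Services.AddMarten(opts =>
    {
        // Other Marten configuration...
    })
    // Tell Marten to use the NpgsqlDataSource registered above
    // rather than a raw connection string
    .UseNpgsqlDataSource();

var app = builder.Build();
app.Run();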

All told though, I don’t think the rest of the Aspire.* client libraries are usable out of the box. I guess I’m not sure if these libraries were even meant to be used or are just example code that folks like me should use to build in more specific support. In all cases, I’m voting to hold off for now on any new, direct Aspire support until someone — hopefully a contributor or JasperFx Software client — directly asks for it.

Critter Stack Roadmap for the Rest of 2024

It’s been a little bit since I’ve written any kind of update on the unofficial “Critter Stack” roadmap, with the last update in February. A ton of new, important strategic features have been added since then, especially to Marten, with plenty of expansion of Wolverine to boot. Before jumping into what’s to come, let me indulge in a bit of a retrospective on the new features and improvements that have been delivered in 2024 so far.

2024 just so far!

At this point I feel like we’ve crossed off the vast majority of the features I thought we needed to add to Marten this year to be able to stand Marten up against basically any other event store infrastructure tooling on the whole damn planet. What that also means is that I think Marten development probably slows down to nothing but bug fixes and community contributions as folks run into things. There are still some features in the backlog that I might personally work on, but that will be in the course of some ongoing and potential JasperFx client work.

That being said, let’s talk about the rest of the year!

The Roadmap for the Back Half of 2024

Obviously, this roadmap is just a snapshot in time, and client needs, community requests, and who knows what changes from Microsoft or other related tools could easily shift these priorities. All that being said, this is the current vision of the next big steps from the Critter Stack core team and me.

  1. Wolverine 3.0 is an ongoing effort. I’m hopeful it can be out in the next couple weeks
  2. RavenDb integration with Wolverine. This is some client sponsored work that I’m hoping will set Wolverine up for easier integration with other database engines in the near future
  3. “Critter Watch” — an ongoing effort to build out a management and monitoring console application for any combination of Marten, Wolverine, and future critters. This will be a paid product. We’ve already had a huge amount of feedback from Marten & Wolverine users, and I’m personally eager to get this moving in the 3rd quarter
  4. Marten 8.0 and Wolverine 4.0 — the goal here is mostly a rearrangement of dependencies underneath both Marten & Wolverine to eliminate duplication and spin out a lot of the functionality around projections and the async daemon. This will also be a significant effort to spin off some new helper libraries for the “Critter Stack” to enable the next bullet point
  5. “Ermine” — a port of Marten’s event store capabilities and a subset of its document database capabilities to SQL Server. My thought is that this will share a ton of guts with Marten. I’m voting that Ermine will have direct integration with Wolverine from the very beginning as well for subscriptions and middleware similar to the existing Wolverine.Marten integration
  6. If Ermine goes halfway well, I’d love to attempt a CosmosDb and maybe a DynamoDb backed event store in 2025

As usual, that list is a guess and unlikely to ever play out exactly that way. All the same though, there’s my hopes and dreams for the next 6 months or so.

Did I miss something you were hoping for? Does any of that concern you? Let me and the rest of the Critter Stack community know either here or anywhere in our Discord room!

Making Marten Faster Through Table Partitioning

There’s been a definite theme lately about increasing the performance and scalability of Marten, as evident (I hope) in my post last week describing new optimization options in Marten 7.25. Today I was able to push a follow-up feature that missed that release and allows Marten users to utilize PostgreSQL table partitioning behind the scenes for document storage (7.25 added a specific utilization of table partitioning for the event store). The goal is that, in selected scenarios, this lets PostgreSQL mostly work with far smaller tables than it would otherwise, and hence perform better in your system.

Think of these common usages of Marten:

  1. You’re using soft deletes in Marten against a document type, and the vast majority of the time Marten is putting a default filter in for you to only query for “not deleted” documents
  2. You are aggressively using the Marten feature to mark event streams as archived when whatever process they model is complete. In this case, Marten is usually querying against the event table using a value of is_archived = false
  3. You’re using “conjoined” multi-tenancy within a single Marten database, and most of the time your system is naturally querying for data from only one tenant at a time
  4. Maybe you have a table where you’re frequently querying against a certain date property or querying for documents by a range of expected values

In all of those cases, it would be more performant to opt into PostgreSQL table partitioning where PostgreSQL is separating the storage for a single, logical table into separate “partition” tables. Again, in all of those cases above we can enable PostgreSQL + Marten to largely be querying against a much smaller table partition than the entire table would be — and querying against smaller database tables can be hugely more performant than querying against bigger tables.

The Marten community has been kicking around the idea of utilizing table partitioning for years (since 2017 by my sleuthing last week through the backlog), but it always got kicked down the road because of the perceived challenges in supporting automatic database migrations for partitions the same way we do today in Marten for every other database schema object (and in Wolverine too for that matter).

Thanks to an engagement with a JasperFx customer who has some pretty extreme scalability needs, I was able to spend the time last week to break through the change management challenges with table partitioning, and finally add table partitioning support for Marten.

As for what’s possible, let’s say that you want to create table partitioning for a certain very large table in your system for a particular document type. Here’s the new option for 7.26:


var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");

    // Set up table partitioning for the User document type
    opts.Schema.For<User>()
        .PartitionOn(x => x.Age, x =>
        {
            x.ByRange()
                .AddRange("young", 0, 20)
                .AddRange("twenties", 21, 29)
                .AddRange("thirties", 31, 39);
        });

    // Or use pg_partman to manage partitioning outside of Marten
    opts.Schema.For<User>()
        .PartitionOn(x => x.Age, x =>
        {
            x.ByExternallyManagedRangePartitions();

            // or instead with list

            x.ByExternallyManagedListPartitions();
        });

    // Or use PostgreSQL HASH partitioning and split the users over multiple tables
    opts.Schema.For<User>()
        .PartitionOn(x => x.UserName, x =>
        {
            x.ByHash("one", "two", "three");
        });

    opts.Schema.For<Issue>()
        .PartitionOn(x => x.Status, x =>
        {
            // There is a default partition for anything that doesn't fall into
            // these specific values
            x.ByList()
                .AddPartition("completed", "Completed")
                .AddPartition("new", "New");
        });

});

To use the “hot/cold” storage on soft-deleted documents, you have this new option:

var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");

    // Opt into partitioning for one document type
    opts.Schema.For<User>().SoftDeletedWithPartitioning();

    // Opt into partitioning and an index for one document type
    opts.Schema.For<User>().SoftDeletedWithPartitioningAndIndex();

    // Opt into partitioning for all soft-deleted documents
    opts.Policies.AllDocumentsSoftDeletedWithPartitioning();
});

And to partition “conjoined” tenancy documents by their tenant id, you have this feature:

storeOptions.Policies.AllDocumentsAreMultiTenantedWithPartitioning(x =>
{
    // Selectively by LIST partitioning
    x.ByList()
        // Adding explicit table partitions for specific tenant ids
        .AddPartition("t1", "T1")
        .AddPartition("t2", "T2");

    // OR Use LIST partitioning, but allow the partition tables to be
    // controlled outside of Marten by something like pg_partman
    // https://github.com/pgpartman/pg_partman
    x.ByExternallyManagedListPartitions();

    // OR Just spread out the tenant data by tenant id through
    // HASH partitioning
    // This is using three different partitions with the supplied
    // suffix names
    x.ByHash("one", "two", "three");

    // OR Partition by tenant id based on ranges of tenant id values
    x.ByRange()
        .AddRange("north_america", "na", "nazzzzzzzzzz")
        .AddRange("asia", "a", "azzzzzzzz");

    // OR use RANGE partitioning with the actual partitions managed
    // externally
    x.ByExternallyManagedRangePartitions();
});

Summary

Your mileage will vary of course depending on how big your database is and how you really query the database, but at least in some common cases, the Marten community is pretty excited for the potential of table partitioning to improve Marten performance and scalability.

Marten 7.25 is Better, Faster, Stronger

Just a reminder, JasperFx Software offers support contracts and consulting services to help you get the most out of the “Critter Stack” tools (Marten and Wolverine). If you’re building server side applications on .NET, the Critter Stack is the most feature rich tool set for Event Sourcing and Event Driven Architectures around.

The theme of the last couple months for the Marten community and me has been a lot of focus on improving Marten’s event sourcing feature set to be able to reliably handle very large data loads. With that being said, Marten 7.25 was released today with a huge amount of improvements around its performance, scalability, and reliability under very heavy loads (we’re talking about databases with hundreds of millions of events).

Before I get into the details, there’s a lot of thanks and credit to go around:

  • Core team member JT made several changes to reduce the amount of object allocations that Marten does at runtime in SQL generation — and basically every operation it does involves SQL generation
  • Ben Edwards contributed several ideas, important feedback, and some optimization pull requests toward this release
  • Babu made some improvements to our CI pipeline that made it a lot easier for me to troubleshoot the work I was doing
  • a-shtifanov-laya did some important load testing harness work that helped quite a bit to validate the work in this release
  • Urbancsik Gergely did a lot of performance and load testing with Marten that helped tremendously
  • And I’ll be giving some personal thanks to a couple JasperFx clients who enabled me to spend so much time on this effort

And now, the highlights for event store performance, scalability, and reliability improvements — most of which are “opt in” configuration items so as to not disturb existing users:

  • The new “Quick Append” option is completely usable and appears from testing to be about 2X as fast as the V4-V7 “Rich” appending process. More than that, opting into the quick append mechanism appears to eliminate the event “skipping” problem with asynchronous projections or event subscriptions that some people have experienced under very heavy loads. Lastly, I originally planned this work because I think it will alleviate issues that some people run into with concurrent operations trying to append events to the same event streams
  • Marten can create a Hot/Cold Storage mechanism around its event store by leveraging PostgreSQL native table partitioning. There’s work on the user’s part to mark event streams as archived for this to matter, but this is potentially a huge win for Marten scalability. A later Marten release will add partitioning support to Marten document tables as well
  • There are several optimizations inside of even the classic, “rich” event appending that reduce the number of network round trips happening at runtime — and that’s a good thing because network round trips are evil!
  • There’s some further optimization to the FetchForWriting() API, which I heavily recommend for command handler usage, as documented here.

Outside of the event store improvements, Marten also got a new “Specification” alternative called “query plans” for reusable query logic for when Marten’s compiled query feature won’t work. The goal with this feature is to help a JasperFx client migrate off of Clean Architecture style repository wrapper abstractions in a way that doesn’t cause code duplication while also setting them up to utilize Marten’s batch query feature for much more performant code.

Summary

I’m still digging out from a very good family vacation, but man, getting this stuff out feels really good. The Marten community is very vibrant right now, with a lot of community engagement that’s driving the tool’s capabilities into much more serious system territory. The “hot/cold storage” feature that just went in has been in the Marten backlog since 2017, and I’m thrilled to finally see that make it in.

Network Round Trips are Evil

As Houston gets drenched by Hurricane Beryl as I write this, I’m reminded of a formative set of continuing education courses I took when I was living in Houston in the late 90’s and plotting my formal move into software development. Whatever we learned about VB6 in those MSDN classes is long, long since obsolete, but one pithy saying from one of our instructors (who went on to become a Marten user and contributor!) stuck with me all these years later:

Network round trips are evil

John Cavnar-Johnson

His point then, and my point now quite frequently working with JasperFx Software clients, is that round trips between browsers and backend web servers or between application servers and the database need to be treated as expensive operations, and some level of request, query, or command batching is often a very valuable optimization in systems design.

Consider my family’s current kitchen predicament as diagrammed above. The very expensive, original refrigerator from our 20 year old house finally gave up the ghost, and we’ve had it completely removed while we wait on a different one to be delivered. Fortunately, we have a second refrigerator in the garage. When cooking now though, it’s suddenly a lot more time consuming to go to the refrigerator for an ingredient since I can’t just turn around and grab something when the kitchen refrigerator was just a step away. Now that we have to walk across the house from the kitchen to the garage to get anything from the other refrigerator, it’s becoming very helpful to try to grab as many things as you can at one time so you’re not constantly running back and forth.

While this issue certainly arises from user interfaces or browser applications making a series of little requests to a backing server, I’m going to focus on database access for the rest of this post. Using a simple example from Marten usage, consider this code where I’m just creating five little documents and persisting them to a database:


    public static async Task storing_many(IDocumentSession session)
    {
        var user1 = new User { FirstName = "Magic", LastName = "Johnson" };
        var user2 = new User { FirstName = "James", LastName = "Worthy" };
        var user3 = new User { FirstName = "Michael", LastName = "Cooper" };
        var user4 = new User { FirstName = "Mychal", LastName = "Thompson" };
        var user5 = new User { FirstName = "Kurt", LastName = "Rambis" };

        session.Store(user1);
        session.Store(user2);
        session.Store(user3);
        session.Store(user4);
        session.Store(user5);

        // Marten will *only* make a single database request here that
        // bundles up "upsert" statements for all five users added above
        await session.SaveChangesAsync();
    }

In the code above, Marten is only issuing a single batched command to the backing database that performs all five “upsert” operations in one network round trip. We were very performance conscious in the very early days of Marten development and did quite a bit of experimentation with different options for JSON serialization or how exactly to write SQL that queried inside of JSONB or even table structure. Consistently and unsurprisingly though, the biggest jump in performance was when we introduced command batching to reduce the number of network round trips between code using Marten and the backing PostgreSQL database. That early performance testing also led us to early investments in Marten’s batch querying support and the Include() query functionality that allows Marten users to fetch related data with fewer network hops to the database.
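
Just to illustrate the batch querying I mentioned, here’s a minimal sketch of Marten’s batched query API, which lets you register several queries and execute them in a single round trip (the Invoice and Customer document types are just placeholders borrowed from other samples in this post):

public static async Task load_related_data(IQuerySession session, Guid invoiceId, Guid customerId)
{
    // Start a batched query -- nothing is sent to the database yet
    var batch = session.CreateBatchQuery();

    // Register a couple of queries against the batch
    var invoiceTask = batch.Load<Invoice>(invoiceId);
    var customerTask = batch.Load<Customer>(customerId);

    // One network round trip executes everything registered above
    await batch.Execute();

    // Now the individual results are available
    var invoice = await invoiceTask;
    var customer = await customerTask;
}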

Just based on my own experience, here are two trends I see about interacting with databases in real world systems:

  1. There’s a huge performance gain to be made by finding ways to batch database queries
  2. It’s very common for systems in the real world to suffer from performance problems that can at least partially be traced to unnecessary chattiness between an application and its backing database(s)

At a guess, I think the underlying reasons for the chattiness problem are something like:

  • Developers who just aren’t aware of the expense of network round trips or aren’t aware of how to utilize any kind of database query batching to reduce the problems
  • Wrapper abstractions around the raw database persistence tooling that hides more powerful APIs that might alleviate the chattiness problem
  • Wrapper abstractions that encourage a pattern of only loading data by keys one row/object/document at a time
  • Wrapper abstractions around the raw database persistence that discourage developers from learning more about the underlying persistence tooling they’re using. Don’t underestimate how common that problem is. And I’ve absolutely been guilty of causing that issue as a younger “architect” in the past who created those abstractions.
  • Complicated architectural layering that can make it quite difficult to easily reason about the cause and effect between system inputs and the database queries that those inputs spawn. Big call stacks of a controller calling a mediator tool that calls one service that calls other services that call different repository abstractions that all make database queries is a common source of chattiness because it’s hard to even see where all the chattiness is coming from by reading the code.

As you might know if you’ve stumbled across any of my writings or conference talks from the last couple years, I’m not a big fan of typical Clean/Onion Architecture approaches. I think these approaches introduce a lot of ceremony code into the mix that I think causes more harm overall than whatever benefits they bring.

Here’s an example that’s somewhat contrived, but also quite typical in terms of the performance issues I do see in real life systems. Let’s say you’ve got a command handler for a ShipOrder command that will need to access data for both a related Invoice and Order entity that could look something like this:

public class ShipOrderHandler
{
    private readonly IInvoiceRepository _invoiceRepository;
    private readonly IOrderRepository _orderRepository;
    private readonly IUnitOfWork _unitOfWork;

    public ShipOrderHandler(
        IInvoiceRepository invoiceRepository,
        IOrderRepository orderRepository,
        IUnitOfWork unitOfWork)
    {
        _invoiceRepository = invoiceRepository;
        _orderRepository = orderRepository;
        _unitOfWork = unitOfWork;
    }

    public async Task Handle(ShipOrder command)
    {
        // Making one round trip to get an Invoice
        var invoice = await _invoiceRepository.LoadAsync(command.InvoiceId);

        // Then a second round trip using the results of the first pass
        // to get follow up data
        var order = await _orderRepository.LoadAsync(invoice.OrderId);

        // do some logic that changes the state of one or both of these entities

        // Commit the transaction that spans the two entities
        await _unitOfWork.SaveChangesAsync();
    }
}

The code is pretty simple in this case, but we’re still making more database round trips than we absolutely have to — and real enterprise systems can get much, much bigger than my little contrived example and incur a lot more overhead because of the chattiness problem that the repository abstractions naturally let in.

Let’s try this functionality again, but this time just depending on the raw persistence tooling (Marten’s IDocumentSession) and using a Wolverine-style command handler to boot to further reduce the code noise:

public static class ShipOrderHandler
{
    // We're still keeping some separation of concerns to separate the infrastructure from the business
    // logic, but Wolverine lets us do that just through separate functions instead of having to use
    // all the limiting repository abstractions
    public static async Task<(Order, Invoice)> LoadAsync(IDocumentSession session, ShipOrder command)
    {
        // This is important (I think:)), the admittedly complicated
        // Marten usage below fetches both the invoice and its related order in a 
        // single network round trip to the database and can lead to substantially
        // better system performance
        Order order = null;
        var invoice = await session
            .Query<Invoice>()
            .Include<Order>(i => i.OrderId, o => order = o)
            .Where(x => x.Id == command.InvoiceId)
            .FirstOrDefaultAsync();

        return (order, invoice);
    }
    
    public static void Handle(ShipOrder command, Order order, Invoice invoice)
    {
        // do some logic that changes the state of one or both of these entities
        // I'm assuming that Wolverine is handling the transaction boundaries through
        // middleware here
    }
}

In the second code sample, we’ve been able to go right at the Marten tooling to take advantage of its more advanced functionality to batch up data fetching for better performance that wasn’t easily possible when we were putting repository abstractions between our command handler and the underlying persistence tooling. Moreover, we can more easily reason about the database operations happening as a result of our command, which can be somewhat obfuscated by more layers and more code separation, as is common in Onion/Clean/Ports and Adapters style approaches.

It’s not just repository abstractions that cause problems; sometimes it’s handy little extension methods that can be the source of chattiness. Here’s a pair of helper extension methods around Marten’s event store functionality that help you start a new event stream in a single line of code or append a single event to an existing event stream in a single line of code:

public static class DocumentSessionExtensions
{
    public static Task Add<T>(this IDocumentSession documentSession, Guid id, object @event, CancellationToken ct)
        where T : class
    {
        documentSession.Events.StartStream<T>(id, @event);
        return documentSession.SaveChangesAsync(token: ct);
    }

    public static Task GetAndUpdate<T>(
        this IDocumentSession documentSession,
        Guid id,
        int version,
        
        // If we're being finicky about performance here, these kinds of inline
        // lambdas are NOT cheap at runtime and I'm recommending against
        // continuation passing style APIs in application hot paths for
        // my clients
        Func<T, object> handle,
        CancellationToken ct
    ) where T : class =>
        documentSession.Events.WriteToAggregate<T>(id, version, stream =>
            stream.AppendOne(handle(stream.Aggregate)), ct);
}

Fine, right? These potentially make your code cleaner and simpler, but of course, they’re also potentially harmful. Here’s an example using these two extension methods that’s similar to some code I saw in the wild last week:

public static class Handler
{
    public static async Task Handle(Command command, IDocumentSession session, CancellationToken token)
    {
        var id = CombGuidIdGeneration.NewGuid();
        
        // One round trip
        await session.Add<Aggregate>(id, new FirstEvent(), token);

        if (command.SomeCondition)
        {
            // This actually makes a pair of round trips, one to fetch the current state
            // of the Aggregate compiled from the first event appended above, then
            // a second to append the SecondEvent
            await session.GetAndUpdate<Aggregate>(id, 1, _ => new SecondEvent(), token);
        }
    }
}

I got involved with this code in reaction to some load testing that was resulting in disappointing results. When I was pulled in, I saw the extra round trips that snuck in because of the usage of the convenience extension methods they had been using, and suggested a change to something like this (but with Wolverine’s aggregate handler workflow that simplified the code more than this):

public static class Handler
{
    public static async Task Handle(Command command, IDocumentSession session, CancellationToken token)
    {
        var events = determineEvents(command).ToArray();
        
        var id = CombGuidIdGeneration.NewGuid();
        session.Events.StartStream<Aggregate>(id, events);

        await session.SaveChangesAsync(token);
    }

    // This was isolated so you can easily unit test the business
    // logic that "decides" what events to append
    public static IEnumerable<object> determineEvents(Command command)
    {
        yield return new FirstEvent();
        if (command.SomeCondition)
        {
            yield return new SecondEvent();
        }
    }
}

The code above cut down the number of network round trips to the database and greatly improved the results of the load testing.

Summary

If performance is a concern in your system (it’s not always), you probably need to be cognizant of how chatty your application is in its communication and interaction with the backing database, or any other remote system or infrastructure that your system interacts with at runtime.

Personally, I think that higher ceremony code structures make it much more likely to incur issues with database chattiness especially by first obfuscating your code so you don’t even easily recognize where there’s chattiness, then second by wrapping simplifying abstractions around your database persistence tooling that eliminate the usage of more advanced functionality for query batching.

And of course, both Wolverine and Marten put a heavy emphasis on reducing code ceremony and generally on code noise in general because I personally think that’s very valuable to help teams succeed over time with software systems in the wild. My theory of the case is that even at the cost of a little bit of “magic”, simply reducing the amount of code you have to wade through in existing systems will make those systems easier to maintain and troubleshoot over time.

And on that note, I’m basically on vacation for the next week, and you can address your complaints about my harsh criticism of Clean/Onion Architectures to the ether:-)

Off Topic: 90’s Country Music Rewind

I’m a huge fan of alt country or Americana music, definitely appreciate some classic country like Johnny Cash, Lefty Frizzell or Buck Owens, and I live in Austin so it’s mandatory to be into Willie Nelson, Robert Earl Keen, Hayes Carll, and the late, great Billy Joe Shaver (maybe my favorite musician of all time). Mainstream country though? That’s a totally different animal. I like *some* newer country artists like Maren Morris or Kacey Musgraves (i.e. not really typical country at all), but I was only really into mainstream country during its 90’s boom when I was in or right out of college.

Just for fun and out of curiosity, I’ve been on a couple year mission to reevaluate what out of the country music of that time is still worth listening to and what was “Mighty Morphin Power Rangers” level cheesy. I’m going hard into Comic Book Guy mode here as I make up some categories and lists:

Albums in rotation for me

I have plenty of playlists with 90’s country, but what albums are still worthwhile to me for a full play? I only came up with a few:

  • This Time by Dwight Yoakam. Such an awesome, timeless album. It’s the “Sharon Stone dumped me and I’m sad” album.
  • What a Crying Shame by the Mavericks. One of my all time favorite bands, and I even proposed to my wife right before seeing a Raul Malo concert at Gruene Hall. Trampoline from ’98 is my overall favorite Mavericks album, but I’d say they’d moved long past country by that point anyway.
  • Killin’ Time by Clint Black. Never got into anything else he ever did though
  • Brand New Man by Brooks & Dunn. Mostly about good memories, and I love Neon Moon
  • A Lot About Livin’ (And a Little ‘Bout Love) by Alan Jackson. It’s just a fun album.
  • Weirdly enough, I still like Thinkin’ Problem by David Ball. Still holds up for me even though it wasn’t any kind of big hit
  • Born to Fly by Sara Evans. That might have been the very last mainstream country CD I ever purchased. Think it was in the 2000’s, but still including it here!

What surprisingly holds up for me

I remember liking him at the time, and I will still happily pull out Joe Diffie’s (RIP, and an early COVID casualty:( ) greatest hits and play that. The humor still works, and I think he had a genuinely great voice. I’ll see your John Deere Green, but give me Prop Me Up By the Jukebox When I Die.

Mark Chesnutt. I liked him at the time, remember him always being in the background on the radio, but I don’t think I appreciated how good he was until I tried out his Greatest Hits collection for this post. Same kind of humor (Bubba Shot the Jukebox) as Joe Diffie, but his ballads hold up for me too. Awesome voice too.

Sammy Kershaw. I’ll still pull out his greatest hits once in awhile. I loved him at the time of course

Um, what about the women?

This post has been a bit of a sausage fest so far, and that’s not fair. Let’s talk about the women too! I think, just like today, the female artists somehow hold up better overall.

  • Her biggest stuff came later, but I was a Sara Evans fan from the start and I’ll still play her music sometimes
  • Jo Dee Messina was awfully good in the late 90’s, and Heads Carolina, Tails California still merits cranking up the volume if it ever comes on the radio
  • Faith Hill was cheesy to me when she first got started, but I like her later, admittedly poppy stuff.
  • Shania Twain. I absolutely remember why 20-something guys like me were into her, a couple songs were fun, but man, that stuff is cheesy
  • I still like Suzy Bogguss
  • I don’t mind Trisha Yearwood or Mary Chapin Carpenter, but I think they were products of their time and they sound really dated to me now
  • The Chicks were and are the real thing. I liked their first couple albums, but Home from ’02 is their best in my opinion. Doesn’t hurt that there was a strong Austin influence on that album from the song writers
  • Pam Tillis had a couple fun songs
  • Patty Loveless had some fun stuff
  • SheDaisy was and is a guilty pleasure

Songs I still Love

Guilty pleasure or not, I still like these songs and will play them on purpose when my wife and kids aren’t around:

If I’m in absolutely the right mood once in a great while…

  • Little Texas
  • Travis Tritt even though he’s a nutjob Trumper in real life now
  • Tracy Lawrence once in awhile

Cheeseball City

Music I might have liked at the time, but that’s awfully cloying now

What about Garth Brooks or George Strait?

Just like Alan Jackson, the two biggest country guys of the 90’s could easily swerve from “hey, I really like that” to “zomg, that’s so cheesy I can’t believe anybody would ever listen to that on purpose.”

For Garth Brooks, give me Friends in Low Places & Shameless, and keep The Dance or Unanswered Prayers on the sideline. For George Strait, I think I’d throw out most of his 90’s music, but keep his earlier stuff like Amarillo by Morning or Baby Blue as a guilty pleasure.

Not counting Americana, but if I was…

For the sake of this post, I’m only going to consider mainstream artists and not bands that I think of as primarily being Americana or Alt Country. That being said, I think the Alt Country music of that era absolutely holds up:

  • Every single album that Shaver put out in his 90’s renaissance was fantastic, but I’m calling out Unshaven: Live at Smith’s Olde Bar as one of my all time favorite albums of any era or genre. Check it out, but make sure you play it loud. I cannot overstate how good those guys were live in the 90’s before Eddie Shaver passed away.
  • Joe Ely put out some great albums in the 90’s too, with Letter to Laredo being my favorite of his
  • Robert Earl Keen was prolific, and I’ll toss up No. 2 Live Dinner as my favorite of that time, but I would accept A Bigger Piece of Sky, Gringo Honeymoon, or Walking Distance as very strong contenders. With Feelin’ Good Again being one of my favorite songs of all time
  • Charlie Robison put out Life of the Party in the late 90’s
  • Bruce Robison’s (Charlie’s younger brother) first couple albums, and I’ll go with Wrapped in this list
  • Kelly Willis (Bruce’s wife) had several good albums, and I’ll pick What I Deserve here
  • Guy Clark released Dublin Blues in ’95

Artists that hold up for me

  • Brooks and Dunn
  • Dwight Yoakam
  • The Mavericks are one of my all time favorite bands, and they’ve long since transcended country, but they started as a very good traditional country band
  • Alan Jackson is across the whole spectrum between genuinely great stuff (Chasing that Neon Rainbow) and oh my gosh, that’s absurdly cheesy and I’m embarrassed for people that like that (Small Town Country Man)
  • Sawyer Brown, but some of their stuff is too maudlin
  • I have a soft spot for Sara Evans partially since she’s from Missouri, but also,
  • Faith Hill’s later music when she admittedly got a little poppier