Important Patterns Lurking in Your Persistence Tooling

I was foolish enough to glance at my speaker feedback from my talk at KCDC this summer where I gave an updated version of my Concurrency and Parallelism talk. Out of a combination of time constraints and the desire to shamelessly promote the Critter Stack tools, I mostly used sample problems and solutions from my own work with Marten and Wolverine. One red card evaluation complained that the talk was useless to him (and I have no doubt it was a “him”) because it didn’t focus on mainstream .NET tools. I’m going to do the same thing here, mostly out of time constraints, but I would hope you would take away some understanding of the conceptual patterns I’m discussing here rather than being hung up on the exact choice of tooling.

Continuing my new series about the usefulness of old design patterns that I started with The Lowly Strategy Pattern is Still Useful. Today I want to talk about a handful of design patterns commonly implemented inside of mainstream persistence tooling that you probably already use today. In all cases, the original terminology I’m using here comes from Martin Fowler’s seminal Patterns of Enterprise Application Architecture book from the early 00’s. I’ll be using Marten (of course) for all the examples, but these patterns all exist within Entity Framework Core as well. Again, anytime I write about design pattern usage, I urge you to pay more attention to the concepts, roles, and responsibilities within code without getting too hung up on implementation details.

I seriously doubt that most of you will ever purposely sit down and write your own implementations of these patterns, but it’s always helpful to understand how the tools and technical layers directly underneath your code actually work.

The first two patterns are important for performance and sometimes even just scoping within complex system operations. In a subsequent post I think I’d like to tackle patterns for data consistency and managing concurrency.

Quick Note on Marten Mechanics

If you’re not already familiar with it, Marten is a library in .NET that turns PostgreSQL into a full featured document database and event store. When you integrate Marten into a typical .NET system, you will probably use this idiomatic approach to add Marten:

// This is the absolute, simplest way to integrate Marten into your
// .NET application with Marten's default configuration
builder.Services.AddMarten(options =>
{
    // Establish the connection string to your Marten database
   options.Connection(builder.Configuration.GetConnectionString("Marten")!);
});

Using the AddMarten() method adds a service called IDocumentSession from Marten into your application’s IoC container with a Scoped lifetime, meaning that a single IDocumentSession should be created, shared, and used within a single HTTP request or within the processing of a single message within a messaging framework. That lifetime is important to understand the Marten (and similar EF Core DbContext usage) usage of the Identity Map and Unit of Work patterns explained below.

If you are more familiar with EF Core, just translate Marten’s IDocumentSession to EF Core’s DbContext in this post if that helps.

Identity Map

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

Martin Fowler

In enterprise software systems I’ve frequently hit code that tries to implement a single, logical transaction across multiple internal functions or services in a large call stack. This situation usually arises over time out of sheer complexity of the business rules and the build up of “special” condition handling over time for whatever cockamamie logic that additional customers require. Unfortunately, this arrangement can frequently lead to duplicated database queries for the same reference data that is needed by completely different areas of code within the single, logical transaction — which can easily lead to very poor performance in your system.

In my experience, chattiness (making many network round trips) to the database has been maybe the single most common source of poor system performance. Followed closely by chattiness between user interface clients and the backend services. The identity map mechanics shown here can be an easy way to mitigate at least the first problem.

This is where the “Unit of Work” pattern can help. Let’s say that in your code you frequently need to load information about the User entities within your system. Here’s a little demonstration of what the identity map actually does for you in terms of scoped caching:

using var store = DocumentStore.For("some connection string");

// Chiefs great Tamba Hali!
var user = new User { FirstName = "Tamba", LastName = "Hali" };

// Marten assigns the identity for the User as it 
// persists the new document
await store.BulkInsertAsync(new[] { user });

// Open a Marten session with the identity map
// functionality
await using var session = store.IdentitySession();

// First request for this document, so this call would
// hit the database
var user2 = await session.LoadAsync<User>(user.Id);

// This time it would be loaded from the identity map
// in memory
var user3 = await session.LoadAsync<User>(user.Id);

// Just to prove that
user2.ShouldBeSameAs(user3);

// And also...
var user4 = await session.Query<User>()
    .FirstAsync(x => x.FirstName == "Tamba");

// Yup, Marten has to actually query the database, but still
// finds the referenced document from the identity map when
// it resolves the results from the raw PostgreSQL data
user4.ShouldBeSameAs(user2);

With the “Identity Map” functionality, the Marten session is happily able to avoid making repeated requests to the database for the same information across multiple attempts to access that same data.

In bigger call stacks where there’s a real need to potentially access the same data at different times, the Identity Map is a great advantage. However, in smaller usages the Identity Map is nothing but extra overhead as your persistence tooling has to track the data it loads in some kind of in memory key/value storage. Especially in cases where you’re needing to load quite a bit of data at one time, the identity map can be a significant drag both in terms of memory usage and in performance.

Not to worry though, at least for Marten we can purposely create “lightweight” sessions that leave out the identity map tracking altogether like this:

using var store = DocumentStore.For("some connection string");

// Create a lightweight session without any
// identity map overhead
using var session = store.LightweightSession();

or globally within our application like so:

builder.Services.AddMarten(options =>
{
    // Establish the connection string to your Marten database
    options.Connection(builder.Configuration.GetConnectionString("Marten")!);
})
    // Tell Marten to use lightweight sessions
    // for the default IoC registration of
    // IDocumentSession
    .UseLightweightSessions();

You’re unlikely to ever purposely build your own implementation of the Identity Map pattern, but it’s in many common persistence tools and it’s still quite valuable to understand that behavior and also when you would rather bypass that usage to be more efficient.

Unit of Work

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

Martin Fowler

Data consistency is a pretty big deal in most enterprise systems, and that makes us developers have to care about our transactional boundaries to ensure that related changes succeed or fail in one operation. This is where the Unit of Work pattern implemented by tools like Marten comes into play.

For Marten, the IDocumentSession is the unit of work as shown below:

public static async Task DoWork(IDocumentSession session)
{
    DoWorkPart1(session);
    DoWorkPart2(session);
    DoWorkPart3(session);

    // Make all the queued up persistence
    // operations in one database transactions
    await session.SaveChangesAsync();
}

public static void DoWorkPart1(IDocumentSession session)
{
    session.Store(new User{FirstName = "Travis", LastName = "Kelce"});
    session.DeleteWhere<User>(x => x.Department == "Wide Receiver");
}

public static void DoWorkPart2(IDocumentSession session)
{
    session.Store(new User{FirstName = "Patrick", LastName = "Mahomes"});
    session.Events.StartStream<Game>(new KickedOff());
}

public static void DoWorkPart3(IDocumentSession session)
{
    session.Store(new User{FirstName = "Chris", LastName = "Jones"});
}

When IDocumentSession.SaveChangesAsync() is called, it executes a database command for all the new documents stored and the deletion operation queued up across the different helper methods all at one time. Marten is letting us worry about business logic and expressing what database changes should be made while Marten handles the actual transaction boundaries for us when it writes to the databse.

A couple more things to note about the code above:

  • If you don’t need to read any data first, Marten doesn’t even open a database connection until you call the SaveChangesAsync() method. That’s important to know because database connections are expensive within your system, and you want them to be as short lived as possible. In a manual implementation without a unit of work tracker of some sort, you often open a connection and start a transaction that you then pass around within your code — which leads to holding onto connections longer and risking potential destabilization of your system through connection exhaustion. And don’t blow that off, because that happens quite frequently when we developers are less than perfect with our database connection hygiene.
  • As I said earlier, Marten registers the IDocumentSession in your IoC container as Scoped, meaning that the same session would be shared by all objects created by the same scoped container within an AspNetCore request or inside message handling frameworks like Wolverine, NServiceBus, or MassTransit. That scoping is important to make the transactional boundaries through the session’s unit of work tracking actually work across different functions within the code.
  • I’m not sure about other tools, but Marten also batches the various database commands into a single request to the database when SaveChangesAsync() is called. We’ve consistently found that to be a very important performance optimization.

Next time…

I’d like to dive into patterns for managing data concurrency.

Leave a comment