Before I talk about the batch querying feature set in Marten, let’s take a little detour through a common approach to persistence in .Net architectures that commonly causes the exact problem that Marten’s batch querying seeks to solve.
I’ve been in several online debates lately about the wisdom or applicability of granular repository abstractions over underlying persistence infrastructure like EF Core or Marten, something like the sample below:
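Something roughly like this sketch, for instance (the member names here are just the ones implied by the handler sample further down, not any particular library’s interface):

public interface IRepository<T>
{
    // Load one entity by its id
    Task<T> Load(Guid id);

    // Hand back a queryable for ad hoc Linq queries
    // (the ToListAsync() call in the handler below comes from
    // the underlying provider's Linq extensions)
    IQueryable<T> Query();
}

public interface IUnitOfWork
{
    // Commit all pending changes in one transaction
    Task Commit();
}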
That’s a pretty common approach, and I’m sure it’s working out for some people, at least in simpler CRUD-centric applications. Unfortunately, that reliance on fine-grained repositories breaks down badly in more complicated systems where a single logical operation may need to span multiple entity types. Not coincidentally, I have frequently seen this kind of fine-grained abstraction directly lead to performance problems in the systems I’ve helped with over the past 6-8 years, after their original construction.
For an example, let’s say that we have a message handler that needs to access and modify data from three different entity types in one logical transaction. Using the fine-grained repository strategy, we’d have something like this:
public class SomeMessage
{
    public Guid UserId { get; set; }
    public Guid OrderId { get; set; }
    public Guid AccountId { get; set; }
}

public class Handler
{
    private readonly IUnitOfWork _unitOfWork;
    private readonly IRepository<Account> _accounts;
    private readonly IRepository<User> _users;
    private readonly IRepository<Order> _orders;

    public Handler(
        IUnitOfWork unitOfWork,
        IRepository<Account> accounts,
        IRepository<User> users,
        IRepository<Order> orders)
    {
        _unitOfWork = unitOfWork;
        _accounts = accounts;
        _users = users;
        _orders = orders;
    }

    public async Task Handle(SomeMessage message)
    {
        // The potential performance problem is right here.
        // Multiple round trips to the database
        var user = await _users.Load(message.UserId);
        var account = await _accounts.Load(message.AccountId);
        var order = await _orders.Load(message.OrderId);

        var otherOrders = await _orders.Query()
            .Where(x => x.Amount > 100)
            .ToListAsync();

        // Carry out rules and whatnot

        await _unitOfWork.Commit();
    }
}
So here’s the problem with the code up above as I see it:
You’re having to inject a separate repository dependency for each entity type, and that adds code ceremony and noise.
The code is making repeated round trips to the database server every time it needs more data. This is a contrived example, and it’s only 4 trips, but in real systems this could easily be many more. To make this perfectly clear, one of the most pernicious sources of slow code is chattiness (frequent network round trips) between the application layer and the backing database.
Fortunately, Marten has a facility called batch querying that we can use to fetch multiple data queries at one time, and even start processing against the earlier results while the later results are still being read. To use that, we’ve got to ditch the “one size fits all, least common denominator” repository abstraction and use the raw Marten IDocumentSession service as shown in this version below:
public class MartenHandler
{
    private readonly IDocumentSession _session;

    public MartenHandler(IDocumentSession session)
    {
        _session = session;
    }

    public async Task Handle(SomeMessage message)
    {
        // Not gonna lie, this is more code than the first alternative
        var batch = _session.CreateBatchQuery();

        var userLookup = batch.Load<User>(message.UserId);
        var accountLookup = batch.Load<Account>(message.AccountId);
        var orderLookup = batch.Load<Order>(message.OrderId);

        var otherOrdersLookup = batch.Query<Order>().Where(x => x.Amount > 100).ToList();

        await batch.Execute();

        // We can immediately start using the data from earlier
        // queries in memory while the later queries are still processing
        // in the background for a little bit of parallelization
        var user = await userLookup;
        var account = await accountLookup;
        var order = await orderLookup;
        var otherOrders = await otherOrdersLookup;

        // Carry out rules and whatnot

        // Commit any outstanding changes with Marten
        await _session.SaveChangesAsync();
    }
}
The code above creates a single, batched query for the four queries this handler needs, meaning that Marten is making a single round trip to the database for the four SELECT statements. As an improvement in the Marten V4 release, the results coming back from Postgresql are processed in a background Task, meaning that in the code above we can start working with the initial Account, User, and Order data while Marten is still building out the last Order results (remember that Marten has to deserialize JSON data to build out your documents, and that can be non-trivial for large documents).
I think these are the takeaways for the before and after code here:
Network round trips are expensive and chattiness can be a performance bottleneck, but batch querying approaches like Marten’s can help a great deal.
Putting your persistence tooling behind least common denominator abstractions like the IRepository<T> approach shown above eliminates the ability to use the advanced features of your actual persistence tooling. That’s a serious drawback, because it disallows exactly the features that let you create high performance solutions, and this isn’t specific to using Marten as your backing persistence tooling.
Writing highly performant code can easily mean writing more code, as you saw above with the batch querying. The point being: don’t automatically opt for the most performant approach if it’s unnecessary and more complex than a slower but simpler alternative. Premature optimization and all that.
I’m only showing a small fraction of what batch querying supports, so certainly check out the documentation for more examples.
In my last post, My Thoughts on Code “Modernization”, I tried to describe my company’s and more specifically my team’s thinking about our technical initiatives and end goals as we work to update the technology and architecture of our large systems. In this post, I’d like to continue that discussion, but this time focus on the conditions that hopefully promote “happiness” for developers (and testers and other team members) in their ongoing work.
When I presented to our development organization last week, I included this slide to start that conversation:
To be clear, I’m completely focused in this post on issues or factors where I think I have influence or control over the situation. That being said, I am technically a people manager now, so what I can at least do for the other folks in my team is to:
Be supportive and appreciative of their efforts
Ask them for their feedback or advice on our shared work and intended strategies so they know that they have a voice
Occasionally be a “shit umbrella” for them whenever necessary, or maybe more likely just try to help with disputes or tensions with folks outside of our team. I’m still finding my sea legs on being a manager, so we’ll see how that goes
Not hold them up for too long when they do need my approvals for various HR kind of things or when they ask me to review a pull request (speaking of which, I need to pull the trigger on this post soon and go do just that).
On to other things…
Employability. Really? Yes.
Developer retention is a real issue, especially in a problem space like ours where domain knowledge is vital to working inside the code. Not to oversimplify a complex subject, but it’s my firm belief that developers feel most secure and even content in their current job when they feel that they’re actively developing skills that are in demand in the job market. I strongly believe, and our development management seems to agree, that we will do better with developer retention if we can use newer technology — and we’re not talking about radical changes in platform here.
On the flip side, I think we have some compelling, anecdotal evidence that developers who feel like they’ve been in a rut on the technical side of things are more likely to get happy feet and consider leaving.
So, long story short, moving to newer tools like, say, React.js (as opposed to existing screens using Knockout.js or jQuery heavy Razor Pages) or the latest versions of .Net partially with the goal of making our developers happier with their jobs is actually a defensible goal in my mind. Within reason of course. And if that means that I get the chance to swap in Marten + Postgresql as a replacement for our limited usage of MongoDb, that’s just a bonus:)
At a minimum, I definitely think we should at least try to rotate developers between the existing monolith and the newer, spiffier, lower friction services so that everybody gets a taste of better work.
I know what some of you are thinking here, “this is just resume-driven development and you should concentrate on delivering value to the business instead of playing with shiny object development toys.” That’s a defensible position, but as you read this let’s pretend that my shop isn’t trying to go guard rail to guard rail by eschewing boring tools in favor of far out, bleeding edge tooling just to make a few folks happy. We’re just trying to move some important functionality from being based on obsolescent tools to more current technology as an intermediate step into our company’s future.
Low Friction Organizational Structure
I’m only speaking for myself in this section. Many of my colleagues would agree with what I’m saying here, but I’ll take the sole blame for all of it.
Given a choice and the ultimate power to design the structure of a software development organization, I would build around the idea of multi-disciplinary, self-contained teams where each team has every single skillset necessary for them to ship what they’re working on completely by themselves. This means that I want front end developers, back end developers, testers, and DevOps (or just folks with DevOps skillsets regardless of their title) folks all in the same team and collaborating together closely. This obviates the need for many formal handoffs between teams, which I think is one of the biggest single sources of friction and inefficiency in software development.
By formal handoffs, I mean Waterfall-ish things like:
Having to fill out Jira tickets for some other team to make changes in your development or testing environments
Creating a design or specification document for another team
Testers being in a separate organization and schedule than the development team so that there’s potentially a lag between coding and testing
I’m of course all in on Agile Software Development and I’m also generally negative toward Waterfall processes of any sort. It’s not surprising then that I think that formal handoffs and intermediate documentation deliverables take time and energy that could be better spent on creating value instead. More importantly though, it makes teams less flexible and more brittle because they’re more dependent upon upfront planning. More than that, you’re often dependent on people who have no skin in the game for your projects.
Being forced to be more plan-oriented and less flexible in terms of scheduling or external resources means that a team is less able to learn and adapt as they work. Being less adaptable and less iterative makes it harder for teams to deliver quality work. Lastly, communication and collaboration is naturally going to be better within a team than it is between teams or even completely separate organizations.
At a bare minimum, I absolutely want developers (including front end, back end, database, and whatever type of developers a team needs) and testers in one single team working on the same schedule toward shared goals. Preferably I’d like to see us transition to a DevOps culture by at least breaking down some of the current walls between development groups, testing teams, and our operations team.
Lastly, to relate this back to the main theme of making an environment that’s better to work in, I think that increasing direct collaboration between various disciplines and minimizing the overhead of formal handoffs makes for more job satisfaction and less frustration.
“Time to Login Screen” Metric
Let’s say we’re onboarding a new developer, or maybe one of our developers is moving to a different product. After they do a clean clone of that codebase onto their local development machine, how fast can they get to a point where they’re able to build the code, run the actual system locally, and execute all the tests in the codebase? That’s what a former colleague of mine liked to call the “time to login screen” metric.
To reduce that friction of getting started, my thinking is to:
Lean heavily on using Docker containers to stand up required infrastructure like databases, Redis, monitoring tools, etc. that are necessary to run the system or tests. I think it’s very important for any kind of stateful tools to be isolated per developer on their own local machines. Running docker compose up -d is a whole lot faster than trying to follow installation instructions in a Wiki page.
Try to avoid depending on technologies that cannot be used locally. As an example, we already use RabbitMQ for message queueing, which conveniently is also very easy to run locally with Docker. As we move our systems to cloud hosting, I’m opposed to switching to Azure Service Bus without some other compelling reason, because it does not have any local development story.
It’s vital to have build scripts within the code repository that can effectively stand up any environment necessary to start working with the code. This includes any kind of database migration infrastructure and baseline test data setup. Everybody wants to have a good README file in a new codebase to help them get started, but I also believe that a good automated script that sets things up for you is awfully effective as documentation too.
It’s probably also going to be important to get to a point where the codebases are a little smaller so that there’s just less stuff to set up at any one time.
“Quick Twitch” Codebases
Almost a decade ago I wrote a post entitled When I’m most productive about the type of technical ecosystem in which I feel most productive. I think it still holds up, but let me expound on it a little bit here.
Let’s start with how fast a new developer or a current developer switching into a new codebase can be up and working. Using the “time to login screen” metric I learned from a former colleague, a developer should be able to successfully build and run the system and tests locally for a codebase very shortly after a fresh clone of that codebase.
Today our big platforms are fairly described as monoliths, with us underway toward breaking up the monolithic systems to something closer to a microservice architecture. I think we’d like to get the codebases broken up into smaller codebases where a development team can completely understand the codebase that they’re currently working in. Moreover, I’d like it to be much more feasible to update the technical tools, libraries, and runtime dependencies of a single codebase than it is today with our monoliths.
As a first class goal of splitting up today’s monoliths, we want our developers to be able to do what I call “quick twitch” development:
Most development tasks are small enough that developers can quickly and continuously flow from small unit tests to completed code and on to the next task. This is possible in well-factored codebases, but not so much in codebases that require a great deal of programming ceremony or have poor structural factoring.
Feedback cycles on the code are quick. This generally means that compilation is fast, and that test suites are fast enough to be executed constantly without breaking a developer’s mental flow state.
Unit tests can cover small areas of the code while still providing value, such that it’s rare that a developer needs to use a debugger to understand and solve problems. Seriously, having to use the debugger quite a bit is a drag on developer productivity and usually a sign that your automated testing strategy needs to incorporate more fine-grained tests.
The key here is to enable developers to achieve a “flow state” in their daily work.
This is not what we want in our codebases, except substitute “running through the test suite” in place of “compiling:”
Next time…
In the third and final post in this series, I want to talk through our evolution from monoliths to microservices and/or smaller distributed monoliths with an emphasis on not doing anything stupid by going from guard rail to guard rail.
Some of this is going to be specific to a .Net ecosystem, but most of what I’m talking about here I think should be applicable to most development shops. This is more or less a companion white paper for a big internal presentation I did at work this week.
My team at work is tasked with a multi-year code and architecture modernization across our large technical platforms. To give just a little bit of context, it’s a familiar story. We have some very large, very old, complex monolithic systems in production using some technologies, frameworks, and libraries that in a perfect world we’d like to update or replace. Since quite a bit of the code was written back when Test Driven Development was just a twinkle in Kent Beck’s eye, the automated test coverage on parts of the code isn’t what we’d like it to be.
With all that said, to any of my colleagues that read this, I’d say that we’re in much better shape quality and ecosystem wise than the average shop with old, continuously developed systems.
During a recent meeting right before Christmas, one of my colleagues had the temerity to ask “what’s the end goal of modernization and when can we say we’re done?” — which set off some furious thinking, conversations within the team, and finally a presentation to the rest of our development groups.
We came up with these three main goals for our modernization efforts:
Arrive at a point where we can practice Continuous Delivery (CD) within all our major product lines
Improved Developer (and Tester) Happiness
System Performance
Arguably, I’d say that being able to practice Continuous Delivery with a corresponding DevOps culture would help us achieve the other two goals, so I’m almost ready to declare that our main goal. Everything else that’s been on our “modernization agenda” is arguably just an intermediate step on the way to the goal of continuous delivery, or another goal that is at least partially unlocked by the advances we’ll have to make in order to get to continuous delivery.
Intermediate Steps
Speaking of the major intermediate or enabling steps we’ve identified, I took a shot at showing what we think are the major enabling steps for our future CD strategy in a diagram:
Upgrading to .Net vLatest
Upgrading from the full “classic” Windows-only version of .Net to the latest version of .Net and ASP.Net Core is taking up most of our hands-on focus right now. There are probably some performance gains to be had by merely updating to the latest .Net 5/6, but I see the big advantages of the latest .Net versions as being much more container friendly and allowing us flexibility on hosting options (Linux containers) compared to where we are now. I personally think that the recent generations of .Net and ASP.Net Core are far easier to work with in automated testing scenarios, and that should hopefully be a major enabler of CD processes for us.
Most importantly of all, I’d like to get back to using a Mac for daily development work, so there’s that.
Improved Automated Testing
We’re fortunately starting from a decent base of test automation, but there’s plenty of opportunities to get better before we can support more frequent releases. (I’ve written quite a bit about automated testing here). Long story short, I think we have some opportunities to:
Get better at writing testable code for easier and more effective unit testing
Introduce a lot more integration testing in the middle zone of the stereotypical “test pyramid”
Cut back on expensive Selenium-based testing wherever possible in favor of some other form of more efficient test automation. See Jeremy’s Only Rule of Testing.
Since all of this is interrelated anyway, “testability” is absolutely one of the factors we’ll use to decide where service boundaries are as we try to slice our large monoliths into smaller, more focused services. If it’s not valuable to test a service by itself without including other services, then that service boundary is probably wrong.
Containerization
This comes up a lot at work, but I’d call this mostly an enabling step toward cloud hosting and easier incremental deployment than we have today, rather than any kind of end in itself, especially in areas where we need elastic scaling. I think being able to run our services in containers is also going to be helpful for the occasional time when you need to test locally against multiple services or processes.
And yeah, we could try to do a lift and shift to move our big full .Net framework apps to virtual machines in the cloud or try out Windows containers, but previous analysis has suggested that that’s not viable for us. Plus nobody wants to do that.
Open Telemetry Tracing and Production Monitoring
This effort is fortunately well underway, but one of our intermediate goals is to apply effective Open Telemetry tracing through all our products, and I say that for these reasons:
It enables us to use a growing off the shelf ecosystem of visualization and metrics tooling
I think it’s an invaluable debugging tool, especially when you have asynchronous messaging or dependencies on external systems — and we’re only going to be increasing our reliance on messaging as we move more and more to micro-services
Open Telemetry is very handy in diagnosing performance or throughput problems by allowing you to “see” the context of what is happening within and across systems during a logical business operation.
To the last point, my key example of this was helping a team last year analyze some performance issues in their web services. An experienced developer will probably look through database logs to identify slow queries that might explain the poor performance as one of their first steps, but in this case that turned up no single query that was slow enough to explain the performance issues. Fortunately, I was able to diagnose the issue as an N+1 query issue by reading through the code, but let’s just say that I got lucky.
If we’d had open telemetry tracing between the web service calls and the database queries that each service invocation made, I think we would have been able to quickly see a relationship between slow web service calls and the sheer number of little database queries that the web service was making during the slow web service requests, which should have led the team to immediately suspect an N+1 problem.
As for production monitoring, we of course already do that but there’s some opportunity to be more responsive at least to performance issues detected by the monitoring rules. We’re working under the assumption that deploying more often and more incrementally means that we’ll also have to be better at detecting production issues. Not that you purposely try to let problems get through testing, but if we’re going to convince the greater company that it’s safe to deploy small changes in an automated fashion, we need to have ways to rapidly detect when new problems in production are introduced.
Again, the general theme is for us to be resilient and adaptive because problems are inevitable — but don’t let the fear of potential problems put us into an analysis paralysis spiral.
Cloud Hosting
I think that’s a major enabler of continuous delivery, with the real goal for us being more flexible in how our development, testing, and production environments are configured as we continue to break up the monolith codebases and change our current architecture. I’d also love for us to be able to flexibly spin up environments for testing on demand, and tear them down when they’re not needed without a lot of formal paperwork in the middle.
There might also be an argument for shifting to the cloud if we could reduce hosting and production support costs along the way, but I think there’s a lot of analysis left to do before we can make that claim to the folks in the high backed chairs.
System Performance
Good runtime performance and meeting our SLAs is absolutely vital for us as a medical analytics company. I wrestled quite a bit with making this a first class goal of our “modernization” initiative and came down on the side of “yes, but…” My thinking here, with some agreement from other folks, is that system performance issues will be much easier to address when we’re backed by a continuous delivery backbone.
There’s something to be said for doing upfront architecture work to consider known performance risks before a single line of code is written, but the truth is that a great deal of the code is already written. Moreover, the performance issues and bottlenecks that pop up in production aren’t always where we would have expected them to be during upfront architecture efforts anyway.
Improving performance in a complicated system is generally going to require a lot of measurement and iteration. Knowing that, having the faster release cycle made safe by effective automated test coverage should help us react more quickly to performance problems or take advantage of newer ideas to improve performance as we learn more about how our systems behave or gain some insights into client data sets. Likewise, we’ll have to improve our production monitoring and instrumentation anyway to enable continuous delivery, and we’re hopeful that that will also help us more quickly identify and diagnose performance issues.
To phrase this a bit more bluntly, I believe that upfront design and architecture can be valuable and sometimes necessary, but consistent success in software development is more likely a result of feedback and adaptation over time than being dependent on getting everything right the first time.
Ending this post abruptly….
I’m tired, it’s late, and I’m going to play the trick of making this a blog series instead of one gigantic post that never gets finished. In following posts, I’d like to discuss my thoughts on:
Creating the circumstances for “Developer Happiness” with some thinking about what kind of organizational structure and technical ecosystem allows developers and testers to be maximally productive and at least have a chance to be happy within their roles
Some thinking around micro-services and micro-frontends as we try to break up the big ol’ monoliths with some focus on intermediate steps to get there
I trot out one of these posts at the beginning of each year, but this time around it’s “aspirations” instead of “plans” because a whole lot of stuff is gonna be a repeat from 2020 and 2021 and I’m not going to lose any sleep over what doesn’t get done in the New Year or not be open to brand new opportunities.
In 2022 I just want the chance to interact with other developers. I’ll be at ThatConference in Round Rock, TX in January May? speaking about Event Sourcing with Marten (my first in person conference since late 2019). Other than that, my only goal for the year (Covid-willing) is to maybe speak at a couple more in person conferences just to be able to interact with other developers in real space again.
My peak as a technical blogger was the late aughts, and I think I’m mostly good with not sweating any kind of attempt to regain that level of readership. I do plan to write material that I think would be useful for my shop, or just about what I’m doing in the OSS space when I feel like it.
Which brings me to the main part of this post, my involvement with the JasperFx (Marten, Lamar, etc.) family of OSS projects (plus Storyteller), which takes up most of my extracurricular software-related time. Just for an idea of the interdependencies, here are the highlights of the JasperFx world:
.NET Transactional Document DB and Event Store on PostgreSQL
Marten took a big leap forward late in 2021 with the long running V4.0 release. I think that release might have been the single biggest, most complicated OSS release that I’ve ever been a part of — FubuMVC 1.0 notwithstanding. There’s also a 5.0-alpha release out that addresses .Net 6 support and the latest version of Npgsql.
Right now Marten is a victim of its own success, and our chat room is almost constantly hair on fire with activity, which directly led to some planned improvements for V5 (hopefully by the end of January?) in this discussion thread:
Multi-tenancy through a separate database per tenant (long planned, long delayed, finally happening now)
Some kind of ability to register and resolve services for more than one Marten database in a single application
And related to the previous two bullet points, improved database versioning and schema migrations that could accommodate there being more than one database within a single .Net codebase
Improve the “generate ahead” model to make it easier to adopt. Think faster cold start times for systems that use Marten
Beyond that, some of the things I’d like to maybe do with Marten this year are:
Investigate the usage of Postgresql table partitioning and database sharding as a way to increase scalability — especially with the event sourcing support
Projection snapshotting
In conjunction with Jasper, expand Marten’s asynchronous projection support to shard projection work across multiple running nodes, introduce some sort of optimized, no downtime projection rebuilds, and add some options for event streaming with Marten and Kafka or Pulsar
Try to build an efficient GraphQL adapter for Marten. And by efficient, I mean that you wouldn’t have to bounce through a Linq translation first and hopefully could opt into Marten’s JSON streaming wherever possible. This isn’t likely, but sounds kind of interesting to play with.
In a perfect, magic, unicorns and rainbows world, I’d love to see the Marten backlog in GitHub get under 50 items and stay there permanently. Commence laughing at me on that one:(
Jasper is a toolkit for common messaging scenarios between .Net applications with a robust in process command runner that can be used either with or without the messaging.
I started working on rebooting Jasper with a forthcoming V2 version late last year, and made quite a bit of progress before Marten got busy and the release of .Net 6 necessitated other work. There’s a non-zero chance I will be using Jasper at work, which makes that a much more viable project. I’m currently in flight with:
Building Open Telemetry tracing directly into Jasper
Bi-directional compatibility with MassTransit applications (absolutely necessary to adopt this in my own shop).
Performance optimizations
.Net 6 support
Documentation overhaul
Kafka as a message transport option (Pulsar was surprisingly easy to add, and I’m hopeful that Kafka is similar)
And maybe, just maybe, I might extend Jasper’s somewhat unique middleware approach to web services utilizing the new ASP.Net Core Minimal API support. The idea there is to more or less create an improved version of the old FubuMVC idiom for building web services.
Lamar is a modern IoC container and the successor to StructureMap
I don’t have any real plans for Lamar in the new year, but there are some holes in the documentation, and a couple advanced features could sure use some additional examples. 2021 ended up being busy for Lamar though with:
Lamar v7 added support for IAsyncEnumerable (also finally), a small enhancement for the Minimal API feature in ASP.Net Core, and .Net 6 support
Add Robust Command Line Options to .Net Applications
Oakton did have a major v4/4.1 release to accommodate .Net 6 and ASP.Net Core Minimal API usage late in 2021, but I have yet to update the documentation. I would like to shift Oakton’s documentation website to VitePress first. The only other plan I have for Oakton this year is to maybe see if there’d be a good way for Oakton to enable “buddy” command line tools for your application, like the dotnet ef tool, using the HostFactoryResolver class.
The bustling metropolis of Alba, MO
Alba is a wrapper around the ASP.Net Core TestServer for declarative, in process testing of ASP.Net Core web services. I don’t have any plans for Alba in the new year other than to respond to any issues or opportunities to smooth out usage that come out of my shop’s use of Alba.
Alba did get a couple major releases in 2021 though:
Solutions for creating robust, human readable acceptance tests for your .Net or CoreCLR system and a means to create “living” technical documentation.
Storyteller has been mothballed for years, and I was ready to abandon it last year, but…
We still use Storyteller for some big, long running integration style tests in both Marten and Jasper where I don’t think xUnit/NUnit is a good fit, and I think maybe I’d like to reboot Storyteller later this year. The “new” Storyteller (I’m playing with the idea of calling it “Bobcat” as it might be a different tool) would be quite a bit smaller and much more focused on enabling integration testing rather than trying to be a BDD tool.
Not sure what the approach might be, it could be:
“Just” write some extension helpers to xUnit or NUnit for more data intensive tests
“Just” write some extension helpers to SpecFlow
Rebuild the current Storyteller concept, but also support a Gherkin model
Something else altogether?
My goal, if this happens, is to have a tool for automated testing that maybe supports:
Much more data intensive tests
Better handling of integration tests
Strong support for test parallelization and even test run sharding in CI
Could help write characterization tests with a record/replay kind of model against existing systems (I’d *love* to have this at work)
Has some kind of model that is easy to use within an IDE like Rider or VS, even if there is a separate UI like Storyteller does today
And I’d still like to rewrite a subset of the existing Storyteller UI as an excuse to refresh my front end technology skillset.
To be honest, I don’t feel like Storyteller has ever been much of a success, but it’s the OSS project of mine that I’ve most enjoyed working on and most frequently used myself.
Weasel
Weasel is a set of libraries for database schema migrations and ADO.Net helpers that we spun out of Marten during its V4 release. I’m not super excited about doing this, but Weasel is getting some sort of database migration support very soon. Weasel isn’t documented itself yet, so that’s the only major plan other than supporting whatever Marten and/or Jasper needs this year.
Baseline
Baseline is a grab bag of helpers and extension methods that dates back to the early FubuMVC project. I haven’t done much with Baseline in years, and it might be time to prune it a little bit as some of what Baseline does is now supported in the .Net framework itself. The file system helpers especially could be pruned down, but then also get asynchronous versions of what’s left.
StructureMap
I don’t think that I got a single StructureMap question last year and stopped following its Gitter room. There are still plenty of systems using StructureMap out there, but I think the mass migration to either Lamar or another DI container is well underway.
TL;DR: Marten’s compiled query feature makes using Linq queries significantly more efficient at runtime if you need to wring out just a little more performance in your Marten-backed application.
I was involved in a twitter conversation today that touched on the old Specification pattern of describing a reusable database query with an object (watch it, that word is overloaded in the software development world and even refers to separate design patterns). I mentioned that Marten actually has an implementation of this pattern we call Compiled Queries.
Jumping right into a concrete example, let’s say that we’re building an issue tracking system because we hate Jira so much that we’d rather build one completely from scratch. At some point you’re going to want to query for all open issues currently assigned to a user. Assuming our new Marten-backed issue tracker has a document type called Issue, a compiled query class for that would look like this:
// ICompiledListQuery<T> is from Marten
public class OpenIssuesAssignedToUser: ICompiledListQuery<Issue>
{
    public Expression<Func<IMartenQueryable<Issue>, IEnumerable<Issue>>> QueryIs()
    {
        return q => q
            .Where(x => x.AssigneeId == UserId)
            .Where(x => x.Status == "Open");
    }

    // This is an input parameter to the query
    public Guid UserId { get; set; }
}
And now in usage, we’ll just spin up a new instance of the OpenIssuesAssignedToUser to query for the open issues for a given user id like this:
var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");
});

await using var session = store.QuerySession();

var issues = await session.QueryAsync(new OpenIssuesAssignedToUser
{
    UserId = userId // passing in the query parameter to a known user id
});

// do whatever with the issues
Other than the weird method signature of the QueryIs() method, that class is pretty simple if you’re comfortable with Marten’s superset of Linq. Compiled queries can be valuable anywhere where the old Specification (query objects) pattern is useful, but here’s the cool part…
Compiled Queries are Faster
Linq has been an awesome addition to the .Net ecosystem, and it’s usually the very first thing I mention when someone asks me why they should consider .Net over Java or any other programming ecosystem. On the down side though, it’s complicated as hell, there’s some runtime overhead to generating and parsing Linq queries at runtime, and most .Net developers don’t actually understand how it works internally under the covers.
The best part of the compiled query feature in Marten is that on the first usage of a compiled query type, Marten memoizes its “query plan” for the represented Linq query so there’s significantly less overhead for subsequent usages of the same compiled query type within the same application instance.
To illustrate what’s happening when you issue a Linq query, consider the same logical query as above, but this time in inline Linq:
var issues = await session.Query<Issue>()
    .Where(x => x.AssigneeId == userId)
    .Where(x => x.Status == "Open")
    .ToListAsync();

// do whatever with the issues
When the Query() code above is executed, Marten is:
Building an entire object model in memory using the .Net Expression model.
Linq itself never executes any of the code within Where() or Select() clauses; instead, the Linq provider parses and interprets that Expression object model with a series of internal Visitor types.
The result of visiting the Expression model is a corresponding, internal IQueryHandler object that “knows” how to build up the SQL for the query, how to process the resulting rows returned by the database, and how to coerce the raw data into the desired results (JSON deserialization, stashing things in identity maps or dirty checking records, etc).
Executing the IQueryHandler, which in turn writes out the desired SQL query to the outgoing database command
Making the actual call to the underlying Postgresql database to return a data reader
Interpreting the data reader and coercing the raw records into the desired results for the Linq query
Sounds kind of heavyweight when you list it all out. When we move the same query to a compiled query, we only have to incur the cost of parsing the Linq query Expression model once, and Marten “remembers” the exact SQL statement, how to map query inputs like OpenIssuesAssignedToUser.UserId to the right database command parameter, and even how to process the raw database results. Behind the scenes, Marten is generating and compiling a new class at runtime to execute the OpenIssuesAssignedToUser query like this (I reformatted the generated source code just a little bit here):
using System.Collections.Generic;
using Marten.Internal;
using Marten.Internal.CompiledQueries;
using Marten.Linq;
using Marten.Linq.QueryHandlers;
using Marten.Testing.Documents;
using NpgsqlTypes;
using Weasel.Postgresql;

namespace Marten.Testing.Internals.Compiled
{
    public class OpenIssuesAssignedToUserCompiledQuery: ClonedCompiledQuery<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _inner;
        private readonly OpenIssuesAssignedToUser _query;
        private readonly QueryStatistics _statistics;

        public OpenIssuesAssignedToUserCompiledQuery(IMaybeStatefulHandler inner, OpenIssuesAssignedToUser query,
            QueryStatistics statistics, HardCodedParameters hardcoded): base(inner, query, statistics, hardcoded)
        {
            _inner = inner;
            _query = query;
            _statistics = statistics;
            _hardcoded = hardcoded;
        }

        public override void ConfigureCommand(CommandBuilder builder, IMartenSession session)
        {
            var parameters = builder.AppendWithParameters(
                @"select d.id, d.data from public.mt_doc_issue as d where (CAST(d.data ->> 'AssigneeId' as uuid) = ? and d.data ->> 'Status' = ?)");

            parameters[0].NpgsqlDbType = NpgsqlDbType.Uuid;
            parameters[0].Value = _query.UserId;
            _hardcoded.Apply(parameters);
        }
    }

    public class OpenIssuesAssignedToUserCompiledQuerySource: CompiledQuerySource<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _maybeStatefulHandler;

        public OpenIssuesAssignedToUserCompiledQuerySource(HardCodedParameters hardcoded,
            IMaybeStatefulHandler maybeStatefulHandler)
        {
            _hardcoded = hardcoded;
            _maybeStatefulHandler = maybeStatefulHandler;
        }

        public override IQueryHandler<IEnumerable<Issue>> BuildHandler(OpenIssuesAssignedToUser query,
            IMartenSession session)
        {
            return new OpenIssuesAssignedToUserCompiledQuery(_maybeStatefulHandler, query, null, _hardcoded);
        }
    }
}
What else can compiled queries do?
Besides being faster than raw Linq and being useful as the old reliable Specification pattern, compiled queries can be very valuable if you absolutely insist on mocking or stubbing the Marten IQuerySession/IDocumentSession. You should never, ever try to mock or stub the IQueryable interface with a dynamic mock library like NSubstitute or Moq, but mocking the IQuerySession.Query<T>(T query) method is pretty straightforward.
Hey, I blog a lot about the OSS tools I work on, so this week I’m going in a different direction and blogging about other OSS tools I use in daily development. In no small part, this blog post is a demonstration to some of my colleagues to get them to weigh in on the approach I took here.
I’ve been dragging my feet for way, way too long at work on what’s going to be our new centralized identity provider service based on Identity Server 5 from Duende Software. It is the real world, so for the first phase of things, the actual user credentials are stored in an existing Sql Server database, with roughly a database per client strategy of multi-tenancy. For this new server, I’m introducing a small lookup database to store the locations of the client specific databases. So the new server has this constellation of databases:
After some initial spiking, the first serious thing I did was to set up the automated developer build for the codebase. For local development, I need a script that:
Sets up multiple Sql Server databases for local development and testing
Restores Nuget dependencies
Builds the actual C# code (and might later delegate to NPM if there’s any JS/TS code in the user interface)
Runs all the tests in the codebase
For very simple projects I’ll just use the dotnet command line to run tests from the command line in CI builds or at Git commit time. Likewise in Node.js projects, npm by itself is frequently good enough. If all there was was the C# code, dotnet test would be enough of a build script in this Identity Server project, but the database requirements are enough to justify a more complex build automation approach.
Build Scripting with Bullseye
Until very recently, I still used the Ruby-based Rake tooling for build scripting, but Ruby as a scripting language has definitely fallen out of favor in .Net circles. After Babu Annamalai introduced Bullseye/SimpleExec into Marten, I’m now using Bullseye as my go-to build scripting tool.
At least in my development circles, make-like, task-oriented build automation tools have definitely lost popularity in recent years. But in this identity server project, that’s exactly what I want for build automation. My task-oriented build scripting tool of choice for .Net work is the combination of Bullseye with SimpleExec. Bullseye itself is very easy to use because you’re using C# in a small .Net project. Because it’s just a .Net console application, you also have complete access to Nuget libraries — as we’ll exploit in just a bit.
To get started with Bullseye, I created a folder called build off of the repository root of my identity server codebase, and created a small .Net console application that I also call build. You can see an example of this in the Lamar codebase.
Because we’ll need this in a minute, I’ll also place some wrapper scripts at the root directory of the repository to call the build project called build.cmd, build.ps1, and build.sh for Windows, Powershell, and *nix development. The build.cmd file is just delegating to the new build project and passing all the command line variables like so:
@echo off
dotnet run --project build/build.csproj -c Release -- %*
Back to the new build project, I added Nuget references to Bullseye and SimpleExec. In the Program.Main() function (this could be a little simpler with the new streamlined .Net 6 entry point), I’ll add a couple static namespace declarations:
using static Bullseye.Targets;
using static SimpleExec.Command;
Now we’re ready to write our first couple tasks directly into the Program code file. I still prefer to have separately executable tasks restoring Nugets, compiling, and running all the tests so you can run partial builds at will. In this case, using some sample code from the Oakton build script:
// Just delegating to the dotnet cli to restore nugets
Target("restore", () =>
{
    Run("dotnet", "restore src/Oakton.sln");
});

// compile the whole solution, but after running
// the restore task
Target("compile", DependsOn("restore"), () =>
{
    Run("dotnet", "build src/Oakton.sln --no-restore");
});

Target("test", DependsOn("compile"), () =>
{
    RunTests("Tests");
});

// Little helper function to execute tests by test project name
// still just delegating to the command line
private static void RunTests(string projectName, string directoryName = "src")
{
    Run("dotnet", $"test --no-build {directoryName}/{projectName}/{projectName}.csproj");
}
We’ve got a couple more steps to make this a full build script. We also need this code at the very bottom of our Program.Main() function to actually run tasks:
RunTargetsAndExit(args);
I typically have an explicit “default” task that gets executed when you just type build / ./build.sh that usually just includes other named tasks. In the case of Oakton, it runs the unit test task plus another task called “commands” that smoke tests several command line calls:
Target("default", DependsOn("test", "commands"));
Usually, I’ll also use a “ci” task that is intended for continuous integration builds that is a superset of the default build task with extra integration tests or Nuget publishing (this isn’t as common now that we tend to use separate GitHub actions for Nuget publishing). In Oakton’s case the “ci” task is exactly the same:
Target("ci", DependsOn("default"));
After all that is in place, and working in Windows at the moment, I like to make git commits with the old “check in dance” like this:
build && git commit -a -m "some commit message"
Less commonly, but still valuable, let’s say that Microsoft has just released a new version of .Net that causes a cascade of Nuget updates and other havoc in your projects. While working through that, I’ll frequently do something like this to work out Nuget resolution issues:
git clean -xfd && build restore
Or once in a while the IDE build error window can be misleading, so I’ll build from the command line with:
build compile
So yeah, most of the build “script” I’m showing here is just delegating to the dotnet CLI and it’s not very sophisticated. I still like having this file so I can jump between my projects and just type “build” or “build compile” without having to worry about what the solution file name is, or telling dotnet test which projects to run. That being said though, let’s jump into something quite a bit more complicated.
Adding Sql Server and EF Core into the Build Script
For the sake of testing my new little identity server, I need at least a couple different client databases plus the lookup database. Going back to first principles of Agile Development practices, it should be possible for a brand new developer to do a clean clone of the new identity server codebase and very quickly be running the entire service and all its tests. I’m going to pull that off by adding new tasks to the Bullseye script to set up databases and automate all the testing.
First up, I don’t need very much data for testing, so running Sql Server in Docker is more than good enough, and I’ll add this docker-compose.yml file to my repository:
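Roughly, that file looks like the sketch below. The image tag and SA password are placeholder values; the part that actually matters is the port mapping, which I’ll call out next:

version: '3'
services:
  sqlserver:
    image: "mcr.microsoft.com/mssql/server:2019-latest"
    ports:
      # Non-default host port so this container can run alongside a locally installed Sql Server
      - "1435:1433"
    environment:
      - "ACCEPT_EULA=Y"
      - "SA_PASSWORD=P@55w0rd"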
The only thing interesting to note is that I mapped a non-default port number (1435) to this container for the sole sake of being able to run this container in parallel to the Sql Server instance I have to have for other projects at work. Back to Bullseye, and I’ll add a new task to delegate to docker compose to start up Sql Server:
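That task doesn’t have to be anything more than a shell-out. Here’s a minimal sketch (the “docker-up” name is what the “database” task further down depends on; the body is just the obvious docker compose call):

// Stand up the Dockerized Sql Server defined in docker-compose.yml
Target("docker-up", async () =>
{
    await RunAsync("docker", "compose up -d");
});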
And trust me on this one, the Docker setup is asynchronous, so you actually need to make your build script wait a little bit until the new Sql Server database is accessible before doing anything else. For that purpose, I use this little function:
public static async Task WaitForDatabaseToBeReady()
{
    Console.WriteLine("Waiting for Sql Server to be available...");

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    while (stopwatch.Elapsed.TotalSeconds < 30)
    {
        try
        {
            // ConnectionSource is really just exposing a constant
            // with the known connection string to the Dockerized
            // Sql Server
            await using var conn = new SqlConnection(ConnectionSource.ConnectionString);
            await conn.OpenAsync();

            var cmd = conn.CreateCommand();
            cmd.CommandText = "select 1";
            await cmd.ExecuteReaderAsync();

            Console.WriteLine("Sql Server is up and ready!");
            return;
        }
        catch (Exception)
        {
            await Task.Delay(250);
            Console.WriteLine("Database not ready yet, trying again.");
        }
    }
}
Next, I need some code to create additional databases (I’m sure you can do this somehow in the docker compose file itself, but I didn’t know how at the time and this was easy). I’m going to omit the actual CREATE DATABASE calls, but just know there’s a method with this signature on a static class in my build project called Database:
public static async Task BuildDatabases()
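A purely illustrative version of that method might look something like this sketch; the IF DB_ID guard and the reuse of the ConnectionSource helper are my assumptions, not the actual code:

public static async Task BuildDatabases()
{
    // Connect to the master database on the Dockerized Sql Server
    await using var conn = new SqlConnection(ConnectionSource.ConnectionStringForDatabase("master"));
    await conn.OpenAsync();

    // The lookup database plus the three client credential databases used below
    foreach (var name in new[] { "identity", "environment1", "environment2", "environment3" })
    {
        // Only create each database if it doesn't already exist
        var cmd = conn.CreateCommand();
        cmd.CommandText = $"IF DB_ID('{name}') IS NULL CREATE DATABASE [{name}]";
        await cmd.ExecuteNonQueryAsync();
    }
}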
I’m using EF Core for data access in this project, and also using EF Core migrations to do database schema building, so we’ll want the dotnet ef tooling available, so I added a task for just that:
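That task is little more than a call to dotnet tool. Something along these lines (the task name and the choice of a global tool install are mine, not necessarily what the real script does):

// Make sure the dotnet-ef command line tool is available for the migration tasks below
Target("install-ef", () =>
{
    try
    {
        Run("dotnet", "tool install --global dotnet-ef");
    }
    catch (Exception)
    {
        // Most likely just means the tool is already installed, which is fine
        Console.WriteLine("dotnet-ef is already installed");
    }
});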
The dotnet ef command line usage has a less than memorable pattern of usage, so I made a little helper function that’s gonna get called for different combinations of EF Core context name and database connection strings:
public static async Task RunEfUpdate(string contextName, string databaseName)
{
    Console.WriteLine($"Running EF Migration for context {contextName} on database '{databaseName}'");

    // ConnectionSource is a little helper specific to my
    // identity server project
    var connection = ConnectionSource.ConnectionStringForDatabase(databaseName);

    await Command.RunAsync("dotnet",
        $"ef database update --project src/ProjectName/ProjectName.csproj --context {contextName} --connection \"{connection}\"");
}
For a little more context, I have two separate EF Core DbContext classes (obfuscated from the real code):
LookupDbContext — the “master” registry of client databases by client id
IdentityDbContext — addresses a single client database holding user credentials
And now, after all that work, here’s a Bullseye script that can stand up a new Sql Server database in Docker, build the required databases if necessary, establish baseline data, and run the correct EF Core migrations as needed:
Target("database", DependsOn("docker-up"), async () =>
{
    // "Database" is a static class in my build project where
    // I've dumped database helper code
    await Database.BuildDatabases();

    // RunEfUpdate is delegating to dotnet ef
    await Database.RunEfUpdate("LookupDbContext", "identity");

    // Not shown, but fleshing out some static lookup data
    // with straight up SQL calls

    // Running migrations on all three test databases for client
    // credential databases
    await Database.RunEfUpdate("IdentityDbContext", "environment1");
    await Database.RunEfUpdate("IdentityDbContext", "environment2");
    await Database.RunEfUpdate("IdentityDbContext", "environment3");
});
Now, the tests for this identity server are almost all going to be integration tests, so I won’t even bother separating out integration tests from unit tests. That being said, our main test library is going to require the Sql Server database built above to be available before the tests are executed, so I’m going to add a dependency to the test task like so:
// The database is required
Target("test", DependsOn("compile", "database"), () =>
{
    RunTests("Test Project Name");
});
Now, when someone does a quick clone of this codebase, they should be able to just run the build.cmd/ps1/sh script and, assuming that they already have the correct version of .Net and Docker Desktop installed:
Restore all the Nuget dependencies
Compile the entire solution
Start a new Sql Server instance in Docker with all the testing databases built out with the correct database structure and lookup data
Execute all the automated tests
Bonus Section: Integration with GitHub Actions
I’m a little bit old school with CI. I grew up in the age when you tried to keep your CI set up as crude as possible and mostly just delegated to a build script that did all the actual work. To that end, if I’m using Bullseye as my build scripting tool and GitHub Actions for CI, I delegate to Bullseye like this from the Oakton project:
The very bottom line of code is the pertinent part that delegates to our Bullseye script and runs the “ci” target that’s my own idiom. Part of the point here is to have the build script steps committed and versioned to source control — which these days is also done with the YAML GitHub action definition files, so that’s not as important as it used to be. What is still important today is that coding in YAML sucks, so I try to keep most of the actual functionality in nice, clean C#.
Bonus: Why didn’t you…????
Why didn’t you just use MSBuild? It’s possible to use MSBuild as a task runner, but no thank you. I was absolutely sick to death of coding via XML in NAnt when MSBuild was announced, and I’ll admit that I never gave MSBuild the time of day. I’ll pass on more coding in Xml.
Why didn’t you just use Nuke or Cake? I’ve never used Nuke and can’t speak to it. I’m not a huge Cake fan, and Bullseye is a simpler model to me.
Why didn’t you just use Powershell? You end up making powershell scripts call other scripts and it clutters the file system up.
Alba is a small open source library that is a helper for integration testing against ASP.Net Core HTTP methods that makes the underlying ASP.Net Core TestServer easier and more declarative to use within tests.
Continuing a busy couple weeks of OSS work getting tools on speaking terms with .Net 6, Alba v6.0 was released early this week with support for .Net 6 and the new WebApplication bootstrapping model within ASP.Net Core 6.0. Before I dive into the details, a big thanks to Hawxy who did most of the actual coding for this release.
The biggest change was getting Alba ready to work with the new WebApplicationBuilder and WebApplicationFactory models in ASP.Net Core such that Alba can be used with any typical way to bootstrap an ASP.Net Core project. See the Alba Setup page in the documentation for more details.
Using Alba with Minimal API Projects
From Alba’s own testing, let’s say you have a small Minimal API project that’s bootstrapped like this in your web services Program file:
using System;
using Microsoft.AspNetCore.Builder;
var builder = WebApplication.CreateBuilder(args);
// Add services to the container.
var app = builder.Build();
// Configure the HTTP request pipeline.
app.UseHttpsRedirection();
app.MapGet("/", () => "Hello World!");
app.MapGet("/blowup", context => throw new Exception("Boo!"));
app.Run();
Alba's old (and still supported) model of using the application's HostBuilder from the .Net 5 project templates is no help here, but that's okay, because Alba now also understands how to use WebApplicationFactory to bootstrap the application shown above. Here's some sample code to do just that in a small xUnit test:
// WebApplicationFactory can resolve old and new style of Program.cs
// .NET 6 style - the global:: namespace prefix would not be required in a normal test project
await using var host = await AlbaHost.For<global::Program>(x =>
{
x.ConfigureServices((context, services) =>
{
services.AddSingleton<IService, ServiceA>();
});
});
host.Services.GetRequiredService<IService>().ShouldBeOfType<ServiceA>();
var text = await host.GetAsText("/");
text.ShouldBe("Hello World!");
And you’re off to the races and authoring integration tests with Alba!
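Beyond one-off helpers like GetAsText(), Alba's Scenario API lets you declare the request and its assertions in one place. Here's a small sketch along the lines of Alba's documented usage, reusing the host from the test above (details can vary a bit by Alba version):
// Run an HTTP scenario end to end through the real ASP.Net Core pipeline
await host.Scenario(_ =>
{
    _.Get.Url("/");
    _.StatusCodeShouldBeOk();
    _.ContentShouldBe("Hello World!");
});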
How Alba and ASP.Net Have Evolved
The code that ultimately became Alba has been around for over a decade, and I think it's a little interesting to see the evolution of web development in .Net through Alba's history and my own.
1998: I built a couple internal, “shadow IT” applications for my engineering team with ASP “classic” and fell in love with web development
2003: I was part of a small team building a new system on the brand new ASP.Net WebForms application model and fell out of love with web development for several years
~2015: I ripped Alba into its own library and ported the code to work against the OWIN model
2017: Alba 1.0 was ported to the new ASP.Net Core, but used its own special sauce to run HTTP requests in memory with stubbed out HttpContext objects
2018: Alba 2.0 accommodated a world of changes from the release of ASP.Net Core 2.*. There were temporarily separate Nugets for ASP.Net Core 1 and ASP.Net Core 2 because the models were so different. That sucked.
2019: Alba 3.0 was released supporting ASP.Net Core 3.*, and ditched all support for anything on the full .Net framework. At this point Alba’s internals were changed to utilize the ASP.Net Core TestServer and HostBuilder models
2020: Alba 4.0 supported ASP.Net Core 5.0
August 2021: Alba 5.0 added a new extension model with initial extensions for testing applications secured by JWT bearer tokens
December 2021: .Net 6 came with a lot of changes to the ASP.Net Core bootstrapping model, so here we are with a brand new Alba 6.0.
Lots and lots of changes in the web development world within .Net, and I’m betting that’s not completely done changing. For my part, Alba isn’t the most widely used library, but there’s more than enough usage for me to feel good about a piece of fubumvc living on. Plus we use it at work for integration testing, so Alba is definitely going to live on.
It’s been a busy couple weeks in OSS world for me scurrying around and getting things usable in .Net 6. Today I’m happy to announce the release of Lamar 7.0. The Nuget for Lamar itself and Lamar.Microsoft.DependencyInjection with adjusted dependencies for .Net 6 went up yesterday, and I made some additions to the documentation website just now. There are no breaking changes in the API, but Lamar dropped all support for any version of .Net < .Net 5.0. Before I get into the highlights, I’d like to thank:
Babu Annamalai for making the docs so easy to re-publish
Andrew Lock for writing some very helpful blog posts about new .Net 6 internals that have helped me get through .Net 6 improvements to several tools the past couple weeks.
Lamar and Minimal API
Lamar v7 adds some specific support for better usability of the new Minimal API feature in ASP.Net Core. Below is the sample we use in the Lamar documentation and the internal tests:
var builder = WebApplication.CreateBuilder(args);
// use Lamar as DI.
builder.Host.UseLamar((context, registry) =>
{
// register services using Lamar
registry.For<ITest>().Use<MyTest>();
registry.IncludeRegistry<MyRegistry>();
// add the controllers
registry.AddControllers();
});
var app = builder.Build();
app.MapControllers();
// [FromServices] is NOT necessary when using Lamar v7
app.MapGet("/", (ITest service) => service.SayHello());
app.Run();
The Lamar IContainer itself, and all nested containers (scoped containers in .Net DI nomenclature), implement both IDisposable and IAsyncDisposable. It is not necessary to call both Dispose() and DisposeAsync(); calling either one will dispose all tracked IDisposable / IAsyncDisposable objects.
// Asynchronously disposing the container
await container.DisposeAsync();
The following table explains what method is called on a tracked object when the creating container is disposed:
If an object implements…         | Container.Dispose()                     | Container.DisposeAsync()
IDisposable                      | Dispose()                               | Dispose()
IAsyncDisposable                 | DisposeAsync().GetAwaiter().GetResult() | DisposeAsync()
IDisposable and IAsyncDisposable | DisposeAsync()                          | DisposeAsync()
If any objects are being created by Lamar that only implement IAsyncDisposable, it is probably best to strictly use Container.DisposeAsync() to avoid any problematic mixing of sync and async code.
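To make that concrete, here's a minimal sketch that leans on async disposal, reusing the hypothetical ITest / MyTest registrations from the Minimal API sample above:
// Prefer await using so that DisposeAsync() is what ultimately runs,
// which handles both sync and async disposables without blocking
await using var container = new Container(services =>
{
    services.For<ITest>().Use<MyTest>();
});

var test = container.GetInstance<ITest>();
// ... use the resolved services ...
// DisposeAsync() is called automatically at the end of the enclosing scope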
We’ve got an upcoming Marten 5.0 release ostensibly to support breaking changes related to .Net 6, but that also gives us an opportunity to consider work that would result in breaking API changes. A strong candidate for V5 right now is finally adding long delayed first class support for multi-tenancy through separate databases.
Let’s say that you’re building an online database-backed, web application of some sort that will be servicing multiple clients. At a minimum, you need to isolate data access so that client users can only interact with the data for the correct client or clients. Ideally, you’d like to get away with only having one deployed instance of your application that services the users of all the clients. In other words, you want to support “multi-tenancy” in your architecture.
For the rest of this post, I’m going to use the term “tenant” to refer to whatever the organizational entity is that owns separate database data. Depending on your business domain, that could be a client, a sub-organization, a geographic area, or some other organizational concept.
There are three basic approaches to segregating tenant data in a database:
Single database, single schema, but use a field or property in each table to denote the tenant. This is Marten’s approach today with what we call the “Conjoined” model. The challenge here is that all queries and writes to the database need to take into account the currently used tenant — and that’s where Marten’s multi-tenancy support helps a great deal. Database schema management is easier with this approach because there’s only one set of database objects to worry about. More on this later.
Separate schema per tenant in a single database. Marten does not support this model, and it doesn’t play well with Marten’s current internal design. I seriously doubt that Marten will ever support this.
Separate database per tenant. This has been in Marten’s backlog forever, and maybe now is the time this finally gets done (plenty of folks have used Marten this way already with custom infrastructure on top of Marten, but there’s some significant overhead). I’ll speak to this much more in the last section of this post.
Basic Multi-Tenancy Support in Marten
To enable multi-tenancy in your document storage with Marten, we can configure a document store with these options:
var store = DocumentStore.For(opts =>
{
opts.Connection("some connection string");
// Let's just say that each and every document
// type is going to be multi-tenanted
opts.Policies.AllDocumentsAreMultiTenanted();
// Or you can do this document type by document type
// if some document types are not related to a tenant
opts.Schema.For<User>().MultiTenanted();
});
There’s a couple other ways to opt document types into multi-tenancy, but you get the point. With just this, we can start a new Marten session for a particular tenant and carry out basic operations isolated to a single tenant like so:
// Open a session specifically for the tenant "tenant1"
await using var session = store.LightweightSession("tenant1");
// This would return *only* the admin users from "tenant1"
var users = await session.Query<User>().Where(x => x.Roles.Contains("admin"))
.ToListAsync();
// This user would automatically be tagged as belonging to "tenant1"
var user = new User {UserName = "important_guy", Roles = new string[] {"admin"}};
session.Store(user);
await session.SaveChangesAsync();
The key thing to note here is that other than telling Marten which tenant you want to work with as you open a new session, you don’t have to do anything else to keep the tenant data segregated as Marten is dealing with those mechanics behind the scenes on all queries, inserts, updates, and deletions from that session.
Awesome, except that some folks needed to occasionally do operations against multiple tenants at one time…
Tenant Spanning Operations in Marten V4
The big improvement in Marten V4 for multi-tenancy was making it much easier to work with data from multiple tenants in one document session. Marten has long had the ability to query data across tenants with the AnyTenant() or TenantIsOneOf() Linq extensions, like so:
var allAdmins = await session.Query<User>()
.Where(x => x.Roles.Contains("admin"))
// This is a Marten specific extension to Linq
// querying
.Where(x => x.AnyTenant())
.ToListAsync();
Which is great for what it is, but there wasn’t any way to know what tenant each document returned belonged to. We made a huge effort in V4 to expand Marten’s document metadata capabilities, and part of that is the ability to write the tenant id to a document being fetched from the database by Marten. The easiest way to do that is to have your document type implement the new ITenanted interface like so:
public class MyTenantedDoc: ITenanted
{
public Guid Id { get; set; }
// This property will be set by Marten itself
// when the document is persisted or loaded
// from the database
public string TenantId { get; set; }
}
So now we at least have the ability to know which documents we queried across the tenants belong to which tenant.
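For instance, here's a small hedged sketch that combines a cross-tenant query with the new metadata, assuming the User document from the earlier samples also implements ITenanted:
// Query admin users across every tenant...
var allAdmins = await session.Query<User>()
    .Where(x => x.Roles.Contains("admin"))
    .Where(x => x.AnyTenant())
    .ToListAsync();

// ...then use the TenantId metadata that Marten wrote onto each document
// to bucket the results by tenant
var adminsByTenant = allAdmins
    .GroupBy(x => x.TenantId)
    .ToDictionary(g => g.Key, g => g.ToList());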
The next thing folks wanted from V4 was the ability to make writes against multiple tenants with one document session in a single unit of work. To that end, Marten V4 introduced the ITenantOperations concept to record operations against specific tenants other than the tenant the current session was opened for, with all of those operations committed to the underlying Postgresql database as a single transaction.
To make that concrete, here's some sample code, but this time adding two new User documents with the same user name to two different tenants by tenant id:
// Same user name, but in different tenants
var user1 = new User {UserName = "bob"};
var user2 = new User {UserName = "bob"};
// This exposes operations against only tenant1
session.ForTenant("tenant1").Store(user1);
// This exposes operations that would apply to
// only tenant2
session.ForTenant("tenant2").Store(user2);
// And both operations get persisted in one transaction
await session.SaveChangesAsync();
So that’s the gist of the V4 multi-tenancy improvements. We also finally support multi-tenancy within the asynchronous projection support, but I’ll blog about that some other time.
Now though, it’s time to consider…
Database per Tenant
To be clear, I'm looking for any possible feedback about the requirements for this feature in Marten. Blast away in the comments here, on the GitHub issue for it, or on Gitter.
You can achieve multi-tenancy through a database per tenant today, and many folks have, just by keeping an otherwise identically configured DocumentStore per named tenant in memory, with the only difference being the connection string (see the naive sketch after this list). That certainly can work, especially with a low number of tenants. There are a few problems with that approach though:
You’re on your own to configure that in the DI container within your application
DocumentStore is a relatively expensive object to create, and it potentially generates a lot of runtime objects that get held in memory. You don’t really want a bunch of those hanging around
Going around AddMarten() negates the Marten CLI support, which is the easiest possible way to manage Marten database schema migrations. Now you’re completely on your own about how to do database migrations without using pure runtime database patching — which we do not recommend in production.
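To illustrate what that hand-rolled approach tends to look like, here's a naive, purely illustrative sketch; none of the type or method names below are Marten APIs beyond DocumentStore and IDocumentSession themselves:
using System;
using System.Collections.Concurrent;
using Marten;

// Hypothetical registry: one DocumentStore per tenant, differing only by connection string
public class NaiveTenantStores : IDisposable
{
    private readonly ConcurrentDictionary<string, IDocumentStore> _stores = new();
    private readonly Func<string, string> _connectionStringFor; // hypothetical lookup

    public NaiveTenantStores(Func<string, string> connectionStringFor)
        => _connectionStringFor = connectionStringFor;

    public IDocumentSession OpenSession(string tenantId)
    {
        // Each tenant pays the (expensive) DocumentStore build cost on first use,
        // and every store holds onto its own generated runtime objects
        var store = _stores.GetOrAdd(tenantId, id =>
            DocumentStore.For(opts => opts.Connection(_connectionStringFor(id))));

        return store.LightweightSession();
    }

    public void Dispose()
    {
        foreach (var store in _stores.Values) store.Dispose();
    }
}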
So let's just call it a given that we do want to add some formal support for multi-tenancy through separate databases per tenant to Marten. Moreover, database per tenant has been in our backlog forever, but it's been pushed off every time as we've struggled to get the big Marten releases out.
I think there’s some potential for this story to cause breaking API changes (I don’t have anything specific in mind, it’s just likely in my opinion), so that makes that story a very good candidate to get in place for Marten V5. From the backlog issue writeup I made back in 2017:
Have all tenants tracked in memory, such that a single DocumentStore can share all the expensive runtime built internal objects across tenants
A tenanting strategy that can look up the database connection string per tenant and create sessions for separate tenants (a hypothetical shape for that lookup is sketched after this list). There's actually an interface hook in Marten all ready to go that may serve out of the box when we do this (I meant to do this work years ago, but it just didn't happen).
At development time (AutoCreate != AutoCreate.None), be able to spin up a new database on the fly for a tenant if it doesn’t already exist
“Know” what all the existing tenants are so that we could apply database migrations from the CLI or through the DocumentStore schema migration APIs
Extend the CLI support to support multiple tenant databases
Make the database registry mechanism a little bit pluggable. I'm thinking that some folks will have only a few tenants and would be fine just writing everything into a static configuration file, while other folks may have a *lot* of tenants (I've personally worked on a system that had >100 separate tenant databases in one deployed application), so they may want a "master" database that knows about all the tenant databases.
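As a purely hypothetical illustration of the per-tenant lookup idea from the list above (this is not Marten's actual interface hook, just the shape of the problem):
using System.Collections.Generic;

// Hypothetical naming only; not a Marten API
public interface ITenantDatabaseLookup
{
    // Resolve the connection string for a single tenant
    string ConnectionStringFor(string tenantId);

    // "Know" what all the existing tenants are, e.g. so database
    // migrations can be applied from the CLI across every tenant database
    IReadOnlyList<string> AllKnownTenantIds();
}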
I'm going to have to admit that I got caught flat footed by the .Net 6 release a couple weeks ago. I hadn't really been paying much attention to the forthcoming changes, maybe got cocky by how easy the transition from netcoreapp3.1 to .Net 5 was, and have been unpleasantly surprised by how much work it's going to take to move some OSS projects up to .Net 6. All at the same time that the early adopters of the world are clamoring for all their dependencies to target .Net 6 yesterday.
All that being said, here’s my running list of plans to get the projects in the JasperFx GitHub organization successfully targeting .Net 6. I’ll make edits to this page as things get published to Nuget.
Baseline
Baseline is a grab bag utility library full of extension methods that I’ve relied on for years. Nobody uses it directly per se, but it’s a dependency of just about every other project in the organization, so it went first with the 3.2.2 release adding a .Net 6 target. No code changes were necessary other than adding .Net 6 to the CI testing. Easy money.
Oakton
EDIT: Oakton v4.0 is up on Nuget. WebApplication is supported, but you can’t override configuration in commands with this model like you can w/ HostBuilder only. I’ll do a follow up at some point to fill in this gap.
Oakton is a tool to add extensible command line options to .Net applications based on the HostBuilder model. Oakton is my problem child right now because it’s a dependency in several other projects and its current model does not play nicely with the new WebApplicationBuilder approach for configuring .Net 6 applications. I’d also like to get the Oakton documentation website moved to the VitePress + MarkdownSnippets model we’re using now for Marten and some of the other JasperFx projects. I think I’ll take a shortcut here and publish the Nuget and let the documentation catch up later.
Alba
Alba is an automated testing helper for ASP.Net Core. Just like Oakton, Alba worked very well with the HostBuilder model, but was thrown for a loop by the new WebApplicationBuilder configuration model that's the mechanism for using the new Minimal API (*cough* inevitable Sinatra copy *cough*) model. Fortunately though, Hawxy came through with a big pull request to make Alba finally work with the WebApplicationFactory model that can accommodate the new WebApplicationBuilder model, so we're back in business. Alba 5.1 will be published soon with that work after some documentation updates and hopefully some testing of the Oakton + WebApplicationBuilder + Alba combination.
EDIT: Alba 6.0 is up with the necessary changes, but the docs will come later this week
Lamar
Lamar is an IoC/DI container and the modern successor to StructureMap. The biggest issues with Lamar on .Net 6 were Nuget dependencies on the IServiceCollection model, plus needing some extra implementation to light up the implied service model of Minimal APIs. All the current unit tests and even the integration tests with ASP.Net Core are passing on .Net 6. What's left to finish up a new Lamar 7.0 release:
One .Net 6 related bug in the diagnostics
Better Minimal API support
Upgrade Oakton & Baseline dependencies in some of the Lamar projects
Documentation updates for the new IAsyncDisposable support and usage with WebApplicationBuilder with or without Minimal API usage
EDIT: Lamar 7.0 is up on Nuget with .Net 6 support
Marten/Weasel
We just made the gigantic V4 release a couple months ago knowing that we’d have to follow up quickly with a V5 release with a few breaking changes to accommodate .Net 6 and the latest version of Npgsql. We are having to make a full point release, so that opens the door for other breaking changes that didn’t make it into V4 (don’t worry, I think shifting from V4 to V5 will be easy for most people). The other Marten core team members have been doing most of the work for this so far, but I’m going to jump into the fray later this week to do some last minute changes:
Review some internal changes to Npgsql that might have performance impacts on Marten
Consider adding an event streaming model within the new V4 async daemon. For folks that wanna use that to publish events to some kind of transport (Kafka? Some kind of queue?) with strict ordering. This won’t be much yet, but it keeps coming up so we might as well consider it.
Multi-tenancy through multiple databases. It keeps coming up, and potentially causes breaking API changes, so we’re at least going to explore it
I’m trying not to slow down the Marten V5 release with .Net 6 support for too long, so this is all either happening really fast or not at all. I’ll blog more later this week about multi-tenancy & Marten.
Weasel is a spin off library from Marten for database change detection and ADO.Net helpers that are reused in other projects now. It will be published simultaneously with Marten.
Jasper
Oh man, I’d love, love, love to have Jasper 2.0 done by early January so that it’ll be available for usage at my company on some upcoming work. This work is on hold while I deal with the other projects, my actual day job, and family and stuff.