Batch Querying with Marten

Before I talk about the batch querying feature set in Marten, let’s take a little detour through a common approach to persistence in .Net architectures that commonly causes the exact problem that Marten’s batch querying seeks to solve.

I’ve been in several online debates lately about the wisdom or applicability of granular repository abstractions over inner persistence infrastructure like EF Core or Marten like this sample below:

    public interface IRepository<T>
    {
        Task<T> Load(Guid id, CancellationToken token = default);
        Task Insert(T entity, CancellationToken token = default);
        Task Update(T entity, CancellationToken token = default);
        Task Delete(T entity, CancellationToken token = default);

        IQueryable<T> Query();
    }

That’s a pretty common approach, and I’m sure it’s working out for some people in at least simpler CRUD-centric applications. Unfortunately, that reliance on fine-grained repositories breaks down badly in more complicated systems where a single logical operation may need to span multiple entity types. Not coincidentally, I have frequently seen this kind of fine-grained abstraction directly lead to performance problems in the systems I’ve helped with after their original construction over the past 6-8 years.

For an example, let’s say that we have a message handler that will need to access and modify data from three different entity types in one logical transaction. Using the fine grained repository strategy, we’d have something like this:

    public class SomeMessage
    {
        public Guid UserId { get; set; }
        public Guid OrderId { get; set; }
        public Guid AccountId { get; set; }
    }

    public class Handler
    {
        private readonly IUnitOfWork _unitOfWork;
        private readonly IRepository<Account> _accounts;
        private readonly IRepository<User> _users;
        private readonly IRepository<Order> _orders;

        public Handler(
            IUnitOfWork unitOfWork,
            IRepository<Account> accounts,
            IRepository<User> users,
            IRepository<Order> orders)
        {
            _unitOfWork = unitOfWork;
            _accounts = accounts;
            _users = users;
            _orders = orders;
        }

        public async Task Handle(SomeMessage message)
        {
            // The potential performance problem is right here.
            // Multiple round trips to the database
            var user = await _users.Load(message.UserId);
            var account = await _accounts.Load(message.AccountId);
            var order = await _orders.Load(message.OrderId);

            var otherOrders = await _orders.Query()
                .Where(x => x.Amount > 100)
                .ToListAsync();

            // Carry out rules and whatnot

            await _unitOfWork.Commit();
        }
    }

So here’s the problem with the code up above as I see it:

  1. You’re having to inject separate dependencies for the matching repository type for each entity type, and that adds ceremony and noise to the code.
  2. The code is making repeated round trips to the database server every time it needs more data. This is a contrived example, and it’s only 4 trips, but in real systems this could easily be many more. To make this perfectly clear, one of the most pernicious sources of slow code is chattiness (frequent network round trips) between the application layer and the backing database.

Fortunately, Marten has a facility called batch querying that we can use to fetch multiple data queries at one time, and even start processing against the earlier results while the later results are still being read. To use that, we’ve got to ditch the “one size fits all, least common denominator” repository abstraction and use the raw Marten IDocumentSession service as shown in this version below:

    public class MartenHandler
    {
        private readonly IDocumentSession _session;

        public MartenHandler(IDocumentSession session)
        {
            _session = session;
        }

        public async Task Handle(SomeMessage message)
        {
            // Not gonna lie, this is more code than the first alternative
            var batch = _session.CreateBatchQuery();

            var userLookup = batch.Load<User>(message.UserId);
            var accountLookup = batch.Load<Account>(message.AccountId);
            var orderLookup = batch.Load<Order>(message.OrderId);
            var otherOrdersLookup = batch.Query<Order>().Where(x => x.Amount > 100).ToList();

            await batch.Execute();

            // We can immediately start using the data from earlier
            // queries in memory while the later queries are still processing
            // in the background for a little bit of parallelization
            var user = await userLookup;
            var account = await accountLookup;
            var order = await orderLookup;

            var otherOrders = await otherOrdersLookup;

            // Carry out rules and whatnot

            // Commit any outstanding changes with Marten
            await _session.SaveChangesAsync();
        }
    }

The code above creates a single, batched query for the four queries this handler needs, meaning that Marten makes just one round trip to the database to execute all four SELECT statements. As an improvement in the Marten V4 release, the results coming back from Postgresql are processed in a background Task, meaning that in the code above we can start working with the initial Account, User, and Order data while Marten is still building out the last Order results (remember that Marten has to deserialize JSON data to build out your documents, and that can be non-trivial for large documents).

I think these are the takeaways for the before and after code here:

  1. Network round trips are expensive and chattiness can be a performance bottleneck, but batch querying approaches like Marten’s can help a great deal.
  2. Putting your persistence tooling behind least common denominator abstractions like the IRepository<T> approach shown above eliminates the ability to use the advanced features of your actual persistence tooling. That’s a serious drawback, as it disallows exactly the features that allow you to create high performance solutions — and this isn’t specific to using Marten as your backing persistence tooling.
  3. Writing highly performant code can easily mean writing more code, as you saw above with the batch querying. The point being: don’t automatically opt for the most performant approach if it’s unnecessary and more complex than a slower but simpler alternative. Premature optimization and all that.

I’m only showing a small fraction of what the batch query supports, so certainly check out the documentation for more examples.

My professional and OSS aspirations for 2022

I trot out one of these posts at the beginning of each year, but this time around it’s “aspirations” instead of “plans” because a whole lot of this is going to be a repeat from 2020 and 2021, and I’m not going to lose any sleep over what doesn’t get done in the new year, nor close myself off to brand new opportunities.

In 2022 I just want the chance to interact with other developers. I’ll be at ThatConference in Round Rock, TX speaking about Event Sourcing with Marten (my first in person conference since late 2019). Other than that, my only goal for the year (Covid-willing) is to maybe speak at a couple more in person conferences just to be able to interact with other developers in real space again.

My peak as a technical blogger was the late aughts, and I think I’m mostly good with not sweating any kind of attempt to regain that level of readership. I do plan to write material that I think would be useful for my shop, or just about what I’m doing in the OSS space when I feel like it.

Which brings me to the main part of this post, my involvement with the JasperFx (Marten, Lamar, etc.) family of OSS projects (plus Storyteller), which takes up most of my extracurricular software related time. Just for an idea of the interdependencies, here are the highlights of the JasperFx world:

.NET Transactional Document DB and Event Store on PostgreSQL

Marten took a big leap forward late in 2021 with the long running V4.0 release. I think that release might have been the single biggest, most complicated OSS release that I’ve ever been a part of — FubuMVC 1.0 notwithstanding. There’s also a 5.0-alpha release out that addresses .Net 6 support and the latest version of Npgsql.

Right now Marten is a victim of its own success, and our chat room is almost constantly hair on fire with activity, which directly led to some planned improvements for V5 (hopefully by the end of January?) in this discussion thread:

  • Multi-tenancy through a separate database per tenant (long planned, long delayed, finally happening now)
  • Some kind of ability to register and resolve services for more than one Marten database in a single application
  • And related to the previous two bullet points, improved database versioning and schema migrations that could accommodate there being more than one database within a single .Net codebase
  • Improve the “generate ahead” model to make it easier to adopt. Think faster cold start times for systems that use Marten

Beyond that, some of the things I’d like to maybe do with Marten this year are:

  • Investigate the usage of Postgresql table partitioning and database sharding as a way to increase scalability — especially with the event sourcing support
  • Projection snapshotting
  • In conjunction with Jasper, expand Marten’s asynchronous projection support to shard projection work across multiple running nodes, introduce some sort of optimized, no downtime projection rebuilds, and add some options for event streaming with Marten and Kafka or Pulsar
  • Try to build an efficient GraphQL adapter for Marten. And by efficient, I mean that you wouldn’t have to bounce through a Linq translation first and hopefully could opt into Marten’s JSON streaming wherever possible. This isn’t likely, but sounds kind of interesting to play with.

In a perfect, magic, unicorns and rainbows world, I’d love to see the Marten backlog in GitHub get under 50 items and stay there permanently. Commence laughing at me on that one :(

Jasper is a toolkit for common messaging scenarios between .Net applications with a robust in process command runner that can be used either with or without the messaging.

I started working on rebooting Jasper with a forthcoming V2 version late last year, and made quite a bit of progress before Marten got busy and the .Net 6 release necessitated other work. There’s a non-zero chance I will be using Jasper at work, which makes that a much more viable project. I’m currently in flight with:

  • Building Open Telemetry tracing directly into Jasper
  • Bi-directional compatibility with MassTransit applications (absolutely necessary to adopt this in my own shop).
  • Performance optimizations
  • .Net 6 support
  • Documentation overhaul
  • Kafka as a message transport option (Pulsar was surprisingly easy to add, and I’m hopeful that Kafka is similar)

And maybe, just maybe, I might extend Jasper’s somewhat unique middleware approach to web services utilizing the new ASP.Net Core Minimal API support. The idea there is to more or less create an improved version of the old FubuMVC idiom for building web services.

Lamar is a modern IoC container and the successor to StructureMap

I don’t have any real plans for Lamar in the new year, but there are some holes in the documentation, and a couple of advanced features could sure use some additional examples. 2021 ended up being busy for Lamar though with:

  1. Lamar v6 added interception (finally), a new documentation website, and a facility for overriding services at test time
  2. Lamar v7 added support for IAsyncEnumerable (also finally), a small enhancement for the Minimal API feature in ASP.Net Core, and .Net 6 support

Add Robust Command Line Options to .Net Applications

Oakton did have a major v4/4.1 release to accommodate .Net 6 and ASP.Net Core Minimal API usage late in 2021, but I have yet to update the documentation. I would like to shift Oakton’s documentation website to VitePress first. The only plan I have for Oakton this year is to maybe see if there’d be a good way for Oakton to enable “buddy” command line tools for your application, like the dotnet ef tool, using the HostFactoryResolver class.

The bustling metropolis of Alba, MO

Alba is a wrapper around the ASP.Net Core TestServer for declarative, in process testing of ASP.Net Core web services. I don’t have any plans for Alba in the new year other than to respond to any issues, or to smooth out any rough edges that come up from my shop’s usage of Alba.

Alba did get a couple major releases in 2021 though:

  1. Alba 5.0 streamlined the entry API to mimic IHost, converted the documentation website to VitePress, and introduced new facilities for dealing with security in testing.
  2. Alba 6.0 added support for WebApplicationFactory and ASP.Net Core 6

Solutions for creating robust, human readable acceptance tests for your .Net or CoreCLR system and a means to create “living” technical documentation.

Storyteller has been mothballed for years, and I was ready to abandon it last year, but…

We still use Storyteller for some big, long running integration style tests in both Marten and Jasper where I don’t think xUnit/NUnit is a good fit, and I think maybe I’d like to reboot Storyteller later this year. The “new” Storyteller (I’m playing with the idea of calling it “Bobcat” as it might be a different tool) would be quite a bit smaller and much more focused on enabling integration testing rather than trying to be a BDD tool.

Not sure what the approach might be, it could be:

  • “Just” write some extension helpers to xUnit or NUnit for more data intensive tests
  • “Just” write some extension helpers to SpecFlow
  • Rebuild the current Storyteller concept, but also support a Gherkin model
  • Something else altogether?

My goal, if this happens, is to have a tool for automated testing that maybe:

  • Supports much more data intensive tests
  • Handles integration tests better
  • Has strong support for test parallelization and even test run sharding in CI
  • Could help write characterization tests with a record/replay kind of model against existing systems (I’d *love* to have this at work)
  • Has some kind of model that is easy to use within an IDE like Rider or VS, even if there is a separate UI like Storyteller has today

And I’d still like to rewrite a subset of the existing Storyteller UI as an excuse to refresh my front end technology skillset.

To be honest, I don’t feel like Storyteller has ever been much of a success, but it’s the OSS project of mine that I’ve most enjoyed working on and most frequently used myself.

Weasel

Weasel is a set of libraries for database schema migrations and ADO.Net helpers that we spun out of Marten during its V4 release. I’m not super excited about doing this, but Weasel is getting some sort of database migration support very soon. Weasel isn’t documented itself yet, so that’s the only major plan other than supporting whatever Marten and/or Jasper needs this year.

Baseline

Baseline is a grab bag of helpers and extension methods that dates back to the early FubuMVC project. I haven’t done much with Baseline in years, and it might be time to prune it a little bit as some of what Baseline does is now supported in the .Net framework itself. The file system helpers especially could be pruned down, but then also get asynchronous versions of what’s left.

StructureMap

I don’t think that I got a single StructureMap question last year and stopped following its Gitter room. There are still plenty of systems using StructureMap out there, but I think the mass migration to either Lamar or another DI container is well underway.

Marten’s Compiled Query Feature

TL;DR: Marten’s compiled query feature makes using Linq queries significantly more efficient at runtime if you need to wring out just a little more performance in your Marten-backed application.

I was involved in a twitter conversation today that touched on the old Specification pattern of describing a reusable database query with an object (watch it, that word is overloaded in the software development world and even refers to separate design patterns). I mentioned that Marten actually has an implementation of this pattern that we call Compiled Queries.

Jumping right into a concrete example, let’s say that we’re building an issue tracking system because we hate Jira so much that we’d rather build one completely from scratch. At some point you’re going to want to query for all open issues currently assigned to a user. Assuming our new Marten-backed issue tracker has a document type called Issue, a compiled query class for that would look like this:

    // ICompiledListQuery<T> is from Marten
    public class OpenIssuesAssignedToUser: ICompiledListQuery<Issue>
    {
        public Expression<Func<IMartenQueryable<Issue>, IEnumerable<Issue>>> QueryIs()
        {
            return q => q
                .Where(x => x.AssigneeId == UserId)
                .Where(x => x.Status == "Open");
        }
        // This is an input parameter to the query
        public Guid UserId { get; set; }
    }

And now in usage, we’ll just spin up a new instance of the OpenIssuesAssignedToUser to query for the open issues for a given user id like this:

    var store = DocumentStore.For(opts =>
    {
        opts.Connection("some connection string");
    });

    await using var session = store.QuerySession();

    var issues = await session.QueryAsync(new OpenIssuesAssignedToUser
    {
        UserId = userId // passing in the query parameter to a known user id
    });
    
    // do whatever with the issues

Other than the weird method signature of the QueryIs() method, that class is pretty simple if you’re comfortable with Marten’s superset of Linq. Compiled queries can be valuable anywhere where the old Specification (query objects) pattern is useful, but here’s the cool part…

Compiled Queries are Faster

Linq has been an awesome addition to the .Net ecosystem, and it’s usually the very first thing I mention when someone asks me why they should consider .Net over Java or any other programming ecosystem. On the down side though, it’s complicated as hell, there’s some overhead to generating and parsing Linq queries at runtime, and most .Net developers don’t actually understand how it works internally under the covers.

The best part of the compiled query feature in Marten is that on the first usage of a compiled query type, Marten memoizes its “query plan” for the represented Linq query so there’s significantly less overhead for subsequent usages of the same compiled query type within the same application instance.

To illustrate what’s happening when you issue a Linq query, consider the same logical query as above, but this time in inline Linq:


    var issues = await session.Query<Issue>()
        .Where(x => x.AssigneeId == userId)
        .Where(x => x.Status == "Open")
        .ToListAsync();

    // do whatever with the issues

When the Query() code above is executed, Marten is:

  1. Building an entire object model in memory using the .Net Expression model.
  2. Parsing and interpreting that Expression object model with a series of internal Visitor types (Linq itself never executes any of the code within Where() or Select() clauses).
  3. Building a corresponding, internal IQueryHandler object from the visited Expression model that “knows” how to generate the SQL for the query, how to process the resulting rows returned by the database, and how to coerce the raw data into the desired results (JSON deserialization, stashing things in identity maps or dirty checking records, etc.).
  4. Executing the IQueryHandler, which in turn writes the desired SQL query to the outgoing database command.
  5. Making the actual call to the underlying Postgresql database to return a data reader.
  6. Interpreting the data reader and coercing the raw records into the desired results for the Linq query.

Sounds kind of heavyweight when you list it all out. When we move the same query to a compiled query, we only have to incur the cost of parsing the Linq query Expression model once, and Marten “remembers” the exact SQL statement, how to map query inputs like OpenIssuesAssignedToUser.UserId to the right database command parameter, and even how to process the raw database results. Behind the scenes, Marten is generating and compiling a new class at runtime to execute the OpenIssuesAssignedToUser query like this (I reformatted the generated source code just a little bit here):

using System.Collections.Generic;
using Marten.Internal;
using Marten.Internal.CompiledQueries;
using Marten.Linq;
using Marten.Linq.QueryHandlers;
using Marten.Testing.Documents;
using NpgsqlTypes;
using Weasel.Postgresql;

namespace Marten.Testing.Internals.Compiled
{
    public class
        OpenIssuesAssignedToUserCompiledQuery: ClonedCompiledQuery<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _inner;
        private readonly OpenIssuesAssignedToUser _query;
        private readonly QueryStatistics _statistics;

        public OpenIssuesAssignedToUserCompiledQuery(IMaybeStatefulHandler inner, OpenIssuesAssignedToUser query,
            QueryStatistics statistics, HardCodedParameters hardcoded): base(inner, query, statistics, hardcoded)
        {
            _inner = inner;
            _query = query;
            _statistics = statistics;
            _hardcoded = hardcoded;
        }


        public override void ConfigureCommand(CommandBuilder builder, IMartenSession session)
        {
            var parameters = builder.AppendWithParameters(
                @"select d.id, d.data from public.mt_doc_issue as d where (CAST(d.data ->> 'AssigneeId' as uuid) = ? and  d.data ->> 'Status' = ?)");

            parameters[0].NpgsqlDbType = NpgsqlDbType.Uuid;
            parameters[0].Value = _query.UserId;
            _hardcoded.Apply(parameters);
        }
    }

    public class
        OpenIssuesAssignedToUserCompiledQuerySource: CompiledQuerySource<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _maybeStatefulHandler;

        public OpenIssuesAssignedToUserCompiledQuerySource(HardCodedParameters hardcoded,
            IMaybeStatefulHandler maybeStatefulHandler)
        {
            _hardcoded = hardcoded;
            _maybeStatefulHandler = maybeStatefulHandler;
        }


        public override IQueryHandler<IEnumerable<Issue>> BuildHandler(OpenIssuesAssignedToUser query,
            IMartenSession session)
        {
            return new OpenIssuesAssignedToUserCompiledQuery(_maybeStatefulHandler, query, null, _hardcoded);
        }
    }
}

What else can compiled queries do?

Besides being faster than raw Linq and being useful as the old reliable Specification pattern, compiled queries can be very valuable if you absolutely insist on mocking or stubbing the Marten IQuerySession/IDocumentSession. You should never, ever try to mock or stub the IQueryable interface with a dynamic mock library like NSubstitute or Moq, but mocking the IQuerySession.Query<T>(T query) method is pretty straightforward.
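
Just to sketch that out, here’s roughly what stubbing a compiled query might look like with NSubstitute. This is a minimal sketch, not a recommendation of any particular mocking library, and it assumes the QueryAsync(compiled query) overload used earlier in this post along with the Issue and OpenIssuesAssignedToUser types from the example above:

    // Using NSubstitute, plus the Issue and OpenIssuesAssignedToUser
    // types from earlier in this post
    var session = Substitute.For<IQuerySession>();

    var cannedIssues = new List<Issue>
    {
        new Issue { Id = Guid.NewGuid(), Status = "Open" }
    };

    // Stub the compiled query execution. Any OpenIssuesAssignedToUser
    // instance passed to the session will return the canned data
    session.QueryAsync(Arg.Any<OpenIssuesAssignedToUser>(), Arg.Any<CancellationToken>())
        .Returns(cannedIssues);

    // Whatever code under test takes in an IQuerySession can now execute
    // the compiled query without ever touching a real database
    var issues = await session
        .QueryAsync(new OpenIssuesAssignedToUser { UserId = Guid.NewGuid() });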

Most of the Linq support in Marten is usable within compiled queries — even the Include() feature for querying related document types in one round trip. There’s even an ability to “stream” the raw JSON byte array data from compiled query results directly to the HTTP response body in ASP.Net Core for Marten’s “ludicrous speed” mode.

Multi-Tenancy with Marten

We’ve got an upcoming Marten 5.0 release ostensibly to support breaking changes related to .Net 6, but that also gives us an opportunity to consider work that would result in breaking API changes. A strong candidate for V5 right now is finally adding long delayed first class support for multi-tenancy through separate databases.

Let’s say that you’re building an online database-backed, web application of some sort that will be servicing multiple clients. At a minimum, you need to isolate data access so that client users can only interact with the data for the correct client or clients. Ideally, you’d like to get away with only having one deployed instance of your application that services the users of all the clients. In other words, you want to support “multi-tenancy” in your architecture.

Software multitenancy is a software architecture in which a single instance of software runs on a server and serves multiple tenants.

Multi-tenancy on Wikipedia

For the rest of this post, I’m going to use the term “tenant” to refer to whatever the organizational entity is that owns separate database data. Depending on your business domain, that could be a client, a sub-organization, a geographic area, or some other organizational concept.

Fortunately, if you use Marten as your backing database store, Marten has strong support for multi-tenancy with new improvements in the recent V4 release and more potential improvements tentatively planned for V5.

There are three basic approaches to segregating tenant data in a database:

  1. Single database, single schema, but use a field or property in each table to denote the tenant. This is Marten’s approach today with what we call the “Conjoined” model. The challenge here is that all queries and writes to the database need to take into account the currently used tenant — and that’s where Marten’s multi-tenancy support helps a great deal. Database schema management is easier with this approach because there’s only one set of database objects to worry about. More on this later.
  2. Separate schema per tenant in a single database. Marten does not support this model, and it doesn’t play well with Marten’s current internal design. I seriously doubt that Marten will ever support this.
  3. Separate database per tenant. This has been in Marten’s backlog forever, and maybe now is the time this finally gets done (plenty of folks have used Marten this way already with custom infrastructure on top of Marten, but there’s some significant overhead). I’ll speak to this much more in the last section of this post.

Basic Multi-Tenancy Support in Marten

To set up multi-tenancy in your document storage with Marten, we can set up a document store with these options:

    var store = DocumentStore.For(opts =>
    {
        opts.Connection("some connection string");

        // Let's just say that each and every document
        // type is going to be multi-tenanted
        opts.Policies.AllDocumentsAreMultiTenanted();

        // Or you can do this document type by document type
        // if some document types are not related to a tenant
        opts.Schema.For<User>().MultiTenanted();
    });

There’s a couple other ways to opt document types into multi-tenancy, but you get the point. With just this, we can start a new Marten session for a particular tenant and carry out basic operations isolated to a single tenant like so:

    // Open a session specifically for the tenant "tenant1"
    await using var session = store.LightweightSession("tenant1");

    // This would return *only* the admin users from "tenant1"
    var users = await session.Query<User>().Where(x => x.Roles.Contains("admin"))
        .ToListAsync();

    // This user would automatically be tagged as belonging to "tenant1" 
    var user = new User {UserName = "important_guy", Roles = new string[] {"admin"}};
    session.Store(user);

    await session.SaveChangesAsync();

The key thing to note here is that other than telling Marten which tenant you want to work with as you open a new session, you don’t have to do anything else to keep the tenant data segregated as Marten is dealing with those mechanics behind the scenes on all queries, inserts, updates, and deletions from that session.

Awesome, except that some folks needed to occasionally do operations against multiple tenants at one time…

Tenant Spanning Operations in Marten V4

The big improvements in Marten V4 for multi-tenancy were in making it much easier to work with data from multiple tenants in one document session. Marten has long had the ability to query data across tenants with the AnyTenant() or TenantIsOneOf() Linq extensions like so:

    var allAdmins = await session.Query<User>()
        .Where(x => x.Roles.Contains("admin"))
        
        // This is a Marten specific extension to Linq
        // querying
        .Where(x => x.AnyTenant())
        
        .ToListAsync();

Which is great for what it is, but there wasn’t any way to know what tenant each document returned belonged to. We made a huge effort in V4 to expand Marten’s document metadata capabilities, and part of that is the ability to write the tenant id to a document being fetched from the database by Marten. The easiest way to do that is to have your document type implement the new ITenanted interface like so:

    public class MyTenantedDoc: ITenanted
    {
        public Guid Id { get; set; }
        
        // This property will be set by Marten itself
        // when the document is persisted or loaded
        // from the database
        public string TenantId { get; set; }
    }

So now we at least have the ability to know which tenant each of the documents we queried across tenants belongs to.

The next thing folks wanted from V4 was the ability to make writes against multiple tenants with one single document session in a single unit of work. To that end, Marten V4 introduced the concept of ITenantOperations to log operations against a specific tenant other than the tenant the current session was opened for, with all of those operations committed to the underlying Postgresql database as a single transaction.

To make that concrete, here’s some sample code, but this time adding two new User documents with the same user name to two different tenants by tenant id:

    // Same user name, but in different tenants
    var user1 = new User {UserName = "bob"};
    var user2 = new User {UserName = "bob"};

    // This exposes operations against only tenant1
    session.ForTenant("tenant1").Store(user1);

    // This exposes operations that would apply to 
    // only tenant2
    session.ForTenant("tenant2").Store(user2);
 
    // And both operations get persisted in one transaction
    await session.SaveChangesAsync();

So that’s the gist of the V4 multi-tenancy improvements. We also finally support multi-tenancy within the asynchronous projection support, but I’ll blog about that some other time.

Now though, it’s time to consider…

Database per Tenant

To be clear, I’m looking for any possible feedback about the requirements for this feature in Marten. Blast away here in comments, or here’s a link to the GitHub issue, or go to Gitter.

You can already achieve multi-tenancy through a database per tenant — and many folks have successfully done so — by just keeping an otherwise identically configured DocumentStore per named tenant in memory, with the only difference being the connection string. That certainly can work, especially with a low number of tenants, and there’s a minimal sketch of that roll-your-own approach after the list below. There are a few problems with that approach though:

  • You’re on your own to configure that in the DI container within your application
  • DocumentStore is a relatively expensive object to create, and it potentially generates a lot of runtime objects that get held in memory. You don’t really want a bunch of those hanging around
  • Going around AddMarten() negates the Marten CLI support, which is the easiest possible way to manage Marten database schema migrations. Now you’re completely on your own about how to do database migrations without using pure runtime database patching — which we do not recommend in production.
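
For reference, here’s a minimal sketch of that roll-your-own, store-per-tenant approach. The TenantStoreRegistry class and the tenant-to-connection-string lookup are purely hypothetical; the point is just that each tenant ends up with its own DocumentStore that differs only by connection string:

    // A minimal sketch only -- assumes System, System.Collections.Concurrent, and Marten
    public class TenantStoreRegistry : IDisposable
    {
        // One (expensive) DocumentStore per named tenant
        private readonly ConcurrentDictionary<string, DocumentStore> _stores = new();
        private readonly Func<string, string> _connectionStringFor;

        public TenantStoreRegistry(Func<string, string> connectionStringForTenant)
        {
            _connectionStringFor = connectionStringForTenant;
        }

        public IDocumentSession OpenSession(string tenantId)
        {
            var store = _stores.GetOrAdd(tenantId, id => DocumentStore.For(opts =>
            {
                // The only per-tenant difference is the connection string
                opts.Connection(_connectionStringFor(id));

                // ...all of the other, identical Marten configuration goes here
            }));

            return store.LightweightSession();
        }

        public void Dispose()
        {
            foreach (var store in _stores.Values) store.Dispose();
        }
    }

You’d then be on your own to wire that registry into your DI container and to manage schema migrations per database, which is exactly the overhead the proposed V5 feature is meant to remove.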

So let’s just call it a given that we do want to add some formal support for multi-tenancy through separate databases per tenant to Marten. Moreover, database per tenant has been in our backlog forever, but has been pushed off every time as we’ve struggled to get big Marten releases out the door.

I think there’s some potential for this story to cause breaking API changes (I don’t have anything specific in mind, it’s just likely in my opinion), so that makes that story a very good candidate to get in place for Marten V5. From the backlog issue writeup I made back in 2017:

  • Have all tenants tracked in memory, such that a single DocumentStore can share all the expensive runtime built internal objects across tenants
  • A tenanting strategy that can lookup the database connection string per tenant, and create sessions per separate tenants. There’s actually an interface hook in Marten all ready to go that may serve out of the box when we do this (I meant to do this work years ago, but it just didn’t happen).
  • At development time (AutoCreate != AutoCreate.None), be able to spin up a new database on the fly for a tenant if it doesn’t already exist
  • “Know” what all the existing tenants are so that we could apply database migrations from the CLI or through the DocumentStore schema migration APIs
  • Extend the CLI support to support multiple tenant databases
  • Make the database registry mechanism be a little bit pluggable. Thinking that some folks would have a few tenants where you’d be good with just writing everything into a static configuration file. Other folks may have a *lot* of tenants (I’ve personally worked on a system that had >100 separate tenant databases in one deployed application), so they may want a “master” database

JasperFx OSS Plans for .Net 6 (Marten et al)

I’m going to have to admit that I got caught flat footed by the .Net 6 release a couple weeks ago. I hadn’t really been paying much attention to the forthcoming changes, maybe got cocky by how easy the transition from netcoreapp3.1 to .Net 5 was, and have been unpleasantly surprised by how much work it’s going to take to move some OSS projects up to .Net 6. All at the same time that the advanced users of the world are clamoring for all their dependencies to target .Net 6 yesterday.

All that being said, here’s my running list of plans to get the projects in the JasperFx GitHub organization successfully targeting .Net 6. I’ll make edits to this page as things get published to Nuget.

Baseline

Baseline is a grab bag utility library full of extension methods that I’ve relied on for years. Nobody uses it directly per se, but it’s a dependency of just about every other project in the organization, so it went first with the 3.2.2 release adding a .Net 6 target. No code changes were necessary other than adding .Net 6 to the CI testing. Easy money.

Oakton

EDIT: Oakton v4.0 is up on Nuget. WebApplication is supported, but you can’t override configuration in commands with this model like you can with the HostBuilder-only approach. I’ll do a follow up at some point to fill in this gap.

Oakton is a tool to add extensible command line options to .Net applications based on the HostBuilder model. Oakton is my problem child right now because it’s a dependency in several other projects and its current model does not play nicely with the new WebApplicationBuilder approach for configuring .Net 6 applications. I’d also like to get the Oakton documentation website moved to the VitePress + MarkdownSnippets model we’re using now for Marten and some of the other JasperFx projects. I think I’ll take a shortcut here and publish the Nuget and let the documentation catch up later.

Alba

Alba is an automated testing helper for ASP.Net Core. Just like Oakton, Alba worked very well with the HostBuilder model, but was thrown for a loop with the new WebApplicationBuilder configuration model that’s the mechanism for using the new Minimal API (*cough* inevitable Sinatra copy *cough*) model. Fortunately though, Hawxy came through with a big pull request to make Alba finally work with the WebApplicationFactory model that can accommodate the new WebApplicationBuilder model, so we’re back in business soon. Alba 5.1 will be published soon with that work after some documentation updates and hopefully some testing with the Oakton + WebApplicationBuilder + Alba model.

EDIT: Alba 6.0 is up with the necessary changes, but the docs will come later this week

Lamar

Lamar is an IoC/DI container and the modern successor to StructureMap. The biggest issue with Lamar on .Net 6 was Nuget dependencies on the IServiceCollection model, plus needing some extra implementation to light up the implied service model of Minimal APIs. All the current unit tests and even the integration tests with ASP.Net Core are passing on .Net 6. What’s left to finish up a new Lamar 7.0 release:

  • One .Net 6 related bug in the diagnostics
  • Better Minimal API support
  • Upgrade Oakton & Baseline dependencies in some of the Lamar projects
  • Documentation updates for the new IAsyncDisposable support and usage with WebApplicationBuilder with or without Minimal API usage

EDIT: Lamar 7.0 is up on Nuget with .Net 6 support

Marten/Weasel

We just made the gigantic V4 release a couple months ago knowing that we’d have to follow up quickly with a V5 release with a few breaking changes to accommodate .Net 6 and the latest version of Npgsql. We are having to make a full point release, so that opens the door for other breaking changes that didn’t make it into V4 (don’t worry, I think shifting from V4 to V5 will be easy for most people). The other Marten core team members have been doing most of the work for this so far, but I’m going to jump into the fray later this week to do some last minute changes:

  • Review some internal changes to Npgsql that might have performance impacts on Marten
  • Consider adding an event streaming model within the new V4 async daemon. For folks that wanna use that to publish events to some kind of transport (Kafka? Some kind of queue?) with strict ordering. This won’t be much yet, but it keeps coming up so we might as well consider it.
  • Multi-tenancy through multiple databases. It keeps coming up, and potentially causes breaking API changes, so we’re at least going to explore it

I’m trying not to slow down the Marten V5 release with .Net 6 support for too long, so this is all either happening really fast or not at all. I’ll blog more later this week about multi-tenancy & Marten.

Weasel is a spin off library from Marten for database change detection and ADO.Net helpers that are reused in other projects now. It will be published simultaneously with Marten.

Jasper

Oh man, I’d love, love, love to have Jasper 2.0 done by early January so that it’ll be available for usage at my company on some upcoming work. This work is on hold while I deal with the other projects, my actual day job, and family and stuff.

Rebooting Jasper

Jasper is a long running OSS passion project of mine. As it is now, Jasper is a command processing tool similar to Brighter or MassTransit that can be used as either an in memory mediator tool (like a superset of Mediatr) or as a service bus framework for asynchronous messaging. Jasper was originally conceived as a way to recreate the “good” parts of FubuMVC like low code ceremony, minimal intrusion of the framework into application code, and effective usage of the Russian Doll model for the execution pipeline. At the same time though, I wanted Jasper to improve upon the earlier FubuMVC architecture by maximizing performance, minimizing object allocations, easier configuration and bootstrapping, and making it much easier for developers to troubleshoot runtime issues.

I actually did cut a Jasper V1 release early in the COVID-19 pandemic, but otherwise dropped it to focus on Marten and stopped paying any attention to it. With Marten V4 in the books, I’m going back to working on Jasper for a little bit. For right now, I’m thinking that the Jasper V2 work is something like this:

  1. Upgrading all the dependencies and targeting .Net 5/6 (basically done)
  2. Benchmarking and optimizing the core runtime internals. Sometimes the best way to improve a codebase is to step away from it for quite a bit and come back in with fresh perspectives. There’s also some significant lessons from Marten V4 that might apply to Jasper
  3. Build in Open Telemetry tracing through Jasper’s pipeline. Really all about getting me up to speed on distributed tracing.
  4. Support the AsyncAPI standard (Swagger for asynchronous messaging). I’m really interested in this, but haven’t taken much time to dive into it yet
  5. Wire compatibility with NServiceBus so a Jasper app can talk bi-directionally with an NServiceBus app
  6. Same with MassTransit. If I decide to pursue Jasper seriously, I’d have to do that to have any shot at using Jasper at work
  7. More transport options. Right now there’s a Kafka & Pulsar transport option stuck in PR purgatory from another contributor. Again, a learning opportunity.
  8. Optimize the heck out of the Rabbit MQ usage.
  9. Go over the usability of the configuration. To be honest here, I’m less than thrilled with our MassTransit usage and the hoops you have to jump through to bend it to our will, and I’d like to see if I could do better with Jasper
  10. Improve the documentation website (if I’m serious about Jasper)
  11. Play with some kind of Jasper/Azure Functions integration. No idea what that would look like, but the point is to go learn more about Azure Functions
  12. Maybe, but a low priority — I have a working version of FubuMVC style HTTP endpoints in Jasper already. With everybody all excited about the new Minimal API stuff in ASP.Net Core v6, I wouldn’t mind showing a slightly different approach

Integrating Marten and Jasper

Maybe the single biggest reason for me to play more with Jasper is to explore some deeper integration with Marten for some more complicated CQRS and event sourcing architectural problems. Jasper already has an outbox/inbox pattern implementation for Marten. Going farther, I’d like to have out of the box solutions for:

  • Event streaming from Marten to message queues using Jasper
  • An alternative to Marten’s async daemon using Kafka/Pulsar topics
  • Using Jasper to host Marten’s asynchronous projections in a way that distributes work across running nodes
  • Experimenting more with CQRS architectures using Jasper + Marten

Anyway, I’m jotting this down mostly for me, but I’m absolutely happy for any kind of feedback or maybe to see if anyone else would be interested in helping with Jasper development.

Marten V4: Hard Deletes, Soft Deletes, Un-Deletes, All the Deletes You Meet

If you haven’t seen this yet, Marten V4.0 officially dropped on Nuget late last week (finally).

V4 was a huge release for Marten and there’s lots to talk about in the new release, but I want to start simply with just talking about Marten’s support for deleting documents. In all the examples, we’re going to use a User document type from our testing code.

Let’s say we have the identity of a User document that we want to delete in our Marten database. That code is pretty simple, it’s just this below:

internal Task DeleteByDocumentId(IDocumentSession session, Guid userId)
{
    // Tell Marten the type and identity of a document to
    // delete
    session.Delete<User>(userId);

    return session.SaveChangesAsync();
}

By default, Marten will do a “hard” delete where the actual database row in Postgresql is deleted and that’s all she wrote. Marten has long had support for “soft” deletes where the underlying rows are marked as deleted with a metadata column that tracks “is deleted” instead of actually deleting the row, so let’s opt into that by configuring the User document as soft-deleted like so:

// Configuring a new Marten document store
var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");

    opts.Schema.For<User>().SoftDeleted();
});

Or if you prefer, you can also use an attribute like so:

[SoftDeleted]
public class User
{
    public Guid Id { get; set; }
    
    // Other props we don't care about here
}

And I’ll show a 3rd way that’s new in Marten V4 later on.

The API to delete a User document is exactly the same, but the mechanics of what Marten is doing to the underlying Postgresql database have changed. Instead of deleting the underlying rows, Marten just marks an mt_deleted column as true. But now, think about this Linq query below where I’m searching for all the users that have the “admin” role:

            using var session = store.QuerySession();
            var admins = await session.Query<User>()
                .Where(x => x.Roles.Contains("admin"))
                .ToListAsync();

You’ll notice that I’m not doing anything to explicitly filter out the deleted users in this query, and that’s because Marten is doing that for you — and that’s the behavior you’d want most of the time. If for some reason you’d like to query for all the documents, even the ones that are marked as deleted, you can query like this:

            var admins = await session.Query<User>()
                .Where(x => x.Roles.Contains("admin"))
                
                // This is Marten specific
                .Where(x => x.MaybeDeleted())
                .ToListAsync();

And now, if you only want to query for documents that are explicitly marked as deleted, you can do this:

            var admins = await session.Query<User>()
                .Where(x => x.Roles.Contains("admin"))
                
                // This is Marten specific
                .Where(x => x.IsDeleted())
                .ToListAsync();

So far, so good? Now what if you change your mind about your deleted documents and you want to bring them back? Marten V4 adds the IDocumentSession.UndoDeleteWhere() method to reverse the soft deletes in the database. In the usage below, we’re going to mark every admin user as “not deleted”:

            using var session = store.LightweightSession();
            
            session.UndoDeleteWhere<User>(x => x.Roles.Contains("admin"));
            await session.SaveChangesAsync();

But hold on, what if you really want to just wipe out the database rows with a good old fashioned “hard” delete instead? Marten V4 adds a family of IDocumentSession.HardDelete*() APIs to do exactly that, as shown below:

            using var session = store.LightweightSession();

            session
                .HardDeleteWhere<User>(x => x.Roles.Contains("admin") && x.IsDeleted());
            await session.SaveChangesAsync();

There are also single document versions of HardDelete() as well.
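
For instance, hard deleting one specific user might look like this quick sketch. I’m assuming here that the single document overloads mirror the Delete() usage shown at the top of this post (by document or by identity), so check the documentation for the exact signatures:

using var session = store.LightweightSession();

// Hard delete by passing the document itself...
session.HardDelete(user);

// ...or by the document type and identity
session.HardDelete<User>(userId);

await session.SaveChangesAsync();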

So back to the sample of querying all admin users both past and present:

            var admins = await session.Query<User>()
                .Where(x => x.Roles.Contains("admin"))

                // This is Marten specific
                .Where(x => x.MaybeDeleted())
                .ToListAsync();

If you’re going to do that, it might be nice to be able to understand which users are marked as deleted, when they were marked deleted, and which users are not deleted. And it’d also be helpful if Marten would just tag the documents themselves with that metadata.

Enter the new Marten.Metadata.ISoftDeleted interface in V4. Let’s make our User document implement that interface as shown below:

public class User : ISoftDeleted
{
    public Guid Id { get; set; }

    // These two properties reflect Marten metadata
    // and will be updated upon Delete() calls
    // or when loading through Marten
    public bool Deleted { get; set; }
    public DateTimeOffset? DeletedAt { get; set; }

    // Other props we don't care about here
}

If a document type implements the ISoftDeleted interface, it will automatically be treated as soft-deleted, and Marten will set the Deleted and DeletedAt properties when the document is loaded.

I’m not showing it here, but you can effectively create the same behavior without the marker interface by using a fluent interface. Check out the Marten documentation for that approach.

Going into Marten I thought of the delete operation as something simple conceptually, but thousands of users means getting requests for more control over the process and that’s what we hope V4 delivers here.

Marten Takes a Giant Leap Forward with the Official V4 Release!

Starting next week I’ll be doing some more deep dives into new Marten V4 improvements and some more involved sample usages.

Today I’m very excited to announce the official release of Marten V4.0! The Nugets just went live, and we’ve published our completely revamped project website at https://martendb.io.

This has been at least a two year journey of significant development effort by the Marten core team and quite a few contributors, preceded by several years of brainstorming within the Marten community about the improvements realized by this release. There’s plenty more to do in the Marten backlog, but I think this V4 release puts Marten on a very solid technical foundation for the long term future.

This was a massive effort, and I’d like to especially thank the other core team members Oskar Dudycz for answering so many user questions and being the champion for our event sourcing feature set, and Babu Annamalai for the newly improved website and all our grown up DevOps infrastructure. Their contributions over the years and especially on this giant release have been invaluable.

I’d also like to thank:

  • JT for taking on the nullability sweep and many other things
  • Ville Häkli might have accidentally become our best tester and helped us discover and deal with several issues along the way
  • Julien Perignon and his team for their patience and help with the Marten V4 shakedown cruise
  • Barry Hagan started the ball rolling with Marten’s new, expanded metadata collection
  • Raif Atef for several helpful bug reports and some related fixes
  • Simon Cropp for several pull requests and doing some dirty work
  • Kasper Damgård for a lot of feedback on Linq queries and memory usage
  • Adam Barclay helped us improve Marten’s multi-tenancy support and its usability

and many others who raised actionable issues, gave us feedback, and even made code contributions. Keeping in mind that I personally grew up on a farm in the middle of nowhere in the U.S., it’s a little mind-blowing to me to work on a project of this magnitude that at a quick glance included contributors from at least five continents on this release.

One of my former colleagues at Calavista likes to ask prospective candidates for senior architect roles what project they’ve done that they’re the most proud of. I answered “Marten” at the time, but I think I mean that even more now.

What Changed in this Release?

To quote the immortal philosopher Ferris Bueller:

The question isn’t ‘what are we going to do?’, the question is ‘what aren’t we going to do?’

We did try to write up a list of breaking changes for V4 in the migration guide, but here’s some highlights:

  • We generally made a huge sweep of the Marten code internals looking for every possible opportunity to reduce object allocations and dictionary lookups for low level performance improvements. The new dynamic code generation approach in Marten helped get us to that point.
  • We think Marten is even easier to bootstrap in new projects with improvements to the IServiceCollection.AddMarten() extensions (see the short sketch after this list)
  • Marten supports System.Text.Json — but use that with some caution of course
  • The Linq support took a big step forward with a near rewrite and filled in some missing support for better querying through child collections as a big example. The Linq support is now much more modular and we think that will help us continue to grow that support. It’s a small thing, but the Linq parsing was even optimized a little bit for performance
  • Event Sourcing in Marten got a lot of big improvements that were holding up adoption by some users, especially in regards to the asynchronous projection support. The “async daemon” was completely rewritten and is now much easier to incorporate into .Net systems.
  • As a big user request, Marten supports much more options for tracking flexible metadata like correlation ids and even user defined headers in both document and event storage
  • Multi-tenancy support was improved
  • Soft delete support got some additional usability features
  • PLv8 adoption has been a stumbling block, so all the features related to PLv8 were removed to a separate add-on library called Marten.PLv8
  • The schema management features in Marten made some significant strides and should be able to handle more scenarios with less manual intervention — we think/hope/let’s just be positive for now
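
To make the bootstrapping bullet above concrete, here’s roughly what the AddMarten() registration looks like in a .Net application’s service configuration. This is a minimal sketch; there are many more options than what’s shown here:

    // Inside ConfigureServices() / the application builder setup
    services.AddMarten(opts =>
    {
        opts.Connection("some connection string");

        // Any other document store configuration, for example:
        opts.Schema.For<User>().SoftDeleted();
    });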

What’s Next for Marten?

Full point OSS releases inevitably bring a barrage of user reported errors, questions about migrating, possibly confusing wording in new documentation, and lots of queries about some significant planned features we just couldn’t fit into this already giant release. For that matter, we’ll probably have to quickly spin out a V5 release for .Net 6 and Npgsql 6 because there are breaking changes coming due to those dependencies. OSS projects are never finished, only abandoned, and there’ll be a world of things to take care of in the aftermath of 4.0 — but for right now, Don’t Steal My Sunshine!

Efficient Web Services with Marten V4

We’re genuinely close to finally pulling the damn trigger on Marten V4. One of the last things I’m coding for the release is a new recipe for users to write very efficient web services backed by Marten database storage.

A Traditional .Net Approach

Before I get into the exact mechanics of that, let’s set the stage a little bit and draw some contrasts with a more traditional .Net web service stack. Let’s start by saying that in that traditional stack, you’re not using any kind of CQRS and the system state is persisted in the “one true model” approach using Entity Framework Core and an RDBMS like Sql Server. Now, in some web services you need to query data from the database and serve up a subset of that data into the outgoing HTTP response in a web service endpoint. The typical flow — with a focus on what’s happening under the covers — would be to:

  1. ASP.Net Core receives an HTTP GET, finds the proper endpoint, and calls the right handler for that route. Not that it actually matters, but let’s assume the endpoint handler is an MVC Core Controller method
  2. ASP.Net Core invokes a call to your DI container to build up the MVC controller object, and calls the right method for the route
  3. You’ve been to .Net conferences and internalized the idea that an MVC controller shouldn’t get too big or do things besides HTTP mediation, so you’re delegating to a tool like MediatR. MediatR itself is going to go through another DI container service resolution to find the right handler for the input model, then invoke that handler
  4. EF Core issues a query against your Sql Server database. If you’re needing to fetch data on more than just the root aggregate model, the query is going to be an outer join against all the child tables too
  5. EF Core loops around in the database rows and creates objects for your .Net domain model classes based on its ORM mappings
  6. You certainly don’t want to send the raw domain model on the wire because of coupling concerns, or don’t want every bit of data exposed to the client in a particular web service, so you use some kind of tool like AutoMapper to transform the internal domain model objects built up by EF Core into Data Transfer Objects (DTO) purposely designed to go over the wire.
  7. Lastly, you return the outgoing DTO model, which is serialized to JSON and sent down the HTTP response by MVC Core

Sound pretty common? That’s also a lot of overhead. A lot of memory allocations, data mappings between structures, JSON serialization, and a lot of dictionary lookups just to get data out of the database and spit it out into the HTTP response. It’s also a non-trivial amount of code, and I’d argue that some of the tools I mentioned are high ceremony.
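
For contrast, here’s a condensed sketch of what that traditional flow tends to look like in code. The IssueDto, GetIssueQuery, and IssueDbContext types are hypothetical stand-ins, but the MediatR, EF Core, and AutoMapper usage is the common shape:

    // Assumes MediatR, AutoMapper, and Microsoft.EntityFrameworkCore are referenced.
    // GetIssueQuery, IssueDto, and IssueDbContext are hypothetical types.
    public record GetIssueQuery(Guid IssueId) : IRequest<IssueDto>;

    public class TraditionalIssueController : ControllerBase
    {
        private readonly IMediator _mediator;

        public TraditionalIssueController(IMediator mediator) => _mediator = mediator;

        [HttpGet("/issue/{issueId}")]
        public Task<IssueDto> Get(Guid issueId)
            => _mediator.Send(new GetIssueQuery(issueId));
    }

    public class GetIssueHandler : IRequestHandler<GetIssueQuery, IssueDto>
    {
        private readonly IssueDbContext _db;   // EF Core DbContext
        private readonly IMapper _mapper;      // AutoMapper

        public GetIssueHandler(IssueDbContext db, IMapper mapper)
        {
            _db = db;
            _mapper = mapper;
        }

        public async Task<IssueDto> Handle(GetIssueQuery query, CancellationToken token)
        {
            // EF Core issues the query, including the outer join
            // against the child Notes table
            var issue = await _db.Issues
                .Include(x => x.Notes)
                .FirstOrDefaultAsync(x => x.Id == query.IssueId, token);

            // Map the tracked entity to a DTO before it goes over the wire
            return _mapper.Map<IssueDto>(issue);
        }
    }

None of these steps is individually expensive, but every one of them allocates objects and does mapping work that the Marten-backed approach later in this post skips entirely.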

Now do CQRS!

I initially thought of CQRS as looking like a whole lot more work to code, and that’s not an uncommon impression when folks are first introduced to it. I’ve come to realize over time that it’s not really more work so much as it’s really just doing about the same amount of work in different places and different times in the application’s workflow.

Now let’s at least introduce CQRS into our application architecture. I’m not saying that that automatically implies using event sourcing, but let’s say that you are writing a pre-built “read side” model of the state of your system directly to a database of some sort. Now from that same web service I was describing before, you just need to fetch that persisted “read side” model from the database and spit that state right out to the HTTP response.

Now then, I’ve just yada, yada’d all the complexity of the CQRS architecture that continuously updates the read side view for you, but hey, Marten does that for you too and that can be a shortly forthcoming follow up blog post.

Finally bringing Marten V4 into play, let’s say our read side model for an issue tracking system looks like this:

    public class Note
    {
        public string UserName { get; set; }
        public DateTime Timestamp { get; set; }
        public string Text { get; set; }
    }

    public class Issue
    {
        public Guid Id { get; set; }
        public string Description { get; set; }
        public bool Open { get; set; }

        public IList<Note> Notes { get; set; }
    }

Before anyone gets bent out of shape by this, it’s perfectly possible to tell Marten to serialize the persisted documents to JSON with camel or even snake casing to be more idiomatic JSON or Javascript friendly.
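
For example, something along these lines in the Marten configuration would switch the persisted JSON over to camel casing. This is just a sketch from memory, so double check the exact UseDefaultSerialization signature and Casing values against the Marten documentation:

    var store = DocumentStore.For(opts =>
    {
        opts.Connection("your connection string here");

        // Persist and stream documents with camel cased member names.
        // Casing.SnakeCase is the other option if you want snake_case JSON.
        opts.UseDefaultSerialization(casing: Casing.CamelCase);
    });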

Now, let’s build out two controller endpoints, one that gives you an Issue payload by searching by its id, and a second endpoint that gives you all the open Issue models in a single JSON array payload. That controller — using some forthcoming functionality in a new Marten.AspNetCore Nuget — looks like this:

    public class GetIssueController: ControllerBase
    {
        private readonly IQuerySession _session;

        public GetIssueController(IQuerySession session)
        {
            _session = session;
        }

        [HttpGet("/issue/{issueId}")]
        public Task Get(Guid issueId)
        {
            // This "streams" the raw JSON to the HttpResponse
            // w/o ever having to read the full JSON string or
            // deserialize/serialize within the HTTP request
            return _session.Json
                .WriteById<Issue>(issueId, HttpContext);
        }

        [HttpGet("/issue/open")]
        public Task OpenIssues()
        {
            // This "streams" the raw JSON to the HttpResponse
            // w/o ever having to read the full JSON string or
            // deserialize/serialize within the HTTP request
            return _session.Query<Issue>()
                .Where(x => x.Open)
                .WriteArray(HttpContext);
        }
    }

In the GET: /issue/{issueId} endpoint, you’ll notice the call to the new IQuerySession.Json.WriteById() extension method, and how I’m passing it the current HttpContext. That method is:

  1. Executing a query against the underlying Postgresql database. In this case, all of the data is stored in a single column of a single row, so there are no JOINs or sparse datasets like there would be with an ORM querying an object that has child collections.
  2. Writing the raw bytes of the persisted JSON data directly to the HttpResponse.Body without ever bothering to write the whole thing to a .Net string, and definitely without incurring the overhead of JSON deserialization/serialization. That extension method also sets the HTTP content-length and content-type response headers, and sets the HTTP status code to 200 if the document is found or 404 if it is not.

In the second HTTP endpoint for GET: /issue/open, the call to WriteArray(HttpContext) is doing something very similar, but writing the results as a JSON array.

By no means is this technique going to be applicable to every HTTP GET endpoint, but when it is, it’s far, far more efficient and simpler to code than the traditional approach, which involves all those extra pieces and so many more memory allocations and hardware operations just to shoot some JSON down the wire.

For a little more context, here’s a test against the /issue/{issueId} endpoint, with a cameo from Alba to help me test the HTTP behavior:

        [Fact]
        public async Task stream_a_single_document_hit()
        {
            var issue = new Issue {Description = "It's bad", Open = true};

            var store = theHost.Services.GetRequiredService<IDocumentStore>();
            using (var session = store.LightweightSession())
            {
                session.Store(issue);
                await session.SaveChangesAsync();
            }

            var result = await theHost.Scenario(s =>
            {
                s.Get.Url($"/issue/{issue.Id}");
                s.StatusCodeShouldBeOk();
                s.ContentTypeShouldBe("application/json");
            });

            var read = result.ReadAsJson<Issue>();

            read.Description.ShouldBe(issue.Description);
        }

and one more test showing the “miss” behavior:

        [Fact]
        public async Task stream_a_single_document_miss()
        {
            await theHost.Scenario(s =>
            {
                s.Get.Url($"/issue/{Guid.NewGuid()}");
                s.StatusCodeShouldBe(404);
            });
        }

Feedback anyone?

This code hasn’t been released yet, even in an RC Nuget, so feel very free to make any kind of suggestion or send us feedback.

Dynamic Code Generation in Marten V4

Marten V4 extensively uses runtime code generation backed by Roslyn runtime compilation for dynamic code. This is much more powerful than source generators in what it allows us to actually do, but it can have significant memory usage and “cold start” problems (this seems to depend on the exact configuration, so it’s not a given that you’ll hit these issues). In this post I’ll show the facility we have to “generate ahead” the code to dodge the memory and cold start issues at production time.

Before V4, Marten had used the common model of building up Expression trees and compiling them into lambda functions at runtime to “bake” some of our dynamic behavior into fast running code. That does work and a good percentage of the development tools you use every day probably use that technique internally, but I felt that we’d already outgrown the dynamic Expression generation approach as it was, and the new functionality requested in V4 was going to substantially raise the complexity of what we were going to need to do.
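
As a quick illustration of that older technique (this isn’t Marten’s actual code, just the general pattern), building up an Expression tree and compiling it into a delegate looks something like this:

    // Build a Func<Issue, Guid> identity accessor at runtime by composing
    // an Expression tree (requires System.Linq.Expressions)
    var parameter = Expression.Parameter(typeof(Issue), "issue");
    var body = Expression.Property(parameter, nameof(Issue.Id));
    var lambda = Expression.Lambda<Func<Issue, Guid>>(body, parameter);

    // Compile() emits IL at runtime and hands back a fast delegate
    Func<Issue, Guid> getId = lambda.Compile();

    var id = getId(new Issue { Id = Guid.NewGuid(), Description = "Sample" });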

Instead, I (Marten is a team effort, but I get all the blame for this one) opted to use the dynamic code generation and compilation approach using LamarCodeGeneration and LamarCompiler that I’d originally built for other projects. This allowed Marten to generate much more complex code than I thought was practical with other models (we could have used IL generation too of course, but that’s an exercise in masochism). If you’re interested, I gave a talk about these tools and the approach at NDC London 2019.

I do think this has worked out in terms of performance improvements at runtime, and it certainly made it possible to introduce the new document and event store metadata features in V4, but there’s a catch. Actually two:

  1. The Roslyn compiler sucks down a lot of memory sometimes and doesn’t seem to ever release it. It’s gotten better with newer releases and it’s not consistent, but still.
  2. There’s a sometimes significant lag in cold start scenarios the first time Marten needs to generate and compile code at runtime

What we could do, though, is provide what I call…

Marten’s “Generate Ahead” Strategy

To side step the problems with the Roslyn compilation, I developed a model (I did this originally in Jasper) to generate the code ahead of time and have it compiled into the entry assembly for the system. The last step is to direct Marten to use the pre-compiled types instead of generating the types at runtime.

Jumping straight into a sample console project to show off this functionality, I’m configuring Marten with the AddMarten() method you can see in this code on GitHub.

The important line of code you need to focus on here is this flag:

    opts.GeneratedCodeMode = TypeLoadMode.LoadFromPreBuiltAssembly;

This flag in the Marten configuration directs Marten to first look in the entry assembly of the application for any types that it would normally try to generate at runtime, and if such a type exists, to load it from the entry assembly and bypass any invocation of Roslyn. I think in a real application you’d wrap that call in something like this so that it only applies when the application is running in production mode:

    if (Environment.IsProduction())
    {
        opts.GeneratedCodeMode = TypeLoadMode.LoadFromPreBuiltAssembly;
    }

The next thing to notice is that I have to tell Marten ahead of time what the possible document types are, and even register any compiled query types, in this code so that Marten will “know” what code to generate in the next section. The compiled query registration is new, but you already had to let Marten know about the document types to make the schema migration functionality work anyway.
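
For reference, the document type registration part of that configuration looks something like the sketch below. I’m recalling RegisterDocumentType<T>() from memory, so treat the exact method name as an assumption and check it against the Marten documentation:

    var store = DocumentStore.For(opts =>
    {
        opts.Connection("your connection string here");

        // Tell Marten up front which document types exist so the
        // codegen command knows what storage code to generate
        opts.RegisterDocumentType<Issue>();
    });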

Generating and exporting the code ahead of time is done from the command line through an Oakton command. First though, add the LamarCodeGeneration.Commands Nuget to your entry project, which will also add a transitive reference to Oakton if you’re not already using it. This is all described in the Oakton getting started page, but you’ll need to change your Program.Main() method slightly to activate Oakton:

        // The return value needs to be Task<int> 
        // to communicate command success or failure
        public static Task<int> Main(string[] args)
        {
            return CreateHostBuilder(args)
                
                // This makes Oakton be your CLI
                // runner
                .RunOaktonCommands(args);
        }

Open up the command terminal of your preference at the root directory of the entry project and type this command to see the available Oakton commands:

dotnet run -- help

That’ll spit out a list of commands and the assemblies where Oakton looked for command types. You should see output similar to this:

Searching 'LamarCodeGeneration.Commands, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' for commands
Searching 'Marten.CommandLine, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' for commands

  -------------------------------------------------
    Available commands:
  -------------------------------------------------
        check-env -> Execute all environment checks against the application
          codegen -> Utilities for working with LamarCodeGeneration and LamarCompiler

Assuming you got that far, now type dotnet run -- help codegen to see the list of options for the codegen command.

If you just want to preview the generated code in the console, type:

dotnet run -- codegen preview

To just verify that the dynamic code can be successfully generated and compiled, use:

dotnet run -- codegen test

To actually export the generated code, use:

dotnet run -- codegen write

That command will write a single C# file at /Internal/Generated/DocumentStorage.cs for any document storage types and another at /Internal/Generated/Events.cs for the event storage and projections.

Just as a shortcut to clear out any old generated code, you can use:

dotnet run -- codegen delete

If you’re curious, the generated code — and remember that it’s generated code so it’s going to be butt ugly — is going to look like this for the document storage, and this code for the event storage and projections.

The way that I see this being used is something like this:

  • The LoadFromPreBuiltAssembly option is only turned on in Production mode, and left disabled at development time so that developers can iterate at will.
  • As part of the CI/CD process for a project, you’ll run the dotnet run -- codegen write command as an initial step, then proceed to the normal compilation and testing cycles (see the sketch of that sequence after this list). That bakes the generated code right into the compiled assemblies and enables you to also take advantage of any kind of ahead-of-time (AOT) compiler optimizations
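
In a CI script, that might boil down to a command sequence as simple as this sketch; the exact build and test steps will obviously vary from project to project:

    # Bake the generated code into /Internal/Generated before the real build
    dotnet run -- codegen write
    dotnet build --configuration Release
    dotnet test --configuration Release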

Duh. Why not source generators doofus?

Why didn’t we use source generators, you may ask? The Roslyn-based approach in Marten is both much better and much worse than source generators. Source generators are part of the compilation process itself and wouldn’t have any kind of cold start problem like Marten has with the runtime Roslyn compilation approach. That part is obviously much better, plus there’s no weird two-step compilation process at CI/CD time. But on the downside, source generators can only use information that’s available at compilation time, and the Marten code generation relies very heavily on type reflection, conventions applied at runtime, and configuration built up through a fluent interface API. I do not believe that we could use source generators for what we’ve done in Marten because of that dependency on runtime information.