Dynamic Code Generation in Marten V4

Marten V4 makes extensive use of runtime code generation, with the generated code compiled at runtime by Roslyn. This approach is much more powerful than source generators in what it allows us to actually do, but it can cause significant memory usage and “cold start” problems (these seem to depend on the exact configuration, so it’s not a given that you’ll have these issues). In this post I’ll show the facility we have to “generate ahead” the code to dodge the memory and cold start issues at production time.

Before V4, Marten had used the common model of building up Expression trees and compiling them into lambda functions at runtime to “bake” some of our dynamic behavior into fast running code. That does work and a good percentage of the development tools you use every day probably use that technique internally, but I felt that we’d already outgrown the dynamic Expression generation approach as it was, and the new functionality requested in V4 was going to substantially raise the complexity of what we were going to need to do.
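
To make the older technique concrete, here is a minimal sketch of building an Expression tree and compiling it into a delegate at runtime. It is not Marten’s pre-V4 code; the Customer type and the ExpressionGetters helper are hypothetical names made up for this illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical document type, for illustration only
public class Customer
{
    public string Name { get; set; }
}

public static class ExpressionGetters
{
    // Builds the equivalent of "(T target) => (object)target.<propertyName>"
    // as an Expression tree, then compiles it into a delegate.
    // Compile() is the expensive step, so callers would normally cache the result.
    public static Func<T, object> BuildGetter<T>(string propertyName)
    {
        var target = Expression.Parameter(typeof(T), "target");
        var property = Expression.Property(target, propertyName);
        var boxed = Expression.Convert(property, typeof(object));

        return Expression.Lambda<Func<T, object>>(boxed, target).Compile();
    }
}
```

Calling ExpressionGetters.BuildGetter<Customer>("Name") once and reusing the returned delegate gives you a fast property reader without reflection on every call. This works nicely for small pieces of behavior like this one, but it gets unwieldy as the generated logic grows.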

Instead, I (Marten is a team effort, but I get all the blame for this one) opted to use the dynamic code generation and compilation approach using LamarCodeGeneration and LamarCompiler that I’d originally built for other projects. This allowed Marten to generate much more complex code than I thought was practical with other models (we could have used IL generation too of course, but that’s an exercise in masochism). If you’re interested, I gave a talk about these tools and the approach at NDC London 2019.

I do think this has worked out in terms of performance improvements at runtime and certainly helped to be able to introduce the new document and event store metadata features in V4, but there’s a catch. Actually two:

  1. The Roslyn compiler sucks down a lot of memory sometimes and doesn’t seem to ever release it. It’s gotten better with newer releases and it’s not consistent, but still.
  2. There’s a sometimes significant lag in cold start scenarios the first time Marten needs to generate and compile code at runtime.

What we could do, though, is provide what I call...

Marten’s “Generate Ahead” Strategy

To sidestep the problems with the Roslyn compilation, I developed a model (I did this originally in Jasper) to generate the code ahead of time and have it compiled into the entry assembly for the system. The last step is to direct Marten to use the pre-compiled types instead of generating the types at runtime.

Jumping straight into a sample console project to show off this functionality, I’m configuring Marten with the AddMarten() method you can see in this code on GitHub.

The important line of code you need to focus on here is this flag:

opts.GeneratedCodeMode = TypeLoadMode.LoadFromPreBuiltAssembly;

This flag in the Marten configuration directs Marten to first look in the entry assembly of the application for any types that it would normally try to generate at runtime, and if that type exists, load it from the entry assembly and bypass any invocation of Roslyn. I think in a real application you’d wrap that call something like this so that it only applies when the application is running in production mode:

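// Assumes Environment is the application's IHostEnvironment
// (for example, a property on your Startup class)
if (Environment.IsProduction())
{
    opts.GeneratedCodeMode =
        TypeLoadMode.LoadFromPreBuiltAssembly;
}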
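
As a rough mental model of the “look in the entry assembly first” behavior described above, here is a small sketch. It is not Marten’s actual implementation: PrebuiltTypeLookup and FindOrGenerate are hypothetical names, the generateAtRuntime callback merely stands in for the Roslyn generate-and-compile step, and the fallback itself is my reading of the behavior rather than something lifted from Marten’s code.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class PrebuiltTypeLookup
{
    // Returns the pre-built type when the assembly contains a type with the
    // expected name. Otherwise it invokes the fallback, which stands in for
    // generating and compiling the type at runtime.
    public static Type FindOrGenerate(
        Assembly assembly,
        string typeName,
        Func<Type> generateAtRuntime)
    {
        var prebuilt = assembly
            .GetTypes()
            .FirstOrDefault(type => type.Name == typeName);

        return prebuilt ?? generateAtRuntime();
    }
}
```

In a real application the assembly argument would be something like Assembly.GetEntryAssembly(), and the expensive fallback would never run as long as the generated code had been exported and compiled in ahead of time.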

The next thing to notice is that I have to tell Marten ahead of time about the possible document types, and even about any compiled query types, in this code so that Marten will “know” what code to generate in the next section. The compiled query registration is new, but you already had to let Marten know about the document types to make the schema migration functionality work anyway.
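
Since the linked sample isn’t reproduced here, this is a hedged sketch of what those up-front registrations could look like. The User document, the FindUserByName compiled query, and the MartenSetup helper are all hypothetical names for illustration, and the snippet assumes the Marten V4 package is referenced; check the calls against the Marten documentation for your version.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using Marten;
using Marten.Linq;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical document type, for illustration only
public class User
{
    public Guid Id { get; set; }
    public string UserName { get; set; }
}

// Hypothetical compiled query type, for illustration only
public class FindUserByName : ICompiledQuery<User, User>
{
    public string UserName { get; set; }

    public Expression<Func<IMartenQueryable<User>, User>> QueryIs()
    {
        return q => q.FirstOrDefault(x => x.UserName == UserName);
    }
}

public static class MartenSetup
{
    public static void AddMartenWithKnownTypes(
        this IServiceCollection services,
        string connectionString)
    {
        services.AddMarten(opts =>
        {
            opts.Connection(connectionString);

            // Tell Marten about the document types up front
            opts.Schema.For<User>();

            // Tell Marten about the compiled query types up front
            opts.RegisterCompiledQueryType(typeof(FindUserByName));
        });
    }
}
```

Anything that isn’t registered this way is invisible to the code generation command, so it would have no pre-built type to load in production.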

Generating and exporting the code ahead of time is done from the command line through an Oakton command. First though, add the LamarCodeGeneration.Commands NuGet package to your entry project, which will also add a transitive reference to Oakton if you’re not already using it. This is all described in the Oakton getting started page, but you’ll need to change your Program.Main() method slightly to activate Oakton:

        // The return value needs to be Task<int> 
        // to communicate command success or failure
        public static Task<int> Main(string[] args)
        {
            return CreateHostBuilder(args)
                
                // This makes Oakton be your CLI
                // runner
                .RunOaktonCommands(args);
        }

If you open up the command terminal of your preference at the root directory of the entry project, type this command to see the available Oakton commands:

dotnet run -- help

That’ll spit out a list of commands and the assemblies where Oakton looked for command types. You should see output similar to this:

Searching 'LamarCodeGeneration.Commands, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' for commands
Searching 'Marten.CommandLine, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' for commands

  -------------------------------------------------
    Available commands:
  -------------------------------------------------
        check-env -> Execute all environment checks against the application
          codegen -> Utilities for working with LamarCodeGeneration and LamarCompiler

Assuming you got that far, now type dotnet run -- help codegen to see the list of options for the codegen command.

If you just want to preview the generated code in the console, type:

dotnet run -- codegen preview

To just verify that the dynamic code can be successfully generated and compiled, use:

dotnet run -- codegen test

To actually export the generated code, use:

dotnet run -- codegen write

That command will write a single C# file at /Internal/Generated/DocumentStorage.cs for any document storage types and another at /Internal/Generated/Events.cs for the event storage and projections.

Just as a shortcut to clear out any old code, you can use:

dotnet run -- codegen delete

If you’re curious, the generated code (and remember that it’s generated code, so it’s going to be butt ugly) is going to look like this for the document storage, and this code for the event storage and projections.

The way that I see this being used is something like this:

  • The LoadFromPreBuiltAssembly option is only turned on in Production mode, and stays disabled at development time, so that developers can iterate at will.
  • As part of the CI/CD process for a project, you’ll run the dotnet run -- codegen write command as an initial step, then proceed to the normal compilation and testing cycles. That will bake the generated code right into the compiled assemblies and also let you take advantage of any kind of AOT compiler optimizations.

Duh. Why not source generators, doofus?

Why didn’t we use source generators, you may ask? The Roslyn-based approach in Marten is both much better and much worse than source generators. Source generators are part of the compilation process itself and wouldn’t have any kind of cold start problem like Marten has with the runtime Roslyn compilation approach. That part is obviously much better, plus there’s no weird two-step compilation process at CI/CD time. But on the downside, source generators can only use information that’s available at compilation time, and the Marten code generation relies very heavily on type reflection, conventions applied at runtime, and configuration built up through a fluent interface API. I do not believe that we could use source generators for what we’ve done in Marten because of that dependency on runtime information.
