Using Roslyn for Runtime Code Generation in Marten

I’m using Roslyn to dynamically compile and load assemblies built at runtime from generated code in Marten and other than some concern over the warmup time, it’s been going very well so far.

Like so many other developers with more cleverness than sense, I’ve spent a lot of time trying to build Hollywood Principle style frameworks that try to dynamically call application code at runtime through Reflection or some kind of related mechanism. Reflection itself has traditionally been the easiest mechanism to use in .Net to create dynamic behavior at runtime, but it can be a performance problem, especially if you use it naively.

A Look Back at What Came Before…

Taking my own StructureMap IoC tool as an example, over the years I’ve accomplished dynamic runtime behavior in a couple different ways:

Using IL directly using Reflection.Emit from the original versions through StructureMap 2.5. Working with IL is just barely a higher abstraction than assembly code and I don’t recommend using that if your goal is maintainability or making it easy for other developers to work in your code. I don’t miss generating IL by hand whatsoever. For those of you reading this and saying “pfft, IL isn’t so bad if you just understand how it works…”, my advice to you is to immediately go outside and get some fresh air and sunshine because you clearly aren’t thinking straight.
From StructureMap 2.6 I crudely used the trick of building Expression trees representing what I needed to do, then compiling those Expression trees into objects of the right Func or Action signatures. This approach is easier – at least for me – because the Expression model is much closer semantically to the actual code you’re trying to mimic than the stack-based IL.
From StructureMap 3.* on, there’s a much more complex dynamic Expression compilation model that’s robust enough to call constructor functions, setter properties, thread in interception, and surround all of that with try/catch logic for expressive exception messages and pseudo stack traces.

The current dynamic Expression approach in the StructureMap 3/4 internals is mostly working out well, but I barely remember how it works and it would take me a good day to just to get back into that code if I ever had to change something.

What if instead we could just work directly in plain old C# that we largely know and understand, but somehow get that compiled at runtime instead? Well, thanks to Roslyn and its “compiler as a service”, we now can.

I’ve said before that I want to eventually replace the Expression compilation with the Roslyn code compilation shown in this post, but I’m not sure I’m ambitious enough to mess with a working project.

How Marten uses Roslyn Runtime Generation

As I explained in my last blog post, Marten generates some “glue code” to connect a document object to the proper ADO.Net command objects for loading, storing, or deleting. For each document class, Marten generates an IDocumentStorage class with this signature:

public interface IDocumentStorage
{
    NpgsqlCommand UpsertCommand(object document, string json);
    NpgsqlCommand LoaderCommand(object id);
    NpgsqlCommand DeleteCommandForId(object id);
    NpgsqlCommand DeleteCommandForEntity(object entity);
    NpgsqlCommand LoadByArrayCommand(TKey[] ids);
    Type DocumentType { get; }
}

In the test library, we have a class I creatively called “Target” that I’ve been using to test how Marten handles various .Net Types and queries. At runtime, Marten generates a class called TargetDocumentStorage that implements the interface above. Part of the generated code — modified by hand to clean up some extraneous line breaks and added comments — is shown below:

using Marten;
using Marten.Linq;
using Marten.Schema;
using Marten.Testing.Fixtures;
using Marten.Util;
using Npgsql;
using NpgsqlTypes;
using Remotion.Linq;
using System;
using System.Collections.Generic;

namespace Marten.GeneratedCode
{
    public class TargetStorage : IDocumentStorage, IBulkLoader, IdAssignment
    {
        public TargetStorage()
        {

        }

        public Type DocumentType => typeof (Target);

        public NpgsqlCommand UpsertCommand(object document, string json)
        {
            return UpsertCommand((Target)document, json);
        }

        public NpgsqlCommand LoaderCommand(object id)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForId(object id)
        {
            return new NpgsqlCommand("delete from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForEntity(object entity)
        {
            return DeleteCommandForId(((Target)entity).Id);
        }

        public NpgsqlCommand LoadByArrayCommand(T[] ids)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = ANY(:ids)").WithParameter("ids", ids);
        }

        // I configured the "Date" field to be a duplicated/searchable field in code
        public NpgsqlCommand UpsertCommand(Target document, string json)
        {
            return new NpgsqlCommand("mt_upsert_target")
                .AsSproc()
                .WithParameter("id", document.Id)
                .WithJsonParameter("doc", json).WithParameter("arg_date", document.Date, NpgsqlDbType.Date);
        }

        // This Assign() method would use a HiLo sequence generator for numeric Id fields
        public void Assign(Target document)
        {
            if (document.Id == System.Guid.Empty) document.Id = System.Guid.NewGuid();
        }

        public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable documents)
        {
            using (var writer = conn.BeginBinaryImport("COPY mt_doc_target(id, data, date) FROM STDIN BINARY"))
            {
                foreach (var x in documents)
                {
                    writer.StartRow();
                    writer.Write(x.Id, NpgsqlDbType.Uuid);
                    writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
                    writer.Write(x.Date, NpgsqlDbType.Date);
                }
            }
        }
    }
}

Now that you can see what code I’m generating at runtime, let’s move on to a utility for generating the code.

SourceWriter

SourceWriter is a small utility class in Marten that helps you write neatly formatted, indented C# code. SourceWriter wraps a .Net StringWriter for efficient string manipulation and provides some helpers for adding namespace using statements and tracking indention levels for you. After experimenting with some different usages, I mostly settled on using the Write(text) method that allows you to provide a section of code as a multi-line string. The TargetDocumentStorage code I showed above is generated from within a class called DocumentStorageBuilder with a call to the SourceWriter.Write() method shown below:

            writer.Write(
                $@"
BLOCK:public class {mapping.DocumentType.Name}Storage : IDocumentStorage, IBulkLoader<{mapping.DocumentType.Name}>, IdAssignment<{mapping.DocumentType.Name}>

{fields}

BLOCK:public {mapping.DocumentType.Name}Storage({ctorArgs})
{ctorLines}
END

public Type DocumentType => typeof ({mapping.DocumentType.Name});

BLOCK:public NpgsqlCommand UpsertCommand(object document, string json)
return UpsertCommand(({mapping.DocumentType.Name})document, json);
END

BLOCK:public NpgsqlCommand LoaderCommand(object id)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForId(object id)
return new NpgsqlCommand(`delete from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForEntity(object entity)
return DeleteCommandForId((({mapping.DocumentType.Name})entity).{mapping.IdMember.Name});
END

BLOCK:public NpgsqlCommand LoadByArrayCommand(T[] ids)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = ANY(:ids)`).WithParameter(`ids`, ids);
END


BLOCK:public NpgsqlCommand UpsertCommand({mapping.DocumentType.Name} document, string json)
return new NpgsqlCommand(`{mapping.UpsertName}`)
    .AsSproc()
    .WithParameter(`id`, document.{mapping.IdMember.Name})
    .WithJsonParameter(`doc`, json){extraUpsertArguments};
END

BLOCK:public void Assign({mapping.DocumentType.Name} document)
{mapping.IdStrategy.AssignmentBodyCode(mapping.IdMember)}
END

BLOCK:public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<{mapping.DocumentType.Name}> documents)
BLOCK:using (var writer = conn.BeginBinaryImport(`COPY {mapping.TableName}(id, data{duplicatedFieldsInBulkLoading}) FROM STDIN BINARY`))
BLOCK:foreach (var x in documents)
writer.StartRow();
writer.Write(x.Id, NpgsqlDbType.{id_NpgsqlDbType});
writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
{duplicatedFieldsInBulkLoadingWriter}
END
END
END

END

");
        }

There’s a couple things to note about the code generation above:

String interpolation makes this so much easier than I think it would be with just string.Format(). Thank you to the C# 6 team.
Each line of code is written to the underlying StringWriter with the level of indention added to the left by SourceWriter itself
The “BLOCK” prefix directs SourceWriter to add an opening brace “{” to the next line, then increment the indention level
The “END” text directs SourceWriter to decrement the current indention level, then write a closing brace “}” to the next line and a blank line after that.

Now that we’ve got ourselves some generated code, let’s get Roslyn involved to compile it and actually get at an object of the new Type we want.

Roslyn Compilation with AssemblyGenerator

Based on a blog post by Tugberk Ugurlu, I built the AssemblyGenerator class in Marten shown below that invokes Roslyn to compile C# code and load the new dynamically built Assembly into the application:

public class AssemblyGenerator
{
    private readonly IList _references = new List();

    public AssemblyGenerator()
    {
        ReferenceAssemblyContainingType<object>();
        ReferenceAssembly(typeof (Enumerable).Assembly);
    }

    public void ReferenceAssembly(Assembly assembly)
    {
        _references.Add(MetadataReference.CreateFromFile(assembly.Location));
    }

    public void ReferenceAssemblyContainingType<T>()
    {
        ReferenceAssembly(typeof (T).Assembly);
    }

    public Assembly Generate(string code)
    {
        var assemblyName = Path.GetRandomFileName();
        var syntaxTree = CSharpSyntaxTree.ParseText(code);

        var references = _references.ToArray();
        var compilation = CSharpCompilation.Create(assemblyName, new[] {syntaxTree}, references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));


        using (var stream = new MemoryStream())
        {
            var result = compilation.Emit(stream);

            if (!result.Success)
            {
                var failures = result.Diagnostics.Where(diagnostic =>
                    diagnostic.IsWarningAsError ||
                    diagnostic.Severity == DiagnosticSeverity.Error);


                var message = failures.Select(x => $"{x.Id}: {x.GetMessage()}").Join("\n");
                throw new InvalidOperationException("Compilation failures!\n\n" + message + "\n\nCode:\n\n" + code);
            }

            stream.Seek(0, SeekOrigin.Begin);
            return Assembly.Load(stream.ToArray());
        }
    }
}

At runtime, you use the AssemblyGenerator class by telling it which other assemblies it should reference and giving it the source code to compile:

// Generate the actual source code
var code = GenerateDocumentStorageCode(mappings);

var generator = new AssemblyGenerator();

// Tell the generator which other assemblies that it should be referencing 
// for the compilation
generator.ReferenceAssembly(Assembly.GetExecutingAssembly());
generator.ReferenceAssemblyContainingType<NpgsqlConnection>();
generator.ReferenceAssemblyContainingType<QueryModel>();
generator.ReferenceAssemblyContainingType<DbCommand>();
generator.ReferenceAssemblyContainingType<Component>();

mappings.Select(x => x.DocumentType.Assembly).Distinct().Each(assem => generator.ReferenceAssembly(assem));

// build the new assembly -- this will blow up if there are any
// compilation errors with the list of errors and the actual code
// as part of the exception message
var assembly = generator.Generate(code);

Finally, once you have the new Assembly, use Reflection just to find the new Type you want by either searching through Assembly.GetExportedTypes() or by name. Once you have the Type object, you can build that object through Activator.CreateInstance(Type) or any of the other normal Reflection mechanisms.

The Warmup Problem

So I’m very happy with using Roslyn in this way so far, but the initial “warmup” time on the very first usage of the compilation is noticeably slow. It’s a one time hit on startup, but this could get annoying when you’re trying to quickly iterate or debug a problem in code by frequently restarting the application. If the warmup problem really is serious in real applications, we may introduce a mode that just lets you export the generated code to file and have that code compiled with the rest of your project for much faster startup times.

9 thoughts on “Using Roslyn for Runtime Code Generation in Marten”

Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1967
Pingback: Dew Drop – November 12, 2015 (#2131) | Morning Dew
Cyrus says:

November 12, 2015 at 10:33 pm

In you could go down the Roslyn path even further – if you wanted to for some reason.

For example a using statement in Roslyn – looks like
Syntax.List(Syntax.UsingDirective(name: Syntax.ParseName(“System”)))
https://jacobcarpenter.wordpress.com/2011/10/20/hello-roslyn/

Creating this code
public virtual Class class { get; set; }

Looks like:
var property = SyntaxFactory.PropertyDeclaration(
SyntaxFactory.ParseTypeName(“Class”)
.WithTrailingTrivia(space),
SyntaxFactory.Identifier(“Class”))
.AddModifiers(
SyntaxFactory.Token(SyntaxKind.PublicKeyword)
.WithTrailingTrivia(space)
)
.AddModifiers(
SyntaxFactory.Token(SyntaxKind.VirtualKeyword)
.WithTrailingTrivia(space)
)
.AddAccessorListAccessors(
SyntaxFactory.AccessorDeclaration(SyntaxKind.GetAccessorDeclaration)
.WithSemicolonToken(SyntaxFactory.Token(SyntaxKind.SemicolonToken))
.WithTrailingTrivia(space)
)
.AddAccessorListAccessors(
SyntaxFactory.AccessorDeclaration(SyntaxKind.SetAccessorDeclaration)
.WithSemicolonToken(SyntaxFactory.Token(SyntaxKind.SemicolonToken))
.WithTrailingTrivia(space)
)
.WithLeadingTrivia(twoTabs)
.WithTrailingTrivia(endOfline);

I’m wondering if T4 templates can be called in proc, that could be another way of accomplishing the same thing. Just feed the results of the T4 into the generator.

Kristian says:

November 15, 2015 at 10:56 am

Very interesting! Could you expand a bit on:

* Why you ended up with SourceWriter (and the low overhead lucid syntax) vs. pre-existing tools like T4?

* How bad is the warm-up issue – several seconds?

1. jeremydmiller says:
  
  November 16, 2015 at 9:52 pm
  
  I had honestly forgotten about T4, but even so, I think what I already have is a lot easier to use than T4 would be anyway for simple things like Marten’s generation. Having intenseness inside of the string interpolation goes a long way toward making it usable IMO.
  
  The warm-up? I think it’s consistently about 2 seconds. I’m willing to gamble for now that either the Roslyn team gets that way down or that we can create some reasonable workarounds in our code.
  
Pingback: Szumma #016 – 2015 46. hét | d/fuel
Pingback: Optimizing Marten Part 2 | The Shade Tree Developer
Pingback: Jasper’s Roslyn-Powered “Special Sauce” | The Shade Tree Developer
Pingback: Marten V4 Preview: Linq and Performance | The Shade Tree Developer