Optimizing Marten Part 2

This is an update to an earlier blog post on optimizing for performance in Marten. Marten is a new OSS project I’m working on that allows .Net applications to treat the Postgresql database as a document database. Our hope at work is that Marten will be a more performant and easier to support replacement in our ecosystem for RavenDb (and possibly a replacement event store mechanism inside of the applications that use event sourcing, but that’s going to come later).

Before we can think about using Marten for real, we’re working to optimize performance for both reading and writing data in Postgresql.

Optimizing Queries with Indexes

In my previous post, my former colleague Joshua Flanagan suggested using the Postgresql containment operator and gin indexes as part of my performance comparisons. To try that out, I first added the ability to define database indexes for Marten document types like this:

public class ContainmentOperator : MartenRegistry
{
    public ContainmentOperator()
    {
        // For persisting a document type called 'Target'
        For<Target>()

            // Use a gin index against the json data field
            .GinIndexJsonData()

            // directs Marten to try to use the containment
            // operator for querying against this document type
            // in the Linq support
            .PropertySearching(PropertySearching.ContainmentOperator);
    }
}

and like this for indexing what we’re calling “searchable” fields where Marten duplicates some element of a document into a separate database column for optimized searching:

public class DateIsSearchable : MartenRegistry
{
    public DateIsSearchable()
    {
        // This can also be done with attributes
        // This automatically adds a "BTree" index
        For<Target>().Searchable(x => x.Date);
    }
}

As of now, when you choose to make a field or property of a document “searchable”, Marten is automatically adding a database index to that column on the document storage table. By default, the index is the standard Postgresql btree index, but you do have the ability to override how the index is created.

Now that we have support for querying using the containment operator and support for defining indexes, I reran the query performance tests and updated the results with some new data:

Serializer: JsonNetSerializer

Query Type                          1K     10K    100K
JSON Locator Only                   7      77.2   842.4
jsonb_to_record + lateral join      9.4    88.6   1170.4
searching by duplicated field       1      16.4   135.4
searching by containment operator   4.6    14.8   132.4

Serializer: JilSerializer

Query Type                          1K     10K    100K
JSON Locator Only                   6      54.8   827.8
jsonb_to_record + lateral join      8.6    76.2   1064.2
searching by duplicated field       1      6.8    64
searching by containment operator   4      7.8    66.8

Again, searching by a field that is duplicated as a simple database column with a btree index is generally the fastest approach, most visibly at 1K documents. The containment operator plus gin index is roughly even with it at 10K and 100K documents, and may be the best choice when you will have to issue many different kinds of queries against the same document type. Based on this data, I think that we’re going to make the containment operator the preferred way of querying json documents, but fall back to the json locator approach for all query operators other than equality tests.
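As a rough sketch of that rule (this is not Marten’s actual Linq code), the choice between the two where clause styles could look something like the code below. The WhereClauseSketch class and its Build() method are made-up names for illustration. The @> containment operator and the ->> locator are standard Postgresql jsonb operators, and the "d" table alias just mirrors the SQL shown elsewhere in this post.

```csharp
using System;

// Hypothetical helper, not part of Marten
public static class WhereClauseSketch
{
    // Picks a where clause style for a single member comparison
    // against the jsonb "data" column of a document table
    public static string Build(string member, string op, string pgType)
    {
        if (op == "=")
        {
            // Equality can be expressed as containment, which a gin index
            // on the data column can serve. The :arg0 parameter would carry
            // a small JSON document containing just the member and value.
            return "d.data @> :arg0";
        }

        // Every other operator falls back to a JSON locator plus a cast
        return $"CAST(d.data ->> '{member}' as {pgType}) {op} :arg0";
    }
}
```

Calling WhereClauseSketch.Build("Date", "=", "date") yields the containment form, while any other operator yields the locator form with a cast.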

I still think that we have to ship with Newtonsoft.Json as our default json serializer because of F# and polymorphism concerns among other things, but if you can get away with it for your document types, Jil is clearly much faster.

There is some conversation in the Marten Gitter room about possibly adding gin indexes to every document type by default, but I think we first need to pay attention to the data in the next section:

 

Insert Timings

The querying is definitely important, but we certainly want the write side of Marten to be fast too. We’ve had what we call “BulkInsert” support using Npgsql & Postgresql’s facility for bulk copying. Recently, I’ve changed Marten’s internal unit of work class to issue all of its delete and “upsert” commands in one single ADO.Net DbCommand to try to execute multiple sql statements in a single network round trip.
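To make the batching idea concrete, here is a minimal sketch of how several upserts might be folded into one command text. It is not Marten’s actual unit of work code: the BatchCommandSketch class, the numbered parameter names, and calling the upsert function through "select" are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Marten
public static class BatchCommandSketch
{
    // Builds one command text containing an upsert call per document.
    // Parameter names are numbered so that every statement can share
    // a single command and travel to the server in one round trip.
    public static string BuildUpserts(
        string sproc,
        IList<KeyValuePair<Guid, string>> docs,
        IDictionary<string, object> parameters)
    {
        var sql = new StringBuilder();

        for (var i = 0; i < docs.Count; i++)
        {
            parameters["id" + i] = docs[i].Key;    // document id
            parameters["doc" + i] = docs[i].Value; // serialized JSON
            sql.Append($"select {sproc}(:id{i}, :doc{i});");
        }

        return sql.ToString();
    }
}
```

The resulting text and parameters would then be attached to a single ADO.Net command and executed once. The JSON parameters would also need to be typed as jsonb, which this sketch leaves out.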

My best friend the Oracle database guru (I’ll know if he reads this because he’ll be groaning about the Oracle part;)) suggested that this approach might not matter compared to issuing multiple ADO.Net commands against the same stateful transaction and connection, but we were both surprised by how much difference batching the SQL commands turned out to make.

To better understand the impact on insert timing using our bulk insert facility, the new batched update mechanism, and the original “ADO.Net command per document update” approach, I ran a series of tests that tried to insert 500 documents using each technique.

Because we also need to understand the implications on insertion and update timing of using the searchable, duplicated fields and gin indexes (there is some literature in the Postgresql docs stating that gin indexes could be expensive on the write side), I ran each permutation of update strategy against three different indexing strategies on the document storage table:

  1. No indexes whatsoever
  2. A duplicated field with a btree index
  3. Using a gin index against the JSON data column

And again, just for fun, I used both the Newtonsoft.Json and Jil serializers to also understand the impact that they have on performance.

You can find the code I used to make these tables in GitHub in the insert_timing class.

Using Newtonsoft.Json as the Serializer

Index                       Bulk Insert   Batch Update   Command per Document
No Index                    62            149            244
Duplicated Field w/ Index   53            152            254
Gin Index on Json           96            186            300

 

Using Jil as the Serializer

Index                       Bulk Insert   Batch Update   Command per Document
No Index                    47            134            224
Duplicated Field w/ Index   57            151            245
Gin Index on Json           79            180            270

As you can clearly see, the new batch update mechanism looks to be a pretty big win for performance over our original, naive “command per document” approach. The only downside is that this technique has a ceiling on how many or how large the documents can be before the single command exceeds technical limits. For right now, I think I’d like to simply beat that problem with documentation pushing users to use the bulk insert mechanism for large data sets. In the longer term, we’ll throttle the batch update by paging updates into some to-be-determined number of document updates at a time.
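The paging itself does not have to be complicated. Below is a minimal sketch of splitting pending document updates into fixed-size pages, where each page would become one batched command. Nothing here is Marten code: the BatchPagingSketch class is hypothetical and the page size is left to the caller because the real number is still to be determined.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Marten
public static class BatchPagingSketch
{
    // Lazily splits a sequence into pages of at most pageSize items
    public static IEnumerable<IReadOnlyList<T>> Page<T>(IEnumerable<T> items, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var page = new List<T>(pageSize);
        foreach (var item in items)
        {
            page.Add(item);
            if (page.Count == pageSize)
            {
                yield return page;
                page = new List<T>(pageSize);
            }
        }

        // Flush the final, partially filled page
        if (page.Count > 0) yield return page;
    }
}
```

With seven pending updates and a page size of three, this produces pages of three, three, and one, so the unit of work would issue three commands instead of seven.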

The key takeaway for me just reinforces the very first lesson I had drilled into me about software performance: network round trips are evil. We are certainly reducing the number of network round trips between our application and the database server by utilizing the command batching.

You can also see that using a gin index slows down the document updates considerably. I think the only good answer to users is that they’ll have to do performance testing as always.

 

Other Optimization Things

  • We’ve been able to cut down on Reflection hits and dynamic runtime behavior by using Roslyn as a crude metaprogramming mechanism to just codegen the document storage code.
  • Again in the theme of reducing network round trips, we’re going to investigate being able to batch up deferred queries into a single request to the Postgresql database.
  • We’re not sure about the details yet, but we’ll be investigating approaches for using asynchronous projections inside of Postgresql (maybe using Javascript running inside of the database, maybe .Net code in an external system, maybe both approaches).
  • I’m leaving the issues out in http://up-for-grabs.net, but we’ll definitely add the ability to just retrieve the raw JSON so that HTTP endpoints could stream data to clients without having to take the unnecessary hit of deserializing to a .Net type just to immediately serialize right back to JSON for the HTTP response. We’ll also support a completely asynchronous querying and update API for maximum scalability.

 

Using Roslyn for Runtime Code Generation in Marten

I’m using Roslyn to dynamically compile and load assemblies built at runtime from generated code in Marten and other than some concern over the warmup time, it’s been going very well so far.

Like so many other developers with more cleverness than sense, I’ve spent a lot of time trying to build Hollywood Principle style frameworks that try to dynamically call application code at runtime through Reflection or some kind of related mechanism. Reflection itself has traditionally been the easiest mechanism to use in .Net to create dynamic behavior at runtime, but it can be a performance problem, especially if you use it naively.

A Look Back at What Came Before…

Taking my own StructureMap IoC tool as an example, over the years I’ve accomplished dynamic runtime behavior in a couple different ways:

  1. Using IL directly using Reflection.Emit from the original versions through StructureMap 2.5. Working with IL is just barely a higher abstraction than assembly code and I don’t recommend using that if your goal is maintainability or making it easy for other developers to work in your code. I don’t miss generating IL by hand whatsoever. For those of you reading this and saying “pfft, IL isn’t so bad if you just understand how it works…”, my advice to you is to immediately go outside and get some fresh air and sunshine because you clearly aren’t thinking straight.
  2. From StructureMap 2.6 I crudely used the trick of building Expression trees representing what I needed to do, then compiling those Expression trees into objects of the right Func or Action signatures. This approach is easier – at least for me – because the Expression model is much closer semantically to the actual code you’re trying to mimic than the stack-based IL.
  3. From StructureMap 3.* on, there’s a much more complex dynamic Expression compilation model that’s robust enough to call constructor functions, setter properties, thread in interception, and surround all of that with try/catch logic for expressive exception messages and pseudo stack traces.

The current dynamic Expression approach in the StructureMap 3/4 internals is mostly working out well, but I barely remember how it works and it would take me a good day just to get back into that code if I ever had to change something.

What if instead we could just work directly in plain old C# that we largely know and understand, but somehow get that compiled at runtime instead? Well, thanks to Roslyn and its “compiler as a service”, we now can.

I’ve said before that I want to eventually replace the Expression compilation with the Roslyn code compilation shown in this post, but I’m not sure I’m ambitious enough to mess with a working project.

How Marten uses Roslyn Runtime Generation 

As I explained in my last blog post, Marten generates some “glue code” to connect a document object to the proper ADO.Net command objects for loading, storing, or deleting. For each document class, Marten generates an IDocumentStorage class with this signature:

public interface IDocumentStorage
{
    NpgsqlCommand UpsertCommand(object document, string json);
    NpgsqlCommand LoaderCommand(object id);
    NpgsqlCommand DeleteCommandForId(object id);
    NpgsqlCommand DeleteCommandForEntity(object entity);
    NpgsqlCommand LoadByArrayCommand<TKey>(TKey[] ids);
    Type DocumentType { get; }
}

In the test library, we have a class I creatively called “Target” that I’ve been using to test how Marten handles various .Net Types and queries. At runtime, Marten generates a class called TargetStorage that implements the interface above. Part of the generated code (modified by hand to clean up some extraneous line breaks and add comments) is shown below:

using Marten;
using Marten.Linq;
using Marten.Schema;
using Marten.Testing.Fixtures;
using Marten.Util;
using Npgsql;
using NpgsqlTypes;
using Remotion.Linq;
using System;
using System.Collections.Generic;

namespace Marten.GeneratedCode
{
    public class TargetStorage : IDocumentStorage, IBulkLoader<Target>, IdAssignment<Target>
    {
        public TargetStorage()
        {

        }

        public Type DocumentType => typeof (Target);

        public NpgsqlCommand UpsertCommand(object document, string json)
        {
            return UpsertCommand((Target)document, json);
        }

        public NpgsqlCommand LoaderCommand(object id)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForId(object id)
        {
            return new NpgsqlCommand("delete from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForEntity(object entity)
        {
            return DeleteCommandForId(((Target)entity).Id);
        }

        public NpgsqlCommand LoadByArrayCommand<T>(T[] ids)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = ANY(:ids)").WithParameter("ids", ids);
        }

        // I configured the "Date" field to be a duplicated/searchable field in code
        public NpgsqlCommand UpsertCommand(Target document, string json)
        {
            return new NpgsqlCommand("mt_upsert_target")
                .AsSproc()
                .WithParameter("id", document.Id)
                .WithJsonParameter("doc", json).WithParameter("arg_date", document.Date, NpgsqlDbType.Date);
        }

        // This Assign() method would use a HiLo sequence generator for numeric Id fields
        public void Assign(Target document)
        {
            if (document.Id == System.Guid.Empty) document.Id = System.Guid.NewGuid();
        }

        public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<Target> documents)
        {
            using (var writer = conn.BeginBinaryImport("COPY mt_doc_target(id, data, date) FROM STDIN BINARY"))
            {
                foreach (var x in documents)
                {
                    writer.StartRow();
                    writer.Write(x.Id, NpgsqlDbType.Uuid);
                    writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
                    writer.Write(x.Date, NpgsqlDbType.Date);
                }
            }
        }
    }
}

Now that you can see what code I’m generating at runtime, let’s move on to a utility for generating the code.

SourceWriter

SourceWriter is a small utility class in Marten that helps you write neatly formatted, indented C# code. SourceWriter wraps a .Net StringWriter for efficient string manipulation and provides some helpers for adding namespace using statements and tracking indention levels for you. After experimenting with some different usages, I mostly settled on using the Write(text) method that allows you to provide a section of code as a multi-line string. The TargetStorage code I showed above is generated from within a class called DocumentStorageBuilder with a call to the SourceWriter.Write() method shown below:

            writer.Write(
                $@"
BLOCK:public class {mapping.DocumentType.Name}Storage : IDocumentStorage, IBulkLoader<{mapping.DocumentType.Name}>, IdAssignment<{mapping.DocumentType.Name}>

{fields}

BLOCK:public {mapping.DocumentType.Name}Storage({ctorArgs})
{ctorLines}
END

public Type DocumentType => typeof ({mapping.DocumentType.Name});

BLOCK:public NpgsqlCommand UpsertCommand(object document, string json)
return UpsertCommand(({mapping.DocumentType.Name})document, json);
END

BLOCK:public NpgsqlCommand LoaderCommand(object id)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForId(object id)
return new NpgsqlCommand(`delete from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForEntity(object entity)
return DeleteCommandForId((({mapping.DocumentType.Name})entity).{mapping.IdMember.Name});
END

BLOCK:public NpgsqlCommand LoadByArrayCommand<T>(T[] ids)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = ANY(:ids)`).WithParameter(`ids`, ids);
END


BLOCK:public NpgsqlCommand UpsertCommand({mapping.DocumentType.Name} document, string json)
return new NpgsqlCommand(`{mapping.UpsertName}`)
    .AsSproc()
    .WithParameter(`id`, document.{mapping.IdMember.Name})
    .WithJsonParameter(`doc`, json){extraUpsertArguments};
END

BLOCK:public void Assign({mapping.DocumentType.Name} document)
{mapping.IdStrategy.AssignmentBodyCode(mapping.IdMember)}
END

BLOCK:public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<{mapping.DocumentType.Name}> documents)
BLOCK:using (var writer = conn.BeginBinaryImport(`COPY {mapping.TableName}(id, data{duplicatedFieldsInBulkLoading}) FROM STDIN BINARY`))
BLOCK:foreach (var x in documents)
writer.StartRow();
writer.Write(x.Id, NpgsqlDbType.{id_NpgsqlDbType});
writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
{duplicatedFieldsInBulkLoadingWriter}
END
END
END

END

");
        }

There are a couple of things to note about the code generation above:

  • String interpolation makes this so much easier than I think it would be with just string.Format(). Thank you to the C# 6 team.
  • Each line of code is written to the underlying StringWriter with the level of indention added to the left by SourceWriter itself.
  • The “BLOCK” prefix directs SourceWriter to add an opening brace “{” to the next line, then increment the indention level
  • The “END” text directs SourceWriter to decrement the current indention level, then write a closing brace “}” to the next line and a blank line after that.
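To show how little machinery those BLOCK and END rules need, here is a stripped-down sketch of a writer that follows them. It is not the real SourceWriter from Marten: the SourceWriterSketch name, the four-space indent, and the Code() method are my own assumptions, and it skips the helpers for using statements.

```csharp
using System;
using System.IO;

// Simplified stand-in for illustration, not Marten's SourceWriter
public class SourceWriterSketch
{
    private readonly StringWriter _writer = new StringWriter();
    private int _level;

    // Accepts a multi-line string and processes it line by line
    public void Write(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.TrimEnd();

                if (line.StartsWith("BLOCK:", StringComparison.Ordinal))
                {
                    // Write the declaration, open a brace, then indent
                    WriteLine(line.Substring("BLOCK:".Length));
                    WriteLine("{");
                    _level++;
                }
                else if (line == "END")
                {
                    // Outdent, close the brace, then add a blank line
                    _level--;
                    WriteLine("}");
                    _writer.WriteLine();
                }
                else if (line.Length == 0)
                {
                    _writer.WriteLine();
                }
                else
                {
                    WriteLine(line);
                }
            }
        }
    }

    // Prefixes the line with the current indention level
    private void WriteLine(string line)
    {
        _writer.WriteLine(new string(' ', _level * 4) + line);
    }

    public string Code()
    {
        return _writer.ToString();
    }
}
```

Feeding it the three lines "BLOCK:public class Foo", "public int X;", and "END" produces a class declaration with braces on their own lines and the field indented one level.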

Now that we’ve got ourselves some generated code, let’s get Roslyn involved to compile it and actually get at an object of the new Type we want.

Roslyn Compilation with AssemblyGenerator

Based on a blog post by Tugberk Ugurlu, I built the AssemblyGenerator class in Marten shown below that invokes Roslyn to compile C# code and load the new dynamically built Assembly into the application:

public class AssemblyGenerator
{
    private readonly IList<MetadataReference> _references = new List<MetadataReference>();

    public AssemblyGenerator()
    {
        ReferenceAssemblyContainingType<object>();
        ReferenceAssembly(typeof (Enumerable).Assembly);
    }

    public void ReferenceAssembly(Assembly assembly)
    {
        _references.Add(MetadataReference.CreateFromFile(assembly.Location));
    }

    public void ReferenceAssemblyContainingType<T>()
    {
        ReferenceAssembly(typeof (T).Assembly);
    }

    public Assembly Generate(string code)
    {
        var assemblyName = Path.GetRandomFileName();
        var syntaxTree = CSharpSyntaxTree.ParseText(code);

        var references = _references.ToArray();
        var compilation = CSharpCompilation.Create(assemblyName, new[] {syntaxTree}, references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));


        using (var stream = new MemoryStream())
        {
            var result = compilation.Emit(stream);

            if (!result.Success)
            {
                var failures = result.Diagnostics.Where(diagnostic =>
                    diagnostic.IsWarningAsError ||
                    diagnostic.Severity == DiagnosticSeverity.Error);


                var message = failures.Select(x => $"{x.Id}: {x.GetMessage()}").Join("\n");
                throw new InvalidOperationException("Compilation failures!\n\n" + message + "\n\nCode:\n\n" + code);
            }

            stream.Seek(0, SeekOrigin.Begin);
            return Assembly.Load(stream.ToArray());
        }
    }
}

At runtime, you use the AssemblyGenerator class by telling it which other assemblies it should reference and giving it the source code to compile:

// Generate the actual source code
var code = GenerateDocumentStorageCode(mappings);

var generator = new AssemblyGenerator();

// Tell the generator which other assemblies that it should be referencing 
// for the compilation
generator.ReferenceAssembly(Assembly.GetExecutingAssembly());
generator.ReferenceAssemblyContainingType<NpgsqlConnection>();
generator.ReferenceAssemblyContainingType<QueryModel>();
generator.ReferenceAssemblyContainingType<DbCommand>();
generator.ReferenceAssemblyContainingType<Component>();

mappings.Select(x => x.DocumentType.Assembly).Distinct().Each(assem => generator.ReferenceAssembly(assem));

// build the new assembly -- this will blow up if there are any
// compilation errors with the list of errors and the actual code
// as part of the exception message
var assembly = generator.Generate(code);

Finally, once you have the new Assembly, use Reflection just to find the new Type you want by either searching through Assembly.GetExportedTypes() or by name. Once you have the Type object, you can build that object through Activator.CreateInstance(Type) or any of the other normal Reflection mechanisms.
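That last step can be sketched as below. To keep the sample self-contained it searches whatever Assembly you hand it rather than a Roslyn-built one, and the IGreeter, GeneratedGreeter, and GeneratedTypeLoader names are invented for the example. It also assumes the generated class has a public, no-argument constructor.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical interface the generated code is expected to implement
public interface IGreeter
{
    string Greet();
}

// Stands in for a class that would live in the generated assembly
public class GeneratedGreeter : IGreeter
{
    public string Greet()
    {
        return "hello";
    }
}

public static class GeneratedTypeLoader
{
    // Finds the first exported, concrete type assignable to T and
    // builds it through its public no-argument constructor
    public static T Build<T>(Assembly assembly)
    {
        var type = assembly.GetExportedTypes()
            .First(x => typeof(T).IsAssignableFrom(x) && !x.IsAbstract && !x.IsInterface);

        return (T)Activator.CreateInstance(type);
    }
}
```

Usage would be along the lines of GeneratedTypeLoader.Build<IGreeter>(assembly), where assembly is the value returned from the Generate() call. Because the Reflection lookup happens once per type, its cost does not show up in the hot path the way per-call Reflection would.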

The Warmup Problem

So I’m very happy with using Roslyn in this way so far, but the initial “warmup” time on the very first usage of the compilation is noticeably slow. It’s a one time hit on startup, but this could get annoying when you’re trying to quickly iterate or debug a problem in code by frequently restarting the application. If the warmup problem really is serious in real applications, we may introduce a mode that just lets you export the generated code to file and have that code compiled with the rest of your project for much faster startup times.

Optimizing for Performance in Marten

For the last couple weeks I’ve been working on a new project called Marten that is meant to exploit Postgresql’s JSONB support as a full-fledged document database for .Net development and as a drop-in replacement for RavenDb in our production environment. I’d say that our primary goal with Marten is improved stability and supportability, but maximizing performance and throughput is a very close second on the priority list.

This is my second update on Marten progress. From last week, also see Marten Development So Far.

So far, I’ve mostly been focusing on optimizing the SQL queries generated by the Linq support for faster fetching. I’ve been experimenting with a few different query modes for the SQL generation based on what fields or properties you’re trying to search on:

  1. By default in the absence of any explicit configuration, Marten tries to use the “jsonb_to_record” function with a LATERAL join approach to optimize queries against members on the root of the document.
  2. You can also force Marten to only use basic Postgresql JSON locators to generate the where clauses in the SQL statements.
  3. Finally, if you know that your application will be frequently querying a document type against a certain member, Marten can use a “searchable” field such that it duplicates that data in a normal database field and searches directly against that database field. This mechanism will clearly slow down your inserts and take up somewhat more storage space, but the numbers I’m about to display don’t lie: this is very clearly the fastest way to optimize queries using Marten (so far).

I’ve also experimented with both the Newtonsoft.Json serializer and the faster, but less flexible Jil serializer. Again, the numbers are pretty clear that for bigger result sets, Jil is much faster (NetJSON was a complete bust for me when I tried it). So far I’ve been able to keep Marten serializer-agnostic and I can easily see times when you’d have to opt for Newtonsoft’s flexibility.

Default jsonb_to_record/LATERAL JOIN

Using this approach, the SQL generated is:

select d.data from mt_doc_target as d, LATERAL jsonb_to_record(d.data) as l("Date" date) where l."Date" = :arg0

Json Locators Only

While you can configure this behavior on a field by field basis, the quickest way is to just set the default document behavior:

public class JsonLocatorOnly : MartenRegistry
{
    public JsonLocatorOnly()
    {
        // This can also be done with attributes
        For<Target>().PropertySearching(PropertySearching.JSON_Locator_Only);
    }
}

With this setting, the generated SQL is:

select d.data from mt_doc_target as d where CAST(d.data ->> 'Date' as date) = :arg0

Searchable, Duplicated Field

Again, to configure this option, I used this code:

public class DateIsSearchable : MartenRegistry
{
    public DateIsSearchable()
    {
        // This can also be done with attributes
        For<Target>().Searchable(x => x.Date);
    }
}

When I do this, the table for the Target type has an additional field called “date” that will get the value of the Target.Date property every time a Target object is inserted or updated in the database.

The resulting SQL is:

select d.data from mt_doc_target as d where d.date = :arg0

The Performance Results

I created the table below by generating randomized data, then trying to search by a DateTime field using three different mechanisms:

var theDate = DateTime.Today.AddDays(3);
var queryable = session.Query<Target>().Where(x => x.Date == theDate);

In all cases, I used the same sample data for the document count and took an average of running the same query five times after throwing out an initial attempt where Postgresql seemed to be “warming up” the JSONB data.
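The measurement loop amounts to something like the sketch below. This is a simplified stand-in rather than the actual test code: TimingSketch is a hypothetical name, and reporting milliseconds is my choice for the example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration
public static class TimingSketch
{
    // Runs the action once as an untimed warmup, then returns the
    // average elapsed milliseconds over the timed runs
    public static double AverageMilliseconds(Action action, int runs = 5)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warmup run, thrown away

        double total = 0;
        for (var i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();

            total += stopwatch.Elapsed.TotalMilliseconds;
        }

        return total / runs;
    }
}
```

The action passed in would execute the query and force enumeration of the results, for example by calling ToArray() on the queryable, so that fetching and deserialization are both inside the timed region.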

Serializer: JsonNetSerializer

Query Type                       1K     10K    100K    1M
JSON Locator Only                9.6    75.2   691.2   9648
jsonb_to_record + lateral join   10     93.6   922.6   12091.2
searching by duplicated field    2.4    15     169.6   2777.8

Serializer: JilSerializer

Query Type                       1K     10K    100K    1M
JSON Locator Only                6.8    61     594.8   7265.6
jsonb_to_record + lateral join   8.4    86.6   784.2   9655.8
searching by duplicated field    1      8.8    115.4   2234.2

To be honest, I expected the JSONB_TO_RECORD + LATERAL JOIN mechanism to be faster than the JSON locator only approach, but I need to go back and try to add some indexes because that’s supposed to be the benefit of using JSONB_TO_RECORD to avoid the object casts that inevitably defeat indexes. I’d be happy to get some Postgresql gurus to weigh in here if there are any reading this.

If you’re curious to see my mechanism for recording this data, see the performance_tuning code file in GitHub.

Bulk Loading Documents

From time to time (testing or data migrations maybe) you’ll have some need to very rapidly load a large set of documents into your database. I added a feature this morning to Marten that exploits Postgresql’s COPY feature supported by Npgsql:

public void load_with_small_batch()
{
    // This is just creating some randomized
    // document data
    var data = Target.GenerateRandomData(100).ToArray();

    // Load all of these into a Marten-ized database
    theSession.BulkLoad(data);

    // And just checking that the data is actually there;)
    theSession.Query<Target>().Count().ShouldBe(data.Length);
    theSession.Load<Target>(data[0].Id).ShouldNotBeNull();
}

Behind the scenes, Marten uses code that is generated at runtime and compiled by Roslyn to do the bulk loading as efficiently as possible without any hit from using reflection:

public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<Target> documents)
{
    using (var writer = conn.BeginBinaryImport("COPY mt_doc_target(id, data) FROM STDIN BINARY"))
    {
        foreach (var x in documents)
        {
            writer.StartRow();
            writer.Write(x.Id, NpgsqlDbType.Uuid);
            writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
        }
    }
}

Do note that the code generation mechanism is smart enough to also add any fields or properties of the document type that are marked as duplicated for searching.

Other Outstanding Optimization Tasks 

  • Optimize the mechanics for applying all the changes in a unit of work. I’m hoping that we can do something to reduce the number of network round trips between the application and the Postgresql server. My fallback approach is going to be to use a custom PLV8 sproc, but not until we exhaust other possibilities with the Npgsql library.
  • I want some mechanism for queuing up queries and submitting them in one network round trip.
  • The ability to make a named, reusable Linq query so you can reuse the underlying ADO.Net command generated from parsing the Linq expression without having to go through all the Expression parsing gymnastics on each usage.
  • Really more for scalability than performance, but we’ll get around to asynchronous query methods. I’m just not judging that to be a critical path item right now.
  • It’s probably minor in the grand scheme of things, but the actual Linq expression to Sql query generation is grotesque in how it concatenates strings.

Feel very free to make suggestions and other feedback on these items;-)

Testing HTTP Handlers with No Web Server in Sight

FubuMVC 2.0 and 3.0 introduced some tooling I called “Scenarios” that allow users to write mostly declarative integration tests against the entire HTTP pipeline in memory without having to host the application in a web server. I promised a coworker that I would write a blog post about using Scenarios for an internal team that wants to start using it much more in their work. A week of procrastination later and here you go:

NOTE: All samples are using FubuMVC 3.0

Why Integration Tests?

From the very beginning, we tried very hard to make unit testing FubuMVC action methods in isolation as easy as possible. I think we largely succeeded in that goal. However, within the context of handling an HTTP request, FubuMVC like most web frameworks will potentially wrap those action methods with various middleware strategies for cross cutting technical things like authentication, authorization, logging, transaction management, and content negotiation. At some point, to truly exercise an HTTP endpoint you really do need to write an integration test that exercises the entire chain of HTTP handlers for an HTTP request exactly the way it will be configured inside the running application.

Toward that end, I built a class called EndpointDriver in early versions of FubuMVC that you could use to write integration tests against a FubuMVC application hosted with an embedded Katana server. This early tooling just wrapped WebClient with a FubuMVC specific fluent interface for resolving url’s, setting common options like the content-type and accepts headers, and verifying parts of the HTTP response. Below is a sample from our content negotiation support integration tests in FubuMVC 1.3 (“endpoints” is a reference to the EndpointDriver object for the running application):

[Test]
public void force_to_json_with_querystring()
{
    endpoints.Get("conneg/override/Foo?Format=Json", acceptType: "text/html")
        .ContentTypeShouldBe(MimeType.Json)
        .ReadAsJson<OverriddenResponse>()
        .Name.ShouldEqual("Foo");
}

EndpointDriver was fine at first, but our test library started getting slower as we added more and more tests and the fluent interface just never kept up with everything we needed for HTTP testing (plus I think that WebClient is awkward to use).

Using OWIN for HTTP “Scenarios”

As part of my FubuMVC 2.0 effort last year, I knew that I wanted a much better mechanism than the older EndpointDriver for doing integration testing of HTTP endpoints. Specifically, I wanted:

  • To be able to run HTTP requests and verify the response without having to take the performance hit of a web server
  • To run a FubuMVC application as it would be configured in production
  • To completely configure any part of an HTTP request
  • To be able to declaratively express multiple assertions against the expected response
  • To utilize FubuMVC’s support for “reverse URL resolution” for more traceable tests
  • Access to the raw HTTP request and response for anything unusual you would need to do that didn’t have a specific helper

The end result was a mechanism I called “Scenarios” that exploited FubuMVC’s OWIN support to run HTTP requests in memory using this signature off of the new FubuRuntime object I explained in an earlier blog post:

OwinHttpResponse Scenario(Action<Scenario> configuration)

The Scenario object both models the HTTP request and provides a way to specify expectations about the HTTP response for commonly used things like HTTP status codes, header values, and the presence of string values in the HTTP response body. If need be, you also have access to FubuMVC’s abstractions for the entire HTTP request and response (more on this later).

To make this concrete, let’s say that you’re working through a “Hello, World” exercise with FubuMVC with this class and action method that just returns the text “Hello, World” when you issue a GET to the root “/” url of an application:

public class HomeEndpoint
{
    public string Index()
    {
        return "Hello, World";
    }
}

A scenario test for the action above would look like this code below:

using (var runtime = FubuRuntime.Basic())
{
    // Execute the home route and verify
    // the response
    runtime.Scenario(_ =>
    {
        _.Get.Url("/");

        _.StatusCodeShouldBeOk();
        _.ContentShouldBe("Hello, World");
        _.ContentTypeShouldBe("text/plain");
    });
}

In the scenario above, I’m issuing a GET request to the “/” url of the application and specifying that the resulting status code should be HTTP 200, the “content-type” response header should be “text/plain”, and the exact contents of the response body should be “Hello, World”. When a Scenario is executed, it runs every single assertion instead of quitting on the first failure, and it reports every failed expectation in the specification output. This behavior is valuable when you have to author specifications with slower running scenario setup.
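To make the “run every assertion” behavior concrete, here is a minimal sketch of the general technique: register the assertions first, then run them all and collect the failures. The ExpectationSet and Check types are hypothetical stand-ins I made up for illustration, not FubuMVC’s actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a scenario's expectations (not a FubuMVC type)
public class ExpectationSet
{
    private readonly List<Action> _assertions = new List<Action>();

    // Register an assertion to be run later against the response
    public void Expect(Action assertion)
    {
        _assertions.Add(assertion);
    }

    // Run every assertion and collect the failure messages
    // instead of stopping at the first failure
    public IList<string> RunAll()
    {
        var failures = new List<string>();
        foreach (var assertion in _assertions)
        {
            try
            {
                assertion();
            }
            catch (Exception ex)
            {
                failures.Add(ex.Message);
            }
        }

        return failures;
    }
}

// Hypothetical assertion helper that throws on a mismatch
public static class Check
{
    public static void Equal<T>(string description, T expected, T actual)
    {
        if (!EqualityComparer<T>.Default.Equals(expected, actual))
        {
            throw new Exception(string.Format(
                "{0}: expected '{1}' but was '{2}'", description, expected, actual));
        }
    }
}
```

With this shape, a wrong status code and a wrong response body both show up in the same report, so you only pay for the slow setup once.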

Specifying Url’s

FubuMVC has a model for reverse URL lookup from any endpoint method or the input model that we exploited in Scenario’s for traceable tests:

host.Scenario(_ =>
{
    // Specify a GET request to the Url that runs an endpoint method:
    _.Get.Action<InMemoryEndpoint>(e => e.get_memory_hello());

    // Or specify a POST to the Url that would handle an input message:
    _.Post

        // This call serializes the input object to Json using the 
        // application's configured JSON serializer and setting
        // the contents on the Request body
        .Json(new HeaderInput {Key = "Foo", Value1 = "Bar"});

    // Or specify a GET by an input object to get the route parameters
    _.Get.Input(new InMemoryInput { Color = "Red" });
});

I like the reverse url lookup instead of specifying Url’s directly in the scenarios because:

  1. It makes your scenario tests traceable to the actual handling code
  2. It insulates your scenarios from changes to the Url structures later
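For anyone curious how a reverse lookup like this can work, here is a minimal sketch that uses an expression tree to find the endpoint method and then looks up its Url. The RouteTable and HelloEndpoint classes are hypothetical and only illustrate the idea; FubuMVC’s real url registry is more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical route table keyed by the endpoint method (not a FubuMVC type)
public class RouteTable
{
    private readonly Dictionary<MethodInfo, string> _routes =
        new Dictionary<MethodInfo, string>();

    // Associate an endpoint method with a Url pattern
    public void Register<T>(Expression<Action<T>> action, string url)
    {
        _routes[MethodOf(action)] = url;
    }

    // Resolve the Url from a strongly typed reference to the endpoint method
    public string UrlFor<T>(Expression<Action<T>> action)
    {
        var method = MethodOf(action);
        string url;
        if (!_routes.TryGetValue(method, out url))
        {
            throw new InvalidOperationException(
                "No route registered for " + method.Name);
        }

        return url;
    }

    // Pull the called method out of an expression like e => e.get_hello()
    private static MethodInfo MethodOf<T>(Expression<Action<T>> action)
    {
        var call = action.Body as MethodCallExpression;
        if (call == null)
        {
            throw new ArgumentException("Expected a method call", "action");
        }

        return call.Method;
    }
}

// Hypothetical endpoint class used only for this example
public class HelloEndpoint
{
    public string get_hello()
    {
        return "Hello";
    }
}
```

Because the test refers to the method itself, renaming the method is a compiler-checked refactoring, and changing the Url pattern only touches the registration.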

Checking the Response Body

For the 3.0 work I did a couple months ago, I fleshed out the Scenario support with more mechanisms to analyze the HTTP response body:

host.Scenario(_ =>
{
    // set up a request here

    // Read the response body as text
    var bodyText = _.Response.Body.ReadAsText();

    // Read the response body by deserializing Json
    // into a .net type with the application's
    // configured Json serializer
    var output = _.Response.Body.ReadAsJson<MyResponse>();

    // If you absolutely have to work with Xml...
    var xml = _.Response.Body.ReadAsXml();
});

Some Other Things…

I’ll happily explain the details of this list on request, but here are some other attributes of Scenario’s that FubuMVC supports right now:

  • You can specify expected values for HTTP response headers
  • You can assert on status codes and descriptions
  • There are helpers to send Json or Xml serialized data based on an input object message
  • There is a mechanism that allows you to disable all security middleware in the application for a single Scenario that has been frequently helpful in testing
  • You have access to the underlying IoC container for the running application from the Scenario if you need to resolve and use application services
  • FubuMVC is now StructureMap 4.0-only for its IoC usage, so we’re able to rely on StructureMap’s child container feature to resolve services during a Scenario execution from a unique child container per run. This allows you to replace services in your application with fakes, mocks, and stubs in a way that prevents your fake services from impacting more than one test.

Scenarios in Jasper

If you didn’t see my blog post earlier this year, FubuMVC is getting a complete reboot into a new project called Jasper late this year/early next year. I absolutely plan on bringing the Scenario support forward into Jasper very early, but this time around we’re completely dropping all of FubuMVC’s HTTP abstractions in favor of directly using the OWIN environment dictionary as the single model of HTTP requests and responses. My thought right now is that we’ll invest heavily in extension methods hanging off of IDictionary<string, object> for commonly used operations against that OWIN dictionary.
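As a rough idea of what those extension methods might look like, here is a minimal sketch. The dictionary keys come from the OWIN 1.0 specification, but the class and method names are placeholders I made up for illustration, not Jasper’s eventual API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers hanging off the OWIN environment dictionary
public static class OwinEnvironmentExtensions
{
    // Keys defined by the OWIN 1.0 specification
    public const string RequestMethod = "owin.RequestMethod";
    public const string RequestPath = "owin.RequestPath";
    public const string ResponseStatusCode = "owin.ResponseStatusCode";
    public const string ResponseHeaders = "owin.ResponseHeaders";

    public static string HttpMethod(this IDictionary<string, object> env)
    {
        object raw;
        return env.TryGetValue(RequestMethod, out raw) ? (string) raw : null;
    }

    public static string Path(this IDictionary<string, object> env)
    {
        object raw;
        return env.TryGetValue(RequestPath, out raw) ? (string) raw : null;
    }

    // The status code is optional in OWIN and defaults to 200
    public static int StatusCode(this IDictionary<string, object> env)
    {
        object raw;
        return env.TryGetValue(ResponseStatusCode, out raw) ? (int) raw : 200;
    }

    public static void WriteStatusCode(this IDictionary<string, object> env, int code)
    {
        env[ResponseStatusCode] = code;
    }

    // Response headers are an IDictionary<string, string[]> in OWIN
    public static void AppendResponseHeader(
        this IDictionary<string, object> env, string name, string value)
    {
        object raw;
        env.TryGetValue(ResponseHeaders, out raw);
        var headers = raw as IDictionary<string, string[]>;
        if (headers == null)
        {
            headers = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
            env[ResponseHeaders] = headers;
        }

        string[] existing;
        if (headers.TryGetValue(name, out existing))
        {
            var combined = new string[existing.Length + 1];
            existing.CopyTo(combined, 0);
            combined[existing.Length] = value;
            headers[name] = combined;
        }
        else
        {
            headers[name] = new[] { value };
        }
    }
}
```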

To some extent, we’re hoping as well that there will be a good ecosystem of OWIN helpers from other people and projects that will be usable from within Jasper.

Marten Development So Far (Postgresql as Doc Db)

Last week I mentioned that I had started a new OSS project called “Marten” that aims to allow .Net developers to treat Postgresql 9.5 (we’re using the new “upsert” functionality) as a document database using Postgresql’s JSONB data type. We’ve already had some interest and feedback on Github and in the Gitter room, plus links to at least three other ongoing efforts to do something similar with Postgresql that I’m interpreting as obvious validation for the basic idea.

Please feel very free to chime in on the approach or requirements here, on Github, or on Gitter. We’re going to proceed with this project regardless at work, but I’d love to see it also be a viable community project with input from outside our little development organization.

What’s Already Done

I’d sum up the Marten work as “so far, so good”. If you look closely into the Marten code, do know that I have been purposely standing up the functionality with simple mechanics and naive implementations. My philosophy here is to get the functionality up with good test coverage before starting any heavy optimization work.

As of now:

  • Our thought is that the main service facade to Marten is the IDocumentSession interface that very closely mimics the same interface in RavenDb. This work is for my day job at Extend Health, and our immediate goal is to move systems off of RavenDb early next year, so I think that design decision is pretty understandable. That doesn’t mean that that’ll be the only way to interact with Marten in the long run.
  • In the “development mode”, Marten is able to create database tables and an “upsert” stored procedure for any new document type it encounters in calls to the IDocumentSession.
  • The real DocumentSession facade can store documents, load documents by either a single or array of id’s, and delete documents by the same.
  • DocumentSession implements a “unit of work” with similar usage to RavenDb’s.
  • You can completely bypass the Linq provider I’m describing in the next section and just use raw SQL to fetch documents
  • There is a DocumentCleaner service that you can use inside of automated testing harnesses to tear down document data or even the schema objects that Marten builds

Linq Support

I don’t think I need to make the argument that Marten is going to be more usable and definitely more popular if it has decent Linq support. While I was afraid that building a Linq provider on top of the Postgresql JSON operators was going to be tedious and hard, the easy to use Relinq library has made it just “tedious.”

As early as next week I’m going to start working over the Linq support and the SQL it generates to try to optimize searching.

The Linq support hangs off of the IDocumentSession.Query<T>() method like so:

        public void query()
        {
            theSession.Store(new Target{Number = 1, DateOffset = DateTimeOffset.Now.AddMinutes(5)});
            theSession.Store(new Target{Number = 2, DateOffset = DateTimeOffset.Now.AddDays(1)});
            theSession.Store(new Target{Number = 3, DateOffset = DateTimeOffset.Now.AddHours(1)});
            theSession.Store(new Target{Number = 4, DateOffset = DateTimeOffset.Now.AddHours(-2)});
            theSession.Store(new Target{Number = 5, DateOffset = DateTimeOffset.Now.AddHours(-3)});

            theSession.SaveChanges();

            theSession.Query<Target>()
                .Where(x => x.DateOffset > DateTimeOffset.Now).ToArray()
                .Select(x => x.Number)
                .ShouldHaveTheSameElementsAs(1, 2, 3);
        }

For right now, the Linq IQueryable support includes:

  • IQueryable.Where() support with strings, int’s, long’s, decimal’s, DateTime’s, enumeration values, and boolean types.
  • Multiple or chained Where().Where().Where() clauses like you might use when you’re calculating optional where clauses or letting multiple pieces of code add additional filters
  • “&&” and “||” operators in the Where() clauses
  • Deep nested properties in the Where() clauses like x.Address.City == "Austin"
  • First(), FirstOrDefault(), Single(), and SingleOrDefault() support for the IQueryable
  • Count() and Any() support
  • Contains(), StartsWith(), and EndsWith() support for string values — but it’s case sensitive right now. Case-insensitive searches are probably going to be an “up-for-grabs” task;)
  • Take() and Skip() support for paging
  • OrderBy() / ThenBy() / OrderByDescending() support
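To show the kind of usage the chained Where() and paging support is meant for, here is a minimal sketch written against a plain IQueryable. The SampleDoc type and the SampleQueries helper are hypothetical, and in the test below an in-memory list stands in for what would be theSession.Query<T>() in Marten.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document types used only for this example
public class SampleAddress
{
    public string City { get; set; }
}

public class SampleDoc
{
    public int Number { get; set; }
    public SampleAddress Address { get; set; }
}

public static class SampleQueries
{
    // Each optional filter chains another Where() onto the same query
    public static IQueryable<SampleDoc> Filter(
        IQueryable<SampleDoc> docs, string city, int? minNumber)
    {
        if (city != null)
        {
            docs = docs.Where(x => x.Address.City == city);
        }

        if (minNumber.HasValue)
        {
            docs = docs.Where(x => x.Number >= minNumber.Value);
        }

        return docs;
    }

    // Paging with OrderBy(), Skip(), and Take(); pages are numbered from 1
    public static SampleDoc[] Page(
        IQueryable<SampleDoc> docs, int pageNumber, int pageSize)
    {
        return docs
            .OrderBy(x => x.Number)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToArray();
    }
}
```

Nothing executes until ToArray() is called, which is what lets separate pieces of code keep adding filters to the same query.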

Right now, I’m using my audit of our largest system at work that uses RavenDb to guide and prioritize the Linq support. The only thing missing for us is searching within child collections of a document.

What we’re missing right now is:

  • Projections via IQueryable.Select(). Right now you have to do IQueryable.ToArray() to force the documents into memory before trying to use Select() projections.
  • Last() and LastOrDefault()
  • A lot of things I probably haven’t thought about at all;-)

Using Roslyn for Runtime Code Compilation

We’ll see if this turns out to be a good idea or not, but as of today Marten is using Roslyn to generate strategy classes that “know” how to build database commands for updating, deleting, and loading document data for each document type instead of using Reflection or IL emitting or compiling Expression’s on the fly. Other than the “warm up” performance hit on doing the very first compilation, this is working smoothly so far. We’ll be watching it for performance. I’ll blog about that separately sometime soon-ish.

Next Week: Get Some Data and Optimize!

My focus for Marten development next week is on getting a non-trivial database together and working on pure optimization. My thought is to grab data from Github using Octokit.Net to build a semi-realistic document database of users, repositories, and commits from all my other OSS projects. After that, I’m going to try out:

  • Using GIN indexes against the jsonb data to see how that works
  • Trying to selectively duplicate data into normal database fields for lightweight sql searches and indexes
  • Trying to use Postgresql’s jsonb_to_record functionality inside of the Linq support to see if that makes searches faster
  • I’m using Newtonsoft.Json as the JSON serializer right now thinking that I’d want the extra flexibility later, but I want to try out Jil too for the comparison
  • After the SQL generation settles down, try to clean up the naive string concatenation going on inside of the Linq support
  • Optimize the batch updates through DocumentSession.SaveChanges(). Today it’s just making individual sql commands in one transaction. For some optimization, I’d like to at least try to make the updates happen in fewer remote calls to the database. My fallback plan is to use a *gasp* stored procedure using postgresql’s PLV8 javascript support to take any number of document updates or deletions as a single json payload.

That list above is enough to keep me busy next week, but there’s more in the open Github issue list and we’re all ears about whatever we’ve missed, so feel free to add more feature requests or comment on existing issues.

Why “Marten?”

One of my colleagues was sneering at the name I was using, so I googled for “natural predators of ravens” and the marten was one of the few options, so we ran with it.

My .Net Unboxed 2015 Wrapup

I had a blast this week at the .Net Unboxed conference in Dallas. The content and speaker lineup were good, the vibe was great, the venue and location were great, and it was remarkably well organized. My hat is completely off to the organizers and I sincerely hope they’re up for doing this again next year.

For my part, I thought my Storyteller 3 talk went well and I was thrilled with the interest and questions I got about it later. I’ll definitely be posting a link to the recording when that’s posted.

Some thoughts and highlights in no particular order:

  • Strong naming in .Net continues to be a major source of angst and frustration for those of us heavily involved in OSS or simply wanting to consume OSS projects. Now that Nuget makes it somewhat easier to push out incremental releases and bug fix releases, strong naming is causing more and more headaches. I enjoyed my conversations with Daniel Plaisted of Microsoft who for the very first time has convinced me that anybody in Redmond understands how much trouble strong naming is causing. I’m still iffy on having to take on the overhead of ilrepack/ilmerge in publishing or doing the “Newtonsoft Lie to Your Users” version strategy. I think that CoreCLR’s much looser usage of strong naming might very well be enough reason for us as a community to hurry up and get our code up on the new runtime. In the meantime, I’ll be closely following the new Strongnamer project as a possible way to eliminate some of the pain for non-CoreCLR packages.
  • I’m not buying that DNX is going to be usable until later next year. I think, and conversations this week reinforced this idea, that I very much like the ASP.Net team’s general vision for vNext, but they’ve just bitten off more than they can handle.
  • Nik Molnar gave me a compliment about my UI work on Storyteller that made my day since I’m infamously bad at UI design and layout. I’m pretty sure he followed that up with “but, [something negative]” but I didn’t pay any attention to that part;)
  • I started to get a little irritated during one talk and wanted to start arguing with the speaker, so I quietly snuck out and went to the other ongoing talk *just* in time to hear Jimmy Bogard telling folks how he made a mistake by copying the old static ObjectFactory idea from StructureMap. I’ve apologized in public dozens of times on that one and I’m sure I’ll have to do it plenty more times. Sigh.
  • I did enjoy Jimmy’s talk on his experiences running OSS projects and appreciated his candor about the earlier decisions and approaches that didn’t necessarily work out. For my money, many of the best conference talks are about lessons learned from mistakes and fixing problems.
  • I definitely appreciated the lack of “let me tell you how wonderful I am and can I have an MVP award now?” talks that so frequently pop up in many .Net-centric “eyes-forward” conferences. I love how interactive the talks were and how engaged the audiences were in asking questions. I especially enjoy it when talks seem to be just a way of jumpstarting conversations.
  • I’ve thought for over a year that the forthcoming “K”/DNX work from Redmond was probably going to suck all the oxygen out of the room for alternative frameworks and I think you’ve definitely seen that happen. On a much more positive note, I think that we might see a resurgence of those things next year as we get to start taking advantage of the improvements to the .Net framework. More and more, I’m hearing about folks treating DNX as almost a reset for .Net OSS, and that might not be a terrible thing.
  • I enjoyed the talk on Falcor.Net and I’m very interested in Falcor in general as a possibly easier – or at least less weird – approach than GraphQL for React.js client to server communication.

Postgresql as a Document Db for .Net Development

I’m one of those guys who normally doesn’t like to talk much about new OSS projects until there’s a lot to show, but just for fun this time, I’m gonna talk about something that I’ve just barely started in the hopes of getting some feedback and because there’s already been some interest from outside my company. Besides, it’s not like the ways I’ve run OSS projects in the past have been all that successful anyway.

We use RavenDb at work in several projects, and while I still think there are some great features and attributes in RavenDb for easy development, it hasn’t held up very well in production usage and we want to replace it next year. I’ve gotten to spend some time over the past couple weeks laying out the skeleton of a new project on GitHub we’re calling “Marten” that will in theory allow us to treat Postgresql as a document database for .Net development.

We want to keep what we see as the advantages of RavenDb:

  • Schema-less development based on our objects without any kind of ORM mapping or limitations on object structure
  • The ability to quickly get a clean database per automated test for reliable testing
  • Linq support — I’ve already gotten some Linq support for basic operators using Re-linq and I’ve been pleasantly surprised at how well that went.
  • Batched updates and the built in unit of work — my working theory is to use DbDataAdapter’s to rig up batched updates
  • Deferred and/or batched queries: at least one of our apps is getting killed by network chattiness, so this is going to be a pretty high priority

In the end, what we’d really like to have is all the development advantages of RavenDb and document databases, but have full ACID support, all the DevOps tooling that already exists around Postgresql, and sit on top of a proven database engine.

Roadmap and Contributing

I’ve done enough spiking and proof of concept type work to feel like this is viable, pending performance testing down the road of course. I spent this morning trying to write up my thoughts on where we should go with this thing in the GitHub issue list, mostly as a way to start a detailed conversation about what this thing should be and where it’s going to go. If you’ve got any opinions, we’d love to hear them either on individual issues or in the Gitter room.

Roughly speaking, the features we’re thinking about are:

  • Support basic document saving and retrieval through a new IDocumentSession service facade purposely modeled after RavenDb’s
  • Basic Linq support against documents
  • The ability to bypass Linq and provide the raw SQL yourself when necessary (already working)
  • Schema creation and migration support for deployments
  • Read side/view projections in the database?
  • Some way to define and use indexes in queries
  • For lack of a better term, “Stored Procedures” that let you generate SQL queries from a convoluted Linq expression once and reuse across requests
  • Maybe make this thing a plugin or separate provider for EF7. I’m not sure there’s a technical reason to do that yet, but you know it’d make a lot more people interested in this thing

Maybe just as a vanity project for my own satisfaction, I’d also like to build an EventStore capability, including user-defined projections, into Marten using Postgresql’s ability to embed Javascript.

If you have any interest in contributing or following this thing, hit us up in the Gitter room or start weighing in on GitHub issues.

Marten in Action

To see the itty bit that’s done so far in action, say that you have a .Net type representing a document like this one from my test project:

    // The IDocument interface is just a temporary crutch
    // for now. It won't be necessary in the end
    public class User : IDocument
    {
        public User()
        {
            Id = Guid.NewGuid();
        }

        public Guid Id { get; set; }

        public string FirstName { get; set; }
        public string LastName { get; set; }

        public string FullName
        {
            get { return "{0} {1}".ToFormat(FirstName, LastName); }
        }
    }

Starting from a blank Postgresql 9.5 schema (because we’re already depending on the new “upsert” capabilities) with Marten’s version of IDocumentSession, I’ll create a new User object, save it, then load a new copy of it from the database by its id:

        public void persist_and_reload_a_document()
        {
            var user = new User { FirstName = "James", LastName = "Worthy" };

            // theSession is Marten's IDocumentSession service
            theSession.Store(user);
            theSession.SaveChanges();

            // Marten is NOT coupled to StructureMap, but
            // I found it convenient to use StructureMap for object assembly
            // in the tests
            using (var session2 = theContainer.GetInstance<IDocumentSession>())
            {
                session2.ShouldNotBeSameAs(theSession);

                var user2 = session2.Load<User>(user.Id);

                user.ShouldNotBeSameAs(user2);
                user2.FirstName.ShouldBe(user.FirstName);
                user2.LastName.ShouldBe(user.LastName);
            }
        }

Behind the scenes, Marten sees that it doesn’t have a preexisting table to store User documents, so it quietly makes us one like this:

CREATE TABLE public.mt_doc_user
(
  id uuid NOT NULL,
  data jsonb NOT NULL,
  CONSTRAINT pk_mt_doc_user PRIMARY KEY (id)
)

Right now, we’re only adding an Id field as the primary key and a second JSONB field to hold the actual document representation. Later on we’ll probably add timestamps, version numbers, or duplicate selected fields in the document structure for more efficient querying and indexing.
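To give a rough idea of what an “upsert” against that table could look like, here is a minimal sketch that derives the table name from the document type and builds an INSERT ... ON CONFLICT statement (the syntax that’s new in Postgresql 9.5). The DocumentSql class, its method names, and the @id / @data parameter names are assumptions for illustration only. The table naming just generalizes the mt_doc_user example above, and this is not the stored procedure that Marten actually generates.

```csharp
using System;

// Hypothetical helper, not part of Marten
public static class DocumentSql
{
    // Assumes the "mt_doc_" + lower cased type name convention
    // seen in the mt_doc_user table above
    public static string TableNameFor(Type documentType)
    {
        return "public.mt_doc_" + documentType.Name.ToLowerInvariant();
    }

    // Insert a new row, or overwrite the json data if the id already exists
    public static string UpsertFor(Type documentType)
    {
        return string.Format(
            "INSERT INTO {0} (id, data) VALUES (@id, @data) " +
            "ON CONFLICT (id) DO UPDATE SET data = EXCLUDED.data",
            TableNameFor(documentType));
    }
}

// Minimal document type so the example stands alone
public class User
{
    public Guid Id { get; set; }
}
```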

Why didn’t you use…

Because a flood of “why not Y” questions inevitably follows any statement of “we chose X”:

  • I’ve seen too many stories about MongoDb losing data and Postgresql v. MongoDb performance comparisons.
  • SimpleDb does look cool, but I’m not a huge fan of their query language and for some crazy reason, our organization (including me) is suddenly being very conservative about trying newer databases.
  • I need to do more research on Kafka before I can answer that one
  • I really don’t want to have to fall back all the way to developing applications primarily on an RDBMS. I’ve had enough of heavy ORM’s, the only somewhat more palatable micro-ORM’s, and writing procedural code using raw tabular data.

Other Reading

The name “Marten” has already stuck in conversations at work, so we’re keeping it for now. Besides, look how cute martens are:

[Image: a marten]

Storyteller 3: Executable Specifications and Living Documentation for .Net

tl;dr: The open source Storyteller 3 is an all new version of an old tool that my shop (and others) use for customer facing acceptance tests, large scale test automation, and “living documentation” generation for code-centric systems.

A week from today I’m giving a talk at .Net Unboxed on Storyteller 3, an open source tool largely built by myself and my colleagues for creating, running, and managing Executable Specifications against .Net projects. It reflects what we feel are the best practices for automated testing after over a decade of working with automated integration testing. As I’ll try to argue in my talk and subsequent blog posts, it is the most complete approach in the .Net ecosystem for reliably and economically writing large scale automated integration tests.

My company and a couple other early adopters have been using Storyteller for daily work since June and the feedback has been pleasantly positive so far. Now is as good a time as any to make a public beta release for the express purpose of getting more feedback so we can continue to improve the tool prior to an official 3.0 release in January.

If you’re interested in kicking the tires on Storyteller, the latest beta as of now is 3.0.0.279-alpha available on Nuget.org. For help getting started, see our tutorial and getting started pages.

Some highlights:

It’s improved a lot since then, but I gave a talk at work in March previewing Storyteller 3 that at least discusses the goals and philosophy behind the tool and Storyteller’s approach to acceptance tests and integration tests.

A Brief History

I had a great time at Codemash this year catching up with old friends. I was pleasantly surprised when I was there to be asked several times about the state of Storyteller, an OSS project I had originally built in 2008 as a replacement for FitNesse as our primary means of expressing and executing automated customer facing acceptance tests. Frankly, I always thought that Storyteller 1 and the incrementally better Storyteller 2 were failures in terms of usability and I was so burnt out on working with it that I had largely given up on it and ignored it for years.

Unfortunately, my shop has a large investment in Storyteller tests and our largest and most active project was suffering with heinously slow and unreliable Storyteller regression test suites that probably caused more harm than good with their support costs. After a big town hall meeting to decide whether to scrap and replace Storyteller with something else, we instead decided to try to improve Storyteller to avoid having to rewrite all of our tests. The result has been an effective rewrite of Storyteller with an all new client. While trying very hard to mostly preserve backward compatibility with the previous version in its public API’s, the .Net engine is also a near rewrite in order to squeeze out as much performance and responsiveness as we could.

Roadmap

The official 3.0 release is going to happen in early January to give us a chance to possibly get more early user feedback and maybe to get some more improvements in place. You can see the currently open issue list on GitHub. The biggest things outstanding on our roadmap are:

  • Modernize the client technology to React.js v14 and introduce Redux and possibly RxJS as a precursor to doing any big improvements to the user interface and trying to improve the performance of the user interface with big specification suites
  • A “step through” mode in the interactive specification running so users can step through a specification like you would in a debugger
  • The big one: allow users to author the actual specification language in the user interface editor with some mechanics to attach that language to actual test support code later

Thoughts on Running an OSS Project

One of my favorite development events is Pablo’s Fiesta, the open spaces conference held every couple of years in Austin. However, I have to roll my eyes pretty hard every single time one of the celebrity programmers here in town submits a session on running an OSS project for the express purpose of telling everyone exactly how great he is as an OSS lead.

While I’ve been heavily involved in OSS for years as the primary author and technical lead of StructureMap, FubuMVC / Jasper, and the recently rebooted Storyteller, I am certainly not the awesome OSS leader that the celebrity programmer above purports to be at every single Pablo’s Fiesta.

In my own OSS efforts I’ve:

  • Taken far too long to answer user questions or completely missed questions on user lists and emails
  • Probably been a little too impatient and testy with other developers a little too frequently
  • Left pull requests to rot without decent feedback
  • Consistently failed to write adequately useful docs and getting started tutorials
  • Never been terribly successful in building community around OSS projects

But hey, there’s oodles of conference talks and blog posts about being awesome at OSS, so how about instead some frank thoughts from a guy who’s made a lot of mistakes at running OSS projects?

Bugs from Users

Let’s just work under the assumption that bugs are going to be reported by your users. My experience is that even when I’m doing my best to ratchet up the test coverage and software engineering discipline on my work, users can always find scenarios that I didn’t anticipate or adequately accommodate in the code.

In a way, getting bug reports is a good thing because it means you actually have interested users and it’s frequently useful feedback. Because you’re frequently dealing with users remotely, I think the biggest challenge many times is to make sure that you fully understand the exact problem that the user is encountering. At one extreme, it’s becoming common for me to get bug reports in StructureMap that come with failing unit tests or example code that demonstrates exactly what the problem is and how to reproduce it. My cutesy saying to describe that is:

Blessed are those who attach failing unit tests to their bug reports, for their issues shall be addressed first

Even if you don’t get concrete examples from users, I think it’s valuable to try to do that yourself and get whoever reported the bug to look at your reproduction steps.

A couple other things about bugs:

  • I try not to close issues in GitHub from other people if there’s any question about whether or not a fix resolved their problems, but I’ll still flush out old issues that have gone dormant
  • Try to pin down every reported bug with an automated test of some kind that runs in your CI build to avoid regression problems. Bugs tend to be non-obvious edge cases, so it’s even more important than most code to have some kind of test coverage. You can see the result of that philosophy here in the StructureMap testing library.
  • This is a long term play, but I’m hoping that I’ll be able to embed diagnostics in my tools that users could use to export some kind of data describing their system to make my life a lot easier when I’m trying to diagnose problems with nothing but a random stack trace.

Taking Pull Requests

I’ve been getting some great pull requests to StructureMap recently for performance and some big features that I knew other people have wanted in the past but I frankly didn’t want to build. That’s been great, but in the past I’ve also brought in some pull requests that have come back to haunt me through support problems and structural problems in the code.

Some thoughts on pull requests:

  • Do a much better job than I do about giving feedback to submitters at least to say that “got it, thank you, I’ll try to get to this by…” 😉
  • And I’m sorry, but I cannot and should not take in a pull request with no tests if it impacts the code. I’m sure that your code did work when you used it, but tests are also there to demonstrate to other users how it should work and to keep me or yet more pull requests from breaking your code later
  • As much as you’d like to make everybody happy who takes the time to submit pull requests, you cannot automatically take in pull requests with spotty quality because at the end of the day you’re the one responsible. Take Harry S. Truman’s philosophy of “the buck stops here” in regards to pull requests.
  • Don’t be too quick to take in pull requests because “that looks fine to me.” I’ve been burned several times by not thinking through the implications and edge cases that get introduced by a pull request — especially when it seems well coded with tests and everything. The author of the pull request may be thinking tactically about his or her immediate problem, but you have to take the strategic view of that code change.
  • If a user is going to submit a pull request that makes a substantial change in direction or internals, I’d rather know about it early to avoid any hard feelings and wasted time on their part if you don’t want to take it in.
  • Go easy on stylistic elements of the code like naming and formatting, but my experience has been that other developers have been pretty reasonable when they get timely feedback about a pull request. I will from time to time take in a pull request that has some problems and just address those myself quietly off to the side. You have to walk a line between maintaining an acceptable level of quality and consistency within the codebase and not making it too much trouble for other folks to contribute.

Push vs Pull Features

While I can point to exceptions to this rule, I’ve consistently found that features built for a demonstrated need or use case on my project, or requested by other users, have been more successful than features that started from “wouldn’t it be cool if…”

Play the Long Game

In my strong opinion and experience, the most reliable way to achieve high quality and great usability is to iterate over time in response to feedback and the problems encountered along the way. If you want your project to be good, you’d better be ready to have a long enough attention span to improve it over time by responding to the problems that pop up and never being complacent about your current approach.

My 11-12 years and counting involvement with StructureMap is an extreme case. Maybe more relevant is my old Storyteller tool for acceptance test driven development in .Net. The first two versions of Storyteller have some severe usability problems and more friction in their usage than I care to admit. Later this month I’m going to do my first public presentation on the new Storyteller 3.0 version that’s vastly better so far in daily usage as a direct result of paying attention to all the lessons we learned in using and building it over 5-6 years.

Don’t be afraid to jettison old features that no longer make sense or you no longer want to support. My rule of thumb on StructureMap has been that any feature that makes me cringe when someone asks how to use it because I just know this is going to be trouble has to go in the next release.

Dealing with Angry Users

Let’s face it, software developers are not particularly known for having great interpersonal skills and primarily interacting through online mediums lets some of the worst behaviors leak through that would probably never happen in person. If you publish an OSS tool of any complexity it’s not unlikely that you’re going to get to deal with irate users at the end of their rope. You may be saying to yourself that your tool’s usage is so obvious that there won’t be any problem and I’m going to tell you that perfectly intelligent developers come from a lot of different backgrounds, think differently than you, and absolutely will be confused by something that’s obvious to you.

All I can tell you is to:

  • Remember that they’re probably not at their best right now and it’s almost a stereotype that developers who are unquestionably assholes online can easily be calm, pleasant people in real life
  • Not take it too personally and remember that even if it turns out to be your fault, you can’t be expected to be omniscient and cut yourself some slack
  • Really don’t worry too much about your version of “if this isn’t fixed right now I’m going to switch to AutoFac!” because that person is probably someone you just don’t need to be interacting with anyway. My internal response is usually that I’d be happy to dump that user onto someone else’s user list or gitter room;-)

I know I’ll get yelled at for this one in comments or Twitter, but my consistent experience is that the more strident and angry a developer is being online the more likely it is that they’re doing something stupid.

Ironically, some of the worst bugs and problems I’ve had uncovered by users have come from people that were very reasonable and patient. I suspect that might be because it’s just easier to communicate without the venom flying. I also know that I do try harder to help out folks that are being polite and patient.

Should I Follow My Project on Twitter or Stackoverflow?

I think it really depends on how thick your skin is and how important the stewardship of an OSS tool is to your career. I’ve gone both ways with Twitter, but I’m back to following references to StructureMap at least just to understand what people are saying about it and occasionally to lend a hand. At other times I’ve had to ignore Twitter because developers as a whole tend to be assholes on Twitter safely behind their cutesy little cartoon avatars and the online snark and griping was just too aggravating. I guess my advice is to make sure that you’re decoupling your mental and emotional well-being from any kind of online negativity from other people. You also might remember that developers probably tweet much more when they’re angry or frustrated.

No matter how competitive you happen to be, don’t you dare let it ruin your weekend when folks are extolling the benefits of a competitor tool instead of yours.

As for Stackoverflow, I have to admit that I purposely avoid following my tools on Stackoverflow because I just can’t handle the stress of trying to deal with all of the deluge. Granted, some of that is because StructureMap questions more frequently verge into needing to give software design and architectural advice more than answering simple API usage type of questions. I do try to stay on top of questions that are more or less directed to me from mailing lists, Gitter rooms, or Twitter.

If you’re making a big career bet on some kind of OSS tool, I think you’d better watch Stackoverflow just to understand what the usability problems are with your tool — and sadly enough, you may need to combat misinformation about your tool from other people.

Building an Awesome Community Around Your Project

Y’all will have to fill this section in yourself in the comments because I’ve got nothing.

Exploiting Generic Types with StructureMap

I’ve got a short window at work that I’m using to try to finally fill in the holes in StructureMap documentation. Everything I’m showing here is old, but some of it I’ve never documented or written about before and the usage of generic types has spawned a lot of questions on the StructureMap list over the years. 

The content of this blog post looks a lot better in the actual StructureMap documentation site here.

Example 1: Visualizing an Activity Log

I worked years ago on a system that could be used to record and resolve customer support problems. Since it was very workflow heavy in its logic, we tracked user and system activity as an event stream of small objects that reflected all the different actions or state changes that could happen to an issue. To render and visualize the activity log to HTML, we used many of the open generic type capabilities shown in this topic to find and apply the correct HTML rendering strategy for each type of log object in an activity stream.

Given a log object, we wanted to look up the right visualizer strategy to render that type of log object to html on the server side.

To start, we had an interface like this one that we were going to use to get the HTML for each log object:

    public interface ILogVisualizer
    {
        // If we already know the type of log we have
        string ToHtml<TLog>(TLog log);

        // If we only know that we have a log object
        string ToHtml(object log);
    }

So for an example, if we already knew that we had an IssueCreated object, we should be able to use StructureMap like this:

            // Just setting up a Container and ILogVisualizer
            var container = Container.For<VisualizationRegistry>();
            var visualizer = container.GetInstance<ILogVisualizer>();

            // If I have an IssueCreated log object...
            var created = new IssueCreated();

            // I can get the html representation:
            var html = visualizer.ToHtml(created);

If we have an array of log objects but do not already know the specific types, we can still use the more generic ToHtml(object) method like this:

            var logs = new object[]
            {
                new IssueCreated(), 
                new TaskAssigned(), 
                new Comment(), 
                new IssueResolved()
            };

            // Just setting up a Container and ILogVisualizer
            var container = Container.For<VisualizationRegistry>();
            var visualizer = container.GetInstance<ILogVisualizer>();

            var items = logs.Select(visualizer.ToHtml);
            var html = string.Join("<hr />", items);

The next step is to create a way to identify the visualization strategy for a single type of log object. We certainly could have done this with a giant switch statement, but we wanted some extensibility for new types of activity log objects and even customer specific log types that would never, ever be in the main codebase. We settled on an interface like the one shown below that would be responsible for rendering a particular type of log object (“TLog” in the type):

    public interface IVisualizer<TLog>
    {
        string ToHtml(TLog log);
    }

Inside of the concrete implementation of ILogVisualizer we need to be able to pull out and use the correct IVisualizer<T> strategy for a log type. We of course used a StructureMap Container to do the resolution and lookup, so now we also need to be able to register all the log visualization strategies in some easy way. On top of that, many of the log types were simple and could just as easily be rendered with a simple html strategy like this class:

    public class DefaultVisualizer<TLog> : IVisualizer<TLog>
    {
        public string ToHtml(TLog log)
        {
            return string.Format("<div>{0}</div>", log);
        }
    }

Inside of our StructureMap usage, if we don’t have a specific visualizer for a given log type, we’d just like to fall back to the default visualizer and proceed.

Alright, now that we have a real world problem, let’s proceed to the mechanics of the solution.

Registering Open Generic Types

Let’s say to begin with all we want to do is to always use the DefaultVisualizer for each log type. We can do that with code like this below:

        [Test]
        public void register_open_generic_type()
        {
            var container = new Container(_ =>
            {
                _.For(typeof (IVisualizer<>)).Use(typeof (DefaultVisualizer<>));
            });

            // Diagnostics before any closed type has been requested
            Debug.WriteLine(container.WhatDoIHave(@namespace: "StructureMap.Testing.Acceptance.Visualization"));

            container.GetInstance<IVisualizer<IssueCreated>>()
                .ShouldBeOfType<DefaultVisualizer<IssueCreated>>();

            // Diagnostics after IVisualizer<IssueCreated> has been requested
            Debug.WriteLine(container.WhatDoIHave(@namespace: "StructureMap.Testing.Acceptance.Visualization"));

            container.GetInstance<IVisualizer<IssueResolved>>()
                .ShouldBeOfType<DefaultVisualizer<IssueResolved>>();
        }

With the configuration above, there are no specific registrations for IVisualizer<IssueCreated>. At the first request for that interface, StructureMap will run through its “missing family policies”, one of which is to try to find registrations for an open generic type that could be closed to make a valid registration for the requested type. In the case above, StructureMap sees that it has registrations for the open generic type IVisualizer<T> that could be used to create registrations for the closed type IVisualizer<IssueCreated>.

Using the WhatDoIHave() diagnostics, the original state of the container for the visualization namespace is:

===========================================================================================================================
PluginType            Namespace                                         Lifecycle     Description                 Name     
---------------------------------------------------------------------------------------------------------------------------
IVisualizer<TLog>     StructureMap.Testing.Acceptance.Visualization     Transient     DefaultVisualizer<TLog>     (Default)
===========================================================================================================================

After making a request for IVisualizer<IssueCreated>, the new state is:

====================================================================================================================================================================================
PluginType                    Namespace                                         Lifecycle     Description                                                                  Name     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<IssueCreated>     StructureMap.Testing.Acceptance.Visualization     Transient     DefaultVisualizer<IssueCreated> ('548b4256-a7aa-46a3-8072-bd8ef0c5c430')     (Default)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<TLog>             StructureMap.Testing.Acceptance.Visualization     Transient     DefaultVisualizer<TLog>                                                      (Default)
====================================================================================================================================================================================
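To make the "closing" step a little more concrete, here is a rough sketch of the idea in plain .NET reflection. This is not StructureMap's actual code; TinyFamilyLookup and its members are hypothetical names I made up for illustration, and the visualizer types are redeclared in simplified form only so the sample stands on its own:

```csharp
using System;
using System.Collections.Generic;

public interface IVisualizer<TLog>
{
    string ToHtml(TLog log);
}

public class DefaultVisualizer<TLog> : IVisualizer<TLog>
{
    public string ToHtml(TLog log)
    {
        return "default: " + typeof(TLog).Name;
    }
}

public class IssueCreated { }

// Hypothetical, heavily simplified stand-in for a container
public class TinyFamilyLookup
{
    // closed plugin type -> closed concrete type
    private readonly Dictionary<Type, Type> _closed = new Dictionary<Type, Type>();

    // open plugin type -> open concrete type
    private readonly Dictionary<Type, Type> _open = new Dictionary<Type, Type>();

    public int ClosedCount
    {
        get { return _closed.Count; }
    }

    public void RegisterOpen(Type openPluginType, Type openConcreteType)
    {
        _open[openPluginType] = openConcreteType;
    }

    public object GetInstance(Type pluginType)
    {
        Type concrete;
        if (!_closed.TryGetValue(pluginType, out concrete))
        {
            // No registration yet for this closed type, so look for an
            // open generic registration that can be closed to match it
            if (!pluginType.IsGenericType)
            {
                throw new InvalidOperationException("No registration for " + pluginType.Name);
            }

            Type openConcrete;
            if (!_open.TryGetValue(pluginType.GetGenericTypeDefinition(), out openConcrete))
            {
                throw new InvalidOperationException("No registration for " + pluginType.Name);
            }

            concrete = openConcrete.MakeGenericType(pluginType.GetGenericArguments());

            // Remember the closed registration for subsequent requests
            _closed[pluginType] = concrete;
        }

        return Activator.CreateInstance(concrete);
    }
}
```

Registering typeof(IVisualizer<>) against typeof(DefaultVisualizer<>) and then asking for typeof(IVisualizer<IssueCreated>) adds one closed entry on the first request, which is the same shape of change you see between the two diagnostic views above.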

Generic Registrations and Default Fallbacks

A powerful feature of generic type support in StructureMap is the ability to register specific handlers for some types, but allow users to register a “fallback” registration otherwise. In the case of the visualization, some types of log objects may justify some special HTML rendering while others can happily be rendered with the default visualization strategy. This behavior is demonstrated by the following code sample:

        [Test]
        public void generic_defaults()
        {
            var container = new Container(_ =>
            {
                // The default visualizer just like we did above
                _.For(typeof(IVisualizer<>)).Use(typeof(DefaultVisualizer<>));

                // Register a specific visualizer for IssueCreated
                _.For<IVisualizer<IssueCreated>>().Use<IssueCreatedVisualizer>();
            });


            // We have a specific visualizer for IssueCreated
            container.GetInstance<IVisualizer<IssueCreated>>()
                .ShouldBeOfType<IssueCreatedVisualizer>();

            // We do not have any special visualizer for TaskAssigned,
            // so fall back to the DefaultVisualizer<T>
            container.GetInstance<IVisualizer<TaskAssigned>>()
                .ShouldBeOfType<DefaultVisualizer<TaskAssigned>>();
        }

Connecting Generic Implementations with Type Scanning

It’s generally harmful in software projects to have a single code file that has to be frequently edited for unrelated changes, and StructureMap Registry classes that explicitly configure services can easily fall into that category. Using type scanning registration can help teams avoid that problem altogether by eliminating the need to make any explicit registrations as new providers are added to the codebase.

For this example, I have two special visualizers for the IssueCreated and IssueResolved log types:

    public class IssueCreatedVisualizer : IVisualizer<IssueCreated>
    {
        public string ToHtml(IssueCreated log)
        {
            return "special html for an issue being created";
        }
    }

    public class IssueResolvedVisualizer : IVisualizer<IssueResolved>
    {
        public string ToHtml(IssueResolved log)
        {
            return "special html for issue resolved";
        }
    }

In the real project that inspired this example, we had many, many more types of log visualizer strategies and it could have easily been very tedious to manually register all the different little IVisualizer<T> strategy types in a Registry class by hand. Fortunately, part of StructureMap’s type scanning support is the ConnectImplementationsToTypesClosing() auto-registration mechanism via generic templates for exactly this kind of scenario.

In the sample below, I’ve set up a type scanning operation that will register any concrete type in the Assembly that contains the VisualizationRegistry that closes IVisualizer<T> against the proper interface:

    public class VisualizationRegistry : Registry
    {
        public VisualizationRegistry()
        {
            // The main ILogVisualizer service
            For<ILogVisualizer>().Use<LogVisualizer>();

            // A default, fallback visualizer
            For(typeof(IVisualizer<>)).Use(typeof(DefaultVisualizer<>));

            // Auto-register all concrete types that "close"
            // IVisualizer<TLog>
            Scan(x =>
            {
                x.TheCallingAssembly();
                x.ConnectImplementationsToTypesClosing(typeof(IVisualizer<>));
            });

        }
    }

If we create a Container based on the configuration above, we can see that the type scanning operation picks up the specific visualizers for IssueCreated and IssueResolved as shown in the diagnostic view below:

==================================================================================================================================================================================
PluginType                     Namespace                                         Lifecycle     Description                                                               Name     
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ILogVisualizer                 StructureMap.Testing.Acceptance.Visualization     Transient     StructureMap.Testing.Acceptance.Visualization.LogVisualizer               (Default)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<IssueResolved>     StructureMap.Testing.Acceptance.Visualization     Transient     StructureMap.Testing.Acceptance.Visualization.IssueResolvedVisualizer     (Default)
                                                                                 Transient     DefaultVisualizer<IssueResolved>                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<IssueCreated>      StructureMap.Testing.Acceptance.Visualization     Transient     StructureMap.Testing.Acceptance.Visualization.IssueCreatedVisualizer      (Default)
                                                                                 Transient     DefaultVisualizer<IssueCreated>                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<TLog>              StructureMap.Testing.Acceptance.Visualization     Transient     DefaultVisualizer<TLog>                                                   (Default)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IVisualizer<TLog>              StructureMap.Testing.Acceptance.Visualization     Transient     DefaultVisualizer<TLog>                                                   (Default)
==================================================================================================================================================================================
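For some intuition about what a scan for types "closing" an open interface has to find, here is a sketch in plain reflection. It is only my approximation of the idea, not how StructureMap implements type scanning; ClosingTypeScanner and FindImplementationsClosing are hypothetical names, and the visualizer types are redeclared in minimal form so the sample is self-contained:

```csharp
using System;
using System.Collections.Generic;

public interface IVisualizer<TLog>
{
    string ToHtml(TLog log);
}

public class IssueCreated { }
public class IssueResolved { }
public class TaskAssigned { }

public class DefaultVisualizer<TLog> : IVisualizer<TLog>
{
    public string ToHtml(TLog log) { return "default"; }
}

public class IssueCreatedVisualizer : IVisualizer<IssueCreated>
{
    public string ToHtml(IssueCreated log) { return "created"; }
}

public class IssueResolvedVisualizer : IVisualizer<IssueResolved>
{
    public string ToHtml(IssueResolved log) { return "resolved"; }
}

// Hypothetical helper, for illustration only
public static class ClosingTypeScanner
{
    // Returns closed interface -> concrete type for every candidate
    // that implements a closed version of openInterface. If two types
    // close the same interface, the later candidate wins in this sketch.
    public static Dictionary<Type, Type> FindImplementationsClosing(
        Type openInterface, IEnumerable<Type> candidates)
    {
        var found = new Dictionary<Type, Type>();

        foreach (var type in candidates)
        {
            // Skip interfaces, abstract classes, and open generic
            // classes like DefaultVisualizer<TLog>
            if (!type.IsClass || type.IsAbstract || type.IsGenericTypeDefinition) continue;

            foreach (var contract in type.GetInterfaces())
            {
                if (contract.IsGenericType &&
                    contract.GetGenericTypeDefinition() == openInterface)
                {
                    found[contract] = type;
                }
            }
        }

        return found;
    }
}
```

Passing in typeof(IVisualizer<>) and the types of an assembly (for example typeof(IssueCreatedVisualizer).Assembly.GetTypes()) yields entries for IVisualizer<IssueCreated> and IVisualizer<IssueResolved>, but nothing for TaskAssigned, which is why the default fallback registration is still needed.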

The following sample shows the VisualizationRegistry in action to combine the type scanning registration plus the default fallback behavior for log types that do not have any special visualization logic:

        [Test]
        public void visualization_registry()
        {
            var container = Container.For<VisualizationRegistry>();

            Debug.WriteLine(container.WhatDoIHave(@namespace: "StructureMap.Testing.Acceptance.Visualization"));

            container.GetInstance<IVisualizer<IssueCreated>>()
                .ShouldBeOfType<IssueCreatedVisualizer>();

            container.GetInstance<IVisualizer<IssueResolved>>()
                .ShouldBeOfType<IssueResolvedVisualizer>();

            // We have no special registration for TaskAssigned,
            // so fall back to the default visualizer
            container.GetInstance<IVisualizer<TaskAssigned>>()
                .ShouldBeOfType<DefaultVisualizer<TaskAssigned>>();
        }

Building Closed Types with ForGenericType() and ForObject()

Working with generic types and the common IHandler<T> pattern can be a little bit tricky if all you have is an object that is declared as an object. Fortunately, StructureMap has a couple of helper methods and mechanisms to help you bridge the gap between DoSomething(object something) and DoSomething<T>(T something).

Recall the full ILogVisualizer interface from above:

    public interface ILogVisualizer
    {
        // If we already know the type of log we have
        string ToHtml<TLog>(TLog log);

        // If we only know that we have a log object
        string ToHtml(object log);
    }

The method ToHtml(object log) somehow needs to be able to find the right IVisualizer<T> and execute it to get the HTML representation for a log object. The StructureMap IContainer provides two different methods called ForObject() and ForGenericType() for exactly this case, as shown below in a possible implementation of ILogVisualizer:

    public class LogVisualizer : ILogVisualizer
    {
        private readonly IContainer _container;

        // Take in the IContainer directly so that
        // yes, you can use it as a service locator
        public LogVisualizer(IContainer container)
        {
            _container = container;
        }

        // It's easy if you already know what the log
        // type is
        public string ToHtml<TLog>(TLog log)
        {
            return _container.GetInstance<IVisualizer<TLog>>()
                .ToHtml(log);
        }

        public string ToHtml(object log)
        {
            // The ForObject() method uses the 
            // log.GetType() as the parameter to the open
            // type Writer<T>, and then resolves that
            // closed type from the container and
            // casts it to IWriter for you
            return _container.ForObject(log)
                .GetClosedTypeOf(typeof (Writer<>))
                .As<IWriter>()
                .Write(log);
        }

        public string ToHtml2(object log)
        {
            // The ForGenericType() method is again creating
            // a closed type of Writer<T> from the Container
            // and casting it to IWriter
            return _container.ForGenericType(typeof (Writer<>))
                .WithParameters(log.GetType())
                .GetInstanceAs<IWriter>()
                .Write(log);
        }

        // The IWriter and Writer<T> class below are
        // adapters to go from "object" to <T>() signatures
        public interface IWriter
        {
            string Write(object log);
        }

        public class Writer<T> : IWriter
        {
            private readonly IVisualizer<T> _visualizer;

            public Writer(IVisualizer<T> visualizer)
            {
                _visualizer = visualizer;
            }

            public string Write(object log)
            {
                return _visualizer.ToHtml((T) log);
            }
        }
    }

The two methods are almost identical in result with some slight differences:

  1. ForObject(object subject) can only work with open types that have only one generic type parameter, and it will pass the argument subject to the underlying Container as an explicit argument so that you can inject that subject object into the object graph being created.
  2. ForGenericType(Type openType) is a little clumsier to use, but can handle any number of generic type parameters
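If it helps to see what these helpers save you from, here is roughly the same object-to-generic bridge done by hand with reflection. This is a sketch under my own assumptions and not StructureMap's implementation: ObjectDispatch is a hypothetical helper, the resolve callback stands in for the container, and Comment and CommentVisualizer are simplified illustration types:

```csharp
using System;

public interface IVisualizer<TLog>
{
    string ToHtml(TLog log);
}

public class Comment
{
    public string Text { get; set; }
}

public class CommentVisualizer : IVisualizer<Comment>
{
    public string ToHtml(Comment log)
    {
        return "<p>" + log.Text + "</p>";
    }
}

// Same adapter idea as the IWriter / Writer<T> pair above
public interface IWriter
{
    string Write(object log);
}

public class Writer<T> : IWriter
{
    private readonly IVisualizer<T> _visualizer;

    public Writer(IVisualizer<T> visualizer)
    {
        _visualizer = visualizer;
    }

    public string Write(object log)
    {
        return _visualizer.ToHtml((T) log);
    }
}

// Hypothetical helper, for illustration only
public static class ObjectDispatch
{
    public static string ToHtml(object log, Func<Type, object> resolve)
    {
        var logType = log.GetType();

        // Close IVisualizer<> and Writer<> with the runtime type of the log
        var visualizerType = typeof(IVisualizer<>).MakeGenericType(logType);
        var writerType = typeof(Writer<>).MakeGenericType(logType);

        // Ask the "container" for the closed visualizer, then build
        // the adapter around it through its public constructor
        var visualizer = resolve(visualizerType);
        var writer = (IWriter) Activator.CreateInstance(writerType, new object[] { visualizer });

        return writer.Write(log);
    }
}
```

The cast inside Writer<T>.Write() is the only place where object becomes T, so everything behind IVisualizer<T> stays strongly typed.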

Example 2: Generic Instance Builder

As I recall, the following example was inspired by a question about how to use StructureMap to build out MongoDB MongoCollection objects from some sort of static builder or factory — but I can’t find the discussion on the mailing list as I write this today. This has come up often enough to justify its inclusion in the documentation.

Say that you have some sort of persistence tooling that you primarily interact with through an interface like this one below, where TDocument and TQuery are classes in your persistent domain:

    public interface IRepository<TDocument, TQuery>
    {

    }

Great, StructureMap handles generic types just fine, so you can just register the various closed types and off you go. Except you can’t, because the way that your persistence tooling works requires you to create the IRepository<,> objects with a static builder class like this one below:

    public static class RepositoryBuilder
    {
        public static IRepository<TDocument, TQuery> Build<TDocument, TQuery>()
        {
            return new Repository<TDocument, TQuery>();
        }
    }

StructureMap has an admittedly non-obvious way to handle this situation by creating a new subclass of Instance that will “know” how to create the real Instance for a closed type of IRepository<,>.

First off, let’s create a new Instance type that knows how to build a specific type of IRepository<,> by subclassing the LambdaInstance type and providing a Func to build our repository type with the static RepositoryBuilder class:

    public class RepositoryInstance<TDocument, TQuery> : LambdaInstance<IRepository<TDocument, TQuery>>
    {
        public RepositoryInstance() : base(() => RepositoryBuilder.Build<TDocument, TQuery>())
        {
        }

        // This is purely to make the diagnostic views prettier
        public override string Description
        {
            get
            {
                return "RepositoryBuilder.Build<{0}, {1}>()"
                    .ToFormat(typeof(TDocument).Name, typeof(TQuery).Name);
            }
        }
    }

As you’ve probably surmised, the custom RepositoryInstance above is itself an open generic type and cannot be used directly until it has been closed. You could use this class directly if you have only a few document types, like this:

            var container = new Container(_ =>
            {
                _.For<IRepository<string, int>>().UseInstance(new RepositoryInstance<string, int>());

                // or skip the custom Instance with:

                _.For<IRepository<string, int>>().Use(() => RepositoryBuilder.Build<string, int>());
            });

To handle the problem in a more generic way, we can create a second custom subclass of Instance for the open type IRepository<,> that will help StructureMap understand how to build the specific closed types of IRepository<,> at runtime:

    public class RepositoryInstanceFactory : Instance
    {
        // This is the key part here. This method is called by
        // StructureMap to "find" an Instance for a closed
        // type of IRepository<,>
        public override Instance CloseType(Type[] types)
        {
            // StructureMap will cache the object built out of this,
            // so the expensive Reflection hit only happens
            // once
            var instanceType = typeof (RepositoryInstance<,>).MakeGenericType(types);
            return Activator.CreateInstance(instanceType).As<Instance>();
        }

        // Don't worry about this one, never gets called
        public override IDependencySource ToDependencySource(Type pluginType)
        {
            throw new NotSupportedException();
        }

        public override string Description
        {
            get { return "Build Repository<T, T1>() with RepositoryBuilder"; }
        }

        public override Type ReturnedType
        {
            get { return typeof (Repository<,>); }
        }
    }

The key part of the class above is the CloseType(Type[] types) method. At that point, we can determine the right type of RepositoryInstance<,> to build the requested type of IRepository<,>, then use some reflection to create and return that custom Instance.
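Stripped of the StructureMap base classes, the close-once-then-reuse idea looks something like the sketch below. To be clear about the assumptions: IBuilder, RepositoryBuilderAdapter and ClosedBuilderCache are hypothetical names standing in for Instance and the container's internal caching, which I am not reproducing here, and the repository types are redeclared so the sample compiles by itself:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public interface IRepository<TDocument, TQuery> { }

public class Repository<TDocument, TQuery> : IRepository<TDocument, TQuery> { }

public static class RepositoryBuilder
{
    public static IRepository<TDocument, TQuery> Build<TDocument, TQuery>()
    {
        return new Repository<TDocument, TQuery>();
    }
}

// Hypothetical stand-in for a StructureMap Instance
public interface IBuilder
{
    object Build();
}

public class RepositoryBuilderAdapter<TDocument, TQuery> : IBuilder
{
    public object Build()
    {
        return RepositoryBuilder.Build<TDocument, TQuery>();
    }
}

// Hypothetical cache, for illustration only
public class ClosedBuilderCache
{
    private readonly ConcurrentDictionary<Type, IBuilder> _builders =
        new ConcurrentDictionary<Type, IBuilder>();

    private int _closeCount;

    // How many times the reflection path has run
    public int CloseCount
    {
        get { return _closeCount; }
    }

    public IBuilder BuilderFor(Type closedRepositoryType)
    {
        // Single-threaded, the factory below runs once per closed type.
        // Under contention GetOrAdd may invoke it more than once, but
        // only one result is stored and returned to every caller.
        return _builders.GetOrAdd(closedRepositoryType, type =>
        {
            Interlocked.Increment(ref _closeCount);

            var adapterType = typeof(RepositoryBuilderAdapter<,>)
                .MakeGenericType(type.GetGenericArguments());

            return (IBuilder) Activator.CreateInstance(adapterType);
        });
    }
}
```

Asking for typeof(IRepository<string, int>) twice returns the same builder and pays for MakeGenericType() and Activator.CreateInstance() only on the first call, which is the same trade the comment inside CloseType() describes.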

Here’s a unit test that exercises and demonstrates this functionality from end to end:

        [Test]
        public void show_the_workaround_for_generic_builders()
        {
            var container = new Container(_ =>
            {
                _.For(typeof (IRepository<,>)).Use(new RepositoryInstanceFactory());
            });

            container.GetInstance<IRepository<string, int>>()
                .ShouldBeOfType<Repository<string, int>>();

            Debug.WriteLine(container.WhatDoIHave(assembly:Assembly.GetExecutingAssembly()));
        }

After requesting IRepository<string, int> for the first time, the container configuration from Container.WhatDoIHave() is:

===================================================================================================================================================
PluginType                         Namespace                           Lifecycle     Description                                          Name     
---------------------------------------------------------------------------------------------------------------------------------------------------
IRepository<String, Int32>         StructureMap.Testing.Acceptance     Transient     RepositoryBuilder.Build<String, Int32>()             (Default)
---------------------------------------------------------------------------------------------------------------------------------------------------
IRepository<TDocument, TQuery>     StructureMap.Testing.Acceptance     Transient     Build Repository<T, T1>() with RepositoryBuilder     (Default)
===================================================================================================================================================