Marten v4.0 Planning Document (Part 1)

As I wrote about a couple weeks ago in a post called Kicking off Marten V4 Development, the Marten team is actively working on the long delayed v4.0 release with planned improvements for performance, the Linq support, and a massive planned jump in capability for the event store functionality. This post is a the result of a long comment thread and many discussions between the Marten community.

We don’t yet have a great consensus about the exact direction that the event store improvements are going to go, so I’m strictly focusing on improvements to the Marten internals and the Document Database support. I’ll follow up with a part 2 on the event store as that starts to gel.

If you’re interested in Marten, here’s some links:

 

Pulling it Off

We’ve got the typical problems of needing to address incoming pull requests and bug issues in master while probably needing to have a long lived branch for v4.

As an initial plan, let’s:

  1. Start with the unit testing improvements as a way to speed up the build before we dive into much more of this? This is in progress with about a 25% reduction in test throughput time so far in this pull request
  2. Do a bug sweep v3.12 release to address as many of the tactical problems as we can before branching to v4
  3. Possibly do a series of v4, then v5 releases to do this in smaller chunks? We’ve mostly said do the event store as v4, then Linq improvements as v5 — Nope, full speed ahead with a large v4 release in order to do as many breaking changes as possible in one release
  4. Extract the generic database manipulation code to its own library to clean up Marten, and speed up our builds to make the development work be more efficient.
  5. Do the Event Store v4 work in a separate project built as an add on from the very beginning, but leave the existing event store in place. That would enable us to do a lot of work and mostly be able to keep that work in master so we don’t have long-lived branch problems. Break open the event store improvement work because that’s where most of the interest is for this release.

Miscellaneous Ideas

  • Look at some kind of object pooling for the DocumentSession / QuerySession objects?
  • Ditch the document by document type schema configuration where Document A can be in one schema, and Document “B” is in another schema. Do that, and I think we open the door for multi-tenancy by schema.
  • Eliminate ManagedConnection altogether. I think it results in unnecessary object allocations and it’s causing more harm that help as it’s been extended over time. After studying that more today, it’s just too damn embedded. At least try to kill off the Execute methods that take in a Lambda. See this GitHub issue.
  • Can we consider ditching < .Net Core or .Net v5 for v4? The probable answer is “no,” so let’s just take this off the table.
  • Do a hunt for classes in Marten marked public that should be internal. Here’s the GitHub issue.
  • Make the exceptions a bit more consistent

Dynamic Code Generation

If you look at the pull request for Document Metadata and the code in Marten.Schema.Arguments you can see that our dynamic Expression to Lambda compilation code is getting extremely messy, hard to reason with, and difficult to extend.

Idea: Introduce a dependency on LamarCodeGeneration and LamarCompiler. LamarCodeGeneration has a strong model for dynamically generating C# code at runtime. LamarCompiler adds runtime Roslyn support to compile assemblies on the fly and utilities to attach/create these classes. We could stick with Expression to Lambda compilation, but that can’t really handle any kind of asynchronous code without some severe pain and it’s far more difficult to reason about (Jeremy’s note: I’m uniquely qualified to make this statement unfortunately).

What gets dynamically generated today:

  • Bulk importer handling for a single entity
  • Loading entities and tracking entities in the identity map or version tracking

What could be generated in the future:

  • Document metadata properties — but sad trombone, that might have to stay with Expressions if the setters are internal/private :/
  • Much more of the ISelector implementations, especially since there’s going to be more variability when we do the document metadata
  • Finer-grained manipulation of the IIdentityMap

Jeremy’s note: After doing some detailed analysis through the codebase and the spots that would be impacted by the change to dynamic code generation, I’m convinced that this will lead to significant performance improvements by eliminating many existing runtime conditional checks and casts

Track this work here.

Unit Testing Approach

This is in progress, and going well.

If we introduce the runtime code generation back into Marten, that’s unfortunately a non-trivial “cold start” testing issue. To soften that, I suggest we get a lot more aggressive with reusable xUnit.Net class fixtures between tests to reuse generated code between tests, cut way down on the sheer number of database calls by not having to frequently check the schema configuration, and other DocumentStore overhead.

A couple other points about this:

  • We need to create more unique document types so we’re not having to use different configurations for the same document type. This would enable more reuse inside the testing runtime
  • Be aggressive with separate schemas for different configurations
  • We could possibly turn on xUnit.net parallel test running to speed up the test cycles

Document Metadata

  • From the feedback on GitHub, it sounds like the desire to extend the existing metadata to tracking data like correlation identifiers, transaction ids, user ids, etc. To make this data easy to query on, I would prefer that this data be separate columns in the underlying storage
  • Use the configuration and tests from pull request for Document Metadata, but use the Lamar-backed dynamic code generation from the previous section to pull this off.
  • I strongly suggest using a new dynamic codegen model for the ISelector objects that would be responsible for putting Marten’s own document metadata like IsDeleted or TenantId or Version onto the resolved objects (but that falls apart if we have to use private setters)
  • I think we could expand the document metadata to allow for user defined properties like “user id” or “transaction id” much the same way we’ll do for the EventStore metadata. We’d need to think about how we extend the document tables and how metadata is attached to a document session

My thought is to designate one (or maybe a few?) .Net type as the “metadata type” for your application like maybe this one:

    public class MyMetadata
    {
        public Guid CorrelationId { get; set; }
        public string UserId { get; set; }
    }

Maybe that gets added to the StoreOptions something like:

var store = DocumentStore.For(x => {
    // other stuff

    // This would direct Marten to add extra columns to
    // the documents and events for the metadata properties
    // on the MyMetadata type.

    // This would probably be a fluent interface to optionally fine tune
    // the storage and applicability -- i.e., to all documents, to events, etc.
    x.UseMetadataType<MyMetadata>();
});

Then at runtime, you’d do something like:

session.UseMetadata<MyMetadta>(metadata);

Either through docs or through the new, official .Net Core integration, we have patterns to have that automatically set upon new DocumentSession objects being created from the IoC to make the tracking be seemless.

Extract Generic Database Helpers to its own Library

  • Pull everything to do with Schema object generation, difference detection, and DDL generation to a separate library (IFeatureSchemaISchemaObject, etc.). Mostly to clean out the main library, but also because this code could easily be reused outside of Marten. Separating it out might make it easier to test and extend that functionality, which is something that occasionally gets requested. There’s also the possibility of further breaking that into abstractions and implementations for the long run of getting us ready for Sql Server or other database engine support. The tests for this functionality are slow, and rarely change. It would be advantageous to get this out of the main Marten library and testing project.
  • Pull the ADO.Net helper code like CommandBuilder and the extension methods into a small helper library somewhere else (I’m nominating the Baseline repository). This code is copied around to other projects as it is, and it’s another way of getting stuff out of the main library and the test suite.

Track this work in this GitHub issue.

F# Improvements

We’ll have a virtual F# subcommittee to be watching this work for F#-friendliness:

HostBuilder Integration

We’ll bring Joona-Pekka Kokko’s ASP.Net Core integration library into the main repository and make that the officially blessed and documented recipe for integrating Marten into .Net Core applications based on the HostBuilder in .Net Core. I suppose we could also multi-target IWebHostBuilder for ASP.Net Core 2.*.

That HostBuilder integration could be extended to:

  • Optionally set up the Async Daemon in an IHostedService — more on this in the Event Store section
  • Optionally register some kind of IDocumentSessionBuilder that could be used to customize session construction?
  • Have some way to have container resolved IDocumentSessionListener objects attached to IDocumentSession. This is to have an easy recipe for folks who want events broadcast through messaging infrastructure in CQRS architectures

See the GitHub issue for this.

Command Line Support

The Marten.CommandLine package already uses Oakton for command line parsing. For easier integration in .Net Core applications, we could shift that to using the Oakton.AspNetCore package so the command line support can be added to any ASP.net Core 2.* or .Net Core 3.* project by installing the Nuget. This might simplify the usage because you’d no longer need a separate project for the command line support.

There are some long standing stories about extending the command line support for the event store projection rebuilding. I think that would be more effective if it switches over to Oakton.AspNetCore.

See the GitHub issue

Linq

This is also covered by the Linq Overhaul issue.

  • Bring back the work in the linq branch for the revamped IField model within the Linq provider. This would be advantageous for performance, cleans up some conditional code in the Linq internals, could make the Linq support be aware of Json serialization customizations like [JsonProperty], and probably helps us deal more with F# types like discriminated unions.
  • Completely rewrite the Include() functionality. Use Postgresql Common Table Expression and UNION queries to fetch both the parent and any related documents in one query without needing to do any kind of JOIN s that complicate the selectors. There’d be a column for document type the code could use to switch. The dynamic code generation would help here. This could finally knock out the long wished for Include() on child collections feature. This work would nuke the InnerJoin stuff in the ISelector implementations, and that would hugely simplify a lot of code.
  • Finer grained code generation would let us optimize the interactions with IdentityMap. For purely query sessions, you could completely skip any kind of interaction with IdentityMap instead of wasting cycles on nullo objects. You could pull out a specific IdentityMap<TEntity, TKey> out of the larger identity map just before calling selectors to avoid some repetitive “find the right inner dictionary” on each document resolved.
  • Maybe introduce a new concept of ILinqDialect where the Expression parsing would just detect what logical thing it finds (like !BoolProperty), and turns around and calls this ILinqDialect to get at a WhereFragment or whatever. This way we could ready ourselves to support an alternative json/sql dialect around JSONPath for Postgresql v12+ and later for Sql Server vNext. I think this would fit into the theme of making the Linq support more modular. It should make the Linq support easier to unit test as we go. Before we do anything with this, let’s take a deep look into the EF Core internals and see how they handle this issue
  • Consider replacing the SelectMany() implementation with Common Table Expression sql statements. That might do a lot to simplify the internal mechanics. Could definitely get us to an n-deep model.
  • Do the Json streaming story because it should be compelling, especially as part of the readside of a CQRS architecture using Marten’s event store functionality.
  • Possibly use a PLV8-based JsonPath polyfill so we could use sql/json immediately in the Linq support. More research necessary.

Partial Updates

Use native postgresql partial JSON updates wherever possible. Let’s do a perf test on that first though.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s