Marten V4 Preview: Linq and Performance

Marten is an open source library for .Net that allows developers to treat the robust Postgresql database as a full featured and transactional document database (NoSQL) as well as supporting the event sourcing pattern of application persistence.

After a false start last summer, development on the long awaited and delayed Marten V4.0 release is heavily in flight and we’re making a lot of progress. The major focus of the remaining work is improving the event store functionality (that I’ll try to blog about later in the week if I can). We posted the first Marten V4 alpha on Friday for early adopters — or folks that need Linq provider fixes ASAP! — to pull down and start trying out. So far the limited feedback has been a nearly seamless upgrade.

You can track the work and direction of things through the GitHub issues that are already done and the ones that are still planned.

For today though, I’d like to focus on what’s been done so far in V4 in terms of making Marten simply better and faster at its existing feature set.

Being Faster by Doing Less

One of the challenging things about Marten’s feature set is the unique permutations of what exactly happens when you store, delete, or load document to and from the database. For example, some documents may or may not be:

On top of that, Marten supports a couple different flavors of document sessions:

  • Query-only sessions that are strictly read only querying
  • The normal session that supports an internal identity map functionality that caches previously loaded documents
  • Automatic dirty checking sessions that are the heaviest Marten sessions
  • “Lightweight” sessions that don’t use any kind of identity map caching or automatic dirty checking for faster performance and better memory usage — at the cost of a little more developer written code.

The point here is that there’s a lot of variability in what exactly happens in Marten when you save, load, or delete a document with Marten. In the current version, Marten uses a combination of runtime if/then logic, some “Nullo” classes, and a little bit of “Expression to Lambda” runtime compilation.

For V4, I completely re-wired the internals to use C# code generated and compiled at runtime using Roslyn’s runtime compilation capabilities. Marten is using the LamarCompiler and LamarCodeGeneration libraries as helpers. You can see these two libraries and this technique in action in a talk I gave at NDC London in 2019.

The end result of all this work is that we can generated the tightest possible C# handling code and the tightest possible SQL for the exact permutation of document storage characteristics and session type. Along the way, we’ve striven to reduce the number of dictionary lookups, runtime branching logic, empty Nullo objects, and generally the number of computer instructions that would have to be executed by the underlying processor just to save, load, or delete a document.

So far, so good. It’s hard to say exactly how much this is going to impact any given Marten-using application, but the existing test suite clearly runs faster now and I’m not seeing any noticeable issue with the “cold start” of the initial, one time code generation and compilation (that was a big issue in early Roslyn to the point where we ripped that out of pre 1.0 Marten, but seems to be solved now).

If anyone is curious, I’d be happy to write a blog post diving into the guts of how that works. And why the new .Net source generator feature wouldn’t work in this case if anyone wants to know about that too.

Linq Provider Almost-Rewrite

To be honest, I think Marten’s existing Linq provider (pre-V4) is pretty well stuck at the original proof of concept stage thrown together 4-5 years ago. The number of open issues where folks had hit limitations in the Linq provider support built up — especially with anything involving child collections on document types.

For V4, we’ve heavily restructured the Linq parsing and SQL generation code to address the previous shortcomings. There’s a little bit of improvement in the performance of Linq parsing and also a little bit of optimization of the SQL generated by avoiding unnecessary CASTs. Most of the improvement has been toward addressing previously unsupported scenarios. A potential improvement that we haven’t yet exploited much is to make the SQL generation and Linq parsing more able to support custom value types and F#-isms like discriminated unions through a new extensibility mechanism that teaches Marten about how these types are represented in the serialized JSON storage.

Querying Descendent Collections

Marten pre-V4 didn’t handle querying through child collections very well and that’s been a common source of user issues. With V4, we’re heavily using the Common Table Expression query support in Postgresql behind the scenes to make Linq queries like this one shown below possible:

var results = theSession.Query<Top>()
.Where(x => x.Middles.Any(b => b.Bottoms.Any()))
.ToList();

I think that at this point Marten can handle any combination of querying through child collections through any number of levels with all possible query operators (Any() / Count()) and any supported Where() fragment within the child collection.

Multi-Document Includes

Marten has long had some functionality for fetching related documents together in one database round trip for more efficient document reading. A long time limitation in Marten is that this Include() capability was only usable for logical “one to one” or “many to one” document relationships. In V4, you can now use Include() querying for “one to many” relationships as shown below:

[Fact]
public void include_many_to_list()
{
var user1 = new User { };
var user2 = new User { };
var user3 = new User { };
var user4 = new User { };
var user5 = new User { };
var user6 = new User { };
var user7 = new User { };

theStore.BulkInsert(new User[]{user1, user2, user3, user4, user5, user6, user7});

var group1 = new Group
{
Name = "Odds",
Users = new []{user1.Id, user3.Id, user5.Id, user7.Id}
};

var group2 = new Group {Name = "Evens", Users = new[] {user2.Id, user4.Id, user6.Id}};

using (var session = theStore.OpenSession())
{
session.Store(group1, group2);
session.SaveChanges();
}

using (var query = theStore.QuerySession())
{
var list = new List<User>();

query.Query<Group>()
.Include(x => x.Users, list)
.Where(x => x.Name == "Odds")
.ToList()
.Single()
.Name.ShouldBe("Odds");

list.Count.ShouldBe(4);
list.Any(x => x.Id == user1.Id).ShouldBeTrue();
list.Any(x => x.Id == user3.Id).ShouldBeTrue();
list.Any(x => x.Id == user5.Id).ShouldBeTrue();
list.Any(x => x.Id == user7.Id).ShouldBeTrue();
}
}

This was a longstanding request from users, and to be honest, we had to completely rewrite the Include() internals to add this support. Again, we used Common Table Expression SQL statements in combination with per session temporary tables to pull this off.

Compiled Queries Actually Work

I think the Compiled Query feature is unique in Marten. It’s probably easiest and best to think of it as a “stored procedure” for Linq queries in Marten. The value of a compiled query in Marten is:

  1. It potentially cleans up the application code that has to interact with Marten queries, especially for more complex queries
  2. It’s potentially some reuse for commonly executed queries
  3. Mostly though, it’s a significant performance improvement because it allows Marten to “remember” the Linq query plan.

While compiled queries have been supported since Marten 1.0, there’s been a lot of gap between what works in Marten’s Linq support and what functions correctly inside of compiled queries. With the advent of V4, the compiled query planning was rewritten with a new strategy that so far seems to support all of the Linq capabilities of Marten. We think this will make the compiled query feature much more useful going forward.

Here’s an example compiled query that was not possible before V4:

public class FunnyTargetQuery : ICompiledListQuery<Target>
{
public Expression<Func<IMartenQueryable<Target>, IEnumerable<Target>>> QueryIs()
{
return q => q
.Where(x => x.Flag && x.NumberArray.Contains(Number));
}

public int Number { get; set; }
}

And in usage:

var actuals = session.Query(new FunnyTargetQuery{Number = 5}).ToArray();

Multi-Level SelectMany because why not?

Marten has long supported the SelectMany() keyword in the Linq provider support, but in V4 it’s much more robust with the ability to chain SelectMany() clauses n-deep and do that in combination with any kind of Count() / Distinct() / Where() / OrderBy() Linq clauses. Here’s an example:

[Fact]
public void select_many_2_deep()
{
var group1 = new TargetGroup
{
Targets = Target.GenerateRandomData(25).ToArray()
};

var group2 = new TargetGroup
{
Targets = Target.GenerateRandomData(25).ToArray()
};

var group3 = new TargetGroup
{
Targets = Target.GenerateRandomData(25).ToArray()
};

var groups = new[] {group1, group2, group3};

using (var session = theStore.LightweightSession())
{
session.Store(groups);
session.SaveChanges();
}

using var query = theStore.QuerySession();

var loaded = query.Query<TargetGroup>()
.SelectMany(x => x.Targets)
.Where(x => x.Color == Colors.Blue)
.SelectMany(x => x.Children)
.OrderBy(x => x.Number)
.ToArray()
.Select(x => x.Id).ToArray();

var expected = groups
.SelectMany(x => x.Targets)
.Where(x => x.Color == Colors.Blue)
.SelectMany(x => x.Children)
.OrderBy(x => x.Number)
.ToArray()
.Select(x => x.Id).ToArray();

loaded.ShouldBe(expected);
}

Again, we pulled that off with Common Table Expression statements.

6 thoughts on “Marten V4 Preview: Linq and Performance

  1. I hate to hear that the .Net source generators won’t help in this case, I’ve been planning on using them for a number of similar things where I currently use Roslyn to compile dynamic c# to generate assemblies.

    1. I’m not saying that it *couldn’t* work, but the generators can’t really use Reflection against the built assemblies to analyze the configuration or easily use the logic in the Marten configuration model. Without easy access to Reflection, the generators are limited here. They wouldn’t work with Marten (or Jasper), but could with something like generating validation code based on attributes.

  2. Querying descendent collections and multi-document includes were the only real missing features for using Marten as a document database – so pleased to see these make it into v4!

  3. Does this mean that I will be able to use Marten as the data source for odata? That would be great, because so far I had to load the data into in-memory collections using Marten, and use those collections as IQueryable, but obviously that only works well if the data is static in nature.

      1. Hi, indeed, and it worked when I did not do filtering in the query string. I’m sorry I do not have the scenario exactly on the top of my head, but I remember giving up on it when I started to do filtering and my service returned incomplete json.

        To be honest, I was under tremendous time pressure at the time, so I did not put huge effort into investigating. Since the data I worked with was static and only a few megabytes, I just loaded it in memory and used the in-mem LINQ. I also had to admit that I had no expectations to start with: I have yet to see one 3rd party LINQ provider that works with more complex odata queries the way it should, so I just gave this one a try and moved on when I hit a wall.

        I will download the alpha version and give it another try. If I still have issues, I’ll open an issue. 🙂

Leave a comment