
Working inside of the LINQ provider code that’s ultimately interpreting C# code and trying to turn that into the most efficient Pgpsql code possible somehow makes me think about Alice in Wonderland where I’m definitely in the role of “Alice.”
Just continuing a loose series about recent improvements in the big Marten 7.0 release:
- Revisioned Documents in Marten 7
- Marten 7 makes “Write Model” Projections Super
- Recent Critter Stack Multi-Tenancy Improvements (w/ Wolverine 2.0 too!)
- Resiliency and Low Level Improvements in Marten 7
A major set of emphasis for the Marten 7 release was to address the large accretion of LINQ related issues with a mostly new LINQ provider subsystem. In terms of improvements or just changes, we:
- Completely removed the Remotion.Linq dependency (and the results of that are mixed so far)
- Reworked the SQL generation for child collection querying
- Greatly expanded the usage of .NET
Dictionarytypes within LINQ queries - Expanded the reach of
Select()transformations quite a bit — and that code hadn’t been touched or improved in any substantial way since Marten 1.0. You can see the list of patterns that Marten now supports in the acceptance test code. - Allowed for filtering of
Include()queries, which has been a longstanding user “ask”
Child Collection Improvements
Alright, so back to the real problem. When Marten today encounters a LINQ query like this one:
var results = theSession.Query<Top>().Where(x =>
x.Middles.Any(m => m.Color == Colors.Green && m.Bottoms.Any(b => b.Name == "Bill")));
Marten versions 4-6 had to generate a complicated SQL query using PostgreSQL Common Table Expressions to explode out the child collections into flat rows that can then be filtered to matching child rows, then finally uses a sub query filter on the original table to find the right rows. To translate, all that mumbo jumbo I said translates to “a big ass, slow query that doesn’t allow PostgreSQL to utilize its fancy GIN index support for faster JSONB querying.”
The Marten v7 support is smart enough to “know” when it can generate more efficient SQL for certain child collection filtering. In the case above, Marten v7 can use the PostgreSQL containment operator to utilize the GIN indexing support and just be simpler in general with SQL like this:
electd.id, d.data frompublic.mt_doc_top asd whereCAST(d.data ->> 'Middles'asjsonb) @> &1 LIMIT &2 &3: [{"Color":2,"Bottoms":[{"Name":"Bill"}]}]
Another little micro-optimization we did for Marten V7 was to have as much SQL generation as possible use positional parameters (“&1”) as opposed to named parameters (“:p0”) that PostgreSQL itself expects to skip some SQL preprocessing in memory that we were forcing Npgsql to do for us earlier.
From early adopter feedback, some child collection queries in real systems have been a touch over an order of magnitude faster from V6 to V7 because of this change.
One more sample that I’m especially proud of. Let’s say you use this LINQ query:
var result = await theSession.Query<Root>()
.Where(r => r.ChildsLevel1.Count(c1 => c1.Name == "child-1.1") == 1)
.ToListAsync();
This one’s a little more complicated because you need to do a test of the number of matching child elements within a child collection. Again, Marten V6 will use a nasty and not terribly efficient common table expression approach to give you the right data. For Marten v7, we specifically asked the Marten user base if we could abandon support for any PostgreSQL versions lower than PostgreSQL 12 so we could use PostgreSQL’s JSONPath query support within our LINQ provider. That change got us to this SQL for the LINQ query from up above:
select d.id, d.data from public.mt_doc_root as d where jsonb_array_length(jsonb_path_query_array(d.data, '$.ChildsLevel1[*] ? (@.Name == $val1)', $1)) = $2
&1: {"val1":"child-1.1"}
&2: 1
That’s the child collection querying, which was absolutely a first class goal and time consuming task along the way.
Dictionary Queries
There was a large effort to improve Marten’s ability to query through Dictionary<TKey, TValue> members of a parent document type for V7 — which immediately led to plenty more user reported bugs we had to fix during the V7 pre-release cycle.
Here’a an example that’s now possible querying through Dictionary.ContainsKey():
var results = await theSession
.Query<Target>()
// This is querying through a dictionary
.Where(x => x.GuidDict.ContainsKey(guid))
.ToListAsync();
And some SelectMany() action:
var pairs = await theSession
.Query<Target>()
.SelectMany(x => x.StringDict.Keys)
.ToListAsync();
And selecting the dictionary:
var data = await theSession
.Query<SelectDict>()
.Select(x => x.Dict)
.ToListAsync();
And querying through dictionary values:
var values = await theSession.Query<Target>().SelectMany(x => x.StringDict.Values)
.Where(x => x == "value2")
.OrderBy(x => x)
.ToListAsync();
Filtered Include
Marten’s Include() functionality is an important way to improve system performance by fetching related documents in one database round trip. Many folks through the years have asked for a way to limit the number of “included” documents returned in these queries by specifying a Where() filter against the included document type. That very functionality landed in Marten V7 with this syntax shown below from our testing code:
var holders = await theSession.Query<TargetHolder>()
// Limit the fetched Target documents from the Include()
// by specifying a filter on Target documents
.Include<Target>(x => x.TargetId, x => list.Add(x), t => t.Color == Colors.Blue)
.ToListAsync();
Summary
While there are still some regression bugs coming in from the LINQ provider work (but they’re getting way more “edge case-y” and “WTH *would* you do that?”), I’m feeling good about the LINQ provider subsystem in Marten as a better foundation for us moving forward.
And for the appropriate coding soundtrack for me when I need to duck into the LINQ provider code in Marten to fix bugs: