Imagining the Ideal GraphQL Integration for Marten

I’m seeing an increasing amount of interest in exposing Marten data behind GraphQL endpoints from folks in the Critter Stack Discord channels and from current JasperFx Software clients. After having mostly let other folks handle the Marten + Hot Chocolate combination, I finally spent some significant time looking into what it takes to put Marten behind Hot Chocolate’s GraphQL handling — and I unfortunately saw some real issues for unwary users that I wrote about last week in Hot Chocolate, GraphQL, and the Critter Stack.

Today, I want to jot down my thoughts about how a good GraphQL layer for Marten could be constructed, with a couple of caveats: much of this will hopefully become possible once I know much more about Hot Chocolate internals, and the rest of the Marten core team and I have zero interest in building a GraphQL layer from scratch.

Command Batching

A big part of GraphQL's appeal is being able to aggregate queries from your client to the backend without making a lot of network round trips in a way that pretty well destines you for poor performance. Great, awesome, but on the server side, Hot Chocolate runs every query in parallel, which for Marten means either opening a separate session for each query or serializing the usage of Marten's sessions and therefore losing the parallelization.

Instead of that parallelization, what I'd love to do is cut in higher up in the GraphQL execution pipeline and instead batch up the queries into a single database command. What we've repeatedly found over 8 years of Marten development (where did the time go?) is that batching database queries into a single network round trip to a PostgreSQL database consistently leads to better performance than making serialized requests. And that's even with the more complex query building we do within Marten, Weasel, and Wolverine.
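For plain C# callers, Marten already has a batched query mechanism that does exactly this: register several logical queries, execute them in one database command. A sketch of what a GraphQL layer could lean on (the `User` and `Order` documents here are placeholders I've made up for illustration):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Marten;

public record User(Guid Id, bool IsActive);
public record Order(Guid Id);

public static class BatchingSample
{
    public static async Task Run(IQuerySession session)
    {
        // Each Query<T>() call registers a query, but nothing
        // hits the database yet
        var batch = session.CreateBatchQuery();
        var users = batch.Query<User>().Where(x => x.IsActive).ToList();
        var orderCount = batch.Query<Order>().Count();

        // One command, one network round trip to PostgreSQL
        await batch.Execute();

        var activeUsers = await users;
        var totalOrders = await orderCount;
    }
}
```

The GraphQL-specific work would be translating each operation in the incoming request into one of these registered queries before a single `Execute()`.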

Streaming Marten JSON

In the cases where you don’t need to do any transformation of the JSON data being fetched by Marten into the GraphQL results (and remember, it is legal to return more fields than the client actually requested), Marten has an option for very fast HTTP services where it can just happily stream the server stored JSON data right to the HTTP response byte by byte. That’s vastly more efficient than the normal “query data, transform that to objects, then use a JSON serializer to write those objects to HTTP” mechanics.
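This is what that looks like today in a plain HTTP endpoint, assuming the `WriteArray()` helper from the Marten.AspNetCore package (the `User` document and route are made up for illustration):

```csharp
using System;
using System.Linq;
using Marten;
using Marten.AspNetCore;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddMarten("your PostgreSQL connection string");
var app = builder.Build();

// Streams the stored JSON array straight to the HTTP response --
// no deserialization into documents, no re-serialization on the way out
app.MapGet("/users", (IQuerySession session, HttpContext context) =>
    session.Query<User>().Where(x => x.IsActive).WriteArray(context));

app.Run();

public record User(Guid Id, bool IsActive);
```

The hope would be that a GraphQL layer could hit this same fast path whenever the requested selection set maps cleanly onto the stored document shape.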

More Efficient Parsing

Go easy commenting on this one, because this is all conjecture on my part here.

The process of going from a GraphQL query to actual results (which then have to be serialized down to the HTTP response) in Hot Chocolate + Marten is what a former colleague of mine would refer to as a “crime against computer science”:

  1. Hot Chocolate gets the raw string for the GraphQL request that’s sorta like JSON, but definitely not compliant JSON
  2. GraphQL is (I’m guessing) translated to some kind of intermediate model
  3. When using a Hot Chocolate query based on returning a LINQ IQueryable — and most Hot Chocolate samples do this — Hot Chocolate is building up a LINQ Expression on the fly
  4. Marten’s LINQ provider then takes that newly constructed LINQ Expression and parses it to first create an intermediate model representing the basics of the operation (are we fetching a list? limiting or skipping results? transforming the raw document data? where/order clauses?)
  5. Marten’s LINQ provider takes that intermediate model and creates a model representing fragments of SQL, and also determines a query handler strategy for the actual results (list results? FirstOrDefault()? Single()? Count()?)
  6. Marten evaluates all these SQL fragments to build up a PostgreSQL SQL statement, executes that, and uses its query handler to resolve the actual resulting documents
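For context, the pattern that triggers this whole pipeline is the standard Hot Chocolate sample shape of returning an `IQueryable` from a resolver and letting Hot Chocolate's middleware build up LINQ Expressions on the fly (step 3 above). A sketch, with a made-up `User` document:

```csharp
using System;
using System.Linq;
using HotChocolate;
using HotChocolate.Data;
using HotChocolate.Types;
using Marten;

public record User(Guid Id, bool IsActive, string Region);

public class Query
{
    // [UseFiltering]/[UseSorting] translate the GraphQL arguments into
    // Where()/OrderBy() Expression trees at runtime, which Marten's LINQ
    // provider then has to parse all over again (steps 4-6 above)
    [UseFiltering]
    [UseSorting]
    public IQueryable<User> GetUsers([Service] IQuerySession session)
        => session.Query<User>();
}
```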

If you read that list above and thought to yourself, “that sounds like a ton of object allocations and overhead, and I wonder if that could end up being slow” — yeah, me too.

What I’d ideally like to see is a model where Marten can take whatever GraphQL’s intermediate model is and effectively skip from #2 straight down to #5/#6. I’d also love to see some kind of way to cache “query plans” in a similar way to Marten’s compiled query mechanism, where repetitive patterns of GraphQL queries can be cached to skip even more of the parsing and LINQ query/SQL generation/handler strategy selection overhead.
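As a point of reference, here's roughly what Marten's existing compiled query mechanism looks like: the Expression is parsed once, and the generated SQL plus the handler strategy are cached for every subsequent execution. The `User` document and the query itself are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Marten;
using Marten.Linq;

public record User(Guid Id, bool IsActive, string Region);

// Marten parses QueryIs() once and caches the SQL and handler strategy,
// so repeated executions skip the LINQ parsing overhead entirely
public class ActiveUsersByRegion : ICompiledListQuery<User>
{
    public string Region { get; set; } = string.Empty;

    public Expression<Func<IMartenQueryable<User>, IEnumerable<User>>> QueryIs()
        => q => q.Where(x => x.IsActive && x.Region == Region);
}

// Usage:
// var users = await session.QueryAsync(new ActiveUsersByRegion { Region = "EU" });
```

A GraphQL “query plan” cache could plausibly key on the shape of the incoming query document and reuse the same kind of pre-built SQL and handler strategy, with only the argument values varying per request.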

Batching Mutations to Marten

I’m betting this would be the easiest thing to pull off. Instead of depending on ambient transactions in .NET (ick), I’d like to be able to look ahead at all the incoming mutations, and if they are all Marten-related, use Marten’s own unit of work mechanics and native database transactions.
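Marten's unit of work already gives us the building block: queue up all the pending operations on a session, then commit them in a single native PostgreSQL transaction. A sketch with made-up documents:

```csharp
using System;
using System.Threading.Tasks;
using Marten;

public record User(Guid Id, string UserName);
public record Order(Guid Id);

public static class MutationBatchSample
{
    public static async Task ApplyMutations(IDocumentSession session, Guid staleOrderId)
    {
        // These just register pending work against Marten's unit of work
        session.Store(new User(Guid.NewGuid(), "new-user"));
        session.Delete<Order>(staleOrderId);

        // One native database transaction commits the whole batch --
        // no ambient TransactionScope required
        await session.SaveChangesAsync();
    }
}
```

The GraphQL layer would just need to map each incoming mutation onto `Store()`/`Delete()` calls before a single `SaveChangesAsync()`.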

Wrapping Up

That’s it for now. Not every blog post has to be War and Peace :-)

I might be back next week with an example of how to do integration testing of GraphQL endpoints with Alba — right after I learn how to do that so I can show a JasperFx client.
