Planned Event Store Improvements for Marten V4, Daft Punk Edition

There’s a new podcast about Marten on the .Net Core Show that posted last week.

Marten V4 development has been heavily underway this year. To date, the work has mostly focused on the document store functionality (Linq, general performance improvements, and document metadata).

While I certainly hope the other improvements to Marten V4 will make a positive difference to our users, the big leap forward in capability is going to be on the event sourcing side of Marten. We’ve gathered a lot of user feedback on this feature set in the past couple years, but there’s always room for more discussion as things are taking shape.

First though, to set the mood:

The master issue for V4 event sourcing improvements is on GitHub here.

Scalability

We know there’s plenty of concern about how well Marten’s event store will scale over time. Beyond the performance improvements I’ll try to outline in the sections below, we’re planning to introduce support for:

Event stream archiving. Duh, keep the underlying Postgresql tables as small as you can.
Native Postgresql database sharding within the event store tables.
One thing we haven’t discussed much yet is allowing users to define multiple, logically separate event stores within one Marten database.

Event Metadata

Similar to the document storage, the event storage in V4 will allow users to capture additional metadata alongside the events. There will be support in the event store Linq provider to query against this metadata, and this metadata will be available to the projections. Right now, the plan is to have opt-in, additional fields for:

Correlation Id
Causation Id
User name

The plan is also to have a “headers” field for user-defined data that does not fall into the fields listed above. Marten will capture the metadata at the session level, with the thinking being that you could opt into custom Marten session creation that would automatically apply metadata for the current HTTP request or service bus message or logical unit of work.
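To make the session-level idea a bit more concrete, here’s a rough, language-neutral sketch in Python. To be clear, this is not Marten’s API: `StoredEvent`, `Session`, `session_for_request`, and the shape of the `request` dictionary are all names I made up for illustration. The only point is that metadata gets set once on the session and is then stamped onto every event appended through it.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class StoredEvent:
    """An event plus the opt-in metadata fields (hypothetical shape)."""
    stream_id: str
    data: Any
    correlation_id: Optional[str] = None
    causation_id: Optional[str] = None
    user_name: Optional[str] = None
    headers: dict = field(default_factory=dict)


class Session:
    """Hypothetical session: metadata is set once, then applied to every event."""

    def __init__(self, correlation_id=None, causation_id=None,
                 user_name=None, **headers):
        self.correlation_id = correlation_id
        self.causation_id = causation_id
        self.user_name = user_name
        self.headers = headers
        self.pending = []

    def append(self, stream_id, data):
        # Copy the headers so later changes to the session don't leak
        # into events that were already captured.
        event = StoredEvent(stream_id, data, self.correlation_id,
                            self.causation_id, self.user_name,
                            dict(self.headers))
        self.pending.append(event)
        return event


def session_for_request(request):
    """Hypothetical factory: build a session from the current HTTP request."""
    return Session(correlation_id=request.get("trace_id"),
                   user_name=request.get("user"),
                   tenant=request.get("tenant"))
```

With something like this in place, application code would call `session_for_request(...)` once per request and never have to think about metadata again when appending events.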

There’ll be a follow-up post on this soon.

Event Capture Improvements

When events are appended to event streams, we’re planning some small improvements for V4:

“Inline” projections will have access to the event metadata
The StartStream() feature will assert that the stream does not already exist
Better optimistic versioning of an individual event stream (here and here), though we don’t have any detailed ideas yet about how we’re going to do this
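For anyone unfamiliar with the last two items, here’s a minimal in-memory sketch in Python of what “assert the stream does not already exist” and “optimistic versioning” mean in general. It says nothing about how Marten will actually implement either one, and `InMemoryEventStore`, `StreamAlreadyExists`, and `ConcurrencyError` are names invented for this example.

```python
class StreamAlreadyExists(Exception):
    pass


class ConcurrencyError(Exception):
    pass


class InMemoryEventStore:
    """Toy event store; a stream's version is simply its event count."""

    def __init__(self):
        self._streams = {}

    def start_stream(self, stream_id, *events):
        # Starting a stream fails loudly if the id is already in use.
        if stream_id in self._streams:
            raise StreamAlreadyExists(stream_id)
        self._streams[stream_id] = list(events)
        return len(events)

    def append(self, stream_id, expected_version, *events):
        # Optimistic check: the caller states the version it last saw.
        # If someone else appended in the meantime, reject the write.
        current = self._streams.get(stream_id, [])
        if len(current) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(current)}")
        self._streams[stream_id] = current + list(events)
        return len(self._streams[stream_id])
```

A caller that loses the race catches the error, reloads the stream, and decides whether its command still makes sense against the newer state.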

Projections, Projections, Projections!

This work is heavily in flight, so please shoot any feedback you might have our (Marten team’s) way.

Building your own event store is actually pretty easy, right up until you want to do something with the events you’ve captured or keep a “read-side” view of the status up to date with the incoming events. Based on a couple years of user feedback, all of that is exactly where Marten needs to grow up the most.

The master issue tracking the projection improvements is here. The Marten community (mostly me to be honest) has gone back and forth quite a bit on the shape of the new projection work and nothing I say here is set in stone. The main goals are to:

Significantly improve performance and throughput. We’re doing this partially by reducing in-memory object allocations, but mostly by introducing much, much more parallelization of the projection work in the async daemon.
Simplify the usage of immutable data structures as the projected documents (note that we have plenty of F# users, and now C# record types make that a lot easier too).
Introduce snapshotting
Supplement the existing ViewProjection mechanism with conventional methods similar to the .Net Startup class
Completely gut the existing ViewProjection to improve its performance while hopefully avoiding breaking API compatibility
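As a rough illustration of two of those goals (immutable projected documents and snapshotting), here’s a small Python sketch. None of this is Marten code: `QuestParty`, `apply`, `project`, and the event names are hypothetical. A projection is treated as a fold over events that returns a new immutable view each time, and a snapshot is just a previously computed view used as the starting point of the fold.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class QuestParty:
    """Immutable projected document (hypothetical example view)."""
    name: str = ""
    members: tuple = ()


def apply(view, event):
    """Return a new view for one event; the old view is never mutated."""
    kind, payload = event
    if kind == "QuestStarted":
        return replace(view, name=payload)
    if kind == "MembersJoined":
        return replace(view, members=view.members + tuple(payload))
    if kind == "MembersDeparted":
        return replace(
            view, members=tuple(m for m in view.members if m not in payload))
    return view  # unknown events leave the view unchanged


def project(events, snapshot=QuestParty()):
    """Fold events onto a starting view (an empty view or a snapshot)."""
    return reduce(apply, events, snapshot)
```

The payoff of snapshotting in this sketch is that `project(new_events, snapshot)` gives the same answer as replaying the whole stream from scratch, without having to read the older events again.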

There is some thought about breaking the projection support into its own project or making the event sourcing support storage-agnostic, but I’m not sure that will make it into V4. My personal focus is on performance and scalability, and way too many of the possible optimizations seem to require coupling to details of Marten’s existing storage.

“Async Daemon”

The Async Daemon is an under-documented Marten subsystem we use to process asynchronously built event projections and do projection rebuilds. While it’s “functional” today, it has a lot of shortcomings (it can only run in one node at a time, and we don’t have any kind of leader election or failover) that prevent most folks from adopting it.

The master issue for the Async Daemon V4 is here, but the tl;dr is:

Make sure there’s adequate documentation (duh.)
Should be easy to integrate in your application
Has to be able to run in an application cluster in such a way that it guarantees that every projected view (or slice of a projected view) is being updated on exactly one node at a time
Improved performance and throughput of normal projection building
No downtime projection rebuilds
Way, way faster projection rebuilds

Now, to the changes coming in V4. Let’s assume that you’re doing “serious” work and need to host your Marten-using .Net Core application across multiple nodes via some sort of cloud hosting. With minimal configuration, you’d like to have the asynchronous projection building “just work” across your cluster.

Here’s a visual representation of my personal “vision” for the async daemon in V4:

In V4 the async daemon will become a .Net Core BackgroundService that will be registered by the AddMarten() integration with HostBuilder. That mechanism will allow us to run background work inside of your .Net Core application.

Inside that background process the async daemon is going to have to elect a single “leader/distributor” agent that can only run on one node. That leader/distributor agent will be responsible for assigning work to the async daemon running inside all the active nodes in the application. What we’re hoping to do is to distribute and parallelize the projection building across running nodes. And oh yeah, do this without needing any other kind of infrastructure besides the Postgresql database.
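Here’s a deliberately naive Python sketch of the “leader/distributor” idea, just to show the shape of the problem. It is not the planned design: `elect_leader` and `assign_shards` are invented names, and picking the lowest node id is a stand-in for real leader election (one Postgresql-only option I could imagine is an advisory lock via `pg_try_advisory_lock`, but that’s my assumption, not a decision).

```python
def elect_leader(active_nodes):
    """Stand-in election: every node agrees the lowest id is the leader."""
    if not active_nodes:
        raise ValueError("no active nodes")
    return min(active_nodes)


def assign_shards(shards, active_nodes):
    """Leader's job: hand each projection shard to exactly one node.

    Sorting both inputs makes the result deterministic, so a newly
    elected leader computes the same assignments from the same inputs.
    """
    if not active_nodes:
        raise ValueError("no active nodes")
    nodes = sorted(active_nodes)
    assignments = {node: [] for node in nodes}
    for index, shard in enumerate(sorted(shards)):
        assignments[nodes[index % len(nodes)]].append(shard)
    return assignments
```

When a node drops out, the leader (or its replacement) would recompute the assignments against the remaining nodes, which is what keeps each projected view running on exactly one node at a time.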

Within a single node, we’re adding a lot more parallelization to the projection building instead of treating everything as a dumb “left fold” single threaded queue problem. I’m optimistic that that’s going to make a huge difference for throughput. On top of that, I’m hoping that the new async daemon will be able to split work between different nodes without the nodes stepping on each other.
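To show what I mean by not treating everything as one single-threaded left fold, here’s a toy Python sketch. It is purely illustrative: the function names and the `(stream_id, op, value)` event shape are made up, and Python threads won’t show real speedups for CPU-bound work. The idea is that events only need to stay ordered within a stream, so each stream’s fold can run independently of the others.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_stream(events):
    """Split one ordered event feed into per-stream lists, keeping order."""
    grouped = defaultdict(list)
    for stream_id, op, value in events:
        grouped[stream_id].append((op, value))
    return grouped


def fold_stream(item):
    """Left fold over a single stream; order matters ('set' vs 'add')."""
    stream_id, ops = item
    balance = 0
    for op, value in ops:
        balance = value if op == "set" else balance + value
    return stream_id, balance


def project_in_parallel(events, max_workers=4):
    """Fold every stream on a worker pool; streams never share state."""
    grouped = group_by_stream(events)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fold_stream, grouped.items()))
```

Because each worker touches only one stream’s events, no locking between workers is needed in this sketch.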

There are still plenty of details to work out, and this post is just meant to be a window into some of the work that is happening within Marten for our big V4 release sometime in 2021.

Published by jeremydmiller
