
There’s been a definite theme lately around improving the performance and scalability of Marten, as evident (I hope) in my post last week describing new optimization options in Marten 7.25. Today I was able to push a follow-up feature that missed that release: Marten users can now utilize PostgreSQL table partitioning behind the scenes for document storage (7.25 added a specific use of table partitioning for the event store). The goal is that, in selected scenarios, PostgreSQL would mostly be working with far smaller tables than it would otherwise, and hence your system would perform better.
Think of these common usages of Marten:
- You’re using soft deletes in Marten against a document type, and the vast majority of the time Marten is adding a default filter for you to only query for “not deleted” documents
- You are aggressively using the Marten feature to mark event streams as archived when whatever process they model is complete. In this case, Marten is usually querying against the event table with a filter of is_archived = false
- You’re using “conjoined” multi-tenancy within a single Marten database, and most of the time your system is naturally querying for data from only one tenant at a time
- Maybe you have a table where you’re frequently querying against a certain date property or querying for documents by a range of expected values
In all of those cases, it would be more performant to opt into PostgreSQL table partitioning, where PostgreSQL separates the storage for a single, logical table into separate “partition” tables. In each of the cases above, that lets PostgreSQL + Marten largely query against a much smaller table partition rather than the entire table, and querying against smaller database tables can be hugely more performant than querying against bigger ones.
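For background, here’s a rough sketch of what PostgreSQL declarative RANGE partitioning looks like in raw DDL. The table and column names are purely illustrative and are not Marten’s actual generated schema; the point is just that each partition is a real, smaller physical table that the query planner can prune to:

```sql
-- Illustrative only: a range-partitioned table, not Marten's generated DDL
CREATE TABLE mt_doc_user (
    id   uuid NOT NULL,
    data jsonb NOT NULL,
    age  int NOT NULL
) PARTITION BY RANGE (age);

-- PostgreSQL range bounds are inclusive lower, exclusive upper
CREATE TABLE mt_doc_user_young    PARTITION OF mt_doc_user FOR VALUES FROM (0) TO (21);
CREATE TABLE mt_doc_user_twenties PARTITION OF mt_doc_user FOR VALUES FROM (21) TO (30);

-- A query with WHERE age < 21 only has to scan mt_doc_user_young
```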
The Marten community has been kicking around the idea of utilizing table partitioning for years (since 2017 by my sleuthing last week through the backlog), but it always got kicked down the road because of the perceived challenges in supporting automatic database migrations for partitions the same way we do today in Marten for every other database schema object (and in Wolverine too for that matter).
Thanks to an engagement with a JasperFx customer who has some pretty extreme scalability needs, I was able to spend the time last week to break through the change management challenges with table partitioning, and finally add table partitioning support for Marten.
As for what’s possible, let’s say that you want to create table partitioning for a certain very large table in your system for a particular document type. Here’s the new option for 7.26:
var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");

    // Set up table partitioning for the User document type
    opts.Schema.For<User>()
        .PartitionOn(x => x.Age, x =>
        {
            x.ByRange()
                .AddRange("young", 0, 20)
                .AddRange("twenties", 21, 29)
                .AddRange("thirties", 30, 39);
        });

    // Or use pg_partman to manage partitioning outside of Marten
    opts.Schema.For<User>()
        .PartitionOn(x => x.Age, x =>
        {
            x.ByExternallyManagedRangePartitions();

            // or instead with list
            x.ByExternallyManagedListPartitions();
        });

    // Or use PostgreSQL HASH partitioning and split the users over multiple tables
    opts.Schema.For<User>()
        .PartitionOn(x => x.UserName, x =>
        {
            x.ByHash("one", "two", "three");
        });

    opts.Schema.For<Issue>()
        .PartitionOn(x => x.Status, x =>
        {
            // There is a default partition for anything that doesn't fall into
            // these specific values
            x.ByList()
                .AddPartition("completed", "Completed")
                .AddPartition("new", "New");
        });
});
To use the “hot/cold” storage on soft-deleted documents, you have this new option:
var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");

    // Opt into partitioning for one document type
    opts.Schema.For<User>().SoftDeletedWithPartitioning();

    // Opt into partitioning and an index for one document type
    opts.Schema.For<User>().SoftDeletedWithPartitioningAndIndex();

    // Opt into partitioning for all soft-deleted documents
    opts.Policies.AllDocumentsSoftDeletedWithPartitioning();
});
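Under the covers, the “hot/cold” idea is just LIST partitioning on the soft-delete flag, so live documents and deleted documents end up in separate physical tables and the default “not deleted” filter only ever touches the “hot” one. A rough sketch in raw DDL, where the table and column names are illustrative rather than Marten’s exact generated schema:

```sql
-- Illustrative sketch of "hot/cold" soft-delete storage, not Marten's exact DDL
CREATE TABLE mt_doc_user (
    id      uuid NOT NULL,
    data    jsonb NOT NULL,
    deleted boolean NOT NULL DEFAULT false
) PARTITION BY LIST (deleted);

-- The "hot" partition holds live documents; the "cold" one holds soft-deleted documents
CREATE TABLE mt_doc_user_active  PARTITION OF mt_doc_user FOR VALUES IN (false);
CREATE TABLE mt_doc_user_deleted PARTITION OF mt_doc_user FOR VALUES IN (true);
```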
And to partition “conjoined” tenancy documents by their tenant id, you have this feature:
storeOptions.Policies.AllDocumentsAreMultiTenantedWithPartitioning(x =>
{
    // Selectively by LIST partitioning
    x.ByList()
        // Adding explicit table partitions for specific tenant ids
        .AddPartition("t1", "T1")
        .AddPartition("t2", "T2");

    // OR use LIST partitioning, but allow the partition tables to be
    // controlled outside of Marten by something like pg_partman
    // https://github.com/pgpartman/pg_partman
    x.ByExternallyManagedListPartitions();

    // OR just spread out the tenant data by tenant id through
    // HASH partitioning
    // This is using three different partitions with the supplied
    // suffix names
    x.ByHash("one", "two", "three");

    // OR partition by tenant id based on ranges of tenant id values
    x.ByRange()
        .AddRange("north_america", "na", "nazzzzzzzzzz")
        .AddRange("asia", "a", "azzzzzzzz");

    // OR use RANGE partitioning with the actual partitions managed
    // externally
    x.ByExternallyManagedRangePartitions();
});
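For the HASH option above, PostgreSQL assigns each tenant id to one of a fixed number of partitions by hashing the key, which spreads tenants evenly without requiring you to enumerate them. A rough DDL sketch of what three hash partitions look like (again, the table and column names are illustrative, not Marten’s exact generated schema):

```sql
-- Illustrative HASH partitioning on tenant_id, not Marten's exact DDL
CREATE TABLE mt_doc_user (
    id        uuid NOT NULL,
    data      jsonb NOT NULL,
    tenant_id varchar NOT NULL
) PARTITION BY HASH (tenant_id);

-- Each row lands in the partition matching hash(tenant_id) modulo 3
CREATE TABLE mt_doc_user_one   PARTITION OF mt_doc_user FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE mt_doc_user_two   PARTITION OF mt_doc_user FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE mt_doc_user_three PARTITION OF mt_doc_user FOR VALUES WITH (MODULUS 3, REMAINDER 2);
```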
Summary
Your mileage will vary, of course, depending on how big your database is and how you actually query it, but at least in some common cases, the Marten community is pretty excited about the potential of table partitioning to improve Marten’s performance and scalability.