Batch Querying with Marten

Before I talk about the batch querying feature set in Marten, let’s take a little detour through a common approach to persistence in .NET architectures that frequently causes the exact problem Marten’s batch querying seeks to solve.

I’ve been in several online debates lately about the wisdom or applicability of granular repository abstractions over the underlying persistence infrastructure (EF Core, Marten, and the like), such as this sample below:

    public interface IRepository<T>
    {
        Task<T> Load(Guid id, CancellationToken token = default);
        Task Insert(T entity, CancellationToken token = default);
        Task Update(T entity, CancellationToken token = default);
        Task Delete(T entity, CancellationToken token = default);

        IQueryable<T> Query();
    }

That’s a pretty common approach, and I’m sure it’s working out for some people, at least in simpler CRUD-centric applications. Unfortunately though, that reliance on fine-grained repositories breaks down badly in more complicated systems where a single logical operation may need to span multiple entity types. Not coincidentally, over the past 6-8 years I have frequently seen this kind of fine-grained abstraction directly lead to performance problems in systems I’ve been brought in to help with after their original construction.

As an example, let’s say we have a message handler that needs to access and modify data from three different entity types in one logical transaction. Using the fine-grained repository strategy, we’d have something like this:

    public class SomeMessage
    {
        public Guid UserId { get; set; }
        public Guid OrderId { get; set; }
        public Guid AccountId { get; set; }
    }

    public class Handler
    {
        private readonly IUnitOfWork _unitOfWork;
        private readonly IRepository<Account> _accounts;
        private readonly IRepository<User> _users;
        private readonly IRepository<Order> _orders;

        public Handler(
            IUnitOfWork unitOfWork,
            IRepository<Account> accounts,
            IRepository<User> users,
            IRepository<Order> orders)
        {
            _unitOfWork = unitOfWork;
            _accounts = accounts;
            _users = users;
            _orders = orders;
        }

        public async Task Handle(SomeMessage message)
        {
            // The potential performance problem is right here.
            // Multiple round trips to the database
            var user = await _users.Load(message.UserId);
            var account = await _accounts.Load(message.AccountId);
            var order = await _orders.Load(message.OrderId);

            var otherOrders = await _orders.Query()
                .Where(x => x.Amount > 100)
                .ToListAsync();

            // Carry out rules and whatnot

            await _unitOfWork.Commit();
        }
    }

So here’s the problem with the code up above as I see it:

  1. You have to inject a separate repository dependency for each entity type, which adds ceremony and noise to the code.
  2. The code makes a separate round trip to the database server every time it needs more data. This is a contrived example, and it’s only four trips, but in real systems it could easily be many more. To make this perfectly clear: one of the most pernicious sources of slow code is chattiness (frequent network round trips) between the application layer and the backing database.

Fortunately, Marten has a facility called batch querying that we can use to execute multiple queries in a single round trip, and even start processing against the earlier results while the later results are still being read. To use that, we’ve got to ditch the “one size fits all, least common denominator” repository abstraction and use the raw Marten IDocumentSession service as shown in this version below:

    public class MartenHandler
    {
        private readonly IDocumentSession _session;

        public MartenHandler(IDocumentSession session)
        {
            _session = session;
        }

        public async Task Handle(SomeMessage message)
        {
            // Not gonna lie, this is more code than the first alternative
            var batch = _session.CreateBatchQuery();

            var userLookup = batch.Load<User>(message.UserId);
            var accountLookup = batch.Load<Account>(message.AccountId);
            var orderLookup = batch.Load<Order>(message.OrderId);
            var otherOrdersLookup = batch.Query<Order>().Where(x => x.Amount > 100).ToList();

            await batch.Execute();

            // We can immediately start using the data from earlier
            // queries in memory while the later queries are still processing
            // in the background for a little bit of parallelization
            var user = await userLookup;
            var account = await accountLookup;
            var order = await orderLookup;

            var otherOrders = await otherOrdersLookup;

            // Carry out rules and whatnot

            // Commit any outstanding changes with Marten
            await _session.SaveChangesAsync();
        }
    }

The code above creates a single, batched query for the four queries this handler needs, meaning that Marten makes one round trip to the database that executes all four SELECT statements. As an improvement in the Marten V4 release, the results coming back from PostgreSQL are processed in a background Task, meaning that in the code above we can start working with the initial User, Account, and Order data while Marten is still building out the last set of Order results (remember that Marten has to deserialize JSON data to build out your documents, and that can be non-trivial for large documents).
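As an aside, if you’re wondering where the IDocumentSession comes from in that handler, it’s just injected from the .NET container via Marten’s AddMarten() registration. Here’s a minimal sketch of the bootstrapping; the connection string is a made-up placeholder, and you should check the Marten configuration documentation for the real options:

    using Marten;
    using Microsoft.Extensions.DependencyInjection;

    public static class MartenBootstrapping
    {
        public static void ConfigureServices(IServiceCollection services)
        {
            services.AddMarten(options =>
            {
                // Placeholder connection string, substitute your own
                options.Connection("Host=localhost;Database=app;Username=app;Password=secret");
            });

            // In recent Marten versions, AddMarten() registers IDocumentSession
            // with a scoped lifetime, so MartenHandler can take it as a plain
            // constructor dependency
            services.AddScoped<MartenHandler>();
        }
    }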

I think these are the takeaways for the before and after code here:

  1. Network round trips are expensive and chattiness can be a performance bottleneck, but batch querying approaches like Marten’s can help a great deal.
  2. Putting your persistence tooling behind a least common denominator abstraction like the IRepository<T> approach shown above eliminates the ability to use the advanced features of your actual persistence tooling. That’s a serious drawback, as it disallows exactly the features that let you create high performance solutions, and this isn’t specific to using Marten as your backing persistence tooling.
  3. Writing highly performant code can easily mean writing more code, as you saw above with the batch querying. The point being: don’t automatically opt for the most performant approach if it’s unnecessary and more complex than a slower but simpler one. Premature optimization and all that.

I’m only showing a small fraction of what the batch query supports, so certainly check out the documentation for more examples.
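For a small taste of what else is there, here’s a rough sketch of a few more operations hanging off the same batched query object, reusing the document types from the example above. Treat the exact method names (LoadMany(), Count(), Any()) as things to verify against the documentation for your Marten version:

    using System;
    using System.Threading.Tasks;
    using Marten;

    public static class BatchQuerySamples
    {
        public static async Task OtherBatchedQueries(
            IDocumentSession session, Guid userId1, Guid userId2)
        {
            var batch = session.CreateBatchQuery();

            // Load several documents of the same type by id within the same batch
            var users = batch.LoadMany<User>(userId1, userId2);

            // Aggregates and existence checks can ride along in the same batch
            var bigOrderCount = batch.Query<Order>().Where(x => x.Amount > 100).Count();
            var anyAccounts = batch.Query<Account>().Any();

            // Still just one trip to PostgreSQL for everything above
            await batch.Execute();

            var loadedUsers = await users;
            var count = await bigOrderCount;
            var hasAccounts = await anyAccounts;
        }
    }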

2 thoughts on “Batch Querying with Marten”

  1. “we can start working with the initial Account, User, and Order data while Marten is still building out the last Order results”

    What if the Account query takes longer than the Order query, etc.? Would we not prefer each query to have its own callback to run when complete?

    1. It’s not running the queries in parallel; it’s batching the request and letting you have the results as they’re ready. The play here is that there’s more advantage in reducing the network round trips than in parallelizing the work.
