Marten Development So Far (Postgresql as Doc Db)

Last week I mentioned that I had started a new OSS project called “Marten” that aims to allow .Net developers to treat Postgresql 9.5 (we’re using the new “upsert” functionality) as a document database using Postgresql’s JSONB data type. We’ve already had some interest and feedback on Github and in the Gitter room, plus links to at least three other ongoing efforts to do something similar with Postgresql, which I’m interpreting as obvious validation for the basic idea.

Please feel very free to chime in on the approach or requirements here, on Github, or on Gitter. We’re going to proceed with this project regardless at work, but I’d love to see it also be a viable community project with input from outside our little development organization.

What’s Already Done

I’d sum up the Marten work as “so far, so good”. If you look closely into the Marten code, do know that I have been purposely standing up the functionality with simple mechanics and naive implementations. My philosophy here is to get the functionality up with good test coverage before starting any heavy optimization work.

As of now:

Our thought is that the main service facade to Marten is the IDocumentSession interface, which very closely mimics the same interface in RavenDb. This work is for my day job at Extend Health, and our immediate goal is to move systems off of RavenDb early next year, so I think that design decision is pretty understandable. That doesn’t mean it’ll be the only way to interact with Marten in the long run.
In the “development mode”, Marten is able to create database tables and an “upsert” stored procedure for any new document type it encounters in calls to the IDocumentSession.
The real DocumentSession facade can store documents, load documents by a single id or an array of ids, and delete documents the same way.
DocumentSession implements a “unit of work” with similar usage to RavenDb’s.
You can completely bypass the Linq provider I’m describing in the next section and just use raw SQL to fetch documents.
There’s a DocumentCleaner service that you can use inside of automated testing harnesses to tear down document data or even the schema objects that Marten builds.
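To make the “unit of work” idea concrete, here is a minimal in-memory sketch. It is not Marten’s DocumentSession: the class name SketchSession and the dictionary standing in for the database are assumptions made for illustration only. The point is simply that Store() and Delete() queue up changes, and nothing reaches the “database” until SaveChanges() is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only, NOT Marten's DocumentSession.
public class SketchSession
{
    private readonly Dictionary<Guid, object> _database;          // pretend database
    private readonly List<Action> _pending = new List<Action>();  // queued changes

    public SketchSession(Dictionary<Guid, object> database)
    {
        _database = database;
    }

    // Queue an insert-or-update; the database is untouched for now.
    public void Store(Guid id, object document)
    {
        _pending.Add(() => _database[id] = document);
    }

    // Queue a delete; again nothing happens until SaveChanges().
    public void Delete(Guid id)
    {
        _pending.Add(() => _database.Remove(id));
    }

    // Apply every queued change in order, then clear the queue.
    public void SaveChanges()
    {
        foreach (var change in _pending) change();
        _pending.Clear();
    }
}
```

A real implementation would apply the queued changes inside a single database transaction so that they succeed or fail together.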

Linq Support

I don’t think I need to make the argument that Marten is going to be more usable and definitely more popular if it has decent Linq support. While I was afraid that building a Linq provider on top of the Postgresql JSON operators was going to be tedious and hard, the easy to use Relinq library has made it just “tedious.”

As early as next week I’m going to start working over the Linq support and the SQL it generates to try to optimize searching.

The Linq support hangs off of the IDocumentSession.Query<T>() method like so:

        public void query()
        {
            // Three documents dated in the future, two dated in the past
            theSession.Store(new Target{Number = 1, DateOffset = DateTimeOffset.Now.AddMinutes(5)});
            theSession.Store(new Target{Number = 2, DateOffset = DateTimeOffset.Now.AddDays(1)});
            theSession.Store(new Target{Number = 3, DateOffset = DateTimeOffset.Now.AddHours(1)});
            theSession.Store(new Target{Number = 4, DateOffset = DateTimeOffset.Now.AddHours(-2)});
            theSession.Store(new Target{Number = 5, DateOffset = DateTimeOffset.Now.AddHours(-3)});

            // Persist all five documents as one unit of work
            theSession.SaveChanges();

            // Select() projections aren't supported yet, so ToArray() pulls the
            // matching documents into memory before the projection runs
            theSession.Query<Target>()
                .Where(x => x.DateOffset > DateTimeOffset.Now).ToArray()
                .Select(x => x.Number)
                .ShouldHaveTheSameElementsAs(1, 2, 3);
        }
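As a rough illustration of what translating a Where() clause onto the Postgresql JSON operators involves, here is a small sketch built directly on System.Linq.Expressions. It is not Marten’s Linq provider and does not use Relinq. The JsonWhere class, the “data” column name, and the “:p0” parameter naming are all assumptions made for this example. It only handles string equality combined with “&&” and “||”; numeric or date comparisons would additionally need casts in the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Address { public string City { get; set; } }
public class Target
{
    public string Name { get; set; }
    public Address Address { get; set; }
}

// Hypothetical translator for illustration only, NOT Marten's Linq provider.
public static class JsonWhere
{
    public static string Translate<T>(Expression<Func<T, bool>> predicate, IList<object> parameters)
    {
        return Visit(predicate.Body, parameters);
    }

    private static string Visit(Expression expression, IList<object> parameters)
    {
        var binary = expression as BinaryExpression;
        if (binary == null)
            throw new NotSupportedException("Unsupported expression: " + expression.NodeType);

        switch (binary.NodeType)
        {
            case ExpressionType.AndAlso:
                return "(" + Visit(binary.Left, parameters) + " and " + Visit(binary.Right, parameters) + ")";
            case ExpressionType.OrElse:
                return "(" + Visit(binary.Left, parameters) + " or " + Visit(binary.Right, parameters) + ")";
            case ExpressionType.Equal:
            {
                // Evaluate the right-hand side (a constant or captured variable)
                var value = Expression.Lambda(binary.Right).Compile().DynamicInvoke();
                parameters.Add(value);
                return Locator((MemberExpression)binary.Left) + " = :p" + (parameters.Count - 1);
            }
            default:
                throw new NotSupportedException("Unsupported operator: " + binary.NodeType);
        }
    }

    // Builds "data -> 'Address' ->> 'City'" from x.Address.City
    private static string Locator(MemberExpression member)
    {
        var names = new Stack<string>();
        Expression current = member;
        while (current is MemberExpression m)
        {
            names.Push(m.Member.Name);   // walk from the leaf up to the root
            current = m.Expression;
        }

        var path = names.ToArray();      // root-first order
        var sql = "data";
        for (var i = 0; i < path.Length; i++)
        {
            // "->" keeps JSON for intermediate steps, "->>" returns text at the end
            sql += (i == path.Length - 1 ? " ->> '" : " -> '") + path[i] + "'";
        }
        return sql;
    }
}
```

With this sketch, translating x => x.Address.City == "Austin" yields the text data -> 'Address' ->> 'City' = :p0 and records "Austin" as the single parameter value.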

For right now, the Linq IQueryable support includes:

IQueryable.Where() support with strings, ints, longs, decimals, DateTimes, enumeration values, and boolean types.
Multiple or chained Where().Where().Where() clauses like you might use when you’re calculating optional where clauses or letting multiple pieces of code add additional filters
“&&” and “||” operators in the Where() clauses
Deep nested properties in the Where() clauses like x.Address.City == “Austin”
First(), FirstOrDefault(), Single(), and SingleOrDefault() support for the IQueryable
Count() and Any() support
Contains(), StartsWith(), and EndsWith() support for string values — but it’s case sensitive right now. Case-insensitive searches are probably going to be an “up-for-grabs” task;)
Take() and Skip() support for paging
OrderBy() / ThenBy() / OrderByDescending() support
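The chained Where() style in the list above is easiest to see with optional filters. The sketch below uses plain Linq against an in-memory IQueryable so that it runs on its own; the User and Address types and the UserSearch helper are hypothetical. The assumption is that the same method could be handed the result of IDocumentSession.Query<T>() instead, since it only uses operators from the supported list (Where(), StartsWith(), OrderBy(), Skip(), and Take()).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Address { public string City { get; set; } }
public class User
{
    public string Name { get; set; }
    public Address Address { get; set; }
}

// Hypothetical helper for illustration only.
public static class UserSearch
{
    public static IQueryable<User> Apply(
        IQueryable<User> query, string city, string namePrefix, int page, int pageSize)
    {
        // Each optional filter adds another Where() only when it was supplied
        if (city != null)
            query = query.Where(x => x.Address.City == city);

        if (namePrefix != null)
            query = query.Where(x => x.Name.StartsWith(namePrefix));

        // Stable ordering first, then paging
        return query
            .OrderBy(x => x.Name)
            .Skip(page * pageSize)
            .Take(pageSize);
    }
}
```

Because each Where() call returns a new IQueryable, separate pieces of code can keep adding filters without knowing about each other, and nothing executes until the results are enumerated.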

Right now, I’m using my audit of our largest system at work that uses RavenDb to guide and prioritize the Linq support. The only thing missing for us is searching within child collections of a document.

What we’re missing right now is:

Projections via IQueryable.Select(). Right now you have to do IQueryable.ToArray() to force the documents into memory before trying to use Select() projections.
Last() and LastOrDefault()
A lot of things I probably haven’t thought about at all;-)

Using Roslyn for Runtime Code Compilation

We’ll see if this turns out to be a good idea or not, but as of today Marten is using Roslyn to generate strategy classes that “know” how to build database commands for updating, deleting, and loading document data for each document type instead of using Reflection, IL emitting, or compiling Expressions on the fly. Other than the “warm up” performance hit on doing the very first compilation, this is working smoothly so far. We’ll be watching it for performance. I’ll blog about that separately sometime soon-ish.
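For readers who haven’t seen runtime compilation with Roslyn, the general shape looks something like the sketch below. This is not Marten’s code generation. The RuntimeCompiler class, the generated TargetStorage class, and the doc_target table name are all made up for illustration, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration only, NOT Marten's code generation.
public static class RuntimeCompiler
{
    public static Assembly Compile(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);

        // Reference every assembly already loaded in this process that lives on disk
        var references = AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic && !string.IsNullOrEmpty(a.Location))
            .Select(a => MetadataReference.CreateFromFile(a.Location))
            .ToArray();

        var compilation = CSharpCompilation.Create(
            "GeneratedStorage",
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var stream = new MemoryStream())
        {
            // Emit the compiled assembly into memory rather than onto disk
            var result = compilation.Emit(stream);
            if (!result.Success)
            {
                var errors = string.Join(Environment.NewLine,
                    result.Diagnostics.Select(d => d.ToString()));
                throw new InvalidOperationException(errors);
            }

            return Assembly.Load(stream.ToArray());
        }
    }
}

public static class RuntimeCompilerDemo
{
    public static string Run()
    {
        // Source text that would normally be generated per document type
        var source = @"
public class TargetStorage
{
    public string LoadSql() { return ""select data from doc_target where id = :id""; }
}";

        var type = RuntimeCompiler.Compile(source).GetType("TargetStorage");
        var storage = Activator.CreateInstance(type);
        return (string)type.GetMethod("LoadSql").Invoke(storage, null);
    }
}
```

The demo calls the generated class through reflection only to keep the sketch short. Having the generated classes implement a shared interface would let the rest of the code call them directly after the one-time compilation.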

Next Week: Get Some Data and Optimize!

My focus for Marten development next week is on getting a non-trivial database together and working on pure optimization. My thought is to grab data from Github using Octokit.net to build a semi-realistic document database of users, repositories, and commits from all my other OSS projects. After that, I’m going to try out:

Using GIN indexes against the jsonb data to see how that works
Trying to selectively duplicate data into normal database fields for lightweight sql searches and indexes
Trying to use Postgresql’s jsonb_to_record functionality inside of the Linq support to see if that makes searches faster
Trying out Jil as the JSON serializer for the comparison. I’m using Newtonsoft.Json right now, thinking that I’d want the extra flexibility later
Cleaning up the naive string concatenation going on inside of the Linq support once the SQL generation settles down
Optimizing the batch updates through DocumentSession.SaveChanges(). Today it’s just making individual sql commands in one transaction. For some optimization, I’d like to at least try to make the updates happen in fewer remote calls to the database. My fallback plan is to use a *gasp* stored procedure using Postgresql’s PLV8 javascript support to take any number of document updates or deletions as a single json payload.

That list above is enough to keep me busy next week, but there’s more in the open Github issue list and we’re all ears about whatever we’ve missed, so feel free to add more feature requests or comment on existing issues.
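One simple way to picture the “fewer remote calls” idea for SaveChanges() is to fold all the pending changes into a single command text with numbered parameters. The sketch below only builds the SQL text and the parameter values. The DocumentChange and BatchBuilder types, the upsert_target function, and the doc_target table are hypothetical, and it assumes the database driver accepts several semicolon-separated statements in one command.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical types for illustration only.
public class DocumentChange
{
    public Guid Id { get; set; }
    public string Json { get; set; }   // null means "delete this document"
}

public static class BatchBuilder
{
    // Builds one command text covering every change and fills in the
    // parameter values keyed by parameter name.
    public static string Build(IEnumerable<DocumentChange> changes, IDictionary<string, object> parameters)
    {
        var sql = new StringBuilder();
        var index = 0;

        foreach (var change in changes)
        {
            var idParam = "id" + index;
            parameters[idParam] = change.Id;

            if (change.Json == null)
            {
                sql.Append("delete from doc_target where id = :").Append(idParam).Append(";");
            }
            else
            {
                var docParam = "doc" + index;
                parameters[docParam] = change.Json;
                sql.Append("select upsert_target(:").Append(idParam)
                   .Append(", :").Append(docParam).Append(");");
            }

            index++;
        }

        return sql.ToString();
    }
}
```

The resulting text and parameters would then go to the database as one command inside the existing transaction, so an arbitrary number of stores and deletes costs a single round trip.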

Why “Marten?”

One of my colleagues was sneering at the name I was using, so I googled for “natural predators of ravens” and the marten was one of the few options, so we ran with it.

9 thoughts on “Marten Development So Far (Postgresql as Doc Db)”

Bret Ferrier (@runxc1) says:

November 2, 2015 at 10:26 pm

So first off let me start by saying that I love the idea and have been watching PostgreSQL JSONB functionality for quite some time now. I have never used RavenDB but have used MongoDB very extensively and really liked that it doesn’t force the Unit of Work pattern on you. The Unit of Work pattern is very useful, but in many of the more complex situations where there are multiple databases/caches/external services I often see it break down. There I find it a lot more useful not to use it, as the ‘Unit of Work’ does not affect just one database and not all of them can be rolled back, so work has to be paused and retried/queued etc.

If you haven’t checked out MongoDB and some of its increment, upsert, etc commands you should take a look as there are a lot more people using MongoDB than RavenDB that might look to converting to use your project for Document Storage.

Matt Honeycutt says:

November 2, 2015 at 10:50 pm

This sounds very cool! Out of curiosity though: what are you guys running into at your day job that’s necessitated migrating from RavenDB to a new, but similar, tool? I ask because I was considering using Raven for an upcoming project…

1. jeremydmiller says:
  
  November 4, 2015 at 4:09 pm
  
  @Matt,
  
  It can be a memory hog if you’re not careful. We’ve had trouble with indexes getting corrupted and having to be rebuilt, which can make the entire db effectively unresponsive until that completes. Performance isn’t the best, the stale data issues can be a killer, DevOps things have been bad for us.
  
  1. Matt Honeycutt says:
    
    November 4, 2015 at 4:17 pm
    
    Yikes. I’m glad I asked. 🙂
cthames says:

November 6, 2015 at 5:35 pm

FYI: Keith recently posted about PostgreSQL as a DocumentDB. Not sure if anything in there helps with what you’re doing.
http://www.keithf4.com/document-storage-in-postgresql-open-source-benefits/
GitHub: https://github.com/keithf4/pg_doc_store

1. jeremydmiller says:
  
  November 9, 2015 at 11:38 pm
  
  @cthames,
  
  Thanks for the heads up. Since we started this, I’m seeing a *lot* of these kinds of projects. I’m hoping that’s an indication that we’re on to something here.
  
  I’ll happily be borrowing support code for the full text support and whatever hilo/star id generator code we can find.
  
  – Jeremy
  