Postgresql as a Document Db for .Net Development

I’m one of those guys who normally doesn’t like to talk much about new OSS projects until there’s a lot to show, but just for fun this time, I’m gonna talk about something that I’ve just barely started in the hopes of getting some feedback and because there’s already been some interest from outside my company. Besides, it’s not like the ways I’ve ran OSS projects in the past have been all that successful anyway.

We use RavenDb at work in several projects, and while I still think there are some great features and attributes in RavenDb for easy development, it hasn’t held up very well in production usage and we want to replace it next year. I’ve gotten to spend some time over the past couple weeks laying out the skeleton of a new project on GitHub we’re calling “Marten” that will in theory allow us to treat Postgresql as a document database for .Net development.

We want to keep what we see as the advantages of RavenDb:

  • Schema-less development based on our objects without any kind of ORM mapping or limitations on object structure
  • The ability to quickly get a clean database per automated test for reliable testing
  • Linq support — I’ve already gotten some Linq support for basic operators using Re-linq and I’ve been pleasantly surprised at how well that went.
  • Batched updates and the built in unit of work — my working theory is to use DbDataAdapter’s to rig up batched updates
  • Defered and/or batched queries — at least one of our apps is getting killed by network chattiness, so this is going to be a pretty high priority

In the end, what we’d really like to have is all the development advantages of RavenDb and document databases, but have full ACID support, all the DevOps tooling that already exists around Postgresql, and sit on top of a proven database engine.

Roadmap and Contributing

I’ve done enough spiking and proof of concept type work to feel like this is viable — pending performance testing down the road of course. I spent this morning trying to write up my thoughts on where we should go with thing into the GitHub issue list mostly as a way to start a detailed conversation about what this thing should be and where it’s going to go. If you’ve got any opinions, we’d love to hear them either on individual issues or in the Gitter room.

Roughly speaking, the features we’re thinking about are:

  • Support basic document saving and retrieval through a new IDocumentSession service facade purposely modeled after RavenDb’s
  • Basic Linq support against documents
  • The ability to bypass Linq and provide the raw SQL yourself when necessary (already working)
  • Schema creation and migration support for deployments
  • Read side/view projections in the database?
  • Some way to define and use indexes in queries
  • For lack of a better term, “Stored Procedures” that let you generate SQL queries from a convoluted Linq expression once and reuse across requests
  • Maybe make this thing a plugin or separate provider for EF7. I’m not sure there’s a technical reason to do that yet, you you know it’d make a lot more people interested in this thing

Maybe just as a vanity project for my satisfaction, but also build an EventStore capability including user-defined projects into Marten using Postgresql’s ability to embed Javascript.

If you have any interest in contributing or following this thing, hit us up in the Gitter room or start weighing in on GitHub issues.

Marten in Action

To see the itty bit that’s done so far in action, say that you have a .Net type representing a document like this one from my test project:

    // The IDocument interface is just a temporary crutch
    // for now. It won't be necessary in the end
    public class User : IDocument
    {
        public User()
        {
            Id = Guid.NewGuid();
        }

        public Guid Id { get; set; }

        public string FirstName { get; set; }
        public string LastName { get; set; }

        public string FullName
        {
            get { return "{0} {1}".ToFormat(FirstName, LastName); }
        }
    }

Starting from a blank Postgresql 9.5 schema (because we’re already depending on the new “upsert” capabilities) with Marten’s version of IDocumentSession, I’ll create a new User object, save it, then load a new copy of it from the database by its id:

        public void persist_and_reload_a_document()
        {
            var user = new User { FirstName = "James", LastName = "Worthy" };

            // theSession is Marten's IDocumentSession service
            theSession.Store(user);
            theSession.SaveChanges();

            // Marten is NOT coupled to StructureMap, but
            // I found it convenient to use StructureMap for object assembly
            // in the tests
            using (var session2 = theContainer.GetInstance<IDocumentSession>())
            {
                session2.ShouldNotBeSameAs(theSession);

                var user2 = session2.Load<User>(user.Id);

                user.ShouldNotBeSameAs(user2);
                user2.FirstName.ShouldBe(user.FirstName);
                user2.LastName.ShouldBe(user.LastName);
            }
        }

Behind the scenes, Marten sees that it doesn’t have a preexisting table to store User documents, so it quietly makes us one like this:

CREATE TABLE public.mt_doc_user
(
  id uuid NOT NULL,
  data jsonb NOT NULL,
  CONSTRAINT pk_mt_doc_user PRIMARY KEY (id)
)

Right now, we’re only adding an Id field as the primary key and a second JSONB field to hold the actual document representation. Later on we’ll probably add timestamps, version numbers, or duplicate selected fields in the document structure for more efficient querying and indexing.

Why didn’t you use…

Because a flood of “why not Y” questions inevitably follow any statement of “we chose X”:

  • I’ve seen too many stories about MongoDb losing data and Postgresql v. MongoDb performance comparisons.
  • SimpleDb does look cool, but I’m not a huge fan of their query language and for some crazy reason, our organization (including me) is suddenly being very conservative about trying newer databases.
  • I need to do more research on Kafka before I can answer that one
  • I really don’t want to have to fall back all the way to developing applications primarily on an RDBMS. I’ve had enough of heavy ORM’s, the only somewhat more palatable micro-ORM’s, and writing procedural code using raw tabular data.

Other Reading

The name “Marten” has already stuck in conversations at work, so we’re keeping it for now. Besides, look how cute martens are:

marten6

Advertisements

14 thoughts on “Postgresql as a Document Db for .Net Development

  1. David Boike

    Why a uuid for the primary key instead of a Raven-like string document id supporting natural keys? Does Postgres not have the same issues efficiently storing UUIDs as keys that Raven does?

    Reply
    1. jeremydmiller Post author

      Guid’s are just easy to get started because you can make them in memory w/o collisions. I’ve got some issues logged to make sure that Marten can handle numbers or strings as the document Id.

      I’m not aware of any issues with UUID’s and Postgresql, but that just means that I don’t know yet;)

      Reply
  2. trailmax

    Wow, something I look forward to see in production. I have used RavenDb and sometimes it is a very frustrating experience, especially the documentation. I know Raven can do a lot of things, but for the most part I don’t need all of it’s gun power. Sometimes I need a way to save 20 objects somewhere so I can retrieve them later and what you are doing sounds just like the way to go!

    Reply
  3. Rob Conery

    Hey Jeremy – I see you linked to some posts but wanted you know I made a nice set of functions that can really take the pain out of the lack of a coherent API. In essence, they allow PG to mimic Mongo if you will. But Postgres has full text and it’s built right in (have a look around!).

    Anyway – they’re here: https://github.com/robconery/pg_docs_api. Also, RE APIs, I have a nice generator if you want a snowflake style one 🙂 – http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/.

    Good luck and have fun!

    Reply
    1. jeremydmiller Post author

      @Rob,

      I’ll absolutely take you up on the id generator. That wasn’t particularly something I was excited to write;)

      So far so good with this thing. Right now the only thing we’re really concerned about is to try out some kind of non-trivial load with it.

      Thanks,

      Jeremy

      Reply
    1. jeremydmiller Post author

      @kelsojim,

      Shame on me, but I wasn’t aware of your project before yesterday. Yeah, it does look like what we’re doing is similar, and that’s not too shocking. I’m going to be out the first half of next week with a little conference, but talk after that maybe?

      And congrats on your new addition!

      -Jeremy

      Reply
  4. Pingback: Marten Development So Far (Postgresql as Doc Db) | The Shade Tree Developer

  5. Pingback: Optimizing for Performance in Marten | The Shade Tree Developer

  6. Pingback: JasperFx/Marten | GITROOM

  7. Pingback: Optimizing Marten Part 2 | The Shade Tree Developer

  8. Pingback: Marten is Ready for Early Adopters | The Shade Tree Developer

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s