Schema Management with Marten (Why document databases rock)

This blog post describes the preliminary support and thinking behind how we’ll manage schema changes to production using Marten. I’m writing this to further the conversation one of our teams is having about how best to accomplish this. Expect the details of how this works to change after we face real-world usage for a while ;)

Some of our projects at work are transitioning from RavenDb to an OSS project I help lead called Marten that uses Postgresql as a fully fledged document database. Among the very large advantages of document databases over relational databases is how much simpler it is to evolve a system over time because it takes so much less mechanical work to keep your document database synchronized to the application code.

Exhibit #1 in the case against relational databases is the need for laboriously tracking database migrations (assuming you give a damn about halfway decent software engineering in regards to your database).

Let’s compare the steps in adding a new property to one of your persisted objects in your system. Using a relational database with any kind of ORM (even if it describes itself as “micro” or “simple”), your steps in some order would be to:

  1. Add the new property
  2. Add a migration script that adds a new column to your database schema
  3. Change your ORM mapping or SQL statements to reflect the new property

Using a document database approach like Marten’s, you’d:

  1. Add the new property and continue on with your day

Notice which list is clearly shorter and simpler — not to mention less error prone for that matter.
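To make the one-step list concrete, here is a minimal sketch of why an added property needs no migration when documents are stored as JSON. It uses System.Text.Json only because it ships with .Net; the Customer type, the Nickname property, and the sample JSON are all made up for illustration, and this is not a claim about which serializer Marten uses internally.

```csharp
using System;
using System.Text.Json;

// Hypothetical document type. "Version 2" of the application
// adds the Nickname property; nothing else changes.
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Nickname { get; set; } // new in "version 2"
}

public static class SchemaEvolutionSketch
{
    // JSON as it would have been persisted before Nickname existed
    public const string OldJson =
        "{\"Id\":\"6f9619ff-8b86-d011-b42d-00c04fc964ff\",\"Name\":\"Frodo\"}";

    // Old documents still deserialize; the missing property
    // is simply left at its default value (null here)
    public static Customer LoadOldDocument()
    {
        return JsonSerializer.Deserialize<Customer>(OldJson);
    }
}
```

The old document loads with Nickname left unset, and the property starts being persisted the next time the document is saved.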

Marten does still need to create matching schema objects in your Postgresql database, and it’s unlikely that any self-respecting DBA is going to allow your application to have rights to execute schema changes programmatically, so we’re stuck needing some kind of migration strategy as we add document types, Javascript transformations, and retrofit indexes. Fortunately, we’ve got a decent start on doing just that, as demonstrated below:

 

Just Get Stuff Done in Development!

As long as you have rights to alter your Postgresql database, you can happily set up Marten in one of the “AutoCreate” modes and not worry about schema changes at all as you happily code new features and change existing document types:

var store = DocumentStore.For(_ =>
{
    // Choose ONE of the modes below. They are listed together only
    // for comparison; if all three assignments run, the last one wins.

    // Marten will create any new objects that are missing,
    // attempt to update tables if it can, but drop and replace
    // tables that it cannot patch.
    _.AutoCreateSchemaObjects = AutoCreate.All;

    // Marten will create any new objects that are missing or
    // attempt to update tables if it can. Will *never* drop
    // any existing objects, so no data loss
    _.AutoCreateSchemaObjects = AutoCreate.CreateOrUpdate;

    // Marten will create missing objects on demand, but
    // will not change any existing schema objects
    _.AutoCreateSchemaObjects = AutoCreate.CreateOnly;
});

As long as you’re using a permissive auto creation mode, you should be able to code in your application model and let Marten change your development database as needed behind the scenes.

Patching Production Databases

In the next section, I demonstrate how to dump the entire data definition language (DDL) that matches your Marten configuration as if you were starting from an empty database, but first, I want to focus on how to make incremental changes between production or staging releases.

In the real world, you’re generally not going to allow your application to willy nilly make changes to the running schema and you’ll be forced into this setting:

var store = DocumentStore.For(_ =>
{
    // Marten will not create or update any schema objects
    // and throws an exception in the case of a schema object
    // not reflecting the Marten configuration
    _.AutoCreateSchemaObjects = AutoCreate.None;
});

This leaves us with the problem of how to get our production database matching however we’ve configured Marten in our application code. At this point, our theory is that we’ll use the “WritePatch” feature to generate delta DDL files:

IDocumentStore.Schema.WritePatch(string file);

When this is executed against a configured Marten document store, it will loop through all of the known document types, Javascript transforms, and the event store usage, and check the configured storage against the actual database. Marten writes two files, one to move your schema “up” to match the configured document store, and a second “drop” file that would roll back your database schema to reverse the changes in the “up” file.

The patching today is able to:

  1. Add all new tables, indexes, and functions
  2. Detect when a generated function has changed and rebuild it after dropping the old version
  3. Determine which indexes are new or modified and generate the necessary DDL to match
  4. Add the event store schema objects if they’re active and missing
  5. Add the database objects Marten needs for its “Hilo” identity strategy
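The first item in the list above boils down to comparing what the configuration expects with what the database already has. Below is a minimal sketch of that idea, not Marten’s actual implementation: the PatchSketch and Diff names are hypothetical, and it only handles missing tables to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PatchSketch
{
    // "configured" maps a table name to the DDL that creates it.
    // "existing" holds the table names already in the database.
    public static (List<string> Up, List<string> Down) Diff(
        IDictionary<string, string> configured,
        ISet<string> existing)
    {
        var up = new List<string>();
        var down = new List<string>();

        foreach (var pair in configured.OrderBy(x => x.Key, StringComparer.Ordinal))
        {
            // Anything already present needs no "up" statement
            if (existing.Contains(pair.Key)) continue;

            up.Add(pair.Value);
            down.Add($"DROP TABLE IF EXISTS {pair.Key};");
        }

        // Undo the changes in the opposite order they were applied
        down.Reverse();
        return (up, down);
    }
}
```

The two lists correspond to the “up” and “drop” files described above; a real patch also has to cover functions, indexes, and sequences.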

This is very preliminary, but my concept of how we’ll use this in real life (admittedly with some gaps) is to:

  • Use “AutoCreateSchemaObjects = AutoCreate.All” in development and CI and basically not worry at all about incremental schema changes.
  • For each deployment to staging or production, we’ll use the WritePatch() method shown above to generate a patch SQL file that will then be committed to Git.
  • I’m assuming that the patch SQL files generated by Marten could feed into a real database migration tool like RoundhousE, and we would incorporate RoundhousE into our automated deployments to execute the “up” scripts to the most current database version.

 

 

Dump all the Sql

If you just want a database script that will build all the necessary schema objects for your Marten configuration, you can export either a single file:

// Export the SQL to a file
store.Schema.WriteDDL("my_database.sql");

Or write a SQL file for each document type and functional area of Marten to a directory like this:

// Or instead, write a separate sql script
// to the named directory
// for each type of document
store.Schema.WriteDDLByType("some folder");

In the second usage, Marten also writes a file called “all.sql” that executes the constituent sql files in the correct order just in case you’re using Marten’s support for foreign keys between document types.
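Ordering matters here because a table referenced by a foreign key has to exist before the table that references it. One way to compute such an order is a depth-first topological sort, sketched below. This is an illustration under assumptions, not a description of how Marten builds “all.sql”; the ScriptOrderSketch and Order names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScriptOrderSketch
{
    // dependsOn maps a document table to the tables that its
    // foreign keys reference. Referenced tables come out first.
    public static List<string> Order(IDictionary<string, string[]> dependsOn)
    {
        var ordered = new List<string>();
        // false = currently being visited, true = already emitted
        var state = new Dictionary<string, bool>();

        void Visit(string name)
        {
            if (state.TryGetValue(name, out var done))
            {
                if (!done)
                    throw new InvalidOperationException(
                        $"Circular foreign keys involving {name}");
                return;
            }

            state[name] = false;
            if (dependsOn.TryGetValue(name, out var parents))
            {
                foreach (var parent in parents) Visit(parent);
            }
            state[name] = true;
            ordered.Add(name);
        }

        // Sort the starting points so the output is deterministic
        foreach (var name in dependsOn.Keys.OrderBy(x => x, StringComparer.Ordinal))
        {
            Visit(name);
        }

        return ordered;
    }
}
```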

The SQL dumps from the two methods shown above will write out every possible database schema object necessary to support your Marten configuration (document types, the event store, and a few other things) including tables, the generated functions, indexes, and even a stray sequence or two.

Relational Databases are the Buggy Whips of Software Development

I think that there’s going to be a day when you tell your children stories about how we built systems against relational databases with ORMs or stored procedures or handwritten SQL and they’re going to be appalled at how bad we had it, much like I was when my grandfather told me stories about ploughing with a horse during the Great Depression.

New Marten Release and What’s Next?

I uploaded a new Nuget for Marten v0.9.2 yesterday with a couple of new features and about two weeks’ worth of bug fixes and refinements. You can find the full list of issues and pull requests in this release from the v0.9.1 and v0.9.2 milestones in GitHub.

The highlight of this release in terms of raw usability is probably some overdue improvements to Marten’s underlying schema management:

  1. Marten can detect when the configured upsert functions are missing or do not match the configuration and rebuild them
  2. Marten can detect missing or changed indexes and make the appropriate updates.

Some other things that are new:

  • There’s now a synchronous batch querying option
  • You can now use the AsJson() Linq operator in combination with Select() transforms (this is going to get its own blog post soon-ish).
  • The default transaction isolation level is ReadCommitted
  • It won’t provide much value until there’s more there, but I’ve added some rolling buffer queueing support for being able to do asynchronous projections in the event store. There’ll be a blog post about that one soon just to see if I can trick some of you into being technical reviewers or contributors on that one;)

The two big features are discussed below:

Paging Support

We flat out copied part of RavenDb’s paging support to make paging more efficient. Take the example of showing a large data set in a user interface and wanting to do that one page at a time. You need to know how many total documents match the query criteria to be able to present an accurate paging bar. Fortunately, you can get that total number now without making a second round trip to the database with this syntax:

// We're going to use stats as an output
// parameter to the call below, so we
// have to declare the "stats" object
// first
QueryStatistics stats = null;

var list = theSession
    .Query<Target>()
    .Stats(out stats)
    .Where(x => x.Number > 10).Take(5)
    .ToList();

list.Any().ShouldBeTrue();

// Now, the total results data should
// be available
stats.TotalResults.ShouldBe(count);

In combination with the existing support for the Take() and Skip() Linq operators, you should have everything you need for efficient paging with Marten.
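The arithmetic behind a paging bar is small but easy to get off by one, so here is a minimal sketch. It assumes the total comes back as an integer count like stats.TotalResults above; the PagingSketch helper and its method names are hypothetical and not part of Marten.

```csharp
using System;

public static class PagingSketch
{
    // Number of pages needed to show totalResults items,
    // rounding up so a partial last page still counts
    public static int PageCount(long totalResults, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (int)((totalResults + pageSize - 1) / pageSize);
    }

    // How many documents to pass to Skip() for a 1-based page number
    public static int SkipFor(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        return (pageNumber - 1) * pageSize;
    }
}
```

With 23 matching documents and a page size of 5, PageCount gives 5 pages, and page 3 would use Skip(10) followed by Take(5).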

Include() inside of Compiled Queries

The Include() feature is now usable from within the compiled query feature, so finally, two of our best features for optimizing data access can work together. Below is a sample:

public class IssueByTitleIncludingUsers : ICompiledQuery<Issue>
{
    public string Title { get; set; }
    public User IncludedAssignee { get; private set; } = new User();
    public User IncludedReported { get; private set; } = new User();
    public JoinType JoinType { get; set; } = JoinType.Inner;

    public Expression<Func<IQueryable<Issue>, Issue>> QueryIs()
    {
        return query => query
            .Include<Issue, IssueByTitleIncludingUsers>(x => x.AssigneeId, x => x.IncludedAssignee, JoinType)
            .Include<Issue, IssueByTitleIncludingUsers>(x => x.ReporterId, x => x.IncludedReported, JoinType)
            .Single(x => x.Title == Title);
    }
}

 

What’s Next?

Besides whatever bug fixes come up, I think the next things I’m working on for the document database support are soft deletes, bulk insert improvements, and finally getting a versioned document story going. On the event store side of things, it’s all about projections. We’ll have a working asynchronous projection feature in the next release, maybe support for arbitrary categorization inside of aggregated projections, and some preliminary support for Javascript projections.

Got other requests, needs, or problems with Marten? Tell us all about it anytime in the Gitter room.

 

Marten v0.9 is Out!

The Marten community made a big, big release today that marks a big milestone for Marten’s viability as a usable product. I just uploaded v0.9 to Nuget, and published all the outstanding documentation updates to our project website.

A lot of folks contributed to this release, and I’ll inevitably miss several names, so I’m just going to issue a huge thank you to everyone who put in pull requests or provided valuable feedback in our Gitter room or helped with the documentation updates. Marten is well on its way to being the most positive OSS experience I’ve ever had.

While you can see the full list of changes, fixes, and improvements in the GitHub milestone (81 issues!), the big highlights are:

  1. The Event Store feature is finally ready for very early adopters with some basic projection support. I discussed the vision for this functionality in a blog post yesterday. The very early feedback suggests that the existing API is in for some changes, but I’d still like to get more folks to look at what we have.
  2. Compiled Queries to avoid re-parsing Linq queries
  3. The Include() feature for more efficient data fetching
  4. Select() transformations inside of the Linq support
  5. New and improved ways to fetch the raw JSON for documents with the new AsJson() method.
  6. The ability to configure which schema Marten will use for document storage or the event store as part of the DocumentStore options. That was a frequent user request, and thanks to Tim Cools, we’ve got that now.
  7. Ability to express foreign key relationships between document types. Hey, Postgresql is a relational database, so why not take advantage of that when it’s valuable?
  8. More efficient asynchronous querying internals
  9. Linq support for aggregate functions like Average(), Min(), Max(), and Sum()

 

Next Steps

Other than the event store functionality that’s just getting off the ground, I think the pace of Marten development is about to slow down and be more about refinements than adding lots of new functionality. I’m hoping to switch to dribbling out incremental releases every couple weeks until we’re ready to declare 1.0.

 

Marten as an Event Store — Now and the Future Vision

The code shown in this post isn’t quite out on Nuget yet, but will be part of Marten v0.9 later this week when I catch up on documentation.

From the very beginning, we’ve always intended to use Marten and Postgresql’s JSONB support as the technical foundation for event sourcing persistence. The document database half of Marten has been a more urgent need for my company, but lately we have a couple project teams and several folks in our community interested in seeing the event store functionality proceed. Between a couple substantial pull requests and me finally getting some time to work on it, I think we finally have a simplistic event store implementation that’s ready for early adopters to pull it down and kick the tires on it.

Why a new Event Store?

NEventStore has been around for years, and we do use an older version on one of our big applications. It’s working, but I’m not a big fan of how it integrates into the rest of our architecture and it doesn’t provide any support for “readside projections.” Part of the rationale behind Marten’s event store design is the belief that we could eliminate a lot of our hand coded projection support in a big application.

There’s also GetEventStore as a standalone, language agnostic event store based on an HTTP protocol. While I like many elements of GetEventStore and am taking a lot of inspiration from that tool for Marten’s event store usage, we prefer to have our event sourcing based on the Postgresql database instead of a standalone store.

Why, you ask?

  • Since we’re already going to be using Postgresql, this leads to fewer things to deploy and monitor
  • We’ll be able to just use all the existing DevOps tools for Postgresql like replication and backups without having to bring in anything new
  • Building on a much larger community and a more proven technical foundation
  • Being able to do event capture, document database changes, and potentially just raw SQL commands inside of the same native database transaction. No application in our portfolio strictly uses event sourcing as the sole persistence mechanism, and I think this makes Marten’s event sourcing feature compelling.
  • The projection documents published by Marten’s event store functionality are just more document types and you’ll have the full power of Marten’s Linq querying, compiled querying, and batched querying to fetch the readside data from the event storage.

 

Getting Started with Simple Event Capture

Marten is just a .Net client library (4.6 at the moment), so all you need to get started is to have access to a Postgresql 9.4/9.5 database and to install Marten via Nuget.

Because I’ve read way too many epic fantasy series, my sample problem domain is an application that records, analyzes, and visualizes the status of quests. During a quest, you may want to record events like:

  • QuestStarted
  • MembersJoined
  • MembersDeparted
  • ArrivedAtLocation

With Marten, you would want to describe these events with simple, serializable DTO classes like this:

public class MembersJoined
{
    public MembersJoined()
    {
    }

    public MembersJoined(int day, string location, params string[] members)
    {
        Day = day;
        Location = location;
        Members = members;
    }

    public int Day { get; set; }

    public string Location { get; set; }

    public string[] Members { get; set; }

    public override string ToString()
    {
        return $"Members {Members.Join(", ")} joined at {Location} on Day {Day}";
    }
}

 

Now that we have some event classes identified and built, we can start to store a “stream” of events for a given quest as shown in this code below:

 

var store = DocumentStore.For("your connection string");

var questId = Guid.NewGuid();

using (var session = store.OpenSession())
{
    var started = new QuestStarted {Name = "Destroy the One Ring"};
    var joined1 = new MembersJoined(1, "Hobbiton", "Frodo", "Merry");

    // Start a brand new stream and commit the new events as 
    // part of a transaction
    session.Events.StartStream(questId, started, joined1);
    session.SaveChanges();

    // Append more events to the same stream
    var joined2 = new MembersJoined(3, "Buckland", "Merry", "Pippen");
    var joined3 = new MembersJoined(10, "Bree", "Aragorn");
    var arrived = new ArrivedAtLocation { Day = 15, Location = "Rivendell" };
    session.Events.Append(questId, joined2, joined3, arrived);
    session.SaveChanges();
}

Now, if we want to fetch back all of the events for our new quest, we can do that with:

using (var session = store.OpenSession())
{
    // events are an array of little IEvent objects
    // that contain both the actual event object captured
    // previously and metadata like the Id and stream version
    var events = session.Events.FetchStream(questId);
    events.Each(evt =>
    {
        Console.WriteLine($"{evt.Version}.) {evt.Data}");
    });
}

When I execute this code, this is the output that I get:

1.) Quest Destroy the One Ring started
2.) Members Frodo, Merry joined at Hobbiton on Day 1
3.) Members Merry, Pippen joined at Buckland on Day 3
4.) Members Aragorn joined at Bree on Day 10
5.) Arrived at Rivendell on Day 15

At this point, Marten is assigning a Guid id, timestamp, and version number to each event (but thanks to a recent pull request, you do not have to have an Id property or field on your event classes). I didn’t show it here, but you can also fetch all of the events for a stream by the version number or timestamp to perform historical state queries.

 

Projection Support

What I’m showing in this section is brand spanking new and isn’t out on Nuget yet, but will be as Marten v0.9 by the middle of this week (after I get the documentation updated).

So raw event streams might be useful to some of you, but we think that Marten will be most useful when you combine the raw event sourcing with “projections” that create parallel “readside” views of the event data suitable for consumption in the rest of your application for concerns like validation, business logic, or supplying data through HTTP services.

Let’s say that we need a view of our quest data just to see what the current member composition of our quest party is. In our event store usage, you would create an “aggregate” document class and teach Marten how to update that aggregate based on event data types. The easiest way to expose the aggregation right now is to expose public “Apply([event type])” methods like the QuestParty class shown below:

public class QuestParty
{
    private readonly IList<string> _members = new List<string>();

    public string[] Members
    {
        get
        {
            return _members.ToArray();
        }
        set
        {
            _members.Clear();
            _members.AddRange(value);
        }
    }

    public IList<string> Slayed { get; } = new List<string>();

    public void Apply(MembersJoined joined)
    {
        _members.Fill(joined.Members);
    }

    public void Apply(MembersDeparted departed)
    {
        _members.RemoveAll(x => departed.Members.Contains(x));
    }

    public void Apply(QuestStarted started)
    {
        Name = started.Name;
    }

    public string Name { get; set; }

    public Guid Id { get; set; }

    public override string ToString()
    {
        return $"Quest party '{Name}' is {Members.Join(", ")}";
    }
}

Without any further configuration, I could create a live aggregation of a given quest stream calculated on the fly by using this syntax:

using (var session = store.OpenSession())
{
    var party = session.Events.AggregateStream<QuestParty>(questId);
    Console.WriteLine(party);
}

When I execute the code above for our new “Destroy the One Ring” stream of events, this is the output:

Quest party 'Destroy the One Ring' is Frodo, Merry, Pippen, Aragorn
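Conceptually, a live aggregation is just a replay: start with an empty aggregate and call the matching Apply() method for each event in order. The sketch below shows that idea with plain reflection. It is not Marten’s implementation, and the Party, Joined, Departed, and Arrived types are simplified stand-ins for the classes in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class LiveAggregationSketch
{
    // Replays events in order onto a fresh T, calling the public
    // Apply() overload whose parameter matches each event
    public static T Aggregate<T>(IEnumerable<object> events) where T : class, new()
    {
        var aggregate = new T();
        foreach (var @event in events)
        {
            var apply = typeof(T).GetMethod("Apply", new[] { @event.GetType() });

            // Events without a matching Apply() method are skipped
            apply?.Invoke(aggregate, new[] { @event });
        }
        return aggregate;
    }
}

// Hypothetical stand-ins for the event and aggregate classes
public class Joined { public string[] Members { get; set; } }
public class Departed { public string[] Members { get; set; } }
public class Arrived { public string Location { get; set; } }

public class Party
{
    public List<string> Members { get; } = new List<string>();

    public void Apply(Joined joined)
    {
        // Add each member only once
        foreach (var member in joined.Members)
        {
            if (!Members.Contains(member)) Members.Add(member);
        }
    }

    public void Apply(Departed departed)
    {
        Members.RemoveAll(x => departed.Members.Contains(x));
    }
}
```

Replaying every event on each read is what makes this approach expensive for long streams.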

Great, but since it’s potentially expensive to calculate the QuestParty from large streams of events, we may opt to have the aggregated view built in our Marten database ahead of time. Let’s say that we’re okay with just having the aggregate view updated every time a quest event stream is appended to. For that case, you can register an “inline” aggregation in your DocumentStore initialization with this code:

var store2 = DocumentStore.For(_ =>
{
    _.Events.AggregateStreamsInlineWith<QuestParty>();
});

With the configuration above, the “QuestParty” view of a stream of related quest events is updated as part of the transaction that is persisting the new events being appended to the event log. If we were using this configuration, then we would be able to query Marten for the “QuestParty” as just another document type:

If your browser is cutting off the code formatting, the syntax is “session.Load<QuestParty>(questId)” down below:

using (var session = store.OpenSession())
{
    var party = session.Load<QuestParty>(questId);
    Console.WriteLine(party);

    // or
    var party2 = session.Query<QuestParty>()
        .Where(x => x.Name == "Destroy the One Ring")
        .FirstOrDefault();
}

Our Projections “Vision”

A couple years ago, I got to do what turned into a proof of concept project for building out an event store on top of Postgresql’s JSON support. My thought for Marten’s projection support is largely taken from this blog post I wrote on the earlier attempt at writing an event store on Postgresql.

Today the projection ability is very limited. So far you can use the live or “inline” aggregation of a single stream shown above or a simple pattern that allows you to create a single readside document for a given event type.

The end state we envision is to be able to allow users to:

  • Express projections in either .Net code or by using Javascript functions running inside of Postgresql itself
  • To execute the projection building either “inline” with event capture for pure ACID, asynchronously for complicated aggregations or better performance (and there comes eventual consistency back into our lives), or do aggregations “live” on demand. We think that this breakdown of projection timings will give users the ability to handle systems with many writes, but few reads with on demand projections, or to handle systems with few writes, but many reads with inline projections.
  • To provide an out-of-the-box “async daemon” that you would host as a stateful process within your applications to continuously calculate projections in the background. We want to at least experiment with using Postgresql’s NOTIFY/LISTEN functionality to avoid making this a purely polling process.
  • Support hooks to perform your own form of event stream processing using the existing IDocumentSessionListener mechanism and maybe some way to plug more processors into the queue reading in the async daemon described above
  • Add some “snapshotting” functionality that allows you to perform aggregated views on top of occasional snapshots every X times an event is captured on an aggregate
  • Aggregate data across streams
  • Support arbitrary categorization of events across streams

 

 

Anything I didn’t cover that you’re wondering about — or just want to give us some “constructive feedback” — please feel free to pop into Marten’s Gitter room since we’re talking about this today anyway;)

The compiled query feature in Marten and why it rocks.

The “compiled query” feature is a brand new addition to Marten (as of v0.8.9), and one we think will have some very positive impact on the performance of our systems. I’m also hopeful that using this feature will make some of our internal code easier to read and understand. Down the line, the combination of compiled queries with the batch querying support should be the foundation of a decent dynamic, aggregated query mechanism to support our React/Redux based client architectures (with a server side implementation of Falcor maybe?). Big thanks and a shoutout to Corey Kaylor for adding this feature.

Linq is easily one of the most popular features in .Net and arguably the one thing that other platforms strive to copy. We generally like being able to express document queries in a compiler-safe manner, but there is a non-trivial cost in parsing the resulting Expression trees and then using plenty of string concatenation to build up the matching SQL query. Fortunately, as of v0.8.10, Marten supports the concept of a Compiled Query that you can use to reuse the SQL template for a given Linq query and bypass the performance cost of continuously parsing Linq expressions.

All compiled queries are classes that implement the ICompiledQuery<TDoc, TOut> interface shown below:

    public interface ICompiledQuery<TDoc, TOut>
    {
        Expression<Func<IQueryable<TDoc>, TOut>> QueryIs();
    }

In its simplest usage, let’s say that we want to find the first user document with a certain first name. That class would look like this:

public class FindByFirstName : ICompiledQuery<User, User>
{
    public string FirstName { get; set; }

    public Expression<Func<IQueryable<User>, User>> QueryIs()
    {
        return q => q.FirstOrDefault(x => x.FirstName == FirstName);
    }
}

So a couple things to note in the class above:

  1. The QueryIs() method returns an Expression representing a Linq query
  2. FindByFirstName has a property (it could also be just a public field) called FirstName that is used to express the filter of the query

To use the FindByFirstName query, just use the code below:

var justin = theSession.Query(new FindByFirstName {FirstName = "Justin"});

var tamba = await theSession.QueryAsync(new FindByFirstName {FirstName = "Tamba"});

Or to use it as part of a batched query, this syntax:

var batch = theSession.CreateBatchQuery();

var justin = batch.Query(new FindByFirstName {FirstName = "Justin"});
var tamba = batch.Query(new FindByFirstName {FirstName = "Tamba"});

await batch.Execute();

(await justin).Id.ShouldBe(user1.Id);
(await tamba).Id.ShouldBe(user2.Id);

How does it work?

The first time that Marten encounters a new type of ICompiledQuery, it executes the QueryIs() method and:

  1. Parses the Expression just to find which property getters or fields are used within the expression as input parameters
  2. Parses the Expression with our standard Linq support to create a template database command and the internal query handler
  3. Builds up an object with compiled Func’s that “knows” how to read a query model object and set the command parameters for the query
  4. Caches the resulting “plan” for how to execute a compiled query

On subsequent usages, Marten will just reuse the existing SQL command and remembered handlers to execute the query.
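The “build once, reuse afterward” behavior can be sketched as a cache keyed by the query type. This is only an illustration of the caching step: PlanCacheSketch, PlanFor, and SampleQuery are hypothetical names, and the “plan” is reduced to a SQL template string rather than the command and handlers Marten actually keeps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical query model; stands in for any compiled query class
public class SampleQuery
{
    public string FirstName { get; set; }
}

public static class PlanCacheSketch
{
    private static readonly ConcurrentDictionary<Type, string> Plans =
        new ConcurrentDictionary<Type, string>();

    // Counts how many times the expensive build step actually ran
    public static int BuildCount;

    // Returns the cached plan for this query's type, building it
    // on first use. Under heavy contention GetOrAdd may invoke the
    // factory more than once, but only one result is stored.
    public static string PlanFor(object query, Func<Type, string> buildPlan)
    {
        return Plans.GetOrAdd(query.GetType(), type =>
        {
            Interlocked.Increment(ref BuildCount);
            return buildPlan(type);
        });
    }
}
```

Two queries of the same type with different FirstName values share one plan; only the parameter values differ between executions.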

What is supported?

To the best of our knowledge and testing, you may use any Linq feature that Marten supports within a compiled query. So any combination of:

  • Select() transforms
  • First/FirstOrDefault()
  • Single/SingleOrDefault()
  • Where()
  • OrderBy/OrderByDescending etc.
  • Count()
  • Any()

At this point (v0.9), the only limitations are:

  1. You cannot yet incorporate the Include() feature with compiled queries, but there is an open GitHub issue you can use to track progress on adding this feature.
  2. You cannot use the Linq ToArray() or ToList() operators. See the next section for an explanation of how to query for multiple results

Querying for multiple results

To query for multiple results, declare IEnumerable<T> as the result type and return the raw IQueryable<T> from QueryIs(). You cannot use the ToArray() or ToList() operators (it’ll throw exceptions from the Relinq library if you try). As a convenience mechanism, Marten supplies these helper interfaces:

If you are selecting the whole document without any kind of Select() transform, you can use this interface:

    public interface ICompiledListQuery<TDoc> : ICompiledListQuery<TDoc, TDoc>
    {
    }

A sample usage of this type of query is shown below:

    public class UsersByFirstName : ICompiledListQuery<User>
    {
        public static int Count;
        public string FirstName { get; set; }

        public Expression<Func<IQueryable<User>, IEnumerable<User>>> QueryIs()
        {
            // Ignore this line, it's from a unit test;)
            Count++;
            return query => query.Where(x => x.FirstName == FirstName);
        }
    }

If you do want to use a Select() transform, use this interface:

    public interface ICompiledListQuery<TDoc, TOut> : ICompiledQuery<TDoc, IEnumerable<TOut>>
    {
    }

A sample usage of this type of query is shown below:

    public class UserNamesForFirstName : ICompiledListQuery<User, string>
    {
        public Expression<Func<IQueryable<User>, IEnumerable<string>>> QueryIs()
        {
            return q => q
                .Where(x => x.FirstName == FirstName)
                .Select(x => x.UserName);
        }

        public string FirstName { get; set; }
    }

Querying for a single document

Finally, if you are querying for a single document with no transformation, you can use this interface as a convenience:

    public interface ICompiledQuery<TDoc> : ICompiledQuery<TDoc, TDoc>
    {
    }

And an example:

    public class FindUserByAllTheThings : ICompiledQuery<User>
    {
        public string Username { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public Expression<Func<IQueryable<User>, User>> QueryIs()
        {
            return query =>
                    query.Where(x => x.FirstName == FirstName && Username == x.UserName)
                        .Where(x => x.LastName == LastName)
                        .Single();

        }
    }

 

 

 

 

Optimizing Marten Performance by using “Include’s”

Continuing my series of short blog posts about concepts or new features in Marten, this time I’m going to show how to use the brand new IQueryable<T>().Include() feature to improve performance by reducing the number of network round trips to the underlying Postgresql database. Check out the Marten tag for related blog posts.

In one of the formative experiences of my early software career, our instructor repeated the phrase “network round trips are evil” as a kind of mantra. The point, as I quickly learned then and plenty of times later, is that a “chatty” system making lots of network round trips between boxes can be very slow if you’re not careful.

To that end, many of our early adopters and would-be adopters said that they would switch to Marten if only we had the Include() feature from RavenDb. The point of this feature is to reduce network round trips to the database server from your application by being able to fetch related documents at the same time.

Jumping right to a concrete example, let’s say that your domain has two document types, one called “Issue” and another called “User.” In this case, an Issue may have a logical link to the assigned user responsible for addressing the issue, partially shown below:

    public class Issue
    {
        // The AssigneeId would be the Id of the
        // related User document
        public Guid? AssigneeId { get; set; }

    }

If I want to load an Issue by its Id, but also get the assigned User at the same time, I can use Marten’s new “Include()” feature inspired by RavenDb and NHibernate’s QueryOver mechanism:

[Fact]
public void simple_include_for_a_single_document()
{
    var user = new User();
    var issue = new Issue {AssigneeId = user.Id, Title = "Garage Door is busted"};

    theSession.Store<object>(user, issue);
    theSession.SaveChanges();

    using (var query = theStore.QuerySession())
    {
        User included = null;
        var issue2 = query.Query<Issue>()
            // Using the call below, Marten will execute
            // the supplied callback to pass back the related
            // User document assigned to the Issue
            .Include(x => x.AssigneeId, x => included = x)
            .Where(x => x.Title == issue.Title)
            .Single();

        included.ShouldNotBeNull();
        included.Id.ShouldBe(user.Id);

        issue2.ShouldNotBeNull();

        // All of this was done with exactly one call to Postgresql
        query.RequestCount.ShouldBe(1);
    }
}

The actual SQL statement sent to Postgresql in the code above would be:

select d.data, d.id, assignee_id.data, assignee_id.id from public.mt_doc_issue as d INNER JOIN public.mt_doc_user as assignee_id ON CAST(d.data ->> 'AssigneeId' as uuid) = assignee_id.id where d.data ->> 'Title' = :arg0 LIMIT 1

In the code above, I’m using the fairly new “Include()” statement to direct Marten to fetch the related User document at the same time it’s retrieving the Issue. We deviated somewhat from RavenDb in this feature. Instead of just adding the included documents to the internal identity map and expecting the user to just “know” that they are cached, we opted to make the included documents accessible to the caller through either:

  1. Passing a callback function into the Include() method
  2. Passing an IList<T>, where T is the included document type, into Include(). In this case, Marten will fill the list with all the included documents found.
  3. Passing an IDictionary<TKey, T> into the Include() method. In this case, Marten will fill the dictionary with the included documents found, keyed by their Id.

Since the pace of development on Marten is temporarily outpacing my efforts at keeping the documentation website completely up to date, the best resource for seeing what’s possible with the Include() functionality is our acceptance tests.

Let me end with a couple salient points about the new Include() functionality:

  • The included documents are resolved through the internal identity map of the current session, so there will not be any duplicates from repeated documents. Think about the case of fetching 100 Issue’s that are all assigned to one of 5 different User’s. In this case, only the 5 distinct User documents would be returned.
  • You can do multiple Include()’s on one query
  • The Include() functionality is available in the batched query feature
  • This will be a topic for another post, but Marten already supports the creation of foreign key relationships between documents
  • The generated SQL shown above uses an inner join, which means that nothing will be returned in the case of a NULL Issue.AssigneeId in the example above. I didn’t show it above, but there is also an optional argument in the Include() method that controls the kind of join, so you can choose between an outer join and the more efficient inner join.
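
To picture the first bullet, here is a rough in-memory sketch of why routing included documents through an identity map collapses the repeats. This is plain C# rather than Marten's actual implementation, and SketchIssue, SketchUser, and IncludeSketch.ResolveIncludes are hypothetical names I made up for illustration. Each simulated row carries its own copy of the joined user, the way a data reader would hand it back:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Issue and User documents
public class SketchUser
{
    public Guid Id { get; set; }
}

public class SketchIssue
{
    public Guid Id { get; set; }
    public Guid? AssigneeId { get; set; }
}

public static class IncludeSketch
{
    // Takes one (issue, user) pair per joined row and returns the
    // distinct users keyed by Id, keeping the first copy seen of each
    public static IDictionary<Guid, SketchUser> ResolveIncludes(
        IEnumerable<(SketchIssue Issue, SketchUser User)> rows)
    {
        var identityMap = new Dictionary<Guid, SketchUser>();

        foreach (var (_, user) in rows)
        {
            // An issue without an assignee has no joined user
            if (user == null) continue;

            // A repeated user is dropped in favor of the copy already in the map
            if (!identityMap.ContainsKey(user.Id))
            {
                identityMap[user.Id] = user;
            }
        }

        return identityMap;
    }
}
```

Feed this 100 rows that point at only 5 different user ids and the dictionary ends up with 5 entries, which is the same shape of result described above for passing an IDictionary into Include().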

 

For my next blog post, I’ll talk about our brand new as-of-this-morning “Compiled Query” feature.

Select Projections in Marten

When I did the DotNetRocks episode on Marten a while back, they asked me what I thought the biggest holes in Marten functionality were and where we would take it next. The first thing that came to my mind was the “read side.” By that I meant built in functionality to transform the raw documents stored with Marten into the shape needed for API’s, business functionality, and web pages. Fortunately, the very latest versions of Marten add some important functionality to support the Linq Select() keyword for fetching document data with transformations.

See CQRS from Martin Fowler for more context on where and how the “read side” fits into a software architecture.

To make this concrete, let’s say that you only really want one property or field of a stored document. That one field can now be fetched without incurring the cost of deserializing the raw JSON of the whole document. Instead, we’ll just fetch our one field:

        [Fact]
        public void use_select_in_query_for_one_field_and_first()
        {
            theSession.Store(new User { FirstName = "Hank" });
            theSession.Store(new User { FirstName = "Bill" });
            theSession.Store(new User { FirstName = "Sam" });
            theSession.Store(new User { FirstName = "Tom" });

            theSession.SaveChanges();

            theSession.Query<User>().OrderBy(x => x.FirstName).Select(x => x.FirstName)
                .First().ShouldBe("Bill");

        }

Maybe not *that* commonly useful, so what about if you want to select a subset of a large document? If you want to select into a smaller type, you can use code like this:

[Fact]
public void use_select_with_multiple_fields_to_other_type()
{
    theSession.Store(new User { FirstName = "Hank", LastName = "Aaron" });
    theSession.Store(new User { FirstName = "Bill", LastName = "Laimbeer" });
    theSession.Store(new User { FirstName = "Sam", LastName = "Mitchell" });
    theSession.Store(new User { FirstName = "Tom", LastName = "Chambers" });

    theSession.SaveChanges();

    var users = theSession.Query<User>().Select(x => new User2{ First = x.FirstName, Last = x.LastName }).ToList();

    users.Count.ShouldBe(4);

    users.Each(x =>
    {
        x.First.ShouldNotBeNull();
        x.Last.ShouldNotBeNull();
    });
}

In the case above, you are selecting the User document data into another type called “User2.” That’s great of course, but you can also bypass the need for a custom class and just select straight to an anonymous type like this:

[Fact]
public void use_select_with_multiple_fields_in_anonymous()
{
    theSession.Store(new User { FirstName = "Hank", LastName = "Aaron"});
    theSession.Store(new User { FirstName = "Bill", LastName = "Laimbeer"});
    theSession.Store(new User { FirstName = "Sam", LastName = "Mitchell"});
    theSession.Store(new User { FirstName = "Tom", LastName = "Chambers"});

    theSession.SaveChanges();

    var users = theSession.Query<User>().Select(x => new {First = x.FirstName, Last = x.LastName}).ToList();

    users.Count.ShouldBe(4);

    users.Each(x =>
    {
        x.First.ShouldNotBeNull();
        x.Last.ShouldNotBeNull();
    });
}

Finally, if all you need to do is stream the raw JSON data of your transformed documents straight to the HTTP response in a web request, you can skip the unnecessary JSON deserialization and serialization and probably achieve much better throughput in your web application by selecting straight to a JSON string:

[Fact]
public void use_select_to_another_type_and_to_json()
{
    theSession.Store(new User { FirstName = "Hank" });
    theSession.Store(new User { FirstName = "Bill" });
    theSession.Store(new User { FirstName = "Sam" });
    theSession.Store(new User { FirstName = "Tom" });

    theSession.SaveChanges();

    theSession.Query<User>().OrderBy(x => x.FirstName).Select(x => new UserName { Name = x.FirstName })
        .ToListJson()
        .ShouldBe("[{\"Name\" : \"Bill\"},{\"Name\" : \"Hank\"},{\"Name\" : \"Sam\"},{\"Name\" : \"Tom\"}]");
}

In the code above, the JSON of the User document is transformed by Postgresql itself and returned to the caller as a single JSON string.

Not that you necessarily want to do a lot of this by hand (but you always could), the SQL generated for the query above can be found and printed with this:

    var command = theSession
                .Query<User>()
                .OrderBy(x => x.FirstName)
                .Select(x => new UserName {Name = x.FirstName})
                
                // ToCommand() is a Marten specific extension
                // we use as a diagnostic tool to understand
                // how Marten is treating any given Linq expression
                .ToCommand(FetchType.FetchMany);

    Console.WriteLine(command.CommandText);

That gives us this generated SQL:

select json_build_object('Name', d.data ->> 'FirstName') as json from public.mt_doc_user as d order by d.data ->> 'FirstName'

What’s left to do?

We’ve had a couple bugs with our Select() support from early adopters with permutations I hadn’t thought to test beforehand, so I’m pretty sure that there are more things to iron out. The best way to solve that problem is to just try to get more early users to beat on it and find what’s still missing.

The Select() support today doesn’t yet include any transformations of child collection data or transformations of data within a JSON document. We’d probably also want to support selecting data straight out of child collections.

The big new thing in our “read side” repertoire is going to be calculated projections. You can follow that work on GitHub.

The Identity Map Pattern in Marten

I’m still a believer in learning and discussing design patterns, even though everyone has seen a naive architect of some kind write stupidly over-engineered code with every GoF buzzword possible. That being said, there’s some significant value in having industry wide understanding of common coding solutions and the design pattern names should make it much easier to find information about prior art online. As an aside, I hate it when developers online make an argument against any particular tool or technique because “one time they were on a project where it sucked” without any thought about how it was used or whether or not the problem just wasn’t a good fit for that tool or technique. I think it’s sloppy thinking.

The Identity Map pattern is an important conceptual design pattern underneath many database and persistence libraries, including Marten. I think it’s important to understand because the usage of an identity map can help performance in some cases, hurt your system’s memory utilization in other cases, and quite potentially prevent data integrity and consistency issues.

As usual, I’ll pull the definition of the Identity Map pattern from Martin Fowler’s PEAA book:

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

The purpose of using an identity map is to avoid accidentally making multiple copies of a loaded entity in memory. When an operation is complex enough to be handled by multiple collaborator classes or functions, it is frequently valuable to share one identity map between the collaborators to prevent unnecessarily fetching the exact same data from the database more than once.

To see this in action, consider the following usage in Marten. Let’s say that we have a document called “User” that is identified by a surrogate Guid property. If I open up a new document session with the default configuration, you would see this behavior:

public void using_identity_map()
{
    var container = Container.For<DevelopmentModeRegistry>();
    var store = container.GetInstance<IDocumentStore>();

    var user = new User {FirstName = "Tamba", LastName = "Hali"};
    store.BulkInsert(new [] {user});

    // Open a document session with the identity map
    using (var session = store.OpenSession())
    {
        // Loading a user with the same Id will return the very same object
        session.Load<User>(user.Id)
            .ShouldBeTheSameAs(session.Load<User>(user.Id));

        // And to make this more clear, Marten is only making a single
        // database call
        session.RequestCount.ShouldBe(1);
    }
} 

In our applications at work, the IDocumentSession (“session” above) that wraps the internal identity map (and unit of work too) would usually be scoped to a web request in HTTP applications and to a single message in our service bus applications. We do this so that different pieces of middleware code and message handlers would all be using the same identity map to avoid double loading or inconsistent state.

Automatic Dirty Checking

A heavier weight flavor of identity map is one that does automatic “dirty checking” to know what documents loaded through the IDocumentSession have been changed in memory and should therefore be persisted when the session is saved.

[Fact]
public void when_querying_and_modifying_multiple_documents_should_track_and_persist()
{
    var user1 = new User { FirstName = "James", LastName = "Worthy 1" };
    var user2 = new User { FirstName = "James", LastName = "Worthy 2" };
    var user3 = new User { FirstName = "James", LastName = "Worthy 3" };

    theSession.Store(user1);
    theSession.Store(user2);
    theSession.Store(user3);

    theSession.SaveChanges();

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>().Where(x => x.FirstName == "James").ToList();

        // Mutating each user
        foreach (var user in users)
        {
            user.LastName += " - updated";
        }

        // Persisting the session will save all the documents
        // that have changed
        session2.SaveChanges();
    }

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>()
            .Where(x => x.FirstName == "James")
            .OrderBy(x => x.LastName).ToList();

        // Just proving out that every User was persisted
        users.Select(x => x.LastName)
            .ShouldHaveTheSameElementsAs("Worthy 1 - updated", "Worthy 2 - updated", "Worthy 3 - updated");
    }
}

In the usage above, I never had to explicitly mark which User objects had been changed. In this type of session, Marten is tracking the raw JSON used to load each document. At the time SaveChanges() is called, Marten compares the current state of each document to its original, loaded state by doing a logical comparison of the JSON structure (it’s inevitably using Newtonsoft.Json under the covers to do the comparison of the JSON data, but you already guessed that).
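
As a rough illustration of that snapshot-and-compare idea, here is a minimal sketch in plain C#. It is not Marten's code: SketchDirtyTracker and SketchPlayer are hypothetical names, I've swapped in System.Text.Json instead of Newtonsoft.Json to keep the sample dependency-free, and it compares the serialized strings directly rather than doing a structural comparison of the JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical document type used only by this sketch
public class SketchPlayer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class SketchDirtyTracker
{
    // Each tracked document is paired with the JSON it had when it was loaded
    private readonly List<(object Document, string OriginalJson)> _tracked
        = new List<(object Document, string OriginalJson)>();

    // Call this when a document is loaded through the session
    public void Track(object document)
    {
        _tracked.Add((document, ToJson(document)));
    }

    // Call this at save time to find the documents that need to be persisted
    public IReadOnlyList<object> DetectChanges()
    {
        return _tracked
            .Where(x => ToJson(x.Document) != x.OriginalJson)
            .Select(x => x.Document)
            .ToList();
    }

    private static string ToJson(object document)
    {
        // Serialize by runtime type so the document's own properties are written
        return JsonSerializer.Serialize(document, document.GetType());
    }
}
```

The cost is visible right in the sketch: every tracked document carries a second copy of its JSON around, and every save re-serializes everything that was loaded.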

Some of our users really like the convenience of the automatic dirty checking, but other times you’ll definitely want to forgo the heavier, more memory and processor intensive version of the identity map in favor of lightweight sessions as shown in the next section.

A favorite ritual of my childhood was rooting hard for the Showtime Lakers every summer in the NBA finals while my Dad was all about the Larry Bird/Kevin McHale/Robert Parish Celtics. Somewhere or another there’s a good chunk of the roster of the ’85 Lakers as test data in most of the projects I work on.

Opting out of the Identity Map

Veteran users of RavenDb are probably painfully aware of how fetching a large amount of data can quickly blow up your system’s memory usage by having it keep so much of the raw JSON structures and pointers to the loaded objects in memory (if you use the default configuration). Because of this all too frequent problem with RavenDb usage, we designed Marten to make it as easy and declarative as possible to use lightweight sessions or pure query sessions that have no identity map or automatic dirty tracking, like so:

            // Opened from an existing IDocumentStore called "store"
            using (var session = store.LightweightSession())
            {

            }

            // A lightweight, readonly session 
            using (var query = store.QuerySession())
            {

            }

Likewise, we made the very heavyweight, automatic dirty tracking flavor of a document session “opt in” in the belief that an opt-in feature is less likely to shoot unsuspecting users in the foot.

Marten does not yet support any kind of notion of “Evict()” to remove previously loaded documents from the underlying identity map. To date, my philosophy is to give the users easier access to the lightweight sessions to side step the whole issue of needing to evict documents manually.

What about Queries?

You might notice that all of my examples of the identity map behavior used the IDocumentSession.Load<T>(id) method to load a single document by its id. In this usage, a Marten document session first checks its internal identity map to see if that document has already been loaded. If not, the session will load the document and save it to the underlying identity map.
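
Those check-then-load mechanics are simple enough to sketch in a few lines. The SketchSession class below is a hypothetical stand-in I wrote for illustration, not Marten's IDocumentSession, and the fetchFromDatabase delegate is an assumption standing in for the real database call:

```csharp
using System;
using System.Collections.Generic;

public class SketchSession
{
    // Keyed by document type and id so different document types can share id values
    private readonly Dictionary<(Type, object), object> _identityMap
        = new Dictionary<(Type, object), object>();

    private readonly Func<Type, object, object> _fetchFromDatabase;

    public SketchSession(Func<Type, object, object> fetchFromDatabase)
    {
        _fetchFromDatabase = fetchFromDatabase;
    }

    // Counts only the loads that actually went to the "database"
    public int RequestCount { get; private set; }

    public T Load<T>(object id) where T : class
    {
        var key = (typeof(T), id);

        // 1. Already loaded in this session? Hand back the very same object
        if (_identityMap.TryGetValue(key, out var existing))
        {
            return (T)existing;
        }

        // 2. Otherwise go to the database and remember what came back
        RequestCount++;
        var document = (T)_fetchFromDatabase(typeof(T), id);
        if (document != null)
        {
            _identityMap[key] = document;
        }

        return document;
    }
}
```

Loading the same id twice through one SketchSession returns the same object reference and leaves RequestCount at 1, which mirrors the assertions in the using_identity_map test earlier.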

Great, but you’re likely asking “what about Linq queries?” We introduced the identity map mechanics fairly early in Marten and run all queries through the identity map caching, but the Linq query works by returning a data reader of a document’s Id and the raw, persisted JSON. As Marten reads through the results of a data reader, for each row it will call the following method in Marten’s internal IIdentityMap interface:

// This method would either return an existing document
// with the id, or deserialize the JSON into a new
// document object and store that in the identity map
T Get<T>(object id, string json) where T : class;

Using a Linq query does honor the identity map tracking. It can still result in fetching the raw JSON data for the same document multiple times, but it does prevent duplication of documents and unnecessary deserialization at runtime.
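
To make that trade-off concrete, here is one plausible way a method with that signature could behave. This is a sketch under my own assumptions rather than Marten's source: SketchIdentityMap and SketchDocument are hypothetical names, and System.Text.Json stands in for whatever serializer is actually configured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical document type used only by this sketch
public class SketchDocument
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
}

public class SketchIdentityMap
{
    private readonly Dictionary<(Type, object), object> _documents
        = new Dictionary<(Type, object), object>();

    // Exposed only so the sketch can show how often JSON was really parsed
    public int DeserializationCount { get; private set; }

    public T Get<T>(object id, string json) where T : class
    {
        var key = (typeof(T), id);

        // The row's JSON has already crossed the network, but if the document
        // is already in the map we ignore it and return the existing object
        if (_documents.TryGetValue(key, out var existing))
        {
            return (T)existing;
        }

        // First time this id is seen: pay for deserialization once and cache it
        DeserializationCount++;
        var document = JsonSerializer.Deserialize<T>(json);
        _documents[key] = document;
        return document;
    }
}
```

Calling Get twice with the same id hands back one object and parses the JSON once, even though the JSON string itself had to be supplied both times.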

 

Natural versus Surrogate Keys

Using document databases over the past 5 or so years has changed my old attitudes toward the choice of natural database keys (some piece of data that has actual meaning like names) versus surrogate keys like Guid’s or sequential numbers. 5-10 years ago in the days of heavyweight ORM’s like NHibernate (or today if you’re a mainstream .Net developer using tools like EF) I would have been adamantly in favor of only using surrogate keys. One, because a natural key can change and it can be clumsy to modify the primary key of a relational database table. Two, because using a surrogate key meant that you could adopt some kind of layer supertype for all of your entities that would allow you to centralize and reuse a lot of your application’s workflow for typical CRUD operations.

Today however, I think that there are such valuable performance advantages to being able to efficiently load documents by their natural identifier through an identity map, that this choice is no longer so clear cut. Take the example of a document representing a “User.” At login time or even after authentication, you most likely have the user name, but not necessarily any kind of Guid representing that user. If we modeled the “User” document with the login name as a natural key, we can efficiently load user documents by that user name.

The example above isn’t the slightest bit contrived. Rather it’s exactly the mistake I made when I designed the persisted membership feature of FubuMVC that is backed by RavenDb that is still in a couple of our systems at work. In our case, we have to load a user by querying by the user name we have instead of the Guid surrogate key that we don’t know upfront. That’s not that big of a deal with Postgresql-backed Marten, but it became a significant problem for us with RavenDb because it forces RavenDb to have to load the document by using a readside index, which is a less efficient mechanism than loading by id. In this case, we could have had a more efficient login identity solution if I’d broken away from the “old think” belief in the primacy of surrogate keys in all situations. Lesson learned.

The Unit of Work Pattern in Marten

Design patterns got a bad rap when they were horrendously overused in the decade plus following the publication of the gang of four book. I’m still a believer in learning design patterns. I even think it’s valuable to know and understand the “official” names for patterns. One, because it’s really nice for other developers to understand the jargon that you’re using to describe possible solutions, but mostly so that you can easily go Google for a lot more information about how, when, and when not to use it later as you need to know more.

That being said, I think it’s useful for new users of Marten to understand the old “Unit of Work” design pattern, both conceptually and how Marten implements the pattern.

Jeremy’s Law of Nerd-Rage: a group of developers being enthusiastic about anything related to development will eventually make a second group of developers angry and cause a backlash.

 

Unit of Work

From Martin Fowler’s PEAA book:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

The unit of work pattern is a simple way to collect all the changes to a backing data store you need to be committed in a single transaction. It’s especially useful if you need to have several unrelated pieces of code collaborating on the same logical transaction. If you just pass around a “unit of work” container for the changes to each “worker” object or function, you can keep the various things completely decoupled from each other and still collaborate on a single business transaction.

In Marten, the unit of work is buried behind our IDocumentSession interface (that might change later). When you register changes to an IDocumentSession (inserts, updates, deletes, appending events, etc.), these changes are only staged until the SaveChanges() method is called to persist all the pending changes in one database transaction.
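
Stripped of everything Marten-specific, the staging behavior looks roughly like the sketch below. SketchUnitOfWork is a hypothetical class I'm using for illustration only, the operations are reduced to plain strings, and the executeInOneTransaction callback is an assumption standing in for the real batched database call:

```csharp
using System;
using System.Collections.Generic;

public class SketchUnitOfWork
{
    // Changes registered so far but not yet sent anywhere
    private readonly List<string> _pending = new List<string>();

    private readonly Action<IReadOnlyList<string>> _executeInOneTransaction;

    public SketchUnitOfWork(Action<IReadOnlyList<string>> executeInOneTransaction)
    {
        _executeInOneTransaction = executeInOneTransaction;
    }

    public int PendingCount => _pending.Count;

    // Registering a change only stages it
    public void Store(string document) => _pending.Add("upsert " + document);

    public void Delete(string document) => _pending.Add("delete " + document);

    // Everything staged so far goes out together, exactly once
    public void SaveChanges()
    {
        if (_pending.Count == 0) return;

        _executeInOneTransaction(_pending.ToArray());
        _pending.Clear();
    }
}
```

Because the collaborators only ever call Store() or Delete(), none of them needs to know who else is contributing to the transaction or when it will finally be committed.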

You can see Marten’s unit of work mechanics in action in the code sample below:

// theStore is a DocumentStore
using (var session = theStore.OpenSession())
{
    // All I'm doing here is recording references
    // to all the ADO.Net commands executed by
    // this session
    var logger = new RecordingLogger();
    session.Logger = logger;

    // Insert some new documents
    session.Store(new User {UserName = "luke", FirstName = "Luke", LastName = "Skywalker"});
    session.Store(new User {UserName = "leia", FirstName = "Leia", LastName = "Organa"});
    session.Store(new User {UserName = "wedge", FirstName = "Wedge", LastName = "Antilles"});


    // Delete all users matching a certain criteria
    session.DeleteWhere<User>(x => x.UserName == "hansolo");

    // deleting a single document by Id, if you had one
    session.Delete<User>(Guid.NewGuid());

    // Persist in a single transaction
    session.SaveChanges();

    // All of this was done in one batched command
    // in the same transaction
    logger.Commands.Count.ShouldBe(1);

    // I'm just writing out the Sql executed here
    var sql = logger.Commands.Single().CommandText;
    new FileSystem()
        .WriteStringToFile("unitofwork.sql", sql);

}

All of the database changes above are made in a single database call within one transaction (I added extra new lines to make it readable):

select mt_upsert_user(doc := :p0, docId := :p1);
select mt_upsert_user(doc := :p2, docId := :p3);
select mt_upsert_user(doc := :p4, docId := :p5);
delete from mt_doc_user as d where d.data ->> 'UserName' = :arg6;
delete from mt_doc_user where id=:p6;

As for how to consume IDocumentSession’s and how to scope them for transactional boundary management, I would typically recommend using an IDocumentSession per HTTP request or within the handling for a single service bus message. We even build our basic infrastructure around this concept. You still need an easy way to use explicit transaction boundaries with a new IDocumentSession on demand. I wrote about our transaction management strategies with RavenDb a couple years ago, and my intention is that we’ll use Marten in a very similar manner at work.

Marten on DotNetRocks

Remarkably coincidental with the new Marten v0.8 release, the DotNetRocks guys just published a podcast talking with me about Marten. Give it a shot if you wanna learn a little bit about the motivation behind Marten, using Postgresql for .Net development, and using “polyglot persistence.”

I think the DNR guys were trying to get me to trash talk a little bit about Marten versus its obvious competitor, so how about just saying that I feel like Marten will lead to more developer productivity than using Entity Framework or micro-ORM’s and better performance and reliability than existing NoSQL solutions for .Net.