The Identity Map Pattern in Marten

I’m still a believer in learning and discussing design patterns, even though everyone has seen a naive architect of some kind write stupidly over-engineered code with every possible GoF buzzword. That being said, there’s significant value in having an industry-wide understanding of common coding solutions, and the design pattern names should make it much easier to find information about prior art online. As an aside, I hate it when developers online argue against a particular tool or technique because “one time they were on a project where it sucked” without any thought about how it was used or whether the problem just wasn’t a good fit for that tool or technique. I think it’s sloppy thinking.

The Identity Map pattern is an important conceptual design pattern underneath many database and persistence libraries, including Marten. I think it’s important to understand because the usage of an identity map can help performance in some cases, hurt your system’s memory utilization in other cases, and quite potentially prevent data integrity and consistency issues.

As usual, I’ll pull the definition of the Identity Map pattern from Martin Fowler’s PEAA book:

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

The purpose of using an identity map is to avoid accidentally making multiple copies of a loaded entity in memory. When an operation is complex enough to be handled by multiple collaborating classes or functions, it is frequently valuable to share one identity map between the collaborators so that none of them fetches the exact same data from the database more than once.

To see this in action, consider the following usage in Marten. Let’s say that we have a document called “User” that is identified by a surrogate Guid property. If you open up a new document session with the default configuration, you would see this behavior:

public void using_identity_map()
{
    var container = Container.For<DevelopmentModeRegistry>();
    var store = container.GetInstance<IDocumentStore>();

    var user = new User {FirstName = "Tamba", LastName = "Hali"};
    store.BulkInsert(new [] {user});

    // Open a document session with the identity map
    using (var session = store.OpenSession())
    {
        // Loading a user with the same Id returns the very same object
        session.Load<User>(user.Id)
            .ShouldBeTheSameAs(session.Load<User>(user.Id));

        // And to make this more clear, Marten is only making a single
        // database call
        session.RequestCount.ShouldBe(1);
    }
} 

In our applications at work, the IDocumentSession (“session” above) that wraps the internal identity map (and unit of work too) would usually be scoped to a web request in HTTP applications and to a single message in our service bus applications. We do this so that different pieces of middleware code and message handlers all use the same identity map, which avoids double loading and inconsistent state.

 

Automatic Dirty Checking

A heavier weight flavor of identity map is one that does automatic “dirty checking” to know what documents loaded through the IDocumentSession have been changed in memory and should therefore be persisted when the session is saved.

[Fact]
public void when_querying_and_modifying_multiple_documents_should_track_and_persist()
{
    var user1 = new User { FirstName = "James", LastName = "Worthy 1" };
    var user2 = new User { FirstName = "James", LastName = "Worthy 2" };
    var user3 = new User { FirstName = "James", LastName = "Worthy 3" };

    theSession.Store(user1);
    theSession.Store(user2);
    theSession.Store(user3);

    theSession.SaveChanges();

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>().Where(x => x.FirstName == "James").ToList();

        // Mutating each user
        foreach (var user in users)
        {
            user.LastName += " - updated";
        }

        // Persisting the session will save all the documents
        // that have changed
        session2.SaveChanges();
    }

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>()
            .Where(x => x.FirstName == "James")
            .OrderBy(x => x.LastName).ToList();

        // Just proving out that every User was persisted
        users.Select(x => x.LastName)
            .ShouldHaveTheSameElementsAs("Worthy 1 - updated", "Worthy 2 - updated", "Worthy 3 - updated");
    }
}

In the usage above, I never had to explicitly mark which User objects had been changed. In this type of session, Marten is tracking the raw JSON used to load each document. At the time SaveChanges() is called, Marten compares the current state of each document to its original, loaded state by doing a logical comparison of the JSON structure (it’s inevitably using Newtonsoft.Json under the covers to do the comparison of the JSON data, but you already guessed that).
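To make the snapshot comparison concrete, here is a minimal sketch of the idea. It is not Marten’s actual implementation: `DirtyTracker`, `Track` and `DetectChanges` are hypothetical names, and the sketch assumes System.Text.Json with a plain string comparison of re-serialized snapshots standing in for the logical JSON comparison described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public class User
{
    public Guid Id { get; set; }
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

// Hypothetical tracker for illustration only; not Marten's internal code
public class DirtyTracker
{
    // Each tracked document is paired with a JSON snapshot of its loaded state
    private readonly List<(object Document, string Snapshot)> _tracked
        = new List<(object Document, string Snapshot)>();

    // Deserialize the raw JSON and remember what the document looked like at load time
    public T Track<T>(string json) where T : class
    {
        var document = JsonSerializer.Deserialize<T>(json)
            ?? throw new InvalidOperationException("JSON did not contain a document");

        // Assumption: re-serializing normalizes whitespace and property order,
        // so a simple string comparison is good enough for this sketch
        _tracked.Add((document, Serialize(document)));
        return document;
    }

    // Return every tracked document whose current JSON differs from its snapshot
    public IReadOnlyList<object> DetectChanges()
    {
        return _tracked
            .Where(x => Serialize(x.Document) != x.Snapshot)
            .Select(x => x.Document)
            .ToList();
    }

    private static string Serialize(object document)
    {
        return JsonSerializer.Serialize(document, document.GetType());
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new DirtyTracker();
        var json = JsonSerializer.Serialize(
            new User { Id = Guid.NewGuid(), FirstName = "James", LastName = "Worthy" });

        var user = tracker.Track<User>(json);
        Console.WriteLine(tracker.DetectChanges().Count); // nothing has changed yet

        user.LastName += " - updated";
        Console.WriteLine(tracker.DetectChanges().Count); // the mutated user is now reported
    }
}
```

The cost is visible even in a sketch this small: every tracked document carries a second copy of its state as JSON, and every save has to serialize each document again just to find out whether it changed.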

Some of our users really like the convenience of the automatic dirty checking, but other times you’ll definitely want to forgo the heavier, more memory and processor intensive version of the identity map in favor of lightweight sessions as shown in the next section.

A favorite ritual of my childhood was rooting hard for the Showtime Lakers every summer in the NBA Finals while my Dad was all about the Larry Bird/Kevin McHale/Robert Parish Celtics. Somewhere or another there’s a good chunk of the roster of the ’85 Lakers as test data in most of the projects I work on.

Opting out of the Identity Map

Veteran users of RavenDb are probably painfully aware of how fetching a large amount of data can quickly blow up your system’s memory usage, because the session keeps so much of the raw JSON structures and pointers to the loaded objects in memory (if you use the default configuration). Because of this all too frequent problem with RavenDb usage, we designed Marten to make it as easy and declarative as possible to use lightweight sessions or pure query sessions that have no identity map or automatic dirty tracking, like so:

// Opened from an existing IDocumentStore called "store"
using (var session = store.LightweightSession())
{

}

// A lightweight, readonly session
using (var query = store.QuerySession())
{

}

Likewise, we made the very heavyweight, automatic dirty tracking flavor of a document session “opt in” so that it doesn’t shoot unsuspecting users in the foot.

Marten does not yet support any notion of “Evict()” to remove previously loaded documents from the underlying identity map. To date, my philosophy is to give users easy access to the lightweight sessions to sidestep the whole issue of needing to evict documents manually.

What about Queries?

You might notice that all of my examples of the identity map behavior used the IDocumentSession.Load<T>(id) method to load a single document by its id. In this usage, a Marten document session first checks its internal identity map to see if that document has already been loaded. If not, the session will load the document and save it to the underlying identity map.

Great, but you’re likely asking “what about Linq queries?” We introduced the identity map mechanics fairly early in Marten and run all queries through the identity map caching, but the Linq query works by returning a data reader of a document’s Id and the raw, persisted JSON. As Marten reads through the results of a data reader, for each row it will call the following method in Marten’s internal IIdentityMap interface:

// This method would either return an existing document
// with the id, or deserialize the JSON into a new
// document object and store that in the identity map
T Get<T>(object id, string json) where T : class;

A Linq query does honor the identity map tracking. It can result in fetching the raw JSON data for the same document multiple times, but it does prevent duplicate document objects and unnecessary deserialization at runtime.
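As a rough illustration of that behavior, here is a sketch of how a method with the same shape as `Get<T>(object id, string json)` might work. `SimpleIdentityMap` and `DeserializationCount` are hypothetical names for this example, System.Text.Json is an assumption made to keep the sample self-contained, and none of this is Marten’s internal code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class User
{
    public Guid Id { get; set; }
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

// Hypothetical, simplified identity map; not Marten's internal implementation
public class SimpleIdentityMap
{
    // One dictionary of id -> document per document type
    private readonly Dictionary<Type, Dictionary<object, object>> _documents
        = new Dictionary<Type, Dictionary<object, object>>();

    // Counts how many times JSON was actually deserialized
    public int DeserializationCount { get; private set; }

    public T Get<T>(object id, string json) where T : class
    {
        if (!_documents.TryGetValue(typeof(T), out var byId))
        {
            byId = new Dictionary<object, object>();
            _documents[typeof(T)] = byId;
        }

        // Already loaded: hand back the existing object and ignore the JSON
        if (byId.TryGetValue(id, out var existing))
        {
            return (T)existing;
        }

        // First time this id is seen: deserialize once and remember the result
        DeserializationCount++;
        var document = JsonSerializer.Deserialize<T>(json)
            ?? throw new InvalidOperationException("JSON did not contain a document");
        byId[id] = document;
        return document;
    }
}

public static class Program
{
    public static void Main()
    {
        var map = new SimpleIdentityMap();
        var id = Guid.NewGuid();
        var json = JsonSerializer.Serialize(
            new User { Id = id, FirstName = "James", LastName = "Worthy" });

        // Simulates the same row coming back from two different queries
        var first = map.Get<User>(id, json);
        var second = map.Get<User>(id, json);

        Console.WriteLine(ReferenceEquals(first, second)); // same object both times
        Console.WriteLine(map.DeserializationCount);       // JSON was only parsed once
    }
}
```

The JSON for the second row still had to travel from the database to the caller, which is the wasted fetch mentioned above. Only the deserialization and the duplicate object are avoided.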

 

Natural versus Surrogate Keys

Using document databases over the past 5 or so years has changed my old attitudes toward the choice of natural database keys (some piece of data that has actual meaning, like names) versus surrogate keys like Guids or sequential numbers. 5-10 years ago in the days of heavyweight ORMs like NHibernate (or today if you’re a mainstream .Net developer using tools like EF) I would have been adamantly in favor of only using surrogate keys. One, because a natural key can change and it can be clumsy to modify the primary key of a relational database table. Two, because using a surrogate key meant that you could adopt some kind of layer supertype for all of your entities that would allow you to centralize and reuse a lot of your application’s workflow for typical CRUD operations.

Today, however, I think that there are such valuable performance advantages to being able to efficiently load documents by their natural identifier through an identity map that this choice is no longer so clear cut. Take the example of a document representing a “User.” At login time or even after authentication, you most likely have the user name, but not necessarily any kind of Guid representing that user. If we modeled the “User” document with the login name as a natural key, we could efficiently load user documents by that user name.

The example above isn’t the slightest bit contrived. Rather, it’s exactly the mistake I made when I designed the persisted membership feature of FubuMVC, which is backed by RavenDb and is still in a couple of our systems at work. In our case, we have to load a user by querying on the user name we have instead of the Guid surrogate key that we don’t know upfront. That’s not that big of a deal with Postgresql-backed Marten, but it became a significant problem for us with RavenDb because it forces RavenDb to load the document through a readside index, which is a less efficient mechanism than loading by id. In this case, we could have had a more efficient login identity solution if I’d broken away from the “old think” belief in the primacy of surrogate keys in all situations. Lesson learned.
