How we did authorization in FubuMVC, and what I’d do differently today

FubuMVC as an OSS project is mostly dead, but we still use it very heavily at work, and one of our teams asked me to explain how its authorization works. You’re probably never going to use FubuMVC, but we did some things that were interesting and successful from a technical perspective. Besides, it’s always more fun to learn from other people’s mistakes — in this case, my mistakes ;)

Early in the project that originally spawned FubuMVC we spent almost two months tackling authorization rules in our application. We had several needs:

  1. Basic authorization to allow or deny user actions based on their roles and permissions
  2. Authorization rules based on the current business object state (rules like “an amount > 5,000 requires manager approval”)
  3. The ability to extend the authorization of the application with custom rules and exceptions for a customer without any impact on the core code
  4. The ability to use authorization rules to enable, disable, show, or hide navigation elements in the user interface

The core architectural concept of FubuMVC was what we called the “Russian Doll Model,” which allowed you to effectively move cross-cutting concerns into our version of middleware. Authorization was an obvious place to use a wrapping middleware class around endpoint actions. If a user has the proper rights, continue on. If the user does not, return an HTTP 403 and some kind of authorization failure response and don’t execute the inner action at all.

For some context, you can see the part of our AuthorizationBehavior class we used to enforce authorization rules:

protected override void invoke(Action action)
{
    if (!_settings.AuthorizationEnabled)
    {
        action();
        return;
    }

    var access = _authorization.IsAuthorized(_context);

    // If authorized, continue to the inner behavior in the 
    // chain (filters, controller actions, views, etc.)
    if (access == AuthorizationRight.Allow)
    {
        action();
    }
    else
    {
        // If authorization fails, hand off to the failure handler
        // and stop the inner behaviors from executing
        // the failure handler service was pluggable in fubu too
        var continuation = _failureHandler.Handle();
        _context.Service<IContinuationProcessor>().Continue(continuation, Inner); 
    }
}

Pulling the authorization checks into a separate middleware independent of the inner HTTP action had a few advantages:

  • It’s easier to share common authorization logic
  • Reversibility — meaning that it was easy to retrofit authorization logic onto existing code without having to dig into the actual endpoint action code.
  • The inner endpoint action code could be simpler by not having to worry about authorization (in most cases)

FubuMVC itself might have failed, but the strategy of using composable middleware chains is absolutely succeeding; you find it in almost all of the newer HTTP and service bus frameworks, including the new ASP.Net Core MVC code.

Authorization Rights and Policies

The core abstraction in FubuMVC’s authorization subsystem was the IAuthorizationPolicy interface that could be executed to determine if a user had rights to the current action:

    public interface IAuthorizationPolicy
    {
        AuthorizationRight RightsFor(IFubuRequestContext request);
    }

The “AuthorizationRight” class above was a strongly typed enumeration consisting of:

  1. “None” — meaning that the policy just didn’t apply
  2. “Allow”
  3. “Deny”

If multiple rules applied to a given endpoint, each rule would be executed to determine if the user had rights. Any rule evaluating to “Deny” automatically failed the authorization check. Otherwise, at least one rule had to be evaluated as “Allow” to proceed.
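To make the combination semantics concrete, here’s roughly what that logic looks like. This is a sketch of the idea, not the exact FubuMVC source:

    public static AuthorizationRight Combine(IEnumerable<AuthorizationRight> rights)
    {
        var answer = AuthorizationRight.None;

        foreach (var right in rights)
        {
            // Any single "Deny" fails the whole check immediately
            if (right == AuthorizationRight.Deny) return AuthorizationRight.Deny;

            // Otherwise, remember that at least one rule said "Allow"
            if (right == AuthorizationRight.Allow) answer = AuthorizationRight.Allow;
        }

        // If no rule ever evaluated to "Allow," the answer stays "None"
        // and the authorization check fails
        return answer;
    }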

Being able to combine these checks enabled us to model both simple role- and permission-based authorization and rules based on business logic and the current system state.

Simple Role Based Authorization

In the following example, we have an endpoint action that has a single “AllowRole” authorization rule:

        // This action would have the Url: /authorized/hello
        [AllowRole("Greeter")]
        public string get_authorized_hello()
        {
            return "Hello.";
        }

Behind the scenes, that [AllowRole] attribute is evaluated once at application startup time and adds a new AllowRole object to the underlying model for the “GET /authorized/hello” endpoint.

For some context, the AllowRole rule (partially elided) looks like this:

    public class AllowRole : IAuthorizationPolicy
    {
        private readonly string _role;

        public AllowRole(string role)
        {
            _role = role;
        }

        public AuthorizationRight RightsFor(IFubuRequestContext request)
        {
            return PrincipalRoles.IsInRole(_role) ? AuthorizationRight.Allow : AuthorizationRight.None;
        }

    }


Model Based Configuration

FubuMVC has an underlying model called the “behavior graph” that models exactly which middleware handlers are applicable to each HTTP route. Part of that model is an exact linkage to the authorization policies that were applicable to each route. In the case above, the “GET /authorized/hello” endpoint has a single AllowRole rule added by the attribute.

More powerfully though, FubuMVC also allowed you to reach into the underlying behavior graph model and add additional authorization rules for a given endpoint. You could do this through the marker attributes (even your own attributes if you wanted), programmatically if you had to, or through additive conventions like “all endpoints with a route starting with /admin require the administrator role.” We heavily exploited this ability to enable customer-specific authorization checks in the original FubuMVC application.
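As an example, a convention like the “/admin requires the administrator role” rule might look something like the sketch below. The member names here are illustrative rather than the exact FubuMVC API:

    public class AdminEndpointsRequireAdministrator : IConfigurationAction
    {
        public void Configure(BehaviorGraph graph)
        {
            // Find every chain whose route starts with "admin"
            // and tack an additional authorization rule onto it
            graph.Behaviors
                .Where(chain => chain.Route != null && chain.Route.Pattern.StartsWith("admin"))
                .Each(chain => chain.Authorization.AddRole("Administrator"));
        }
    }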

Authorization & Navigation

By attaching the IAuthorizationPolicy objects to each behavior chain, FubuMVC is also able to tell you programmatically if a user could navigate to or access any HTTP endpoint using the IEndpointService like this:

IEndpointService endpoints = runtime.Get<IEndpointService>();

var endpoint = endpoints.EndpointFor<SomeEndpoint>(x => x.get_hello());

var isAuthorized = endpoint.IsAuthorized;

We used the IEndpointService (and a simpler one not shown here called IAuthorizationPreviewService) in server side rendering to know when to show, hide, enable, or disable navigation elements based on whether or not a user would have rights to access those routes. By decoupling the authorization rules a little bit from the actual endpoint action code, we were able to define the authorization rules exactly once for every endpoint, then reuse the same logic for navigation accessibility that we used at runtime when the actual resource was requested.

This ability to “preview” authorization rights for HTTP endpoints was also useful for hypermedia endpoints where you used authorization rights to include or exclude additional links in the response body.
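For instance, a link writer for a hypermedia resource could lean on the same endpoint preview. The LinkWriter and SomeResource types below are hypothetical, and I’m assuming a Url property on the endpoint object, but the IEndpointService usage mirrors the example above:

    public class LinkWriter
    {
        private readonly IEndpointService _endpoints;

        public LinkWriter(IEndpointService endpoints)
        {
            _endpoints = endpoints;
        }

        public void AppendLinks(SomeResource resource)
        {
            var endpoint = _endpoints.EndpointFor<SomeEndpoint>(x => x.get_hello());

            // Only include the link in the response body if the
            // current user would be authorized to follow it
            if (endpoint.IsAuthorized)
            {
                resource.AddLink("hello", endpoint.Url);
            }
        }
    }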

Lastly, having the model of what authorization rules applied to each route enabled FubuMVC to present diagnostic information and visualizations of the middleware configuration for each HTTP endpoint. That kind of diagnostics becomes very important when you start using conventional policies or extensibility mechanisms to insert authorization rules from outside of the core application.

Rule Object Lifecycle

FubuMVC, especially in its earlier versions, is awful for the number of object allocations it makes at runtime. Starting with FubuMVC 2, I tried to reduce that overhead by making the authorization rule objects live through the lifecycle of the application itself instead of being created fresh by the underlying IoC container on every single request. Great, but there are some services that you may need to access to perform authorization checks — and those services sometimes need to be scoped to the current HTTP request. To get around that, FubuMVC’s IAuthorizationPolicy takes in an IFubuRequestContext object which among other information contains a *gasp* service locator for the current request scope that authorization rules can use to perform their logic.
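To make that concrete, here’s a sketch of a long-lived rule using a request-scoped service. The ISecurityRepository service is hypothetical, but the Service<T>() call is the same one you can see in the AuthorizationBehavior code at the top of this post:

    public class RequiresManagerApproval : IAuthorizationPolicy
    {
        public AuthorizationRight RightsFor(IFubuRequestContext request)
        {
            // This rule object lives for the lifetime of the application,
            // so it resolves the request-scoped service on demand instead
            // of having it injected at construction time
            var repository = request.Service<ISecurityRepository>();

            return repository.CurrentUserIsManager()
                ? AuthorizationRight.Allow
                : AuthorizationRight.None;
        }
    }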

There’s been an almost extreme backlash against any and all usages of service locators over the past several years. Most of that is probably very justified, but in my opinion, it’s still very valid to use a service locator within an object that has a much longer lifetime than the dependencies it needs to use within certain operations. And no, using Lazy<T> or Func<T> builders injected in at the original time of creation will not work without taking a potentially harmful dependency on things like HttpContext to make the scoping work.

Please don’t dare comment on this post with any form of “but Mark Seemann says…” I swear that I’ll reach through the interwebs and smack you silly if you do. I’ll probably make a point of never, ever doing that on any kind of StructureMap list too, for that matter.

What I’d do differently if there’s a next time

For the past couple years we’ve kicked around the idea of an all new framework we’re going to call “Jasper” that would be essentially a reboot of a core subset of FubuMVC on the new CoreCLR and all the forthcoming ASP.Net Core goodies. At some point I’ve said that I wanted to bring over the authorization model from FubuMVC roughly as is, but the first step is to figure out if we could use something off the shelf so we don’t have to support our own custom infrastructure (and there’s always the possibility that we’ll just give up and go to the mainstream tools).

The single biggest thing I’d change the next time around is to make it far easier to do one-off authorization rules as close as possible to the actual endpoint action methods. My thought has been to have a convention or yet another interface, something like “IAuthorized,” so that endpoint classes could happily expose some kind of Authorize() : AuthorizationRight method for one-off rules.
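To sketch out what I mean (none of this exists today, it’s just the shape of the idea):

    public interface IAuthorized
    {
        AuthorizationRight Authorize();
    }

    public class ApproveInvoiceEndpoint : IAuthorized
    {
        // A one-off rule living right next door to the
        // action that it guards
        public AuthorizationRight Authorize()
        {
            return PrincipalRoles.IsInRole("Manager")
                ? AuthorizationRight.Allow
                : AuthorizationRight.Deny;
        }

        public string post_approve_invoice()
        {
            return "Approved";
        }
    }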

Honestly though, I’m content with how the authorization model played out in FubuMVC. A big part of the theoretical plans for “Jasper” is to be much, much more conscious about allocating new objects (i.e., use the IoC container much less) and to adopt an async by default, all the way through approach to the runtime model.

The compiled query feature in Marten and why it rocks.

The “compiled query” feature is a brand new addition to Marten (as of v0.8.9), and one we think will have some very positive impact on the performance of our systems. I’m also hopeful that using this feature will make some of our internal code easier to read and understand. Down the line, the combination of compiled queries with the batch querying support should be the foundation of a decent dynamic, aggregated query mechanism to support our React/Redux based client architectures (with a server side implementation of Falcor maybe?). Big thanks and a shoutout to Corey Kaylor for adding this feature.

Linq is easily one of the most popular features in .Net and arguably the one thing that other platforms strive to copy. We generally like being able to express document queries in a compiler-safe manner, but there is a non-trivial cost in parsing the resulting Expression trees and then using plenty of string concatenation to build up the matching SQL query. Fortunately, as of v0.8.10, Marten supports the concept of a Compiled Query that you can use to reuse the SQL template for a given Linq query and bypass the performance cost of continuously parsing Linq expressions.

All compiled queries are classes that implement the ICompiledQuery<TDoc, TOut> interface shown below:

    public interface ICompiledQuery<TDoc, TOut>
    {
        Expression<Func<IQueryable<TDoc>, TOut>> QueryIs();
    }

In its simplest usage, let’s say that we want to find the first user document with a certain first name. That class would look like this:

public class FindByFirstName : ICompiledQuery<User, User>
{
    public string FirstName { get; set; }

    public Expression<Func<IQueryable<User>, User>> QueryIs()
    {
        return q => q.FirstOrDefault(x => x.FirstName == FirstName);
    }
}

So a couple things to note in the class above:

  1. The QueryIs() method returns an Expression representing a Linq query
  2. FindByFirstName has a property (it could also be just a public field) called FirstName that is used to express the filter of the query

To use the FindByFirstName query, just use the code below:

            var justin = theSession.Query(new FindByFirstName {FirstName = "Justin"});

            var tamba = await theSession.QueryAsync(new FindByFirstName {FirstName = "Tamba"});

Or to use it as part of a batched query, this syntax:

var batch = theSession.CreateBatchQuery();

var justin = batch.Query(new FindByFirstName {FirstName = "Justin"});
var tamba = batch.Query(new FindByFirstName {FirstName = "Tamba"});

await batch.Execute();

(await justin).Id.ShouldBe(user1.Id);
(await tamba).Id.ShouldBe(user2.Id);

How does it work?

The first time that Marten encounters a new type of ICompiledQuery, it executes the QueryIs() method and:

  1. Parses the Expression just to find which property getters or fields are used within the expression as input parameters
  2. Parses the Expression with our standard Linq support to create a template database command and the internal query handler
  3. Builds up an object with compiled Func’s that “knows” how to read a query model object and set the command parameters for the query
  4. Caches the resulting “plan” for how to execute a compiled query

On subsequent usages, Marten will just reuse the existing SQL command and remembered handlers to execute the query.
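As an illustration of step 3 above, “compiling a Func that knows how to read a query model object” can be done with a little bit of Expression manipulation. This is a conceptual sketch of the technique, not Marten’s actual internals:

    using System;
    using System.Linq.Expressions;

    public static class MemberReader
    {
        public static Func<object, object> ForProperty(Type queryType, string propertyName)
        {
            var parameter = Expression.Parameter(typeof(object), "query");

            // Cast the incoming object to the concrete query type,
            // then read the named property off of it
            var property = Expression.Property(
                Expression.Convert(parameter, queryType),
                propertyName);

            var body = Expression.Convert(property, typeof(object));

            // Compiled once when the query type is first encountered,
            // then cached and reused on every subsequent execution
            return Expression.Lambda<Func<object, object>>(body, parameter).Compile();
        }
    }

    // Usage: read FindByFirstName.FirstName without paying any
    // reflection cost per query execution
    // var reader = MemberReader.ForProperty(typeof(FindByFirstName), "FirstName");
    // var value = reader(new FindByFirstName { FirstName = "Justin" });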

What is supported?

To the best of our knowledge and testing, you may use any Linq feature that Marten supports within a compiled query. So any combination of:

  • Select() transforms
  • First/FirstOrDefault()
  • Single/SingleOrDefault()
  • Where()
  • OrderBy/OrderByDescending etc.
  • Count()
  • Any()

At this point (v0.9), the only limitations are:

  1. You cannot yet incorporate the Include’s feature with compiled queries, but there is an open GitHub issue you can use to track progress on adding this feature.
  2. You cannot use the Linq ToArray() or ToList() operators. See the next section for an explanation of how to query for multiple results.

Querying for multiple results

To query for multiple results, you just need to use IEnumerable<T> as the result type and return the raw IQueryable<T> from QueryIs(). You cannot use the ToArray() or ToList() operators (it’ll throw exceptions from the Relinq library if you try). As a convenience mechanism, Marten supplies these helper interfaces:

If you are selecting the whole document without any kind of Select() transform, you can use this interface:

    public interface ICompiledListQuery<TDoc> : ICompiledListQuery<TDoc, TDoc>
    {
    }

A sample usage of this type of query is shown below:

    public class UsersByFirstName : ICompiledListQuery<User>
    {
        public static int Count;
        public string FirstName { get; set; }

        public Expression<Func<IQueryable<User>, IEnumerable<User>>> QueryIs()
        {
            // Ignore this line, it's from a unit test;)
            Count++;
            return query => query.Where(x => x.FirstName == FirstName);
        }
    }

If you do want to use a Select() transform, use this interface:

    public interface ICompiledListQuery<TDoc, TOut> : ICompiledQuery<TDoc, IEnumerable<TOut>>
    {
    }

A sample usage of this type of query is shown below:

    public class UserNamesForFirstName : ICompiledListQuery<User, string>
    {
        public Expression<Func<IQueryable<User>, IEnumerable<string>>> QueryIs()
        {
            return q => q
                .Where(x => x.FirstName == FirstName)
                .Select(x => x.UserName);
        }

        public string FirstName { get; set; }
    }

Querying for a single document

Finally, if you are querying for a single document with no transformation, you can use this interface as a convenience:

    public interface ICompiledQuery<TDoc> : ICompiledQuery<TDoc, TDoc>
    {
    }

And an example:

    public class FindUserByAllTheThings : ICompiledQuery<User>
    {
        public string Username { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public Expression<Func<IQueryable<User>, User>> QueryIs()
        {
            return query =>
                    query.Where(x => x.FirstName == FirstName && Username == x.UserName)
                        .Where(x => x.LastName == LastName)
                        .Single();

        }
    }

A quick followup to my opinions on .Net OSS

After one last puff piece to blow up my blogging numbers, I will do penance by sticking to actual technical content for a while.

So the other day I made a somewhat conscious decision to get more blog traffic and posted something I’d been dwelling on for a while: What I think is and is not better about .Net OSS these days. A couple of follow-up points from the feedback I’ve gotten:

Do most .Net developers care?

I honestly don’t know. I’d guess the answer is “no,” but I’m firmly convinced that there is no such thing as a “typical” .Net software shop anyway. I do know that when folks say that “most development shops do this,” it’s frequently something that I’ve never experienced.

Why does OSS even matter to .Net?

There’s a lot of smart folks in the .Net community that don’t work in Redmond, and OSS is a great way to capture and cultivate innovation in .Net from more people. Having a greater diversity of tools that take different approaches to solving technical problems is always nice.

Other communities have problems too!

Well, duh.

Someone pointed out that Javascript suffers from a lot of duplication too. The Node.js/NPM/Javascript world is churning fast, and some duplication of effort kind of has to be expected as a result.

Why don’t I leave .Net and go to <insert other platform here>?

.Net has been very good to me, and I probably get to do much more interesting work than the average developer on any platform. I’m one of the very few people who actually likes their job, and .Net happens to be a part of most of our systems.

Besides, I get to do enough Javascript development that I don’t particularly feel trapped into only working on .Net anyway.

Please stop using the word “Zealot”

Despite a comment or two to the contrary, I am certainly not a Richard Stallman-style OSS “zealot.” That’s one of the words that developers use very sloppily in technical arguments. I’m just gonna stick a link here to my old post about why I hate words like “Zealot” or “Pragmatic” or “Dogma” being tossed around in technical discussions.

About Microsoft, one last time

I don’t actually believe that Microsoft works directly against OSS efforts in .Net. I think the biggest issues simply arise from how utterly dominant a mindshare Microsoft has within the .Net community, and that’s the main thing that’s really different about .Net versus other communities.

Even setting the lack of attention and publicity issues aside, I think that overall, the .Net community is probably more conservative about adopting new ideas or tools than other communities. I think that also explains why many software development concepts aren’t widely adopted by .Net developers until they’ve been promoted, supported, or endorsed by Microsoft themselves.

But Microsoft themselves use Newtonsoft!

So? Everything does, and that’s why Newtonsoft binding conflicts were such an awful mess a couple years ago. I’d guess that any considerably sized .Net system has at least 3-4 privately merged copies of Newtonsoft.Json somewhere.

I’m just bitter

“You’re just bitter about FubuMVC” – I’m somewhat guilty on that score (human nature and all), but my main point was just to say that it’s highly unlikely you can succeed with an OSS tool that directly competes with an established MS offering. There were a whole raft of things I did wrong with fubu, but competing directly against ASP.Net MVC probably put our ceiling as a successful OSS project pretty low right out of the gate.

I’ll have to admit that it irritates me when I see folks all excited about something new in ASP.Net MVC Core that is very similar to something we had in FubuMVC years previously. A healthier way to look at it is that sometimes it has been nice to see our design ideas validated by similar designs in the real MS .Net tools.

What I think is and is not better about .Net OSS these days

I meant to write this a couple months ago after I made a sarcastic remark about how saying “I take pull requests” was a fast way to end a conversation with .Net developers. That got retweeted quite a bit, but honestly, my experiences doing OSS work in .Net have gotten so much more positive in the last couple of years that I thought this post was warranted.

There’s a never ending stream of consternation and angst about the state of OSS development inside of the .Net community — and yes, I’ve been a part of that from time to time. One common perception is that OSS is not as vibrant in .Net as it is in many other technology stacks. Another complaint that I see frequently (and make) is that there sometimes seems to be a lot of “not invented here” type duplication between OSS projects in .Net.

A lot of that is probably true, and I think unlikely to ever completely change, but working on .Net OSS projects now is definitely far better than it was just a few years ago.

A quick rundown of my experiences and involvement for some context:

  • StructureMap has been around since 2004, and probably been widely used since about 2007/8. For whatever reasons, StructureMap has actually enjoyed a pretty significant uptick in contributions and development over the past couple years.
  • Marten is about six months old, but it’s already the busiest OSS community I’ve been a part of
  • Storyteller doesn’t have a lot of usage yet, but the early feedback has been encouraging
  • FubuMVC was a great learning experience and directly responsible for my current position and several personal relationships with other developers, but was otherwise a massive failure as an OSS project and a tremendous opportunity cost for me

What I think is definitely better now

The most important difference in being an OSS author now as opposed to even just a few years ago is how much more pleasant my interactions are with users. In the past, it was all too common to deal with endless belligerence and an annoying sense of entitlement from users who were probably much more used to leaning on official support lines and documentation from software vendors. I still get a little bit of that once in a while, but lately I’m seeing a lot more of:

  • Users acting as collaborators with me to solve their issues
  • Getting much clearer reproduction steps for their problems, often including failing unit tests to demonstrate their issues
  • Patience and pleasantness from users needing help
  • A willingness to take on pull request work to fix their issues and contribute back to the project
  • An obvious sense that the project is a community effort
  • Users helping each other in our Gitter rooms before I have a chance to respond

I think that GitHub and its like have definitely made OSS development a much more positive experience. I’d even say that Microsoft’s recent adoption and de facto endorsement of GitHub have definitely made git and GitHub skills much more common throughout the .Net community — and that in turn has made it much more common for users to be able to contribute back to OSS projects.

At least for Marten, we’ve been able to attract quite a bit of early interest and several active contributors and early adopters. It’s easily been the most positive OSS experience I’ve ever had.

I think it’s helped a bit that more .Net developers are comfortable and familiar with automated build and testing tools. I rarely see pull requests submitted without tests anymore, but that would have been a common occurrence a few years ago.

What’s still not great about .Net OSS

  • A large number of shops will simply not adopt anything that isn’t an official tool from Microsoft
  • I think that Microsoft still stomps all over the existing OSS projects in .Net by building direct competitors or not taking any cognizance of existing tools when they plan .Net framework changes. I should say that I don’t think this is purposeful, but I’ve never gotten the sense that the teams in Redmond have much awareness or visibility into the .Net OSS ecosystem
  • The unusual status of the .Net community being so centered on Microsoft and the tools that they provide out of the box just doesn’t leave enough oxygen for community OSS projects
  • I think that the forthcoming CoreCLR / DNX / New Name? tooling is probably going to make a lot of the existing OSS projects go extinct. The runtime is very different in some aspects (it wasn’t that bad to move StructureMap up to the CoreCLR, but Storyteller will be much more work and Marten cannot work yet on the DNX runtime as we wait on Roslyn work). Even on top of waiting for your dependencies to support CoreCLR, I think that there won’t be enough interest or activity to bring a lot of tools forward. .Net OSS might be a lot better in the long term as .Net simply becomes a better platform for development
  • I think that ASP.Net Core and related efforts are going to take most of the .Net community’s attention for quite awhile
  • It’s still hard to get visibility for .Net OSS projects and that in turn makes it unlikely that a community coalesces around any one project. This is my working theory for why there are so many duplicated projects in .Net
  • There aren’t many venues, groups, or conferences dedicated to OSS efforts in .Net. Most .Net technical conferences tend to skew heavily toward Microsoft offerings
  • I think the MVP program skews the community toward official Microsoft tools and away from OSS projects


What about Microsoft being so much more involved in OSS themselves?

I honestly don’t know. It’s making many more developers aware of OSS in general, but I don’t know that it will really encourage folks to be more involved in community projects.

In Summary

My advice is yes, definitely get involved in .Net OSS projects or start your own if you enjoy doing that kind of work, you want to expand your technical skills, or want to make yourself more marketable. If you are making a big bet on the success of that project or have high hopes for it to be widely adopted, I think I’d advise you to set your sights much lower. You also have to realize that it’s very difficult for .Net OSS projects to gain visibility and widespread adoption.

I’d also discourage anyone from trying to compete directly with any tool from Microsoft itself. It can be done, but that’s an uphill slog. It’s probably a lot easier to either find niches where Microsoft tooling has no real entry, or focus on projects that try to add value to mainstream Microsoft .Net tooling. Then hope that Microsoft doesn’t start building their own version of your tool and send one of their celebrity programmers to conferences to sell it;)


Optimizing Marten Performance by using “Include’s”

Continuing my series of short blog posts about concepts or new features in Marten, this time I’m going to show how to use the brand new IQueryable<T>().Include() feature to improve performance by reducing the number of network round trips to the underlying Postgresql database. Check out the Marten tag for related blog posts.

In one of the formative experiences of my early software career, our instructor repeated the phrase “network round trips are evil” as a kind of mantra. The point, as I quickly learned then and plenty of times later, is that a “chatty” system making lots of network round trips between boxes can be very slow if you’re not careful.

To that end, many of our early adopters and would-be adopters said that they would switch to Marten if only we had the Include() feature from RavenDb. The point of this feature is to reduce network round trips to the database server from your application by being able to fetch related documents at the same time.

Jumping right to a concrete example, let’s say that your domain has two document types, one called “Issue” and another called “User.” In this case, an Issue may have a logical link to the assigned user responsible for addressing the issue, partially shown below:

    public class Issue
    {
        // The AssigneeId would be the Id of the
        // related User document
        public Guid? AssigneeId { get; set; }

    }

If I want to load an Issue by its Id, but also get the assigned User at the same time, I can use Marten’s new “Include()” feature inspired by RavenDb and NHibernate’s QueryOver mechanism:

[Fact]
public void simple_include_for_a_single_document()
{
    var user = new User();
    var issue = new Issue {AssigneeId = user.Id, Title = "Garage Door is busted"};

    theSession.Store<object>(user, issue);
    theSession.SaveChanges();

    using (var query = theStore.QuerySession())
    {
        User included = null;
        var issue2 = query.Query<Issue>()
            // Using the call below, Marten will execute
            // the supplied callback to pass back the related
            // User document assigned to the Issue
            .Include(x => x.AssigneeId, x => included = x)
            .Where(x => x.Title == issue.Title)
            .Single();

        included.ShouldNotBeNull();
        included.Id.ShouldBe(user.Id);

        issue2.ShouldNotBeNull();

        // All of this was done with exactly one call to Postgresql
        query.RequestCount.ShouldBe(1);
    }
}

The actual SQL statement sent to Postgresql in the code above would be:

select d.data, d.id, assignee_id.data, assignee_id.id from public.mt_doc_issue as d INNER JOIN public.mt_doc_user as assignee_id ON CAST(d.data ->> 'AssigneeId' as uuid) = assignee_id.id where d.data ->> 'Title' = :arg0 LIMIT 1

In the code above, I’m using the fairly new “Include()” statement to direct Marten to fetch the related User document at the same time it’s retrieving the Issue. We deviated somewhat from RavenDb in this feature. Instead of just adding the included documents to the internal identity map and expecting the user to just “know” that they are cached, we opted to make the included documents accessible to the caller in one of three ways:

  1. Passing a callback function into the Include() method
  2. Passing an IList<T>, where T is the included document type, into Include(). In this case, Marten will fill the list with all the included documents found (see the sketch after this list).
  3. Passing an IDictionary<TKey, T> into the Include() method that Marten will fill with the included documents, keyed by their Id
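Here’s a quick sketch of the list-receiving flavor from #2, assuming the overload mirrors the callback version shown earlier:

var assignees = new List<User>();

using (var query = theStore.QuerySession())
{
    // Marten fills the list with each related User document it
    // finds while fetching the Issues, still in a single round trip
    var issues = query.Query<Issue>()
        .Include(x => x.AssigneeId, assignees)
        .ToList();
}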


Since the pace of development on Marten is temporarily outpacing my efforts at keeping the documentation website completely up to date, the best resource for seeing what’s possible with the Include() functionality is our acceptance tests.

Let me end with a couple salient points about the new Include() functionality:

  • The included documents are resolved through the internal identity map of the current session, so there will not be any duplicates from repeated documents. Think about the case of fetching 100 Issue’s that are all assigned to one of 5 different User’s. In this case, only the 5 recurring User documents would be returned.
  • You can do multiple Include()’s on one query
  • The Include() functionality is available in the batched query feature
  • This will be a topic for another post, but Marten already supports the creation of foreign key relationships between documents
  • By default, Marten uses an outer join to fetch the included documents. I didn’t show it above, but there is also an optional argument in the Include() method you can use to force Marten to use a more efficient inner join — but just remember that means that nothing will be returned in the case of a NULL Issue.AssigneeId in the example above.


For my next blog post, I’ll talk about our brand new as-of-this-morning “Compiled Query” feature.

Select Projections in Marten

When I did the DotNetRocks episode on Marten a while back, they asked me what I thought the biggest holes in Marten’s functionality were and where we would take it next. The first thing that came to my mind was the “read side.” By that I meant built-in functionality to transform the raw documents stored with Marten into the shape needed for API’s, business functionality, and web pages. Fortunately, the very latest versions of Marten add some important functionality to support the Linq Select() keyword for fetching document data with transformations.

See CQRS from Martin Fowler for more context on where and how the “read side” fits into a software architecture.

To make this concrete, let’s say that you only really want one property or field of a stored document. That one field can now be fetched without incurring the cost of deserializing the raw JSON of the whole document. Instead, we’ll just fetch our one field:

        [Fact]
        public void use_select_in_query_for_one_field_and_first()
        {
            theSession.Store(new User { FirstName = "Hank" });
            theSession.Store(new User { FirstName = "Bill" });
            theSession.Store(new User { FirstName = "Sam" });
            theSession.Store(new User { FirstName = "Tom" });

            theSession.SaveChanges();

            theSession.Query<User>().OrderBy(x => x.FirstName).Select(x => x.FirstName)
                .First().ShouldBe("Bill");

        }

Maybe not *that* commonly useful, so what if you want to select a subset of a large document? If you want to select into a smaller type, you can use code like this:

[Fact]
public void use_select_with_multiple_fields_to_other_type()
{
    theSession.Store(new User { FirstName = "Hank", LastName = "Aaron" });
    theSession.Store(new User { FirstName = "Bill", LastName = "Laimbeer" });
    theSession.Store(new User { FirstName = "Sam", LastName = "Mitchell" });
    theSession.Store(new User { FirstName = "Tom", LastName = "Chambers" });

    theSession.SaveChanges();

    var users = theSession.Query<User>().Select(x => new User2{ First = x.FirstName, Last = x.LastName }).ToList();

    users.Count.ShouldBe(4);

    users.Each(x =>
    {
        x.First.ShouldNotBeNull();
        x.Last.ShouldNotBeNull();
    });
}

In the case above, you are selecting the User document data into another type called “User2.” That’s great of course, but you can also bypass the need for a custom class and just select straight to an anonymous type like this:

[Fact]
public void use_select_with_multiple_fields_in_anonymous()
{
    theSession.Store(new User { FirstName = "Hank", LastName = "Aaron"});
    theSession.Store(new User { FirstName = "Bill", LastName = "Laimbeer"});
    theSession.Store(new User { FirstName = "Sam", LastName = "Mitchell"});
    theSession.Store(new User { FirstName = "Tom", LastName = "Chambers"});

    theSession.SaveChanges();

    var users = theSession.Query<User>().Select(x => new {First = x.FirstName, Last = x.LastName}).ToList();

    users.Count.ShouldBe(4);

    users.Each(x =>
    {
        x.First.ShouldNotBeNull();
        x.Last.ShouldNotBeNull();
    });
}

Finally, if all you need to do is stream the raw JSON data of your transformed documents straight to the HTTP response in a web request, you can skip the unnecessary JSON deserialization and serialization and probably achieve much better throughput in your web application by selecting straight to a JSON string:

[Fact]
public void use_select_to_another_type_and_to_json()
{
    theSession.Store(new User { FirstName = "Hank" });
    theSession.Store(new User { FirstName = "Bill" });
    theSession.Store(new User { FirstName = "Sam" });
    theSession.Store(new User { FirstName = "Tom" });

    theSession.SaveChanges();

    theSession.Query<User>().OrderBy(x => x.FirstName).Select(x => new UserName { Name = x.FirstName })
        .ToListJson()
        .ShouldBe("[{\"Name\" : \"Bill\"},{\"Name\" : \"Hank\"},{\"Name\" : \"Sam\"},{\"Name\" : \"Tom\"}]");
}

In the code above, the JSON of the User document is transformed by Postgresql itself and returned to the caller as a single JSON string.

Not that you’d necessarily want to do a lot of this by hand (though you always could), the SQL generated above can be found and printed with this:

    var command = theSession
        .Query<User>()
        .OrderBy(x => x.FirstName)
        .Select(x => new UserName {Name = x.FirstName})

        // ToCommand() is a Marten specific extension
        // we use as a diagnostic tool to understand
        // how Marten is treating any given Linq expression
        .ToCommand(FetchType.FetchMany);

    Console.WriteLine(command.CommandText);

That gives us this generated SQL:

select json_build_object('Name', d.data ->> 'FirstName') as json from public.mt_doc_user as d order by d.data ->> 'FirstName'

What’s left to do?

We’ve had a couple bugs with our Select() support from early adopters with permutations I hadn’t thought to test beforehand, so I’m pretty sure that there are more things to iron out. The best way to solve that problem is to just try to get more early users to beat on it and find what’s still missing.

The Select() support today doesn’t yet include any transformations of child collection data or transforming data within a JSON document. We’d probably also want to support selecting data straight out of child collections as well.

The big new thing in our “read side” repertoire is going to be calculated projections. You can follow that work on GitHub.

The Identity Map Pattern in Marten

I’m still a believer in learning and discussing design patterns — even though everyone has seen a naive architect of some kind write stupidly over-engineered code with every possible GoF buzzword. That being said, there’s some significant value in having an industry-wide understanding of common coding solutions, and the design pattern names should make it much easier to find information about prior art online. As an aside, I hate it when developers online make an argument against any particular tool or technique because “one time they were on a project where it sucked” without any thought about how it was used or whether the problem just wasn’t a good fit for that tool or technique. I think it’s sloppy thinking.

The Identity Map pattern is an important conceptual design pattern underneath many database and persistence libraries, including Marten. I think it’s important to understand because the usage of an identity map can help performance in some cases, hurt your system’s memory utilization in other cases, and quite potentially prevent data integrity and consistency issues.

As usual, I’ll pull the definition of the Identity Map pattern from Martin Fowler’s PEAA book:

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

The purpose of using an identity map is to avoid accidentally making multiple copies of a loaded entity in memory. In the case of an operation complex enough to be handled by multiple collaborator classes or functions, it is frequently valuable to share an identity map between the collaborators to prevent unnecessarily fetching the exact same data from the database more than once.
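Stripped down to its essence, the pattern is nothing more than a map of loaded objects keyed by id. A bare-bones, generic sketch of the concept (not Marten’s implementation) might be:

public class IdentityMap<TId, TDoc> where TDoc : class
{
    private readonly IDictionary<TId, TDoc> _documents = new Dictionary<TId, TDoc>();

    public TDoc Get(TId id, Func<TId, TDoc> load)
    {
        TDoc document;
        if (!_documents.TryGetValue(id, out document))
        {
            // Only hit the database on the first request for this id.
            // Every subsequent request gets the very same object back
            document = load(id);
            _documents[id] = document;
        }

        return document;
    }
}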

To see this in action, consider the following usage in Marten. Let’s say that we have a document called “User” that is identified by a surrogate Guid property. If I open up a new document session with the default configuration, you would see this behavior:

public void using_identity_map()
{
    var container = Container.For<DevelopmentModeRegistry>();
    var store = container.GetInstance<IDocumentStore>();

    var user = new User {FirstName = "Tamba", LastName = "Hali"};
    store.BulkInsert(new [] {user});

    // Open a document session with the identity map
    using (var session = store.OpenSession())
    {
        // Loading a user with the same Id will return the very same object
        session.Load<User>(user.Id)
            .ShouldBeTheSameAs(session.Load<User>(user.Id));

        // And to make this more clear, Marten is only making a single
        // database call
        session.RequestCount.ShouldBe(1);
    }
} 

In our applications at work, the IDocumentSession (“session” above) that wraps the internal identity map (and unit of work too) would usually be scoped to a web request in HTTP applications and to a single message in our service bus applications. We do this so that different pieces of middleware code and message handlers are all using the same identity map to avoid double loading or inconsistent state.


Automatic Dirty Checking

A heavier weight flavor of identity map is one that does automatic “dirty checking” to know which documents loaded through the IDocumentSession have been changed in memory and should therefore be persisted when the session is saved.

[Fact]
public void when_querying_and_modifying_multiple_documents_should_track_and_persist()
{
    var user1 = new User { FirstName = "James", LastName = "Worthy 1" };
    var user2 = new User { FirstName = "James", LastName = "Worthy 2" };
    var user3 = new User { FirstName = "James", LastName = "Worthy 3" };

    theSession.Store(user1);
    theSession.Store(user2);
    theSession.Store(user3);

    theSession.SaveChanges();

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>().Where(x => x.FirstName == "James").ToList();

        // Mutating each user
        foreach (var user in users)
        {
            user.LastName += " - updated";
        }

        // Persisting the session will save all the documents
        // that have changed
        session2.SaveChanges();
    }

    using (var session2 = CreateSession())
    {
        var users = session2.Query<User>()
            .Where(x => x.FirstName == "James")
            .OrderBy(x => x.LastName).ToList();

        // Just proving out that every User was persisted
        users.Select(x => x.LastName)
            .ShouldHaveTheSameElementsAs("Worthy 1 - updated", "Worthy 2 - updated", "Worthy 3 - updated");
    }
}

In the usage above, I never had to explicitly mark which User objects had been changed. In this type of session, Marten is tracking the raw JSON used to load each document. At the time SaveChanges() is called, Marten will compare the current document state to the original, loaded state by doing a logical comparison of the JSON structure (it’s inevitably using Newtonsoft.Json under the covers to do the comparison of the JSON data, but you already guessed that).
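The concept of that dirty check can be sketched out with a small amount of Newtonsoft.Json code. This is the general idea, not Marten’s exact implementation:

using System;
using Newtonsoft.Json.Linq;

public class TrackedDocument
{
    private readonly JObject _original;

    public object Document { get; private set; }

    public TrackedDocument(string originalJson, object document)
    {
        // Hang on to the JSON that the document was originally loaded from
        _original = JObject.Parse(originalJson);
        Document = document;
    }

    public bool IsDirty(Func<object, string> serialize)
    {
        var current = JObject.Parse(serialize(Document));

        // A logical comparison of the JSON structures rather than
        // a reference or raw string comparison
        return !JToken.DeepEquals(_original, current);
    }
}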

Some of our users really like the convenience of the automatic dirty checking, but other times you’ll definitely want to forgo the heavier, more memory and processor intensive version of the identity map in favor of lightweight sessions as shown in the next section.

A favorite ritual of my childhood was rooting hard for the Showtime Lakers every summer in the NBA finals while my Dad was all about the Larry Bird/Kevin McHale/Robert Parish Celtics. Somewhere or another there’s a good chunk of the roster of the ’85 Lakers as test data in most of the projects I work on.

Opting out of the Identity Map

Veteran users of RavenDb are probably painfully aware of how fetching a large amount of data can quickly blow up your system’s memory usage by keeping so much of the raw JSON structures and pointers to the loaded objects in memory (if you use the default configuration). Because of this all too frequent problem with RavenDb usage, we designed Marten to make it as easy and declarative as possible to use lightweight sessions or pure query sessions that have no identity map or automatic dirty tracking, like so:

            // Opened from an existing IDocumentStore called "store"
            using (var session = store.LightweightSession())
            {

            }

            // A lightweight, readonly session 
            using (var query = store.QuerySession())
            {

            }

Likewise, we made the very heavyweight, automatic dirty tracking flavor of a document session “opt in” so that this option doesn’t shoot unsuspecting users in the foot.

Marten does not yet support any kind of “Evict()” mechanism to remove previously loaded documents from the underlying identity map. To date, my philosophy is to give users easier access to the lightweight sessions to sidestep the whole issue of needing to evict documents manually.

What about Queries?

You might notice that all of my examples of the identity map behavior used the IDocumentSession.Load<T>(id) method to load a single document by its id. In this usage, a Marten document session first checks its internal identity map to see if that document has already been loaded. If not, the session will load the document and save it to the underlying identity map.

Great, but you’re likely asking “what about Linq queries?” We introduced the identity map mechanics fairly early in Marten and run all queries through the identity map caching, but the Linq query works by returning a data reader of a document’s Id and the raw, persisted JSON. As Marten reads through the results of a data reader, for each row it will call the following method in Marten’s internal IIdentityMap interface:

// This method would either return an existing document
// with the id, or deserialize the JSON into a new
// document object and store that in the identity map
T Get<T>(object id, string json) where T : class;

Using a Linq query can therefore result in fetching the raw JSON data multiple times for the same document, but the identity map still prevents duplication of documents and unnecessary deserialization at runtime.
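For illustration, a simplistic implementation of a method like that could look like the code below. The deserialization delegate is a stand-in for Marten’s serializer, so treat this as a sketch rather than Marten’s actual code:

using System;
using System.Collections.Concurrent;

public class SimpleIdentityMap
{
    private readonly ConcurrentDictionary<Type, ConcurrentDictionary<object, object>> _objects
        = new ConcurrentDictionary<Type, ConcurrentDictionary<object, object>>();

    private readonly Func<Type, string, object> _deserialize;

    public SimpleIdentityMap(Func<Type, string, object> deserialize)
    {
        _deserialize = deserialize;
    }

    public T Get<T>(object id, string json) where T : class
    {
        var dictionary = _objects.GetOrAdd(typeof(T),
            type => new ConcurrentDictionary<object, object>());

        // Only pay the deserialization cost when the document
        // isn't already in the identity map
        return (T)dictionary.GetOrAdd(id, key => _deserialize(typeof(T), json));
    }
}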


Natural versus Surrogate Keys

Using document databases over the past 5 or so years has changed my old attitudes toward the choice of natural database keys (some piece of data that has actual meaning, like names) versus surrogate keys like Guid’s or sequential numbers. 5-10 years ago, in the days of heavyweight ORM’s like NHibernate (or today if you’re a mainstream .Net developer using tools like EF), I would have been adamantly in favor of only using surrogate keys. One, because a natural key can change, and it can be clumsy to modify the primary key of a relational database table. Two, because using a surrogate key meant that you could adopt some kind of layer supertype for all of your entities that would allow you to centralize and reuse a lot of your application’s workflow for typical CRUD operations.

Today however, I think there are such valuable performance advantages to being able to efficiently load documents by their natural identifier through an identity map that this choice is no longer so clear cut. Take the example of a document representing a “User.” At login time or even after authentication, you most likely have the user name, but not necessarily any kind of Guid representing that user. If we modeled the “User” document with the login name as a natural key, we can efficiently load user documents by that user name.
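A sketch of what that might look like, assuming a User document that simply uses the login name as its identity:

public class User
{
    // The login name doubles as the document's primary key
    public string Id { get; set; }

    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// At login time we already have the user name, so we can hit the
// primary key (and the identity map) directly instead of querying
// against a field buried inside of the document:
// var user = session.Load<User>("jeremy.miller");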

The example above isn’t the slightest bit contrived. Rather, it’s exactly the mistake I made when I designed the RavenDb-backed persisted membership feature of FubuMVC that is still in a couple of our systems at work. In our case, we have to load a user by querying by the user name we have instead of the Guid surrogate key that we don’t know upfront. That’s not that big of a deal with Postgresql-backed Marten, but it became a significant problem for us with RavenDb because it forces RavenDb to load the document by using a read-side index, which is a less efficient mechanism than loading by id. In this case, we could have had a more efficient login identity solution if I’d broken away from the “old think” belief in the primacy of surrogate keys in all situations. Lesson learned.

The Unit of Work Pattern in Marten

Design patterns got a bad rap when they were horrendously overused in the decade-plus following the publication of the Gang of Four book. I’m still a believer in learning design patterns. I even think it’s valuable to know and understand the “official” names for patterns. One, because it’s really nice for other developers to understand the jargon that you’re using to describe possible solutions, but mostly so that you can easily go Google for a lot more information about how, when, and when not to use a pattern later as you need to know more.

That being said, I think it’s useful for new users of Marten to understand the old “Unit of Work” design pattern, both conceptually and in how Marten implements the pattern.

Jeremy’s Law of Nerd-Rage: a group of developers being enthusiastic about anything related to development will eventually make a second group of developers angry and cause a backlash.


Unit of Work

From Martin Fowler’s PEAA book:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

The unit of work pattern is a simple way to collect all the changes to a backing data store you need to be committed in a single transaction. It’s especially useful if you need to have several unrelated pieces of code collaborating on the same logical transaction. If you just pass around a “unit of work” container for the changes to each “worker” object or function, you can keep the various things completely decoupled from each other and still collaborate on a single business transaction.

In Marten, the unit of work is buried behind our IDocumentSession interface (that might change later). When you register changes to an IDocumentSession (inserts, updates, deletes, appending events, etc.), these changes are only staged until the SaveChanges() method is called to persist all the pending changes in one database transaction.

You can see Marten’s unit of work mechanics in action in the code sample below:

// theStore is a DocumentStore
using (var session = theStore.OpenSession())
{
    // All I'm doing here is recording references
    // to all the ADO.Net commands executed by
    // this session
    var logger = new RecordingLogger();
    session.Logger = logger;

    // Insert some new documents
    session.Store(new User {UserName = "luke", FirstName = "Luke", LastName = "Skywalker"});
    session.Store(new User {UserName = "leia", FirstName = "Leia", LastName = "Organa"});
    session.Store(new User {UserName = "wedge", FirstName = "Wedge", LastName = "Antilles"});


    // Delete all users matching a certain criteria
    session.DeleteWhere<User>(x => x.UserName == "hansolo");

    // deleting a single document by Id, if you had one
    session.Delete<User>(Guid.NewGuid());

    // Persist in a single transaction
    session.SaveChanges();

    // All of this was done in one batched command
    // in the same transaction
    logger.Commands.Count.ShouldBe(1);

    // I'm just writing out the Sql executed here
    var sql = logger.Commands.Single().CommandText;
    new FileSystem()
        .WriteStringToFile("unitofwork.sql", sql);

}

All of the database changes above are made in a single database call within one transaction (I added extra new lines to make it readable):

select mt_upsert_user(doc := :p0, docId := :p1);
select mt_upsert_user(doc := :p2, docId := :p3);
select mt_upsert_user(doc := :p4, docId := :p5);
delete from mt_doc_user as d where d.data ->> 'UserName' = :arg6;
delete from mt_doc_user where id=:p6;

Typically, I would recommend using an IDocumentSession per HTTP request or within the handling of a single service bus message, and we build our basic infrastructure around this concept. You still need an easy way to use explicit transaction boundaries with a new IDocumentSession on demand. As for how to consume IDocumentSession’s and how to scope them for transactional boundary management, I wrote about our transaction management strategies with RavenDb a couple years ago, and my intention is that we’ll use the same conceptual setup with Marten at work.

Marten on DotNetRocks

Remarkably coincidental with the new Marten v0.8 release, the DotNetRocks guys just published a podcast talking with me about Marten. Give it a shot if you wanna learn a little bit about the motivation behind Marten, using Postgresql for .Net development, and using “polyglot persistence.”

I think the DNR guys were trying to get me to trash talk a little bit about Marten versus its obvious competitor, so how about just saying that I feel like Marten will lead to more developer productivity than using Entity Framework or micro-ORM’s and better performance and reliability than existing NoSQL solutions for .Net.

Marten Gets Serious with v0.8

EDIT: DotNetRocks just published a new episode about Marten.

I just pushed a new v0.8 version of Marten to Nuget.org with a handful of new features, a lot of bugfixes, and a slew of refinements in response to feedback from our early adopters. I feel like this release added some polish to Marten, and I largely attribute that to how helpful our community has been in making suggestions and contributing pull requests based on their early usage.

The documentation site has been updated to reflect the new changes and additions to Marten for v0.8. The full list of changes and bug fixes for v0.8 is available in the GitHub issue history for this milestone.

Release Highlights

  1. Marten got a lot more sophisticated about how it updates schema objects in the underlying Postgresql database. See the changes to AutoCreateSchemaObjects and the ability to update a table for additional “searchable” columns instead of dropping and recreating tables in the documentation.
  2. “Bulk deletes” by a query, without having to first load the documents you’re deleting
  3. The Linq parsing support got worked over pretty hard, including some new extensibility for Linq query support.
  4. New mechanisms for instrumentation and diagnostics within Marten. I’ll have a blog post next week about how we’re going to integrate with this at work for profiling database activity in our web and service bus apps.
  5. A lot of work to integrate the forthcoming event store functionality. I left this out of the documentation updates for now, but there’ll be a lot more about this later.


What’s Up Next?

There’s some preliminary planning for Marten v0.9 on GitHub. The very next feature I’m tackling is going to be our equivalent to RavenDb’s Include() feature (but I think we’re going to do it differently). Other than that, I think the theme of the next release is going to be addressing the “read side” of Marten applications with view projections by finally supporting “Select()” in the Linq support and some mix of Javascript or .Net projections.

A note on the versioning

While I am a big believer in semantic versioning (or at least I think it’s far better than having nothing), Marten is still pre-1.0 and a few public API’s did change. We haven’t discussed a timeline for the all-important “1.0” release, but I’m hopeful that that can happen by this summer (2016).