Optimistic Concurrency with Marten

The Marten community has been making substantial progress toward a potential 1.0-alpha release early next week. As part of that effort, I’m going to be blogging about the new changes and features.

Recent versions of Marten (>0.9.5) have a new feature that allows you to enforce offline optimistic concurrency checks against documents that you are attempting to persist. You would use this feature if you’re concerned about a document in your current session having been modified by another session since you originally loaded the document.

I first learned about this concept from Martin Fowler’s PEAA book. From his definition, offline optimistic concurrency:

Prevents conflicts between concurrent business transactions by detecting a conflict and rolling back the transaction.

In Marten’s case, you have to explicitly opt into optimistic versioning for each document type. You can do that with either an attribute on your document type like so:

    [UseOptimisticConcurrency]
    public class CoffeeShop : Shop
    {
        // Guess where I'm at as I code this?
        public string Name { get; set; } = "Starbucks";
    }

Or by using Marten’s configuration API to do it programmatically:

    var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);

        // Configure optimistic concurrency checks
        _.Schema.For<CoffeeShop>().UseOptimisticConcurrency(true);
    });

Once optimistic concurrency is turned on for the CoffeeShop document type, a session will now only be able to update a document if the document has been unchanged in the database since it was initially loaded.

To demonstrate the failure case, consider the following acceptance test from Marten’s codebase:

[Fact]
public void update_with_stale_version_standard()
{
    var doc1 = new CoffeeShop();
    using (var session = theStore.OpenSession())
    {
        session.Store(doc1);
        session.SaveChanges();
    }

    var session1 = theStore.DirtyTrackedSession();
    var session2 = theStore.DirtyTrackedSession();

    var session1Copy = session1.Load<CoffeeShop>(doc1.Id);
    var session2Copy = session2.Load<CoffeeShop>(doc1.Id);

    try
    {
        session1Copy.Name = "Mozart's";
        session2Copy.Name = "Dominican Joe's";

        // Should go through just fine
        session2.SaveChanges();

        // When session1 tries to save its changes, Marten will detect
        // that the doc1 document has been modified and Marten will
        // throw an AggregateException
        var ex = Exception<AggregateException>.ShouldBeThrownBy(() =>
        {
            session1.SaveChanges();
        });

        // Marten will throw a ConcurrencyException for each document
        // that failed its concurrency check
        var concurrency = ex.InnerExceptions.OfType<ConcurrencyException>().Single();
        concurrency.Id.ShouldBe(doc1.Id);
        concurrency.Message.ShouldBe($"Optimistic concurrency check failed for {typeof(CoffeeShop).FullName} #{doc1.Id}");
    }
    finally
    {
        session1.Dispose();
        session2.Dispose();
    }

    // Just proving that the document was not overwritten
    using (var query = theStore.QuerySession())
    {
        query.Load<CoffeeShop>(doc1.Id).Name.ShouldBe("Dominican Joe's");
    }

}

Marten is throwing an AggregateException for the entire batch of changes being persisted from SaveChanges()/SaveChangesAsync() after rolling back the current database transaction. The individual ConcurrencyException’s inside of the aggregated exception expose information about the actual document type and identity that failed.

Schema Management with Marten (Why document databases rock)

This blog post describes the preliminary support and thinking behind how we’ll manage schema changes to production using Marten. I’m writing this to further the conversation one of our teams is having about how best to accomplish this. Expect the details of how this works to change after we face real world usages for awhile;)

Some of our projects at work are transitioning from RavenDb to an OSS project I help lead called Marten that uses Postgresql as a fully fledged document database. Among the very large advantages of document databases over relational databases is how much simpler it is to evolve a system over time because it takes so much less mechanical work to keep your document database synchronized to the application code.

Exhibit #1 in the case against relational databases is the need for laboriously tracking database migrations (assuming you give a damn about halfway decent software engineering in regards to your database).

Let’s compare the steps in adding a new property to one of your persisted objects in your system. Using a relational database with any kind of ORM (even if it describes itself as “micro” or “simple”), your steps in some order would be to:

Add the new property
Add a migration script that adds a new column to your database schema
Change your ORM mapping or SQL statements to reflect the new property

Using a document database approach like Marten’s, you’d:

Add the new property and continue on with your day

Notice which list is clearly shorter and simpler — not to mention less error prone for that matter.

Marten does still need to create matching schema objects in your Postgresql database, and it’s unlikely that any self-respecting DBA is going to allow your application to have rights to execute schema changes programmatically, so we’re stuck needing some kind of migration strategy as we add document types, Javascript transformations, and retrofit indexes. Fortunately, we’ve got a decent start on doing just that that’s demonstrated below:

Just Get Stuff Done in Development!

As long as you have rights to alter your Postgresql database, you can happily set up Marten in one of the “AutoCreate” modes and not worry about schema changes at all as you happily code new features and change existing document types:

var store = DocumentStore.For(_ =>
{
    // Marten will create any new objects that are missing,
    // attempt to update tables if it can, but drop and replace
    // tables that it cannot patch. 
    _.AutoCreateSchemaObjects = AutoCreate.All;


    // Marten will create any new objects that are missing or
    // attempt to update tables if it can. Will *never* drop
    // any existing objects, so no data loss
    _.AutoCreateSchemaObjects = AutoCreate.CreateOrUpdate;


    // Marten will create missing objects on demand, but
    // will not change any existing schema objects
    _.AutoCreateSchemaObjects = AutoCreate.CreateOnly;
});

As long as you’re using a permissive auto creation mode, you should be able to code in your application model and let Marten change your development database as needed behind the scenes.

Patching Production Databases

In the next section, I demonstrate how to dump the entire data definition language (DDL) that matches your Marten configuration as if you were starting from an empty database, but first, I want to focus on how to make incremental changes between production or staging releases.

In the real world, you’re generally not going to allow your application to willy nilly make changes to the running schema and you’ll be forced into this setting:

var store = DocumentStore.For(_ =>
{
    // Marten will not create or update any schema objects 
    // and throws an exception in the case of a schema object
    // not reflecting the Marten confi
guration
    _.AutoCreateSchemaObjects = AutoCreate.None;
});

This leaves us with the problem of how to get our production database matching however we’ve configured Marten in our application code. At this point, our theory is that we’ll use the “WritePatch” feature to generate delta DDL files:

IDocumentStore.Schema.WritePatch(string file);

When this is executed against a configured Marten document store, it will loop through all of the known document types, javascript transforms, the event store usage, and check the configured storage against the actual database. Marten writes two files, one to move your schema “up” to match the configured document store, and a second “drop” file that would rollback your database schema to reverse the changes in the “up” file.

The patching today is able to:

Add all new tables, indexes, and functions
Detect when a generated function has changed and rebuild it after dropping the old version
Determine which indexes are new or modified and generate the necessary DDL to match
Add the event store schema objects if they’re active and missing
Add the database objects Marten needs for its “Hilo” identity strategy

This is very preliminary, but my concept of how we’ll use this in real life (admittedly with some gaps) is to:

Use “AutoCreateSchemaObjects = AutoCreate.All” in development and CI and basically not worry at all about incremental schema changes.
For each deployment to staging or production, we’ll use the WritePatch() method shown above to generate a patch SQL file that will then be committed to Git.
I’m assuming that the patch SQL files generated by Marten could feed into a real database migration tool like RoundhousE, and we would incorporate RoundhousE into our automated deployments to execute the “up” scripts to the most current database version.

Dump all the Sql

If you just want a database script that will build all the necessary schema objects for your Marten configuration, you can export either a single file:

// Export the SQL to a file
store.Schema.WriteDDL("my_database.sql");

Or write a SQL file for each document type and functional area of Marten to a directory like this:

// Or instead, write a separate sql script
// to the named directory
// for each type of document
store.Schema.WriteDDLByType("some folder");

In the second usage, Marten also writes a file called “all.sql” that executes the constituent sql files in the correct order just in case you’re using Marten’s support for foreign keys between document types.

The SQL dumps from the two methods shown above will write out every possible database schema object necessary to support your Marten configuration (document types, the event store, and a few other things) including tables, the generated functions, indexes, and even a stray sequence or two.

Relational Databases are the Buggy Whips of Software Development

I think that there’s going to be a day when you tell your children stories about how we built systems against relational databases with ORM’s or stored procedures or hand written SQL and they’re going to be appalled at how bad we had it, much like I did when my grandfather told me stories about ploughing with a horse during the Great Depression.

My Mini NDC Oslo 2016 Wrapup

I had a great time at NDC Oslo last week and I’ve got to send a pretty big thank you to the folks behind the NDC conferences for what an outstanding job they did bringing it all together.

I got to see several old friends, meet some new folks, and generally have a blast interacting with other developers doing interesting work. I got to give a talk on using React and Redux to build a large application that I thought went pretty well (my audience was great and that always makes talks go much better for me). I’ll post the link to the video when that’s up.

As always, I was reminded that I like many folks far better in real life than I do their online personas — and I’ve received the same feedback about myself over the past decade too. Betting there’s some kind of deeper meaning to that and how we need to be more careful communicating on Twitter and trying to stop being so easily offended, but that’s too deep for a Monday morning for me;)

And as an “achievement unlocked,” I gave my entire conference talk without ever once opening Visual Studio.Net. I think being part of the de facto React.js track has to give me a touch of hipster cred.

So here’s what I saw and what stood out for me in terms of development trends:

Elixir

There was a lot of Elixir content and buzz. I’m not doing any code in it, but most of what I saw looks very positive. Erlang’s syntax has always thrown me off, but I could happily live with Ruby inspired syntax. I think that community is going to struggle just a bit by having to build everything (web frameworks, service bus, etc.) from scratch, and they maybe aren’t learning lessons from other communities as well as they could in their efforts.

I’ve got to say that Bryan Hunter‘s “Elixir for Node.js Developers” was one of my favorite sessions of the week.

Electron

I’m excited to try out Electron on a couple projects this year. I took in David Neal‘s talk on Electron and got to speak with some other folks that are using it. I’d love to turn the Storyteller 3 client and maybe a forthcoming admin tool for Marten into Electron apps. We have one big WPF application that is justifiably a desktop application, but we prefer to write user interfaces as web apps, and Electron could be a good long term solution for us.

On the state of .Net OSS, yet again

I’ve written too many belly gazing types of posts about the state of OSS in .Net lately (and a follow up), but that topic came up a couple times in Oslo. On the positive side, I had several folks come up and ask about Marten, a couple nice comments on StructureMap, and even a positive remark on FubuMVC.

On the negative side, I had to field the inevitable question about Marten in regards to its potential adoption in the greater .Net community: “is it really worth the effort?” It is from the perspective that my employer will benefit and I’m generally enjoying the work. I’m not suffering from any delusions about Marten taking off into the .Net mainstream — at least until it’s technically feasible to build Marten’s functionality on top of Sql Server.

One of the things I’ve said about the CoreCLR and ASP.Net Core efforts from Microsoft is that I think they’ll suck all the Oxygen out of the room for .Net OSS offerings for quite awhile, and NDC was a perfect example of that. The talks on anything related to CoreCLR were jam packed, and I really don’t remember there being much else .Net content to be honest. If you’re a .Net OSS enthusiast, I think I’d either restrict my efforts to add ons to MS’s work or just sit it out until the CoreCLR hype settles down.

Talking React & Redux at NDC Oslo Next Week

I’ll be at NDC Oslo next week to give a talk entitled An Experience Report from Building a Large React.js/Redux Application. While I’ll definitely pull some examples from some of our ongoing React.js projects at work, most of the talk will be about the development of the new React.js-based web application within the open source Storyteller 3 application.

Honestly, just to help myself start getting more serious about preparation, I’m thinking that I’ll try to cover:

How the uni-directional flow idea works in a React.js application
The newer things in ES2015 that I think make React components easier to write and read. That might be old news to many people, but it’s still worth talking about
Using React.js for dynamically determined layouts and programmatic form generation
The transition from an adhoc Flux-like architecture built around Postal.js and hand-rolled data stores to a more systematic approach with Redux.
Utilizing the combination of pure function components in React.js with the react-redux bridge package and what that means for testability.
The very positive impact of Redux in your ability to automate behavioral testing of your screens, especially when the application has to constantly respond to frequent messages from the server
Integrating websocket communication to the Redux store, including a discussion of batching, debouncing, and occasional snapshots to keep the server and client state synchronized without crushing the browser from too many screen updates.
Designing the shape of your Redux state to avoid unnecessary screen updates.
Composing the reducer functions within your Redux store — as in, how will Jeremy get out of the humongous switch statement of doom?
Using Immutable.js within your Redux store and why you would or wouldn’t want to do that

By no means is this an exhaustive look at the React/Redux ecosystem (I’m not talking about Redux middleware, the Redux specific development tools, or GraphQL among other things), but it’s the places where I think I have something useful to say based on my experiences with Storyteller in the past 18 months.

The last time I was at NDC (2009) I gave a total of 8 talks in 4 days. While I still hope to get to do a Marten talk while I’m there, I’m looking forward to not being a burnt out zombie this time around with the much lighter speaking load;)

My surprisingly positive take on .Net Core’s current direction

I’m not nearly as upset about the recent changes and churn in the direction of .Net Core as many of the folks I follow online. Mostly it’s because I was refusing to invest very much into it during the early stages and therefore, didn’t really lose much when the direction shifted. Honestly, my main thought after the changes in direction is how much less rework it’s going to take to move some of the tools I support to .Net Core and less work for me is always a win.

My thoughts, such as they are:

True cross platform .Net? As soon as JetBrains Rider has a test runner and supports .Net Core, my plan is to switch almost all of my day to day development work to the Mac side of things and keep my Windows VM off. Yes I know about Mono, but it never worked out very well for any project where I tried to use it.
Strong naming is going to have much less negative impact on day to day development (maybe none) with the changes to how .Net Core will resolve assemblies. As some of you may know, I feel like strong naming is a huge source of friction and holds back the .Net OSS ecosystem by adding extra cost to development through binding conflicts and all the extra work OSS developers have to do to (hopefully) shield their downstream users from potential binding conflict issues. In other words, the author of Newtonsoft.Json will no longer be the most hated person in the entire .Net world once binding conflicts go away. The OSS signing option and the VS.Net or Nuget kinda, sorta being able to write binding redirects for you were not sufficient solutions for the strong naming pain.
Finally getting working wildcard includes in whatever the CsProj file replacement is. I just finished resolving a merge conflict with a *.csproj file after rebasing an older branch and it’s such a pain in the neck. Another common source of friction in .Net development gone.
On the subject of AppDomain’s getting put back in for .Net Core, I have mixed feelings. Taking out AppDomain’s and Remoting was going to almost completely denude .Net of working automated testing tools. I know there’s some talk and work toward a lightweight AssemblyLoadContext that might be a replacement, but I’ve found very, very little information about it and most of that has been contradictory. I don’t really like messing with .Net Remoting and separate AppDomain’s, but I wasn’t looking forward to making some kind of .Net Core alternative from scratch.
I’ve seen other folks making the point that .Net is now going to avoid the nasty Python 2/3 style bifurcation of its entire ecosystem. I don’t think all the common OSS tools are going to be quickly moved to .Net Core because of people waiting for it to be stable, but now the mechanics of doing that is going to be much less.
On the demise of project.json and the new, hopefully cutdown csproj file, I suspect that there’s some pretty seriously harmful coupling in MSBuild itself. It should have been possible to use project.json as just a new configuration mechanism for the underlying MSBuild engine. Hopefully for all our sakes, they get those structural issues resolved in the new project file system. I definitely approve of their plans to decouple much more of the project system from Visual Studio.Net.
I would hope that the fallback to csproj files means that Paket development continues. I personally think that Paket does a better job from the command line than OOTB Nuget.
I like some of the new mechanics around the new “dotnet” CLI support. I think they did a nice job of taking some of the things I like about the Node.js/NPM ecosystem. I’ve never thought that the .Net teams at Redmond were all that great at innovating minus huge hits like Linq or Roslyn, but they are pretty good at adapting ideas from other communities.
On the communication and mismanaged expectation front? Yeah, I think they blew that one pretty badly, but it’s not the end of the world for most of us. I suspect the problem was due to the organization structure in Microsoft and the lack of collaboration between some of the groups — but that seems to be better now.
The static linker sounds cool, and having far easier mechanisms for supporting multiple versions of the .Net runtime is going to be great for the OSS projects that try to support everything. I’m not all that wild about microservices, but I think that the .Net Core/static linker/Kestrel combination would make .Net a lot more attractive for developing microservice architectures.

For my part, StructureMap already supports .Net Core. Marten is going to get .Net Core soon, except we’re going to punt on running unit tests in .Net Core for awhile due to our usage of NSubstitute and that not supporting .Net Core yet. Storyteller is a lot more complicated and I want things to settle down before I even think of doing that one. Since .Net Core is no longer all that different from existing .Net 4.5/6, our current thinking is to just restart FubuMVC work and slowly morph that into a new, much more efficient and far smaller framework.

Automated Testing of Message Based Systems

Technology wise, everything in this blog post is related to OSS projects you’re very unlikely to ever use (FubuMVC and Storyteller 3), but I’d hope that the techniques and ideas would still be useful in whatever technical stack you happen to be using. At a minimum, I’m hoping this blog post is useful to some of our internal teams who have to wrestle with the tooling and the testing scenarios in this blogpost. And who knows, we still might try to make a FubuMVC comeback (renamed as “Jasper”) if the dust ever does settle on .Net Core and ASP.Net Core.

We heavily utilize message-based architectures using service bus tools at work. We’ve also invested somewhat in automating testing against those message based integrations. Automating tests against distributed systems with plenty of asynchronous behavior has been challenging. We’ve come up with a couple key aspects of our architectures and technical tooling that I feel have made our automated testing efforts around message-based architectures easier mechanically, more reliable, and even create some transparency into how the system actually behaves.

My recipe for automating tests against distributed, messaging systems is:

Try to choose tools that allow you to “inject” configuration programmatically at runtime. As I’ll try to make clear in this post, a tight coupling to the .Net configuration file can almost force you into having to swallow the extra complexity of separate AppDomain’s
Favor tools that require minimal steps to deploy. Ideally, I love using tools that allow for fullblown xcopy deployment, at least in functional testing scenarios
Try very hard to be able to run all the various elements of a distributed application in a single process because that makes it so much simpler to coordinate the test harness code with all the messages going through the greater system. It also makes debugging failures a lot simpler.
At worst, have some very good automated scripts that can install and teardown everything you need to be running in automated tests
Make sure that you can cleanly shut your distributed system down between tests to avoid having friction from locked resources on your testing box
Have some mechanism to completely rollback any intermediate state between tests to avoid polluting system state between tests

Repeatable Application Bootstrapping and Teardown

The first step in automating a message-based system is simply being able to completely and reliably start the system with all the related publisher and subscriber nodes you need to participate in the automated test. You could demand a black box testing approach where you would deploy the system under test and all of its various services by scripting out the installation and startup of Windows services or *nix daemon processes or programmatically launching other processes to host various distributed elements from the command line.

Personally, I’m a big fan of using whitebox testing techniques to make test automation efforts much more efficient, meaning that I think

My strong preference however is to adopt architectures that make it possible for the test automation harness itself to bootstrap and start the entire distributed system, rather than depending on external scripts to deploy and start the application services. In effect, this has meant running all of the normally distributed elements collapsed down to the single test harness process.

Our common technical stack at the moment is:

FubuMVC 3 as both our web application framework and as our service bus
Storyteller 3 as our test automation harness for integrated acceptance testing
LightningQueues as a “store and forward” persistent queue transport. Hopefully you’ll hear a lot more about this tool when we’re able to completely move off of Esent and onto LightningDB as the underlying storage engine.
Serenity as an add-on library as a standard recipe to host FubuMVC applications in Storyteller with diagnostics integration

In all cases, these tools can be programmatically bootstrapped in the test harness itself. LightningQueues being “x-copy deployable” as our messaging transport has been a huge advantage over queueing tools that require external servers or installations. Additionally, we have programmatic mechanisms in LightningQueues to delete any previous state in the persistent queues to prevent leftover state bleeding between automated tests.

After a lot of work last year to streamline the prior mess, a FubuMVC web and/or service bus node application is completely described with a FubuRegistry class (here’s a link to an example), conceptually similar to the “Startup” concept in the new ASP.Net Core.

Having the application startup and teardown expressed in a single place makes it easy for us to launch the application from within a test harness. In our Serenity/Storyteller projects, we have a base “SerenitySystem” class we use to launch one or more applications before any tests are executed. To build a new SerenitySystem, you just supply the name of a FubuRegistry class of your application as the generic argument to SerenitySystem like so:

public class TestSystem : SerenitySystem<WebsiteRegistry>

That declaration above is actually enough to tell Storyteller how to bootstrap the application defined by “WebsiteRegistry.”

It’s not just about bootstrapping applications, it’s also about being able to cleanly shut down your distributed application. By cleanly, I mean that you release all system resources like IP ports or file locks (looking at you Esent) that could prevent you from being able to quickly restart the application. This becomes vital anytime it takes more than one iteration of code changes to fix a failing test. My consistent experience over the years is that the ability to quickly iterate between code changes and executing a failing test is vital for productivity in test automation efforts.

We’ve beaten this problem through standardization of test harnesses. The Serenity library “knows” to dispose the running FubuMVC application when it shuts down, which will do all the work of releasing shared resources. As long as our teams use the standard Serenity test harness, they’re mostly set for clean activation and teardown (it’s an imperfect world and there are *always* complications).

Bootstrapping the Entire System in One Process

Ideally, I’d prefer to get away with running all of the running service bus nodes in a single AppDomain. Here’s a pretty typical case for us, say you have a web application defined as “WebRegistry” that communicates bi-directionally with a headless service bus service defined by a “BusRegistry” class that normally runs in a Windows service:

    // FubuTransportRegistry is a special kind of FubuRegistry
    // specifically to help configure service bus applications
    // BusSettings would hold configuration data for BusRegistry
    public class BusRegistry : FubuTransportRegistry
    {
    }

    public class WebRegistry : FubuTransportRegistry
    {
        
    }

To bootstrap both the website application and the bus registry in a single AppDomain, I could use code like this:

        public static void Bootstrap_Two_Nodes()
        {
            var busApp = FubuRuntime.For<BusRegistry>();
            var webApp = FubuRuntime.For<WebRegistry>(_ =>
            {
                // WebRegistry would normally run in full IIS,
                // but for the test harness we tend to use 
                // Katana to run all in process
                _.HostWith();
            });

            // Carry out tests that depend on both
            // busApp and webApp
        }

This is the ideal, and I hope to see us use this pattern more often, but it’s often defeated by tools that have hard dependencies on .Net’s System.Configuration that may cause configuration conflicts between running nodes. Occasionally we’ll also have conflicts between assembly versions across the nodes or hit cases where we cannot have a particular assembly deployed in one of the nodes.

In these cases we resort to using a separate AppDomain for each service bus application. We’ve built this pattern into Serenity itself to standardize the approach with what I named “Remote Systems.” For an example from the FubuMVC acceptance tests, we execute tests against an application called “WebsiteRegistry” running as the main application in the test harness, and open a second AppDomain for a testing application called “ServiceNode” that exchanges messages with “WebsiteRegistry.” In the Serenity test harness, I can just declare the second AppDomain for “ServiceNode” like so:

    public class TestSystem : SerenitySystem<WebsiteRegistry>
    {
        public TestSystem()
        {
            // Running the "ServiceNode" app in a second AppDomain
            AddRemoteSubSystem("ServiceNode", x =>
            {
                x.UseParallelServiceDirectory("ServiceNode");
                x.Setup.ShadowCopyFiles = false.ToString();
            });
        }

The code above directs Serenity to start a second application found at a directory named “ServiceNode” that is parallel to the main application. So if the main application is hosted at “src/FubuMVC.IntegrationTesting”, then “ServiceNode” is at “src/ServiceNode.” The Serenity harness is smart enough to figure out how to launch the second AppDomain pointed at this other directory. Serenity can also bootstrap that application by scanning the assemblies in that AppDomain to find a class that inherits from FubuRegistry — very similar to how ASP.Net Core is going to use the Startup class convention to bootstrap applications.

The biggest problem now is generally in dealing with the asynchronous behavior in the different AppDomain’s and “knowing” when it’s safe for the test harness to start checking for the outcome of the messages that are processed within the “arrange” and “act” portions of a test. In a following post, I’ll talk about the tooling and technique we use to coordinate activities between the different AppDomain’s.

This All Changes in .Net Core

AppDomain’s and .Net Remoting all go away in .Net Core. While no one is really that disappointed by those features going away, I think that’s going to set back the state of test automation tools in .Net for a little while because almost every testing tool uses AppDomain’s in some fashion. For our part, my thought is that we’d move to launching all new Process’s for the additional service bus nodes in testing.

I know there’s also some plans for an “AssemblyLoadContext” that sounds to me like a better, lightweight way of doing the kinds of Assembly loading sandboxes for testing that are only possible with separate AppDomain’s in .Net today. Other than rumors and an occasionally cryptic hint from members of the ASP.Net team, there’s basically no information about what AssemblyLoadContext will be able to do.

The new “Startup” mechanism in .Net Core might serve in the exact same fashion that our FubuRegistry does today in FubuMVC. That should make it much easier to carry out this strategy of collapsing distributed or microservice applications down into a single process.

I also think that the improved configuration mechanisms in ASP.Net Core (is this in .Net Core itself? I really don’t know). The tight coupling of certain libraries we use to the existence of that single app.config/web.config file today is a headache and the main reason we’re sometimes forced into the more complicated, multiple AppDomain issue.

What’s Next

This blog post went a lot longer than I anticipated, so I cut it in half. Next time up I’ll talk about how we coordinate the test harness timing with the various messaging nodes and how we expose application diagnostics in automated test outputs to help understand what’s actually happening in your application when things fail or just run too slowly.

New Marten Release and What’s Next?

I uploaded a new Nuget for Marten v0.9.2 yesterday with a couple new features and about two weeks worth of bug fixes and some refinements. You can find the full list of issues and pull requests in this release from the v0.9.1 and v0.9.2 milestones in GitHub.

The highlight of this release in terms of raw usability is probably some overdue improvements to Marten’s underlying schema management:

Marten can detect when the configured upsert functions are missing or do not match the configuration and rebuild them
Marten can detect missing or changed indexes and make the appropriate updates.

Some other things that are new:

There’s now a synchronous batch querying option
You can now use the AsJson() Linq operator in combination with Select() transforms (this is going to get its own blog post soon-ish).
The default transaction isolation level is ReadCommitted
It won’t provide much value until there’s more there, but I’ve added some rolling buffer queueing support for being able to do asynchronous projections in the event store. There’ll be a blog post about that one soon just to see if I can trick some of you into being technical reviewers or contributors on that one;)

The two big features are discussed below:

Paging Support

We flat out copied part of RavenDb’s paging support for more efficient paging support. Take the example of showing a large data set in a user interface and wanting to do that one page at a time. You need to know how many total documents match the query criteria to be able to present an accurate paging bar. Fortunately, you can get that total number now without making a second round trip to the database with this syntax:

// We're going to use stats as an output
// parameter to the call below, so we
// have to declare the "stats" object
// first
QueryStatistics stats = null;

var list = theSession
    .Query<Target>()
    .Stats(out stats)
    .Where(x => x.Number > 10).Take(5)
    .ToList();

list.Any().ShouldBeTrue();

// Now, the total results data should
// be available
stats.TotalResults.ShouldBe(count);

In combination with the existing support for the Take() and Skip() Linq operators, you should have everything you need for efficient paging with Marten.

Include() inside of Compiled Queries

The Include() feature is now usable from within the compiled query feature, so finally, two of our best features for optimizing data access can work together. Below is a sample:

public class IssueByTitleIncludingUsers : ICompiledQuery<Issue>
{
    public string Title { get; set; }
    public User IncludedAssignee { get; private set; } = new User();
    public User IncludedReported { get; private set; } = new User();
    public JoinType JoinType { get; set; } = JoinType.Inner;

    public Expression<Func<IQueryable<Issue>, Issue>> QueryIs()
    {
        return query => query
            .Include<Issue, IssueByTitleIncludingUsers>(x => x.AssigneeId, x => x.IncludedAssignee, JoinType)
            .Include<Issue, IssueByTitleIncludingUsers>(x => x.ReporterId, x => x.IncludedReported, JoinType)
            .Single(x => x.Title == Title);
    }
}

What’s Next?

Besides whatever bug fixes come up, I think the next things I’m working on for the document database support are soft deletes, bulk insert improvements, and finally getting a versioned document story going. On the event store side of things, it’s all about projections. We’ll have a working asynchronous projection feature in the next release, maybe support for arbitrary categorization inside of aggregated projections, and some preliminary support for Javascript projections.

Got other requests, needs, or problems with Marten? Tell us all about it anytime in the Gitter room.

Using Mocks or Stubs, Revisited

This post is a request from work. As part of our ongoing effort to convert a very large application from RavenDb to Marten for its backing persistence, we want to take some time to reconsider our automated testing strategies in regards to using unit tests versus integration tests versus full end to end tests.

I originally wrote a series of blog posts on these subjects in my CodeBetter days about a decade ago (the formatting of the old posts has all been destroyed after several blog engine migrations, sorry about that):

I read over those posts just now and feel like the content mostly holds up today (concepts and fundamentals tend to be useful much longer than any specific technology).

What’s Different Today?

For one, a couple years back there was a somewhat justified backlash against the over use of mocking libraries in unit tests. For another, I think there’s more of a bias in favor of doing much more integration testing and a focus on “vertical slice” testing than there used to be. For my part, I’d say that I use mock objects much more rarely now than I would have a decade ago.

Most of my coding efforts these days go into OSS projects, so a quick rundown:

Marten is mostly tested through integration tests all the way through to the underlying Postgresql database. I did a count today and only found about 5-6 test fixture classes that use NSubstitute for mocking or stubbing. This is mostly because I’m not confident enough in my knowledge of Postgresql JSON manipulation or the underlying Npgsql library mechanics to put much faith in intermediate tests that would just verify expected SQL strings or verifying the “correct” interaction with ADO.Net objects.
StructureMap is an old codebase, and you can see many different styles of testing throughout its history. At this point, there are very, very few unit tests (~20 out of about 1100) that use mock objects. With StructureMap, it’s generally very fast to set up full end to end scenarios for new features or bug fixes and that’s now what I prefer.

I’m not yet ready to say that mocking libraries are ready for the waste bin, but even I use them much more rarely now. Some of that may be that I’m in very different problem domains than I was a decade ago when we were still using the Model View Presenter architecture that made the usages of mocking more natural. What little client side work I do today is all in Javascript in the browser, and integration testing is much easier in that context than it ever was in WinForms or WPF and I don’t feel the need for mocks nearly as often.

State vs Interaction Testing

I spent a lot of time in my summers growing up helping my grandfather around the farm. Since there’s nothing in this world that breaks more often than farming implements (hay balers were the worst, but I remember the combine being pretty nasty too), I spent a lot of time using wrenches of all sorts. Ratchets when you could to be faster, closed ended wrenches if you had enough room to attach them, or open ended wrenches for tight spaces where you just couldn’t get to it otherwise. My point being that all those different tools for the same basic job were necessary to effectively get at all the bolts and nuts in all the crazy angles and tight spaces we ran into working on my grandfather’s equipment.

wrench-988762_960_720

Similar to my old choice of open or closed ended wrenches, when you’re writing automated tests, you’re measuring the expected outcome of the test in two ways:

By measurable changes in the system state or a value returned from a function call. You might be verifying data written to the system database, or files being dropped, or content appearing on a screen. This is what’s known as state-based testing.
By watching the messages sent between classes or subsystems. This style of testing is interaction-based testing.

Most developers are going to be more naturally comfortable with state-based testing and many developers will flat out refuse to use anything different. That being said, interaction testing is often the “open ended wrench” of test automation you have to use when it’s going to be much more work to verify some part of the system with state based checks.

A common objection I’ve seen in regards to mocking is “why would I ever care that a particular method was called?” I’m going to apply a little bit of sophistry here, because you don’t necessarily care that a method was called so much as “did it make the right decision” that happens to be determined.

Interaction testing is probably most effective when you’re purposely separating code that decides what actions to take next from actually taking that action — especially if your choice on verifying that action was taken is to either:

Verify that a particular message was sent to initiate the action
Check through the state of an external database, file, or email inbox to see if we can spot some kind of evidence that the desired action took place

My preference in this kind of case has always been to use mock objects to test that the decision to carry out the action takes place correctly, and then test that action completely independently.

Some examples of when I would still reach for interaction testing, often with a mock object:

Routing type of logic like “send this box to this warehouse”
Deciding to carry out actions like
From one of our systems at work, whether or not we should mark an active call as “on hold” or in an “active” state.

To get more specific, and since this post was originally supposed to be specifically about how we should be testing with Marten, here’s some examples of operations that I think are or are not appropriate to mock:

[Fact]
public void good_usages()
{
    var session = Substitute.For<IDocumentSession>();

    // Were the changes to a session saved?
    session.Received().SaveChanges();

    // or not
    session.DidNotReceive().SaveChanges();

    // Was a new Issue document stored into the session?
    var issue = new Issue {};
    session.Received().Store(issue);
}

[Fact]
public void not_places_or_interfaces_you_should_mock()
{
    var session = Substitute.For<IDocumentSession>();

    User user = null;

    // No possible way you should stub this functionality
    // Go to a fullblown integration test
    var issue = session
        .Query<Issue>()
        .Include<User>(x => x.AssigneeId, x => user = x)
        .FirstOrDefault(x => x.Title.StartsWith("some problem"));
}

Mocks versus Stubs

A lot of people get genuinely upset about even trying to make a distinction between mocks and stubs and some mocking tools even made it a point of pride to make absolutely no distinction between the two different things. Regardless of implementation, the important difference is strictly in the role of the testing “double” within the test.

“Stub” means that you are simply pre-canning the response to a request for data. You use stubs just to set up test inputs
“Mock” objects are used to verify the interactions with the object

To put that in context, here’s an example of stubbing and mocking using NSubstitute against Marten’s IDocumentSession interface:

[Fact]
public void stub_versus_mock()
{
    // Create a substitute IDocumentSession that could be used
    // as either a mock or stub
    var session = Substitute.For<IDocumentSession>();
    
    // Use session as a stub to return known data for
    // the given issue id
    var issueId = Guid.NewGuid();
    session.Load<Issue>(issueId).Returns(new Issue{});

    // Then inside the real code,
    var issue = session.Load<Issue>(issueId);

    // If the real code should be committing the current
    // Marten unit of work, we could verify that by seeing
    // if the SaveChanges() method was called.

    session.SaveChanges();

    // Using session as a mock object
    session.Received().SaveChanges();
}

I may be the last person left who thinks this, but I think it’s still valuable to be conscious of the different roles of mocks and stubs when you’re formulating a testing strategy. I also prefer that folks use the terminology correctly just for better communication. In reality though, most developers are so confused by the differences that most people have just thrown their hands up in the air and use the terms all interchangeably.

When are Stubs Desirable?

Automated tests are only effective when you can consistently create a known system state and inputs, then exercise the system to measure expected outputs. Quite often your system will have some architectural dependency on data provided by some kind of external system or subsystem that’s either impossible to control or just too much mechanical work to setup.

As I talked about last year in Succeeding with Automated Testing, I strongly believe in the effectiveness and efficiency of whitebox testing over insisting that only black box tests are valid. Following that philosophy, I’ll almost automatically prefer to use stubs in place of any external dependency that we can’t really control in the test data setup. Examples from our work include:

An identity service from a government entity that isn’t always available
A centralized database where we have no ability to set up on our own (using Docker might be an alternative here to start from a known state). Shared, centralized databases are hell in automated tests.
Authentication subsystems, especially Windows authentication
Active Directory access

Avoid Chatty Interfaces

Anytime you’re having to deal with a chatty interface that takes a lot of interactions to perform. Taking our Marten project as an example, it takes a lot of calls to manipulate the ADO.Net objects to do the simple work of resolving a document from a single row returned from an ADO.Net reader:

public virtual T Resolve(int startingIndex, DbDataReader reader, IIdentityMap map)
{
    if (reader.IsDBNull(startingIndex)) return null;

    var json = reader.GetString(startingIndex);
    var id = reader[startingIndex + 1];

    return map.Get <T> (id, json);
}

I could technically test the Resolve() method shown above by using mock objects in place of the IIdentityMap and the underlying database reader, but my tests would require several mocking expectations just to get both the reader and identity map to behave the way the test needs. A mocked test would certainly run faster than an end to end test, but in this case, only the end to end test would tell me with any confidence that the Resolve() method truly works as it should.

Avoid Mocking Interfaces You Don’t Understand

I used to say that you shouldn’t mock an interface that you don’t control, but the real danger is mocking any interface that you don’t entirely understand. I say this because it’s perfectly common to write a unit test using mock objects that passes, then turns out to fail when you run it in real life or within some kind of integration test. Maybe the best way of saying this is that it’s only appropriate to use mocking when the interaction being measured will definitely tell you something useful about how the code is going to behave.

Marten v0.9 is Out!

The Marten community made a big, big release today that marks a big milestone for Marten’s viability as a usable product. I just uploaded v0.9 to Nuget, and published all the outstanding documentation updates to our project website.

A lot of folks contributed to this release, and I’ll inevitably miss several names, so I’m just going to issue a huge thank you to everyone who put in pull requests or provided valuable feedback in our Gitter room or helped with the documentation updates. Marten is well on its way to being the most positive OSS experience I’ve ever had.

While you can see the full list of changes, fixes, and improvements in the GitHub milestone (81 issues!), the big highlights are:

The Event Store feature is finally ready for very early adopters with some basic projection support. I discussed the vision for this functionality in a blog post yesterday. The very early feedback suggests that the existing API is in for some changes, but I’d still like to get more folks to look at what we have.
Compiled Queries to avoid re-parsing Linq queries
The Include() feature for more efficient data fetching
Select() transformations inside of the Linq support
Improved new ways to fetch the raw JSON for documents with the new AsJson() method.
The ability to configure which schema Marten will use for document storage or the event store as part of the DocumentStore options. That was a frequent user request, and thanks to Tim Cools, we’ve got that now.
Ability to express foreign key relationships between document types. Hey, Postgresql is a relational database, so why not take advantage of that when it’s valuable?
More efficient asynchronous querying internals
Linq support for aggregate functions like Average(), Min(), Max(), and Sum()

Next Steps

Other than the event store functionality that’s just getting off the ground, I think the pace of Marten development is about to slow down and be more about refinements than adding lots of new functionality. I’m hoping to switch to dribbling out incremental releases every couple weeks until we’re ready to declare 1.0.

Marten as an Event Store — Now and the Future Vision

The code shown in this post isn’t quite out on Nuget yet, but will be part of Marten v0.9 later this week when I catch up on documentation.

From the very beginning, we’ve always intended to use Marten and Postgresql’s JSONB support as the technical foundation for event sourcing persistence. The document database half of Marten has been a more urgent need for my company, but lately we have a couple project teams and several folks in our community interested in seeing the event store functionality proceed. Between a couple substantial pull requests and me finally getting some time to work on it, I think we finally have a simplistic event store implementation that’s ready for early adopters to pull it down and kick the tires on it.

Why a new Event Store?

NEventStore has been around for years, and we do use an older version on one of our big applications. It’s working, but I’m not a big fan of how it integrates into the rest of our architecture and it doesn’t provide any support for “readside projections.” Part of the rationale behind Marten’s event store design is the belief that we could eliminate a lot of our hand coded projection support in a big application.

There’s also GetEventStore as a standalone, language agnostic event store based on an HTTP protocol. While I like many elements of GetEventStore and am taking a lot of inspiration from that tool for Marten’s event store usage, we prefer to have our event sourcing based on the Postgresql database instead of a standalone store.

Why, you ask?

Since we’re already going to be using Postgresql, this leads to fewer things to deploy and monitor
We’ll be able to just use all the existing DevOps tools for Postgresql like replication and backups without having to bring in anything new
Building on a much larger community and a more proven technical foundation
Being able to do event capture, document database changes, and potentially just raw SQL commands inside of the same native database transaction. No application in our portfolio strictly uses event sourcing as the sole persistence mechanism, and I think this makes Marten’s event sourcing feature compelling.
The projection documents published by Marten’s event store functionality are just more document types and you’ll have the full power of Marten’s Linq querying, compiled querying, and batched querying to fetch the readside data from the event storage.

Getting Started with Simple Event Capture

Marten is just a .Net client library (4.6 at the moment), so all you need to get started is to have access to a Postgresql 9.4/9.5 database and to install Marten via Nuget.

Because I’ve read way too much epic fantasy series, my sample problem domain is an application that records, analyses, and visualizes the status of quests. During a quest, you may want to record events like:

QuestStarted
MembersJoined
MembersDeparted
ArrivedAtLocation

With Marten, you would want to describe these events with simple, serializable DTO classes like this:

public class MembersJoined
{
    public MembersJoined()
    {
    }

    public MembersJoined(int day, string location, params string[] members)
    {
        Day = day;
        Location = location;
        Members = members;
    }

    public int Day { get; set; }

    public string Location { get; set; }

    public string[] Members { get; set; }

    public override string ToString()
    {
        return $"Members {Members.Join(", ")} joined at {Location} on Day {Day}";
    }
}

Now that we have some event classes identified and built, we can start to store a “stream” of events for a given quest as shown in this code below:

var store = DocumentStore.For("your connection string");

var questId = Guid.NewGuid();

using (var session = store.OpenSession())
{
    var started = new QuestStarted {Name = "Destroy the One Ring"};
    var joined1 = new MembersJoined(1, "Hobbiton", "Frodo", "Merry");

    // Start a brand new stream and commit the new events as 
    // part of a transaction
    session.Events.StartStream(questId, started, joined1);
    session.SaveChanges();

    // Append more events to the same stream
    var joined2 = new MembersJoined(3, "Buckland", "Merry", "Pippen");
    var joined3 = new MembersJoined(10, "Bree", "Aragorn");
    var arrived = new ArrivedAtLocation { Day = 15, Location = "Rivendell" };
    session.Events.Append(questId, joined2, joined3, arrived);
    session.SaveChanges();
}

Now, if we want to fetch back all of the events for our new quest, we can do that with:

using (var session = store.OpenSession())
{
    // events are an array of little IEvent objects
    // that contain both the actual event object captured
    // previously and metadata like the Id and stream version
    var events = session.Events.FetchStream(questId);
    events.Each(evt =>
    {
        Console.WriteLine($"{evt.Version}.) {evt.Data}");
    });
}

When I execute this code, this is the output that I get:

1.) Quest Destroy the One Ring started
2.) Members Frodo, Merry joined at Hobbiton on Day 1
3.) Members Merry, Pippen joined at Buckland on Day 3
4.) Members Aragorn joined at Bree on Day 10
5.) Arrived at Rivendell on Day 15

At this point, Marten is assigning a Guid id, timestamp, and version number to each event (but thanks to a recent pull request, you do not have to have an Id property or field on your event classes). I didn’t show it here, but you can also fetch all of the events for a stream by the version number or timestamp to perform historical state queries.

Projection Support

What I’m showing in this section is brand spanking new and isn’t out on Nuget yet, but will be as Marten v0.9 by the middle of this week (after I get the documentation updated).

So raw event streams might be useful to some of you, but we think that Marten will be most useful when you combine the raw event sourcing with “projections” that create parallel “readside” views of the event data suitable for consumption in the rest of your application for concerns like validation, business logic, or supplying data through HTTP services.

Let’s say that we need a view of our quest data just to see what the current member composition of our quest party is. In our event store usage, you would create an “aggregate” document class and teach Marten how to update that aggregate based on event data types. The easiest way to expose the aggregation right now is to expose public “Apply([event type])” methods like the QuestParty class shown below:

public class QuestParty
{
    private readonly IList _members = new List();

    public string[] Members
    {
        get
        {
            return _members.ToArray();
        }
        set
        {
            _members.Clear();
            _members.AddRange(value);
        }
    }

    public IList Slayed { get; } = new List();

    public void Apply(MembersJoined joined)
    {
        _members.Fill(joined.Members);
    }

    public void Apply(MembersDeparted departed)
    {
        _members.RemoveAll(x => departed.Members.Contains(x));
    }

    public void Apply(QuestStarted started)
    {
        Name = started.Name;
    }

    public string Name { get; set; }

    public Guid Id { get; set; }

    public override string ToString()
    {
        return $"Quest party '{Name}' is {Members.Join(", ")}";
    }
}

Without any further configuration, I could create a live aggregation of a given quest stream calculated on the fly by using this syntax:

using (var session = store.OpenSession())
{
    var party = session.Events.AggregateStream(questId);
    Console.WriteLine(party);
}

When I execute the code above for our new “Destroy the One Ring” stream of events, this is the output:

Quest party 'Destroy the One Ring' is Frodo, Merry, Pippen, Aragorn

Great, but since it’s potentially expensive to calculate the QuestParty from large streams of events, we may opt to have the aggregated view built in our Marten database ahead of time. Let’s say that we’re okay with just having the aggregate view updated every time a quest event stream is appended to. For that case, you can register a “live” aggregation in your DocumentStore initialization with this code:

var store2 = DocumentStore.For(_ =>
{
    _.Events.AggregateStreamsInlineWith<QuestParty>();
});

With the configuration above, the “QuestParty” view of a stream of related quest events is updated as part of the transaction that is persisting the new events being appended to the event log. If we were using this configuration, then we would be able to query Marten for the “QuestParty” as just another document type:

If your browser is cutting off the code formatting, the syntax is “session.Load<QuestParty>(questId)” down below:

using (var session = store.OpenSession())
{
    var party = session.Load<QuestParty>(questId);
    Console.WriteLine(party);

    // or
    var party2 = session.Query<QuestParty>()
        .Where(x => x.Name == "Destroy the One Ring")
        .FirstOrDefault();
}

Our Projections “Vision”

A couple years ago, I got to do what turned into a proof of concept project for building out an event store on top of Postgresql’s JSON support. My thought for Marten’s projection support is largely taken from this blog post I wrote on the earlier attempt at writing an event store on Postgresql.

Today the projection ability is very limited. So far you can use the live or “inline” aggregation of a single stream shown above or a simple pattern that allows you to create a single readside document for a given event type.

The end state we envision is to be able to allow users to:

Express projections in either .Net code or by using Javascript functions running inside of Postgresql itself
To execute the projection building either “inline” with event capture for pure ACID, asynchronously for complicated aggregations or better performance (and there comes eventual consistency back into our lives), or do aggregations “live” on demand. We think that this break down of projection timings will give users the ability to handle systems with many writes, but few reads with on demand projections, or to handle systems with few writes, but many reads with inline projections.
To provide and out of the box “async daemon” that you would host as a stateful process within your applications to continuously calculate projections in the background. We want to at least experiment with using Postgresql’s NOTIFY/LISTEN functionality to avoid making this a purely polling process.
Support hooks to perform your own form of event stream processing using the existing IDocumentSessionListener mechanism and maybe some way to plug more processors into the queue reading in the async daemon described above
Add some “snapshotting” functionality that allows you to perform aggregated views on top of occasional snapshots every X times an event is captured on an aggregate
Aggregate data across streams
Support arbitrary categorization of events across streams

Anything I didn’t cover that you’re wondering about — or just want to give us some “constructive feedback” — please feel free to pop into Marten’s Gitter room since we’re talking about this today anyway;)