Health Monitoring and Task Reassignment in our Service Bus Applications

FubuMVC 3.0 actually has a full-blown service bus framework that started as an add-on project called “FubuTransportation.” We’ve used it in production for three years, we’re generally happy with it, and it’s the main reason why we did an about-face and decided to continue with FubuMVC after all.

Corey Kaylor and I have been actively working on FubuMVC again. We’re still planning a reboot of at least the service bus functionality to the CoreCLR with a more efficient architecture next year (“Jasper”), but for right now we’re just working to improve the performance and reliability of our existing service bus applications. The “health monitoring” and persistent task functionality explained here has been in our codebase for a couple of years and used a little bit in production, but we’re about to try to use it for something mission critical for the first time. I’d love to have any feedback or suggestions for improvements you might have. All the code shown here is pulled from this namespace in GitHub.

A Distributed System Spread Over Several Nodes

For the sake of both reliability and the potential for horizontal scaling later, we want to be able to deploy multiple instances of our distributed application to different servers (or as separate processes on the same box), as shown below:

[Image: a distributed application behind a load balancer]

We generally employ hardware load balancers to distribute incoming requests across all the available nodes. So far, all of this is pretty typical and relatively straightforward, as long as any node can service any request.

However, what if your architecture includes some kind of stateful “agent” that can, or at least should, be active on only one of the nodes at a time?

I’m hesitant to describe what we’re doing as Agent Oriented Programming, but that’s what I’m drawing on to think through this a little bit.

[Image: “agent” worker processes should only be running on a single node]

In our case, we’re working with a system that is constantly updating a “grid” of information stored in memory and directing work through our call centers. Needless to say, it’s a mission critical process. What we’re attempting to do is to make the active “agent” for that planning grid be managed by FubuMVC’s service bus functionality so that it’s always running on exactly one node in the cluster. That means that we need to be able to:

  • Have the various nodes constantly checking up on each other to make sure that agent is running somewhere and the assigned node is actually up and responsive
  • Be able to initiate the assignment of that agent to a new node if it is not running at all
  • Potentially shut down any extraneous instances of that agent if there is more than one running

Years ago, Chris Patterson of MassTransit fame explained something to me called the Bully Algorithm that can be used for exactly this kind of scenario. With a lot of help from my colleague Ryan Hauert, we came up with the approach described in this post.
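Stripped of any FubuMVC types, the core idea of a bully-style election can be sketched in a few lines: order the candidate nodes by some fixed priority and let the highest-priority reachable node win. Everything here (the BullyElection class, string node ids, the isReachable delegate) is invented for illustration, not pulled from the real codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a bully-style election: the highest-priority
// node that responds wins ownership of the task.
public static class BullyElection
{
    // 'isReachable' stands in for a real health check call to the peer
    public static string SelectOwner(
        IEnumerable<string> nodeIds,
        Func<string, bool> isReachable)
    {
        // Order candidates by a fixed priority (here: simple string
        // ordering, much like the ControlChannel ordering used later
        // in this post) and take the first one that answers
        foreach (var id in nodeIds.OrderBy(x => x))
        {
            if (isReachable(id)) return id;
        }

        // No reachable node could take ownership
        return null;
    }
}
```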


Persistent Tasks

I reserve the right to change the name later (IAgent maybe?), but for now the key interface for one of these sticky agents is shown below:

public interface IPersistentTask
{
    Uri Subject { get; }

    // This is supposed to be the health check
    // Should throw an exception if anything is wrong;)
    void AssertAvailable();
    void Activate();
    void Deactivate();
    bool IsActive { get; }

    // This method would perform the actual assignment
    Task<ITransportPeer> SelectOwner(IEnumerable<ITransportPeer> peers);
}

Hopefully the interface is largely self-descriptive. We were already using Uris throughout the rest of the code, so it made sense to use them to identify the persistent tasks as well. This interface gives developers hooks to start or stop the task, a way to do health checks, and a way to apply whatever kind of custom owner selection algorithm you want.
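To make the contract a little more concrete, here's a minimal, hypothetical implementation. The GridAgentTask name and its internals are invented for this post, and the SelectOwner() member is omitted so the snippet stays self-contained without the ITransportPeer type.

```csharp
using System;

// Hypothetical agent that would own the in-memory planning "grid"
// described above. Shown without the IPersistentTask interface
// declaration so the snippet compiles on its own.
public class GridAgentTask
{
    private bool _running;

    public Uri Subject { get; } = new Uri("grid://planning");

    public bool IsActive => _running;

    // The health check: throw if anything is wrong
    public void AssertAvailable()
    {
        if (!_running)
            throw new InvalidOperationException("Grid agent is not running");
    }

    public void Activate() { _running = true; }

    public void Deactivate() { _running = false; }
}
```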

These persistent tasks are added to a FubuMVC application by registering an instance of the IPersistentTaskSource interface shown below into the application container (there is a simple recipe for standalone tasks that implements both interfaces in one class):

public interface IPersistentTaskSource
{
    // The scheme or protocol from the task Uri's
    string Protocol { get; }

    // Subjects of all the tasks built by this
    // object that should be running
    IEnumerable<Uri> PermanentTasks();

    // Create a task object for the given subject
    IPersistentTask CreateTask(Uri uri);
}

The IPersistentTaskSource interface might end up going away as unnecessary complexity in favor of directly registering IPersistentTask instances. It was built with the idea of running, assigning, and monitoring agents per customer/tenant/region/etc. I’ve built a couple of systems in the past half decade where it would have been very advantageous to have that functionality.
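As a sketch of that per-tenant idea, here's what a source might look like that exposes one persistent task per tenant. The TenantTaskSource class, the "tenant" Uri scheme, and the tenant names are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source exposing one persistent task per tenant,
// keyed by a custom "tenant" Uri scheme. CreateTask(Uri) would
// build the IPersistentTask for a single tenant; it's omitted
// here to keep the snippet self-contained.
public class TenantTaskSource
{
    private readonly string[] _tenants;

    public TenantTaskSource(params string[] tenants)
    {
        _tenants = tenants;
    }

    // The scheme from the task Uri's
    public string Protocol => "tenant";

    // One task subject per known tenant
    public IEnumerable<Uri> PermanentTasks()
        => _tenants.Select(t => new Uri($"tenant://{t}"));
}
```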

The ITransportPeer interface used in the SelectOwner() method models the available nodes and is described in the next section.


Modeling the Nodes

The available nodes are modeled by the ITransportPeer shown below:

public interface ITransportPeer
{
    // Try to make this node take ownership of a task
    Task<OwnershipStatus> TakeOwnership(Uri subject);

    // Tries to ask the peer what the status is for all
    // of its assigned tasks
    Task<TaskHealthResponse> CheckStatusOfOwnedTasks();

    void RemoveOwnershipFromNode(IEnumerable<Uri> subjects);

    IEnumerable<Uri> CurrentlyOwnedSubjects();

    string NodeId { get; }
    string MachineName { get; }
    Uri ControlChannel { get; }

    // Shut down a running task
    Task<bool> Deactivate(Uri subject);
}

ITransportPeer comes in just two flavors:

  1. A class called PersistentTaskController that directly controls and manages the tasks on the executing node.
  2. A class called TransportPeer that represents one of the external nodes. The methods in this version send messages to the control channel of the node represented by the peer object and wait for a matching response. The other nodes will consume those messages and make the right calls on the local PersistentTaskController.


Reassigning Tasks

Now that we have a way to hook in tasks and a way to model the available peers, we need some kind of mechanism within the IPersistentTask classes to execute reassignment. So far, the only algorithm we’ve built and used is a simple ordered-preference assignment via the OrderedAssignment class shown below:

public class OrderedAssignment
{
    private readonly Uri _subject;
    private readonly ITransportPeer[] _peers;
    private int _index;

    public OrderedAssignment(Uri subject, IEnumerable<ITransportPeer> peers)
    {
        _subject = subject;
        _peers = peers.ToArray();
        _index = 0;
    }

    public async Task<ITransportPeer> SelectOwner()
    {
        return await tryToSelect().ConfigureAwait(false);
    }

    private async Task<ITransportPeer> tryToSelect()
    {
        var transportPeer = _peers[_index++];

        try
        {
            var status = await transportPeer.TakeOwnership(_subject).ConfigureAwait(false);

            if (status == OwnershipStatus.AlreadyOwned || status == OwnershipStatus.OwnershipActivated)
            {
                return transportPeer;
            }
        }
        catch (Exception e)
        {
            // If the candidate peer is unreachable, just log the
            // exception and fall through to try the next peer
            Debug.WriteLine(e);
        }

        if (_index >= _peers.Length) return null;

        return await tryToSelect().ConfigureAwait(false);
    }
}

Inside of an IPersistentTask class, the ordered assignment could be used something like this:

public virtual Task<ITransportPeer> SelectOwner(IEnumerable<ITransportPeer> peers)
{
    // it's lame, but just order by the control channel Uri
    var ordered = peers.OrderBy(x => x.ControlChannel.ToString());
    var completion = new OrderedAssignment(Subject, ordered);

    return completion.SelectOwner();
}


Health Monitoring via the Bully Algorithm

So now we have a way to model persistent tasks, reassign tasks, and model the connectivity to all the available nodes.

Inside of PersistentTaskController is this method that checks all the known persistent task state on every known running node:

public async Task EnsureTasksHaveOwnership()
{
    // Go run out and check the status of all the tasks that are
    // theoretically running on each node
    var healthChecks = AllPeers().Select(async x =>
    {
        var status = await x.CheckStatusOfOwnedTasks().ConfigureAwait(false);
        return new { Peer = x, Response = status };
    }).ToArray();

    var checks = await Task.WhenAll(healthChecks).ConfigureAwait(false);

    // Determine what corrective steps, if any, should be taken
    // to ensure that every known task is running in just one place
    var planner = new TaskHealthAssignmentPlanner(_permanentTasks);
    foreach (var check in checks)
    {
        planner.Add(check.Peer, check.Response);
    }

    var corrections = planner.ToCorrectionTasks(this);

    await Task.WhenAll(corrections).ConfigureAwait(false);

    _logger.Info(() => "Finished running task health monitoring on node " + NodeId);
}

In combination with the TaskHealthAssignmentPlanner class, this method is able to jumpstart any known tasks that either aren’t running or were running on a node that is no longer reachable or reports that its tasks are in an error state.

The EnsureTasksHaveOwnership() method is called from a system-level polling job running in each FubuMVC application. There’s an important little twist on that though: to reduce the chance of unpredictable behavior from the health monitoring checks running on every node simultaneously, the timing of the polling interval is randomized by this settings class:

public double Interval
{
    get
    {
        // The *first* execution of the health monitoring takes
        // place 100 ms after the app is initialized
        if (_initial)
        {
            _initial = false;
            return 100;
        }
                
        // After the first call, the polling interval is randomized
        // between each call
        return Random.Next(MinSeconds, MaxSeconds) * 1000;
    }
}

I found an article online advising you to randomize the intervals at the time we were building this two years ago, but I don’t remember where it was :(
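For context, here's roughly what the surrounding settings class might look like. Only the Interval getter above comes from the original code; the class name, the Random field, and the default thresholds are my guesses at the pieces that weren't shown:

```csharp
using System;

// Sketch of the settings class around the Interval getter shown
// above; the surrounding members are assumed, not the real code
public class HealthMonitoringSettings
{
    private static readonly Random Random = new Random();
    private bool _initial = true;

    public int MinSeconds { get; set; } = 30;
    public int MaxSeconds { get; set; } = 60;

    public double Interval
    {
        get
        {
            // The *first* execution of the health monitoring takes
            // place 100 ms after the app is initialized
            if (_initial)
            {
                _initial = false;
                return 100;
            }

            // After the first call, the polling interval is
            // randomized so the nodes don't all poll in lockstep
            return Random.Next(MinSeconds, MaxSeconds) * 1000;
        }
    }
}
```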

By using the bully algorithm, we’re able to effectively make a cluster of related nodes able to check up on each other and start up or reassign any tasks that have gone down. We’re utilizing this first to do a “ready standby” failover of an existing system.

Actually Doing the Health Checks

The health check needs to run some kind of “heartbeat” action implemented through the IPersistentTask.AssertAvailable() method on each persistent task object to ensure that it’s really up and functioning. The following code is taken from PersistentTaskController where it does a health check on each running local task:

public async Task<TaskHealthResponse> CheckStatusOfOwnedTasks()
{
    // Figure out which tasks are running on this node right now
    var subjects = CurrentlyOwnedSubjects().ToArray();

    if (!subjects.Any())
    {
        return TaskHealthResponse.Empty();
    }

    // Check the status of each running task by calling the
    // IPersistentTask.AssertAvailable() method
    var checks = subjects
        .Select(async subject =>
        {
            var status = await CheckStatus(subject).ConfigureAwait(false);

            return new PersistentTaskStatus(subject, status);
        })
        .ToArray();

    var statusList = await Task.WhenAll(checks).ConfigureAwait(false);

    return new TaskHealthResponse
    {
        Tasks = statusList.ToArray()
    };
}

public async Task<HealthStatus> CheckStatus(Uri subject)
{
    var agent = _agents[subject];

    return agent == null
        ? HealthStatus.Unknown
        : await checkStatus(agent).ConfigureAwait(false);
}

private static async Task<HealthStatus> checkStatus(IPersistentTaskAgent agent)
{
    return agent.IsActive
        ? await agent.AssertAvailable().ConfigureAwait(false)
        : HealthStatus.Inactive;
}


Subscription Storage

Another obvious challenge is how each node “knows” about its peers. FubuMVC pulls that off with its “subscription” subsystem. In our case, each node writes information about itself to a shared persistence store (mostly backed by RavenDb in our ecosystem, but we’re moving that to Marten). The subscription persistence also enables each node to discover its peers.
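Conceptually, each node writes a record something like this to the shared store, and discovers its peers by reading everyone else's records. The shapes below are a simplified guess for illustration, not FubuMVC's actual subscription model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a node's self-registration record in the
// shared subscription store
public class TransportNode
{
    public string NodeId { get; set; }
    public string MachineName { get; set; }
    public Uri ControlChannel { get; set; }
}

public static class PeerDiscovery
{
    // Every record in the store except our own is a peer we can
    // reach through its ControlChannel address
    public static IEnumerable<TransportNode> FindPeers(
        IEnumerable<TransportNode> allNodes, string selfId)
        => allNodes.Where(x => x.NodeId != selfId);
}
```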

Once the subscriptions are established, each node can communicate with all of its peers through the control channel addresses in the subscription storage. That basic architecture is shown below with the obligatory boxes and arrows diagram:

[Image: the nodes connected through shared subscription storage]

The subscription storage was originally written to enable dynamic message subscriptions between systems, but it’s also enabled our health monitoring strategy shown in this post.


Control Queue

We need the health monitoring and subscription messages between the various nodes to be fast and reliable, and we don’t want these system-level messages getting stuck in queues that might be backed up with normal activity. To that end, we finally put the idea of a designated “control channel” into FubuMVC so that you can mark a single channel as the preferred mechanism for sending control messages.

The syntax for making that designation is shown below in a code sample taken from FubuMVC’s integrated testing:

public ServiceRegistry()
{
    // The service bus functionality is "opt in"
    ServiceBus.Enable(true);

    // I explain what "Service" means in the next code sample
    Channel(x => x.Service)
        // Listen for incoming messages on this channel
        .ReadIncoming()

        // Designate this channel as preferred for system level messages
        .UseAsControlChannel()

        // Opts into LightningQueue's non-persistent mode
        .DeliveryFastWithoutGuarantee();

    // I didn't want the health monitoring running on this node
    ServiceBus.HealthMonitoring
        .ScheduledExecution(ScheduledExecution.Disabled);
}


If you’re wondering what in the world “x => x.Service” refers to in the code above, that just ties into FubuMVC’s strongly-typed configuration support (effectively the same concept as the new IOptions configuration in ASP.Net Core, just with less cruft ;)). The application described by the ServiceRegistry shown above also includes a class that holds configuration items specific to this application:

public class TestBusSettings
{
    public Uri Service { get; set; } = "lq.tcp://localhost:2215/service".ToUri();
    public Uri Website { get; set; } = "lq.tcp://localhost:2216/website".ToUri();
}

The primary transport mechanism we use is LightningQueues (LQ), an OSS library built and maintained by my colleague Corey Kaylor. LQ is normally a “store and forward” queue, but it has a new, opt-in “non-persistent” mode (like ZeroMQ, except .Net friendly) that we can exploit for our control channels in FubuMVC. In the case of the control queues, it’s advantageous not to persist those messages anyway.


My Concerns

It’s damn complicated, and testing it was obscenely hard. I’m a little worried about network hiccups causing it to unnecessarily try to reassign tasks, so we might put some additional retries into the health checks. The central subscription persistence is a bit of a concern too, because it’s a single point of failure.

Quick Twitch Coding with TestDriven.Net

EDIT: There’s a newer version available here.

I started working in earnest with CoreCLR and project.json-enabled projects a couple of weeks ago, and by “working” I mean upgrading tools and cleaning out detritus in my /bin folders until I could actually sweet talk my computer into compiling code and running tests. I’ve been very hesitant to jump into the CoreCLR world, in no small part because Test Driven Development (TDD) is still my preferred way to write code, and I feel like the options for test runners in the CoreCLR ecosystem have temporarily taken a huge step backward from classic .Net (not having AppDomains in CoreCLR knocked out a lot of the existing testing tools).

Fortunately, there’s a functioning EAP of TestDriven.Net – my long time favorite test runner – that works with xUnit and CoreCLR that dropped a couple weeks ago that I’m already using. You can download the alpha version of TestDriven.Net here.

If you’re not familiar with TestDriven.Net, it’s a very lightweight add-in for Visual Studio.Net that allows you to run NUnit/xUnit.Net/Fixie tests through keyboard shortcuts or context menu commands. The test output is just the VS output window, so there’s no performance hit from launching a heavier graphical tool or updating UI. It’s simple and maybe a little crude, but I’ve always been a fan of TestDriven.Net because it supports a keyboard-centric workflow that makes it very easy to quickly transition from writing code to running tests and back again.

One of my pet peeves is working with folks in the main office who constantly give me lectures about why I should be using vim, then proceed to use some absurdly clumsy, mouse-centric process to trigger unit tests while I try hard to remain patient.

How I Use It

One of the few customizations I do to my Visual Studio.Net setup is to map the TestDriven.Net keyboard shortcuts to the list below. I’m not saying this is the ultimate way to use it, but I’ve done it for years and it’s worked out well for me.

  • CTRL-1: Run test(s). Put the cursor inside a single test, inside a test class outside of a method, or on a namespace declaration and use the keyboard shortcut to immediately build and execute the selected tests
  • CTRL-2: Rerun the last test(s). When I’m doing real TDD, my common workflow is to write the next test (or a couple of tests), then run the tests once just to make them TestDriven.Net’s active set. After that, I switch to writing the real code and trigger the CTRL-2 shortcut. From there, TestDriven.Net will try to save all outstanding files with changes, recompile, and run the previously selected tests. I like this workflow, especially when it takes more than a single attempt to make a test pass, because it’s much faster than finding the right test to run via any kind of mouse-centric process. One warning though: this shortcut will run the tests in the debugger if you debugged through them the last time.
  • CTRL-3: Rerun the last test(s) in the debugger. Ideally, you really don’t want to spend a lot of time using the debugger, but when you do, it’s really nice to be able to quickly jump into the exact right place.
  • CTRL-4: Rerun the last test(s) in the original context. Say I have to jump into the debugger to figure out why a test is failing. As soon as I make the changes that I expect to fix the issue, I can trigger CTRL-4 to re-run the current test set without the debugger.
  • CTRL-5: Run all tests in the solution. For simpler solutions, I’ve typically found that running tests this way is faster than the corresponding command line tooling — but that advantage seems to have gone away with the new “dotnet test” tooling.


Why not auto-test?

I’m actually not a big fan of auto test tools, at least not on any kind of sizable project and test suite. I really liked using Mocha in its watched mode with Growl in my Javascript work, but even that started to break down when the project started getting larger.

My experience is that auto-test mechanisms are too slow a feedback cycle and they don’t allow you to very easily zero in on the subset of the system you’re actually interested in. Plus I’m getting really tired of Mocha tests getting accidentally checked in with temporary “.only()” calls;)

In addition, my opinion is that “dotnet watch test” functionality doesn’t become terribly useful to me until it’s integrated with something like Growl. Even then, I don’t think I would use it on anything but the smallest test suites.

I will admit, though, that I’ve never tried NCrunch, and plenty of the folks I interact with like it, so maybe I’ll change my mind on this one later.

Building a Producer Consumer Queue with TPL Dataflow

I had never used the TPL Dataflow library until this summer and I was very pleasantly surprised at how easy and effective it was. 

In my last post I introduced the new “Async Daemon” feature in Marten that allows you to continuously update projected views over the event store as new events are captured in the system. In essence, the async daemon has to do two things:

  1. Fetch event data from the underlying Postgresql database and put it into the form that the projections and event processors expect
  2. Run the event data previously fetched through each projection or event processor and commit any projected document views back to the database.

Looking at it that way, the async daemon looks like a good fit for a producer/consumer queue. In this case, the event fetching “produces” batches of events for the projection “consumer” to process downstream. The goal of this approach is to improve overall throughput by allowing the fetching and processing to happen in parallel.

I had originally assumed that I would use Reactive Extensions for the async daemon, but after way too much research and dithering back and forth on my part, I decided that the TPL Dataflow library was a better fit in this particular case.

The producer/consumer queue inside of the async daemon consists of a couple main players:

  • The Fetcher class is the “producer” that continuously polls the database for the new events. It’s smart enough to pause the polling if there are no new events in the database, but otherwise it’s pretty dumb.
  • An instance of the IProjection interface that does the actual work of processing events or updating projected documents from the events.
  • The ProjectionTrack class acts as a logical controller to both the Fetcher and the IProjection.
  • A pair of ActionBlocks from the TPL Dataflow library: one used as the consumer queue for processing events, and a second queue for coordinating the activities within ProjectionTrack.


In the pure happy path workflow of the async daemon, it functions like this sequence diagram below:

[Image: sequence diagram of the async daemon’s happy path]

The Fetcher object runs continuously fetching a new “page” of events and queues each page where it will be consumed by ProjectionTrack in its ExecutePage() method in a different thread.

The usage of the ActionBlock objects to connect the workflow together turned out to be pretty simple. In the following code taken from the ProjectionTrack class, I’m setting up the ActionBlock for the execution queue with a lambda to call the ExecutePage() method. One thing to notice is that I had to configure a couple options to ensure that each item enqueued to that ActionBlock is executed serially in the same order that it was received.

_executionTrack = new ActionBlock<EventPage>(
    page => ExecutePage(page, _cancellation.Token),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 1,
        EnsureOrdered = true
    });

The value of using ActionBlock is that it does all the heavy lifting for me with regard to the threading. The ActionBlock will trigger the ExecutePage() method on a different thread and ensure that every page is executed sequentially.
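Here's a standalone example of the same pattern, with nothing Marten-specific in it: an ActionBlock configured with MaxDegreeOfParallelism = 1, so that posted items are processed one at a time and in order. It assumes the System.Threading.Tasks.Dataflow package is referenced.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SerialQueueExample
{
    public static async Task<List<int>> ProcessInOrder(IEnumerable<int> items)
    {
        var results = new List<int>();

        // Single-threaded consumer: items are handled one at a time,
        // in the order they were posted
        var block = new ActionBlock<int>(i => results.Add(i),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 1,
                EnsureOrdered = true
            });

        foreach (var item in items) block.Post(item);

        // Signal that no more items are coming, then wait for the
        // queue to drain
        block.Complete();
        await block.Completion;

        return results;
    }
}
```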

Incorporating Backpressure

I also wanted to incorporate the idea of “back pressure” so that if the event fetching producer is getting too far ahead of the event processing consumer, the async daemon would stop fetching new events to prevent spikes in memory usage and possibly reserve more system resources for the consumer until the consumer could catch up.

To do that, there’s a little bit of logic in ProjectionTrack that checks how many events are queued up in the execution track shown above and pauses the Fetcher if the configured threshold is exceeded:

public async Task CachePage(EventPage page)
{
    // Accumulator is just a little helper used to
    // track how many events are in flight
    Accumulator.Store(page);

    // If the consumer is backed up, stop fetching
    if (Accumulator.CachedEventCount > _projection.AsyncOptions.MaximumStagedEventCount)
    {
        _logger.ProjectionBackedUp(this, Accumulator.CachedEventCount, page);
        await _fetcher.Pause().ConfigureAwait(false);
    }

    _executionTrack?.Post(page);
}

When the consumer works through enough of the staged events, ProjectionTrack knows to restart the Fetcher to begin producing new pages of events:

// This method is called after every EventPage is successfully
// executed
public Task StoreProgress(Type viewType, EventPage page)
{
    Accumulator.Prune(page.To);

    if (shouldRestartFetcher())
    {
        _fetcher.Start(this, Lifecycle);
    }

    return Task.CompletedTask;
}

The actual “cooldown” logic inside of ProjectionTrack is implemented in this method:

private bool shouldRestartFetcher()
{
    if (_fetcher.State == FetcherState.Active) return false;

    if (Lifecycle == DaemonLifecycle.StopAtEndOfEventData && _atEndOfEventLog) return false;

    if (Accumulator.CachedEventCount <= _projection.AsyncOptions.CooldownStagedEventCount &&
        _fetcher.State == FetcherState.Paused)
    {
        return true;
    }

    return false;
}

To make this more concrete, by default Marten will pause a Fetcher if the consuming queue has over 1,000 events and won’t restart the Fetcher until the queue goes below 500. Both thresholds are configurable.
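That pause/resume pair is a classic high/low watermark with hysteresis: pause above the high mark, and don't resume until the queue drains below the low mark, so the producer doesn't flap on and off around a single threshold. Stripped of the Marten types, the logic looks something like this (the Watermark class is mine; only the 1,000/500 defaults come from the text above):

```csharp
using System;

// Generic high/low watermark: pause the producer above the high
// mark, resume only after draining below the low mark
public class Watermark
{
    private readonly int _high;
    private readonly int _low;
    private bool _paused;

    public Watermark(int high = 1000, int low = 500)
    {
        _high = high;
        _low = low;
    }

    public bool Paused => _paused;

    // Call with the current queue depth after every enqueue/dequeue
    public void Update(int queuedCount)
    {
        if (!_paused && queuedCount > _high) _paused = true;
        else if (_paused && queuedCount <= _low) _paused = false;
    }
}
```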


As I said in my last post, I thought that the async daemon overall was very challenging, but I felt that the usage of TPL Dataflow went very smoothly.

Doing it the Old Way with BlockingCollection

In the past, I’ve used BlockingCollection to build producer/consumer queues in .Net. In the Storyteller project, I used producer/consumer queues to parallelize executing batches of specifications by dividing the work into stages that each do some kind of work on a “SpecExecutionRequest” object (read in the specification file, do some preparation work to build a “plan”, and finally execute the specification). At the heart of that is the ConsumingQueue class that allows you to queue up tasks for one of these SpecExecutionRequest stages:

public class ConsumingQueue : IDisposable, IConsumingQueue
{
    private readonly BlockingCollection<SpecExecutionRequest> _collection =
        new BlockingCollection<SpecExecutionRequest>(new ConcurrentBag<SpecExecutionRequest>());

    private Task _readingTask;
    private readonly Action<SpecExecutionRequest> _handler;

    public ConsumingQueue(Action<SpecExecutionRequest> handler)
    {
        _handler = handler;
    }

    public void Dispose()
    {
        _collection.CompleteAdding();
        _collection.Dispose();
    }

    // This does not block the caller
    public void Enqueue(SpecExecutionRequest plan)
    {
        _collection.Add(plan);
    }

    private void runSpecs()
    {
        // This loop runs continuously and calls _handler() for
        // each plan added to the queue in the method above
        foreach (var request in _collection.GetConsumingEnumerable())
        {
            if (request.IsCancelled) continue;

            _handler(request);
        }
    }

    public void Start()
    {
        _readingTask = Task.Factory.StartNew(runSpecs);
    }
}

For more context, you can see how these ConsumingQueue objects are assembled and used in the SpecificationEngine class in the Storyteller codebase.

After doing it both ways, I think I prefer the TPL Dataflow approach over the older BlockingCollection mechanism.


Offline Event Processing in Marten with the new “Async Daemon”

The feature I’m talking about here was very difficult to write, is brand new, and is definitely in need of some serious user testing from anyone interested in kicking the tires on it. We’re getting a lot of interest in the Marten Gitter room about the kinds of use cases the async daemon described below is meant to address. This was also the very last feature on Marten’s “must have for 1.0” list, so there’s a new 1.0-alpha nuget for Marten. 1.0 is still at least a couple of months away, but it’s getting closer.

A couple weeks ago I pulled the trigger on a new, but long planned, feature in Marten we’ve been calling the “async daemon” that allows users to build and update projected views against the event store data in a background process hosted in your application or an external service.

To put this in context, let’s say that you are building an application to track the status of GitHub repositories, using event sourcing for the persistence. In this application, you would record events for things like:

  • Project started
  • A commit pushed into the main branch
  • Issue opened
  • Issue closed
  • Issue re-opened

There’s a lot of value to be had by recording the raw event data, but you still need to frequently see a rolled up view of each project that can tell you the total number of open issues, closed issues, how many lines of code are in the project, and how many unique contributors are involved.

To do that rollup, you can build a new document type called ActiveProject just to present that information. Optionally, you can use Marten’s built-in support for making aggregated projections across a stream by adding Apply([Event Type]) methods to consume events. In my end-to-end tests for the async daemon, I used this version of ActiveProject (the raw code is on GitHub if the formatting is cut off for you):

    public class ActiveProject
    {
        public ActiveProject()
        {
        }

        public ActiveProject(string organizationName, string projectName)
        {
            ProjectName = projectName;
            OrganizationName = organizationName;
        }

        public Guid Id { get; set; }
        public string ProjectName { get; set; }

        public string OrganizationName { get; set; }

        public int LinesOfCode { get; set; }

        public int OpenIssueCount { get; set; }

        private readonly IList<string> _contributors = new List<string>();

        public string[] Contributors
        {
            get { return _contributors.OrderBy(x => x).ToArray(); }
            set
            {
                _contributors.Clear();
                _contributors.AddRange(value);
            }
        }

        public void Apply(ProjectStarted started)
        {
            ProjectName = started.Name;
            OrganizationName = started.Organization;
        }

        public void Apply(IssueCreated created)
        {
            OpenIssueCount++;
        }

        public void Apply(IssueReopened reopened)
        {
            OpenIssueCount++;
        }

        public void Apply(IssueClosed closed)
        {
            OpenIssueCount--;
        }

        public void Apply(Commit commit)
        {
            _contributors.Fill(commit.UserName);
            LinesOfCode += (commit.Additions - commit.Deletions);
        }
    }

Now, you can update projected views in Marten at the time of event capture with what we call “inline projections.” You could also build the aggregated view on demand from the underlying event data. Both of those solutions can be appropriate in some cases, but if our GitHub projects are very active, with a fair amount of concurrent writes to any given project stream, we’d probably be much better off moving the aggregation updates to a background process.

That’s where the async daemon comes into play. If you have a Marten document store, you can start up a new instance of the async daemon like so (the underlying code shown below is in GitHub):

[Fact] 
public async Task build_continuously_as_events_flow_in()
{
    // In the test here, I'm just adding an aggregation for ActiveProject
    StoreOptions(_ =>
    {
        _.Events.AsyncProjections.AggregateStreamsWith<ActiveProject>();
    });

    using (var daemon = theStore.BuildProjectionDaemon(logger: _logger, settings: new DaemonSettings
    {
        LeadingEdgeBuffer = 1.Seconds()
    }))
    {
        // Start all of the configured async projections
        daemon.StartAll();

        // This just publishes event data
        await _fixture.PublishAllProjectEventsAsync(theStore);


        // Runs all projections until there are no more events coming in
        await daemon.WaitForNonStaleResults().ConfigureAwait(false);

        await daemon.StopAll().ConfigureAwait(false);
    }

    // Compare the actual data in the ActiveProject documents with 
    // the expectation
    _fixture.CompareActiveProjects(theStore);
}

In the code sample above, I’m starting an async daemon to keep the ActiveProject projection up to date while running a series of events through the event store. The async daemon continuously detects newly available events and applies them to the correct ActiveProject document. This is the only place in Marten where we utilize the idea of eventual consistency to allow for faster writes, but it’s clearly warranted in some cases.
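The heart of “aggregation by stream” is just applying events strictly in order against the matching Apply() overloads. Here’s a simplified, self-contained sketch of that dispatch mechanic; the view and event classes below mirror the ActiveProject sample but are cut way down for brevity:

```csharp
using System.Collections.Generic;

public class IssueCreated { }
public class IssueClosed { }

// A cut-down stand-in for the ActiveProject aggregate above
public class OpenIssueView
{
    public int OpenIssueCount { get; private set; }

    public void Apply(IssueCreated e) => OpenIssueCount++;
    public void Apply(IssueClosed e) => OpenIssueCount--;
}

public static class StreamAggregator
{
    // Apply an ordered stream of events to build up the view
    public static OpenIssueView Aggregate(IEnumerable<object> orderedEvents)
    {
        var view = new OpenIssueView();
        foreach (var e in orderedEvents)
        {
            // dynamic dispatch picks the Apply() overload that
            // matches the concrete event type
            view.Apply((dynamic) e);
        }
        return view;
    }
}
```

The real daemon adds ordering guarantees, batching, and error handling on top of this, but the Apply() convention is the same one you saw in the ActiveProject class.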

Rebuilding a Projection From Existing Data

If you’re going to use event sourcing with read side projections (the “Q” in your CQRS architecture), you’re probably going to need a way to rebuild projected views from the existing data to fix bugs or add new data. You’ll also likely introduce new projected views after the initial rollout to production. You’ll absolutely need to rebuild projected view data in development as you’re iterating your system.

To that end, you can also use the async daemon to completely tear down and rebuild the population of a projected document view from the existing event store data.

// This is just some test setup to establish the DocumentStore
StoreOptions(_ => { _.Events.AsyncProjections.AggregateStreamsWith<ActiveProject>(); });

// Publishing some pre-canned event data
_fixture.PublishAllProjectEvents(theStore);


using (var daemon = theStore.BuildProjectionDaemon(logger: _logger, settings: new DaemonSettings
{
    LeadingEdgeBuffer = 0.Seconds()
}))
{
    await daemon.Rebuild<ActiveProject>().ConfigureAwait(false);
}

Taken from the tests for the async daemon on Github.

Other Functionality Possibilities

The async daemon can be described as just a mechanism to accurately and reliably execute the events in order through the IProjection interface shown below:

    public interface IProjection
    {
        Type[] Consumes { get; }
        Type Produces { get; }

        AsyncOptions AsyncOptions { get; }
        void Apply(IDocumentSession session, EventStream[] streams);
        Task ApplyAsync(IDocumentSession session, EventStream[] streams, CancellationToken token);
    }

Today, the only built-in projections in Marten are one-for-one transformations of a single event type to a view document, and the aggregation-by-stream use case shown above in the ActiveProject example. However, there’s nothing preventing you from creating your own custom IProjection classes to:

  • Aggregate views across streams grouped by some kind of classification like region, country, person, etc.
  • Project event data into flat relational tables for more efficient reporting
  • Do complex event processing
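For a rough idea of what a custom IProjection might look like, here’s a hypothetical projection that rolls issue counts up per stream. The IssueCounts document and all of the member access on EventStream (Id, Events, the Data property on each event) are written from memory against the interface above, so treat this as the shape of a solution rather than a working implementation:

```csharp
public class IssueCounts
{
    public Guid Id { get; set; }
    public int Created { get; set; }
    public int Closed { get; set; }
}

public class IssueCountsProjection : IProjection
{
    public Type[] Consumes => new[] { typeof(IssueCreated), typeof(IssueClosed) };
    public Type Produces => typeof(IssueCounts);
    public AsyncOptions AsyncOptions { get; } = new AsyncOptions();

    public void Apply(IDocumentSession session, EventStream[] streams)
    {
        foreach (var stream in streams)
        {
            // Naively write one summary document per stream; a real
            // projection would load and update any existing counts
            session.Store(new IssueCounts
            {
                Id = stream.Id,
                Created = stream.Events.Count(x => x.Data is IssueCreated),
                Closed = stream.Events.Count(x => x.Data is IssueClosed)
            });
        }
    }

    public Task ApplyAsync(IDocumentSession session, EventStream[] streams, CancellationToken token)
    {
        Apply(session, streams);
        return Task.CompletedTask;
    }
}
```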
What’s Next for the Async Daemon

The async daemon is the only major thing missing from the Marten documentation, and I need to fill that in soon. This blog post is just a down payment on the async daemon docs.

I cut a lot of content out on how the async daemon works. Since I thought this was one of the hardest things I’ve ever coded myself, I’d like to write a post next week just about designing and building the async daemon and see if I can trick some folks into effectively doing a code review on it;)

This was my first usage of the TPL Dataflow library and I was very pleasantly surprised by how much I liked using it. If I’m ambitious enough, I’ll write a post later on building producer/consumer queues and using back pressure with the dataflow classes.
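Until then, the short version is that the dataflow blocks give you back pressure through the BoundedCapacity option: a producer awaiting SendAsync() is held up whenever the consumer’s buffer is full. A minimal sketch (requires the System.Threading.Tasks.Dataflow package):

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static async Task ProduceAndConsume()
{
    // The consumer side of the queue
    var worker = new ActionBlock<int>(async i =>
    {
        await Task.Delay(10); // simulate slow consumer work
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 100 });

    // The producer side: SendAsync() awaits whenever 100 items
    // are already buffered, so the producer can't race ahead
    for (var i = 0; i < 1000; i++)
    {
        await worker.SendAsync(i);
    }

    worker.Complete();
    await worker.Completion;
}
```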

StructureMap 4.3 Fully Embraces CoreCLR

EDIT 8/2: A lot of folks are asking me why SM targets both Netstandard 1.3 and 1.5 and I left that explanation out of the blog post because I was in too much of a hurry yesterday. The only single difference is that with 1.5 StructureMap can try to load an assembly by file path, which only comes into play if you’re using StructureMap to discover assemblies from the file path and an assembly name does not match the file name. I thought it was worthwhile to drop down to 1.3 without that small feature to expand StructureMap’s reach. We’ll see if I’m even remotely right.

I just uploaded StructureMap 4.3 to Nuget today. The big change (95% of my work) was to completely embrace the new world order of CoreCLR, the dotnet cli, and (for now) the project.json system. As such, I consolidated all of the real code back into the root StructureMap.dll project and relied on conditional compilation. This release also adds a great deal of functionality for type scanning and assembly scanning to the CoreCLR targets that were previously only available in the full .Net framework version.

StructureMap >=4.0 supported the CoreCLR through the old “dotnet” target, but we were only compiling to that target. Between users having Nuget issues with the old nomenclature and a CoreCLR specific bug, it was time to convert all the way and make sure that the tests were running against the CoreCLR version of StructureMap on our CI server.

What a StructureMap user needs to know before adopting 4.3…

  • The Nuget now targets .Net 4.5, Netstandard 1.3, and Netstandard 1.5
  • PCL profiles have been dropped for now, but I’m willing to try to put that back in if anyone requests it. That’s definitely a place where I’d love to have some help because I don’t do any mobile development that would test out that build.
  • The old StructureMap.Net4 assembly that used to be in the StructureMap nuget is gone. I’m relying on conditional compilation instead.
  • Any project that uses FubuMVC 3’s service bus should probably update to 4.3 for a big performance optimization that was impacting that functionality.
The complete list of changes and bug fixes is here.

Indexing Options in Marten

The road to Marten 1.0 continues with a discussion of the indexing options that we directly support against Postgresql.

We’re aiming to make Marten be usable for a wide range of application scenarios, and an obvious one is to make querying faster. To that end, Marten has direct support for adding a couple different flavors of indexes to optimize querying.

In all cases, Marten’s schema migration support can detect changes, additions, and removals of index definitions.

Calculated Index

After getting some feedback from a 2nd Quadrant consultant, the recommended path for optimizing queries against JSON documents in Marten is to use a Postgresql calculated index, which Marten can build for you with:

    var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);
        _.Schema.For<Issue>().Index(x => x.Number);
    });

Marten creates this index behind the scenes against the Issue storage table:

CREATE INDEX mt_doc_issue_idx_number ON public.mt_doc_issue 
    ((CAST(data ->> 'Number' as integer)));
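For reference, a Linq query that filters on the indexed member compiles down to that same CAST(data ->> 'Number' as integer) expression in its WHERE clause, so Postgresql can plan against the calculated index instead of scanning the JSONB data:

```csharp
// The Where() clause below compiles to the indexed expression
var issues = session.Query<Issue>()
    .Where(x => x.Number == 42)
    .ToList();
```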

The advantages of using a calculated index are that you’re not duplicating storage and you’re causing fewer database schema changes as compared to our original “Duplicated Field” approach that’s described in the next section.

It’s not shown here, but there is some ability to configure how the calculated index is created. See the documentation on calculated indexes for an example of that usage.

I’ve been asked several times what Sql Server would need to add before Marten could support it. The calculated index feature as it applies to the JSONB data type isn’t strictly necessary, but it’s a big advantage that Postgresql has over Sql Server at the moment.

Duplicated Field

Marten’s original approach was to optimize querying against designated fields by just duplicating the value within the JSON document into a separate database table column, and indexing that column. Marten does this behind the scenes when you use the Foreign Key option. Some of our users will opt for a duplicated field if they want to issue their own queries against the document table without having to worry about JSON locators.

To make a field duplicated, you can either use the [DuplicateField] attribute:

    public class Team
    {
        public Guid Id { get; set; }

        [DuplicateField]
        public string Name { get; set; }
    }

Or you can specify the duplicated fields in the StoreOptions for your document store:

    using (var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);

        _.Schema.For<User>().Duplicate(x => x.UserName);
    }))
    {

    }

If you decide to add a duplicated field to an existing document type, Marten’s schema migration support is good enough to add the column and fill in the values from the JSON document as part of patching. Even so, we recommend the calculated index approach from the previous section to simplify your schema migrations.

It’s not shown here, but you have quite a bit of flexibility in configuring exactly what kind of index is created and how it applies. See the documentation on duplicated fields for an example.

Gin Index

If you’re needing to issue a lot of variable adhoc queries against a Marten document, you may want to opt for a Gin index. A gin index against a Postgresql JSONB object creates a generalized index of key/value pairs and arrays within the parsed JSON document. To add a Gin index to a Marten document type, you need to explicitly configure that document type like this:

    var store = DocumentStore.For(_ =>
    {
        _.Schema.For<Issue>().GinIndexJsonData();
    });

You can also decorate your document class with the [GinIndexed] attribute. It’s not shown above, but there are options to customize the index generated.
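The attribute form is just a class decoration. A quick sketch, assuming the attribute lives in Marten’s schema namespace:

```csharp
// Equivalent to calling GinIndexJsonData() in StoreOptions
[GinIndexed]
public class Issue
{
    public Guid Id { get; set; }
    public int Number { get; set; }
}
```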

When the DDL for the Issue document is generated, you would see a new index added to its table like this one:

CREATE INDEX mt_doc_issue_idx_data ON public.mt_doc_issue USING gin ("data" jsonb_path_ops);

Do note that using a Gin index against a document type will result in slightly slower inserts and updates to that table. From our testing, it’s not that big of a hit, but still something to be aware of.

Soft Deletes in Marten

Yet more content on new features in Marten leading up to the 1.0 release coming soon. These posts aren’t getting many reads, but they’ll be stuck in Google/Bing for later users, so expect a couple more of these.

As part of the 1.0 release work, Marten gained the capability last week (>0.9.8) to support the concept of “soft deletes.” Instead of just deleting a document completely out of the database, a soft delete merely marks a “deleted” column in the database table to denote that the row is now obsolete. The value of a “soft delete” is simply that you don’t lose any data from the historic record. The downsides are that you’ve now got more rows cluttering up your database and you probably need to filter out deleted documents from your queries.

To see this in action, let’s first configure a document type in Marten as soft deleted:

    var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);
        _.Schema.For<User>().SoftDeleted();
    });

By default, Marten does a hard delete of the row, so you’ll need to explicitly opt into soft deletes per document type. There is also a [SoftDeleted] attribute in Marten that you could use to decorate a document type to specify that it should be soft deleted.

When a document type is soft deleted, Marten adds a couple of extra columns to the document storage table in the database, “mt_deleted” and “mt_deleted_at”, to track whether and when a document was deleted.

In usage, it’s transparent to the user that you’re doing a soft delete instead of a hard delete:

    // Create a new User document
    var user = new User();
    session.Store(user);
    session.SaveChanges();

    // Mark it deleted
    session.Delete(user);
    session.SaveChanges();

As I said earlier, one of your challenges is to filter out deleted documents in queries. Fortunately, Marten has you covered. As the following acceptance test from Marten shows, deleted documents are automatically filtered out of the Linq query results:

    [Fact]
    public void query_soft_deleted_docs()
    {
        var user1 = new User { UserName = "foo" };
        var user2 = new User { UserName = "bar" };
        var user3 = new User { UserName = "baz" };
        var user4 = new User { UserName = "jack" };

        using (var session = theStore.OpenSession())
        {
            session.Store(user1, user2, user3, user4);
            session.SaveChanges();

            // Deleting 'bar' and 'baz'
            session.DeleteWhere<User>(x => x.UserName.StartsWith("b"));
            session.SaveChanges();

            // no where clause, deleted docs should be filtered out
            session.Query<User>().OrderBy(x => x.UserName).Select(x => x.UserName)
                .ToList().ShouldHaveTheSameElementsAs("foo", "jack");

            var sql = session.Query<User>().OrderBy(x => x.UserName).Select(x => x.UserName).ToCommand().CommandText;
            _output.WriteLine(sql);

            // with a where clause
            session.Query<User>().Where(x => x.UserName != "jack")
                .ToList().Single().UserName.ShouldBe("foo");
        }
    }

Easy peasy. Of course, you may want to query against the deleted documents or against all the documents. Marten’s got you covered there too: you can use these two custom Linq extensions to include deleted documents:

  1. “IDocumentSession.Query<T>().MaybeDeleted()…” will include all documents in the query, regardless of whether or not they have been deleted
  2. “IDocumentSession.Query<T>().IsDeleted()…” will only include documents marked as deleted
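Following the notation above, that usage looks something like this sketch against the same User documents (depending on your Marten version, these extensions may need to appear inside a Where() clause instead):

```csharp
// All User documents, deleted or not
var everybody = session.Query<User>().MaybeDeleted()
    .OrderBy(x => x.UserName)
    .ToList();

// Only the soft deleted User documents
var deleted = session.Query<User>().IsDeleted()
    .Select(x => x.UserName)
    .ToList();
```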

Avoid the Serialization Burn with Marten’s Patching API

This is a logical follow up to my last post on Document Transformations in Marten with Javascript and yet another signpost on the way to Marten 1.0. Instead of having to write your own Javascript, Marten supplies the “Patch API” described here for very common scenarios.

Before I started working on Marten, I read an article comparing the performance of writing and querying JSON data between MongoDB and Postgresql (I couldn’t find the link when I was writing this post). Long story short, Postgresql very clearly comes out on top in terms of throughput, but the author still wanted to stick with MongoDB because of its ability to do document patching where you’re able to change elements within the persisted document without having to first load it into your application, change it, and persist the whole thing back. It’s a fair point and a realistic scenario that I used with RavenDb’s Patch Commands in the past.

Fortunately, that argument is a moot point because we have a working “Patch API” model in Marten for doing document patches. This does require PLV8 to be enabled in your Postgresql database if you want to play with the feature in our latest nugets.

For an example, let’s say that you want to change the user name of a User document without first loading it. To update a single property or field in a document by its Id, it’s just this:

public void change_user_name(IDocumentSession session, Guid userId, string newName)
{
    session.Patch<User>(userId).Set(x => x.UserName, newName);
    session.SaveChanges();
}

When IDocumentSession.SaveChanges() (or its async equivalent) is called, it will send all the patching requests queued up with all of the pending document changes in a single database call.

I should also point out that the Set() mechanism can be used with nested properties or fields and non-primitive types.
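As an example of a deep patch, assuming the User document had a nested Address object with a City property (both made up for this sample), the syntax is identical:

```csharp
public void change_city(IDocumentSession session, Guid userId, string city)
{
    // "Address.City" is a hypothetical nested member on User
    session.Patch<User>(userId).Set(x => x.Address.City, city);
    session.SaveChanges();
}
```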

Looking at another example, what if you just want to add a new role to an existing User? For that, Marten exposes the Append() method:

public void append_role(IDocumentSession session, Guid userId, string role)
{
    session.Patch<User>(userId).Append(x => x.Roles, role);
    session.SaveChanges();
}

In the case above, the new role will be appended to the “Roles” collection in the persisted JSON document in the Postgresql database. Again, this method can be used for nested or deep properties or fields and with non-primitive elements.

As a third example, let’s say that you only want to increment some kind of counter in the JSON document:

public void increment_login_count(IDocumentSession session, Guid userId)
{
    session.Patch<User>(userId).Increment(x => x.LoginCount);
    session.SaveChanges();
}

When the above command is issued, Marten will find the current numeric value in the JSON document, add 1 to it (the increment is an optional argument not shown here), and persist the new JSON data without ever fetching it into the client. The Increment() method can be used with int’s, long’s, double’s, and float’s.
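That optional increment argument looks like this when you want to bump the counter by more than one:

```csharp
public void add_login_bonus(IDocumentSession session, Guid userId)
{
    // Increment LoginCount by 10 instead of the default of 1
    session.Patch<User>(userId).Increment(x => x.LoginCount, 10);
    session.SaveChanges();
}
```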

Lastly, if you want to make a patch update to many documents by some kind of criteria, you can do that too:

public void append_role_to_internal_users(IDocumentSession session, Guid userId, string role)
{
    // Adds the role to all internal users
    session.Patch<User>(x => x.Internal)
        .Append(x => x.Roles, role);

    session.SaveChanges();
}

Other “Patch” mechanisms include the ability to rename a property or field within the JSON document and the ability to insert an item into a child collection at a given index. Other patch mechanisms are planned for later as well.

Document Transformations in Marten with Javascript

In all likelihood, Marten would garner much more rapid adoption if we were able to build on top of Sql Server 2016 instead of Postgresql. Hopefully, .Net folks will be willing to try switching databases after they see how many helpful capabilities that Postgresql has that Sql Server can’t match yet. This blog post, yet one more stop along the way to Marten 1.0, demonstrates how we’re taking advantage of Postgresql’s built in Javascript engine (PLV8). 

A Common .Net Approach to “Readside” Views

Let’s say that you’re building HTTP services, and some of your HTTP endpoints will need to return some sort of “readside” representation of your persisted domain model. For the purpose of making Marten shine, let’s say that you’re going to need to work with hierarchical data. In a common .Net technology stack, you’d:

  1. Load the top level model object through Entity Framework or some other kind of ORM. EF would issue a convoluted SQL query with lots of OUTER JOIN’s so that it can make a single call to the database to fetch the entire hierarchy of data you need from various tables. EF would then proceed to iterate through the sparsely populated recordset coming back and turn that into the actual domain model object represented by the data with lots of internally generated code.
  2. You’d then use something like AutoMapper to transform the domain model object into a “read side” Data Transfer Object (view models, etc.) that’s more suitable to going over the wire to clients outside of your service.
  3. Serialize your DTO to a JSON string and write that out to the HTTP response

Depending on how deep your hierarchy is, #1 can be expensive in the database query. The serialization in #3 is also somewhat CPU intensive.

As a contrast, here’s an example of how you might approach that exact same use case with Marten:

    var json = session.Query<User>()
        .Where(x => x.Id == user.Id)
        .TransformToJson("get_fullname").Single();

In the usage above, I’m retrieving the data for a single User document from Marten and having Postgresql transform the persisted JSON data to the format I need for the client with a pre-loaded Javascript transformation. In the case of Marten, the workflow is to:

  1. Find the entire hierarchical document JSON in a single database row by its primary key
  2. Apply a Javascript function to transform the persisted JSON to the format that the client needs and return a JSON representation as a String
  3. Stream the JSON from the Linq query directly to the HTTP response without any additional serialization work

Not to belabor the point too much, but the Marten mechanics are simpler and probably much more efficient at runtime because:

  • The underlying database query is much simpler if all the data is in one field in one row
  • The Javascript transformation probably isn’t that much faster or slower than the equivalent AutoMapper mechanics, so let’s call that a wash
  • You don’t have the in memory allocations to load a rich model object just to immediately transform that into a completely different model object
  • You avoid the now unnecessary cost of serializing the DTO view models to a JSON string

A couple additional points:

  • Jimmy Bogard reviewed this and pointed out that in some cases you could bypass the Domain Model to DTO transformation by selecting straight to the DTO, but that wouldn’t cover all cases by any means. The same limitations apply to Marten and its Select() transformation features.
  • To get even more efficient in your Marten usage, the Javascript transformations can be used inside of Marten’s Compiled Query feature to avoid the CPU cost of repetitively parsing Linq statements. You can also do Javascript transformations inside of batched queries – which can of course, also be combined with the aforementioned compiled queries;)

Now, let’s see how it all works…

Building the Javascript Function

The way this works in Marten is that you write your Javascript function into a single file and export the main function with the “module.exports = ” CommonJS syntax. Marten is expecting the main function to have the signature “function(doc)” and return the transformed document.

Here’s a sample Javascript function I used to test this feature that works against a User document type:

module.exports = function(doc) {
    return {fullname: doc.FirstName + ' ' + doc.LastName};
}

Given the persisted JSON for a User document, this transformation would return a different object that would then be streamed back to the client as a JSON string.

There is some thought and even infrastructure for doing Javascript transformations with multiple, related documents, but that feature won’t make it into Marten 1.0.

To load the function into a Javascript-enabled Postgresql schema, Marten exposes this method:

    var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);

        // Let Marten derive the transform name
        // from the file name
        _.Transforms.LoadFile("get_fullname.js");

        // or override the transform name
        _.Transforms.LoadFile("get_fullname.js", "fullname");
    });

Internally, Marten will wrap a PLV8 function wrapper around your Javascript function like this:

CREATE OR REPLACE FUNCTION public.mt_transform_get_fullname(doc jsonb)
  RETURNS jsonb AS
$BODY$

  var module = {export: {}};

module.exports = function (doc) {
    return {fullname: doc.FirstName + ' ' + doc.LastName};
}

  var func = module.exports;

  return func(doc);

$BODY$
  LANGUAGE plv8 IMMUTABLE STRICT;


My intention with the approach shown above was to allow users to write simple Javascript functions and be able to test their transformations in simple test harnesses like Mocha. By having Marten wrap the raw Javascript in a generated PLV8 function, users won’t have to be down in the weeds and worrying about Postgresql mechanics.

Depending on the configuration, Marten is good enough to build, or rebuild, the function to match the current version of the Javascript code on the first usage of that transformation. The Javascript transforms are also part of our schema management support for database migrations.
Transformations

The persisted JSON documents in Marten are a reflection of your .Net classes. Great, that makes it absurdly easy to keep the database schema in synch with your application code at development time — especially compared to the typical development process against a relational database. However, what happens when you really do need to make breaking changes or additions to a document type but you already have loads of persisted documents in your Marten database with the old structure?

To that end, Marten allows you to use Javascript functions to alter the existing documents in the database. As an example, let’s go back to the User document type and assume for some crazy reason that we didn’t immediately issue a user name to some subset of users. As a default, we might just assign their user names by combining their first and last names like so:

module.exports = function (doc) {
    doc.UserName = (doc.FirstName + '.' + doc.LastName).toLowerCase();

    return doc;
}

To apply this transformation to existing rows in the database, Marten exposes this syntax:

    var store = DocumentStore.For(_ =>
    {
        _.Connection(ConnectionSource.ConnectionString);

        _.Transforms.LoadFile("default_username.js");
    });

    store.Transform
        .Where<User>("default_username", x => x.UserName == null);

When you run the code above, Marten will issue a single SQL statement that issues an UPDATE to the rows matching the given criteria by applying the Javascript function above to alter the existing document. No data is ever fetched or processed in the actual application tier.

Supercharging Marten with the Jil Serializer

Some blog posts you write for attention or self promotion, some you write just because you’re excited about the topic, and some posts you write just to try to stick some content for later into Google searches. This one’s all about users googling for this information down the road.

Out of the box, Marten uses Newtonsoft.Json as its primary JSON serialization mechanism. While Newtonsoft has outstanding customizability and the most flexible feature set, you can opt to forgo some of that flexibility in favor of higher performance by switching instead to the Jil serializer.

In the last couple months I finally made a big effort to be able to run Marten’s test suite using the Jil serializer. I had to make one small adjustment to our JilSerializer (turning on includeInherited), and a distressingly intrusive structural change to make the internal handling of Enum values (!) in Linq queries be dependent upon the internal serializer’s behavior for enum storage.

At this point, we’re not supplying a separate Marten.Jil adapter package, but the code to swap in Jil is just this class:

public class JilSerializer : ISerializer
{
    private readonly Options _options 
        = new Options(dateFormat: DateTimeFormat.ISO8601, includeInherited:true);

    public string ToJson(object document)
    {
        return JSON.Serialize(document, _options);
    }

    public T FromJson<T>(string json)
    {
        return JSON.Deserialize<T>(json, _options);
    }

    public T FromJson<T>(Stream stream)
    {
        return JSON.Deserialize<T>(new StreamReader(stream), _options);
    }

    public object FromJson(Type type, string json)
    {
        return JSON.Deserialize(json, type, _options);
    }

    public string ToCleanJson(object document)
    {
        return ToJson(document);
    }

    public EnumStorage EnumStorage => EnumStorage.AsString;
}

And this one line of code in your document store set up:

var store = DocumentStore.For(_ =>
{
    _.Connection("the connection string");

    // Replace the ISerializer w/ the JilSerializer
    _.Serializer<JilSerializer>();
});

A couple things to note about using Jil in place of Newtonsoft:

  • The enumeration persistence behavior is different from Newtonsoft as it stores enum values by their string representation. Most Marten users seem to prefer this anyway, but watch the value of the “EnumStorage” property in your custom serializer.
  • We’ve tried very hard with Marten to ensure that the Json stored in the database doesn’t require .Net type metadata, but the one thing we can’t address is having polymorphic child collections. For that particular use case, you’ll have to stick with Newtonsoft.Json and turn on its type metadata handling.
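For completeness, the Newtonsoft.Json knob for that last scenario is TypeNameHandling. Here’s a self-contained sketch of round-tripping a polymorphic child collection; the Animal/Shelter types are invented for this example, and you’d apply the same setting through whatever Newtonsoft-based ISerializer you plug into Marten:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public abstract class Animal { public string Name { get; set; } }
public class Dog : Animal { }
public class Cat : Animal { }

public class Shelter
{
    // A polymorphic child collection, the case that needs type metadata
    public List<Animal> Animals { get; set; } = new List<Animal>();
}

public static class PolymorphicJson
{
    // TypeNameHandling.Auto embeds a "$type" hint only where the runtime
    // type differs from the declared type, e.g. a Dog stored as Animal
    private static readonly JsonSerializerSettings Settings =
        new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto };

    public static string Serialize(Shelter shelter)
        => JsonConvert.SerializeObject(shelter, Settings);

    public static Shelter Deserialize(string json)
        => JsonConvert.DeserializeObject<Shelter>(json, Settings);
}
```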