How I’m Testing Redux’ified React Components

Some of you are going to be new to all of the tools I’m describing here, and for you, I wanted to show how I think I’ve made authoring automated tests somewhat painless. For those of you who are already familiar with the whole React/Redux stack, feel free to make suggestions on making things better in the comments;)

As part of my big push to finally release Storyteller 3.0, I recently upgraded all of its JavaScript client dependencies (React.js/Babel/Webpack/etc. I might write a full-on, rant-y blog post about that later). I’d known for some time that I wanted to convert the client’s homegrown, Flux-lite architecture based around Postal.js to Redux before adding any new features ahead of the final 3.0 release. Now that the conversion is done, I can’t say I’m thrilled with how much work the transition took, but I’m happy with the final results. In particular, I really like how easy react-redux has made the Karma specs for many of my React.js components.

Step 1 was to effectively shatter all of my existing Karma specs. Step 2 was to figure out how to most easily connect my components under test to the new Redux architecture. I had an existing testing harness that had been somewhat helpful, and I used it to first sketch out what a new Karma harness should do:

  1. For various reasons, I’m insisting on running Karma against a real browser instead of something like jsdom when testing my React.js components, so I wanted the new harness to make it as quick as possible to render a React.js component in the browser
  2. I wanted the harness to take care of spinning up a new Redux store with the correct reducer function
  3. Despite my preference for “self-contained” tests and my dislike of shared testing data sets, I opted to have the new harness start up with an existing JSON state of the client, recorded from the server output into a JS file.
  4. Give me quick access to the mounted React.js component instance or the actual DOM element.
  5. I do still use Postal.js to broadcast requests from my React.js components to the rest of the application, so for the sake of testing I wanted some test spies listening to the Postal.js channels to verify some of the event handlers of my components.

Those requirements led to a harness class I quite creatively called “ComponentHarness.” Looking at the interesting parts of the constructor function for ComponentHarness, you can see how I set up an isolated test state and element for a React.js component:

        // Make sure you aren't failing because of faulty
        // Postal listeners left behind by previous tests
        Postal.reset();
        
        // Sets up a new Redux store with the correct
        // Reducer function.
        this.store = createStore(Reducer);
        
        // Establish an initial data set based on 
        // server side data from the .Net tests
        this.store.dispatch(initialization);

        // Create a brand new container div for the 
        // React.js component being tested and add that
        // to the current document
        this.div = document.createElement('div');
        document.documentElement.appendChild(this.div);

        // Sets up some test spies for Postal.js channels
        // that just listen for messages being received
        // during a spec run
        this.engineMessages = new Listener('engine-request');
        this.editorMessages = new Listener('editor');
        this.explorerMessages = new Listener('explorer');
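The Listener test spy class used above isn’t shown in the post. A minimal sketch of what it might look like is below; note that the tiny stand-in bus is my own assumption so the example is self-contained, where the real class would subscribe to a Postal.js channel:

```javascript
// A tiny stand-in message bus so this sketch is self-contained;
// the real Listener would subscribe to a Postal.js channel instead.
const bus = {
  subscriptions: [],
  subscribe(channel, callback) {
    this.subscriptions.push({ channel, callback });
  },
  publish(channel, topic, data) {
    this.subscriptions
      .filter(s => s.channel === channel)
      .forEach(s => s.callback(Object.assign({ topic: topic }, data)));
  }
};

// Hypothetical sketch of the Listener spy: it just records every
// message broadcast on one channel during a spec run
class Listener {
  constructor(channel) {
    this.messages = [];
    bus.subscribe(channel, m => this.messages.push(m));
  }

  // Used by specs to verify that a given message went out
  findPublishedMessage(topic) {
    return this.messages.find(m => m.topic === topic) || null;
  }
}
```

The important design point is that the spy is purely passive: it never interferes with the messages, it only records them for later assertions.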

Now, to put this into usage, I have a small React component called “QueueCount” that sits in the header bar of the Storyteller client and displays a Bootstrap “badge” element showing how many specifications are currently queued up for execution and links to another page showing the active queue. In the system’s initial state, there are no specifications queued and this badge element should be completely hidden.

At the top of my specification code for this component, I start up a new ComponentHarness and render the QueueCount component that I want to test against:

describe('QueueCount', function(){
    var component, harness;

    before(function(){
        component = (<QueueCount />);
        harness = new ComponentHarness();
        harness.render(component);
    });

Inside of ComponentHarness, the render(component) method renders the component you pass into it in the DOM, but nested within the <Provider /> component from react-redux that does the work of wiring the Redux state to a React.js component:

    render(component){
        ReactDOM.render(
        (
            // The Provider component here is from
            // react-redux and acts to "wire up"
            // the given redux store to all the elements
            // nested inside of it
            <Provider store={this.store}>
                {component}
            </Provider>
        )
        , this.div);
    }

Since the ComponentHarness is starting the store at a known state with no specifications currently queued for execution, the QueueCount component should be rendered as an empty <span /> element, and the first specification states this:

it('is just a blank span with no specs queued', function(){
    // element() gives me access to the root DOM element
    // for the rendered React component
    var element = harness.element();
    expect(element.nodeName).to.equal('SPAN');
    expect(element.innerText).to.equal('');
});

Next, I needed to specify that the QueueCount component would render the proper count when there are specifications queued for execution. When running the full application, this information flows in as JSON messages from the .Net server via web sockets — and can update so quickly that it’s very difficult to verify visually or with automated tests against the whole stack. Fortunately, this “how many specs are queued up” state is very easy to set up in tests by just dispatching the JSON messages to the Redux store and verifying the expected state of the component afterward, as shown in the following Karma spec:

it('displays the updated counts after some specs are queued', function(){
    // Dispatch an 'action' to the underlying
    // Redux store to mutate the state
    harness.store.dispatch({
        type: 'queue-state',
        queued: ['embeds', 'sentence1', 'sentence3']
    });

    // Check out the DOM element again to see the
    // actual state
    var element = harness.element();
    expect(element.nodeName).to.equal('BUTTON');
    expect(element.firstChild.innerText).to.equal('3');
});
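For context, the reducer side of a ‘queue-state’ action could be as simple as the sketch below. This is a hypothetical illustration, not the actual Storyteller reducer:

```javascript
// Hypothetical reducer handling the 'queue-state' action dispatched
// in the spec above; the real Storyteller reducer is not shown here
function queueReducer(state, action) {
  state = state || { queued: [] };
  switch (action.type) {
    case 'queue-state':
      // replace the queued spec list wholesale with the
      // list carried by the message from the server
      return Object.assign({}, state, { queued: action.queued });
    default:
      return state;
  }
}
```

A connected component like QueueCount would then just render from state.queued.length, showing the empty span when that count is zero.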

Digging into the DOM

Call me completely uncool, but I do still use jQuery, especially for reading and querying the DOM during these kinds of tests. I added a couple of helper methods to ComponentHarness that let you quickly query the DOM of the mounted React component with jQuery:

    // jQuery expression within the mounted component 
    // DOM elements
    $(match){
        return $(match, this.div);
    }
    
    // Find either the root element of the 
    // mounted component or search via css
    // selectors within the DOM
    element(css){
        if (!css){
            return this.div.firstChild;
        }

        return $(css, this.div).get(0);
    }

These have been nice just because you’re constantly adding components to new <div /> elements dynamically added to the running browser page. These methods are used like this (from a different Karma testing file):

    it('does not render when there are no results', () => {
        // this is just a convenience method to mount a particular
        // React.js component that shows up in dozens of tests
        harness.openEditor('embeds');
        
        var isRendered = harness.$('#spec-result-header').length > 0;
        
        expect(isRendered).to.be.false;
    });


Postal Test Spies

Another usage for me has been testing event handlers, either to prove that they’re successfully updating the state of the Redux store by dispatching actions (I hate the Redux/Flux parlance of ‘actions’ when they really mean messages, but when in Rome…), or to verify that an expected message has been sent to the server by listening in on what’s broadcast via Postal. In the unit test below, I’m doing just that by checking that the “Cancel All Specifications” button sends a message to the server to remove all the queued specifications and stop anything that might already be running:

it('can cancel all the specs', function(){
    // click tries to find the matched element
    // inside the rendered component and click it
    harness.click('#cancel-all-specs');

    var message = harness
        .engineMessages
        .findPublishedMessage('cancel-all-specs');

    expect(message).to.not.be.null;
});
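The click(css) helper isn’t shown in the post. Layered on top of the element(css) helper from earlier, it could be as small as the sketch below; the clickWithin name and the exact error handling are my own assumptions:

```javascript
// Hypothetical sketch of the harness's click helper: find the first
// element matching the css selector under a root node and click it
function clickWithin(root, css) {
  const element = root.querySelector(css);
  if (!element) {
    throw new Error('No element matched ' + css);
  }
  element.click();
  return element;
}
```

Failing loudly when no element matches is deliberate: a silent no-op click is one of the most confusing ways a UI spec can go wrong.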


Summary

The ComponentHarness class has been a pretty big win in my opinion. For one thing, it’s made it relatively quick to mount React.js components connected to all the proper state in tests. Maybe more importantly, it’s made it pretty simple to get the system into the proper state to exercise React.js components by just dispatching little JSON actions into the mounted Redux store.

I’m not a fan of pre-canned test data sets, but in this particular case it’s been a huge time saver. The downsides are that many unit tests will likely break if I ever have to update that data set in the future, and sometimes it’s harder to understand a unit test without digging through the big JSON blob of initial state.

In the longer term, as more of our clients at work are transitioned to React.js with Redux (that’s an ongoing process), I think I’m voting to move quite a bit of the testing we do today with Webdriver and fully integrated tests to using something like the Karma/Redux approach I’m using here. While there are some kinds of integration problems you’ll never be able to flush out with purely Karma tests and faked data being pushed into the Redux stores, at least we could probably make the Karma tests be much faster and far more reliable than the equivalent Webdriver tests are today. Food for thought, and we’ll see how that goes.


“Introduction to Marten” Video

I gave an internal talk today at our Salt Lake City office on Marten that we were able to record and post publicly. I discussed why Postgresql, why or when to choose a document database over a relational database, what’s already done in Marten, and where it still needs to go.

And of course, if you just wanna know what Marten is, the website is here.

Any feedback is certainly welcome here or in the Marten Gitter room.

Today I learned that the only thing worse than doing a big, important talk on not enough sleep is doing two talks and a big meeting on technical strategy on the same day.

Webinar on Storyteller 3 and Why It’s Different

JetBrains is graciously letting me do an online webinar on the new Storyteller 3 tool this Thursday (Jan. 21st). Storyteller is a tool for expressing automated software tests in a form that is consumable by non-technical folks and suitable for the idea of “executable specifications” or Behavior Driven Development. While Storyteller fills the same niche as Gherkin-based tools like SpecFlow or Cucumber, it differs sharply in the mechanical approach (Storyteller was originally meant to be a “better” FitNesse and is much more inspired by the original FIT concept than Cucumber).

In this webinar I’m going to show what Storyteller can do, how we believe you make automated testing more successful, and how that thinking has been directly applied to Storyteller. To try to answer the pertinent question of “why should I care about Storyteller?,” I’m going to demonstrate:

  • The basics of crafting a specification language for your application
  • How Storyteller integrates into Continuous Integration servers
  • Why Storyteller is a great tool for crafting deep reaching integration tests and allows teams to address complicated scenarios that might not be feasible in other tools
  • The presentation of specification results in a way that makes diagnosing test failures easier
  • The steps we’ve taken to make test data setup and authoring “self-contained” tests easier
  • The ability to integrate application diagnostics into Storyteller with examples from web applications and distributed messaging systems (I’m showing integration with FubuMVC, but we’re interested in doing the same thing next year with ASP.Net MVC6)
  • The effective usage of table driven testing
  • How to use Storyteller to diagnose performance problems in your application and even apply performance criteria to the specifications
  • If there’s time, I’ll also show Storyteller’s secondary purpose as a tool for crafting living documentation

A Brief History of Storyteller

Just to prove that Storyteller has been around for awhile and there is some significant experience behind it:

  • 2004 – I worked on a project that tried to use the earliest .Net version of FIT to write customer-facing acceptance tests. It was, um, interesting.
  • 2005 – On a new project, my team invested very heavily in FitNesse testing with the cooperation of a very solid tester with quite a bit of test automation experience. We found FitNesse to be very difficult to work with and frequently awkward — but still valuable enough to continue using it. In particular, I felt like we were spending too much time troubleshooting syntax issues with how FitNesse parsed the wiki text written by our tester.
  • 2006-2008 – The original incarnation of Storyteller was just a replacement UI shell and command line runner for the FitNesse engine. This version was used on a couple projects with mixed success.
  • 2008-2009 – For reasons that escape me at the moment, I abandoned the FitNesse engine and rewrote Storyteller as its own engine with a new hybrid WPF/HTML client for editing tests. My concern at the time was to retain the strengths of FIT, especially table-driven testing, while eliminating much of the mechanical friction in FIT. The new “Storyteller 1.0” on Github was somewhat successful, but still had a lot of usability problems.
  • 2012 – Storyteller 2 came with some mild improvements on usability when I changed into my current position.
  • End of 2014 – My company had a town hall style meeting to address the poor results we were having with our large Storyteller test suites. Our major concerns were the efficiency of authoring specs, the reliability of the automated specs, and the performance of Storyteller itself. While we considered switching to SpecFlow or even trying to just do integration tests with xUnit tools and giving up on the idea of executable specifications altogether, we decided to revamp Storyteller instead of ditching it.
  • First Half of 2015 – I effectively rewrote the Storyteller test engine with an eye for performance and throughput. I ditched the existing WPF client (and nobody mourned it) and wrote an all new embedded web client based on React.js for editing and interactively running specifications. The primary goals of this new Storyteller 3.0 effort have been to make specification authoring more efficient and to make execution more performant. Quite possibly the biggest success of Storyteller 3 in real project usage has been the extra diagnostics and performance information that it exposes to help teams understand why tests and the underlying systems are behaving the way that they are.
  • July 2015 – now: The alpha versions of Storyteller 3 are being used by several teams at my shop and a handful of early adopter teams. We’ve gotten a couple useful pull requests — including several usability improvements from my colleagues — and some help with understanding what teams really need.

Deleting Code

I’m in the middle of a now weeks-long effort to “modernize” a good sized web client to the latest, greatest React.js stack. I just deleted a good chunk of existing code that I was able to render unnecessary with my tooling, and that seems like a good time for me to take a short break to muse about deleting code.

I’ve had quite a bit of cause over the past six months to purge quite a bit of code out of some of my ongoing OSS projects. That’s gotten me to thinking about the causes of code deletion and when that is or is not a good thing.


It’s So Much Better Now

I’m in the process of retrofitting the Storyteller 3 React.js client to use Redux and Immutable.js in place of the homegrown Flux-like architecture using Postal.js I originally used way, way back (in React.js time) at this time last year. I was just able to rip out several big, complicated JavaScript files that were replaced by much more concise, simpler, and hopefully less error prone code using Redux. The end result is very positive, but you have to weigh that against the intermediate cost of making the changes, and in this case I’m really not sure it was a clear win.

My Judgement: I’m usually pretty happy when I’m able to replace some clumsy code with something simpler and smoother — but at what cost?

More on my redux experiences in a couple weeks when it’s all done. 


That Code Served Its Purpose

Last July I came back from a family vacation all rested up and raring to go on an effort to consolidate and cut down FubuMVC and the remaining ecosystem. Most of my work for that month was removing or simplifying features that were no longer considered valuable by my shop. In particular, FubuMVC had a lot of features especially geared toward building large server side rendered web applications, among them:

  • Html conventions to build matching HTML displays, headers, and labels for .Net class properties
  • Modularity through “Bottles” to create drop in assemblies that could add any kind of client content (views, CSS, JS, whatnot) or server elements to an existing web application
  • “Content Extensions” that allowed users to create extensible views

All of the features above had already provided value in previous projects, but were no longer judged necessary for the kind of applications that we build today using much more JS and far less server side rendering. In those cases, it felt more like the code was being retired after a decent run rather than any kind of failure.

My Judgement: It’s kind of a good feeling


What was I thinking? 

Some code you have to nuke just because it was awful or a massive waste of time that will never provide much value. I had a spell when I was younger as one of those traveling consultants flying out every Monday to a client site. On one occasion I ended up having to stay in Chicago over a three day weekend instead of getting to come home. Being more ambitious back then, I spent most of that weekend building a WinForms application to explore and diagnose problems with StructureMap containers. That particular project was a complete flop, and I’ve always regretted wasting that weekend on it instead of taking the chance to go sightseeing in downtown Chicago.

I think there has to be a constantly running daemon process in your mind during any major coding effort that can tell you “this isn’t working” or “this shouldn’t be this hard” and shake you out of an approach or project that is probably headed toward failure.

My Judgement: Grumble. Fail fast next time and don’t pay the opportunity cost!


git reset --hard

Git makes it ridiculously easy to do quick, throwaway experiments in your codebase. Wanna see if you can remove a class without too much harm? No problem, just try it out, and if it flops, just reset or checkout or one of the million ways to do the same basic thing in git.

My Judgement: No harm, no foul. Surprisingly valuable for longer lived projects


I don’t want to support this anymore

When I was readying the StructureMap 3.0 release a couple years ago, I purposely removed several old, oddball features in StructureMap that I just didn’t want to support any longer. In every case, there actually were other, simpler ways to accomplish what the user was trying to do without that feature. My criterion was “do I groan anytime a user asks me a question about this feature…” If the answer was “yes”, I killed it.

I was helping my wife through a class on learning Python, and watching over her shoulder I think I have to admire Python’s philosophy of having only one way to do any kind of task. Compare that philosophy to the seemingly infinite number of ways you can create objects in JavaScript. In the case of StructureMap 3, I deleted some custom fluent interfaces for conditional object construction based on runtime conditions that could be accomplished much more flexibly by just letting users provide C# Funcs. In one blow, I removed a now unnecessary feature that confused users and caused me headaches on the user list, without moving backward in capability.

My Judgement: Mixed. You wish it wasn’t necessary to do it, but the result should be favorable in the end.

How I’m Documenting OSS Projects

This post talks about the “living documentation” authoring support in Storyteller 3.0. I’m working toward finally making an official 3.0 release of the completely rebooted Storyteller tool just in time for a webinar for JetBrains on the 21st (I don’t have the link yet, but I’ll blog it here later). Expect as much Storyteller content as I can force myself to write for the next month.

OSS projects succeed or fail not just upon their technical merits or ease of usage. Effective documentation and samples matter too, and this is something that I haven’t always done well. When I restarted work on Storyteller last year I made a pledge to myself that any new OSS work that I attempted would not fail due to bad or missing documentation. I’m going to claim that you can already see the result of that attitude in the latest online docs for Marten, StructureMap, and Storyteller.

Doing Documentation Badly

I have unfortunately earned myself a bad reputation for doing a very poor job of documenting my OSS projects in the past:

  • I tried gamely to create comprehensive documentation for StructureMap as part of a big 2.5 release in 2008, but I foolishly did it with static html and the API very quickly got out of sync with the static content and that documentation probably caused more harm than good by confusing users. Only in the past couple months has the StructureMap documentation finally gotten completely updated for the latest release.
  • I cited the lack of quality documentation as the primary reason why I think that FubuMVC, the single largest effort of my technical career by far, failed as an OSS project. Sure, there were other issues, but more documentation might have led to many more users which surely would have led to much more usable feedback and improvement to the tooling.


Better Documentation with Storyteller 3.0

After my experiences with StructureMap and FubuMVC documentation, I knew I needed some kind of “living documentation” approach that would make it easy to keep the documentation in sync with an API that might be rapidly evolving, and relatively painless to incrementally update and publish. In addition, I really wanted to be able to get contributions from other folks on the documentation content. And finally, I wanted to be able to host the documentation on GitHub gh-pages, which means I needed to export the documentation as static HTML. I’m not doing this yet, but since it might be nice to embed the HTML documentation inside of NuGet packages or in a downloadable zip file, I also want to be able to export the documentation as HTML that can be browsed from the file system.

To that end, the new Storyteller has some tooling inspired by readthedocs (which is what ASP.Net uses for the vNext documentation now) to author, publish, and easily maintain “living documentation” that can stay in sync with the actual API.

The key features are:

  1. The actual documentation content is authored as Markdown text files because that’s now a de facto standard for technical documentation, many developers understand it already, and it does not require any special editor.
  2. Storyteller can derive a navigation structure from the markdown file structure with or without some hints to make it easier to grow a documentation website as a project grows
  3. A system for embedding code samples taken directly from the actual source code explained in the next section
  4. A preview tool that will allow you to run the documentation project in a browser that auto-reloads based on your edits. The auto-reloading mechanism scans for changes to code samples and content files like CSS or custom JS files in addition to the Markdown content files. The preview tool has some keyboard shortcuts as well to open the underlying Markdown page being rendered in your default editor for .md files.
  5. “Skinning” support to theme your documentation site, with some preprocessors to enable navigation links using the navigation derived from the file structure (next/previous/home, etc.). If you look at the sample documentation websites I linked to at the beginning of this post, you’ll see that they all use roughly the same theme and layout that is very obviously based on Bootstrap. Storyteller itself does not require Bootstrap or that particular theme; it’s just that I’m not the world’s greatest web designer and I kept reusing the first theme layout that I got to look okay ;-)
  6. A command line tool for quickly generating the final HTML contents and exporting to a directory. This tool supports different publishing modes for generating internal links for hosting at an organization level (http://structuremap.github.io), at the project level (http://jasperfx.github.io/marten), or for browsing from a file system. In real usage, I export directly to a clone of a gh-pages Git branch on my box, then manually push the changes to GitHub.


Code Samples

The single biggest problem I had with technical documentation in the past was embedding code samples into HTML files, both because of how awkward that was mechanically and because of the difficulty of keeping the code samples up to date with changing APIs. And APIs tend to change fast when documentation efforts inevitably reveal deficiencies in the API.

The approach I use in Storyteller 3 is to make the documentation pull code samples directly out of the actual code, preferably from unit test code. By doing this, you pretty well force the documentation code samples to stay synchronized with the actual APIs. As long as the tests holding the code samples pass in an automated build, the code samples should be valid. You can see the result of this approach most clearly in the StructureMap docs.

Mechanically, Storyteller pulls this off by scanning code files in the repository, looking for comments that mark an embeddable code sample. In C#, that looks something like this:

        // SAMPLE: ActionMethod
        [FormatAs("Start with the number {number}")]
        public void StartWithTheNumber(int number = 5)
        {
            _number = number;
            say();
        }

        // ENDSAMPLE

In a Markdown file, I can embed the code sample above with a preprocessor like so:

<[sample: ActionMethod]>

When Storyteller generates the HTML from the Markdown file, it will embed the textual contents between the // SAMPLE and // ENDSAMPLE comments above in a <div /> that is formatted by Prism.js.

Today I’m supporting code samples from C#, HTML files, Xml files (boo, right?), and JavaScript. It’s not really much effort to add other languages if that’s valuable later (I used to support Ruby too, but we were going to move away from it and I dropped that).
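The scanning step can be sketched in a few lines. This is an illustration of the idea, not Storyteller’s actual implementation, and the function name and regexes are my own:

```javascript
// Scan source text for named samples delimited by "// SAMPLE: name"
// and "// ENDSAMPLE" comments, returning a map of name -> code
function extractSamples(sourceText) {
  const samples = {};
  let current = null;
  for (const line of sourceText.split('\n')) {
    const start = line.match(/\/\/\s*SAMPLE:\s*(\w+)/);
    if (start) {
      // begin collecting lines for a newly opened sample
      current = { name: start[1], lines: [] };
    } else if (/\/\/\s*ENDSAMPLE/.test(line)) {
      // close out the sample currently being collected
      if (current) {
        samples[current.name] = current.lines.join('\n');
        current = null;
      }
    } else if (current) {
      current.lines.push(line);
    }
  }
  return samples;
}
```

A documentation build can then look up each `<[sample: name]>` preprocessor token against this map when generating the HTML.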

Iterating from Failure == Success?

As an aside, I think that achieving great results in most software projects is more about iteration and responding to the feedback from early attempts than about crafting a great plan or having the perfect idea upfront. In the case of the Storyteller documentation generation, I learned a great deal from some earlier attempts inside the FubuMVC ecosystem to solve the same kind of living documentation solutions called FubuDocs and FubuMVC.CodeSnippets. Both of those projects were failures in and of themselves, but if the Storyteller documentation generation turns out to be successful, it will be directly attributable to what I learned from building those two earlier tools.


Plough Horses and Context Shifting

My grandfather told me a story one time about how he used to plough with a horse not too far from town. Then and now, there’s a factory right at the edge of town. In those days that factory would blow a whistle at quitting time. My grandfather’s plough horse would finish the row they were on when it heard the whistle, but it would absolutely refuse to do anything else after that.

Mostly I just miss my grandfather and this is just an excuse to share a story I’ve always liked for some reason. On another occasion he tried to do that old timer thing of telling me how hard he had it as a youngster because he had to ride a horse so many miles into town to get to his job at an absurdly early hour, then got all misty-eyed and ruined the effect with “…and sometimes I’d give that stallion his head and we’d go flying…”

How is that relevant, you ask? As I get a little older, I know that I’m less able to make large context shifts late in the afternoon as quitting time gets closer and closer. If I finish something pretty complicated after lunch and the next thing up is also going to be complicated, or worse, involves a pretty large context shift into a different problem space, I know better than to even try to start that next thing. Instead, I’ll do some paper and pencil work to task out the next day’s work, or switch to some easier work or correspondence.

Since I’m largely able to set my own hours and I’m mostly unencumbered by meetings (don’t hate me for that), I can actually think about how to optimize my work schedule to the work I’m doing. One thing I’ve learned is that you do best week over week if you pace yourself to the old XP idea of a sustainable pace. For me, that’s also knowing when to push hard and when it’s best to either quit and rest or just start lining up the next day’s push.


Marten is Ready for Early Adopters

I’ve been using RavenDb for development over the past several years and I’m firmly convinced that there’s a pretty significant productivity advantage to using document databases over relational databases for many systems. For as much as I love many of the concepts and usability of RavenDb, it isn’t running very successfully at work and it’s time to move our applications to something more robust. Fortunately, we’ve been able to dedicate some time toward using Postgresql as a document database. We’ve been able to do this work as a new OSS project called Marten. Our hope with Marten has been to retain the development time benefits of document databases (along with an easy migration path away from RavenDb) with a robust technological foundation — and even I’ll admit that it will occasionally be useful to fall back to using Postgresql as a relational database where that is still advantageous.

I feel like Marten is at a point where it’s usable and what we really need most is some early adopters who will kick the tires on it, give some feedback about how well it works, what’s missing that would make it easier to use, and how it’s performing in their systems. Fortunately, as of today, Marten now has (drum roll please):

And of course, the Marten Gitter room is always open for business.

An Example Quickstart

To get started with Marten, you need two things:

  1. A Postgresql database schema (either v9.4 or v9.5)
  2. The Marten nuget installed into your application

After that, the quickest way to get up and running is shown below with some sample usage:

    var store = DocumentStore.For("your connection string");

Now you need a document type that will be persisted by Marten:

    public class User
    {
        public Guid Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public bool Internal { get; set; }
        public string UserName { get; set; }
    }

As long as a type can be serialized and deserialized by the JSON serializer of your choice and has a public field or property called “Id” or “id”, Marten can persist and load it back later.

To persist and load documents, you use the IDocumentSession interface:

    using (var session = store.LightweightSession())
    {
        var user = new User {FirstName = "Han", LastName = "Solo"};
        session.Store(user);

        session.SaveChanges();
    }
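Loading documents back out is just as terse. A sketch of querying with Marten, using the `QuerySession` and LINQ support described in the Marten docs (the `userId` variable and the query itself are made up for illustration, and the exact API surface may have shifted since this was written):

```csharp
using System;
using System.Linq;
using Marten;

using (var session = store.QuerySession())
{
    // Load a single document by its id
    var user = session.Load<User>(userId);

    // Or query against the documents with LINQ
    var internalUsers = session.Query<User>()
        .Where(x => x.Internal)
        .ToList();
}
```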


My Thoughts on Choosing and Using Persistence Tools

By no means is this post a comprehensive examination of every possible type of persistence tool, software system, or even team role. It’s just my opinions and experiences — and even though I’ve been a software developer far longer than many folks, there are lots of types of systems and architectures I’ve never gotten a chance to work on. I’m also primarily a developer, so I’m definitely not representing the DBA viewpoint.

I had an interesting exchange on Twitter last week when an ex-Austinite I hadn’t seen in years asked me if I still used NHibernate. At the same time, there’s some consternation at my work about our usage of RavenDb as a document database and some question about how we might replace that later. Because I happen to be working on potentially using Postgresql as a complete replacement for RavenDb and a possible replacement for some of our older Sql Server-based event sourcing tooling, I thought it would be helpful to go over what I think about “persistence” tools these days.

To sum up my feelings on the subject:

  • I think NoSQL document databases can make development much more productive over old-fashioned relational database usage — especially when your data is hierarchical
  • I’m completely done with heavyweight ORM’s like NHibernate or Entity Framework and I would recommend against the very concept at this point. I think the effort to directly persist a rich domain model to a relational database (or really just any database at all) was a failure.
  • I’m mostly indifferent to the so-called “micro-” ORM’s. I think they’re definitely an easy way to query and manipulate relational databases from middle tier code, but the usage I’ve seen at work makes me think they’re just a quick way to very tightly couple your application code to the details of the database schema.
  • I think that Event Sourcing inside a CQRS architecture can be very effective where that style fits, but it’s a mess when it’s used where not really appropriate.
  • I have mostly given up on the old Extreme Programming idea that you could happily build most of the application without having to worry about any kind of database until near the end of the project. If database performance is any kind of project risk, you’ve got to deal with that early. If your database choice is going to have an impact on your application code, then that too has to be dealt with earlier. If you can model your application as consuming JSON feeds, maybe you can get away with delaying the database.
  • EDIT: A couple folks have either asked about or recommended just using raw ADO.Net code. My feeling on that subject hasn’t changed in years: if you’re writing raw ADO.Net code, you’re probably stealing from your employer.
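To make the micro-ORM point concrete: a tool like Dapper is genuinely quick to use, but the inline SQL welds your middle tier to the schema. A sketch (the table, columns, and `User` shape here are invented for illustration):

```csharp
using System.Data.SqlClient;
using System.Linq;
using Dapper; // the "micro-ORM" in question

public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public bool Internal { get; set; }
}

// Quick to write, but the SQL string couples this code directly
// to the exact table and column names in the database schema --
// rename a column and this code breaks at runtime, not compile time
using (var connection = new SqlConnection("your connection string"))
{
    var users = connection.Query<User>(
        "select Id, FirstName, Internal from users where Internal = @Internal",
        new { Internal = false }).ToList();
}
```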


Root Causes

When I judge whether or not a persistence tool is a good fit for how I prefer to work, I’m thinking about these low-level root causes:

  • Is the tool ACID compliant, or will it at least manage not to lose important data? Losing data tends to make our business folks angry, and I’m just too old and conservative to screw around with any of the database tools that don’t support ACID.
  • Will the tool have a negative impact on our ability to evolve the structure of the application? Is it cheap for me to make incremental changes to the system state? I strongly believe that tools and technologies that don’t allow for easy system evolution make software development efforts fragile by forcing you to be right in your upfront designs.
  • What’s the impact going to be on automated test efforts? For all of its problems for us, RavenDb has to be the undisputed champion of testability because of how absurdly easy it is to establish and tear down system state between tests for reliable automated tests.
  • How much mismatch is there between the shape of the data that my business logic, user interface, or API handlers need versus how the database technology needs to store it? That old impedance mismatch issue can suck down a lot of developer time if you choose poorly.
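The testability point deserves an illustration. RavenDb’s embedded, in-memory mode (from the RavenDb client of that era) is what makes test state so cheap to establish and throw away; a sketch, with the `User` document type assumed from elsewhere:

```csharp
using Raven.Client.Embedded;

public class User
{
    public string Id { get; set; }
    public string FirstName { get; set; }
}

// Each test can spin up a throwaway, in-memory document store, so
// there is no shared database state to clean up between tests --
// disposing the store discards everything
using (var store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
using (var session = store.OpenSession())
{
    session.Store(new User { FirstName = "Leia" });
    session.SaveChanges();
}
```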


Why I prefer Document Db’s over Relational Db’s for Most Development

Even though RavenDb isn’t necessarily working out for us, I still believe that document databases make sense and can lead to better productivity than relational databases. Doing a side by side comparison on some of my “root causes” above:

  • Evolutionary software design. Changing your usage of a relational database can easily involve changes to the DDL, database migrations, and possibly ORM mappings (the old, dreaded “wormhole anti-pattern” problem). I think this is an area where “schemaless” document databases are a vast improvement for developer productivity because I only have to change my document type in code and go. It’s vastly less work when there’s only one model to change.
  • Testability. I think it’s more mechanical work to set up and tear down system state for automated tests with relational databases versus a document database. Relational integrity is a great thing when you need it, but it adds some extra work to test setup just to make the database shut up and let me insert my data. My clear experience from automated testing against both types of database engines is that it’s much simpler working with document databases.
  • The Impedance Mismatch Challenge. Again, this is an area where I much prefer document databases when I generally want to store and retrieve hierarchical data. I also prefer document databases where data collections may have a great deal of polymorphism.
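The evolutionary design point can be made concrete. With a schemaless document database, an additive change to a document is just a code change; the relational equivalent also needs a DDL migration and possibly mapping updates. A sketch (the `Department` property is invented for illustration):

```csharp
using System;

public class User
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // With a schemaless document database, adding this property is the
    // whole change: older stored documents simply deserialize with the
    // default value. The relational equivalent also needs an ALTER TABLE
    // migration and possibly ORM mapping changes before anything runs.
    public string Department { get; set; }
}
```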


When would I still opt for a Relational Database?

Besides the overwhelming inertia of relational databases (everybody knows RDBMS tools and there are seemingly an infinite number of management and reporting tools to support RDBMS’s), there are still some places where I would opt for a relational database:

  • Reporting applications. Not that it’s impossible in other kinds of databases, but there are so many decent existing solutions for reporting against RDBMS’s.
  • If I were still a consultant, an RDBMS is a perfectly acceptable choice for conservative clients
  • Applications that will require a lot of ad hoc queries. Much of my early career was spent trying to make sense of large engineering and construction databases that frequently went off the rails.
  • Batch jobs, not that I really wanna ever build systems like that again
  • Systems with a lot of flat, two dimensional data


The Pond Scum Shared Database Anti-Pattern

If you’ve worked around me long enough, you’ve surely heard me use the phrase “sharing a database is like drug abusers sharing needles.” I’ve frequently bumped into what I call the “pond scum anti-pattern,” where an enterprise has one giant shared database with lots of little applications floating around it that modify and read pretty much the same set of database tables. It’s common, but so awfully harmful.

The indirect coupling between applications is especially pernicious because it’s probably not very obvious how any given change to the database will impact all the little applications that float around it. My strong preference is for application databases rather than the giant shared database. That might very well lead to some duplication or, worse, some inconsistency in data across applications, but we can’t solve everything in one already too long blog post;)

And to prove that the “shared database” problem is a long-running, never-ending one, here’s a blog post from me on the same subject from 2005.


What about…?

  • MongoDb? I know some people like it and I’ve had some feedback on Marten that we should be patterning its usage on MongoDb rather than mostly on RavenDb. I’ve just seen too many stories about MongoDb losing data or having inadequate transactional integrity support.
  • Graph databases like Neo4J? I think they sound very interesting and there’s a project or two I’ve done that I thought might have benefited from using a graph database, but I’ve never used one. Someday.
  • Rails’ ActiveRecord? Even though I never made the jump to Ruby like so many of my other ALT.Net friends from a decade ago did, there was a time when I thought Ruby on Rails was the coolest thing ever. That day has clearly passed. I’m really not wild about any persistence tool that forces you to lock your application code to the shape of the database.
  • CSLA is apparently still around. To say the least, I’m not a fan. Too much harmful coupling between business logic and infrastructure, poor for evolutionary design in my opinion.

StructureMap 4.0 is Out!

tl;dr: StructureMap 4.0 went live to Nuget today with CoreCLR support, better performance, far better conventional registration via type scanning, and many improvements specifically targeted toward StructureMap integration into ASP.Net MVC 5 & 6 applications.

StructureMap 4.0 officially went live on Nuget today. The release notes page is updated for 4.0 and you can also peruse the raw GitHub issue list for the 4.0 milestone to see what’s new and what’s been fixed since 3.1.6.

Even though StructureMap has been around forever in .Net OSS terms (since June of 2004!), there are still new things to do, obsolete things to remove, and continuing needs to adapt to what users are actually trying to accomplish with the tool. As such, StructureMap 4.0 represents the lessons we’ve learned in the past couple years since the big 3.0 release. 4.0 is a much smaller set of changes than 3.0 and mostly contains performance and diagnostic improvements.

For the very first time since the long forgotten 2.5 release way back in 2008, I’m claiming that the StructureMap documentation site is completely up to date and effectively comprehensive. Failing that of course, the StructureMap Gitter room is open for questions.

This time around, I’d like to personally thank Kristian Hellang for being patient while we worked through issues with lifecycle and object disposal patterns for compliance with the new ASP.Net vNext dependency injection usage and the new StructureMap.DNX nuget for integrating StructureMap into vNext applications. I’d also like to thank Dmytro Dziuma for some pretty significant improvements to StructureMap runtime performance and his forthcoming packages for AOP with StructureMap and the rebuilt StructureMap.AutoFactory library. I’d also like to thank Oren Novotny for his help in moving StructureMap to the CoreCLR.

Some Highlights:

  • The StructureMap nuget now targets .Net 4.0, the CoreCLR via the “dotnet” profile, and various Windows Phone and Android targets via the PCL. While the early feedback on CoreCLR usage has been positive, I think you still have to assume that support is unproven and early.
  • The internals of the type scanning and conventional registration have been completely overhauled to optimize container bootstrapping time, and there are some new diagnostics to help users unwind frequent problems with type scanning registrations. The mechanism for custom conventions is a breaking change in 4.0; see the documentation for the details.
  • The lifecycle management had to be significantly changed and enhanced for ASP.Net vNext compliance. More on this in a later blog post.
  • Likewise, there are some new rules and behavior for how and when StructureMap will track and dispose IDisposable’s.
  • Performance improvements in general and some optimizations targeted specifically at integration with ASP.Net MVC (I don’t approve of how the ASP.Net team has done their integration, but .Net is their game and it was important to harden StructureMap for their goofy usage)
  • More robustness in heavy multi-threaded access
  • The constructor selection is a little smarter
  • ObjectFactory is gone, baby, gone. Jimmy Bogard will have to find some new reason to mock my code;)
  • If you absolutely have to use them, there is better support for customizing registration and behavior with attributes
  • The Registry class moved to the root “StructureMap” namespace, so do watch for that. That’s bugged me for years, so I went ahead and fixed it this time since we were going for a new full point release.
  • 4.0 introduces a powerful new mechanism for establishing policies and conventions on how objects are built at runtime. My hope is that this will solve some of the user questions and problems that I’ve gotten in the past couple years. There will definitely be a follow-up post on that.
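For readers new to the conventional registration mentioned above, here’s a minimal sketch of StructureMap’s type scanning API (the `IWidget`/`DefaultWidget` types are placeholders of my own, not part of StructureMap):

```csharp
using StructureMap;

public interface IWidget { }
public class DefaultWidget : IWidget { }

var container = new Container(_ =>
{
    // Conventional registration: scan an assembly and apply
    // the default "Foo goes with IFoo" naming convention
    _.Scan(scan =>
    {
        scan.TheCallingAssembly();
        scan.WithDefaultConventions();
    });

    // Explicit registration still works side by side with scanning
    _.For<IWidget>().Use<DefaultWidget>();
});

var widget = container.GetInstance<IWidget>();
```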


And of course, you can probably expect a 4.0.1 release soon for any issues that pop up once folks use this in real project work. At a minimum, there’ll be updates for the CoreCLR support once the dust settles on all that churn.