Webinar on Storyteller 3 and Why It’s Different

JetBrains is graciously letting me do an online webinar on the new Storyteller 3 tool this Thursday (Jan. 21st). Storyteller is a tool for expressing automated software tests in a form that is consumable by non-technical folks and suitable for the idea of “executable specifications” or Behavior Driven Development. While Storyteller fills the same niche as Gherkin-based tools like SpecFlow or Cucumber, it differs sharply in the mechanical approach (Storyteller was originally meant to be a “better” FitNesse and is much more inspired by the original FIT concept than Cucumber).

In this webinar I’m going to show what Storyteller can do, how we believe you can make automated testing more successful, and how that thinking has been directly applied to Storyteller. To try to answer the pertinent question of “why should I care about Storyteller?”, I’m going to demonstrate:

  • The basics of crafting a specification language for your application
  • How Storyteller integrates into Continuous Integration servers
  • Why Storyteller is a great tool for crafting deep-reaching integration tests that let teams address complicated scenarios that might not be feasible in other tools
  • The presentation of specification results in a way that makes diagnosing test failures easier
  • The steps we’ve taken to make test data setup and authoring “self-contained” tests easier
  • The ability to integrate application diagnostics into Storyteller with examples from web applications and distributed messaging systems (I’m showing integration with FubuMVC, but we’re interested in doing the same thing next year with ASP.Net MVC6)
  • The effective usage of table driven testing
  • How to use Storyteller to diagnose performance problems in your application and even apply performance criteria to the specifications
  • If there’s time, I’ll also show Storyteller’s secondary purpose as a tool for crafting living documentation

A Brief History of Storyteller

Just to prove that Storyteller has been around for a while and that there is some significant experience behind it:

  • 2004 – I worked on a project that tried to use the earliest .Net version of FIT to write customer facing acceptance testing. It was, um, interesting.
  • 2005 – On a new project, my team invested very heavily in FitNesse testing with the cooperation of a very solid tester with quite a bit of test automation experience. We found FitNesse to be very difficult to work with and frequently awkward — but still valuable enough to continue using it. In particular, I felt like we were spending too much time troubleshooting syntax issues with how FitNesse parsed the wiki text written by our tester.
  • 2006-2008 – The original incarnation of Storyteller was just a replacement UI shell and command line runner for the FitNesse engine. This version was used on a couple projects with mixed success.
  • 2008-2009 – For reasons that escape me at the moment, I abandoned the FitNesse engine and rewrote Storyteller as its own engine with a new hybrid WPF/HTML client for editing tests. My concern at the time was to retain the strengths of FIT, especially table-driven testing, while eliminating much of the mechanical friction in FIT. The new “Storyteller 1.0” on Github was somewhat successful, but still had a lot of usability problems.
  • 2012 – Storyteller 2 came with some mild usability improvements when I moved into my current position.
  • End of 2014 – My company had a town hall style meeting to address the poor results we were having with our large Storyteller test suites. Our major concerns were the efficiency of authoring specs, the reliability of the automated specs, and the performance of Storyteller itself. While we considered switching to SpecFlow or even trying to just do integration tests with xUnit tools and giving up on the idea of executable specifications altogether, we decided to revamp Storyteller instead of ditching it.
  • First Half of 2015 – I effectively rewrote the Storyteller test engine with an eye for performance and throughput. I ditched the existing WPF client (and nobody mourned it) and wrote an all new embedded web client based on React.js for editing and interactively running specifications. The primary goals of this new Storyteller 3.0 effort have been to make specification authoring more efficient and to try to make the execution more performant. Quite possibly the biggest success of Storyteller 3 in real project usage has been the extra diagnostics and performance information that it exposes to help teams understand why tests and the underlying systems are behaving the way that they are.
  • July 2015 – now: The alpha versions of Storyteller 3 are being used by several teams at my shop and a handful of early adopter teams. We’ve gotten a couple useful pull requests — including several usability improvements from my colleagues — and some help with understanding what teams really need.

Deleting Code

I’m in the middle of a now weeks-long effort to “modernize” a good-sized web client to the latest, greatest React.js stack. I just deleted a good chunk of existing code that I was able to render unnecessary with my tooling, and that seems like a good time for me to take a short break and muse about deleting code.

I’ve had quite a bit of cause over the past six months to purge code out of some of my ongoing OSS projects. That’s gotten me to thinking about the causes of code deletion and when that is or is not a good thing.

 

It’s So Much Better Now

I’m in the process of retrofitting the Storyteller 3 React.js client to use Redux and Immutable.js in place of the homegrown, Postal.js-based Flux-like architecture I originally used way, way back (in React.js time) at this time last year. I was just able to rip out several big, complicated Javascript files and replace them with much more concise, simpler, and hopefully less error-prone code using Redux. The end result has been very positive, but you have to weigh that against the intermediate cost of making the changes. In this case, I’m really not sure if this was a clear win.

My Judgement: I’m usually pretty happy when I’m able to replace some clumsy code with something simpler and smoother — but at what cost?

More on my Redux experiences in a couple of weeks when it’s all done.

 

That Code Served Its Purpose

Last July I came back from a family vacation all rested up and raring to go on an effort to consolidate and cut down FubuMVC and the remaining ecosystem. Most of my work for that month was removing or simplifying features that were no longer considered valuable by my shop. In particular, FubuMVC had a lot of features especially geared toward building large server side rendered web applications, among them:

  • Html conventions to build matching HTML displays, headers, and labels for .Net class properties
  • Modularity through “Bottles” to create drop-in assemblies that could add any kind of client content (views, CSS, JS, whatnot) or server elements to an existing web application
  • “Content Extensions” that allowed users to create extensible views

All of the features above had already provided value in previous projects, but were no longer judged necessary for the kind of applications that we build today using much more JS and far less server side rendering. In those cases, it felt more like the code was being retired after a decent run rather than any kind of failure.

My Judgement: It’s kind of a good feeling

 

What was I thinking? 

Some code you have to nuke just because it was awful or a massive waste of time that will never provide much value. I had a spell when I was younger as one of those traveling consultants flying out every Monday to a client site. On one occasion I ended up having to stay in Chicago over a three day weekend instead of getting to come home. Being more ambitious back then, I spent most of that weekend building a WinForms application to explore and diagnose problems with StructureMap containers. That particular project was a complete flop, and I’ve always regretted wasting that weekend on it instead of going sightseeing in downtown Chicago.

I think there has to be a daemon process constantly running in your mind during any major coding effort that can tell you “this isn’t working” or “this shouldn’t be this hard” and shake you out of an approach or project that is probably headed toward failure.

My Judgement: Grumble. Fail fast next time and don’t pay the opportunity cost!

 

git reset --hard

Git makes it ridiculously easy to do quick, throwaway experiments in your codebase. Wanna see if you can remove a class without too much harm? No problem, just try it out, and if it flops, just reset or checkout or one of the million ways to do the same basic thing in git.

My Judgement: No harm, no foul. Surprisingly valuable for longer lived projects

 

I don’t want to support this anymore

When I was readying the StructureMap 3.0 release a couple years ago, I purposely removed several old, oddball features in StructureMap that I just didn’t want to support any longer. In every case, there were actually other, simpler ways to accomplish what the user was trying to do without that feature. My criterion there was “do I groan anytime a user asks me a question about this feature…” If the answer was “yes”, I killed it.

I was helping my wife through a class on learning Python, and watching over her shoulder made me admire Python’s philosophy of having only one way to do any kind of task. Compare that philosophy to the seemingly infinite number of ways you can create objects in Javascript. In the case of StructureMap 3, I deleted some custom fluent interfaces for conditional object construction based on runtime conditions that could easily be accomplished much more flexibly by just letting users provide C# Funcs. In one blow, I removed a now unnecessary feature that confused users and caused me headaches on the user list without moving backward in capability.
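
To show the flavor of the replacement (only a sketch: the widget types are invented here, and this isn’t the removed fluent API itself), a lambda registration can carry whatever runtime condition you need:

using System;
using StructureMap.Configuration.DSL; // Registry's home before the 4.0 namespace move

public interface IWidget { }
public class DaytimeWidget : IWidget { }
public class NightWidget : IWidget { }

public class WidgetRegistry : Registry
{
    public WidgetRegistry()
    {
        // the lambda runs each time an IWidget is resolved, so any runtime
        // condition can pick the concrete type to build
        For<IWidget>().Use("widget by time of day", context =>
            DateTime.Now.Hour < 18
                ? (IWidget) new DaytimeWidget()
                : new NightWidget());
    }
}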

My Judgement: Mixed. You wish it wasn’t necessary to do it, but the result should be favorable in the end.

How I’m Documenting OSS Projects

This post talks about the “living documentation” authoring support in Storyteller 3.0. I’m working toward finally making an official 3.0 release of the completely rebooted Storyteller tool just in time for a webinar for JetBrains on the 21st (I don’t have the link yet, but I’ll blog it here later). Expect as much Storyteller content as I can force myself to write for the next month.

OSS projects succeed or fail on more than just their technical merits or ease of use. Effective documentation and samples matter too, and this is something that I haven’t always done well. When I restarted work on Storyteller last year I made a pledge to myself that any new OSS work that I attempted would not fail due to bad or missing documentation. I’m going to claim that you can already see the result of that attitude in the latest online docs for Marten, StructureMap, and Storyteller.

Doing Documentation Badly

I have unfortunately earned a reputation for doing a very poor job of documenting my OSS projects in the past:

  • I tried gamely to create comprehensive documentation for StructureMap as part of a big 2.5 release in 2008, but I foolishly did it with static HTML; the API very quickly got out of sync with the static content, and that documentation probably caused more harm than good by confusing users. Only in the past couple of months has the StructureMap documentation finally gotten completely updated for the latest release.
  • I cited the lack of quality documentation as the primary reason why I think that FubuMVC, the single largest effort of my technical career by far, failed as an OSS project. Sure, there were other issues, but more documentation might have led to many more users which surely would have led to much more usable feedback and improvement to the tooling.

 

Better Documentation with Storyteller 3.0

After my experiences with the StructureMap and FubuMVC documentation, I knew I needed some kind of “living documentation” approach that would make it easy to keep the documentation in sync with an API that might be rapidly evolving and relatively painless to incrementally update and publish. In addition, I really wanted to be able to get contributions from other folks on the documentation content. And finally, I wanted to be able to host the documentation on GitHub gh-pages, and that means that I needed to export the documentation as static HTML. I’m not doing this yet, but since it might be nice to embed the HTML documentation inside of NuGet packages or in a downloadable zip file, I also want to be able to export the documentation as HTML that could be browsed from the file system.

To that end, the new Storyteller has some tooling inspired by readthedocs (what ASP.Net uses for the vNext documentation now) to author, publish, and easily maintain “living documentation” that can stay in sync with the actual API.

The key features are:

  1. The actual documentation content is authored as Markdown text files because that’s now a de facto standard for technical documentation, many developers understand it already, and it does not require any special editor.
  2. Storyteller can derive a navigation structure from the markdown file structure with or without some hints to make it easier to grow a documentation website as a project grows
  3. A system for embedding code samples taken directly from the actual source code explained in the next section
  4. A preview tool that will allow you to run the documentation project in a browser that auto-reloads based on your edits. The auto-reloading mechanism scans for changes to code samples and content files like CSS or custom JS files in addition to the Markdown content files. The preview tool has some keyboard shortcuts as well to open the underlying Markdown page being rendered in your default editor for .md files.
  5. “Skinning” support to theme your documentation site with some preprocessors to enable navigation links using the navigation derived from the file structure (next/previous/home etc.). If you look at the sample documentation websites I linked to at the beginning of this post, you’ll see that they all use roughly the same theme and layout that is very obviously based on Bootstrap. Storyteller itself does not require Bootstrap or that particular theme; it’s just that I’m not the world’s greatest web designer and I kept reusing the first theme layout that I got to look okay ;-)
  6. A command line tool for quickly generating the final HTML contents and exporting to a directory. This tool supports different publishing modes for generating internal links for hosting at an organization level (http://structuremap.github.io), at the project level (http://jasperfx.github.io/marten), or for browsing from a file system. In real usage, I export directly to a clone of a gh-pages Git branch on my box, then manually push the changes to GitHub.

 

Code Samples

The single biggest problem I had with technical documentation in the past was embedding code samples into HTML files, both because of how awkward that was mechanically and because of the challenge of keeping the code samples up to date with changing APIs. And APIs tend to change fast when documentation efforts inevitably reveal deficiencies in the API.

The approach I use in Storyteller 3 is to make the documentation pull code samples directly out of the actual code — preferably from unit test code. By doing this, you pretty well force the documentation code samples to be synchronized with the actual code APIs. As long as the tests holding the code samples pass in an automated build, the code samples should be valid. You can see the result of this approach most clearly in the StructureMap docs.

Mechanically, Storyteller pulls this off by scanning code files in the repository and looking for comments in the actual code that mark an embeddable code sample. In C#, that looks something like this:

        // SAMPLE:  ActionMethod
        [FormatAs("Start with the number {number}")]
        public void StartWithTheNumber(int number = 5)
        {
            _number = number;
            say();
        }

        // ENDSAMPLE

In a Markdown file, I can embed the code sample above with a preprocessor like so:

<[sample: ActionMethod]>

When Storyteller generates the HTML from the Markdown file, it will embed the textual contents between the // SAMPLE and // ENDSAMPLE comments above in a <div /> that is formatted by Prism.js.

Today I’m supporting code samples from C#, HTML files, Xml files (boo, right?), and JavaScript. It’s not really much effort to add other languages if that’s valuable later (I used to support Ruby too, but we were going to move away from it and I dropped that).

Iterating from Failure == Success?

As an aside, I think that achieving great results in most software projects is more about iteration and responding to the feedback from early attempts than about crafting a great plan or having the perfect idea upfront. In the case of the Storyteller documentation generation, I learned a great deal from two earlier attempts inside the FubuMVC ecosystem at the same kind of living documentation solution, FubuDocs and FubuMVC.CodeSnippets. Both of those projects were failures in and of themselves, but if the Storyteller documentation generation turns out to be successful, it will be directly attributable to what I learned from building those two earlier tools.


Plough Horses and Context Shifting

Mostly I just miss my grandfather and this is just an excuse to share a story I’ve always liked for some reason. On another occasion he tried to do that old timer thing of telling me how hard he had it as a youngster because he had to ride a horse so many miles into town to get to his job at an absurdly early hour, then got all misty-eyed and ruined the effect with “…and sometimes I’d give that stallion his head and we’d go flying…” 

My grandfather told me a story one time about how he used to plough with a horse not too far from town. Then and now, there’s a factory right at the edge of town. In those days that factory would blow a whistle at quitting time. My grandfather’s plough horse would finish the row they were on when it heard the whistle, but it would absolutely refuse to do anything else after that.

How is that relevant, you ask? As I get a little older, I know that I’m less able to make large context shifts late in the afternoon as quitting time gets closer and closer. If I finish something pretty complicated after lunch and the next thing up is also going to be complicated or, worse, involve a pretty large context shift into a different problem space, I know better than to even try to start that next thing. Instead, I’ll do some paper and pencil work to task out the next day’s work or switch to some easier work or correspondence.

Since I’m largely able to set my own hours and I’m mostly unencumbered by meetings (don’t hate me for that), I can actually think about how to optimize my work schedule to the work I’m doing. One thing I’ve learned is that you do best week over week if you pace yourself to the old XP idea of a sustainable pace. For me, that’s also knowing when to push hard and when it’s best to either quit and rest or just start lining up the next day’s push.

 

Marten is Ready for Early Adopters

I’ve been using RavenDb for development over the past several years and I’m firmly convinced that there’s a pretty significant productivity advantage to using document databases over relational databases for many systems. For as much as I love many of the concepts and usability of RavenDb, it isn’t running very successfully at work and it’s time to move our applications to something more robust. Fortunately, we’ve been able to dedicate some time toward using Postgresql as a document database. We’ve been able to do this work as a new OSS project called Marten. Our hope with Marten has been to retain the development time benefits of document databases (along with an easy migration path away from RavenDb) with a robust technological foundation — and even I’ll admit that it will occasionally be useful to fall back to using Postgresql as a relational database where that is still advantageous.

I feel like Marten is at a point where it’s usable and what we really need most is some early adopters who will kick the tires on it, give some feedback about how well it works, what’s missing that would make it easier to use, and how it’s performing in their systems. Fortunately, as of today, Marten now has (drum roll please):

And of course, the Marten Gitter room is always open for business.

An Example Quickstart

To get started with Marten, you need two things:

  1. A Postgresql database schema (either v9.4 or v9.5)
  2. The Marten NuGet package installed in your application

After that, the quickest way to get up and running is shown below with some sample usage:

var store = DocumentStore.For("your connection string");

Now you need a document type that will be persisted by Marten:

    public class User
    {
        public Guid Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public bool Internal { get; set; }
        public string UserName { get; set; }
    }

As long as a type can be serialized and deserialized by the JSON serializer of your choice and has a public field or property called “Id” or “id”, Marten can persist and load it back later.

To persist and load documents, you use the IDocumentSession interface:

    using (var session = store.LightweightSession())
    {
        var user = new User {FirstName = "Han", LastName = "Solo"};
        session.Store(user);

        session.SaveChanges();
    }
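
Loading documents back out happens through the same kind of session. A quick sketch (assuming the Load<T>(id) and LINQ Query<T>() methods; someUserId is just a placeholder for the id captured when the document was stored):

    using System;
    using System.Linq;

    using (var session = store.LightweightSession())
    {
        // load a single document back by its id
        var user = session.Load<User>(someUserId);

        // or query documents through Marten's LINQ support
        var solos = session.Query<User>()
            .Where(x => x.LastName == "Solo")
            .ToList();
    }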


My Thoughts on Choosing and Using Persistence Tools

By no means is this post a comprehensive examination of every possible type of persistence tool, software system, or even team role. It’s just my opinions and experiences — and even though I’ve been a software developer far longer than many folks, there are lots of types of systems and architectures I’ve never gotten a chance to work on. I’m also primarily a developer, so I’m definitely not representing the DBA viewpoint.

I had an interesting exchange on Twitter last week when an ex-Austinite I hadn’t seen in years asked me if I still used NHibernate. At the same time, there’s some consternation at my work about our usage of RavenDb as a document database and some question about how we might replace that later. Because I happen to be working on potentially using Postgresql as a complete replacement for RavenDb and a possible replacement for some of our older SQL Server-based event sourcing tooling, I thought it would be helpful to go over what I think about “persistence” tools these days.

To sum up my feelings on the subject:

  • I think NoSQL document databases can make development much more productive than old-fashioned relational database usage — especially when your data is hierarchical
  • I’m completely done with heavyweight ORMs like NHibernate or Entity Framework and I would recommend against the very concept at this point. I think the effort to directly persist a rich domain model to a relational database (or really just any database at all) was a failure.
  • I’m mostly indifferent to the so-called “micro” ORMs. I think they’re definitely an easy way to query and manipulate relational databases from middle tier code, but the usage I’ve seen at work makes me think they’re just a quick way to very tightly couple your application code to the details of the database schema.
  • I think that Event Sourcing inside a CQRS architecture can be very effective where that style fits, but it’s a mess when it’s used where not really appropriate.
  • I have mostly given up on the old Extreme Programming idea that you could happily build most of the application without having to worry about any kind of database until near the end of the project. If database performance is any kind of project risk, you’ve got to deal with that early. If your database choice is going to have an impact on your application code, then that too has to be dealt with earlier. If you can model your application as consuming JSON feeds, maybe you can get away with delaying the database.
  • EDIT: A couple folks have either asked about or recommended just using raw ADO.Net code. My feeling on that subject hasn’t changed in years: if you’re writing raw ADO.Net code, you’re probably stealing from your employer.


Root Causes

When I judge whether or not a persistence tool is a good fit for how I prefer to work, I’m thinking about these low level first causes:

  • Is the tool ACID-compliant, or will it at least manage not to lose important data? That tends to make our business folks angry. I’m just too old and conservative to screw around with any of the database tools that don’t support ACID.
  • Will the tool have a negative impact on our ability to evolve the structure of the application? Is it cheap for me to make incremental changes to the system state? I strongly believe that tools and technologies that don’t allow for easy system evolution make software development efforts fragile by forcing you to be right in your upfront designs.
  • What’s the impact going to be on automated test efforts? For all of its problems for us, RavenDb has to be the undisputed champion of testability because of how absurdly easy it is to establish and tear down system state between tests for reliable automated tests.
  • How much or how little mismatch is there between the shape of the data that my business logic, user interface, or API handlers are going to need and how the database technology needs to store it? That old impedance mismatch issue can suck down a lot of developer time if you choose poorly.

 

Why I prefer Document Db’s over Relational Db’s for Most Development

Even though RavenDb isn’t necessarily working out for us, I still believe that document databases make sense and can lead to better productivity than relational databases. Doing a side by side comparison on some of my “first causes” above:

  • Evolutionary software design. Changing your usage of a relational database can easily involve changes to the DDL, database migrations, and possibly ORM mappings (the old, dreaded “wormhole anti-pattern” problem). I think this is an area where “schemaless” document databases are a vast improvement for developer productivity because I only have to change my document type in code and go. It’s vastly less work when there’s only one model to change.
  • Testability. I think it’s more mechanical work to set up and tear down system state for automated tests with relational databases versus a document database. Relational integrity is a great thing when you need it, but it adds some extra work to test setup just to make the database shut up and let me insert my data. My clear experience from having automated testing against both types of database engines is that it’s much simpler working with document databases.
  • The Impedance Mismatch Challenge. Again, this is an area where I much prefer document databases when I generally want to store and retrieve hierarchical data. I also prefer document databases where data collections may have a great deal of polymorphism.


When would I still opt for a Relational Database?

Besides the overwhelming inertia of relational databases (everybody knows RDBMS tools and there are seemingly an infinite number of management and reporting tools to support RDBMS’s), there are still some places where I would still opt for a relational database:

  • Reporting applications. Not that it’s impossible in other kinds of databases, but there are so many decent existing solutions for reporting against RDBMS’s.
  • If I were still a consultant, an RDBMS is a perfectly acceptable choice for conservative clients
  • Applications that will require a lot of ad hoc queries. Much of my early career was spent trying to make sense of large engineering and construction databases that frequently went off the rails.
  • Batch jobs, not that I really wanna ever build systems like that again
  • Systems with a lot of flat, two dimensional data

 

The Pond Scum Shared Database Anti-Pattern

If you’ve worked around me long enough, you’ve surely heard me use the phrase “sharing a database is like drug abusers sharing needles.” I’ve frequently bumped into what I call the “pond scum anti-pattern” where an enterprise has one giant shared database with lots of little applications floating around it that modify and read pretty much the same set of database tables. It’s common, but so awfully harmful.

The indirect coupling between applications is especially pernicious because it’s probably not very obvious how any given change to the database will impact all the little applications that float around it. My strong preference is for application databases rather than the giant shared database. That might very well lead to some duplication or, worse, some inconsistency in data across applications, but we can’t solve everything in one already too long blog post ;)

And to prove that this topic of the “shared database” problem is a long, never-ending problem, here’s a blog post from me on the same subject from 2005.

 

What about…?

  • MongoDb? I know some people like it and I’ve had some feedback on Marten that we should be patterning its usage on MongoDb rather than mostly on RavenDb. I’ve just seen too many stories about MongoDb losing data or having inadequate transactional integrity support.
  • Graph databases like Neo4J? I think they sound very interesting and there’s a project or two I’ve done that I thought might have benefited from using a graph database, but I’ve never used one. Someday.
  • Rails’s ActiveRecord? Even though I never made the jump to Ruby like so many of my other ALT.Net friends from a decade ago did, there was a time when I thought Ruby on Rails was the coolest thing ever. That day has clearly passed. I’m really not wild about any persistence tooling that forces you to lock your application code to the shape of the database.
  • CSLA is apparently still around. To say the least, I’m not a fan. Too much harmful coupling between business logic and infrastructure, poor for evolutionary design in my opinion.

StructureMap 4.0 is Out!

tl;dr: StructureMap 4.0 went live to Nuget today with CoreCLR support, better performance, far better conventional registration via type scanning, and many improvements specifically targeted toward StructureMap integration into ASP.Net MVC 5 & 6 applications.

StructureMap 4.0 officially went live on Nuget today. The release notes page is updated for 4.0 and you can also peruse the raw GitHub issue list for the 4.0 milestone to see what’s new and what’s been fixed since 3.1.6.

Even though StructureMap has been around forever in .Net OSS terms (since June of 2004!), there are still new things to do, obsolete things to remove, and continuing needs to adapt to what users are actually trying to accomplish with the tool. As such, StructureMap 4.0 represents the lessons we’ve learned in the past couple years since the big 3.0 release. 4.0 is a much smaller set of changes than 3.0 and mostly contains performance and diagnostic improvements.

For the very first time since the long-forgotten 2.5 release way back in 2008, I’m claiming that the StructureMap documentation site is completely up to date and effectively comprehensive. Failing that, of course, the StructureMap Gitter room is open for questions.

This time around, I’d like to personally thank Kristian Hellang for being patient while we worked through issues with lifecycle and object disposal patterns for compliance with the new ASP.Net vNext dependency injection usage and the new StructureMap.DNX nuget for integrating StructureMap into vNext applications. I’d also like to thank Dmytro Dziuma for some pretty significant improvements to StructureMap runtime performance and his forthcoming packages for AOP with StructureMap and the rebuilt StructureMap.AutoFactory library. I’d also like to thank Oren Novotny for his help in moving StructureMap to the CoreCLR.

Some Highlights:

  • The StructureMap nuget now targets .Net 4.0, the CoreCLR via the “dotnet” profile, and various Windows Phone and Android targets via the PCL. While the early feedback on CoreCLR usage has been positive, I think you still have to assume that that support is unproven and early.
  • The internals of the type scanning and conventional registration have been completely overhauled to optimize container bootstrapping time, and there are some new diagnostics to allow users to unwind frequent problems with type scanning registrations (a small registration sketch follows this list). The mechanism for custom conventions is a breaking change for 4.0; see the documentation for the details.
  • The lifecycle management had to be significantly changed and enhanced for ASP.Net vNext compliance. More on this in a later blog post.
  • Likewise, there are some new rules and behavior for how and when StructureMap will track and dispose IDisposable’s.
  • Performance improvements in general and some optimizations targeted specifically at integration with ASP.Net MVC (I don’t approve of how the ASP.Net team has done their integration, but .Net is their game and it was important to harden StructureMap for their goofy usage)
  • More robustness in heavy multi-threaded access
  • The constructor selection is a little smarter
  • ObjectFactory is gone, baby, gone. Jimmy Bogard will have to find some new reason to mock my code;)
  • If you absolutely have to use them, there is better support for customizing registration and behavior with attributes
  • The Registry class moved to the root “StructureMap” namespace, so do watch that. That’s bugged me for years, so I went ahead and fixed it this time since we were going for a new full point release.
  • 4.0 introduces a powerful new mechanism for establishing policies and conventions on how objects are built at runtime. My hope is that this will solve some of the user questions and problems that I’ve gotten in the past couple of years. There will definitely be a follow-up post on that.
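
As promised in the type scanning bullet above, the basic conventional registration setup looks roughly like this (a typical default-conventions sketch, not the new diagnostics or the changed custom convention mechanism):

using StructureMap;

var container = new Container(_ =>
{
    _.Scan(scan =>
    {
        // scan the assembly that is making this call
        scan.TheCallingAssembly();

        // apply the default convention that pairs concrete types like
        // Widget with interfaces named IWidget
        scan.WithDefaultConventions();
    });
});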

 

And of course, you can probably expect a 4.0.1 release soon for any issues that pop up once folks use this in real project work. At a minimum, there’ll be updates for the CoreCLR support once the dust settles on all that churn.

 

Optimizing Marten Part 2

This is an update to an earlier blog post on optimizing for performance in Marten. Marten is a new OSS project I’m working on that allows .Net applications to treat the Postgresql database as a document database. Our hope at work is that Marten will be a more performant and easier to support replacement in our ecosystem for RavenDb (and possibly a replacement event store mechanism inside of the applications that use event sourcing, but that’s going to come later).

Before we think about using Marten for real, we’re undertaking some efforts to optimize the performance of both reading data from and writing data to Postgresql.

Optimizing Queries with Indexes

In my previous post, my former colleague Joshua Flanagan suggested using the Postgresql containment operator and gin indexes as part of my performance comparisons. I added some ability to define database indexes for Marten document types like this:

public class ContainmentOperator : MartenRegistry
{
    public ContainmentOperator()
    {
        // For persisting a document type called 'Target'
        For<Target>()

            // Use a gin index against the json data field
            .GinIndexJsonData()

            // directs Marten to try to use the containment
            // operator for querying against this document type
            // in the Linq support
            .PropertySearching(PropertySearching.ContainmentOperator);
    }
}

and like this for indexing what we’re calling “searchable” fields where Marten duplicates some element of a document into a separate database column for optimized searching:

public class DateIsSearchable : MartenRegistry
{
    public DateIsSearchable()
    {
        // This can also be done with attributes
        // This automatically adds a "BTree" index
        For<Target>().Searchable(x => x.Date);
    }
}

As of now, when you choose to make a field or property of a document “searchable”, Marten automatically adds a database index to that column on the document storage table. By default, the index is the standard Postgresql btree index, but you do have the ability to override how the index is created.

Now that we have support for querying using the containment operator and support for defining indexes, I reran the query performance tests and updated the results with some new data:

Serializer: JsonNetSerializer

Query Type                           1K     10K      100K
JSON Locator Only                     7    77.2     842.4
jsonb_to_record + lateral join      9.4    88.6    1170.4
searching by duplicated field         1    16.4     135.4
searching by containment operator   4.6    14.8     132.4

Serializer: JilSerializer

Query Type                           1K     10K      100K
JSON Locator Only                     6    54.8     827.8
jsonb_to_record + lateral join      8.6    76.2    1064.2
searching by duplicated field         1     6.8        64
searching by containment operator     4     7.8      66.8

Again, searching by a field that is duplicated as a simple database column with a btree index is clearly the fastest approach. The containment operator plus gin index comes in second, and may be the best choice when you will have to issue many different kinds of queries against the same document type. Based on this data, I think that we’re going to make the containment operator the preferred way of querying json documents, but fall back to using the json locator approach for all other query operators besides equality tests.
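
Whichever strategy Marten settles on under the covers, the application-side query shouldn’t change. A sketch reusing the Target fixture type and a document store from the earlier examples:

using System;
using System.Linq;

using (var session = store.LightweightSession())
{
    // Marten decides whether this equality test is answered by the duplicated
    // "date" column, the containment operator, or a JSON locator
    var matches = session.Query<Target>()
        .Where(x => x.Date == DateTime.Today)
        .ToList();
}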

I still think that we have to ship with Newtonsoft.Json as our default json serializer because of F# and polymorphism concerns among other things, but if you can get away with it for your document types, Jil is clearly much faster.

There is some conversation in the Marten Gitter room about possibly adding gin indexes to every document type by default, but I think we first need to pay attention to the data in the next section:

 

Insert Timings

The querying is definitely important, but we certainly want the write side of Marten to be fast too. We’ve had what we call “BulkInsert” support using Npgsql & Postgresql’s facility for bulk copying. Recently, I’ve changed Marten’s internal unit of work class to issue all of its delete and “upsert” commands in one single ADO.Net DbCommand to try to execute multiple SQL statements in a single network round trip.
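
Here’s a stripped down sketch of the batching idea (this is not Marten’s actual unit of work code; the “docs” table and the plain inserts are only illustrative stand-ins for the real upsert and delete calls):

using System;
using System.Collections.Generic;
using System.Text;
using Npgsql;
using NpgsqlTypes;

public static class BatchedWriter
{
    // writes every document in a single DbCommand, and therefore in a single
    // network round trip to Postgresql
    public static void WriteAll(NpgsqlConnection connection, IDictionary<Guid, string> jsonById)
    {
        using (var command = connection.CreateCommand())
        {
            var sql = new StringBuilder();
            var i = 0;

            foreach (var pair in jsonById)
            {
                // one statement per document, all concatenated into one command
                sql.AppendLine($"insert into docs (id, data) values (:id{i}, :doc{i});");

                command.Parameters.AddWithValue("id" + i, pair.Key);
                command.Parameters.Add("doc" + i, NpgsqlDbType.Jsonb).Value = pair.Value;

                i++;
            }

            command.CommandText = sql.ToString();
            command.ExecuteNonQuery();
        }
    }
}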

My best friend the Oracle database guru (I’ll know if he reads this because he’ll be groaning about the Oracle part;)) suggested that this approach might not matter much compared to issuing multiple ADO.Net commands against the same stateful transaction and connection, but we were both surprised by how much difference batching the SQL commands turned out to make.

To better understand the impact on insert timing using our bulk insert facility, the new batched update mechanism, and the original “ADO.Net command per document update” approach, I ran a series of tests that tried to insert 500 documents using each technique.

Because we also need to understand the implications on insertion and update timing of using the searchable, duplicated fields and gin indexes (there is some literature in the Postgresql docs stating that gin indexes could be expensive on the write side), I ran each permutation of update strategy against three different indexing strategies on the document storage table:

  1. No indexes whatsoever
  2. A duplicated field with a btree index
  3. Using a gin index against the JSON data column

And again, just for fun, I used both the Newtonsoft.Json and Jil serializers to also understand the impact that they have on performance.

You can find the code I used to make these tables in GitHub in the insert_timing class.

Using Newtonsoft.Json as the Serializer

Index                        Bulk Insert   Batch Update   Command per Document
No Index                              62            149                    244
Duplicated Field w/ Index             53            152                    254
Gin Index on Json                     96            186                    300

 

Using Jil as the Serializer

Index                        Bulk Insert   Batch Update   Command per Document
No Index                              47            134                    224
Duplicated Field w/ Index             57            151                    245
Gin Index on Json                     79            180                    270

As you can clearly see, the new batch update mechanism looks to be a pretty big win for performance over our original, naive “command per document” approach. The only downside is that this technique has a certain ceiling insofar as how many or how large the documents can be before the single command exceeds technical limits. For right now, I think I’d like to simply beat that problem with documentation pushing users to use the bulk insert mechanism for large data sets. In the longer term, we’ll throttle the batch update by paging updates into some to-be-determined number of document updates at a time.

The key takeaway for me just reinforces the very first lesson I had drilled into me about software performance: network round trips are evil. We are certainly reducing the number of network round trips between our application and the database server by utilizing the command batching.

You can also see that using a gin index slows down the document updates considerably. I think the only good answer to users is that they’ll have to do performance testing as always.

 

Other Optimization Things

  • We’ve been able to cut down on Reflection hits and dynamic runtime behavior by using Roslyn as a crude metaprogramming mechanism to just codegen the document storage code.
  • Again in the theme of reducing network round trips, we’re going to investigate being able to batch up deferred queries into a single request to the Postgresql database.
  • We’re not sure about the details yet, but we’ll be investigating approaches for using asynchronous projections inside of Postgresql (maybe using Javascript running inside of the database, maybe .Net code in an external system, maybe both approaches).
  • I’m leaving the issues out in http://up-for-grabs.net, but we’ll definitely add the ability to just retrieve the raw JSON so that HTTP endpoints could stream data to clients without having to take the unnecessary hit of deserializing to a .Net type just to immediately serialize right back to JSON for the HTTP response. We’ll also support a completely asynchronous querying and update API for maximum scalability.

 

Using Roslyn for Runtime Code Generation in Marten

I’m using Roslyn to dynamically compile and load assemblies built at runtime from generated code in Marten and other than some concern over the warmup time, it’s been going very well so far.

Like so many other developers with more cleverness than sense, I’ve spent a lot of time trying to build Hollywood Principle style frameworks that try to dynamically call application code at runtime through Reflection or some kind of related mechanism. Reflection itself has traditionally been the easiest mechanism to use in .Net to create dynamic behavior at runtime, but it can be a performance problem, especially if you use it naively.

A Look Back at What Came Before…

Taking my own StructureMap IoC tool as an example, over the years I’ve accomplished dynamic runtime behavior in a couple different ways:

  1. Emitting IL directly with Reflection.Emit from the original versions through StructureMap 2.5. Working with IL is just barely a higher abstraction than assembly code and I don’t recommend using it if your goal is maintainability or making it easy for other developers to work in your code. I don’t miss generating IL by hand whatsoever. For those of you reading this and saying “pfft, IL isn’t so bad if you just understand how it works…”, my advice to you is to immediately go outside and get some fresh air and sunshine because you clearly aren’t thinking straight.
  2. From StructureMap 2.6 on, I crudely used the trick of building Expression trees representing what I needed to do, then compiling those Expression trees into delegates with the right Func or Action signatures (see the small sketch after this list). This approach is easier – at least for me – because the Expression model is much closer semantically to the actual code you’re trying to mimic than the stack-based IL.
  3. From StructureMap 3.* on, there’s a much more complex dynamic Expression compilation model that’s robust enough to call constructor functions, set properties, thread in interception, and surround all of that with try/catch logic for expressive exception messages and pseudo stack traces.
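
To make that second technique concrete, here’s a tiny, StructureMap-free example of building an Expression tree for a constructor call and compiling it into a reusable delegate:

using System;
using System.Linq.Expressions;

public static class ExpressionCompilationDemo
{
    // Builds a Func that constructs the given type through its default
    // constructor. Compiled once, then invoked like any other delegate.
    public static Func<object> BuilderFor(Type concreteType)
    {
        // an Expression representing "new T()"
        var ctor = concreteType.GetConstructor(Type.EmptyTypes);
        var newUp = Expression.New(ctor);

        // cast/box to object so one delegate signature fits any type
        var body = Expression.Convert(newUp, typeof(object));

        return Expression.Lambda<Func<object>>(body).Compile();
    }
}

// usage: var builder = ExpressionCompilationDemo.BuilderFor(typeof(System.Text.StringBuilder));
//        var instance = builder();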

The current dynamic Expression approach in the StructureMap 3/4 internals is mostly working out well, but I barely remember how it works and it would take me a good day just to get back into that code if I ever had to change something.

What if instead we could just work directly in plain old C# that we largely know and understand, but somehow get that compiled at runtime instead? Well, thanks to Roslyn and its “compiler as a service”, we now can.

I’ve said before that I want to eventually replace the Expression compilation with the Roslyn code compilation shown in this post, but I’m not sure I’m ambitious enough to mess with a working project.

How Marten uses Roslyn Runtime Generation 

As I explained in my last blog post, Marten generates some “glue code” to connect a document object to the proper ADO.Net command objects for loading, storing, or deleting. For each document class, Marten generates a class implementing the IDocumentStorage interface shown below:

public interface IDocumentStorage
{
    NpgsqlCommand UpsertCommand(object document, string json);
    NpgsqlCommand LoaderCommand(object id);
    NpgsqlCommand DeleteCommandForId(object id);
    NpgsqlCommand DeleteCommandForEntity(object entity);
    NpgsqlCommand LoadByArrayCommand<TKey>(TKey[] ids);
    Type DocumentType { get; }
}

In the test library, we have a class I creatively called “Target” that I’ve been using to test how Marten handles various .Net Types and queries. At runtime, Marten generates a class called TargetDocumentStorage that implements the interface above. Part of the generated code — modified by hand to clean up some extraneous line breaks and added comments — is shown below:

using Marten;
using Marten.Linq;
using Marten.Schema;
using Marten.Testing.Fixtures;
using Marten.Util;
using Npgsql;
using NpgsqlTypes;
using Remotion.Linq;
using System;
using System.Collections.Generic;

namespace Marten.GeneratedCode
{
    public class TargetStorage : IDocumentStorage, IBulkLoader<Target>, IdAssignment<Target>
    {
        public TargetStorage()
        {

        }

        public Type DocumentType => typeof (Target);

        public NpgsqlCommand UpsertCommand(object document, string json)
        {
            return UpsertCommand((Target)document, json);
        }

        public NpgsqlCommand LoaderCommand(object id)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForId(object id)
        {
            return new NpgsqlCommand("delete from mt_doc_target where id = :id").WithParameter("id", id);
        }

        public NpgsqlCommand DeleteCommandForEntity(object entity)
        {
            return DeleteCommandForId(((Target)entity).Id);
        }

        public NpgsqlCommand LoadByArrayCommand<T>(T[] ids)
        {
            return new NpgsqlCommand("select data from mt_doc_target where id = ANY(:ids)").WithParameter("ids", ids);
        }

        // I configured the "Date" field to be a duplicated/searchable field in code
        public NpgsqlCommand UpsertCommand(Target document, string json)
        {
            return new NpgsqlCommand("mt_upsert_target")
                .AsSproc()
                .WithParameter("id", document.Id)
                .WithJsonParameter("doc", json).WithParameter("arg_date", document.Date, NpgsqlDbType.Date);
        }

        // This Assign() method would use a HiLo sequence generator for numeric Id fields
        public void Assign(Target document)
        {
            if (document.Id == System.Guid.Empty) document.Id = System.Guid.NewGuid();
        }

        public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<Target> documents)
        {
            using (var writer = conn.BeginBinaryImport("COPY mt_doc_target(id, data, date) FROM STDIN BINARY"))
            {
                foreach (var x in documents)
                {
                    writer.StartRow();
                    writer.Write(x.Id, NpgsqlDbType.Uuid);
                    writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
                    writer.Write(x.Date, NpgsqlDbType.Date);
                }
            }
        }
    }
}

Now that you can see what code I’m generating at runtime, let’s move on to a utility for generating the code.

SourceWriter

SourceWriter is a small utility class in Marten that helps you write neatly formatted, indented C# code. SourceWriter wraps a .Net StringWriter for efficient string manipulation and provides some helpers for adding namespace using statements and tracking indention levels for you. After experimenting with some different usages, I mostly settled on using the Write(text) method that allows you to provide a section of code as a multi-line string. The TargetDocumentStorage code I showed above is generated from within a class called DocumentStorageBuilder with a call to the SourceWriter.Write() method shown below:

            writer.Write(
                $@"
BLOCK:public class {mapping.DocumentType.Name}Storage : IDocumentStorage, IBulkLoader<{mapping.DocumentType.Name}>, IdAssignment<{mapping.DocumentType.Name}>

{fields}

BLOCK:public {mapping.DocumentType.Name}Storage({ctorArgs})
{ctorLines}
END

public Type DocumentType => typeof ({mapping.DocumentType.Name});

BLOCK:public NpgsqlCommand UpsertCommand(object document, string json)
return UpsertCommand(({mapping.DocumentType.Name})document, json);
END

BLOCK:public NpgsqlCommand LoaderCommand(object id)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForId(object id)
return new NpgsqlCommand(`delete from {mapping.TableName} where id = :id`).WithParameter(`id`, id);
END

BLOCK:public NpgsqlCommand DeleteCommandForEntity(object entity)
return DeleteCommandForId((({mapping.DocumentType.Name})entity).{mapping.IdMember.Name});
END

BLOCK:public NpgsqlCommand LoadByArrayCommand<T>(T[] ids)
return new NpgsqlCommand(`select data from {mapping.TableName} where id = ANY(:ids)`).WithParameter(`ids`, ids);
END


BLOCK:public NpgsqlCommand UpsertCommand({mapping.DocumentType.Name} document, string json)
return new NpgsqlCommand(`{mapping.UpsertName}`)
    .AsSproc()
    .WithParameter(`id`, document.{mapping.IdMember.Name})
    .WithJsonParameter(`doc`, json){extraUpsertArguments};
END

BLOCK:public void Assign({mapping.DocumentType.Name} document)
{mapping.IdStrategy.AssignmentBodyCode(mapping.IdMember)}
END

BLOCK:public void Load(ISerializer serializer, NpgsqlConnection conn, IEnumerable<{mapping.DocumentType.Name}> documents)
BLOCK:using (var writer = conn.BeginBinaryImport(`COPY {mapping.TableName}(id, data{duplicatedFieldsInBulkLoading}) FROM STDIN BINARY`))
BLOCK:foreach (var x in documents)
writer.StartRow();
writer.Write(x.Id, NpgsqlDbType.{id_NpgsqlDbType});
writer.Write(serializer.ToJson(x), NpgsqlDbType.Jsonb);
{duplicatedFieldsInBulkLoadingWriter}
END
END
END

END

");
        }

There are a couple of things to note about the code generation above:

  • String interpolation makes this so much easier than I think it would be with just string.Format(). Thank you to the C# 6 team.
  • Each line of code is written to the underlying StringWriter with the level of indention added to the left by SourceWriter itself
  • The “BLOCK” prefix directs SourceWriter to add an opening brace “{” to the next line, then increment the indention level
  • The “END” text directs SourceWriter to decrement the current indention level, then write a closing brace “}” to the next line and a blank line after that.

Now that we’ve got ourselves some generated code, let’s get Roslyn involved to compile it and actually get at an object of the new Type we want.

Roslyn Compilation with AssemblyGenerator

Based on a blog post by Tugberk Ugurlu, I built the AssemblyGenerator class in Marten shown below that invokes Roslyn to compile C# code and load the new dynamically built Assembly into the application:

public class AssemblyGenerator
{
    private readonly IList<MetadataReference> _references = new List<MetadataReference>();

    public AssemblyGenerator()
    {
        ReferenceAssemblyContainingType<object>();
        ReferenceAssembly(typeof (Enumerable).Assembly);
    }

    public void ReferenceAssembly(Assembly assembly)
    {
        _references.Add(MetadataReference.CreateFromFile(assembly.Location));
    }

    public void ReferenceAssemblyContainingType<T>()
    {
        ReferenceAssembly(typeof (T).Assembly);
    }

    public Assembly Generate(string code)
    {
        var assemblyName = Path.GetRandomFileName();
        var syntaxTree = CSharpSyntaxTree.ParseText(code);

        var references = _references.ToArray();
        var compilation = CSharpCompilation.Create(assemblyName, new[] {syntaxTree}, references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));


        using (var stream = new MemoryStream())
        {
            var result = compilation.Emit(stream);

            if (!result.Success)
            {
                var failures = result.Diagnostics.Where(diagnostic =>
                    diagnostic.IsWarningAsError ||
                    diagnostic.Severity == DiagnosticSeverity.Error);


                var message = failures.Select(x => $"{x.Id}: {x.GetMessage()}").Join("\n");
                throw new InvalidOperationException("Compilation failures!\n\n" + message + "\n\nCode:\n\n" + code);
            }

            stream.Seek(0, SeekOrigin.Begin);
            return Assembly.Load(stream.ToArray());
        }
    }
}

At runtime, you use the AssemblyGenerator class by telling it which other assemblies it should reference and giving it the source code to compile:

// Generate the actual source code
var code = GenerateDocumentStorageCode(mappings);

var generator = new AssemblyGenerator();

// Tell the generator which other assemblies that it should be referencing 
// for the compilation
generator.ReferenceAssembly(Assembly.GetExecutingAssembly());
generator.ReferenceAssemblyContainingType<NpgsqlConnection>();
generator.ReferenceAssemblyContainingType<QueryModel>();
generator.ReferenceAssemblyContainingType<DbCommand>();
generator.ReferenceAssemblyContainingType<Component>();

mappings.Select(x => x.DocumentType.Assembly).Distinct().Each(assem => generator.ReferenceAssembly(assem));

// build the new assembly -- this will blow up if there are any
// compilation errors with the list of errors and the actual code
// as part of the exception message
var assembly = generator.Generate(code);

Finally, once you have the new Assembly, use Reflection just to find the new Type you want by either searching through Assembly.GetExportedTypes() or by name. Once you have the Type object, you can build that object through Activator.CreateInstance(Type) or any of the other normal Reflection mechanisms.
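
That last step looks something like this (a small sketch; “TargetStorage” matches the generated class shown earlier in the post):

using System;
using System.Linq;

// find the generated storage class inside the freshly loaded assembly...
var storageType = assembly.GetExportedTypes()
    .Single(type => type.Name == "TargetStorage");

// ...and build it once through Reflection. From here on the object is used
// through its interface, so there's no more Reflection on the hot path.
var storage = (IDocumentStorage) Activator.CreateInstance(storageType);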

The Warmup Problem

So I’m very happy with using Roslyn in this way so far, but the initial “warmup” time on the very first usage of the compilation is noticeably slow. It’s a one time hit on startup, but this could get annoying when you’re trying to quickly iterate or debug a problem in code by frequently restarting the application. If the warmup problem really is serious in real applications, we may introduce a mode that just lets you export the generated code to file and have that code compiled with the rest of your project for much faster startup times.