A way too early discussion of “Jasper”

After determining that I wasn’t going to be able to easily move the old FubuMVC codebase to the CoreCLR, I’ve been furiously working on the long proposed and delayed successor to FubuMVC that’s going to be called “Jasper.” I’m trying to get in front of a team doing CoreCLR development at work with a working MVP feature set in the next couple of weeks. I need to bring a couple of other folks from my shop on to help out, and a few folks have been asking what I’m up to just because of the sudden flurry of GitHub activity, so here’s a big ol’ braindump of the roadmap and architectural direction so far.

First, why do this at all instead of switching to another existing service bus?

We’re happy with how FubuMVC’s service bus support has worked out
We need to be “wire compatible” with FubuMVC
We want to do CoreCLR development right now, and NServiceBus and MassTransit aren’t there yet
Jasper will be “xcopy deployable,” which we’ve found to be very advantageous for both development and automated testing
Because I want to — but don’t let my boss hear that

The Vision

Jasper is a next generation application development framework for distributed server side development in .Net (think service bus now and HTTP services later). Jasper is being built on the CoreCLR as a replacement for a small subset of the older FubuMVC tooling. Roughly stated, Jasper intends to keep the things that have been successful in FubuMVC, ditch the things that weren’t, and make the runtime pipeline be much more performant. Oh, and make the stack traces from failures within the runtime pipeline be a whole lot simpler to read — and yes, that’s absolutely worth being one of the main goals.

The current thinking is that we’d have these libraries/Nugets:

Jasper – The core assembly that will handle bootstrapping, configuration, and the Roslyn code generation tooling
JasperBus – The service bus features from FubuMVC and an alternative to MediatR
JasperDiagnostics – Runtime diagnostics meant for development and testing
JasperStoryteller – Support for hosting Jasper applications within Storyteller specification projects.
JasperHttp (later) – Build HTTP micro-services on top of ASP.Net Core in a FubuMVC-esque way.
JasperQueues (later) – JasperBus is going to use LightningQueues as its primary transport mechanism, but I’d possibly like to re-architect that code to a new library inside of Jasper. This library will not have any references or coupling to any other Jasper project.
JasperScheduler (proposed for much later) – Scheduled or polling job support on top of JasperBus

The Core Pipeline and Roslyn

The basic goal of Jasper is to provide a much more efficient and improved version of the older FubuMVC architecture for CoreCLR development that is also “wire compatible” with our existing FubuMVC 3 services on .Net 4.6.

The original, core concept of FubuMVC was what we called the Russian Doll Model and is now mostly referred to as middleware. The Russian Doll Model architecture makes it relatively easy for developers to reuse code for cross cutting concerns like validation or security without having to write nearly so much explicit code. At this point, many other .Net frameworks support some kind of Russian Doll Model architecture like ASP.Net Core’s middleware or the Behavior model in NServiceBus.

In FubuMVC, that consisted of a couple parts:

A runtime abstraction for middleware called IActionBehavior for every step in the runtime pipeline for processing an HTTP request or service bus message. Behaviors were a linked list chain from outermost behavior to innermost. This model was also adapted from FubuMVC into NServiceBus.
A configuration time model we called the BehaviorGraph that expressed all the routes and service bus message handling chains of behaviors in the system. This configuration time model made it possible to apply conventions and policies that established what exact middleware ran in what order for each message type or HTTP route. This configuration model also allowed FubuMVC to expose diagnostic visualizations about each chain that were valuable for troubleshooting problems or just flat out understanding what was in the system to begin with.

Great, lots of flexibility and some unusual diagnostics, but the FubuMVC model gets a lot uglier when you go to an “async by default” execution pipeline. Maybe more importantly, it suffers from too many object allocations because of all the little objects getting created on every message or HTTP request that hurt performance and scalability. Lastly, it makes for some truly awful stack traces when things go wrong because of all the bouncing between behaviors in the nested handler chain.

For Jasper, we’re going to keep the configuration model (but simplified), but this time around we’re doing some code generation at runtime to “bake” the execution pipeline into a much tighter package, then use the new runtime code compilation capabilities in Roslyn to generate assemblies on the fly.

As part of that, we’re trying every possible trick we can think of to reduce object allocations and minimize the work being done at runtime by the underlying IoC container. The NServiceBus team did something very similar with their version of middleware and claimed an absolutely humongous improvement in throughput, so we’re very optimistic about this approach.
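
To make the contrast concrete, here is a minimal sketch of the two shapes. Every type name in it (IBehavior, LoggingBehavior, GeneratedMessageHandler, and so on) is hypothetical and is not the FubuMVC or Jasper API, and the "generated" class is written by hand purely to show the kind of flattened code that runtime code generation might emit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical names throughout; this is not the FubuMVC or Jasper API.
public class Message { public string Body = ""; }

// Russian Doll style: each step wraps the next one in the chain.
public interface IBehavior
{
    Task Invoke(Message message);
}

public class LoggingBehavior : IBehavior
{
    private readonly IBehavior _inner;
    private readonly List<string> _log;

    public LoggingBehavior(IBehavior inner, List<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public async Task Invoke(Message message)
    {
        _log.Add("before");
        await _inner.Invoke(message); // one extra stack frame per wrapper
        _log.Add("after");
    }
}

public class HandlerBehavior : IBehavior
{
    private readonly List<string> _log;
    public HandlerBehavior(List<string> log) { _log = log; }

    public Task Invoke(Message message)
    {
        _log.Add("handled " + message.Body);
        return Task.CompletedTask;
    }
}

// "Baked" style: the same steps flattened into a single method,
// hand-written here to stand in for generated code.
public class GeneratedMessageHandler
{
    private readonly List<string> _log;
    public GeneratedMessageHandler(List<string> log) { _log = log; }

    public Task Handle(Message message)
    {
        _log.Add("before");
        _log.Add("handled " + message.Body);
        _log.Add("after");
        return Task.CompletedTask;
    }
}

public static class PipelineDemo
{
    public static List<string> RunNested(string body)
    {
        var log = new List<string>();
        // Two behavior objects are allocated just to process one message.
        IBehavior chain = new LoggingBehavior(new HandlerBehavior(log), log);
        chain.Invoke(new Message { Body = body }).GetAwaiter().GetResult();
        return log;
    }

    public static List<string> RunBaked(string body)
    {
        var log = new List<string>();
        new GeneratedMessageHandler(log).Handle(new Message { Body = body })
            .GetAwaiter().GetResult();
        return log;
    }
}
```

Both paths record the same steps in the same order. The difference is that the flattened version needs one object and one stack frame where the nested version needs one per behavior.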

What’s with the name?

I think that FubuMVC turned some people off by its name (“for us, by us”). This time around I was going for an unassuming name that was easy to remember and just named it after my hometown (Jasper, MO).

JasperBus

The initial feature set looks to be:

Running decoupled commands ala MediatR
In memory transport
LightningQueues based transport
Publish/Subscribe messaging
Request/Reply messaging patterns
Dead letter queue mechanics
Configurable error handling rules
The “cascading messages” feature from FubuMVC
Static message routing rules
Subscriptions for dynamic routing; this time we’re looking at using [Consul](https://www.consul.io/) for the underlying storage
Delayed messages
Batch message processing
Saga support (later) — but this is going to be a complete rewrite from FubuMVC
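
As a rough illustration of the “cascading messages” idea from the list above, where a handler’s return value is itself treated as a new message to be processed, here is a small in-memory sketch. The InMemoryBus class and the PlaceOrder/OrderPlaced message types are made up for this example and are not the JasperBus API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message types for illustration only.
public class PlaceOrder { public string Item = ""; }
public class OrderPlaced { public string Item = ""; }

// A toy in-memory bus; not the JasperBus API.
public class InMemoryBus
{
    private readonly Dictionary<Type, Func<object, object>> _handlers =
        new Dictionary<Type, Func<object, object>>();

    // Names of the message types handled, in processing order.
    public readonly List<string> Handled = new List<string>();

    public void Handle<T>(Func<T, object> handler)
    {
        _handlers[typeof(T)] = message => handler((T)message);
    }

    public void Send(object message)
    {
        var pending = new Queue<object>();
        pending.Enqueue(message);

        while (pending.Count > 0)
        {
            var current = pending.Dequeue();
            Func<object, object> handler;
            if (!_handlers.TryGetValue(current.GetType(), out handler)) continue;

            Handled.Add(current.GetType().Name);

            // A non-null return value "cascades": it is queued as a new message.
            var cascaded = handler(current);
            if (cascaded != null) pending.Enqueue(cascaded);
        }
    }
}

public static class CascadingDemo
{
    public static List<string> Run()
    {
        var bus = new InMemoryBus();
        bus.Handle<PlaceOrder>(cmd => new OrderPlaced { Item = cmd.Item });
        bus.Handle<OrderPlaced>(evt => null); // nothing further to cascade
        bus.Send(new PlaceOrder { Item = "widget" });
        return bus.Handled;
    }
}
```

The handler for PlaceOrder never touches the bus directly. It just returns the follow-up message, which keeps handlers easy to unit test.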

There is no intention to add the polling or scheduled job functionality that was in FubuMVC to Jasper.

JasperDiagnostics

We haven’t detailed this one out much, but I’m thinking it’s going to be a completely encapsulated ASP.Net Core application using Kestrel to serve some diagnostic views of a running Jasper application. As much as anything, I think this project is going to be a test bed for my shop’s approach to React/Redux and an excuse to experiment with the Apollo client with or without GraphQL. The diagnostics should expose both a static view of the application’s configuration and a live tracing of messages or HTTP requests being handled.

JasperStoryteller

This library won’t do too much, but we’ll at least want a recipe for being able to bootstrap and teardown a Jasper application in Storyteller test harnesses. At a minimum, I’d like to expose a bit of diagnostics on the service bus activity during a Storyteller specification run like we did with FubuMVC in the Storyteller specification results HTML.

JasperHttp

We’re embracing ASP.net Core MVC at work, so this might just be a side project for fun down the road. The goal here is just to provide a mechanism for writing micro-services that expose HTTP endpoints. I think the potential benefits over MVC are:

Less ceremony in writing HTTP endpoints (fewer attributes, no required base classes, no marker interfaces, no fluent interfaces)
The runtime model will be much leaner. We think that we can make Jasper about as efficient as writing purely explicit, bespoke code directly on top of ASP.Net Core
Easier testability

A couple folks have asked me about the timing on this one, but I think mid-summer is the earliest I’d be able to do anything about it.

JasperScheduler

If necessary, we’ll have another “Feature” library that extends JasperBus with the ability to schedule user supplied jobs. The intention this time around is to just use Quartz as the actual scheduler.

JasperQueues

This is a giant TBD

IoC Usage Plans

Right now, it’s going to be StructureMap 4.4+ only. While this will drive some folks away, it makes the tool much easier to build. Besides, Jasper is already using some StructureMap functionality for its own configuration. I think that we’re only positioning Jasper for greenfield projects (and migration from FubuMVC) anyway.

Regardless, the IoC usage in Jasper is going to be simplistic compared to what we did in FubuMVC and certainly less involved than the IoC abstractions in ASP.Net Core MVC. We theorize that this should make it possible to slip in the IoC container of your choice later.

A Concept for Integrated Database Testing within Storyteller

As I wrote about a couple weeks back, we’re looking to be a bit more Agile with our relational database development. Storyteller is generally our tool of choice for automated testing when the problem domain involves a lot of data setup and where the declarative data checking becomes valuable. To take the next step toward more test automation against both our centralized database and the related applications, I’ve been working on a new package for Storyteller to enable easy integration of relational database manipulation and insertions. While I don’t have anything released to Nuget yet, I was hoping to get a little bit of feedback from others who might be interested in this new package — and have something to show other developers at work;)

As a super simplistic example, I’ve been retrofitting some Storyteller coverage against the Hilo sequence generation in Marten. That feature really only has two database objects:

mt_hilo: a table just to track which “page” of sequential numbers has been reserved
mt_get_next_hi: a stored procedure (I know, but let it go for now) that’s used to reserve and fetch the next page for a named entity

Those objects are shown below:

DROP TABLE IF EXISTS public.mt_hilo CASCADE;
CREATE TABLE public.mt_hilo (
	entity_name			varchar CONSTRAINT pk_mt_hilo PRIMARY KEY,
	hi_value			bigint default 0
);

CREATE OR REPLACE FUNCTION public.mt_get_next_hi(entity varchar) RETURNS int AS $$
DECLARE
	current_value bigint;
	next_value bigint;
BEGIN
	select hi_value into current_value from public.mt_hilo where entity_name = entity;
	IF current_value is null THEN
		insert into public.mt_hilo (entity_name, hi_value) values (entity, 0);
		next_value := 0;
	ELSE
		next_value := current_value + 1;
		update public.mt_hilo set hi_value = next_value where entity_name = entity;
	END IF;

	return next_value;
END
$$ LANGUAGE plpgsql;
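
To make the function’s behavior easier to reason about before testing it, here is a small in-memory model of the same logic. The HiloModel class is purely an illustration written for this post and is not part of Marten. It also ignores concurrency and transactions, which the real function has to deal with.

```csharp
using System.Collections.Generic;

// Illustrative in-memory stand-in for mt_hilo and mt_get_next_hi.
public class HiloModel
{
    // entity_name -> hi_value, mirroring the rows of mt_hilo
    public readonly Dictionary<string, long> Rows = new Dictionary<string, long>();

    public long GetNextHi(string entity)
    {
        long current;

        // No row yet: start the entity at page 0.
        // Existing row: reserve the next page.
        long next = Rows.TryGetValue(entity, out current) ? current + 1 : 0;

        Rows[entity] = next;
        return next;
    }
}
```

Calling GetNextHi for “foo”, then “bar”, then “foo” again returns 0, 0, and 1, and leaves two rows behind: foo at 1 and bar at 0.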

As a tiny proof of concept, I wanted to have a Storyteller specification just to test the happy path of the objects above. In the Fixture class for the Hilo sequence objects, I need grammars to:

Verify that there is no existing data in mt_hilo at the beginning of the spec
Call the mt_get_next_hi function with a given entity name and verify the page number returned from the function
Do a set verification of the exact rows in the mt_hilo table at the end of the spec

To implement the desired specification language for the steps above, I wrote this class using the new Storyteller.RDBMS bits:

    public class HiloFixture : PostgresqlFixture
    {
        public HiloFixture()
        {
            Title = "The HiLo Objects";
        }

        public override void SetUp()
        {
            WriteTrace("Deleting from mt_hilo");
            Runner.Execute("delete from mt_hilo");
        }

        // Grammar: asserts that the mt_hilo table is empty
        public IGrammar NoRows()
        {
            return NoRowsIn("There should be no rows in the mt_hilo table", "public.mt_hilo");
        }

        // Grammar: set verification of the exact rows in mt_hilo
        public RowVerification CheckTheRows()
        {
            return VerifyRows("select entity_name, hi_value from mt_hilo")
                .Titled("The rows in mt_hilo should be")
                .AddField("entity_name")
                .AddField("hi_value");
        }

        // Grammar: calls mt_get_next_hi for the given entity
        // and checks the returned page number
        public IGrammarSource GetNextHi(string entity)
        {
            return Sproc("mt_get_next_hi")
                .Format("Get the next Hi value for entity {entity} should be {result}")
                .CheckResult<int>();
        }
    }

A couple other notes on the class above:

You might notice that I’m cleaning out the mt_hilo table in the Fixture’s SetUp() method. I do this to quietly establish a known starting state at the beginning of the specification execution.
It’s not shown here, but part of your setup for this tooling is to tell Storyteller what the database connection string is. I haven’t exactly settled on the final mechanism for this yet.
The HiloFixture class subclasses the PostgresqlFixture class that provides some helpers for defining grammars against a Postgresql database. I’m developing against Postgresql at the moment (just so I can code on OSX), but this new package will target Sql Server as well out of the box because that’s what we need it for at work;)

Now that we’ve got the Fixture, I wrote this specification shown in Storyteller’s markdown flavored persistence:

# Read and Write

[Hilo]

In the initial state, there should be no data

|> NoRows
|> GetNextHi entity=foo, result=0
|> GetNextHi entity=bar, result=0
|> GetNextHi entity=foo, result=1
|> CheckTheRows
    [rows]
    |entity_name|hi_value|
    |foo        |1       |
    |bar        |0       |

Finally, here’s what the result of running the specification above looks like:

[Screenshot: Storyteller HTML results for the Hilo specification]

Where do I foresee this being used?

I think the main usage for us is with some of our services that are tightly coupled to a Sql Server database. I see us using this tool to set up test data and be able to verify expected database state changes when our C# services execute.

I also see this for testing stored procedure logic when we deem that valuable, especially when the data setup and verification requires a lot of steps. I say that because Storyteller turns the expression of the specification into a declarative form. That’s also valuable because it helps you to decouple the expression of the specification from changes to the database structure. I.e., using Storyteller means that you can more easily handle scenarios like a database table getting a new non-null column with no default that would break any hard coded Sql statements.

I’d of course prefer not to have a lot of business logic in sprocs, but if we are going to have mission critical sprocs in production, I’d really prefer to have some test coverage over them.

New StructureMap Extensions for Aspect Oriented Programming and AutoFactories

StructureMap gets a couple new, official extension libraries today that have both been baking for quite awhile courtesy of Dmytro Dziuma. Both libraries target both .Net 4.5+ and the CoreCLR (Netstandard 1.3 to be exact).

First off, there’s the StructureMap.DynamicInterception package that makes it easy to apply Aspect Oriented Programming techniques as StructureMap interceptors. Here’s the introduction and documentation page in the StructureMap website for the library.

Secondly, there’s the long awaited StructureMap.AutoFactory library that adds the “auto factory” feature to StructureMap that many folks that came from Windsor had requested over the years. Check out the documentation for the library on the StructureMap website.

A big thanks to Dmytro for all the work he did with these libraries — and an apology from me for having dragged my feet on these things for ages:/

The Mistakes I’ve Made as an OSS Author

Personally, I think the ability to admit and face up to your mistakes is a valuable side effect of gaining experience and confidence as a developer. I can’t help you get out of “Imposter Syndrome Jail” per se, but I can say to younger developers that you’ll be able to be much more sanguine about the mistakes you make in your technical decision making once you get over thinking that you need to prove your worth to everyone around you at all times.

This post might be nothing but navel gazing, but I’d bet there’s something in here that would pertain to most developers sooner or later. I’ve had some of these mistakes rubbed into my face this week so this has been on my mind.

A couple years ago I would have said that my biggest mistake was a failure to provide adequate documentation and example usages. Today I’ll happily put the Marten, StructureMap, or Storyteller documentation against almost any OSS project, so I’m going to pass on being guilty about those past sins.

Don’t Fly Solo on Big Things

I think it’s perfectly possible to work by yourself on small, self-contained libraries. If you’re trying to do something big though, you’re going to need help from other folks. Ideally, you’ll need actual coding and testing help, but at a minimum you’ll need feedback and feature ideas from other folks. If you have any desire to see your project attract sizable usage, you’ll definitely want other folks who are also invested in seeing your project succeed.

I can’t help you much here in regards to how to accomplish the whole “build a vibrant OSS community” thing. Other than Marten, I’ve never been very successful at helping grow a community around any of the tools I’ve built.

FubuMVC did have a great community at first, but I attribute that much more to Chad Myers and Josh Arnold than anything I did at the time.

Thinking that Time is Linear

Every single time I make a StructureMap release I feel like “that’s it, I’m finally done with this thing, and I can move on to other things now.” I thought that the 3.0 release was going to permanently solve the worst of StructureMap’s structural and performance flaws. Then came ASP.Net Core, the CoreCLR, and a desire to speed up our application bootstrapping time, so out came StructureMap 4.0 — and this time I really was finished, thank you. Except that I wasn’t. Users found new bugs from use cases I’d never considered (and wouldn’t use anyway, but I digress). Corey Kaylor and I ended up doing some performance optimizations to StructureMap late last year that unclogged some issues with StructureMap in combination with some of the tools we use. Just this Monday I spent 3-4 hours addressing outstanding bugs and pull requests to push out a new release.

My point here is to adopt the mindset that your activity on an OSS project is cyclical, not linear. Software systems, frameworks, or libraries are never completed, only abandoned. This has been my single biggest error, and it’s really an issue of perspective.

Be Realistic about Supporting Users

From time to time on StructureMap I’ve gotten wound up in a mix of guilt and frustration, feeling like I was too backlogged with user questions and problems. I think the only real answer is to just be realistic about how fast you can get around to addressing user issues and cut yourself a little bit of slack. Your family, your workplace, and you have to be a higher priority than someone on the internet.

Building Features Too Early

In the early days of Agile development we talked a bit about “pull” vs. “push” approaches to project scope. In the “push” style, you try to plan out ahead of time what features and infrastructure you’re going to need, and build that out early. In a “pull” style, you delay introducing new infrastructure or features until there’s a demonstrated need for that. My consistent experience over the past decade has been that features I built in reaction to a definite need on an ongoing project at work have been much more successful than ideas I jammed into my OSS project because it sounded cool at the time.

Dogfooding

Try not to put anything out there for consumption by others if you haven’t used it yourself in realistic situations. I probably jumped the gun on the Storyteller 4.0 release and I’ll need to push a new release next week for usability concerns and a couple of bugs. All of that stress could have been avoided if I’d just used the alphas in more of my own projects before cutting the Nuget.

On the other hand, sometimes what you need most is feedback from other folks. I wonder if I made a mistake adding the event sourcing functionality into Marten. The project I had in mind that would have used that at work has been put off indefinitely and I’m not really dogfooding it at all myself. Fortunately, many other folks have been using it in realistic scenarios and I’m almost completely dependent upon them for finding problems or suggesting enhancements or API changes. I think that functionality would improve a lot faster if I were the one dogfooding it, but that’s not happening any time soon.

Inadequate Review of Pull Requests

I try to err on the side of taking in pull requests sooner rather than later, and it often causes trouble down the road. In a way, it’s harder to process code from someone else for new features because you’re not as invested into seeing your way through the implications and potential gotchas. I see a pull request that comes with adequate tests and I tend to take it in. There have been several times when I would have been better off to stop and think about how it fits into the rest of the project.

I don’t know what the exact answer is here. Too stringent of requirements for pull requests and you won’t get any. Too little oversight leads to you supporting someone else’s code.

Overreach and Hubris

I hate to say you shouldn’t chase your OSS dreams, but I think you have to be careful not to overreach or take on a mission impossible. Taking my spectacular flameout with the FubuMVC project as an example, I think I personally made these mistakes:

Being way too grandiose. An entirely alternative web development and service bus framework with its own concepts of modularity far outside the .Net mainstream was just never going to fly. I think you’re more likely to succeed by being part of an existing ecosystem rather than trying to create a whole new ecosystem. I guess what I’m saying is that there just aren’t going to be very many DHHs or John Resigs.
Building infrastructure that wasn’t directly related to the core of your project. FubuMVC at the end included its own project templating engine, its own static file middleware, a Saml2 provider, and various other capabilities that I could have pulled off the shelf instead of building myself. All that ancillary stuff represented a huge opportunity cost to myself.
Just flat out building too much stuff instead of focusing on improving the core of your project

Concept for Integrating Selenium with Storyteller 4

While this is a working demonstration on my box, what I’m showing here is a very early conceptual approach for review by other folks in my shop. I’d love to have any feedback on this thing.

I spent quite a bit of time in our Salt Lake City office last week speaking with our QA folks about test automation in general and where Selenium does or doesn’t fit into our (desired) approach. The developers in my shop use Selenium quite a bit today within our Storyteller acceptance suite with mixed results, but now our QA folks are wanting to automate some of their manual test suite and kicking the tires on Selenium.

As a follow up to those discussions, this post shows the very early concept for how we can use Selenium functionality within Storyteller specifications for their and your feedback. All of the code is in Storyteller’s 4.1 branch.

Demo Specification

Let’s start very crude. Let’s say that you have a web page that has a <div> tag with some kind of user message text that’s hidden at first. On top of that, let’s say that you’ve got two buttons on the screen with the text “Show” and “Hide.” A Storyteller specification for that behavior might look like this:

[Screenshot: preview of the specification]

and the HTML results would look like this:

[Screenshot: HTML results of the specification]

The 3+ second runtime is mostly in the creation and launching of a Chrome browser instance. More on this later.

To implement this specification we need two things: Fixture classes that implement our desired language, and the actual specification data in a markdown file shown in the next section.

In this example, there would be a new “Storyteller.Selenium” library that provides the basis for integrating Selenium into Storyteller specifications with a common “ScreenFixture” base class for Fixtures that target Selenium. After that, the SampleFixture class used in the specification above looks like this:

    public class SampleFixture : ScreenFixture
    {
        public SampleFixture()
        {
            // This is just a little bit of trickery to
            // use human readable aliases for elements on
            // the page. The Selenium By class identifies
            // how Selenium should "find" the element
            Element("the Show button", By.Id("button1"));
            Element("the Hide button", By.Id("button2"));
            Element("the div", By.Id("div1"));
            Element("the textbox", By.Id("text1"));
        }

        protected override void beforeRunning()
        {
            // Launching Chrome and opening the browser to a sample
            // HTML page. In real life, you'd need to be smarter about this
            // and reuse the Driver across specifications for better
            // performance
            Driver = new ChromeDriver();
            RootUrl = "file://" + Project.CurrentProject.ProjectPath.Replace("\\", "/");
        }

        public override void TearDown()
        {
            // Clean up behind yourself. Quit() ends the whole WebDriver
            // session; Close() would only close the current window
            Driver.Quit();
        }
    }
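
One way to picture the alias idea is as a simple lookup from a human readable name to a locator. The sketch below is only an assumption about how such a registry could work and is not the actual Storyteller.Selenium implementation. The Locator and ElementAliases types are invented for illustration, and a plain strategy/value pair stands in for Selenium’s By class so that the example has no dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Selenium By locator.
public class Locator
{
    public readonly string Strategy;
    public readonly string Value;

    public Locator(string strategy, string value)
    {
        Strategy = strategy;
        Value = value;
    }
}

// Hypothetical alias registry; not the Storyteller.Selenium API.
public class ElementAliases
{
    private readonly Dictionary<string, Locator> _elements =
        new Dictionary<string, Locator>();

    public void Element(string alias, Locator locator)
    {
        _elements[alias] = locator;
    }

    // The sorted alias names, e.g. to fill a dropdown in an editor.
    public IList<string> Names
    {
        get { return _elements.Keys.OrderBy(x => x).ToList(); }
    }

    public Locator Find(string alias)
    {
        Locator locator;
        if (_elements.TryGetValue(alias, out locator)) return locator;

        // Fail with the list of known aliases to make typos easy to spot.
        throw new ArgumentOutOfRangeException(
            "alias",
            "Unknown element '" + alias + "'. Known elements: " + string.Join(", ", Names));
    }
}
```

Keeping the locators behind aliases means a specification only ever says “the Show button,” so a change to the element’s id is a one line change in the Fixture.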

If you are editing the specifications in Storyteller’s Specification editor, you’ll have a dropdown box listing the elements by name any place where you need to specify an element, like so:

[Screenshot: element dropdown in the Specification editor]

Finally, the proposed Storyteller.Selenium package adds information to the performance logging for how long a web page takes to load. This is the time according to WebDriver and shouldn’t be used for detailed performance optimization, but it’s still a useful number to understand performance problems during Storyteller specification executions. See the “Navigation/simple.htm” line below:

[Screenshot: performance log showing the “Navigation/simple.htm” line]

What does the actual specification look like?

If you authored the specification above in the Storyteller user interface, you’d get this markdown file:

# Click Buttons

-> id = b721e06b-0b64-4710-b82b-cbe5aa261f60
-> lifecycle = Acceptance
-> max-retries = 0
-> last-updated = 2017-02-21T15:56:35.1528422Z
-> tags = 

[Sample]
|> OpenUrl url=simple.htm

This element is hidden by default
|> IsHidden element=the div

Clicking the "Show" button will reveal the div
|> Click element=the Show button
|> IsVisible element=the div

However, if you were writing the specification by hand directly in the markdown file, you can simplify it to this:

# Click Buttons

[Sample]
|> OpenUrl simple.htm

This element is hidden by default
|> IsHidden the div

Clicking the "Show" button will reveal the div
|> Click the Show button
|> IsVisible the div

We’re trying very hard with Storyteller 4 to make specifications easier to write for non-developers and what you see above is a product of that effort.

Why Storyteller + Selenium instead of just Selenium?

Why would you want to use Storyteller and Selenium together instead of just Selenium by itself? A couple reasons:

There’s a lot more going on in effective automated tests besides driving web browsers (setting up system data, checking system data, starting/stopping the system under test). Storyteller provides a lot more functionality than Selenium by itself.
It’s very valuable to express automated tests in a higher level language with something like Storyteller or Cucumber instead of going right down to screen elements and other implementation details. I say this partially for making the specifications more human readable, but also to decouple the expression of the test from the underlying implementation details. You want to do this so that your tests can more readily accommodate structural changes to the web pages. If you’ve never worked on large scale automated testing against a web browser, you really need to be aware that these kinds of tests can be very brittle in the face of user interface changes.
Storyteller provides a lot of extra instrumentation and performance logging that can be very useful for debugging testing or performance problems
I hate to throw this one out there, but Storyteller’s configurable retry capability in continuous integration is very handy for test suites with oodles of asynchronous behavior like you frequently run into with modern web applications
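
For readers who haven’t seen a retry policy like the one in the last item, here is the general idea as a sketch. This is not Storyteller’s implementation. The RetryRunner type is invented for illustration, and a specification is reduced to a function that returns true when it passes.

```csharp
using System;

public class RetryResult
{
    public bool Passed;
    public int Attempts;
}

// Hypothetical sketch of "max retries" semantics; not Storyteller's code.
public static class RetryRunner
{
    public static RetryResult Run(Func<bool> specification, int maxRetries)
    {
        var result = new RetryResult();

        // One initial attempt plus up to maxRetries additional attempts.
        for (var i = 0; i <= maxRetries; i++)
        {
            result.Attempts++;
            if (specification())
            {
                result.Passed = true;
                break;
            }
        }

        return result;
    }
}
```

With maxRetries set to 0, a specification gets exactly one attempt, so retries are something you opt into rather than a way of hiding flaky tests by default.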

Because somebody will ask, or an F# enthusiast will inevitably throw this out there, yes, there’s Canopy as well that wraps a nice DSL around Selenium and provides some stabilization. I’m not disparaging Canopy in the slightest, but everything I said about using raw Selenium applies equally to using Canopy by itself. To be a bit more eye-poky about it, one of the first success stories of Storyteller 3 was in replacing a badly unstable test suite that used Canopy naively.

Storyteller 4.0 is Out!

Storyteller is a long running project for authoring human readable, executable specifications for .Net projects. The new 4.0 release is meant to make Storyteller easier to use and consume for non-technical folks and to improve developers’ ability to troubleshoot specification failures.

After about 5 months of effort, I was finally able to cut the 4.0 Nugets for Storyteller this morning and the very latest documentation updates. If you’re completely new to Storyteller, check out our getting started page or this webcast. If you’re coming from Storyteller 3.0, just know that you will need to first convert your specifications to the new 4.0 format. The Storyteller Fixture API had no breaking changes, but the bootstrapping steps are a little bit different to accommodate the dotnet CLI.

You can see the entire list of changes here, or the big highlights of this release are:

CoreCLR Support! Storyteller 4.0 can be used on either .Net 4.6 projects or projects that target the CoreCLR. Storyteller is now a cross platform tool. You can read more about my experiences migrating Storyteller to the CoreCLR here.
Embraces the dotnet CLI. I love the new dotnet cli and wish we’d had it years ago. There is a new “dotnet storyteller” CLI extensibility package that takes the place of the old ST.exe console tool in 3.0 that should be easier to set up for new users.
Markdown Everywhere! Storyteller 4.0 changed the specification format to a Markdown plus format, added a new capability to design and generate Fixtures with markdown, and you can happily use markdown text as prose within specifications to improve your ability to communicate intentions in Storyteller specifications.
Stepthrough Mode. Integration tests can be very tricky to debug when they fail. To ease the load, Storyteller 4.0 adds the new Stepthrough mode that allows you to manually walk through all the steps of a Storyteller specification so you can examine the current state of the system under test as an aid in troubleshooting.
Asynchronous Grammars. It’s increasingly an async-first kind of world, so Storyteller follows suit to make it easier for you to test asynchronous code.
Performance Assertions. Storyteller already tracks some performance data about your system as specifications run, so why not extend that to applying assertions about expected performance that can fail specifications on your continuous integration builds?

Other Things Coming Soon(ish)

A helper library for using Storyteller with ASP.Net Core applications with some help from Alba. I’m hoping to recreate some of the type of diagnostics integration we have today with Storyteller and our FubuMVC applications at work for our newer ASP.net Core projects.
A separate package of Selenium helpers for Storyteller
An extension specifically for testing relational database code
A 4.1 release with the features I didn’t get around to in 4.0;)

How is Storyteller Different than Gherkin Tools?

First off, can we just pretend for a minute that Gherkin/Cucumber tools like SpecFlow may not be the absolute last word for automating human readable, executable specifications?

By this point, I think most folks associate any kind of acceptance test driven development or truly business facing Behavior Driven Development with the Gherkin approach, and it’s been undeniably successful. Storyteller, on the other hand, was much more influenced by Fitnesse and could accurately be described as a much improved evolution of the old FIT model.

SpecFlow is the obvious comparison for Storyteller and by far the most commonly used tool in the .Net space. The bottom line for me with Storyteller vs. SpecFlow is that I think that Storyteller is far more robust technically in how you can approach the automated testing aspect of the workflow. SpecFlow might do the business/testing to development workflow a little better (but I’d dispute that one too with the release of Storyteller 4.0), but Storyteller has much, much more functionality for instrumenting, troubleshooting, and enforcing performance requirements of your specifications. I strongly believe that Storyteller allows you to tackle much more complex automated testing scenarios than other options.

Here is a more detailed list about how Storyteller differs from SpecFlow:

Storyteller is FOSS. So on one hand, you don’t have to purchase any kind of license to use it, but you’ll be dependent upon the Storyteller community for support.
Instead of parsing human written text and trying to correlate that to the right calls in the code, Storyteller specifications are mostly captured as the input and expected output. Storyteller specifications are then “projected” into human readable HTML displays.
Storyteller is much more table centric than Gherkin with quite a bit of functionality for set-based assertions and test data input.
Storyteller has a much more formal mechanism for governing the lifecycle of your system under test with the specification harness rather than depending on an application being available through other means. I believe that this makes Storyteller much more effective at development time as you cycle through code changes when you work through specifications.
Storyteller does not enforce the “Given/When/Then” verbiage in your specifications and you have much more freedom to construct the specification language to your preferences.
Storyteller has a user interface for editing specifications and executing specifications interactively (all React.js based now). The 4.0 version makes it much easier to edit the specification files directly, but the tool is still helpful for execution and troubleshooting.
We do not yet have direct Visual Studio.Net integration like SpecFlow (and I’m somewhat happy to let them have that one;)), but we will develop a dotnet test adapter for Storyteller when the dust settles on the VS2017/csproj churn.
Storyteller has a lot of functionality for instrumenting your specifications that’s been indispensable for troubleshooting specification failures and even performance problems. The built in performance tracking has consistently been one of our most popular features since it was introduced in 3.0.

Talking Marten on the Cross Cutting Concerns Podcast

When I was at Codemash this year, Matthew Groves was kind enough to let me do a podcast with him on Marten for the Cross Cutting Concerns podcast. Check it out.

Thoughts on Agile Database Development

I’m flying out to our main office next week and one of the big things on my agenda is talking over our practices around databases in our software projects. This blog post is just me getting my thoughts and talking points together beforehand. There are two general themes here: how I’d do things in a perfect world and how to make things better within the constraints of the organization and software architecture that we have now.

I’ve been a big proponent of Agile development processes and practices going back to the early days of Extreme Programming (before Scrum came along and ruined everything in about the way that Scrappy ruined Scooby Doo cartoons for me as a child). If I’m working in an Agile way, I want:

Strong project and testing automation as feedback cycles that run against all changes to the system
Some kind of easy traceability from a built or deployed system to exactly the version of the code and its dependencies, preferably automated through your source control processes
Technologies, tools, and frameworks that provide high reversibility to ease the cost of doing evolutionary software design.

From the get go, relational databases have been one of the biggest challenges in the usage of Agile software practices. They’re laborious to use in automated testing, often expensive in time or money to install or deploy, the change management is a bit harder because you can’t just replace the existing database objects the way we can with other code, and I absolutely think they reduce reversibility in your system architecture compared to other options. That being said, there are some practices and processes I think you should adopt so that your Agile development process doesn’t crash and burn when a relational database is involved.

Keep Business Logic out of the Database, Period.

I’m strongly against having any business logic tightly coupled to the underlying database, but not everyone feels the same way. For one reason, stored procedure languages (tSQL, PL/SQL, etc.) are very limited in their constructs and tooling compared to the languages we use in our application code (basically anything else). Mostly though, I avoid coupling business logic to the database because having to test through the database is almost inevitably more expensive both in developer effort and test run times than it would be otherwise.

Some folks will suggest that you might want to change out your database later, but to be honest, the only time I’ve ever done that in real life is when we moved from RavenDb to Marten where it had little impact on the existing structure of the code.

In practice this means that I try to:

Eschew usage of stored procedures. Yes, I think there are still some valid reasons to use sprocs, but I think that they are a “guilty until proven innocent” choice in almost any scenario
Pull business logic away from the database persistence altogether whenever possible. I think I’ll be going back over some of my old designing for testability blog posts from the Codebetter/ALT.Net days to try to explain to our teams that “wrap the database in an interface and mock it” isn’t always the best solution in every case for testability
Favor persistence tools that invert the control between the business logic and the database over tooling like Active Record that creates a tight coupling to the database. What this means is that instead of having business logic code directly reading and writing to the database, something else (Dapper if we can, EF if we absolutely have to) is responsible for loading and persisting application state back and forth between the domain in code and the underlying database. The point is to be able to completely test your business logic in complete isolation from the database.
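To make that last point a bit more concrete, here is a minimal sketch in Python with the standard library's sqlite3 module. Everything in it (the Invoice type, the discount rule, the InvoiceStore class, the invoices table) is a made-up illustration rather than code from any of our systems. The idea is only that the business rule is a pure function, and a thin persistence layer is the one place that knows about SQL.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical domain type, invented for this example
    id: int
    subtotal: float
    discount_rate: float = 0.0


def apply_volume_discount(invoice: Invoice) -> Invoice:
    """Pure business rule: no database access, so it can be tested in isolation."""
    rate = 0.10 if invoice.subtotal >= 1000 else 0.0
    return Invoice(invoice.id, invoice.subtotal, rate)


class InvoiceStore:
    """Thin persistence layer: the only code here that knows about SQL."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def load(self, invoice_id: int) -> Invoice:
        row = self.conn.execute(
            "SELECT id, subtotal, discount_rate FROM invoices WHERE id = ?",
            (invoice_id,),
        ).fetchone()
        return Invoice(*row)

    def save(self, invoice: Invoice) -> None:
        self.conn.execute(
            "UPDATE invoices SET discount_rate = ? WHERE id = ?",
            (invoice.discount_rate, invoice.id),
        )


def discount_invoice(store: InvoiceStore, invoice_id: int) -> Invoice:
    """Coordinator: load state, run the pure rule, persist the result."""
    invoice = apply_volume_discount(store.load(invoice_id))
    store.save(invoice)
    return invoice
```

With this shape, the discount rule can be exercised with nothing but in-memory objects, and only the small InvoiceStore class needs a real database behind it in tests.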

I would make exceptions for use cases where using the database engine to do set based logic in a stored procedure is a more efficient way to solve the problem, but I haven’t been involved in systems like that for a long time.

Database per Developer/Tester/Environment

My very strong preference and recommendation is to have each developer, tester, and automated testing environment using a completely separate database. The key reason is to isolate each thread of team activity to avoid simultaneous operations or database changes from interfering with each other. Sharing the database makes automated testing much less effective because you often get false negatives or false positives from database activity going on somewhere else at the same time — and yes, this really does happen and I’ve got the scars to prove it.

Additionally, it’s really important for automated testing to be able to tightly control the inputs to a test. While there are some techniques you can use to do this in a shared database (multi-tenancy usage, randomized data), it’s far easier mechanically to just have an isolated database that you can easily control.

Lastly, I really like being able to look through the state of the database after a failed test. That’s certainly possible with a shared database, but it’s much easier in my opinion to look through an isolated database where it’s much more obvious how your code and tests changed the database state.

I should say that I’m concerned here with logical separation between different threads of activity. If you do that with truly separate databases or separate schemas in the same database, it serves the same goal.
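As a rough illustration of that kind of isolation, the sketch below gives every test its own throwaway database. It uses Python's built-in sqlite3 module with in-memory databases purely as a stand-in; the users table and the helper names are assumptions made up for the example, and with a server database you would create a separate database or schema per test run instead.

```python
import sqlite3

# Hypothetical schema for the example
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
"""


def fresh_database() -> sqlite3.Connection:
    """Create a brand new, isolated in-memory database with the schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_adding_a_user():
    conn = fresh_database()
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    assert count_users(conn) == 1


def test_database_starts_empty():
    # Not affected by the insert in the other test: each test controls its own inputs
    conn = fresh_database()
    assert count_users(conn) == 0
```

Because each connection to ":memory:" is its own database, the two tests cannot interfere with each other no matter what order they run in.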

“The” Database vs. Application Persistence

There are two basic development paradigms to how we think about databases as part of a software system:

The database is the system and any other code is just a conduit to get data back and forth from the database and its consumers
The database is merely the state persistence subsystem of the application

I strongly prefer and recommend the 2nd way of looking at that, and act accordingly. That’s admittedly a major shift in thinking from traditional software development or database centric teams.

In practice, this generally means that I very strongly favor the concept of an application database that is only accessed by one application and can be considered to be just part of the application. In this case, I would opt to have all of the database DDL scripts and migrations in the source control repository for the application. This has a lot of benefits for development teams:

It makes it dirt simple to correlate the database schema changes to the rest of the application code because they’re all versioned together
Automated testing within continuous integration builds becomes easier because you know exactly what scripts to apply to the database before running the tests
No need for elaborate cascading builds in your continuous integration setup because it’s just all together
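Here is a small sketch of what "apply the scripts from the repository before running the tests" can look like, again in Python with sqlite3 as a stand-in database. The MIGRATIONS list, the schema_version table, and the migrate function are all hypothetical names for this example; in a real repository the migrations would more likely be numbered script files, and you would probably reach for an existing migration tool rather than rolling your own.

```python
import sqlite3

# Hypothetical migrations, versioned in the same repository as the application code
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list:
    """Apply any migrations not yet recorded, in version order.

    Returns the list of versions applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(MIGRATIONS):
        if version in applied:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied
```

A continuous integration build would call something like migrate against a fresh database before the test run, so the schema under test always matches the commit being built.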

In contrast, a shared database that’s accessed by multiple applications creates a lot more potential friction. The version tracking between the two moving parts is harder to understand and it harms your ability to do effective automated testing. Moreover, it’s wretchedly nasty to allow lots of different applications to float on top of the same database in what I call the “pond scum anti-pattern” because it inevitably causes nasty coupling issues that will almost certainly result in regression bugs due to it being so much harder to understand how changes in the database will ripple out to the applications sharing the database. A much, much younger version of myself walked into a meeting and asked our “operational data store” folks to add a column to a single view and got screamed at for 30 minutes straight on why that was going to be impossible and do you know how much work it’s going to be to test everything that uses that view young man?

Assuming that you absolutely have to continue to use a shared database like my shop does, I’d at least try to ameliorate that by:

Making damn sure that all changes to that shared database schema are captured in source control somewhere so that you have a chance at effective change tracking
Having a continuous integration build for the shared database that runs some level of regression tests and then subsequently cascades to all of the applications that touch that database being automatically updated and tested against the latest version of the shared database. I’m expecting some screaming when I recommend that in the office next week;-)
At the least, have some mechanism for standing up a local copy of the up to date database schema with any necessary baseline data on demand for isolated testing
Some way to know when I’m running or testing the dependent applications exactly what version of the database schema repository I’m currently using. Git submodules? Distribute the DB via Nuget? Finally do something useful with Docker, distribute the DB as a versioned Docker image, and brag about that to any developer we meet?

The key here is that I want automated builds constantly running as feedback mechanisms to know when and what database changes potentially break (or fix too!) one of our applications. Because of some bad experiences in the past, I’m hesitant to use cascading builds between separate repositories, but it’s definitely warranted in this case until we can get the big central database split up.

At the end of the day, I still think that the shared database architecture is a huge anti-pattern that most shops should try to avoid and I’d certainly like to see us start moving away from that model more and more.

Document Databases over Relational Databases

I’ve definitely put my money where my mouth is on this (RavenDb early on, and now Marten). In my mind, evolutionary or incremental software design is much easier with document databases for a couple reasons:

Far fewer changes in the application code result in database schema changes
It’s much less work to keep the application and database in sync because the storage just reflects the application model
Less work in the application code to transform the database storage to structures that are more appropriate for the business logic. I.e., relational databases really aren’t great when your domain model is logically hierarchical rather than flat
It’s a lot less work to tear down and set up known test input states in document databases. With a relational database you frequently end up having to deal with extraneous data you don’t really care about just to satisfy relational integrity concerns. Likewise, tearing down relational database state takes more care and thought than it does with a document database.
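To illustrate the "storage just reflects the application model" idea, here is a deliberately tiny sketch of document-style storage: one table holding a JSON body per document. It is written in Python with sqlite3 and json, and the table and function names are assumptions for the example. This is not how Marten or RavenDb are actually implemented, just the general shape of the approach.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """One generic table: an id plus the serialized document body."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def store(conn: sqlite3.Connection, doc_id: str, document: dict) -> None:
    """Insert or overwrite the whole document in a single row."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(document)),
    )


def load(conn: sqlite3.Connection, doc_id: str):
    """Return the document as a dict, or None if it does not exist."""
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def clear(conn: sqlite3.Connection) -> None:
    """Test teardown is one statement; there are no relational constraints to satisfy."""
    conn.execute("DELETE FROM documents")
```

A hierarchical order with nested line items goes in and comes back out as a single unit, and adding a new field to the document later needs no schema change in this sketch.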

I would still opt to use a relational database for reporting or if there’s a lot of set based logic in your application. For simpler CRUD applications, I think you’re fine with just about any model and I don’t object to relational databases in those cases either.

It sounds trivial, but it does help tremendously if your relational database tables are configured to use cascading deletes when you’re trying to set a database into a known state for tests.
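As a quick sketch of why that helps, the example below uses SQLite through Python's sqlite3 module, with a hypothetical orders/order_lines schema invented for the illustration. With ON DELETE CASCADE on the child table, resetting test state is a single delete against the parent table. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other database engines have their own syntax and defaults.

```python
import sqlite3


def build_database() -> sqlite3.Connection:
    """Create a small parent/child schema with a cascading foreign key and some rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE order_lines (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE
        );
        INSERT INTO orders (id) VALUES (1), (2);
        INSERT INTO order_lines (order_id) VALUES (1), (1), (2);
        """
    )
    return conn


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    # The table name comes from trusted test code only, never from user input
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def reset_state(conn: sqlite3.Connection) -> None:
    """Deleting the parent rows removes the dependent child rows as well."""
    conn.execute("DELETE FROM orders")
    conn.commit()
```

Without the cascade, the same reset would have to delete from order_lines first and keep track of the dependency order between every pair of related tables.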

Team Organization

My strong preference is to have a completely self-contained team that has the ability and authority to make any and all changes to their application database, and that’s most definitely been valid in my experience. Having the database managed and owned separately from the development team is a frequent source of friction and definitely a major hit to your reversibility that forces you to do more potentially wrong, upfront design work. It’s much worse when that separate team does not share your priorities or simply works on a very different release schedule. I think it’s far better for a team to own their database or, at the very worst, have someone who is allowed to touch the database in the team room and team standups.

If I had full control over an organization, I would not have a separate database team. Keeping developers and database folks on separate teams makes your team have to spend more time on inter-team coordination, takes away from the team’s flexibility in deciding what they can deliver, and almost inevitably causes a bottleneck constraint for projects. Even worse in my mind is when neither the developers nor the database team really understand how their work impacts the other team.

Even if we say that we have a matrix organization, I want the project teams to have primacy over functional teams. To go farther, I’d opt to make functional teams (developers, testers, DBA’s) be virtual teams solely for the purpose of skill acquisition, knowledge sharing, and career growth. My early work experience was being an engineer within large petrochemical project teams, and the project team dominant matrix organization worked a helluva lot better than it did at my next job in enterprise IT that focused more on functional teams.

As an architect now rather than a front line programmer, I constantly worry about not being able to feel the “pain” that my decisions and shared libraries cause developers because that pain is an important feedback mechanism to improve the usability of our shared infrastructure or application architecture. Likewise, I worry that having a separate database team creates a situation where they’re not very aware of the impact of their decisions on developers or vice versa. One of the very important lessons I was taught as an engineer was that it was very important to understand how other engineering disciplines work and what they needed so that we could work better with them.

Now though, I do work in a shop that has historically centralized the control of the database in a separate database team. To mitigate the problems that naturally arise from this organizational model, we’re trying to have much more bilateral conversations with that team. If we can get away with this, I’d really like to see members of that team spend more time in the project team rooms. I’d also love it if we could steal a page from my original engineering job (Bechtel) and suggest some temporary rotations between the database and developer teams to better appreciate how the other half of that relationship works and what their needs are.

Marten 1.3 is Out: Bugfixes, Usability Improvements, and a lot less Memory Usage

I just uploaded Marten 1.3.0 to Nuget (but note that Nuget has had issues today with the index updating being delayed). This release is mostly bugfixes, but there’s some new functionality, and significant improvements to performance on document updates and bulk inserts. You can see the entire list of changes here with some highlights below.

I’d like to thank Marten contributors Eric Green, James Hopper, Michał Gajek, Barry Hagan, and Babu Annamalai for their contributions in this release. A special thanks goes out to Szymon Kulec for all his efforts in both Marten and Npgsql to reduce Marten’s memory allocations.

Thanks to Phillip Haydon, there’s a slew of new documentation on our website about Postgresql for Sql Server folks.

What’s New?

It wasn’t a huge release for new features, but these were added:

New “AsPagedList()” helper for fetching documents by page
Query for deleted, not deleted, or all documents marked as “soft deleted”
Indexes on Marten’s metadata columns
Querying by the document metadata

What’s Next?

The next release is going to be Marten 2.0 because we need to make a handful of breaking API changes (don’t worry, it’s very unlikely that most users would hit this). The big ticket item is a lot more work to reduce memory allocations throughout Marten. The other, not-in-the-slightest-bit-sexy change is to standardize and streamline Marten’s facilities for database change tracking with the hope that this work will make it far easier to start adding new features again.

The Different Meanings of “I take pull requests”

Years ago when I was in college and staying at my grandparents’ farm, my uncle rousted me up well after midnight because he could see headlights in our pasture. We went to check it out to make sure no one was trying to steal cattle (it’s very rare, but does happen) and found one of my grandparents’ neighbors completely stuck in a fence row and drunkenly trying to get himself out. I don’t remember the exact “conversation,” but his vocabulary was pretty well a single four letter expletive used as noun, verb, adjective, and adverb and the encounter went pretty quickly from potentially scary to comical.

Likewise, when OSS maintainers deploy the phrase “I take pull requests,” they mean a slew of very different things depending on the scenario or other party.

In order of positive to negative, here are the real meanings behind that phrase if you hear it from me:

I think that would be a useful idea to implement and perfectly suitable for a newcomer to the codebase. Go for it.
I like that idea, but I don’t have the bandwidth to do that right now, would you be willing to take that on?
I don’t think that idea is valuable and I wouldn’t do it if it were just me, but if you don’t mind doing that, I’ll take it in.
You’re being way too demanding, and I’m losing my patience with you. Since you’re clearly a jerk, I’m expecting this to make you go away if you have to do anything for yourself.