I will be joining a very impressive group of speakers at this year’s MonkeySpace conference in Chicago to give a couple talks related to FubuMVC:
Exploring the FubuMVC and Bottles Ecosystem
I think that the combination of FubuMVC with Bottles represents the very best modularity solution in all of .Net and that it’s competitive with anything else out there. In this talk I’m going to try to back up that claim with a quick demonstration of rapidly building out your application infrastructure with the existing ecosystem of drop-in FubuMVC plugins (bottles). I’ll pull back the curtains and talk about the architectural decisions that made all the modularity possible and what we learned along the way.
Dependency Management in .Net OSS Development
The Fubu project ecosystem is big and growing. For the past couple of years we’ve used a combination of Nuget and TeamCity to quickly push build products and dependencies from upstream projects to downstream consumers. We ran into a lot of technical problems and limitations with just about everything we’ve ever tried to do. In this talk I’ll show you the new ripple tool (ripple is sort of to Nuget as Bundler is to gems) we’ve built and adopted to smooth out consuming and publishing Nugets across the 60-odd fubu-related repositories. I’ll also show some concrete examples of how standardization has smoothed out the process.
For myself, I’m looking forward to Sebastien’s ReST talk, seeing what’s going on with OWIN, and making sure that every poor Microsoft attendee who crosses my path knows exactly how much pain strong naming + Nuget causes us.
I’ll be looking forward to meeting new people at MonkeySpace and catching up with friends I haven’t seen in quite a while (and getting out of the Texas summer heat for a couple days).
See you there.
TL;DR: FubuMVC and its related projects are finally getting some documentation, and the FubuMVC.CodeSnippets library is a big part of how we’re trying to make the docs easier to write and maintain.
Once upon a time there was a man who worked on an open source tool named “StructureMap.” This man spent an inordinate amount of time on his 2.5 release, crafting a comprehensive set of documentation in static html as part of that release. Upon making the long awaited release, some unpleasant things happened:
- Many people didn’t like or just couldn’t derive any value out of the documentation website because of the way it was organized
- This man quickly realized from his own usage that many of the new APIs were awkward to use and he immediately added alternative APIs to make StructureMap 2.5+ usable
- It wasn’t easy to edit the big pile of html and copy/pasted code samples, making the effort even more painful — so the docs and the actual API wildly diverged and didn’t help the poor man handle user questions
Since I’d strongly prefer not to be that guy ever again, we’re putting some effort toward using “living documentation” techniques for FubuMVC, Storyteller, and StructureMap 3 to make it as easy as possible to keep the documentation in synch with the various frameworks as they evolve. As part of that goal, we’re using the FubuMVC.CodeSnippets (check out the link, there’s real documentation developed with FubuDocs) library to “slurp” sample code snippets right out of the live code during the automated builds. This way we can simply reuse unit test code and bits of example code running in the CI. If the real code changes, the sample code and the unit tests would have to change too or the CI build breaks.
In a nutshell, the idea behind FubuMVC.CodeSnippets is to just add some comments into your code marking the boundaries of a named “snippet” like so:
// SAMPLE: snippet-name
// C# code in the middle.
// ENDSAMPLE
In a FubuMVC view, you can just say “I want to display snippet named ‘snippet-name’” and the view helper for code snippets will add the raw code in a <pre> tag and use prettyprint to color code the code.
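To give a feel for the “slurp” step, here is a rough sketch of scanning source text for marked snippets and rendering one into a `<pre>` tag. It is written in Python purely for illustration and is not the FubuMVC.CodeSnippets implementation. The `// ENDSAMPLE` closing marker, the function names, and the `prettyprint` CSS class are all assumptions made for this example.

```python
import html
import re

# Assumed marker syntax: "// SAMPLE: name" opens a snippet and
# "// ENDSAMPLE" closes it. The closing marker is an assumption.
START = re.compile(r"^\s*//\s*SAMPLE:\s*(?P<name>\S+)\s*$")
END = re.compile(r"^\s*//\s*ENDSAMPLE\s*$")


def extract_snippets(source):
    """Return {snippet name: code text} for every marked region in source."""
    snippets = {}
    current_name = None
    current_lines = []
    for line in source.splitlines():
        start = START.match(line)
        if start:
            # Begin collecting lines for a newly named snippet.
            current_name = start.group("name")
            current_lines = []
        elif END.match(line) and current_name is not None:
            # Close the snippet and store the collected lines.
            snippets[current_name] = "\n".join(current_lines)
            current_name = None
        elif current_name is not None:
            current_lines.append(line)
    return snippets


def render_snippet(snippets, name):
    """Wrap a snippet in a <pre> tag, escaping HTML-sensitive characters."""
    return '<pre class="prettyprint">' + html.escape(snippets[name]) + "</pre>"
```

Because the snippets are read from the real source tree at build time, renaming or deleting the code inside a marked region changes the rendered docs on the next build instead of leaving a stale copy behind.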
I’m mostly finished with a fairly complicated project that used RavenDb, and all is not quite well. All too frequently in the past month I’ve had to answer the question “was it a mistake to use RavenDb?” and the even more ego-bruising (for me) “should we scrap RavenDb and rebuild this on a different architecture?” Long story short, we made it work and I think we’ve got an architecture that can allow us to scale later, but the past month was miserable, and RavenDb and our usage of it were the main culprits.
Our system is a problem resolution system for an automated data exchange between our company and our clients. The data exchange has long suffered from data quality issues, so we were tasked with building an online system to ameliorate the current manual-heavy process for resolving the data issues. We communicate with the upstream system by receiving and sending flat files dropped into a folder (boo!). The files can be very large, and the shape of the data is conceptually different from how our application displays and processes events in our system. As part of processing the data we receive, we have to do a fuzzy comparison to the existing data for each logical document because we don’t have any correlation identifier from the upstream system (this was obviously a severe flaw in the process, but I don’t have much control over this issue). The challenge for us with RavenDb was that we would have to process large bursts of data that involved both heavy reads and writes.
On the read side to support the web UI, the data was very hierarchical and using a document database was a huge advantage in my opinion.
First, some Good Stuff
- RavenDb has to be the easiest persistence strategy in all of software development to get up and running on day one. Granted that you’ll have to change settings for production later, but you can spin up a new project using RavenDb as an embedded database and start writing an application with persistence in nothing flat. I’ve told some of my ex-.Net/now Rails friends that I think I can spin up a FubuMVC app that uses RavenDb for persistence faster than they can with Rails and ActiveRecord. The combination of a document database and static typed document classes is dramatically lower friction in my opinion than using static typed domain entities with NHibernate or EF as well.
- I love, love, love being able to dump and rebuild a clean database from scratch in automated testing scenarios
- I’m still very high on document databases, especially in the read side of an application. RavenDb might have fallen down for us in terms of writes, but there were several places where storing a hierarchical document is just so much easier than dealing with relational database joins across multiple tables
- No DB migrations necessary
- Being able to drop down to Lucene queries helped us considerably in the UI
- I like the paging support in RavenDb
- RavenDb’s ability to batch up reads was a big advantage when we were optimizing our application. I really like the lazy request feature and the IDocumentSession.Load(array of ids) functions.
We had several memory usage problems that we ultimately attributed to RavenDb and its out of the box settings. In the first case, we had to turn off all of the 2nd level caching because it never seemed to release objects, or at least not before our application fell over from OutOfMemoryExceptions. In our case, the 2nd level cache would not have provided much value anyway except for a handful of little entities, so we just turned it off across the board. I think I would recommend that you only use caching with a whitelist of documents.
Also be aware that the implementations of IDocumentSession seem to be very much optimized for short transactions with limited activity at any one time. Unfortunately we were almost a batch driven system and our logical transactions became quite large and potentially involved a lot of reads against contextual information. After examining our application with a memory profiler, we determined that IDocumentSession was hanging on to the data we only read. We solved that issue by explicitly calling Evict() to remove objects from an IDocumentSession’s cache.
Don’t Abstract RavenDb Too Much
To be blunt, I really don’t agree with many of Ayende’s opinions about software development, but in regard to abstractions for RavenDb you have to play by his rules. We have a fubu project named FubuPersistence that adds common persistence capabilities like multi-tenancy and soft deletes on top of RavenDb in an easy to use way. That’s great and all, but we had to throw a lot of that goodness away because you so frequently have to get down to the metal with RavenDb to either tighten up performance or avoid stale data. We were able to happily spin up a database on the fly for testing scenarios, so you might look to do that more often than trying to swap out RavenDb for mocks, stubs, or 100% in memory repositories. Those tests are still slower than what you’d get with mocks or stubs, but you don’t have any choice when you start having to muck with RavenDb’s low level APIs.
I think RavenDb is weak in terms of dealing with large batches of updates or inserts. We tried using the BulkInsert functionality, and while it was a definite improvement in performance, we found it to be buggy and probably just immature (it is a recent feature). We first hit problems with map/reduce operations not finishing after processing a batch. We updated to a later version of RavenDb (2330), then had to retreat back to our original version (2230) with problems using Windows authentication in combination with the BulkInsert feature. We saw the same issues with the edge version of RavenDb as well. We also noticed that BulkInsert did not seem to honor the batch size settings and had several QA bugs under load because of this. We eventually solved the BulkInsert problems by sending batches of 200 documents for processing through our service bus and putting retry semantics around the BulkInsert to get around occasional hiccups.
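The shape of that workaround (fixed-size batches plus retries around each send) can be sketched roughly like this. It is a Python illustration under stated assumptions, not our actual code: the `send` callable is a hypothetical stand-in for whatever hands a batch to the bulk insert, the retry count and delay are made up, and the real system dispatched the batches through a service bus rather than a simple loop.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def send_with_retry(batch, send, attempts=3, delay_seconds=0.0):
    """Call send(batch), retrying on any exception up to `attempts` times.

    Returns the attempt number that succeeded; re-raises the last error
    if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(batch)
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def bulk_store(documents, send, batch_size=200):
    """Push documents in batches of `batch_size`, retrying each batch."""
    for batch in chunked(documents, batch_size):
        send_with_retry(batch, send)
```

Retrying a whole batch is only safe if storing the same document twice is harmless, so a sketch like this assumes the writes are idempotent (for example, keyed by document id).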
The Eventual Consistency Thing
If you’re not familiar with Eventual Consistency and its implications, you shouldn’t even dream of putting a system based on RavenDb into production. The key with RavenDb is that query/command separation is pretty well built in. Writes are transactional, and reads by the document id will always give you the latest information, but other queries execute against indexes that are built in background threads as a result of writes. What this means to you is a chance of receiving stale results from queries against anything but a document id. There’s a real rationale behind this decision, but it’s still a major complication in your life with RavenDb.
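To make the stale-read behavior concrete, here is a toy, in-memory model written in Python. It is emphatically not how RavenDb is implemented and uses none of its API. The class, its methods, and the `wait_for_non_stale` flag are all hypothetical names invented for this sketch, and the model only handles inserts.

```python
class ToyDocumentStore:
    """Toy model: writes land immediately, but the query index lags behind."""

    def __init__(self):
        self._documents = {}   # id -> document, always current
        self._index = {}       # last_name -> set of ids, updated later
        self._pending = []     # ids written but not yet indexed

    def store(self, doc_id, document):
        """Write the document now; indexing happens some time later."""
        self._documents[doc_id] = document
        self._pending.append(doc_id)

    def load(self, doc_id):
        """Load by id reads the document itself, so it is never stale."""
        return self._documents.get(doc_id)

    def run_indexing(self):
        """Stand-in for the background indexing work catching up."""
        for doc_id in self._pending:
            key = self._documents[doc_id]["last_name"]
            self._index.setdefault(key, set()).add(doc_id)
        self._pending.clear()

    def is_stale(self):
        return bool(self._pending)

    def query_by_last_name(self, last_name, wait_for_non_stale=False):
        """Query the index, optionally letting it catch up first."""
        if wait_for_non_stale and self.is_stale():
            self.run_indexing()
        return sorted(self._index.get(last_name, set()))
```

In this model a document stored a moment ago can be loaded by id right away, yet a query by last name returns nothing until the index catches up, which is the trap we kept falling into with “natural key” lookups.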
With our lack of correlation identifiers from upstream, we were forced to issue a lot of queries against “natural key” data and we frequently ran into trouble with stale indexes in certain circumstances. Depending on circumstances, we fixed or prevented these issues by:
- Introducing a static index instead of relying on dynamic indexes. I think I’d push you to try to use a static index wherever possible.
- Judiciously using the WaitForNonStaleResults* family of methods. Be careful with this one though, because it can have negative repercussions as well
- In a few cases we introduced an in-memory cache for certain documents. You *might* be able to utilize the 2nd level cache instead
- In another case or two, we switched from using surrogate keys to using natural keys because you always get the latest results when loading by the document id. User and login documents are the examples of this that I remember offhand.
The stale index problem is far more common in automated testing scenarios, so don’t panic when it happens.
I’m still very high on RavenDb’s future potential, but there’s a significant learning curve you need to be aware of. The most important thing to know about RavenDb in my opinion is that you can’t just use it, you’re going to have to spend some energy and time learning how it works and what some of the knobs and levers are because it doesn’t just work. On one hand, RavenDb has several features and capabilities that an RDBMS doesn’t and you’ll want to exploit those abilities. On the other hand, I do not believe that you can get away with using RavenDb with all of its default settings on a project with larger data sets.
Honestly, I think the single biggest problem on this project was not doing the heavy load testing earlier instead of at the last moment, but everybody involved with the project has already hung their heads in shame over that one and vowed to never do that again. Doing something challenging and doing something challenging right up against a deadline are two very different things. In my opinion, while we did struggle with RavenDb, we would have had at least some struggle optimizing performance if we’d built on an RDBMS, and the user interface would have been much more challenging.
Knowing what I know now, I think it’s 50/50 that I would use RavenDb for a similar project again. If they get their story fixed for bigger transactions though, I’m all in.
I can finally claim some very substantial progress on StructureMap 3.0 today. For a background on the goals and big changes for the 3.0 release, see Kicking off StructureMap 3 from last year and some additions from last month when I started again. As of today, StructureMap 3.0 development is in the master branch in GitHub. If you need to get at StructureMap 2.6 level code, use the TwoSix branch.
What’s been done?
- I removed the strong naming.
- All the old [Obsolete] API methods have been removed
- The registration API has been greatly streamlined and there’s much more consistency internally now
- The nested container implementation has been completely redone. It’s much simpler, should be much faster because it’s doing much less on setup, and the old lifecycle confusion between the parent and nested containers has been fixed.
- The “Profile” functionality has been completely redesigned and rebuilt. It’s also much more capable now than it was before.
- The container spinup time *should* be much better because there’s so much less going on and a lot more decision making is done in a lazy way with memoization along the way. Lazy<T> FTW!
- There’s much more runtime “figure out what I could do” type possibilities now
- You can apply lifecycle scoping Instance by Instance instead of only at the PluginType level. That’s been a big gripe for years.
- The Xml configuration has been heavily streamlined
- The old [PluginFamily] / [Pluggable] attributes have been completely ripped out
- Internally, the old PipelineGraph, InstanceFactory, ProfileManager architecture is all gone. The new PipelineGraph implementations just wrap one or more PluginGraph objects, so there’s vastly less data structure shuffling going on internally.
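For anyone unfamiliar with the “lazy with memoization” idea from the list above, here is the general pattern in a few lines of Python. This is only an illustration of the technique. `ToyPluginFamily`, `build_plan`, and the type names are hypothetical and do not correspond to StructureMap’s internals, and Python’s `cached_property` is standing in for .Net’s Lazy<T>.

```python
from functools import cached_property


class ToyPluginFamily:
    """Hypothetical stand-in for per-type container configuration."""

    build_count = 0  # counts how often the expensive step actually runs

    def __init__(self, plugin_type):
        self.plugin_type = plugin_type

    @cached_property
    def build_plan(self):
        # The expensive analysis is deferred until first use, then the
        # result is stored on the instance and reused afterwards.
        ToyPluginFamily.build_count += 1
        return f"plan for {self.plugin_type}"
```

Creating many families costs almost nothing up front. The expensive work only happens for the types that are actually requested, and only once per family.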
What’s left to do?
I’ve transcribed my own notes about outstanding work (minus the documentation) to the GitHub issues page. There are a few items that are going to need some serious forethought, but I think the biggest architectural changes are already done and that list is starting to be more of a punchlist. I would dearly love any kind of help, design input, additions, or feedback on the outstanding work. If you’re inclined to get involved and tackle some of the issues, I tried to label the issues for the effort level.
If you think of the issues as picking a sword fight, the tags line up like this:
- “Easy Fix” – Facing a sheepherder who probably stole that heron mark blade he’s carrying
- “Medium Effort” – Fighting a Trolloc
- “Architectural Level Change” – Fade. I will likely need to be involved with any of these
Fairly soon, I’ll be making a call for folks to try out a prerelease version of StructureMap 3 in their existing applications. As part of that effort, I’d really like to get some feedback about the observed performance and see if we can beat on it enough to find any memory leak issues.
If you or someone you know is a multi-threading guru, I’d probably be interested in talking through some things with you in the codebase.
Docs? Someday? Maybe?
Hopefully someday soon. The FubuMVC core team will be relaunching a completely new website sometime in the next couple years with our own implementation of a readthedocs style infrastructure. I’m planning on making the new StructureMap documentation part of that website. Documentation will be in git where it’ll be easy to take in pull requests for additions and corrections, and you’ll be able to use either Html or Markdown for the content. We’ve already got a working mechanism to “slurp” code samples live out of a source code tree and put them into the web pages with formatting via pretty print to achieve “living” documentation this time around.
I haven’t paid attention to any of the “IoC Container Performance Shootout!” type blog posts in a long time, but StructureMap used to routinely come in well ahead of the other full-featured IoC containers (tools like Funq shouldn’t be considered apples to apples with StructureMap/Windsor/Ninject/Autofac/whatever. If you don’t support auto-wiring, rich lifecycle support, and maybe even interception, I say you don’t count as full-featured) in terms of performance. However, as I’ve torn into the StructureMap codebase with an eye towards better performance for the first time in years, I’ve found a scary amount of performance killing cruft code. My final thought is that as bad as the StructureMap code was (and trust me, it was), if it’s really faster than the other IoC containers, then what does that say about their code internals at that time? ;-)
Just trying to round up more feedback as I go, here’s a handful of discussions I’ve started on the big proposed changes for StructureMap 3:
- Eliminating the old [PluginFamily] / [Pluggable] attributes
- Redoing Profiles
- Streamlining Xml support
Please feel free to chime in here, twitter, or the list on any of these topics or any other thing you want for StructureMap 3.
My shop is starting to go down the path of executable specifications (using Storyteller2 as the tooling, but that’s not what this post is about). As an engineering practice, executable specifications* involves specifying the expected behavior of a user story with concrete examples of exactly how the system should behave before coding. Those examples will hopefully become automated tests that live on as regression tests.
What are we hoping to achieve?
- Remove ambiguity from the requirements with concrete examples. Ambiguity and misunderstandings from prose based requirements and analysis have consistently been a huge time waste and source of errors throughout my career.
- Faster feedback in development. It’s awfully nice to just run the executable specs in a local branch before pushing anything to the testers
- Find flaws in domain logic or screen behavior faster, and this has been the biggest gain for us so far
- Creating living documentation about the expected behavior of the system by making the specifications human readable
- Building up a suite of regression tests to make later development in the system more efficient and safer
While executable specifications are certainly a very challenging practice from the technical side of things, in the past week or so I’m aware of 3-4 scenarios where the act of writing the specification tests has flushed out problems with our domain logic or screen behavior a lot faster than we could have done otherwise.
Part of our application logic involves fuzzy matching of people in our system against some, ahem, not quite trustworthy data from external partners. Our domain expert explained that the matching logic he wanted was to match on a person’s social security number, birth date, first name, and last name, with the caveats that name matching should be case insensitive and that it’s valid to match on the initial of the first name. Since this logic can be expressed as a fixed set of inputs and one output with a great number of permutations, I chose to express this specification as a table with Storyteller (conceptually identical to the old ColumnFixture in FitNesse). The final version of the spec is shown below:
The image above is our final, approved version of this functionality that now lives as both documentation and a regression test. Before that though, I wrote the spec and got our domain expert to look at it, and wouldn’t you know it, I had misunderstood a couple assumptions and he gave me very concrete feedback about exactly what the spec should have been.
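For readers who can’t see the table, the rule as I described it in prose can be approximated by the sketch below, with a small table of made-up examples in the same spirit. It is written in Python for illustration only. It is neither our production matcher nor the Storyteller fixture, and the handling of initials (a single letter, with or without a trailing period, matches any first name starting with that letter) is my assumption about one reasonable reading of the rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    ssn: str
    birth_date: date
    first_name: str
    last_name: str


def _first_names_match(a, b):
    """Case-insensitive first name comparison that tolerates initials."""
    a = a.strip().rstrip(".").lower()
    b = b.strip().rstrip(".").lower()
    if not a or not b:
        return False
    # Assumption: an initial matches any first name starting with that letter.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def is_match(ours, theirs):
    """True when SSN and birth date are equal and both names match."""
    return (
        ours.ssn == theirs.ssn
        and ours.birth_date == theirs.birth_date
        and ours.last_name.strip().lower() == theirs.last_name.strip().lower()
        and _first_names_match(ours.first_name, theirs.first_name)
    )


# Made-up data: one person in "our" system and a table of incoming
# records paired with the expected match result.
BASELINE = Person("000-00-0001", date(1970, 1, 1), "Alice", "Smith")
EXAMPLES = [
    (Person("000-00-0001", date(1970, 1, 1), "Alice", "Smith"), True),
    (Person("000-00-0001", date(1970, 1, 1), "ALICE", "smith"), True),
    (Person("000-00-0001", date(1970, 1, 1), "A", "Smith"), True),
    (Person("000-00-0001", date(1970, 1, 1), "Bob", "Smith"), False),
    (Person("000-00-0001", date(1971, 1, 1), "Alice", "Smith"), False),
    (Person("000-00-0002", date(1970, 1, 1), "Alice", "Smith"), False),
]


def run_examples():
    """Return one pass/fail flag per row of the example table."""
    return [is_match(BASELINE, incoming) == expected
            for incoming, expected in EXAMPLES]
```

The value of the table form is that each row is a question a domain expert can answer with a yes or a no, which is exactly how my misunderstandings got caught.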
To make this just a little bit more concrete, our Storyteller test harness connects the table inputs to the system under test with this little bit of adapter code:
* Jeremy, is this really just Behavior Driven Development (BDD)? Or the older idea of Acceptance Test Driven Development (ATDD)? This is some folks’ definition of BDD, but BDD is so overloaded and means so many different things to different people that I hate using the term. ATDD never took off, and “executable specifications” just sounds cooler to me, so that’s what I’m going to call it.
I’ll be honest, I haven’t worked much on StructureMap since I shelved my original 3.0/rewrite work in the summer of 2010, and yes, the documentation is almost worthless. Now that FubuMVC has reached that magic 1.0 mark, I’m turning my attention back to StructureMap for a bit, but I think I want some feedback about what I’m thinking right now.
For background, read:
- Kicking off StructureMap 3.0 – I just re-read this, and I’m still thinking all of the same things here and all the feedback is still valid.
- Proposed StructureMap 2.7 Release
A month ago my plan was to do a small 2.7 release on the existing codebase to remove all the [Obsolete] API calls and grab some pull requests along the way. Having done that, I would then turn my attention back to the 3.0 codebase where I planned to essentially rewrite the core of StructureMap and retrofit the existing API on top of the new, cleaner core. A week or so into the work for the 2.7 release and I’ve changed my mind. First off, by the rules of semantic versioning, I should bump the major version to 3.0.0 when I make the breaking API changes. Secondly, I’m coming around to the idea of restructuring the existing code in place instead of a full rewrite.
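The semantic versioning rule I’m leaning on is simple enough to state as code. This is a minimal sketch in Python, where the function name and its flags are invented for the example and the starting version number is only illustrative.

```python
def next_version(current, breaking, new_features=False):
    """Pick the next MAJOR.MINOR.PATCH version under semantic versioning."""
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking:
        # Any incompatible API change forces a new major version.
        return f"{major + 1}.0.0"
    if new_features:
        # Backward-compatible additions bump the minor version.
        return f"{major}.{minor + 1}.0"
    # Otherwise it is a bug-fix release.
    return f"{major}.{minor}.{patch + 1}"
```

Removing the [Obsolete] methods is a breaking change, so something like `next_version("2.6.0", breaking=True)` lands on 3.0.0 rather than 2.7.0.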
To reiterate the major points, the 3.0 release means:
- All [Obsolete] API calls are going away
- Removing the strong naming — if you absolutely *have* to have this, maybe we can make separate nuget packages. I suggest we name that “structuremap.masochistic.”
- Move to .Net 4.0. I don’t think it’s time to go to 4.5 yet and I don’t really want to mess with that anyway
- Taking a dependency on FubuCore (if that causes pushback, we’ll ILMerge it)
- Streamlining the Xml support
- Rewrite the “Profile” feature completely
- Make nested containers not be a crime against computer science
- NOT adding every random brainfart “feature” that Windsor has
- Make it faster
- Make the diagnostics much better
- Removing some obscure, clumsy features I never use and really wish you wouldn’t either
Additionally, we have a new “living documentation” infrastructure baking for the Fubu projects. I know some work already happened to transfer the StructureMap docs to Jekyll, but I’d far prefer to publish on the new fubu world website whenever that happens.
For right now, the 3.0 branch is in the original StructureMap repository at https://github.com/structuremap/structuremap/tree/three.