I’ve been lucky over the past year and change to work on some interesting projects that used newer technologies and architectural concepts like Command Query Responsibility Segregation (CQRS), Event Sourcing, Eventual Consistency, and RavenDb as a document database. I cannot speak to the scalability benefits of these tools because that’s just not an area where I have expertise. Instead, I’m interested in how these tools have reduced coding ceremony, improved testability, and allowed my very small teams to effectively do continuous design by giving us much more architectural reversibility. I ran out of time and energy on this post, but I’ll follow up next week with more on event sourcing, what I like about RavenDb, and how we’ve used all of this in our projects.
Continuous Design is better with a Document Database
I gave a talk earlier this month at Agile Vancouver called “Architectural Reversibility,” largely about how we can create better designs if we are able to design incrementally throughout the lifetime of a project instead of having to do it all upfront. My point of view on this topic is that we’re far more likely to succeed if we’re able to recover from the inevitable errors in architecture, design, or requirements. Better yet, we can delay commitment to elements of our technical architecture until we know more later on in the project. Furthermore, I said that you should be cognizant of this when selecting technologies. One of my slides showed a progression of data access/persistence technologies from my own development career that went something like this:
1. Stored procedures (sproc) for every single bit of data access
2. Object Relational Mapper & Relational Database
3. Document Database
Let’s say that I need to add a property to an entity in my existing system. Using the same numbering scheme as above, I would have to:
1. Change the DDL defining the proper table. Update every sproc that returns that field and any that might need to search on that field. Go update all the places in the code that use the data returned by that table.
2. Change the DDL defining the proper table or write a data migration. Change the relevant class in the code (even with Ruby ActiveRecord you may still touch the class to add validation rules). Change the ORM mapping to add this field and verify the persistence of the new field all the way to the database.
3. Add a new property to the proper class and make sure that it serializes.
Adding or changing the shape of the data in the 90’s style stored procedure model was tedious. Back then you had to try much harder to get things right on the first try. Using an ORM was much better, especially if you used conventions to drive the ORM mapping or even to generate the database schema from your classes. With a document database, however, you just serialize objects to a JSON structure, and there is no schema forcing you to effectively do double data entry for the database and the object model. That’s the best possible solution for really being able to do continuous design, because there’s very little friction in changing your object model (at least before you deploy for the first time, anyway).
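To make the third option concrete, here is a minimal sketch in Python of what “add a property and make sure it serializes” can look like. The `Customer` class, the `loyalty_tier` property, and the two helper functions are hypothetical names invented for illustration, and plain `json` stands in for a real document database; this is not RavenDb’s API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Customer:
    name: str
    email: str
    # The newly added property. Giving it a default means documents
    # saved before the change still deserialize without a migration.
    loyalty_tier: str = "standard"


def to_document(entity) -> str:
    """Serialize an entity to the JSON text a document store would hold."""
    return json.dumps(asdict(entity))


def from_document(cls, raw: str):
    """Rebuild an entity from stored JSON text."""
    return cls(**json.loads(raw))


# A document written before loyalty_tier existed still loads.
old_doc = '{"name": "Ada", "email": "ada@example.com"}'
loaded = from_document(Customer, old_doc)

# A new entity round-trips with the new property intact.
round_tripped = from_document(
    Customer, to_document(Customer("Bob", "bob@example.com", "gold"))
)
```

The only place the new property is declared is the class itself, which is the whole point: there is no second definition in DDL or a mapping file to keep in sync.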
To summarize, document databases absolutely rock for architectural reversibility and that’s a very, very big deal.
Automated testing
In my strong opinion, doing automated, end to end testing using the database is vastly easier and more effective with a document database than with a relational database. I feel that this advantage is enough by itself to justify the usage of a document database. Why do I think that? Well first, let’s review the two mandatory parts of any repeatable automated test:
- Known inputs
- Expected outcomes
In order to be really successful with automated testing, I think you need to achieve a few things:
- The tests have to run fast enough to provide timely feedback.
- It has to be mechanically cheap for a test author to put the system into the initial state.
- You cannot allow state to bleed between tests because that makes them unreliable.
- And a Jeremy special: data input for automated tests should be isolated by test, i.e. no shared test data!
Referential integrity has repeatedly been a huge source of friction in test automation. I have frequently found myself adding junk data to a database for automated tests, data that was not remotely germane to the meaning of the test, just to get the database constraints to shut up. Folks, that’s friction that you just won’t have with a document database.
Soon after adopting RavenDb we picked up the trick of using Raven’s in-memory storage for testing and completely scrapping the full database between tests, virtually guaranteeing that our tests are isolated from each other. You can certainly do something like this with relational databases, but in my experience it is much more work and far slower no matter how you do things. Being able to very quickly drop and rebuild a clean database in code is a killer feature for automated testing.
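The shape of that trick can be sketched roughly as follows. The `InMemoryDocumentStore` class here is a hypothetical stand-in written for this example, not RavenDb’s in-memory storage, and the test names are made up; the point is only that each test builds a brand new, empty store and sets up just the data it cares about.

```python
import unittest


class InMemoryDocumentStore:
    """Hypothetical stand-in for an in-memory document store."""

    def __init__(self):
        self._docs = {}

    def store(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)

    def load(self, doc_id):
        return self._docs.get(doc_id)

    def count(self):
        return len(self._docs)


class OrderTests(unittest.TestCase):
    def setUp(self):
        # A fresh store per test: nothing can bleed over from another test.
        self.store = InMemoryDocumentStore()

    def test_stores_an_order_without_related_rows(self):
        # No customer or product rows are needed to satisfy constraints;
        # the test sets up only the data that matters to it.
        self.store.store("orders/1", {"total": 42})
        self.assertEqual(self.store.load("orders/1"), {"total": 42})

    def test_starts_empty(self):
        # Holds regardless of which test ran first.
        self.assertEqual(self.store.count(), 0)


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs the two tests in isolation from each other; with a real store the `setUp` step would be wherever you create the throwaway database.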
Separating the read and write models
The first time I saw Greg Young present on CQRS in 2008 I thought to myself “that’s interesting, but keeping two separate models for the same thing sounds like a lot of busywork to me.” In practice, I’m finding it to be more helpful than I thought because it has allowed my team to focus on one problem at a time and jump into the work without having to understand everything at once.
We just started a project where we’ll be exchanging messages from our web application to an existing backend. We don’t exactly have the messaging workflow locked down, but our immediate concern is getting feedback on the usability and workflow of the proposed user interface. To that end we created a very simple “read” model that stores only the data that our views need and in a shape that’s easy to consume on the page with little concern for what the real, behavioral “write” side model will look like later on. We’re even able to write end to end automated tests against our user interface by setting up flat “read” documents in the database.
In iteration 2, we’ll be focusing on the events and messages throughout the system and fleshing out the “write” model and how it responds to and changes with events. In both cases, we are able to tightly focus on only one aspect of the system and test each in isolation. Later on we’ll use either RavenDb’s built-in mechanisms or a code-based “denormalizer” to keep the write and read models synchronized. I like this way of working because it’s allowing me to focus on a subset of the application at a time without ever being overwhelmed by too many variables.
Honestly, I think I’d be a lot more hesitant to try this kind of architecture with a relational database where I’d have to lug around more stuff (DDL scripts, ORM mappings, data migration scripts, etc.) than I do today with a document database where the document json structure just flows out of the existing classes. RavenDb’s index feature does a lot to alleviate the tedious “left hand/right hand” coding that I worried about when I first learned about CQRS.
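For readers who haven’t seen one, a code-based “denormalizer” of the kind mentioned above might look roughly like this sketch. Everything in it is an assumption made for illustration: the `OrderPlaced` and `ItemAdded` events, the `OrderSummaryDenormalizer` class, and the plain dictionary standing in for the read-side document collection are not from our actual project.

```python
from dataclasses import dataclass


@dataclass
class OrderPlaced:
    order_id: str
    customer_name: str


@dataclass
class ItemAdded:
    order_id: str
    price: float


class OrderSummaryDenormalizer:
    """Projects write-side events into flat documents shaped for a view."""

    def __init__(self, read_store: dict):
        # read_store is a stand-in for the read-side document collection.
        self.read_store = read_store

    def handle(self, event):
        if isinstance(event, OrderPlaced):
            # Create the flat read document the page will consume.
            self.read_store[event.order_id] = {
                "customer": event.customer_name,
                "item_count": 0,
                "total": 0.0,
            }
        elif isinstance(event, ItemAdded):
            # Keep the precomputed totals in step with the write side.
            summary = self.read_store[event.order_id]
            summary["item_count"] += 1
            summary["total"] += event.price


read_store = {}
denormalizer = OrderSummaryDenormalizer(read_store)
for event in [
    OrderPlaced("orders/1", "Ada"),
    ItemAdded("orders/1", 10.0),
    ItemAdded("orders/1", 5.5),
]:
    denormalizer.handle(event)
```

Because the read documents are just flat data, a user interface test can write them straight into the store and never touch the denormalizer or the write model at all.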
Eventual Consistency requires some care in testing
Jimmy Bogard recently blogged about the downsides of eventual consistency with a user interface. We had some similar issues on a previous project. Rather than repeat everything Jimmy said, I’ll simply add that you must be cognizant of eventual consistency during testing. A typical testing pattern is going to be something like:
1. Arrange: set up a test scenario
2. Act: do something that is expected to change the state of the system
3. Assert: check that the system is in the state that you expected
Your problem here with eventual consistency is that there’s an asynchronous process between writing data in step 2 and being able to read the new data in step 3. You absolutely have to account for this in both your automated tests and any manual testing. My cheap solution with RavenDb is to swap out our low level RavenDb “persistor” in our IoC container with a testing implementation that just forces any reads to wait for all pending writes to finish first.
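Here is a rough sketch of that idea, assuming a made-up `AsyncPersistor` whose writes only become visible after a short delay (standing in for the asynchronous process) and a `WaitingTestPersistor` that a test setup would register in its place. Neither class is RavenDb’s API or our real persistor; both names are hypothetical.

```python
import threading
import time


class AsyncPersistor:
    """Hypothetical persistor: writes become visible only after a delay."""

    def __init__(self, delay=0.05):
        self._docs = {}
        self._delay = delay
        self._pending = []
        self._lock = threading.Lock()

    def write(self, doc_id, doc):
        def apply():
            time.sleep(self._delay)  # simulate the asynchronous lag
            with self._lock:
                self._docs[doc_id] = doc

        worker = threading.Thread(target=apply)
        with self._lock:
            self._pending.append(worker)
        worker.start()

    def read(self, doc_id):
        # May return stale data if a write is still in flight.
        with self._lock:
            return self._docs.get(doc_id)


class WaitingTestPersistor(AsyncPersistor):
    """Testing implementation: every read first waits for pending writes."""

    def read(self, doc_id):
        # Take the list of in-flight writes, then wait outside the lock
        # so the workers can acquire it to finish their work.
        with self._lock:
            pending, self._pending = self._pending, []
        for worker in pending:
            worker.join()
        return super().read(doc_id)


# Act, then Assert, with no sleeps or retries in the test itself.
persistor = WaitingTestPersistor()
persistor.write("users/1", {"name": "Ada"})
result = persistor.read("users/1")
```

Swapping the implementation at the container level keeps the waiting logic out of every individual test, at the cost of tests that no longer exercise the stale-read behavior real users will see.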
More importantly, I’m going to spend quite some time with our testers making sure that they have insight and visibility into this behavior so that we can all keep from pulling our hair out.
Finally…
I’m not a deep expert on these tools and techniques, but I’m seeing some things that I like so far. At this point, I’d strongly prefer to avoid working on projects involving a relational database ever again. As for RavenDb, it’s made a strong first impression on me and I’m looking forward to seeing where it goes from here. I will commit to fleshing out a quick start recipe for integrating RavenDb with a drop-in “Bottle” for FubuMVC as our de facto recommendation for new FubuMVC projects.
Next time…
It’s Friday afternoon, I have to hit publish before the end of the day for an elimination bet, and I haven’t seen the inside of the gym all week, so I’m quitting here. In part 2 I’d like to share why I think persistence is much easier with a document database, how we’re able to just not worry about a database at all early on, and my thoughts on developing with event sourcing. Until next time, adieu.