Initial thoughts on some new-fangled things part 1

I’ve been lucky over the past year and change to work on some interesting projects that used some of the newer technologies and architectural concepts like Command Query Responsibility Segregation (CQRS), Event Sourcing, Eventual Consistency, and RavenDb as a document database. I cannot speak to the scalability benefits of these tools because that’s just not an area where I have expertise. Instead, I’m interested in how these tools have reduced coding ceremony, improved testability, and allowed my very small teams to effectively do continuous design by giving us much more architectural reversibility. I ran out of time and energy on this post, but I’ll follow up next week with more on event sourcing, what I like about RavenDb, and how we’ve used all of this in our projects.

Continuous Design is better with a Document Database

I gave a talk earlier this month at Agile Vancouver called “Architectural Reversibility,” largely about how we can create better designs if we are able to do design incrementally throughout the lifetime of a project instead of having to do it all upfront. My point of view on this topic is that we’re far more likely to succeed if we’re able to recover from the inevitable errors in architecture, design, or requirements — or better yet, if we’re able to delay commitment to elements of our technical architecture until we know more later in the project. Furthermore, I said that you should be cognizant of this when selecting technologies. One of my slides showed a progression of data access/persistence technologies from my own development career that went something like this:

  1. Stored procedures (sprocs) for every single bit of data access
  2. Object Relational Mapper & Relational Database
  3. Document Database

Let’s say that I need to add a property to an entity in my existing system.  Using the same numbering scheme as above, I would have to:

  1. Change the DDL defining the proper table. Update every sproc that needs to return the new field and any that might need to search on it. Then go update all the places in the code that use the data returned by that table.
  2. Change the DDL defining the proper table or write a data migration. Change the relevant class in the code (even with Ruby ActiveRecord you may still touch the class to add validation rules). Change the ORM mapping to add the field and verify the persistence of the new field all the way to the database.
  3. Add a new property to the proper class and make sure that it serializes (see the sketch below).

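To make option 3 concrete, here is a minimal sketch of what that change looks like against a document database like RavenDb. The Customer class and its properties are hypothetical, purely for illustration:

```csharp
// Hypothetical entity persisted as a document. Adding the new
// Nickname property below is the entire change: no DDL, no sprocs,
// no ORM mapping to keep in sync.
public class Customer
{
    public string Id { get; set; }   // RavenDb assigns ids like "customers/1"
    public string Name { get; set; }
    public string Email { get; set; }

    // The new property. Existing documents simply deserialize with
    // Nickname == null until the next time they are saved.
    public string Nickname { get; set; }
}
```

Compare that single edit to the checklists in options 1 and 2 above.
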
Adding or changing the shape of the data in the 90’s-style stored procedure model was tedious. Back then you had to try much harder to get things right on the first try. Using an ORM was much better, especially if you used conventions to drive the ORM mapping or even to generate the database schema from your classes. However, a document database, where you just serialize objects to a JSON structure with no schema forcing you to effectively do double data entry for the database and object model? That’s the best possible setup for continuous design, because there’s very minimal friction in changing your object model (at least before you deploy for the first time, anyway).

To summarize, document databases absolutely rock for architectural reversibility and that’s a very, very big deal.

Automated testing

In my strong opinion, doing automated, end-to-end testing through the database is vastly easier and more effective with a document database than with a relational database. I feel that this advantage alone is enough to justify using a document database. Why do I think that? Well first, let’s review the two mandatory parts of any repeatable automated test:

  1. Known inputs
  2. Expected outcomes

In order to be really successful with automated testing, I think you need to achieve a few things:

  1. The tests have to run fast enough to provide timely feedback.
  2. It has to be mechanically cheap for a test author to put the system into the initial state.
  3. You cannot allow state to bleed between tests, because that makes them unreliable.
  4. And a Jeremy special: data input for automated tests should be isolated by test, i.e. no shared test data!

Referential integrity has repeatedly been a huge source of friction in test automation. I have frequently found myself adding junk data to a database for automated tests that was not remotely germane to the meaning of the test, just to get the database constraints to shut up. Folks, that’s friction that you just won’t have with a document database.

Soon after adopting RavenDb we picked up the trick of using Raven’s in-memory storage for testing and completely scrapping the full database between tests, virtually guaranteeing that our tests are isolated from each other. You can certainly do something like this with relational databases, but in my experience it is much more work and far slower no matter how you do it. Being able to very quickly drop and rebuild a clean database in code is a killer feature for automated testing.
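
For anyone who hasn’t seen the trick, here is a rough sketch with RavenDb’s embedded client. The fixture class name is mine, but EmbeddableDocumentStore and RunInMemory are the actual RavenDb API:

```csharp
using System;
using Raven.Client;
using Raven.Client.Embedded;

public class InMemoryDatabaseFixture : IDisposable
{
    public IDocumentStore Store { get; private set; }

    public InMemoryDatabaseFixture()
    {
        // Every construction spins up a brand new database that lives
        // entirely in memory, so no state can bleed between tests.
        Store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize();
    }

    public void Dispose()
    {
        // Disposing the store throws the whole database away.
        Store.Dispose();
    }
}
```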

Separating the read and write models

The first time I saw Greg Young present on CQRS in 2008 I thought to myself “that’s interesting, but keeping two separate models for the same thing sounds like a lot of busywork to me.” In practice, I’m finding it to be more helpful than I thought because it allows my team to focus on one problem at a time and jump into the work without having to understand everything at once.

We just started a project where we’ll be exchanging messages between our web application and an existing backend. We don’t have the messaging workflow completely locked down yet, but our immediate concern is getting feedback on the usability and workflow of the proposed user interface. To that end we created a very simple “read” model that stores only the data our views need, in a shape that’s easy to consume on the page, with little concern for what the real, behavioral “write” side model will look like later on. We’re even able to write end-to-end automated tests against our user interface by setting up flat “read” documents in the database.
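
Just as an illustration (these classes are hypothetical, not lifted from our project), a flat “read” document can be nothing more than the exact shape the screen binds to:

```csharp
// A flat "read" document shaped exactly like the screen that shows it:
// no behavior, no relationships, just data ready for the view.
public class OrderSummaryView
{
    public string Id { get; set; }
    public string CustomerName { get; set; }
    public string Status { get; set; }
    public decimal Total { get; set; }

    // Pre-formatted on the way in so the view does zero work.
    public string DisplayDate { get; set; }
}
```

An end-to-end UI test can then store a handful of these documents in its Arrange step and drive the screen against them.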

In iteration 2 we’ll focus on the events and messages throughout the system and flesh out the “write” model and how it responds and changes with events. In both cases, we are able to tightly focus on only one aspect of the system and test each in isolation. Later on we’ll use either RavenDb’s built-in mechanisms or a code-based “denormalizer” to keep the write and read models synchronized (a sketch of the latter follows below). I like this way of working because it allows me to focus on one subset of the application at a time without ever being overwhelmed by too many variables.
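
Here is a sketch of what I mean by a code-based denormalizer, reusing the hypothetical OrderSummaryView from above. The event and handler types are illustrative; only the IDocumentStore and IDocumentSession calls are real RavenDb API:

```csharp
using Raven.Client;

// Illustrative domain event published by the "write" side.
public class OrderStatusChanged
{
    public string OrderId { get; set; }
    public string NewStatus { get; set; }
}

// Listens for write-side events and keeps the flat read-side
// document in sync with them.
public class OrderSummaryDenormalizer
{
    private readonly IDocumentStore _store;

    public OrderSummaryDenormalizer(IDocumentStore store)
    {
        _store = store;
    }

    public void Handle(OrderStatusChanged @event)
    {
        using (var session = _store.OpenSession())
        {
            // Touch only the parts of the read model this event affects.
            var view = session.Load<OrderSummaryView>(@event.OrderId);
            view.Status = @event.NewStatus;
            session.SaveChanges();
        }
    }
}
```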

Honestly, I think I’d be a lot more hesitant to try this kind of architecture with a relational database, where I’d have to lug around more stuff (DDL scripts, ORM mappings, data migration scripts, etc.) than I do today with a document database, where the document’s JSON structure just flows out of the existing classes. RavenDb’s index feature does a lot to alleviate the tedious “left hand/right hand” coding that I worried about when I first learned about CQRS.

Eventual Consistency requires some care in testing

Jimmy Bogard recently blogged about the downsides of eventual consistency with a user interface. We had some similar issues on a previous project. Rather than repeat everything Jimmy said, I’ll simply add that you must be cognizant of eventual consistency during testing. A typical testing pattern is going to be something like:

  1. Arrange — set up a test scenario
  2. Act — do something that is expected to change the state of the system
  3. Assert — check that the system is in the state that you expected

Your problem here with eventual consistency is that there’s an asynchronous process between writing data in step 2 and being able to read the new data in step 3.  You absolutely have to account for this in both your automated tests and any manual testing.  My cheap solution with RavenDb is to swap out our low level RavenDb “persistor” in our IoC container with a testing implementation that just forces any reads to wait for all pending writes to finish first.
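
Our real persistor isn’t worth reproducing here, but the testing implementation looks roughly like this sketch. The IPersistor interface is a stand-in for our abstraction; WaitForNonStaleResults is the actual RavenDb query customization:

```csharp
using System.Linq;
using Raven.Client;

// Stand-in for our low-level persistence abstraction.
public interface IPersistor
{
    IQueryable<T> Query<T>(IDocumentSession session);
}

// Testing implementation swapped into the IoC container in place of the
// production one: every read blocks until RavenDb's indexes have caught
// up with pending writes, so the Assert step never sees stale data.
public class WaitForWritesPersistor : IPersistor
{
    public IQueryable<T> Query<T>(IDocumentSession session)
    {
        return session.Query<T>()
                      .Customize(x => x.WaitForNonStaleResults());
    }
}
```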

More importantly, I’m going to spend quite some time with our testers making sure that they have insight and visibility into this behavior, so that none of us ends up pulling our hair out.

Finally…

I’m not a deep expert on these tools and techniques, but I’m seeing some things that I like so far. At this point, I’d strongly prefer to avoid working on projects involving a relational database ever again. As for RavenDb, it’s made a strong first impression on me and I’m looking forward to seeing where it goes from here. I will commit to fleshing out a quick-start recipe for integrating RavenDb with a drop-in “Bottle” for FubuMVC as our de facto recommendation for new FubuMVC projects.

Next time…

It’s Friday afternoon, I have to hit publish before the end of the day for an elimination bet, and I haven’t seen the inside of the gym all week, so I’m quitting here.  In part 2 I’d like to share why I think persistence is much easier with a document database, how we’re able to just not worry about a database at all early on, and my thoughts on developing with event sourcing.  Until next time, adieu.

11 thoughts on “Initial thoughts on some new-fangled things part 1”

  1. Jeremy,

    I’m curious how you started with document databases. Did you join in on a project already using it? Did you try it out on a test/research project? Did you just go for it after “enough” research?

    I am highly intrigued, but am hesitant to use one on new projects without knowing more about scalability and performance under load. I know you stated that your expertise does not lie with the scalability of such things, but I am wondering how document databases have performed in your live deployments so far.

    1. I don’t particularly have any recipe for you here. I’d been aware of them for quite some time, but the last project that Josh and I did was perfect for a doc database. All the data was hierarchical with plenty of polymorphic collections. In other words, the kind of business domain where an RDBMS and ORM fall down.

  2. There’s a bit of a straw dog here, I’m afraid.

    Regarding changes to a relational DB:

    “Change the DDL defining the proper table. Update every sproc that returns that field and any that might need to search on that field. Go update all the places in the code that use the data returned by that table.”

    It’s been my experience that you just have to change a couple of things, usually where the data gets input and where it gets reported on. If proper database design is done up front, then most of the data most of the people are going to want to see most of the time will be covered, and these after-the-fact additions will be outliers serving a select few who ask for them, by and large. At least that’s been my experience.

    I suspect there’s a bit of prejudice going on here against relational DBs:

    “At this point, I’d strongly prefer to avoid working on projects involving a relational database ever again.”

    Relational data has its place and always will. There are certain aspects of life and business that cannot be modeled any other way. To say that you would like to never again work with relational data is akin to saying “I don’t ever want to talk to a Republican again.” As pleasant as that might be, it’s not going to happen, and frankly, we should not expect that it’s necessarily desirable.

    1. Couple things,

      I absolutely did not make a straw man argument against the usage of relational databases. There is a very stark contrast in the amount of technical overhead between using a relational database to persist an object model versus what I’ve experienced with a document database. I have a dozen years’ experience building systems with an RDBMS and I’ve written my fair share of stored procedures, triggers, DDL scripts and all the other detritus that goes with it.

      When I say that using a document database is much less work to change than an RDBMS, I’m thinking about the entire process from object model out to the UI and back to the DB. When you look at the bigger picture, that’s when an RDBMS starts to look like an increasingly bad idea.

      “If proper database design is done up front”

      To turn your “I don’t ever want to talk to a Republican again” quote back on you, saying that everything is going to be just fine if you design everything perfectly upfront (and I assume that you never have a phase 2) is a lot like saying that supply side economics will work this time if we just do it right. It is my very strong opinion, from a great deal of experience, that adopting techniques and tools that better fit continuous design leads to better results in the end, and that high-ceremony tools, or tools optimized only for the initial construction of an app, are consistently brittle and lead to higher project risk.

  3. “My cheap solution with RavenDb is to swap out our low level RavenDb ‘persistor’ in our IoC container with a testing implementation that just forces any reads to wait for all pending writes to finish first.”

    I would strongly recommend against this approach. This makes your test system dramatically different from your real world system. Doing this is going to potentially open you up to consistency issues that don’t exist in your test environment.

    What I would recommend as an alternative is to block at the end of the Arrange phase until your database is consistent (see the sketch below). This should only be done if you have data you expect to be in the database all the time (like product information, locale information, etc.).
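
    Something like this rough sketch, say (the TestDataSeeder helper is made up for illustration; WaitForNonStaleResults is RavenDb’s real customization):

    ```csharp
    using System.Linq;
    using Raven.Client;

    public static class TestDataSeeder
    {
        // Seed the always-present reference data during Arrange, then
        // block until RavenDb's indexes are consistent before the test
        // moves on to the Act phase.
        public static void SeedAndWait<T>(IDocumentStore store, params T[] docs)
        {
            using (var session = store.OpenSession())
            {
                foreach (var doc in docs)
                    session.Store(doc);
                session.SaveChanges();
            }

            using (var session = store.OpenSession())
            {
                // Throwaway query whose only purpose is to wait for indexing.
                session.Query<T>()
                       .Customize(x => x.WaitForNonStaleResults())
                       .Any();
            }
        }
    }
    ```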

    In the Act phase, you should never vary your waiting behavior between the real world and the test environment.

    You want to make sure your consistency model is exactly what it is in the real world; otherwise you’re going to be chasing issues in production that you could’ve caught in development. And catching them in production likely means production data fixes, or even lost sales from unexpected system behavior that results in an unhandled exception.

  4. Your comment about being in an elimination pool sounded so awesome I arranged one with some of my blogger friends (well “wannabe blogger” friends, we probably post on average once every 3 months each). Not sure if this is the same thing you’re in, but we all tossed some money in a pool, and if we don’t blog once every 2 weeks we’re eliminated. AWESOME IDEA! Thanks for the inspiration.

  5. Thanks Jeremy. Will wait for subsequent posts. The only grey areas for us in document DBs are how easily we can perform joins, and how easily we can change/migrate our underlying document DB product.

    1. Joins aren’t going to be as necessary with a document database. Generally, you store related information in the same document and retrieve it all together. You can also consider graph databases like Neo4J if that fits your domain model better.
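
      For illustration (hypothetical classes), what would be a three-table join in an RDBMS collapses into a single document load:

      ```csharp
      using System.Collections.Generic;

      // The parent record and its child rows live together in one
      // document, so fetching an order and all of its lines is a
      // single load: no join required.
      public class Order
      {
          public string Id { get; set; }
          public string CustomerName { get; set; }
          public List<OrderLine> Lines { get; set; }
      }

      public class OrderLine
      {
          public string Product { get; set; }
          public int Quantity { get; set; }
          public decimal Price { get; set; }
      }
      ```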
