Picking up from my last post, let me wrap up my initial thoughts on Event Sourcing, Document Databases, and why I think it’s going to take a generation for all of this stuff to be mainstream.
My bias
If you don’t agree with the following bullet points then you’re also unlikely to agree with my feelings about document databases in particular — and that’s okay, but at least let’s understand where we’re both coming from.
- I very strongly believe in incremental and evolutionary approaches to software development and I naturally prefer tools that fit an evolutionary or continuous model of working rather than tools that are optimized for waterfall philosophies (passive code generation, most Microsoft tools before the last couple years).
- I despise repetitive code ceremony (ironic considering that most days I work with static typed C#, but still).
- I think in terms of objects with an increasing contribution from functional programming. When I’m designing software, I’m thinking about responsibilities, roles, and behavior rather than “get data from table 1, 2, and 3, then update table 4.”
- A database is nothing more than the persistence subsystem of an application in my world view. The model in my code is reality, the database is just a persistent reflection of current and historical state.
Where I’m coming from
I started as a “Shadow IT” developer writing little tactical solutions for myself with MS Access before moving on to being a “real” developer doing all my data access work with stored procedures and generally developing against ADO record sets in my VB code. At that point, database-related work was the central technical activity. From there, I followed a typical path from doing quasi-OO with hand-rolled mapping code to using an Object Relational Mapper to do all my persistence and database work. Before adopting RavenDb, one of my previous teams heavily leveraged Fluent NHibernate and its conventions to almost completely generate our database schema from our classes and validation rules. At that point, database work was a minor part of our total effort except for the occasional performance optimization fire drill — especially compared to previous projects in my career.
Even so, I wasn’t completely happy because there was still friction I didn’t care for:
- You frequently compromised the shape of your objects to be more easily persistable/mappable with NHibernate
- The bleeping “everything must be virtual” thing you have to remember to do just so Anders Hejlsberg will allow NHibernate to create a working runtime proxy (see the sketch after this list)
- Having to constantly worry about whether or not you are doing lazy or eager fetching in any given scenario
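To make that second bullet concrete, here’s a minimal sketch of the ceremony (the Case and CaseEvent classes are hypothetical stand-ins):

```csharp
using System.Collections.Generic;

public class CaseEvent { }

// Every public member must be virtual so NHibernate can subclass the
// entity at runtime to build its lazy-loading proxy.
public class Case
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }
    public virtual IList<CaseEvent> Events { get; set; }

    // Forget "virtual" on even one public member and NHibernate
    // complains when it validates the mappings at startup.
    public virtual void Close()
    {
        // business logic here
    }
}
```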
On my previous and hopefully my current project, things got even better because…
Persistence is easier with a Document Database and Event Sourcing
I saw a lot of the so-called impedance mismatch problem while persisting objects to a relational database. Once you consider hierarchies, graphs of objects, polymorphic collections, custom value types, and whatnot, you find that your behavioral object model becomes quite different from your relational database structure. If you’re using an ORM, you quickly learn that there’s a substantial cost to fetching an entire object graph if the ORM has to cross multiple tables to do its work. At that point you start getting into the guts of your ORM and learn how to control lazy or eager fetching of a collection or a reference in scenarios that perform poorly — and just so we’re very clear here, you cannot make a blanket rule to always fetch lazily or always fetch eagerly.
The great thing with using a document database, to me, is that most of that paragraph above goes away. The JSON documents that I persist in RavenDb are basically the same shape as my object graph in my C# code. I’m sure there are exceptions, but what I saw was that the whole eager-or-lazy fetching problem pretty well goes away because it’s cheap to pull the entire object graph out of RavenDb in one shot when it’s all stored together rather than spread around different tables. Take away the concerns about lazy loading, and I no longer need a virtual proxy and all the annoying “must be explicitly virtual” ceremony work.
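As a quick illustration, here’s a sketch against the RavenDb client session, assuming the same hypothetical Case document and an already initialized document store:

```csharp
using System;
using Raven.Client;

public static class LoadExample
{
    public static void Show(IDocumentStore store)
    {
        using (var session = store.OpenSession())
        {
            // One round trip, one document: the nested Events collection comes
            // back already populated. No lazy proxies, no N+1 query surprises.
            var theCase = session.Load<Case>("cases/1");
            Console.WriteLine(theCase.Events.Count);
        }
    }
}
```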
Mapping gets much simpler when all you’re doing is serializing an object to JSON. We occasionally customized the way our objects were serialized, especially custom value types, but overall it was less effort than mapping with an ORM even with conventions and auto mapping. I think the big win was hitting cases where you need polymorphic collections. Using RavenDb we just told Json.Net to track the actual class type and boom, we could persist any new subtype of our “CaseEvent” class in a property of type “IList&lt;CaseEvent&gt;.” Since I’ve always thought that ORMs and RDBMSs in general handle polymorphism very badly, I think that’s a big win.
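Here’s roughly what that looks like with plain Json.NET, which RavenDb uses under the covers (the event subtypes here are hypothetical, and the RavenDb client wires up the equivalent serializer settings through its conventions):

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public abstract class CaseEvent { }
public class NoteAdded : CaseEvent { public string Text { get; set; } }
public class CaseClosed : CaseEvent { public DateTime At { get; set; } }

public class Case
{
    public IList<CaseEvent> Events = new List<CaseEvent>();
}

class Program
{
    static void Main()
    {
        var settings = new JsonSerializerSettings
        {
            // Writes a "$type" property so each subtype survives the round trip
            TypeNameHandling = TypeNameHandling.Auto
        };

        var doc = new Case();
        doc.Events.Add(new NoteAdded { Text = "Called the customer" });
        doc.Events.Add(new CaseClosed { At = DateTime.UtcNow });

        var json = JsonConvert.SerializeObject(doc, settings);
        var roundTripped = JsonConvert.DeserializeObject<Case>(json, settings);

        Console.WriteLine(roundTripped.Events[0].GetType().Name); // "NoteAdded"
    }
}
```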
We write what we call “persistence check” tests that just verify that an object and all the fields we care about can make the round trip from being persisted to being loaded later from a different database session. That small effort has repeatedly saved our bacon, and I insisted on the same tests back when we were using NHibernate, too.
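A persistence check in that style might look like this hedged sketch, using RavenDb’s embedded in-memory mode and NUnit (assume the hypothetical Case document has a string Id that RavenDb assigns on Store):

```csharp
using NUnit.Framework;
using Raven.Client.Embedded;

[TestFixture]
public class PersistenceChecks
{
    [Test]
    public void case_survives_the_round_trip()
    {
        using (var store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
        {
            var original = new Case { Title = "Leaky faucet" };

            // Write in one session...
            using (var session = store.OpenSession())
            {
                session.Store(original);
                session.SaveChanges();
            }

            // ...then verify the fields we care about from a different session
            using (var session = store.OpenSession())
            {
                var loaded = session.Load<Case>(original.Id);
                Assert.AreEqual("Leaky faucet", loaded.Title);
            }
        }
    }
}
```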
If you’re building systems where your objects are flat, then maybe this section just doesn’t matter as much to you, but it has certainly been a big advantage for me.
Event Sourcing — Want your cake? Wanna eat it too?
The combination of event sourcing and RavenDb as a database has significantly reduced the tension between your object model and easy persistence. I’m not hardcore on the philosophy that says “setters are evil,” or that an object should never be allowed to get into an invalid state and should only ever change state through its public methods — but that is still a consideration for me in designing classes. The problem is that you constantly compromise — or incur extra friction — if you insist on directly persisting the classes that implement your business logic into your database with an ORM. Either you:
- Open up public setters and a default constructor on your class to make your ORM happier at the cost of potentially allowing more coding errors into your business logic
- Use fancier, and in my experience more error prone, techniques in your ORM to map non-default constructors or backing fields
If you use Event Sourcing instead, you can have this scenario:
- Persist the events as dumb data bags where there’s no downside in making it completely serializable
- Persist a “readside” view of the system state suitable for your UI to consume that’s again devoid of any behavior so it’s also a dumb data bag
- Put the actual business logic with all the validation you want in a separate object that governs the acceptance and processing of the events, but you don’t actually persist that object, just its events (I know there’s a little more to this for optimization, snapshots, etc. but I want to hit publish before dinner).
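A bare-bones sketch of that arrangement (all the names are hypothetical, and snapshots and other optimizations are deliberately ignored):

```csharp
using System;
using System.Collections.Generic;

// Events are dumb, fully serializable data bags
public abstract class CaseEvent { }
public class CaseOpened : CaseEvent { public string Reason { get; set; } }
public class CaseClosed : CaseEvent { }

// The behavioral object enforces the rules, but is never persisted --
// only the events it emits get stored
public class Case
{
    private readonly List<CaseEvent> _pending = new List<CaseEvent>();
    public bool IsOpen { get; private set; }

    public void Open(string reason)
    {
        if (IsOpen) throw new InvalidOperationException("The case is already open");
        Record(new CaseOpened { Reason = reason });
    }

    public void Close()
    {
        if (!IsOpen) throw new InvalidOperationException("The case is not open");
        Record(new CaseClosed());
    }

    // What the persistence subsystem writes out
    public IEnumerable<CaseEvent> PendingEvents { get { return _pending; } }

    // Current state is rebuilt by replaying the stored history
    public static Case FromHistory(IEnumerable<CaseEvent> history)
    {
        var theCase = new Case();
        foreach (var e in history) theCase.Apply(e);
        return theCase;
    }

    private void Record(CaseEvent e)
    {
        Apply(e);
        _pending.Add(e);
    }

    private void Apply(CaseEvent e)
    {
        if (e is CaseOpened) IsOpen = true;
        else if (e is CaseClosed) IsOpen = false;
    }
}
```

The “readside” view from the second bullet would be just another dumb data bag, projected from these same events.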
I don’t know that this is a big win for me, but in a system with very rich business logic, I think you’re going to like this side effect of event sourcing.
Um, referential integrity? Uniqueness? Data validity?
Some of you are going to read this and say, “what about referential integrity or uniqueness?” You’ll have to implement your own code-based uniqueness validation instead of relying on the database — but you really needed to do that anyway for usability’s sake in your user interface. I don’t see referential integrity as that big of an issue because you’re really storing documents as hierarchical data anyway. Either way, even if code-based validation causes you more work, I’d say that these downsides are far outweighed by the advantages.
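For what it’s worth, a naive sketch of code-based uniqueness against a RavenDb session could be as small as this (the User document is hypothetical, and a real system would want an index or a reservation document to close the race window):

```csharp
using System.Linq;
using Raven.Client;

public static class UniquenessChecks
{
    public static bool EmailIsAvailable(IDocumentSession session, string email)
    {
        // Hits the index; no documents are hydrated just to answer the question
        return !session.Query<User>().Any(u => u.Email == email);
    }
}
```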
When will NoSQL databases be mainstream?
When I was growing up as a software developer, most people understood software development through the paradigm of a relational database. You had the database, processes that pushed data between tables, and maybe a user interface that displayed the database table and captured updates to be processed to the database. Back then we would routinely get “requirements documents” explaining business goals completely in terms of which tables needed to be updated.
For a variety of reasons I’ve completely rejected this style of development, but many people haven’t. I wouldn’t be surprised if database centric coding is still the dominant style of development in the world. Honestly, I think that the relational database with procedural code paradigm is far easier for most people to understand compared to object oriented programming, functional programming, or anything even more esoteric.
The relational database paradigm has an absolutely dominant mindshare amongst developers and there’s an absurd amount of prior investment in tooling for RDBMS. Add all that together, add a huge dash of pure inertia, and I think you’ve got the biggest example of “who moved my cheese” that I’ve seen in my technical career.
Next time…
Just to get a third week out of this theme, I’ll summarize how my team used event sourcing on my previous project and get a bit more code-centric.
I am someone institutionally committed to the relational view of data (however stored) with no intention of abandoning that paradigm. I think the case for Document Databases, CQRS, and Event Sourcing is routinely overstated and the complexity and maintenance challenges swept under the rug. There is something of a cult around all of this.
No such problem with this author or this series. Thoughtful as always. I wish I could break away and try it seriously for myself. Not ready for financial ruin just yet 🙂
As a dev writing LOB apps in a team of 2, I can’t see there ever being enough incentive for IT ops to support a DB platform other than SQL Server. I’m interested in the alternatives out there, though.
One thing I’m not sure about – are the RavenDB documents themselves the events in event sourcing, or are the events stored somewhere else?
You don’t have to do it that way, but you could. I saved all the events for a given “schedule” document as part of the “schedule” document. In my particular case I was only going to care about the events as part of the aggregate schedule document, so it made perfect sense. Other people I’ve spoken to will just write the events off to the file system and persist the current state.
I think that’s a question better suited for a CQRS mailing list.
As to “my IT won’t go for it,” maybe, maybe not. I do know that nothing will ever change if you don’t even raise the question.
I understand everything you are saying, except that I think your focus is too narrow for my liking. In terms of building an application, you are spot on. However, your focus on building is NOT the same focus your users will have. In terms of data input/management, your development and delivery world is much better, but what about reporting? Pulling an entire object graph from a document db just to count objects matching a filter will kill your performance. I’ve already had to address this at a client site where reporting performance was unacceptable.
You might want to research that just a little bit more, because you don’t have to hydrate the documents just to get a count via a filter. A simple map/reduce query will get a count for you, and if you do need to report against RavenDb, for one you have this: http://ayende.com/blog/4680/ravendb-working-with-the-query-statistics
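As a sketch against the RavenDb client of that era (the Case document and its Status property are hypothetical), the query-statistics approach from the linked post looks roughly like this:

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Linq;

public static class ReportExample
{
    public static int CountOpenCases(IDocumentSession session)
    {
        RavenQueryStatistics stats;
        session.Query<Case>()
               .Statistics(out stats)
               .Where(c => c.Status == "Open")
               .Take(0)                    // we want the count, not the documents
               .ToList();

        return stats.TotalResults;         // count computed by the index
    }
}
```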
I do think that if you’re getting into a lot of ad hoc reporting, I’d lean toward using a read-only RDBMS off to the side of my transactional document database, but the idea of using a separate reporting database is certainly nothing new.
I’m certainly no expert with document databases or CQRS or event sourcing. However, from every bit of literature I’ve read and videos I’ve watched, this is such a huge concern it is at the very heart of the CQRS concept.
As Jeremy said, write your transactional data to the document store (or whatever you prefer). These are what your Commands act on (the “C”). Then, have a separate data store for all queries/reports (the “Q”). The query/report data store is usually shaped exactly the way your UI or reports need. This is ideal for an RDB and is usually a flattened view of the data; quite often with some duplicate data as a result of flattening.
You can even have your entire UI read its display data from the flattened query store, meaning your view models are read directly from that store, so long as the correlated aggregate root ID is read and passed through the UI for when a command needs to update that data.
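To make that concrete, here is a hypothetical sketch of a flattened read-side shape, one denormalized document per screen, carrying the aggregate ID so commands can find their way back:

```csharp
using System;

// Shaped exactly for the case-list screen; the duplication is deliberate
public class CaseSummaryView
{
    public string CaseId { get; set; }          // aggregate root ID, passed back with commands
    public string Title { get; set; }
    public string Status { get; set; }
    public string AssignedToName { get; set; }  // denormalized from the User document
    public DateTime LastActivity { get; set; }
}
```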
Wow… “routinely get “requirements documents” explaining business goals completely in terms of which tables needed to be updated”. That’s scary!
IT in large organizations definitely don’t like “new-fangled”, so that can be a real challenge.
What about maintenance? The longest period of time people develop on a piece of software is after its initial release.
What about data consumption from the database outside the scope of the application? With document databases, no schema is available, so what the data means is given by the application.
What about migration/versioning of the data? After the initial release, you can’t simply change your object model, as you have to version the serialized objects in your DB. A newly added property is simple, but a class split or a property moved during refactoring is not.
You very much favor the low startup costs of application development. This is understandable; however, it’s not a free ride till the end. The tab has to be picked up by someone. And in your situation, it’s the developer who takes over maintenance of your software, plus the team who has to work with your document database after your application is phased out and a new one (or an additional one) has to be built on top of the same DB.
@Frans,
I’m sure that this one ends up being just an agreement to disagree, but…
What about maintenance? Why are you assuming that a relational database is going to be easier to maintain? From my perspective, the friction a relational database adds around source control, data migrations, and test automation leads me to believe that I’d much rather maintain a system written on a document database.
“What about data consumption from the database outside the scope of the application?”
No way, no how. Do the separate reporting database if you need to, but no way in hell will you ever get me to agree that it’s not a disaster in the making to share a transactional database between applications.
“What about migration/versioning of the data?”
I think you might be exaggerating the problems there. Several of the document dbs have versioning migrations built in, and in some cases you can even do it in a lazy way where a document is only converted when you have to. I’ve read several accounts now of companies that abandoned RDBMS in favor of JSON blobs or NoSQL databases because of the downtime cost of migrating RDBMS schemas.
“And in your situation, it’s the developer who takes over maintenance of your software, plus the team who has to work with your document database after your application is phased out and a new one (or an additional one) has to be build on top of the same DB.”
I’m calling FUD on that one.
I hear this argument all the time. How will anyone maintain the software after you get hit by a bus? * goes back to his 80-table database diagram
@Frans, with event sourcing there is no version of the domain object serialized to the store. Events are serialized and replayed. Versioning is actually simpler than with an RDBMS if you do it right. The only time you persist a serialized version of the object is if you are snapshotting, which is not needed in most cases.
@Ward, I would love to discuss the issues you mention. Could you clarify your thoughts on complexity and maintenance? Most people are actually finding benefits with maintenance from event sourcing. I have seen some problems with maintenance and document dbs, though.
Both doc dbs and event sourcing do share one thing that relational models handle well but that is particularly hard to deal with here: if I partition my data wrong (my boundaries are wrong), it’s a bit of work to redo the partitioning of documents or event streams. Is this the particular situation you are considering, or was there some other scenario?
“Since I’ve always thought that ORM’s and RDBMS’s in general handle polymorphism very badly, I think that’s a big win.”
I agree, but how do you think it could be improved? Or do you think it is just the nature of the beast?
I don’t know, Marcus. I think you could get more aggressive with conventions and auto-mapping to handle the ORM configuration for you, but it’s still gonna blow in terms of RDBMS storage. You either get a crazy number of joins and union statements behind the scenes or you get the kind of sparsely populated columns that used to make DBAs angry.
Glad to see you return to blogging!
I’ve been studying and learning about everything you’ve covered in these last two posts and am slowly wrapping my head around it all. I too am frustrated with the journey from Access to SQL Server with embedded SQL to sprocs/CRUD and then ORMs. Nearly 20 years in the business and this stuff still isn’t solved.
I don’t see Event Sourcing and Document Databases as the silver bullet… but it sure seems that in a lot of cases it’s a better approach than everything I’ve tried previously. I do fear that just as I progressed from Access -> ORM, I’ll add CQRS/ES to my toolbox and still be looking for a better way. I guess that’s just the nature of things.
Did you learn this stuff all on your own or have you been to any of Greg’s or Udi’s courses?
Dude, if everything was already solved would software be any fun?
I lose 50 IQ points any time I’m in a classroom setting, so no courses for me ;)
Having changed to use RavenDB earlier this year, I agree completely that this makes the developer’s life easier. My experience: not just a little bit, but totally _night-and-day_ better in every way!
You really notice how much more fluidly you can work against RavenDB when you have to go back to an ORM – there is a significant amount of tedious coding to wade through. It is just so much easier to evolve the software with RavenDB.
The migrations that I’ve done so far have been really easy too, and Raven has some awesome support for them. The interesting point is that there have been far fewer of them, as the nature of the data structure definition is far more fluid.
I agree that it is probably going to take some time before tech like RavenDB becomes more acceptable for ops. That said, there’s nothing that RavenDB won’t let the ops guys achieve, and its backend is ESENT, which has been in use for lots of apps for ages.
Glad to see you blogging again! I’ve been using RavenDB more or less since the very beginning, when Oren made it public. I switched our application from Firebird+NHibernate to RavenDB and haven’t regretted it yet. Getting rid of all the ORM concerns made me so much more “agile” in adding new stuff to the application and refactoring core parts.
The reporting side of RavenDB is kinda limited compared to relational databases, but I overcame this by going somewhere in the direction of CQRS (though I hadn’t heard of it when I started with RavenDB). I learned a lot about CQRS at the last two Developer Open Spaces in Leipzig and just started experimenting with “pure” CQRS. I’m looking forward to your upcoming blog posts on this topic (and hope you’ll still find some time for StructureMap 3.0 :-).
The thing I like the most about this “new” approach is the mere simplicity of it; if one were really masochistic, one could even implement a simple doc db using JSON files and Elasticsearch.
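In the spirit of that remark, a toy version really is just a few lines (purely illustrative, with search and indexing left as Elasticsearch’s job):

```csharp
using System.IO;
using Newtonsoft.Json;

// A "document store" that is nothing but JSON files on disk
public static class FileDocStore
{
    public static void Store<T>(string dir, string id, T doc)
    {
        File.WriteAllText(Path.Combine(dir, id + ".json"),
                          JsonConvert.SerializeObject(doc, Formatting.Indented));
    }

    public static T Load<T>(string dir, string id)
    {
        var json = File.ReadAllText(Path.Combine(dir, id + ".json"));
        return JsonConvert.DeserializeObject<T>(json);
    }
}
```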
I tend to think that the .Net world is slowly emerging into the third level of thinking:
1. Drag & Drop spaghetti code => the value of abstraction
2. Lasagna code with ORM’s, DI containers etc => the cost of abstraction
3. Only abstract where it hurts => ready to engineer
(The original quote is from Kent Beck: “first you learn the value of abstraction, then you learn the cost of abstraction, then you’re ready to engineer” – https://twitter.com/KentBeck/status/258316233068396544 ).
There are also a lot of parallels to the systems in The Mythical Man-Month: first, second, and third system…
Let’s hope the community manages to transcend the second system effect!