Picking up from my last post, let me wrap up my initial thoughts on Event Sourcing, Document Databases, and why I think it’s going to take a generation for all of this stuff to be mainstream.
If you don’t agree with the following bullet points then you’re also unlikely to agree with my feelings about document databases in particular — and that’s okay, but at least let’s understand where we’re both coming from.
- I very strongly believe in incremental and evolutionary approaches to software development, and I naturally prefer tools that fit an evolutionary or continuous model of working rather than tools that are optimized for waterfall philosophies (passive code generation, most Microsoft tools before the last couple of years).
- I despise repetitive code ceremony (ironic considering that most days I work with statically typed C#, but still).
- I think in terms of objects with an increasing contribution from functional programming. When I’m designing software, I’m thinking about responsibilities, roles, and behavior rather than “get data from table 1, 2, and 3, then update table 4.”
- A database is nothing more than the persistence subsystem of an application in my world view. The model in my code is reality, the database is just a persistent reflection of current and historical state.
Where I’m coming from
I started as a “Shadow IT” developer writing little tactical solutions for myself with MS Access before moving on to being a “real” developer doing all my data access work with stored procedures and generally developing against ADO record sets in my VB code. At that point, database-related work was the central technical activity. From there, I followed a typical path from doing quasi-OO with hand-rolled mapping code to using an Object Relational Mapper to do all my persistence and database work. Before adopting RavenDb, one of my previous teams heavily leveraged Fluent NHibernate and its conventions to almost completely generate our database schema from our classes and validation rules. At that point, database work was minor in terms of our total effort, especially compared to previous projects in my career, except for the occasional performance optimization fire drill.
Even so, I wasn’t completely happy because there was still friction I didn’t care for:
- You frequently compromised the shape of your objects to be more easily persistable/mappable with NHibernate
- The bleeping “everything must be virtual” thing you have to remember to do just so Anders Hejlsberg will allow NHibernate to create a working virtual proxy
- Having to constantly worry about whether or not you are doing lazy or eager fetching in any given scenario
On my previous and hopefully my current project, things got even better because…
Persistence is easier with a Document Database and Event Sourcing
I saw a lot of the so-called impedance mismatch problem while persisting objects to a relational database. Once you consider hierarchies, graphs of objects, polymorphic collections, custom value types, and whatnot, you find that your behavioral object model becomes quite different from your relational database structure. If you’re using an ORM, you quickly learn that there’s a substantial cost to fetching an entire object graph if the ORM has to cross multiple tables to do its work. At that point you start getting into the guts of your ORM and learn how to control lazy or eager fetching of a collection or a reference in scenarios that perform poorly. Just so we’re very clear here, you cannot make a general rule to always be lazy or always be eager.
The great thing with using a document database to me is that most of that paragraph above goes away. The JSON documents that I persist in RavenDb are basically the same shape as my object graph in my C# code. I’m sure there are exceptions, but what I saw was that the whole eager or lazy fetching problem pretty much goes away because it’s cheap to pull the entire object graph out of RavenDb at one time when it’s all stored together rather than spread around different tables. Take away the concerns about lazy loading, and I no longer need a virtual proxy and all the annoying “must be explicitly virtual” ceremony work.
Mapping gets much simpler when all you’re doing is serializing an object to JSON. We occasionally customized the way our objects were serialized, especially custom value types, but overall it was less effort than mapping with an ORM even with conventions and auto mapping. I think the big win was hitting cases where you need polymorphic collections. Using RavenDb we just told Json.Net to track the actual class type and boom, we could persist any new subtype of our “CaseEvent” class in a property of type “IList<CaseEvent>.” Since I’ve always thought that ORMs and RDBMSs in general handle polymorphism very badly, I think that’s a big win.
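To make the type tracking idea concrete, here is a minimal sketch in Python rather than C#. It is not how Json.Net or RavenDb implement it. It only illustrates the general technique of writing a type tag next to each serialized item (similar in spirit to the "$type" property Json.Net emits when type name handling is switched on) so that a list of mixed subtypes can be rebuilt later. The `CaseOpened` and `NoteAdded` subtypes, the `Case` document, and the registry are all hypothetical names made up for the example.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List

# Registry of known event subtypes, keyed by class name (hypothetical helper).
EVENT_TYPES = {}


def case_event(cls):
    """Register a CaseEvent subtype so it can be rebuilt from its "$type" tag."""
    EVENT_TYPES[cls.__name__] = cls
    return cls


@dataclass
class CaseEvent:
    timestamp: str


@case_event
@dataclass
class CaseOpened(CaseEvent):
    opened_by: str


@case_event
@dataclass
class NoteAdded(CaseEvent):
    text: str


@dataclass
class Case:
    id: str
    events: List[CaseEvent] = field(default_factory=list)


def to_json(case: Case) -> str:
    # Each event is written with a "$type" tag recording its concrete class.
    doc = {
        "id": case.id,
        "events": [{"$type": type(e).__name__, **asdict(e)} for e in case.events],
    }
    return json.dumps(doc)


def from_json(text: str) -> Case:
    doc = json.loads(text)
    events = []
    for raw in doc["events"]:
        fields = dict(raw)
        cls = EVENT_TYPES[fields.pop("$type")]  # look up the concrete subtype
        events.append(cls(**fields))
    return Case(id=doc["id"], events=events)
```

Adding a new subtype in this sketch means writing the class and decorating it. Nothing about the stored document layout or the `Case` class has to change, which is the part that tends to hurt with table-per-type or discriminator-column mappings.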
We do write what we call “persistence check” tests that just verify that an object and all the fields we care about can make the round trip from persistence to being loaded later from a different database session. That small effort has repeatedly saved our bacon, but I insisted on that work with NHibernate as well anyway.
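A persistence check along those lines might look roughly like the sketch below. It is written in Python against a made-up in-memory store, so `InMemoryStore`, `Session`, `Customer`, and `Address` are all assumptions for illustration and not the RavenDb client API. The point is only the shape of the test: save in one session, load in a fresh one, and compare the fields you care about.

```python
import json
from dataclasses import dataclass, asdict


class InMemoryStore:
    """Hypothetical stand-in for a document database: JSON strings keyed by id."""

    def __init__(self):
        self._docs = {}

    def open_session(self):
        return Session(self._docs)


class Session:
    def __init__(self, docs):
        self._docs = docs

    def store(self, doc_id, obj):
        # Serialize immediately so nothing is shared by reference between sessions.
        self._docs[doc_id] = json.dumps(asdict(obj))

    def load(self, doc_id, cls):
        return cls(**json.loads(self._docs[doc_id]))


@dataclass
class Address:
    city: str
    postal_code: str


@dataclass
class Customer:
    name: str
    address: Address

    def __post_init__(self):
        # Rebuild the nested value type when loading from a plain dict.
        if isinstance(self.address, dict):
            self.address = Address(**self.address)


def check_round_trip(store, doc_id, original):
    """Save in one session, load in another, and compare field by field."""
    store.open_session().store(doc_id, original)
    reloaded = store.open_session().load(doc_id, type(original))
    assert reloaded == original, f"{type(original).__name__} did not survive the round trip"
    return reloaded
```

Usage is one line per document type, for example `check_round_trip(InMemoryStore(), "customers/1", Customer("Acme", Address("Austin", "78701")))`. Forgetting to handle a nested type or a custom value type shows up as a failed comparison instead of a production surprise.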
If you’re building systems where your objects are flat, then maybe this section just doesn’t matter as much to you, but it certainly has been a big advantage for me.
Event Sourcing — Want your cake? Wanna eat it too?
The combination of event sourcing and RavenDb as a database has significantly reduced the tension between your object model and easy persistence. I’m not hardcore on the philosophy that says “setters are evil” or that an object should never be allowed to be put into an invalid shape, so that you can only change the state of an object by calling its public methods, but that is still a consideration for me in designing classes. The problem is that you constantly compromise, or incur extra friction, if you insist on directly persisting the classes that implement your business logic into your database with an ORM. Either you:
- Open up public setters and a default constructor on your class to make your ORM happier at the cost of potentially allowing more coding errors into your business logic
- Use fancier, and in my experience more error prone, techniques in your ORM to map non-default constructors or backing fields
If you use Event Sourcing instead, you can have this scenario:
- Persist the events as dumb data bags where there’s no downside in making it completely serializable
- Persist a “readside” view of the system state suitable for your UI to consume that’s again devoid of any behavior so it’s also a dumb data bag
- Put the actual business logic with all the validation you want in a separate object that governs the acceptance and processing of the events, but you don’t actually persist that object, just its events (I know there’s a little more to this for optimization, snapshots, etc. but I want to hit publish before dinner).
I don’t know that this is a big win for me, but in a system with very rich business logic, I think you’re going to like this side effect of event sourcing.
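The three-part split described above can be sketched in a few dozen lines. This is a Python illustration under my own assumptions, not the code from the project: the event names, the `Case` aggregate, and the `CaseSummary` read-side view are hypothetical, and snapshots and the actual event store are left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, List


# 1. Events: dumb, fully serializable data bags.
@dataclass(frozen=True)
class CaseOpened:
    title: str


@dataclass(frozen=True)
class NoteAdded:
    text: str


@dataclass(frozen=True)
class CaseClosed:
    reason: str


# 2. Business logic: validates commands and emits events; never persisted itself.
class Case:
    def __init__(self, history: Iterable[object] = ()):
        self._opened = False
        self._closed = False
        self.uncommitted: List[object] = []  # new events waiting to be stored
        for event in history:  # rebuild state by replaying stored events
            self._apply(event)

    def open(self, title: str) -> None:
        if self._opened:
            raise ValueError("case is already open")
        if not title.strip():
            raise ValueError("title is required")
        self._record(CaseOpened(title))

    def add_note(self, text: str) -> None:
        self._require_active()
        self._record(NoteAdded(text))

    def close(self, reason: str) -> None:
        self._require_active()
        self._record(CaseClosed(reason))

    def _require_active(self) -> None:
        if not self._opened or self._closed:
            raise ValueError("case is not open")

    def _record(self, event: object) -> None:
        self._apply(event)
        self.uncommitted.append(event)

    def _apply(self, event: object) -> None:
        if isinstance(event, CaseOpened):
            self._opened = True
        elif isinstance(event, CaseClosed):
            self._closed = True


# 3. Read side: another dumb data bag, derived from the events for the UI.
@dataclass
class CaseSummary:
    title: str = ""
    status: str = "new"
    note_count: int = 0


def project(events: Iterable[object]) -> CaseSummary:
    summary = CaseSummary()
    for event in events:
        if isinstance(event, CaseOpened):
            summary.title, summary.status = event.title, "open"
        elif isinstance(event, NoteAdded):
            summary.note_count += 1
        elif isinstance(event, CaseClosed):
            summary.status = "closed"
    return summary
```

Notice that `Case` has no public setters and no default-constructor compromise, because nothing ever tries to serialize it. Only the events and the `CaseSummary` view would go to the database, and both are shapes any serializer can handle.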
Um, referential integrity? Uniqueness? Data validity?
Some of you are going to read this and say “what about referential integrity or uniqueness?” You’ll have to implement your own code-based uniqueness validation instead of relying on the database, but you really needed to do that anyway for usability’s sake in your user interface. I don’t see the referential integrity as being that big of an issue because you’re really storing documents as hierarchical data anyway. Either way, even if code-based validation causes you more work, I’d say that these downsides are far outweighed by the advantages.
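For what code-based uniqueness validation can look like, here is one possible approach sketched in Python. I am not claiming this is what we did or what RavenDb recommends. It assumes a store that refuses to insert a second document under an existing id, and it reserves the unique value by using it as the id of a small marker document. `DocumentStore`, `DuplicateKeyError`, and `register_user` are hypothetical names.

```python
class DuplicateKeyError(Exception):
    pass


class DocumentStore:
    """Hypothetical store whose only guarantee is that document ids are unique."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc):
        if doc_id in self._docs:
            raise DuplicateKeyError(doc_id)
        self._docs[doc_id] = doc

    def exists(self, doc_id):
        return doc_id in self._docs


def register_user(store, user_id, email):
    """Return None on success, or a message suitable for showing in the UI."""
    normalized = email.strip().lower()
    key = "emails/" + normalized  # the unique value becomes a document id

    # Friendly up-front check, the one you want for usability anyway.
    if store.exists(key):
        return normalized + " is already registered"

    try:
        # The insert is the real guard in case two requests pass the check at once.
        store.insert(key, {"user_id": user_id})
    except DuplicateKeyError:
        return normalized + " is already registered"

    store.insert(user_id, {"email": normalized})
    return None
```

The up-front check gives the user a readable message, while the marker document insert is what actually enforces the rule. In a real system the two inserts would need to happen in one transaction, which this sketch does not attempt.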
When will NoSQL databases be mainstream?
When I was growing up as a software developer, most people understood software development through the paradigm of a relational database. You had the database, processes that pushed data between tables, and maybe a user interface that displayed the database table and captured updates to be processed to the database. Back then we would routinely get “requirements documents” explaining business goals completely in terms of which tables needed to be updated.
For a variety of reasons I’ve completely rejected this style of development, but many people haven’t. I wouldn’t be surprised if database-centric coding is still the dominant style of development in the world. Honestly, I think that the relational database with procedural code paradigm is far easier for most people to understand compared to object oriented programming, functional programming, or anything even more esoteric.
The relational database paradigm has an absolutely dominant mindshare amongst developers and there’s an absurd amount of prior investment in tooling for RDBMS. Add all that together, add a huge dash of pure inertia, and I think you’ve got the biggest example of technical “who moved my cheese” that I’ve seen in my technical career.
Just to get a third week out of this theme, I’ll summarize how my team used event sourcing on my previous project and get a bit more code centric.