By no means is this post a comprehensive examination of every possible type of persistence tool, software system, or even team role. It’s just my opinions and experiences — and even though I’ve been a software developer far longer than many folks, there are lots of types of systems and architectures I’ve never gotten a chance to work on. I’m also primarily a developer, so I’m definitely not representing the DBA viewpoint.
I had an interesting exchange on Twitter last week when an ex-Austinite I hadn’t seen in years asked me if I still used NHibernate. At the same time, there’s some consternation at my work about our usage of RavenDb as a document database and some question about how we might replace that later. Because I happen to be working on potentially using PostgreSQL as a complete replacement for RavenDb and a possible replacement for some of our older SQL Server-based event sourcing tooling, I thought it would be helpful to go over what I think about “persistence” tools these days.
To sum up my feelings on the subject:
- I think NoSQL document databases can make development much more productive over old-fashioned relational database usage — especially when your data is hierarchical
- I’m completely done with heavyweight ORMs like NHibernate or Entity Framework and I would recommend against the very concept at this point. I think the effort to directly persist a rich domain model to a relational database (or really just any database at all) was a failure.
- I’m mostly indifferent to the so-called “micro-” ORMs. I think they’re definitely an easy way to query and manipulate relational databases from middle tier code, but the usage I’ve seen at work makes me think they’re just a quick way to very tightly couple your application code to the details of the database schema.
- I think that Event Sourcing inside a CQRS architecture can be very effective where that style fits, but it’s a mess when it’s used where it isn’t really appropriate.
- I have mostly given up on the old Extreme Programming idea that you could happily build most of the application without having to worry about any kind of database until near the end of the project. If database performance is any kind of project risk, you’ve got to deal with that early. If your database choice is going to have an impact on your application code, then that too has to be dealt with earlier. If you can model your application as consuming JSON feeds, maybe you can get away with delaying the database.
- EDIT: A couple folks have either asked about or recommended just using raw ADO.NET code. My feeling on that subject hasn’t changed in years: if you’re writing raw ADO.NET code, you’re probably stealing from your employer.
When I judge whether or not a persistence tool is a good fit for how I prefer to work, I’m thinking about these low level first causes:
- Is the tool ACID compliant, or will it at least manage not to lose important data? Losing data tends to make our business folks angry. I’m just too old and conservative to screw around with any of the database tools that don’t support ACID.
- Will the tool have a negative impact on our ability to evolve the structure of the application? Is it cheap for me to make incremental changes to the system state? I strongly believe that tools and technologies that don’t allow for easy system evolution make software development efforts fragile by forcing you to be right in your upfront designs.
- What’s the impact going to be on automated test efforts? For all of its problems for us, RavenDb has to be the undisputed champion of testability because of how absurdly easy it is to establish and tear down system state between tests for reliable automated tests.
- How much mismatch is there between the shape of the data that my business logic, user interface, or API handlers are going to need and how the database technology needs to store it? That old impedance mismatch issue can suck down a lot of developer time if you choose poorly.
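As a rough illustration of the testability criterion, here is a minimal sketch of per-test setup and teardown against a throwaway store. This is not RavenDb's API: the `DocumentStore` class, the `docs` table, and the order document shape are all hypothetical stand-ins built on Python's standard `sqlite3` and `json` modules.

```python
import json
import sqlite3
import unittest


class DocumentStore:
    """Hypothetical stand-in for a document database (not RavenDb's API)."""

    def __init__(self):
        # A fresh in-memory database: nothing survives once it is closed.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")

    def store(self, doc_id, document):
        # The whole document is serialized to JSON and saved under one key.
        self._conn.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (doc_id, json.dumps(document)),
        )

    def load(self, doc_id):
        row = self._conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def count(self):
        return self._conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

    def close(self):
        self._conn.close()


class OrderTests(unittest.TestCase):
    def setUp(self):
        # Each test gets its own empty store, so no state leaks between tests.
        self.store = DocumentStore()

    def tearDown(self):
        self.store.close()

    def test_store_and_load(self):
        order = {"customer": "Acme", "lines": [{"sku": "A1", "qty": 2}]}
        self.store.store("orders/1", order)
        self.assertEqual(self.store.load("orders/1")["customer"], "Acme")

    def test_starts_empty(self):
        self.assertEqual(self.store.count(), 0)
```

Saved to a file, the tests can be run with `python -m unittest`. The point of the sketch is only that establishing a known state costs one line in `setUp`; a real document database would need its own equivalent of a disposable store.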
Why I prefer Document Db’s over Relational Db’s for Most Development
Even though RavenDb isn’t necessarily working out for us, I still believe that document databases make sense and can lead to better productivity than relational databases. Doing a side by side comparison on some of my “first causes” above:
- Evolutionary software design. Changing your usage of a relational database can easily involve changes to the DDL, database migrations, and possibly ORM mappings (the old, dreaded “wormhole anti-pattern” problem). I think this is an area where “schemaless” document databases are a vast improvement for developer productivity because I only have to change my document type in code and go. It’s vastly less work when there’s only one model to change.
- Testability. I think it’s more mechanical work to set up and tear down system state for automated tests with relational databases versus a document database. Relational integrity is a great thing when you need it, but it adds some extra work to test setup just to make the database shut up and let me insert my data. My clear experience from automated testing against both types of database engines is that it’s much simpler working with document databases.
- The Impedance Mismatch Challenge. Again, this is an area where I much prefer document databases when I generally want to store and retrieve hierarchical data. I also prefer document databases where data collections may have a great deal of polymorphism.
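The comparison above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a benchmark or anyone's production design: both sides use Python's standard `sqlite3` module, the "document database" is just a table of JSON strings, and the `Order`/`OrderLine` types, table names, and the later-added `notes` field are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


def relational_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE order_lines (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL
        );
    """)
    return conn


def line_without_parent_is_rejected(conn):
    """Test data for a child row can't go in until its parent row exists."""
    try:
        conn.execute(
            "INSERT INTO order_lines (order_id, sku, qty) VALUES (1, 'A1', 2)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


@dataclass
class OrderLine:
    sku: str
    qty: int


@dataclass
class Order:
    customer: str
    lines: list = field(default_factory=list)
    notes: str = ""  # hypothetical field added later; no DDL or migration needed


def document_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
    return conn


def store(conn, doc_id, order):
    # The whole hierarchy (order plus its lines) is saved as one JSON document.
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(asdict(order))),
    )


def load(conn, doc_id):
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    body = json.loads(row[0])
    lines = [OrderLine(**line) for line in body.pop("lines", [])]
    # Documents written before `notes` existed fall back to the default value.
    return Order(lines=lines, **body)
```

On the relational side, the foreign key forces test setup to insert an `orders` row before any `order_lines` row, and a new column would mean a DDL change. On the document side, the nested order round-trips as a single unit and the only thing that changed when `notes` appeared was the `Order` class.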
When would I still opt for a Relational Database?
Besides the overwhelming inertia of relational databases (everybody knows RDBMS tools and there is a seemingly infinite number of management and reporting tools to support them), there are still some places where I would opt for a relational database:
- Reporting applications. Not that it’s impossible in other kinds of databases, but there are so many decent existing solutions for reporting against RDBMSs.
- If I were still a consultant, an RDBMS is a perfectly acceptable choice for conservative clients
- Applications that will require a lot of ad hoc queries. Much of my early career was trying to make sense of large engineering and construction databases that frequently went off the rails.
- Batch jobs, not that I really wanna ever build systems like that again
- Systems with a lot of flat, two-dimensional data
The Pond Scum Shared Database Anti-Pattern
If you’ve worked around me long enough, you’ve surely heard me use the phrase “sharing a database is like drug abusers sharing needles.” I’ve frequently bumped into what I call the “pond scum anti-pattern” where an enterprise has one giant shared database with lots of little applications floating around it that modify and read pretty much the same set of database tables. It’s common, but so awfully harmful.
The indirect coupling between applications is especially pernicious because it’s probably not very obvious how any given change to the database will impact all the little applications that float around it. My strong preference is for application databases rather than the giant shared database. That might very well lead to some duplication or worse, some inconsistency in data across applications, but we can’t solve everything in one already too long blog post;)
And to prove that this topic of the “shared database” problem is a long, never-ending problem, here’s a blog post from me on the same subject from 2005.
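As a rough illustration of that indirect coupling, here is a small sketch using Python's standard `sqlite3` module. Everything in it is hypothetical: the "CRM" and "billing" applications, the `customers` table, and the column names exist only to show how a schema change made for one application can silently break another that reads the same table.

```python
import sqlite3


def create_shared_db():
    """One database that both hypothetical applications read and write."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('Acme')")
    conn.commit()
    return conn


def crm_app_migration(conn):
    """The CRM team reshapes the table to suit its own code."""
    conn.executescript("""
        CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, full_name TEXT);
        INSERT INTO customers_v2 (id, full_name) SELECT id, name FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_v2 RENAME TO customers;
    """)


def billing_app_customer_names(conn):
    """The billing app was written against the old column name."""
    return [row[0] for row in conn.execute("SELECT name FROM customers")]


def billing_app_still_works(conn):
    try:
        billing_app_customer_names(conn)
        return True
    except sqlite3.OperationalError:
        # Raised because the `name` column no longer exists.
        return False
```

Calling `billing_app_still_works` before and after `crm_app_migration` shows the break: nothing in the billing code changed, yet it stops working. With one database per application, the migration would only have to satisfy the code that owns it.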
- MongoDb? I know some people like it and I’ve had some feedback on Marten that we should be patterning its usage on MongoDb rather than mostly on RavenDb. I’ve just seen too many stories about MongoDb losing data or having inadequate transactional integrity support.
- Graph databases like Neo4J? I think they sound very interesting and there’s a project or two I’ve done that I thought might have benefited from using a graph database, but I’ve never used one. Someday.
- Rails’ ActiveRecord? Even though I never made the jump to Ruby like so many of my other ALT.Net friends from a decade ago did, there was a time when I thought Ruby on Rails was the coolest thing ever. That day has clearly passed. I’m really not wild about any persistence tool that forces you to lock your application code to the shape of the database.
- CSLA is apparently still around. To say the least, I’m not a fan. Too much harmful coupling between business logic and infrastructure, poor for evolutionary design in my opinion.
10 thoughts on “My Thoughts on Choosing and Using Persistence Tools”
Just curious, what was your issue with RavenDB?
I often find myself struggling with the same persistence merry go round as you. However I have come to the conclusion that we are on the verge of a paradigm shift. We are all suffering today because of the compromises we have to make with the various tools at our disposal. I suspect this will change in the next few years. We are only just beginning to scratch the surface of what is possible in this “post relational” world, if indeed it turns out that way.
Let’s list some NoSql solutions out there: CouchDB, Couchbase, GetEventStore, Redis, Riak, Cassandra, RavenDB, MongoDB, RethinkDB, Datomic. In truth I could go on past the comment length of your blog listing all these and it begs the question “how many people use this stuff?”.
Then you decide to look at one – let’s say RethinkDB and you think “wow, they’ve done some work polishing this, maybe this is MongoDB done right”. Then you read “This is equivalent to SQL’s READ UNCOMMITTED isolation level” in the docs and think “oh crap, too bad, let’s move on”. I mean they chose uncommitted over stale – I just can’t wrap my head around that decision.
At least with databases such as GetEventStore everyone’s being honest – if you want grids of data you’re going to have to push it all into an RDBMS. It still means you have to do all of that yourself, but cool it works and there are some limited but good examples out there.
I think you’ve found the right approach for now though with Postgres. Its JSONB support means you can have the best of both worlds and I wish you all the luck in the world with it. I just wish Microsoft would get on board with SQL Server so enterprise devs such as myself could have a go too!
ORMs have always been and continue to be a cluster. I have successfully avoided using them now for a decade or so at the cost of writing manual ADO.NET code. It’s a cost I am willing to bear though. Data access is where you lose performance and not having control over it is criminal.
Anyway, rant over 🙂 Great article btw haha!
As someone also struggling with RavenDB and considering pulling it out of my stack, I’m well aware of the struggles it causes (and I’m also really hoping to use your Marten project in future). Following up on Frank’s question, I am more curious to know – in your opinion what would cause you to change your mind and keep Raven in your stack?