By no means is this post a comprehensive examination of every possible type of persistence tool, software system, or even team role. It’s just my opinions and experiences — and even though I’ve been a software developer far longer than many folks, there are lots of types of systems and architectures I’ve never gotten a chance to work on. I’m also primarily a developer, so I’m definitely not representing the DBA viewpoint.
I had an interesting exchange on Twitter last week when an ex-Austinite I hadn’t seen in years asked me if I still used NHibernate. At the same time, there’s some consternation at my work about our usage of RavenDb as a document database and some question about how we might replace that later. Because I happen to be working on potentially using PostgreSQL as a complete replacement for RavenDb and a possible replacement for some of our older SQL Server-based event sourcing tooling, I thought it would be helpful to go over what I think about “persistence” tools these days.
To sum up my feelings on the subject:
- I think NoSQL document databases can make development much more productive over old-fashioned relational database usage — especially when your data is hierarchical
- I’m completely done with heavyweight ORMs like NHibernate or Entity Framework and I would recommend against the very concept at this point. I think the effort to directly persist a rich domain model to a relational database (or really just any database at all) was a failure.
- I’m mostly indifferent to the so-called “micro-” ORMs. I think they’re definitely an easy way to query and manipulate relational databases from middle tier code, but the usage I’ve seen at work makes me think they’re just a quick way to very tightly couple your application code to the details of the database schema.
- I think that Event Sourcing inside a CQRS architecture can be very effective where that style fits, but it’s a mess when it’s used where it’s not really appropriate.
- I have mostly given up on the old Extreme Programming idea that you could happily build most of the application without having to worry about any kind of database until near the end of the project. If database performance is any kind of project risk, you’ve got to deal with that early. If your database choice is going to have an impact on your application code, then that too has to be dealt with earlier. If you can model your application as consuming JSON feeds, maybe you can get away with delaying the database.
- EDIT: A couple of folks have either asked about or recommended just using raw ADO.NET code. My feeling on that subject hasn’t changed in years: if you’re writing raw ADO.NET code, you’re probably stealing from your employer.
When I judge whether or not a persistence tool is a good fit for how I prefer to work, I’m thinking about these low level first causes:
- Is the tool ACID compliant, or will it at least manage not to lose important data? Losing data tends to make our business folks angry. I’m just too old and conservative to screw around with any of the database tools that don’t support ACID.
- Will the tool have a negative impact on our ability to evolve the structure of the application? Is it cheap for me to make incremental changes to the system state? I strongly believe that tools and technologies that don’t allow for easy system evolution make software development efforts fragile by forcing you to be right in your upfront designs.
- What’s the impact going to be on automated test efforts? For all of its problems for us, RavenDb has to be the undisputed champion of testability because of how absurdly easy it is to establish and tear down system state between tests for reliable automated tests.
- How much mismatch is there between the shape of the data that my business logic, user interface, or API handlers are going to need and how the database technology needs to store it? That old impedance mismatch issue can suck down a lot of developer time if you choose poorly.
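To make the testability criterion concrete, here is a minimal sketch of what cheap setup and teardown of system state can look like. It is written in Python with the standard library only, and the `DocumentStore` class is a hypothetical stand-in backed by in-memory SQLite. It is not RavenDb's API or any other tool mentioned in this post; it only illustrates the shape of the idea.

```python
import json
import sqlite3
import unittest


class DocumentStore:
    """Hypothetical stand-in for a document database (in-memory SQLite)."""

    def __init__(self):
        # ":memory:" gives each instance its own private, empty database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def store(self, doc_id, doc):
        # The whole document is serialized and saved as one JSON value.
        self.conn.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (doc_id, json.dumps(doc)),
        )

    def load(self, doc_id):
        row = self.conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

    def close(self):
        self.conn.close()


class UserDocumentTests(unittest.TestCase):
    def setUp(self):
        # Establish state: a brand-new store, so no test sees another's data.
        self.store = DocumentStore()

    def tearDown(self):
        # Tear down: closing the connection discards the in-memory database.
        self.store.close()

    def test_round_trips_a_nested_document(self):
        user = {"name": "Ada", "roles": ["admin", "dev"]}
        self.store.store("users/1", user)
        self.assertEqual(self.store.load("users/1"), user)

    def test_every_test_starts_empty(self):
        self.assertEqual(self.store.count(), 0)
```

Saved as a test module, this runs with `python -m unittest`. The point of the sketch is that each test gets a fresh, empty store with no cleanup scripts and no ordering concerns between tests.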
Why I prefer Document Db’s over Relational Db’s for Most Development
Even though RavenDb isn’t necessarily working out for us, I still believe that document databases make sense and can lead to better productivity than relational databases. Doing a side by side comparison on some of my “first causes” above:
- Evolutionary software design. Changing your usage of a relational database can easily involve changes to the DDL, database migrations, and possibly ORM mappings (the old, dreaded “wormhole anti-pattern” problem). I think this is an area where “schemaless” document databases are a vast improvement for developer productivity because I only have to change my document type in code and go. It’s vastly less work when there’s only one model to change.
- Testability. I think it’s more mechanical work to set up and tear down system state for automated tests with relational databases versus a document database. Relational integrity is a great thing when you need it, but it adds some extra work to test setup just to make the database shut up and let me insert my data. My clear experience from automated testing against both types of database engines is that it’s much simpler working with document databases.
- The Impedance Mismatch Challenge. Again, this is an area where I much prefer document databases when I generally want to store and retrieve hierarchical data. I also prefer document databases where data collections may have a great deal of polymorphism.
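The comparison above can be sketched in a few lines. This is an illustration only, written in Python with in-memory SQLite playing both roles; the `order` document, the table names, and the `store`/`load` helpers are all hypothetical and do not come from RavenDb or any other tool named in this post.

```python
import json
import sqlite3

# Hypothetical hierarchical document: an order with nested line items.
order = {
    "id": "order-1",
    "customer": "Acme",
    "lines": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")

# Document style: one table, the whole aggregate stored as a single JSON value.
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def store(doc):
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc["id"], json.dumps(doc)),
    )


def load(doc_id):
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None


store(order)
loaded = load("order-1")  # comes back in the same shape it was saved in

# Evolving the model: add a field in code and save again. No DDL, no migration.
order["shipping"] = {"method": "ground"}
store(order)
evolved = load("order-1")

# Relational style: the same data needs two tables, a join, and a reassembly step.
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_lines ("
    "order_id TEXT REFERENCES orders(id), sku TEXT, qty INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(order["id"], line["sku"], line["qty"]) for line in order["lines"]],
)
rows = conn.execute(
    "SELECT o.customer, l.sku, l.qty FROM orders o "
    "JOIN order_lines l ON l.order_id = o.id "
    "WHERE o.id = ? ORDER BY l.rowid",
    ("order-1",),
).fetchall()

# Map flat rows back into the nested shape the application code wants.
rebuilt = {
    "id": "order-1",
    "customer": rows[0][0],
    "lines": [{"sku": sku, "qty": qty} for _, sku, qty in rows],
}
```

In the relational half, persisting the new `shipping` field would additionally require an `ALTER TABLE` (or a new table) plus changes to the insert and reassembly code, which is the extra model-to-model work the list above is describing.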
When would I still opt for a Relational Database?
Besides the overwhelming inertia of relational databases (everybody knows RDBMS tools and there are seemingly an infinite number of management and reporting tools to support RDBMS’s), there are some places where I would still opt for a relational database:
- Reporting applications. Not that it’s impossible in other kinds of databases, but there are so many decent existing solutions for reporting against RDBMS’s.
- If I were still a consultant, an RDBMS is a perfectly acceptable choice for conservative clients
- Applications that will require a lot of ad hoc queries. Much of my early career was trying to make sense of large engineering and construction databases that frequently went off the rails.
- Batch jobs, not that I really wanna ever build systems like that again
- Systems with a lot of flat, two dimensional data
The Pond Scum Shared Database Anti-Pattern
If you’ve worked around me long enough, you would surely hear me use the phrase “sharing a database is like drug abusers sharing needles.” I’ve frequently bumped into what I call the “pond scum anti-pattern” where an enterprise has one giant shared database with lots of little applications floating around it that modify and read pretty much the same set of database tables. It’s common, but so awfully harmful.
The indirect coupling between applications is especially pernicious because it’s probably not very obvious how any given change to the database will impact all the little applications that float around it. My strong preference is for application databases rather than the giant shared database. That might very well lead to some duplication or, worse, some inconsistency in data across applications, but we can’t solve everything in one already too long blog post ;)
And to prove that this topic of the “shared database” problem is a long, never-ending problem, here’s a blog post from me on the same subject from 2005.
- MongoDb? I know some people like it and I’ve had some feedback on Marten that we should be patterning its usage on MongoDb rather than mostly on RavenDb. I’ve just seen too many stories about MongoDb losing data or having inadequate transactional integrity support.
- Graph databases like Neo4J? I think they sound very interesting and there’s a project or two I’ve done that I thought might have benefited from using a graph database, but I’ve never used one. Someday.
- Rails’ ActiveRecord? Even though I never made the jump to Ruby like so many of my other ALT.Net friends from a decade ago did, there was a time when I thought Ruby on Rails was the coolest thing ever. That day has clearly passed. I’m really not wild about any persistence tool that forces you to lock your application code to the shape of the database.
- CSLA is apparently still around. To say the least, I’m not a fan. Too much harmful coupling between business logic and infrastructure, poor for evolutionary design in my opinion.