EDIT on 2/12/2016: This is almost a 3 year old post, but it still gets quite a few reads. For an update, I’m part of a project called Marten that is seeking to use PostgreSQL as a document database to replace RavenDb in our architecture. While I’m still a fan of most of the RavenDb development experience, the reliability, performance, and resource utilization in production have been lacking. At this point, I would not recommend adopting RavenDb for new projects.
I’m mostly finished with a fairly complicated project that used RavenDb, and all is not quite well. All too frequently in the past month I’ve had to answer the question “was it a mistake to use RavenDb?” and the more ego-bruising (for Jeremy, anyway) “should we scrap RavenDb and rebuild this on a different architecture?” Long story short, we made it work and I think we’ve got an architecture that can allow us to scale later, but the past month was miserable, and RavenDb and our usage of RavenDb were the main culprits.
Some Context
Our system is a problem resolution system for an automated data exchange between our company and our clients. The data exchange has long suffered from data quality issues, and hence we were tasked with building an online system to ameliorate the current, heavily manual process for resolving the data issues. We communicate with the upstream system by receiving and sending flat files dropped into a folder (boo!). The files can be very large, and the shape of the data is conceptually different from how our application displays and processes events in our system. As part of processing the data we receive, we have to do a fuzzy comparison against the existing data for each logical document because we don’t have any correlation identifier from the upstream system (this was obviously a severe flaw in the process, but I don’t have much control over this issue). The challenge for us with RavenDb was that we would have to process large bursts of data that involved both heavy reads and writes.
On the read side, to support the web UI, the data was very hierarchical, and using a document database was a huge advantage in my opinion.
First, some Good Stuff
- RavenDb has to be the easiest persistence strategy in all of software development to get up and running on day one. Granted, you’ll have to change settings for production later, but you can spin up a new project using RavenDb as an embedded database and start writing an application with persistence in nothing flat (there’s a quick sketch of this right after the list). I’ve told some of my ex-.Net/now Rails friends that I think I can spin up a FubuMVC app that uses RavenDb for persistence faster than they can with Rails and ActiveRecord. The combination of a document database and statically typed document classes is dramatically lower friction, in my opinion, than using statically typed domain entities with NHibernate or EF.
- I love, love, love being able to dump and rebuild a clean database from scratch in automated testing scenarios
- I’m still very high on document databases, especially on the read side of an application. RavenDb might have fallen down for us in terms of writes, but there were several places where storing a hierarchical document is just so much easier than dealing with relational database joins across multiple tables
- No DB migrations necessary
- Being able to drop down to Lucene queries helped us considerably in the UI
- I like the paging support in RavenDb
- RavenDb’s ability to batch up reads was a big advantage when we were optimizing our application. I really like the lazy request feature and the IDocumentSession.Load(array of ids) functions (these, along with the Lucene queries and paging above, are sketched after this list).
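To put that first bullet in more concrete terms, here’s roughly what day one looks like with an embedded store and a plain document class. The Visitor type and the data directory are made up for the example, and this is from memory against the 2.x-era client, so treat it as a sketch rather than a recipe:

```csharp
// Assumes the RavenDB.Embedded package (2.x-era client); Visitor is a made-up document class
using System;
using Raven.Client;
using Raven.Client.Embedded;

public class Visitor
{
    public string Id { get; set; }          // RavenDb fills this in ("visitors/1" style)
    public string Name { get; set; }
    public DateTime FirstSeen { get; set; }
}

public static class QuickStart
{
    public static void Main()
    {
        // One line of configuration to get a working, persistent document store
        using (var store = new EmbeddableDocumentStore { DataDirectory = "Data" }.Initialize())
        using (IDocumentSession session = store.OpenSession())
        {
            session.Store(new Visitor { Name = "First visitor", FirstSeen = DateTime.UtcNow });
            session.SaveChanges();          // single, transactional round trip
        }
    }
}
```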
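And for the last few bullets, a hedged sketch of the read-side features we leaned on: dropping down to a Lucene query with paging, loading a batch of documents by id in one round trip, and deferring reads with the lazy API. The index name, field names, and ids are invented for illustration, and it reuses the Visitor class from the sketch above:

```csharp
using System;
using System.Linq;
using Raven.Client;

public static class ReadSide
{
    public static void Examples(IDocumentSession session)
    {
        // Drop down to a raw Lucene query (here against a hypothetical static index),
        // with Skip/Take doing the paging
        var page = session.Advanced.LuceneQuery<Visitor>("Visitors/ByName")
            .WhereEquals("Name", "First visitor")
            .Skip(0)
            .Take(25)
            .ToList();

        // Load a whole batch of documents by id in a single request
        Visitor[] batch = session.Load<Visitor>(new[] { "visitors/1", "visitors/2", "visitors/3" });

        // Queue up lazy reads, then flush them to the server as one round trip
        Lazy<Visitor> lazyVisitor = session.Advanced.Lazily.Load<Visitor>("visitors/4");
        session.Advanced.Eagerly.ExecuteAllPendingLazyOperations();

        Console.WriteLine("{0} / {1} / {2}", page.Count, batch.Length, lazyVisitor.Value.Name);
    }
}
```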
Memory Utilization
We had several memory usage problems that we ultimately attributed to RavenDb and its out of the box settings. In the first case, we had to turn off all of the 2nd level caching because it never seemed to release objects, or at least not before our application fell over from OutOfMemoryExceptions. In our case, the 2nd level cache would not have provided much value anyway except for a handful of little entities, so we just turned it off across the board. I think I would recommend that you only use caching with a whitelist of documents.
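We never circled back to try the whitelist approach ourselves, but the client-side cache can be filtered with a convention roughly like this. This is a sketch from memory of the 2.x client, and the url test is only an illustration of the idea, not a drop-in rule:

```csharp
using Raven.Client.Document;

public static class StoreSetup
{
    public static DocumentStore Create()
    {
        var store = new DocumentStore { Url = "http://localhost:8080", DefaultDatabase = "App" };

        // Only allow responses for a handful of small, rarely changing documents to be cached;
        // the "/docs/lookups/" prefix here is a hypothetical id convention for those documents
        store.Conventions.ShouldCacheRequest = url => url.Contains("/docs/lookups/");

        // ...or what we actually did: turn the client cache off across the board
        // store.MaxNumberOfCachedRequests = 0;

        store.Initialize();
        return store;
    }
}
```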
Also be aware that the implementations of IDocumentSession seem to be very much optimized for short transactions with limited activity at any one time. Unfortunately, ours was almost a batch-driven system, and our logical transactions became quite large and potentially involved a lot of reads against contextual information. After examining our application with a memory profiler, we determined that IDocumentSession was hanging on to data we had only read. We solved that issue by explicitly calling Evict() to remove objects from an IDocumentSession’s cache.
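Here’s roughly the shape of that workaround. The loop and the ReferenceData lookup document are invented for the example, but session.Advanced.Evict() is the actual call we used:

```csharp
using Raven.Client;

public class ReferenceData
{
    public string Id { get; set; }
}

public static class BatchProcessing
{
    public static void Process(IDocumentSession session, string[] incomingKeys)
    {
        foreach (var key in incomingKeys)
        {
            // Contextual data we only need to read, not modify
            var context = session.Load<ReferenceData>("reference/" + key);

            // ... fuzzy matching / decision making against the context ...

            // Tell the session to stop tracking it so the identity map doesn't
            // accumulate every document touched during a long batch
            session.Advanced.Evict(context);
        }

        session.SaveChanges();
    }
}
```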
Don’t Abstract RavenDb Too Much
To be blunt, I really don’t agree with many of Ayende’s opinions about software development, but in regards to abstractions for RavenDb you have to play by his rules. We have a fubu project named FubuPersistence that adds common persistence capabilities like multi-tenancy and soft deletes on top of RavenDb in an easy to use way. That’s great and all, but we had to throw a lot of that goodness away because you so frequently have to get down to the metal with RavenDb to either tighten up performance or avoid stale data. We were able to happily spin up a database on the fly for testing scenarios, so you might look to do that more often than trying to swap out RavenDb for mocks, stubs, or 100% in-memory repositories. Those tests are still slower than what you’d get with mocks or stubs, but you don’t have any choice once you start having to muck with RavenDb’s low-level APIs.
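For the testing side of that, the embedded store running in memory is what makes spinning up a clean database per test cheap. A minimal sketch, assuming the 2.x embedded client:

```csharp
using Raven.Client;
using Raven.Client.Embedded;

public static class TestingSupport
{
    // Each test fixture gets its own throwaway store; dispose it and the data
    // is gone, so every test starts from a genuinely clean database
    public static IDocumentStore CreateCleanStore()
    {
        return new EmbeddableDocumentStore { RunInMemory = true }.Initialize();
    }
}
```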
Bulk Inserts
I think RavenDb is weak in terms of dealing with large batches of updates or inserts. We tried using the BulkInsert functionality, and while it was a definite improvement in performance, we found it to be buggy and probably just immature (it is a recent feature). We first hit problems with map/reduce operations not finishing after processing a batch. We updated to a later version of RavenDb (2330), then had to retreat back to our original version (2230) after hitting problems with Windows authentication in combination with the BulkInsert feature. We saw the same issues with the edge version of RavenDb as well. We also noticed that BulkInsert did not seem to honor the batch size settings, and we had several QA bugs under load because of this. We eventually solved the BulkInsert problems by sending batches of 200 documents for processing through our service bus and putting retry semantics around the BulkInsert to get around occasional hiccups.
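The fix was not clever: chunk the documents into batches of 200 and retry a failed batch a couple of times. This is a simplified sketch of the idea rather than our actual service bus handler, and the batch size and retry count are just the numbers mentioned above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public static class BulkLoader
{
    public static void Load(IDocumentStore store, IEnumerable<object> documents)
    {
        // Chunk the stream of documents into batches of 200
        var batches = documents
            .Select((doc, index) => new { doc, index })
            .GroupBy(x => x.index / 200, x => x.doc);

        foreach (var batch in batches)
        {
            WriteWithRetries(store, batch.ToArray(), attempts: 3);
        }
    }

    private static void WriteWithRetries(IDocumentStore store, object[] batch, int attempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using (var bulkInsert = store.BulkInsert())
                {
                    foreach (var doc in batch)
                    {
                        bulkInsert.Store(doc);
                    }
                }
                return;
            }
            catch (Exception)
            {
                // retry the whole batch to ride out the occasional hiccup
                if (attempt >= attempts) throw;
            }
        }
    }
}
```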
The Eventual Consistency Thing
If you’re not familiar with Eventual Consistency and its implications, you shouldn’t even dream of putting a system based on RavenDb into production. The key with RavenDb is that query/command separation is pretty well built in. Writes are transactional, and reads by the document id will always give you the latest information, but other queries execute against indexes that are built in background threads as a result of writes. What this means for you is a chance of receiving stale results from any query against anything but a document id. There’s a solid rationale behind this decision, but it’s still a major complication in your life with RavenDb.
With our lack of correlation identifiers from upstream, we were forced to issue a lot of queries against “natural key” data and we frequently ran into trouble with stale indexes in certain circumstances. Depending on circumstances, we fixed or prevented these issues by:
- Introducing a static index instead of relying on dynamic indexes. I think I’d push you to try to use a static index wherever possible (there’s a sketch after this list).
- Judiciously using the WaitForNonStaleResults* family of methods. Be careful with this one though, because it can have negative repercussions as well
- In a few cases we introduced an in-memory cache for certain documents. You *might* be able to utilize the 2nd level cache instead
- In another case or two, we switched from using surrogate keys to using natural keys because you always get the latest results when loading by the document id. User and login documents are the examples of this that I remember offhand.
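To make the first two bullets concrete, here’s a hedged sketch of a static index plus the wait-for-non-stale customization. The Issue document, the fields, and the five second timeout are all invented for the example:

```csharp
using System;
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

public class Issue
{
    public string Id { get; set; }
    public string ClientCode { get; set; }
    public DateTime ReceivedDate { get; set; }
}

// A static index created at application startup (e.g. via IndexCreation.CreateIndexes),
// instead of letting ad hoc queries spin up dynamic indexes
public class Issues_ByNaturalKey : AbstractIndexCreationTask<Issue>
{
    public Issues_ByNaturalKey()
    {
        Map = issues => from issue in issues
                        select new { issue.ClientCode, issue.ReceivedDate };
    }
}

public static class StaleReads
{
    public static Issue FindMatch(IDocumentSession session, string clientCode)
    {
        // Query the static index and wait (briefly) for it to catch up with
        // recent writes before trusting the results; use this sparingly
        return session.Query<Issue, Issues_ByNaturalKey>()
            .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromSeconds(5)))
            .FirstOrDefault(x => x.ClientCode == clientCode);
    }
}
```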
The stale index problem is far more common in automated testing scenarios, so don’t panic when it happens.
Conclusion
I’m still very high on RavenDb’s future potential, but there’s a significant learning curve you need to be aware of. The most important thing to know about RavenDb, in my opinion, is that you can’t just use it; you’re going to have to spend some time and energy learning how it works and what some of the knobs and levers are, because it doesn’t just work out of the box. On one hand, RavenDb has several features and capabilities that an RDBMS doesn’t, and you’ll want to exploit those abilities. On the other hand, I do not believe that you can get away with using RavenDb with all of its default settings on a project with larger data sets.
Honestly, I think the single biggest problem on this project was not doing the heavy load testing earlier instead of at the last moment, but everybody involved with the project has already hung their heads in shame over that one and vowed to never do that again. Doing something challenging and doing something challenging right up against a deadline are two very different things. It is my opinion that, while we did struggle with RavenDb, we would have had at least some struggle to optimize performance if we’d built with an RDBMS, and the user interface would have been much more challenging.
Knowing what I know now, I think it’s 50/50 that I would use RavenDb for a similar project again. If they get their story fixed for bigger transactions though, I’m all in.
