Would I use RavenDb again?

EDIT on 2/12/2016: This is an almost three year old post, but it still gets quite a few reads. For an update, I’m part of a project called Marten that aims to use PostgreSQL as a document database, which we intend to adopt as a replacement for RavenDb in our architecture. While I’m still a fan of most of the RavenDb development experience, the reliability, performance, and resource utilization in production have been lacking. At this point, I would not recommend adopting RavenDb for new projects.

I’m mostly finished with a fairly complicated project that used RavenDb, and all is not quite well.  All too frequently in the past month I’ve had to answer the question “was it a mistake to use RavenDb?” and the more ego-bruising “should we scrap RavenDb and rebuild this on a different architecture?”  Long story short, we made it work and I think we’ve got an architecture that can allow us to scale later, but the past month was miserable, and RavenDb and our usage of it were the main culprits.

Some Context

Our system is a problem resolution system for an automated data exchange between our company and our clients.  The data exchange has long suffered from data quality issues, and hence we were tasked with building an online system to ameliorate the current manual-heavy process for resolving the data issues.  We communicate with the upstream system by receiving and sending flat files dropped into a folder (boo!).  The files can be very large, and the shape of the data is conceptually different from how our application displays and processes events in our system.  As part of processing the data we receive, we have to do a fuzzy comparison against the existing data for each logical document because we don’t have any correlation identifier from the upstream system (this was obviously a severe flaw in the process, but I don’t have much control over this issue).  The challenge for us with RavenDb was that we would have to process large bursts of data that involved both heavy reads and writes.

On the read side to support the web UI, the data was very hierarchical and using a document database was a huge advantage in my opinion.

First, some Good Stuff

  • RavenDb has to be the easiest persistence strategy in all of software development to get up and running on day one.  Granted, you’ll have to change settings for production later, but you can spin up a new project using RavenDb as an embedded database and start writing an application with persistence in nothing flat.  I’ve told some of my ex-.Net/now Rails friends that I think I can spin up a FubuMVC app that uses RavenDb for persistence faster than they can with Rails and ActiveRecord.  The combination of a document database and statically typed document classes is also dramatically lower friction, in my opinion, than using statically typed domain entities with NHibernate or EF.
  • I love, love, love being able to dump and rebuild a clean database from scratch in automated testing scenarios
  • I’m still very high on document databases, especially on the read side of an application.  RavenDb might have fallen down for us in terms of writes, but there were several places where storing a hierarchical document is just so much easier than dealing with relational database joins across multiple tables
  • No DB migrations necessary
  • Being able to drop down to Lucene queries helped us considerably in the UI
  • I like the paging support in RavenDb
  • RavenDb’s ability to batch up reads was a big advantage when we were optimizing our application.  I really like the lazy request feature and the IDocumentSession.Load(array of ids) functions (see the sketch after this list).
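
To make that last point concrete, here is a minimal sketch of the batched and lazy reads, written against the RavenDb 2.x client API as I remember it.  The Customer and Order documents, the ids, and the initialized IDocumentStore named store are all assumptions for illustration:

    // Hypothetical document types, purely for illustration
    public class Customer { public string Id { get; set; } public string Name { get; set; } }
    public class Order { public string Id { get; set; } public string CustomerId { get; set; } }

    // Assumes 'store' is an initialized IDocumentStore and that System.Linq,
    // Raven.Client, and Raven.Client.Linq are imported
    using (var session = store.OpenSession())
    {
        // One round trip loads several documents by id
        var orders = session.Load<Order>("orders/1", "orders/2", "orders/3");

        // Lazy requests are queued up; nothing hits the server yet
        var customer = session.Advanced.Lazily.Load<Customer>("customers/1");
        var recent = session.Query<Order>()
            .Where(o => o.CustomerId == "customers/1")
            .Lazily();

        // A single request to the server executes every pending lazy operation
        session.Advanced.Eagerly.ExecuteAllPendingLazyOperations();

        var name = customer.Value.Name;
        var count = recent.Value.Count();
    }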

Memory Utilization

We had several memory usage problems that we ultimately attributed to RavenDb and its out of the box settings.  In the first case, we had to turn off all of the 2nd level caching because it never seemed to release objects, or at least not before our application fell over from OutOfMemoryExceptions.  In our case the 2nd level cache would not have provided much value anyway except for a handful of little entities, so we just turned it off across the board.  I’d recommend that you only use the caching with a whitelist of documents.
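
If you go down that road, here is a rough sketch of what the startup configuration might look like, assuming the 2.x-era DocumentStore knobs (ShouldCacheRequest and MaxNumberOfCachedRequests); the “lookups” id prefix below is a made-up convention:

    var store = new DocumentStore { Url = "http://localhost:8080" };

    // Turn the client-side cache off across the board...
    store.Conventions.ShouldCacheRequest = url => false;

    // ...or whitelist: only cache reads of a few small, stable lookup documents
    // (the "/docs/lookups/" prefix here is purely illustrative)
    store.Conventions.ShouldCacheRequest = url => url.Contains("/docs/lookups/");

    // Optionally cap how many responses the cache may hold at all
    store.MaxNumberOfCachedRequests = 256;

    store.Initialize();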

Also be aware that the implementations of IDocumentSession seem to be very much optimized for short transactions with limited activity at any one time.  Unfortunately, ours was almost a batch-driven system, and our logical transactions became quite large and potentially involved a lot of reads against contextual information.  After examining our application with a memory profiler, we determined that IDocumentSession was hanging on to data we had only read.  We solved that issue by explicitly calling Evict() to remove objects from an IDocumentSession’s cache.
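
Roughly, the pattern looked like the sketch below.  Evict() is the real session API; the document type and helper are hypothetical stand-ins for our contextual reads:

    using (var session = store.OpenSession())
    {
        foreach (var id in incomingRecordIds)   // hypothetical batch of ids
        {
            // contextual data we need for the fuzzy comparison, but never modify
            var context = session.Load<ClientProfile>(id);  // ClientProfile is made up
            MatchAgainst(context);                          // hypothetical helper

            // read-only, so pull it back out of the session's identity map before
            // a big batch accumulates every document it ever touched
            session.Advanced.Evict(context);
        }

        session.SaveChanges();
    }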

Don’t Abstract RavenDb Too Much

To be blunt, I really don’t agree with many of Ayende’s opinions about software development, but in regards to abstractions over RavenDb you have to play by his rules.  We have a fubu project named FubuPersistence that adds common persistence capabilities like multi-tenancy and soft deletes on top of RavenDb in an easy to use way.  That’s great and all, but we had to throw a lot of that goodness away because you so frequently have to get down to the metal with RavenDb to either tighten up performance or avoid stale data.  We were able to happily spin up a database on the fly for testing scenarios, so you might look to do that more often than trying to swap out RavenDb for mocks, stubs, or 100% in-memory repositories.  Those tests are still slower than what you’d get with mocks or stubs, but you don’t have any choice once you start having to muck with RavenDb’s low-level APIs.
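
Spinning up that throwaway database is cheap.  A minimal sketch, assuming the Raven.Client.Embedded package and the hypothetical Order document from earlier:

    // EmbeddableDocumentStore comes from the Raven.Client.Embedded package
    using (var store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
    {
        using (var session = store.OpenSession())
        {
            session.Store(new Order { CustomerId = "customers/1" });
            session.SaveChanges();
        }

        // exercise the system under test against the real client API here,
        // then let Dispose() throw the whole database away
    }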

Bulk Inserts

I think RavenDb is weak in terms of dealing with large batches of updates or inserts.  We tried using the BulkInsert functionality, and while it was a definite improvement in performance, we found it to be buggy and probably just immature (it is a recent feature).  We first hit problems with map/reduce operations not finishing after processing a batch.  We updated to a later version of RavenDb (2330), then had to retreat back to our original version (2230) because of problems using Windows authentication in combination with the BulkInsert feature.  We saw the same issues with the edge version of RavenDb as well.  We also noticed that BulkInsert did not seem to honor the batch size settings, which caused several QA bugs under load.  We eventually solved the BulkInsert problems by sending batches of 200 documents at a time for processing through our service bus and putting retry semantics around the BulkInsert to get around occasional hiccups.
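
The chunk-and-retry workaround boils down to something like the sketch below.  BulkInsert and BulkInsertOptions are the 2.x-era client API as I recall it; the chunk size, retry policy, and method name are our own choices, and the retry assumes re-sending a chunk is safe for your document ids:

    // Assumes System.Linq, System.Threading, Raven.Client, and
    // Raven.Client.Document are imported
    public static void BulkInsertInChunks(IDocumentStore store, IEnumerable<object> docs)
    {
        foreach (var chunk in docs.Select((d, i) => new { d, i })
                                  .GroupBy(x => x.i / 200, x => x.d))
        {
            var attempts = 0;
            while (true)
            {
                try
                {
                    using (var bulk = store.BulkInsert(options: new BulkInsertOptions { BatchSize = 200 }))
                    {
                        foreach (var doc in chunk) bulk.Store(doc);
                    }
                    break;  // this chunk committed, move on
                }
                catch (Exception)
                {
                    if (++attempts >= 3) throw;  // give up after a few tries
                    Thread.Sleep(500);           // brief back-off before retrying the chunk
                }
            }
        }
    }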

The Eventual Consistency Thing

If you’re not familiar with eventual consistency and its implications, you shouldn’t even dream of putting a system based on RavenDb into production.  The key with RavenDb is that query/command separation is pretty well built in.  Writes are transactional, and reads by the document id will always give you the latest information, but other queries execute against indexes that are built in background threads as a result of writes.  What this means for you is a chance of receiving stale results from queries against anything but a document id.  There’s real rationale behind this decision, but it’s still a major complication in your life with RavenDb.

With our lack of correlation identifiers from upstream, we were forced to issue a lot of queries against “natural key” data, and we frequently ran into trouble with stale indexes.  Depending on the situation, we fixed or prevented these issues by:

  • Introducing a static index instead of relying on dynamic indexes.  I’d push you to use a static index wherever possible (see the sketch after this list).
  • Judiciously using the WaitForNonStaleResults* family of methods.  Be careful with this one, though, because it can have negative repercussions as well.
  • In a few cases, introducing an in-memory cache for certain documents.  You *might* be able to utilize the 2nd level cache instead.
  • In another case or two, switching from surrogate keys to natural keys, because you always get the latest results when loading by the document id.  User and login documents are the examples of this that I remember offhand.
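
Here is a minimal sketch of the first two mitigations together: a static index over the natural-key fields, queried with an explicit staleness cutoff.  The Issue document and its fields are made up; AbstractIndexCreationTask, IndexCreation, and WaitForNonStaleResultsAsOfNow are the standard 2.x client pieces as I recall them:

    // A static index over the hypothetical natural-key fields
    public class Issues_ByNaturalKey : AbstractIndexCreationTask<Issue>
    {
        public Issues_ByNaturalKey()
        {
            Map = issues => from issue in issues
                            select new { issue.ClientCode, issue.FileDate };
        }
    }

    // registered once at startup with something like:
    // IndexCreation.CreateIndexes(typeof(Issues_ByNaturalKey).Assembly, store);

    using (var session = store.OpenSession())
    {
        var matches = session.Query<Issue, Issues_ByNaturalKey>()
            .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromSeconds(5)))
            .Where(x => x.ClientCode == "ACME" && x.FileDate == fileDate)
            .ToList();
    }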

The stale index problem is far more common in automated testing scenarios, so don’t panic when it happens.

Conclusion

I’m still very high on RavenDb’s future potential, but there’s a significant learning curve you need to be aware of.  The most important thing to know about RavenDb, in my opinion, is that you can’t just use it; you’re going to have to spend some time and energy learning how it works and where some of the knobs and levers are, because it doesn’t just work.  On one hand, RavenDb has several features and capabilities that an RDBMS doesn’t, and you’ll want to exploit those abilities.  On the other hand, I do not believe that you can get away with using RavenDb with all of its default settings on a project with larger data sets.

Honestly, I think the single biggest problem on this project was not doing the heavy load testing earlier instead of at the last moment, but everybody involved with the project has already hung their heads in shame over that one and vowed never to do it again.  Doing something challenging and doing something challenging right up against a deadline are two very different things.  It is my opinion that, while we did struggle with RavenDb, we would have had at least some struggle to optimize performance if we’d built with an RDBMS, and the user interface would have been much more challenging.

Knowing what I know now, I think it’s 50/50 that I would use RavenDb for a similar project again.  If they get their story fixed for bigger transactions though, I’m all in.

33 thoughts on “Would I use RavenDb again?”

  1. Thanks for this post-mortem. I’ve been deliberating over RavenDB for a while now, but I’m nervous about a few things. On a slightly off-topic note, as to one of your statements, what are your recommendations for relaying to the user that a save is pending? That is my single biggest struggle with eventual consistency. I get that I can have a date/time that shows the user how old the data they are viewing is, but how can I let them know that something new/changed is pending? Is this something you ever worry about?

    Thanks,

    -Matt

    1. @Matt,

      RavenDb is transactional. The stale index only comes into play if you query by something other than the document id shortly after making the initial save. If you’re mostly saving and loading by the document id, you’re good to go. I don’t think our application is going to be typical in any way.

  2. Well, I haven’t used RavenDB since before it got switched to (or rather licensed as) AGPL, but as for the IDocumentSession aspect, I’m pretty sure that it’s very similar to NHibernate’s ISession in that it is supposed to be used for very small and short intervals.

    I think it’s designed to just be alive for long enough to either load the objects or save the objects, then disappear.

    1. @Darren,

      We use the IDocumentSession scoped at the request and/or transaction level. I used NHibernate for years and I’m well aware of how to do the scoping just fine. The reality was that the client caching is just too aggressive out of the box.

      1. Gotcha, it sounded like you were using it longer. So the objects don’t disappear when the session is disposed, even when not using the 2nd level cache? Or are you saying the 2nd level cache is just too slow with lots of objects?

  3. Hi Jeremy,

    Thanks for sharing this; it’s so hard to get frank and honest information from people who have actually used a technology on a real project that had real features, performance needs and deadlines. None of the things you reveal are shocking, but it is good to know about them in advance.
    I’m just dipping my toes into RavenDb on a small project, and so far all the pain points have been around us not fully understanding how it works. That said, the documentation is good and the community that surrounds the product is helpful and well informed. It’s clear it is here to stay; we just have to learn these lessons around when it is or is not the right tool for the job.

    Thanks,

    Charlie

  4. ‘Knowing what I know now, I think it’s 50/50 that I would use RavenDb for a similar project again.’ – what would you use, another document DB, or would you go back to a relational DB?

    1. I don’t want to go back to an RDBMS unless there are a lot of ad hoc reporting requirements. Maybe I’d use an RDBMS if I got into a batch processing system again.

      1. What about MongoDB? Would that have made a difference? I’m currently evaluating MongoDB vs RavenDB…

      2. At the time I started looking hard at document databases there was a slew of “Mongo lost my data!” posts flying around and I thought, and still think, that RavenDb has a better .Net client story.

        It’s definitely something to consider. My boss is wanting to experiment with using PostgreSQL as a hybrid document database as well.

  5. One thing I have to put out there, having had RavenDB applications in production for almost 3 years now: RavenDB has never once lost any of my data.

    When you said your application reached OOM, were the RavenDB server and application server on the same machine? Were there other services on the machine consuming resources, like a SQL Server or other database, etc.?

    1. We saw OOM exceptions both with the Raven server on a different box and with everything running on the same box. I think it was just from exceeding the stack size limits, but I can’t prove that for sure. I saw the problems both on our underpowered QA server and on my very powered-up MBP w/ max RAM.

      So no, I don’t think we would have fixed our issues by just throwing hardware at the problem.

      In the end, it wasn’t that bad to get out of the memory problems. It’s just that having to take steps to prevent memory problems is what’s going to trip up other people down the line.

      1. If you’re providing a dedicated server to Raven that’s not sharing resources, you really shouldn’t be able to get the server to OOM. If you were able to do that, and you weren’t doing things like increasing the paging limits to bring back huge result sets, did you talk to Ayende about it?

    1. We chimed in a little bit on the Raven discussion list about the bugs we hit with BulkInsert. We just got a little bit of hand waving that they were known issues and fixed in later versions — but they weren’t, and the later versions had different bugs.

      He can say whatever he wants, but I think it just comes down to RavenDb being a little bit immature, especially around larger datasets.

  6. A very interesting write up. I’ve had RavenDB (v1.0) out on about 3-4K desktop deployments for about a year now and it’s been a really smooth and easy time. As you mention, the best part for me is the reduced friction for the developer.

    Have to say that I’ve never seen any OOM or data loss at all, despite deployment on a really wide variety of hardware – some of it barely suitable to run the OS! I’d be interested in hearing some stats about the size of your documents and the number of documents you were using.

    My experience of the support from Hibernating Rhinos and the Google group has always been very good indeed and I’m surprised by your comment above that your OOM issue was dismissed by a “little bit of hand waving”. I wonder if you followed that up with a support call?

  7. We had a similar experience with memory issues and NHibernate’s ISession. In fact, for operations that had a lot of entities, we wound up having a rule that reads could only load DTOs, and writes would only load entities by Id.

    But, NHibernate never cached things loaded with projections or raw SQL queries. I’d turn that stuff off pretty quickly too.

  8. @jeremydmiller I have two RavenDB production applications and you kind of hit on some of the same issues I ran into. One of the applications has over 0.5 million records in the database at a time with high write frequency (news application). The thing I found painful was the Map/Reduce performance, but it has since improved (though it can always be better). I also bump up against the memory limit, but I am fine with that as long as it doesn’t go to 99%. I am more 75/25 percent on RavenDB. I would recommend it to anybody, but realize that it doesn’t do ad hoc querying very well.

    The team is helpful, but the same thoughts ran through my mind when the shit hit the fan. Glad I’m not alone.

  9. We have been using Raven for over a year now and I concur with most of what you say.

    The highs in Raven are proper highs – using documents is so much more natural than an RDBMS, the ability to prototype things is fantastic, integration testing is trivial, plus ease of deployment, ease of replication/sharding setup, etc.

    The lows are quite low – we spent weeks trying to figure out ‘out of memory’ issues in Raven during replication; this stuff just doesn’t happen in mature products. We’ve dealt with a few ‘edge case’ issues where those edges weren’t edges to us. Quite a bit of documentation doesn’t (didn’t?) exist and you have to figure stuff out on your own. It can be frustrating having to read through the source to see how/what is happening. Overall there is quite a bit of an ‘immature’ smell to the product, but at the speed it’s going, what do you expect?

    As the product matures the frustrations are going away – replication memory issues get solved, documentation gets slowly updated and raven becomes a better product.

    Forums are great most of the time and you can get good answers directly from the team.

    If I were to do it all over again I would still pick RavenDB over an RDBMS. It’s just a much better fit for some problems, and once you have a reasonable level of expertise in Raven, things become even more natural. Just don’t expect it to be as polished as a product that’s been around for decades. You might get into the dark corners and you might need to sit there and figure stuff out, but in the end I believe it’s worth it.

  10. Rather a strange choice to have – RDBMS versus RavenDb. You could have used Mongo; it is more mature and has better documentation and a broad range of drivers. RavenDb is attractive to anyone who used NH before due to its API similarity. All the people from the NoSQL world, whether it be Couchbase or Raven or whatever, keep talking about eventual consistency every time they speak, so I am actually surprised this could possibly be a shock for anyone who decides to develop their application with a NoSQL db. Indexes being late with updates have been known even in Lotus Notes (where Couchbase and generally most NoSQL comes from) for decades.

    1. Not strange at all; RavenDb is transactional and integrates much more cleanly with .Net than MongoDb. Several of the things we like most about using RavenDb aren’t possible with MongoDb. The real reason is that at the time there were way too many stories going around the interwebs about MongoDb losing data.

  11. As someone who has used RavenDB in production for the last three years, and for the last year with 10+ million records as the primary write store on a distributed architecture with 20+ Raven nodes, I have rarely come across out of memory exceptions, and almost all my problems have to do with RavenDB+DTC when it’s under high read and write load. And it’s getting to be a better product by the hour. I would definitely use it and recommend it over any other document database in a .NET project.
    There are always scenarios where it’s not the best fit, but the majority of the time, using an RDBMS over a document database simply because the former is ‘mature’ is a moot argument when you consider time to deployment and time spent writing plumbing and mapping code versus business features.

  12. Hello,

    What about now, 5 years later, and their brand new modern release 4?
    Is it mature now?
    Or maybe Couchbase is a good .Net candidate?
