Building an EventStore with User Defined Projections on top of Postgresql and Node.js

EDIT 3/28/2016: Since this blog post still gets plenty of reads, here’s an update. The work described here never got used in a real project (#sadtrombone), but fear not, the projection support is going to live again in a new project called Marten that seeks to turn Postgresql into a very functional Document Db and Event Store with projections.

I did an internal talk on the tooling and concepts in this post at our Salt Lake City office a couple months ago. The recording, for what it’s worth, is here. I’m assuming that you’re at least somewhat familiar with the concepts of Event Sourcing and CQRS, but if you’re not, there are links to descriptive explanations of these concepts in the body of the post.

For most of the 2000’s my goto strategy for application persistence was to use some sort of object relational mapping to persist and read the object structures that I wanted to work with in my code. Sometimes I used hand rolled code to do the mapping, and other times my teams used NHibernate. In the past couple years I’ve been on projects that used the RavenDb document database with mixed success. I’ve also worked on a couple codebases that used an event sourcing strategy to persist meaningful business events, sometimes with RavenDb as the underlying storage engine and another project that uses an older version of NEventStore with Sql Server as the storage mechanism.

For various reasons, we’ve chosen to use a Node.js based stack to rewrite an old WPF application that is a suitable candidate for event sourcing on the backend (Corey Kaylor explained his take on this decision in a blog post). Since we already wanted to replace Sql Server (and probably RavenDb) with Postgresql in the long run, at Corey’s suggestion I have been working on and off to try leveraging  to create a new event store suitable for Node.js development that supports user-defined projections. Lacking all originality, I’m calling this new library “pg-events.”  You can find pg-events hosted under my GitHub account (my very first foray back into OSS post-FubuMVC).

 

Feature Set

  • Support the basic event sourcing pattern by appending the raw business events as JSON to the event store
  • Track events by a “stream” of related events that probably relates directly to some kind of business concept or workflow
  • Support user-defined projections of the raw event data to create “read side” views for clients
  • Support aggregated views of a stream (really just another projection). Use a basic snapshotting strategy of the aggregate state for efficiency
  • Build time tooling to initialize a postgresql database with the custom schema objects and import javascript libraries to postgresql
  • A crude, partial implementation of CommonJS that runs within postgresql

 

Conceptual Architecture

The first thing to know is that we’re making a very large bet on the portability of Javascript code and the ability to run at least a subset of this new event store code hosted in Postgresql, Node.js, embedded in other programming, or even potentially in a browser. The user-defined projections could potentially be executed in any of the pieces below, and we think that flexibility will pay off down the road for both performance and scalability tuning.

So far, I think the end state is going to consist of these four pieces:

 

pg-events

  1. Postgresql Database
    1. Custom schema objects largely based on Greg Young’s Building an Event Store paper to store events and stream metadata.
    2. Tables to persist projected views. Most projection views will be persisted to separate tables instead of one giant “pge_views” table for better query performance
    3. Stored procedures to update and query data in the event storage tables, mostly using postgresql’s Javascript support
    4. Executes and updates aggregate snapshots and synchronous projections. More on that in the following section.
  2. A Node.js Client
    1. Exposes methods to append events to the store
    2. Exposes methods to query for projected views, aggregate snapshots, and raw event stream data
    3. See https://github.com/jeremydmiller/pg-events/wiki/Client for more information
  3. Admin CLI Tool
    1. Build the necessary schema objects into a Postgresql
    2. Loads the user defined projections and other pg-events libraries into the database
    3. Can reset the event storage and projected view tables to an empty state for testing
    4. Eventually, this tool will also support “recapitulation” to rebuild projection data from the raw events when the definition of a projection changes
  4. Background Projection Runner
    1. Executes and updates projected views in a background process. This is my very next coding adventure. I’m going to build it out first with Node.js, then try my hand at implementing it again with a standalone Golang executable that uses an embedded V8 engine to execute the projections. Expect my twitter feed to be entertaining when I’m able to start that work. I’ll blog about this later when I know what it’s going to actually look like;)

 

 

User Defined Projections 

We looked at EventStore at first and I definitely liked their first class support for user defined projections. Our implementation of projections is very obviously influenced by EventStore’s.

I think that the event sourcing efforts I’ve been a part of have been successful overall, but “projecting” the raw event stream into a persisted read side or view model has been challenging. For pg-events, we’re expressing the projections with simple transformation functions that will take in the initial state and the raw event data and simply return the new state (it’s a logical fold left operation for the projections that work across multiple events).

For a sample event sourcing domain, I’ve been using the idea of a quest from the way too many fantasy books I’ve read over my lifetime. During a quest, our heroes might record events like “QuestStarted”, “MembersJoined”, “MembersDeparted”, or “TownReached.” To know or understand the exact composition of a quest party at any time, we need to replay some of the change events (Gandalf stayed behind to fight the Balrog, Boromir was killed, Frodo and Sam ran off, Gollum joined up, etc.) for the quest.

Say we write a projection for a new view across the events in a single quest called “Party” just to understand the membership. From the unit tests, that projection looks like:

require("../../lib/projections")
	.projectStream({
		name: 'Party',
		stream: 'Quest', 
		mode: 'sync',

		$init: function(){
			return {
				active: true,
				traveled: 0,
				location: null,
				members: []
			}
		},

		QuestStarted: function(state, evt){
			state.active = true;
			state.location = evt.location;
			state.members = evt.members.slice(0);

			state.members.sort();
		},

		TownReached: function(state, evt){
			state.location = evt.location;
			state.traveled += evt.traveled;
		},

		EndOfDay: function(state, evt){
			state.traveled += evt.traveled;
		},

		QuestEnded: function(state, evt){
			state.active = false;
			state.location = evt.location;
		},

		MembersJoined: function(state, evt){
			state.members = state.members.concat(evt.members);
			state.members.sort();
		},

		MembersDeparted: function(state, evt){
			state.location = evt.location;

			for (var i = 0; i < evt.members.length; i++){
				var index = state.members.indexOf(evt.members[i]);
				state.members.splice(index, 1);
			}

			state.members.sort();
		}


	});

You’ll notice that there’s a field called “mode” with a value of “sync.” Using the portability of Javascript, we’re planning for these modes:

  1. sync – A ‘sync’ projection will be executed synchronously inside postgresql within the same transaction as the event capture
  2. async – In progress. An ‘async’ projection will be calculated in a background process instead of at event capture time (Eventual Consistency).
  3. live – Forthcoming. These projections will only be calculated upon demand. I’m not yet sure if we’ll do the actual projection transformations within the database or the Node.js client. I guess we could allow two different “live” modes if there’s value in doing that.

So, eventual consistency killing you in your current event sourcing efforts because you hit errors by querying off of stale data? Opt for synchronous projections. Have lots of writes, but relatively few reads? Use asynchronous or even live projections that are only calculated on demand. Have lots of reads but very few writes? I think I would again opt for synchronous projections.

I worked on a system a couple years ago in a failed startup that ran projections in a browser to do historical point in time simulations. I don’t see any reason why we couldn’t do something similar in pg-events if that is ever valuable.

Believe it or not, I have a decent start on documenting the projection support at https://github.com/jeremydmiller/pg-events/wiki/Projections.

 

Why Postgresql? 

Postgresql is the people pleaser of database engines. Want all the normal RDBMS capabilities? Would you be more productive using Postgresql as a document database? Want to write stored procedures with a language that closely resembles Oracle’s PL/SQL (no, a thousand times no, never again)?  Would you even want to use Javascript inside the database itself? Regardless of how you answered any of those questions, Postgresql is trying really hard to be what you want. In our case, I like that we can use postgresql as an event store, a document database for things that don’t fit into the event sourcing model, and a classic RDBMS if that’s what we want in some circumstances.

Mostly though, we like that Postgresql has a proven track record and we suspect that the DevOps support tools will be more effective than we’ve experienced with other OSS database tools.

Of course, the only reason why pg-events is viable in the first place is that postgresql has outstanding JSON support and the ability to author stored procedures with Javascript using Google’s V8 engine. With our project timeline, it’s also safe to assume that Postgresql 9.4 with its significant improvements to the JSON storage will be available before we go live to production.

 

Why CQRS isn’t crazy

I’ll feely admit that the first time I saw Greg Young talking about the Command Query Responsibility Segregation (CQRS) style of architecture in 2008 I thought it was nuts. Specifically, I was afraid that doing the transformations between the “write side” model and the “read side” model consumed by the clients would lead to far too much repetitive “left hand, right hand” code. The reality is, of course, is that I was already doing a lot of work to map database tables to object graphs, transforming domain model objects to DTO’s to send over the wire, and crafting database views to transform our raw data into something more conducive to reporting requirements. In a way, CQRS just explicitly calls out a large part of software development efforts that is often overlooked. If we simply accept the idea that different consumers and producers of the persisted state in our system naturally have different needs as far as how the same information is written, structured, and consumed, CQRS isn’t really “crazy talk” or extra work. One of the biggest differences is that with event sourcing + CQRS you probably try to pre-build and persist the read side views instead of trying to create views or DTO’s on the fly from the “one true database model.”

 

Some thoughts on Relational Databases

I’m very much in the camp that says that the database is strictly for persistence and your business logic and/or user interface should never be tightly coupled to whatever the database is, so the idea of just consuming the raw database tabular data in business logic code is a non-starter for me — not to mention that a flat database table structure is very rarely the exact structure that you’d want in your business logic code outside of CRUD-centric applications. I’ve been a part of technical arguments with database-centric folks for so long that I’m simply happy to say “agree to disagree” on these issues and let us all go on our way.

There’s a tremendous amount of inertia and investment in tooling in our industry in regards to the usage of relational databases as the de facto standard for just about all persistence needs. Additionally, most developers, testers, and even the business people seem to naturally understand the relational database model. Even so, as alternative models like document or graph databases build up more tooling, acceptance, and developer familiarity, I think that relational databases will eventually be consigned to reporting applications or pure CRUD applications (but even then I prefer document databases).

That being said, I think that the future really is “polyglot persistence” and that our children are going to laugh at us in decades to come when we explain how we built systems against relational databases.

Advertisements

14 thoughts on “Building an EventStore with User Defined Projections on top of Postgresql and Node.js

  1. Pingback: Dew Drop – October 23, 2014 (#1883) | Morning Dew

  2. Dave Falkner

    I don’t know how I feel about Postgresql being the “people pleaser of database engines.” (I’ll have to check it out.) I really think that relational databases are universally intended to make people miserable.

    Reply
  3. Brian Doahue

    I saw on Twitter you said you chose to build this not because you dislike EventStore (Greg’s) but more because you liked Postgresql. Are the main reasons because you can do transactional readmodel/projection updates and query with SQL or NoSQL as needed? Curious what other advantages your implementation might have vs. EventStore. I’ve spent a decent amount of time with EventStore now, but have not had much opportunity to work with Postgresql due to my client being married to SQL Server til death do they part.

    Reply
    1. jeremydmiller Post author

      @Brian,

      Part of this is some scar tissue over Raven and the troubles we had with devops things like synchronization. Think of it this way, by building an event store on top of the postgres engine we get:

      * A proven storage engine
      * The entire ecosystem of postgresql for backups, synchronization, monitoring tools, docker support
      * A much, much bigger community.
      * Since we were already going to use postgres for non-ES things, this gives us a simpler architecture
      * We definitely want the option where we can do the event capture, selected projections, and who knows what else in a single transaction boundary.

      – Jeremy

      Reply
  4. Joe Sadowski

    Interesting. I’m wondering how you would handle aggregation across streams. For example, something like building a list of the quests each member has been on.

    Reply
    1. jeremydmiller Post author

      @Joe,

      Anything that aggregates across streams is gonna have to be done asynchronously so we can order them correctly w/o getting screwed by concurrent writes. I’m still working through the details (and other work on existing systems keeps getting in my way), but one way or another the events that need to be run through asynchronous projections will get logged in a “rolling buffer” table, then we’ll use postgresql’s listen/notify capability to trigger the external projection service that there’s work to do. We’ll probably use a level of polling on top of the pub/sub just to pick up what gets missed as well.

      – Jeremy

      Reply
  5. Pingback: Thoughts from CodeMash 2015 | The Shade Tree Developer

  6. Paul Tiseo

    Curious to know how pg-events is fairing? I have read about CQRS and event stores for years now, and have a NodeJS project that could actually benefit from it. Love postgresql too. Seems like where I’d head normally, if I hadn’t stumbled on your project.

    Can we get an update? Maybe like an HGTV or VH1 “Where Are They Now”? 🙂

    Reply
    1. jeremydmiller Post author

      @Paul,

      Queue the sad trombone. I was building pg-events for a specific project at work and wouldn’t you know it, the scope on that project was cut way back and pg-events never got used on a real project. I pushed through quite a bit of work after this blog post to add the asynchronous projections, but it never got proven out in production. I don’t know what a possible timeline would be, but there’s a decent chance that I’ll wrap a .Net client around this so we can retrofit it to an existing system at work some time this year.

      The code’s wide open on Github for what that’s worth. Thanks for the interest,

      Jeremy

      Reply
  7. Pingback: Postgresql as a Document Db for .Net Development | The Shade Tree Developer

  8. Pingback: My Thoughts on Choosing and Using Persistence Tools | The Shade Tree Developer

  9. Pingback: Marten as an Event Store — Now and the Future Vision | The Shade Tree Developer

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s