The code shown in this post is in flight, and I’m writing this post to get more feedback and suggestions on the approach we’re taking so far before doing anything silly like making an official release.
The Marten community has been working toward a 2.0 release sometime in the next couple of months (hopefully in June for my own peace of mind). Since it is a full point release, we can entertain breaking API changes and major restructuring of the code. The big ticket items have been improving performance, reducing memory usage inside of Marten, and a yet-to-be-completely-defined overhaul of the event store. The biggest change by far in terms of development time is the introduction of multi-tenancy support within Marten.
From Wikipedia:
The term “software multitenancy” refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants. A tenant is a group of users who share a common access with specific privileges to the software instance.
The gist of multi-tenancy is that you are able to store and retrieve data tied to a tenant (client/customer/etc.), preferably in a way that prevents one tenant’s users from seeing or editing data from other tenants — and yes, I have indeed seen systems that screwed up on this in harmful ways.
To make this a little more concrete, here’s a sample:
```csharp
[Fact]
public void use_multiple_tenants()
{
    // Set up a basic DocumentStore with multi-tenancy
    // via a tenant_id column
    var store = DocumentStore.For(_ =>
    {
        // This sets up the DocumentStore to be multi-tenanted
        // by a tenantid column
        _.Connection(ConnectionSource.ConnectionString)
            .MultiTenanted();
    });

    // Write some User documents to tenant "tenant1"
    using (var session = store.OpenSession("tenant1"))
    {
        session.Store(new User { UserName = "Bill" });
        session.Store(new User { UserName = "Lindsey" });
        session.SaveChanges();
    }

    // Write some User documents to tenant "tenant2"
    using (var session = store.OpenSession("tenant2"))
    {
        session.Store(new User { UserName = "Jill" });
        session.Store(new User { UserName = "Frank" });
        session.SaveChanges();
    }

    // When you query for data from the "tenant1" tenant,
    // you only get data for that tenant
    using (var query = store.QuerySession("tenant1"))
    {
        query.Query<User>()
            .Select(x => x.UserName)
            .ToList()
            .ShouldHaveTheSameElementsAs("Bill", "Lindsey");
    }

    using (var query = store.QuerySession("tenant2"))
    {
        query.Query<User>()
            .Select(x => x.UserName)
            .ToList()
            .ShouldHaveTheSameElementsAs("Jill", "Frank");
    }
}
```
There are three basic possibilities for multi-tenancy that we are considering or building:
- Separate database per tenant: For maximum separation of different clients’ data, you can opt to store the information in separate databases with the same schema structure, with the obvious downside being more complicated deployments and quite possibly more hosting infrastructure. At runtime, you tell Marten what the tenant is, and behind the scenes it will look up the database connection information for that tenant and possibly create a missing tenant database on the fly in development modes. We don’t quite have this scenario supported yet, but we’ve done a lot of preparatory work in Marten’s internals to enable this mechanism to work without blowing up application memory by duplicating objects underneath the DocumentStore for each tenant.
- Separate schema per tenant: Using a separate schema in the same database for each tenant might be a great compromise between data separation and server utilization. Unfortunately, some Marten internals are making this one harder than it should be. Today, you can opt to stick different document types into different schemas. My theory is that if we could eliminate that feature, we could drastically simplify this scenario.
- Multi-tenancy in a single table with a tenant id: The third possibility is to store all tenant data in the same tables, but use a new “tenant_id” column to distinguish between tenants. Marten needs to be smart enough to quietly filter all queries based on the current tenant and to always write documents to the current tenant id. Likewise, Marten has been changed so that a session cannot modify data belonging to any tenant other than its own. Most of the work to support this option is already done and I expect this to be the most commonly used approach.
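To make the tenant_id option a little more concrete, here is a minimal in-memory sketch of the idea. It is not Marten's implementation: `Row`, `TenantSession`, and the rule that document ids are unique across the whole table are all assumptions made up for illustration. It only shows the three behaviors described above: writes are stamped with the session's tenant, reads are filtered by it, and a session cannot overwrite another tenant's document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one shared "table" where every row carries a tenant id.
public record Row(string TenantId, string Id, string Body);

// Hypothetical session bound to a single tenant (not a Marten type).
public class TenantSession
{
    private readonly List<Row> _table;
    private readonly string _tenantId;

    public TenantSession(List<Row> table, string tenantId)
    {
        _table = table;
        _tenantId = tenantId;
    }

    // Writes are always stamped with the session's own tenant id.
    // Assumption: ids are unique across the whole table, so an id that is
    // already owned by another tenant is rejected rather than overwritten.
    public void Store(string id, string body)
    {
        var index = _table.FindIndex(r => r.Id == id);
        if (index >= 0 && _table[index].TenantId != _tenantId)
        {
            throw new InvalidOperationException(
                $"Document '{id}' belongs to another tenant.");
        }

        var row = new Row(_tenantId, id, body);
        if (index >= 0) _table[index] = row;
        else _table.Add(row);
    }

    // Reads are quietly filtered down to the session's tenant.
    public IEnumerable<Row> Query() =>
        _table.Where(r => r.TenantId == _tenantId);
}
```

In a real database the same filter would presumably end up as an extra `tenant_id = ?` condition in every generated WHERE clause, but that detail is my guess at the shape rather than a description of the actual SQL.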
Right now, we’re very close to fully supporting #3, and not too far away from #1 either. I have a theory that we could support a kind of hybrid of #1 and either #2 or #3 that could be the basis for sharding Marten databases.
We *could* also do multi-tenancy by having separate tables per tenant in the same schema, but that’s way more work inside of Marten internals and I just flat out don’t want to do that.
So, um, what do you think? What would you use or change?
I’m in the midst of making a SaaS app multi-tenanted. (I know, how could it not be, right? But the original team just figured they’d keep spinning up new copies.) I think it depends so much on the app you’re dealing with, and even more on the disaster recovery plan. We could probably figure out scenarios 1 and 2, but I don’t want to push schema updates times x customers, let alone figure out backing up all of those, even in the cloud. 3 works for us, but not without the pain caused by all kinds of poor factoring.
The weird things you never expect to run into…
Yeah, the schema updates on #1 and #2 are the reason I went for #3 ;) We will have some tooling to distribute schema changes for #1 and #2, but it’s always going to be shaky developing that outside of a real project.
Have you considered goose?
Neat!
With option 3, is there the possibility of marking some tables as non-tenanted? I’m thinking of mostly static lookups that occasionally change, such as government-supplied lookup data.
Huh, when I was first drawing this out, I was going to go for the idea that you’d explicitly or conventionally mark document types as multi-tenanted or not. I ended up saying that I’d start with an all or nothing approach until someone gave me a reason why it needed to be doc by doc type.
With that in mind, I’ve added an issue to do just what you’re describing: https://github.com/JasperFx/marten/issues/769
Glad to offer a reason 🙂
I had in mind a lot of common lookup data (health: ICD-10 codes, MIMS medications, etc.; govt: postal stuff) that could be stored this way. If you’re a service provider, you update it once for all tenants and, importantly, avoid the duplicated data storage.
Just to throw this out there: what if this SaaS data wasn’t always fully separated? Perhaps you have a very large client that has 4 divisions, and users from each division can’t see each other’s data, but the client wants to roll this data up. Or the project is government based, where the hierarchy tree can be 4-5 levels deep. Each piece of data still lives at a certain level within the hierarchy, but a user at the root can see all of it and roll it all up. You could specify a tenant id of 4.15.30 where you get access to any data at levels 4.15.30.*
That may or may not be in your scope, but in a lot of the SaaS that I see, the entity that is collecting the data is often one of 100 that belong to the same parent organization/company, and they often want to roll data up or manage it at various levels. Just an idea.
Gotcha. How about this: https://github.com/JasperFx/marten/issues/768. Maybe have some kind of “tenant registry” data that could do the rollup of child to parent. And definitely add something to the Linq support like:
```csharp
session.Query<User>().Where(x => x.Tenant().StartsWith("company id"));
```
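One wrinkle with a plain prefix match on dotted tenant ids like 4.15.30 is that "4.1" is also a string prefix of "4.15". Here is a small sketch of a segment-aware check; `TenantHierarchy` and `CanSee` are hypothetical names for illustration, not part of Marten or of the linked issue.

```csharp
using System;

// Hypothetical helper for dotted, hierarchical tenant ids such as "4.15.30".
public static class TenantHierarchy
{
    // True when data stored under dataTenantId should be visible to a user
    // whose tenant id is userTenantId: either the very same node, or any
    // descendant of it in the hierarchy.
    public static bool CanSee(string userTenantId, string dataTenantId)
    {
        if (dataTenantId == userTenantId) return true;

        // Require a '.' right after the prefix so that "4.1" does not
        // accidentally match "4.15".
        return dataTenantId.StartsWith(userTenantId + ".", StringComparison.Ordinal);
    }
}
```

With this check, a user at 4.15.30 can see data at 4.15.30 and 4.15.30.2, but not data at 4.15 or 4.15.301.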
For the multiple databases option, the store is still tied to a single connection string, which limits you to 1 database server and 1 username/password.
How hard would it be to have a tenant registry and a list of tenants mapped to connection strings?
When offering SaaS, there are always going to be customers who want their data completely segregated from others, or customers you want to put on a separate server or a separate username/password for security or performance reasons.
Is that going to be super hard? I haven’t looked into the connection pooling on the store
@James,
My thought has been that the “multiple databases” option would involve looking up potentially separate connection strings. I *think* there’ll be some kind of pluggable option for looking up the connection string and possibly provisioning new databases per tenant at development time.
– Jeremy
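As a rough illustration of what a pluggable tenant-to-connection-string lookup might look like, here is a sketch. Nothing in it exists in Marten: `ITenantConnectionLookup`, `DictionaryTenantLookup`, and the connection strings in the usage note are all hypothetical. It assumes that most tenants share one database while a few get a dedicated server or credentials.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plug-in point: given a tenant id, return the connection
// string that tenant's sessions should use.
public interface ITenantConnectionLookup
{
    string ConnectionStringFor(string tenantId);
}

// Simple registry backed by a dictionary. Tenants without a dedicated
// entry fall back to the shared connection string (an assumption; a
// stricter version could throw for unknown tenants instead).
public class DictionaryTenantLookup : ITenantConnectionLookup
{
    private readonly string _sharedConnection;
    private readonly IReadOnlyDictionary<string, string> _dedicated;

    public DictionaryTenantLookup(
        string sharedConnection,
        IReadOnlyDictionary<string, string> dedicated)
    {
        _sharedConnection = sharedConnection;
        _dedicated = dedicated;
    }

    public string ConnectionStringFor(string tenantId) =>
        _dedicated.TryGetValue(tenantId, out var connection)
            ? connection
            : _sharedConnection;
}
```

A registry built with a shared string such as `"Host=shared-db;Database=app"` and a single dedicated entry for a tenant named `"bigcorp"` would return the dedicated string for that tenant and the shared one for everyone else. Falling back to a shared database is a design choice that only makes sense if the shared database is itself tenant-aware, for example via a tenant_id column.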