Before I talk about the batch querying feature set in Marten, let’s take a little detour through a common approach to persistence in .Net architectures that commonly causes the exact problem that Marten’s batch querying seeks to solve.
I’ve been in several online debates lately about the wisdom or applicability of granular repository abstractions over underlying persistence infrastructure like EF Core or Marten, something like the sample below:
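Something roughly like this sketch, for instance (the member names here are just the ones implied by the handler sample further down, not any particular library’s interface):

public interface IRepository<T>
{
    // Load one entity by its id
    Task<T> Load(Guid id);

    // Hand back a queryable for ad hoc Linq queries
    // (the ToListAsync() call in the handler below comes from
    // the underlying provider's Linq extensions)
    IQueryable<T> Query();
}

public interface IUnitOfWork
{
    // Commit all pending changes in one transaction
    Task Commit();
}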
That’s a pretty common approach, and I’m sure it’s working out for some people, at least in simpler CRUD-centric applications. Unfortunately, that reliance on fine-grained repositories breaks down badly in more complicated systems where a single logical operation may need to span multiple entity types. Not coincidentally, I have frequently seen this kind of fine-grained abstraction directly lead to performance problems in the systems I’ve helped with over the past 6-8 years, after their original construction.
For an example, let’s say that we have a message handler that needs to access and modify data from three different entity types in one logical transaction. Using the fine-grained repository strategy, we’d have something like this:
public class SomeMessage
{
    public Guid UserId { get; set; }
    public Guid OrderId { get; set; }
    public Guid AccountId { get; set; }
}

public class Handler
{
    private readonly IUnitOfWork _unitOfWork;
    private readonly IRepository<Account> _accounts;
    private readonly IRepository<User> _users;
    private readonly IRepository<Order> _orders;

    public Handler(
        IUnitOfWork unitOfWork,
        IRepository<Account> accounts,
        IRepository<User> users,
        IRepository<Order> orders)
    {
        _unitOfWork = unitOfWork;
        _accounts = accounts;
        _users = users;
        _orders = orders;
    }

    public async Task Handle(SomeMessage message)
    {
        // The potential performance problem is right here.
        // Multiple round trips to the database
        var user = await _users.Load(message.UserId);
        var account = await _accounts.Load(message.AccountId);
        var order = await _orders.Load(message.OrderId);

        var otherOrders = await _orders.Query()
            .Where(x => x.Amount > 100)
            .ToListAsync();

        // Carry out rules and whatnot

        await _unitOfWork.Commit();
    }
}
So here’s the problem with the code up above as I see it:
You’re having to inject a separate repository dependency for each entity type, and that adds code ceremony and noise.
The code is making repeated round trips to the database server every time it needs more data. This is a contrived example, and it’s only 4 trips, but in real systems this could easily be many more. To make this perfectly clear, one of the most pernicious sources of slow code is chattiness (frequent network round trips) between the application layer and the backing database.
Fortunately, Marten has a facility called batch querying that we can use to fetch multiple data queries at one time, and even start processing against the earlier results while the later results are still being read. To use that, we’ve got to ditch the “one size fits all, least common denominator” repository abstraction and use the raw Marten IDocumentSession service as shown in this version below:
public class MartenHandler
{
    private readonly IDocumentSession _session;

    public MartenHandler(IDocumentSession session)
    {
        _session = session;
    }

    public async Task Handle(SomeMessage message)
    {
        // Not gonna lie, this is more code than the first alternative
        var batch = _session.CreateBatchQuery();

        var userLookup = batch.Load<User>(message.UserId);
        var accountLookup = batch.Load<Account>(message.AccountId);
        var orderLookup = batch.Load<Order>(message.OrderId);

        var otherOrdersLookup = batch.Query<Order>().Where(x => x.Amount > 100).ToList();

        await batch.Execute();

        // We can immediately start using the data from earlier
        // queries in memory while the later queries are still processing
        // in the background for a little bit of parallelization
        var user = await userLookup;
        var account = await accountLookup;
        var order = await orderLookup;
        var otherOrders = await otherOrdersLookup;

        // Carry out rules and whatnot

        // Commit any outstanding changes with Marten
        await _session.SaveChangesAsync();
    }
}
The code above creates a single, batched query for the four queries this handler needs, meaning that Marten is making a single round trip to the database for the four SELECT statements. As an improvement in the Marten V4 release, the results coming back from Postgresql are processed in a background Task, meaning that in the code above we can start working with the initial Account, User, and Order data while Marten is still building out the last Order results (remember that Marten has to deserialize JSON data to build out your documents, and that can be non-trivial for large documents).
I think these are the takeaways for the before and after code here:
Network round trips are expensive and chattiness can be a performance bottleneck, but batch querying approaches like Marten’s can help a great deal.
Putting your persistence tooling behind least common denominator abstractions like the IRepository<T> approach shown above eliminates the ability to use the advanced features of your actual persistence tooling. That’s a serious drawback, because it disallows exactly the features that let you create high performance solutions, and this isn’t specific to using Marten as your backing persistence tooling.
Writing highly performant code can easily mean writing more code, as you saw above with the batch querying. The point being: don’t automatically opt for the most performant approach if it’s unnecessary and more complex than a slower but simpler alternative. Premature optimization and all that.
I’m only showing a small fraction of what batch querying supports, so certainly check out the documentation for more examples.
In my last post, My Thoughts on Code “Modernization”, I tried to describe my company’s and more specifically my team’s thinking about our technical initiatives and end goals as we work to update the technology and architecture of our large systems. In this post, I’d like to continue that discussion, but this time focus on the conditions that hopefully promote “happiness” for developers (and testers and other team members) in their ongoing work.
When I presented to our development organization last week, I included this slide to start that conversation:
To be clear, I’m completely focused in this post on issues or factors where I think I have influence or control over the situation. That being said, I am technically a people manager now, so what I can at least do for the other folks in my team is to:
Be supportive and appreciative of their efforts
Ask them for their feedback or advice on our shared work and intended strategies so they know that they have a voice
Occasionally be a “shit umbrella” for them whenever necessary, or maybe more likely just try to help with disputes or tensions with folks outside of our team. I’m still finding my sea legs on being a manager, so we’ll see how that goes
Not hold them up for too long when they do need my approvals for various HR kind of things or when they ask me to review a pull request (speaking of which, I need to pull the trigger on this post soon and go do just that).
On to other things…
Employability. Really? Yes.
Developer retention is a real issue, especially in a problem space like ours where domain knowledge is vital to working inside the code. Not to oversimplify a complex subject, but it’s my firm belief that developers feel most secure and even content in their current job when they feel that they’re actively developing skills that are in demand in the job market. I strongly believe, and our development management seems to agree, that we will do better with developer retention if we can use newer technology — and we’re not talking about radical changes in platform here.
On the flip side, I think we have some compelling, anecdotal evidence that developers who feel like they’ve been in a rut on the technical side of things are more likely to get happy feet and consider leaving.
So, long story short, moving to newer tools like, say, React.js (as opposed to existing screens using Knockout.js or jQuery heavy Razor Pages) or the latest versions of .Net partially with the goal of making our developers happier with their jobs is actually a defensible goal in my mind. Within reason of course. And if that means that I get the chance to swap in Marten + Postgresql as a replacement for our limited usage of MongoDb, that’s just a bonus:)
At a minimum, I definitely think we should at least try to rotate developers between the existing monolith and the newer, spiffier, lower friction services so that everybody gets a taste of better work.
I know what some of you are thinking here, “this is just resume-driven development and you should concentrate on delivering value to the business instead of playing with shiny object development toys.” That’s a defensible position, but as you read this let’s pretend that my shop isn’t trying to go guard rail to guard rail by eschewing boring tools in favor of far out, bleeding edge tooling just to make a few folks happy. We’re just trying to move some important functionality from being based on obsolescent tools to more current technology as an intermediate step into our company’s future.
Low Friction Organizational Structure
I’m only speaking for myself in this section. Many of my colleagues would agree with what I’m saying here, but I’ll take the sole blame for all of it.
Given a choice and the ultimate power to design the structure of a software development organization, I would build around the idea of multi-disciplinary, self-contained teams where each team has every single skillset necessary for them to ship what they’re working on completely by themselves. This means that I want front end developers, back end developers, testers, and DevOps (or just folks with DevOps skillsets regardless of their title) folks all in the same team and collaborating together closely. This obviates the need for many formal handoffs between teams, which I think is one of the biggest single sources of friction and inefficiency in software development.
By formal handoffs, I mean Waterfall-ish things like:
Having to fill out Jira tickets for some other team to make changes in your development or testing environments
Creating a design or specification document for another team
Testers being in a separate organization and schedule than the development team so that there’s potentially a lag between coding and testing
I’m of course all in on Agile Software Development and I’m also generally negative toward Waterfall processes of any sort. It’s not surprising then that I think that formal handoffs and intermediate documentation deliverables take time and energy that could be better spent on creating value instead. More importantly though, it makes teams less flexible and more brittle because they’re more dependent upon upfront planning. More than that, you’re often dependent on people who have no skin in the game for your projects.
Being forced to be more plan-oriented and less flexible in terms of scheduling or external resources means that a team is less able to learn and adapt as they work. Being less adaptable and less iterative makes it harder for teams to deliver quality work. Lastly, communication and collaboration is naturally going to be better within a team than it is between teams or even completely separate organizations.
At a bare minimum, I absolutely want developers (including front end, back end, database, and whatever type of developers a team needs) and testers in one single team working on the same schedule toward shared goals. Preferably I’d like to see us transition to a DevOps culture by at least breaking down some of the current walls between development groups, testing teams, and our operations team.
Lastly, to relate this back to the main theme of making an environment that’s better to work in, I think that increasing direct collaboration between various disciplines and minimizing the overhead of formal handoffs makes for more job satisfaction and less frustration.
“Time to Login Screen” Metric
Let’s say we’re onboarding a new developer, or maybe one of our developers is moving to a different product. After they do a clean clone of that codebase onto their local development machine, how fast can they get to a point where they’re able to build the code, run the actual system locally, and execute all the tests in the codebase? That’s what a former colleague of mine liked to call the “time to login screen” metric.
To reduce that friction of getting started, my thinking is to:
Lean heavily on using Docker containers to stand up required infrastructure like databases, Redis, monitoring tools, etc. that are necessary to run the system or tests. I think it’s very important for any kind of stateful tools to be isolated per developer on their own local machines. Running docker compose up -d is a whole lot faster than trying to follow installation instructions in a Wiki page.
Try to avoid depending on technologies that cannot be used locally. As an example, we already use RabbitMQ for message queueing, which conveniently is also very easy to run locally with Docker. As we move our systems to cloud hosting, I’m opposed to switching to Azure Service Bus without some other compelling reason, because it does not have any local development story.
It’s vital to have build scripts within the code repository that can effectively stand up any environment necessary to start working with the code. This includes any kind of database migration infrastructure and baseline test data setup. Everybody wants to have a good README file in a new codebase to help them get started, but I also believe that a good automated script that sets things up for you is awfully effective as documentation too.
It’s probably also going to be important to get to a point where the codebases are a little smaller so that there’s just less stuff to set up at any one time.
“Quick Twitch” Codebases
Almost a decade ago I wrote a post entitled When I’m most productive about the type of technical ecosystem in which I feel most productive. I think it still holds up, but let me expound on it a little bit here.
Let’s start with how fast a new developer or a current developer switching into a new codebase can be up and working. Using the “time to login screen” metric I learned from a former colleague, a developer should be able to successfully build and run the system and tests locally for a codebase very shortly after a fresh clone of that codebase.
Today our big platforms are fairly described as monoliths, with us underway toward breaking up the monolithic systems to something closer to a microservice architecture. I think we’d like to get the codebases broken up into smaller codebases where a development team can completely understand the codebase that they’re currently working in. Moreover, I’d like it to be much more feasible to update the technical tools, libraries, and runtime dependencies of a single codebase than it is today with our monoliths.
As a first class goal of splitting up today’s monoliths, we want our developers to be able to do what I call “quick twitch” development:
Most development tasks are small enough that developers can quickly and continuously flow from small unit tests to completed code and on to the next task. This is possible in well-factored codebases, but not so much in codebases that require a great deal of programming ceremony or have poor structural factoring.
Feedback cycles on the code are quick. This generally means that compilation is fast, and that test suites are fast enough to be executed constantly without breaking a developer’s mental flow state.
Unit tests can cover small areas of the code while still providing value, such that it’s rare that a developer needs to use a debugger to understand and solve problems. Seriously, having to use the debugger quite a bit is a drag on developer productivity and usually a sign that your automated testing strategy needs to incorporate more fine-grained tests.
The key here is to enable developers to achieve a “flow state” in their daily work.
This is not what we want in our codebases, except substitute “running through the test suite” in place of “compiling:”
Next time…
In the third and final post in this series, I want to talk through our evolution from monoliths to microservices and/or smaller distributed monoliths with an emphasis on not doing anything stupid by going from guard rail to guard rail.
Some of this is going to be specific to a .Net ecosystem, but most of what I’m talking about here I think should be applicable to most development shops. This is more or less a companion white paper for a big internal presentation I did at work this week.
My team at work is tasked with a multi-year code and architecture modernization across our large technical platforms. To give just a little bit of context, it’s a familiar story. We have some very large, very old, complex monolithic systems in production using some technologies, frameworks, and libraries that in a perfect world we’d like to update or replace. Since quite a bit of the code was written back when Test Driven Development was just a twinkle in Kent Beck’s eye, the automated test coverage on parts of the code isn’t what we’d like it to be.
With all that said, to any of my colleagues that read this, I’d say that we’re in much better shape quality and ecosystem wise than the average shop with old, continuously developed systems.
During a recent meeting right before Christmas, one of my colleagues had the temerity to ask “what’s the end goal of modernization and when can we say we’re done?” — which set off some furious thinking, conversations within the team, and finally a presentation to the rest of our development groups.
We came up with these three main goals for our modernization efforts:
Arrive at a point where we can practice Continuous Delivery (CD) within all our major product lines
Improved Developer (and Tester) Happiness
System Performance
Arguably, I’d say that being able to practice Continuous Delivery with a corresponding DevOps culture would help us achieve the other two goals, so I’m almost ready to declare that our main goal. Everything else that’s been on our “modernization agenda” is arguably just an intermediate step on the way to the goal of continuous delivery, or another goal that is at least partially unlocked by the advances we’ll have to make in order to get to continuous delivery.
Intermediate Steps
Speaking of the major intermediate or enabling steps we’ve identified, I took a shot at showing what we think are the major enabling steps for our future CD strategy in a diagram:
Upgrading to .Net vLatest
Upgrading from the full “classic” Windows-only version of .Net to the latest version of .Net and ASP.Net Core is taking up most of our hands-on focus right now. There are probably some performance gains to be had by merely updating to the latest .Net 5/6, but I see the big advantages of the latest .Net versions as being much more container friendly and allowing us flexibility on hosting options (Linux containers) compared to where we are now. I personally think that the recent generations of .Net and ASP.Net Core are far easier to work with in automated testing scenarios, and that should hopefully be a major enabler of CD processes for us.
Most importantly of all, I’d like to get back to using a Mac for daily development work, so there’s that.
Improved Automated Testing
We’re fortunately starting from a decent base of test automation, but there’s plenty of opportunities to get better before we can support more frequent releases. (I’ve written quite a bit about automated testing here). Long story short, I think we have some opportunities to:
Get better at writing testable code for easier and more effective unit testing
Introduce a lot more integration testing in the middle zone of the stereotypical “test pyramid”
Cut back on expensive Selenium-based testing wherever possible in favor of some other form of more efficient test automation. See Jeremy’s Only Rule of Testing.
Since all of this is interrelated anyway, “testability” is absolutely one of the factors we’ll use to decide where service boundaries are as we try to slice our large monoliths into smaller, more focused services. If it’s not valuable to test a service by itself without including other services, then that service boundary is probably wrong.
Containerization
This comes up a lot at work, but I’d call this mostly an enabling step toward cloud hosting and easier incremental deployment than we have today, rather than any kind of end in itself, especially in areas where we need elastic scaling. I think being able to run our services in containers is also going to be helpful for the occasional time when you need to test locally against multiple services or processes.
And yeah, we could try to do a lift and shift to move our big full .Net framework apps to virtual machines in the cloud or try out Windows containers, but previous analysis has suggested that that’s not viable for us. Plus nobody wants to do that.
Open Telemetry Tracing and Production Monitoring
This effort is fortunately well underway, but one of our intermediate goals is to apply effective Open Telemetry tracing through all our products, and I say that for these reasons:
It enables us to use a growing off the shelf ecosystem of visualization and metrics tooling
I think it’s an invaluable debugging tool, especially when you have asynchronous messaging or dependencies on external systems — and we’re only going to be increasing our reliance on messaging as we move more and more to micro-services
Open Telemetry is very handy in diagnosing performance or throughput problems by allowing you to “see” the context of what is happening within and across systems during a logical business operation.
To the last point, my key example of this was helping a team last year analyze some performance issues in their web services. An experienced developer will probably look through database logs to identify slow queries that might explain the poor performance as one of their first steps, but in this case that turned up no single query that was slow enough to explain the performance issues. Fortunately, I was able to diagnose the issue as an N+1 query issue by reading through the code, but let’s just say that I got lucky.
If we’d had open telemetry tracing between the web service calls and the database queries that each service invocation made, I think we would have been able to quickly see a relationship between slow web service calls and the sheer number of little database queries that the web service was making during the slow web service requests, which should have led the team to immediately suspect an N+1 problem.
As for production monitoring, we of course already do that but there’s some opportunity to be more responsive at least to performance issues detected by the monitoring rules. We’re working under the assumption that deploying more often and more incrementally means that we’ll also have to be better at detecting production issues. Not that you purposely try to let problems get through testing, but if we’re going to convince the greater company that it’s safe to deploy small changes in an automated fashion, we need to have ways to rapidly detect when new problems in production are introduced.
Again, the general theme is for us to be resilient and adaptive because problems are inevitable — but don’t let the fear of potential problems put us into an analysis paralysis spiral.
Cloud Hosting
I think that’s a major enabler of continuous delivery, with the real goal for us being more flexible in how our development, testing, and production environments are configured as we continue to break up the monolith codebases and change our current architecture. I’d also love for us to be able to flexibly spin up environments for testing on demand, and tear them down when they’re not needed without a lot of formal paperwork in the middle.
There might also be an argument for shifting to the cloud if we could reduce hosting and production support costs along the way, but I think there’s a lot of analysis left to do before we can make that claim to the folks in the high backed chairs.
System Performance
Good runtime performance and meeting our SLAs is absolutely vital for us as a medical analytics company. I wrestled quite a bit with making this a first class goal of our “modernization” initiative and came down on the side of “yes, but…” My thinking here, with some agreement from other folks, is that system performance issues will be much easier to address when we’re backed by a continuous delivery backbone.
There’s something to be said for doing upfront architecture work to consider known performance risks before a single line of code is written, but the truth is that a great deal of the code is already written. Moreover, the performance issues and bottlenecks that pop up in production aren’t always where we would have expected them to be during upfront architecture efforts anyway.
Improving performance in a complicated system is generally going to require a lot of measurement and iteration. Knowing that, having the faster release cycle made safe by effective automated test coverage should help us react more quickly to performance problems or take advantage of newer ideas to improve performance as we learn more about how our systems behave or gain some insights into client data sets. Likewise, we’ll have to improve our production monitoring and instrumentation anyway to enable continuous delivery, and we’re hopeful that that will also help us more quickly identify and diagnose performance issues.
To phrase this a bit more bluntly, I believe that upfront design and architecture can be valuable and sometimes necessary, but consistent success in software development is more likely a result of feedback and adaptation over time than being dependent on getting everything right the first time.
Ending this post abruptly….
I’m tired, it’s late, and I’m going to play the trick of making this a blog series instead of one gigantic post that never gets finished. In following posts, I’d like to discuss my thoughts on:
Creating the circumstances for “Developer Happiness” with some thinking about what kind of organizational structure and technical ecosystem allows developers and testers to be maximally productive and at least have a chance to be happy within their roles
Some thinking around micro-services and micro-frontends as we try to break up the big ol’ monoliths with some focus on intermediate steps to get there
I trot out one of these posts at the beginning of each year, but this time around it’s “aspirations” instead of “plans” because a whole lot of stuff is gonna be a repeat from 2020 and 2021 and I’m not going to lose any sleep over what doesn’t get done in the New Year or not be open to brand new opportunities.
In 2022 I just want the chance to interact with other developers. I’ll be at ThatConference in Round Rock, TX in January May? speaking about Event Sourcing with Marten (my first in person conference since late 2019). Other than that, my only goal for the year (Covid-willing) is to maybe speak at a couple more in person conferences just to be able to interact with other developers in real space again.
My peak as a technical blogger was the late aughts, and I think I’m mostly good with not sweating any kind of attempt to regain that level of readership. I do plan to write material that I think would be useful for my shop, or just about what I’m doing in the OSS space when I feel like it.
Which brings me to the main part of this post, my involvement with the JasperFx (Marten, Lamar, etc.) family of OSS projects (plus Storyteller), which takes up most of my extracurricular software-related time. Just for an idea of the interdependencies, here are the highlights of the JasperFx world:
.NET Transactional Document DB and Event Store on PostgreSQL
Marten took a big leap forward late in 2021 with the long running V4.0 release. I think that release might have been the single biggest, most complicated OSS release that I’ve ever been a part of — FubuMVC 1.0 notwithstanding. There’s also a 5.0-alpha release out that addresses .Net 6 support and the latest version of Npgsql.
Right now Marten is a victim of its own success, and our chat room is almost constantly hair on fire with activity, which directly led to some planned improvements for V5 (hopefully by the end of January?) in this discussion thread:
Multi-tenancy through a separate database per tenant (long planned, long delayed, finally happening now)
Some kind of ability to register and resolve services for more than one Marten database in a single application
And related to the previous two bullet points, improved database versioning and schema migrations that could accommodate there being more than one database within a single .Net codebase
Improve the “generate ahead” model to make it easier to adopt. Think faster cold start times for systems that use Marten
Beyond that, some of the things I’d like to maybe do with Marten this year are:
Investigate the usage of Postgresql table partitioning and database sharding as a way to increase scalability — especially with the event sourcing support
Projection snapshotting
In conjunction with Jasper, expand Marten’s asynchronous projection support to shard projection work across multiple running nodes, introduce some sort of optimized, no downtime projection rebuilds, and add some options for event streaming with Marten and Kafka or Pulsar
Try to build an efficient GraphQL adapter for Marten. And by efficient, I mean that you wouldn’t have to bounce through a Linq translation first and hopefully could opt into Marten’s JSON streaming wherever possible. This isn’t likely, but sounds kind of interesting to play with.
In a perfect, magic, unicorns and rainbows world, I’d love to see the Marten backlog in GitHub get under 50 items and stay there permanently. Commence laughing at me on that one:(
Jasper is a toolkit for common messaging scenarios between .Net applications with a robust in process command runner that can be used either with or without the messaging.
I started working on rebooting Jasper with a forthcoming V2 version late last year, and made quite a bit of progress before Marten got busy and the release of .Net 6 necessitated other work. There’s a non-zero chance I will be using Jasper at work, which makes that a much more viable project. I’m currently in flight with:
Building Open Telemetry tracing directly into Jasper
Bi-directional compatibility with MassTransit applications (absolutely necessary to adopt this in my own shop).
Performance optimizations
.Net 6 support
Documentation overhaul
Kafka as a message transport option (Pulsar was surprisingly easy to add, and I’m hopeful that Kafka is similar)
And maybe, just maybe, I might extend Jasper’s somewhat unique middleware approach to web services utilizing the new ASP.Net Core Minimal API support. The idea there is to more or less create an improved version of the old FubuMVC idiom for building web services.
Lamar is a modern IoC container and the successor to StructureMap
I don’t have any real plans for Lamar in the new year, but there are some holes in the documentation, and a couple advanced features could sure use some additional examples. 2021 ended up being busy for Lamar though with:
Lamar v7 added support for IAsyncEnumerable (also finally), a small enhancement for the Minimal API feature in ASP.Net Core, and .Net 6 support
Add Robust Command Line Options to .Net Applications
Oakton did have a major v4/4.1 release to accommodate .Net 6 and ASP.Net Core Minimal API usage late in 2021, but I have yet to update the documentation. I would like to shift Oakton’s documentation website to VitePress first. The only other plan I have for Oakton this year is to maybe see if there’d be a good way for Oakton to enable “buddy” command line tools for your application, like the dotnet ef tool, using the HostFactoryResolver class.
The bustling metropolis of Alba, MO
Alba is a wrapper around the ASP.Net Core TestServer for declarative, in process testing of ASP.Net Core web services. I don’t have any plans for Alba in the new year other than to respond to any issues or opportunities to smooth out usage that come out of my shop’s use of Alba.
Alba did get a couple major releases in 2021 though:
Solutions for creating robust, human readable acceptance tests for your .Net or CoreCLR system and a means to create “living” technical documentation.
Storyteller has been mothballed for years, and I was ready to abandon it last year, but…
We still use Storyteller for some big, long running integration style tests in both Marten and Jasper where I don’t think xUnit/NUnit is a good fit, and I think maybe I’d like to reboot Storyteller later this year. The “new” Storyteller (I’m playing with the idea of calling it “Bobcat” as it might be a different tool) would be quite a bit smaller and much more focused on enabling integration testing rather than trying to be a BDD tool.
Not sure what the approach might be, it could be:
“Just” write some extension helpers to xUnit or NUnit for more data intensive tests
“Just” write some extension helpers to SpecFlow
Rebuild the current Storyteller concept, but also support a Gherkin model
Something else altogether?
My goal, if this happens, is to have a tool for automated testing that maybe supports:
Much more data intensive tests
Better handling of integration tests
Strong support for test parallelization and even test run sharding in CI
Could help write characterization tests with a record/replay kind of model against existing systems (I’d *love* to have this at work)
Has some kind of model that is easy to use within an IDE like Rider or VS, even if there is a separate UI like Storyteller does today
And I’d still like to rewrite a subset of the existing Storyteller UI as an excuse to refresh my front end technology skillset.
To be honest, I don’t feel like Storyteller has ever been much of a success, but it’s the OSS project of mine that I’ve most enjoyed working on and most frequently used myself.
Weasel
Weasel is a set of libraries for database schema migrations and ADO.Net helpers that we spun out of Marten during its V4 release. I’m not super excited about doing this, but Weasel is getting some sort of database migration support very soon. Weasel isn’t documented itself yet, so that’s the only major plan other than supporting whatever Marten and/or Jasper needs this year.
Baseline
Baseline is a grab bag of helpers and extension methods that dates back to the early FubuMVC project. I haven’t done much with Baseline in years, and it might be time to prune it a little bit as some of what Baseline does is now supported in the .Net framework itself. The file system helpers especially could be pruned down, but then also get asynchronous versions of what’s left.
StructureMap
I don’t think that I got a single StructureMap question last year and stopped following its Gitter room. There are still plenty of systems using StructureMap out there, but I think the mass migration to either Lamar or another DI container is well underway.
TL;DR: Marten’s compiled query feature makes using Linq queries significantly more efficient at runtime if you need to wring out just a little more performance in your Marten-backed application.
I was involved in a twitter conversation today that touched on the old Specification pattern of describing a reusable database query with an object (watch it, that word is overloaded in the software development world and even refers to separate design patterns). I mentioned that Marten actually has an implementation of this pattern we call Compiled Queries.
Jumping right into a concrete example, let’s say that we’re building an issue tracking system because we hate Jira so much that we’d rather build one completely from scratch. At some point you’re going to want to query for all open issues currently assigned to a user. Assuming our new Marten-backed issue tracker has a document type called Issue, a compiled query class for that would look like this:
// ICompiledListQuery<T> is from Marten
public class OpenIssuesAssignedToUser: ICompiledListQuery<Issue>
{
    public Expression<Func<IMartenQueryable<Issue>, IEnumerable<Issue>>> QueryIs()
    {
        return q => q
            .Where(x => x.AssigneeId == UserId)
            .Where(x => x.Status == "Open");
    }

    // This is an input parameter to the query
    public Guid UserId { get; set; }
}
And now in usage, we’ll just spin up a new instance of the OpenIssuesAssignedToUser to query for the open issues for a given user id like this:
var store = DocumentStore.For(opts =>
{
    opts.Connection("some connection string");
});

await using var session = store.QuerySession();

var issues = await session.QueryAsync(new OpenIssuesAssignedToUser
{
    UserId = userId // passing in the query parameter to a known user id
});

// do whatever with the issues
Other than the weird method signature of the QueryIs() method, that class is pretty simple if you’re comfortable with Marten’s superset of Linq. Compiled queries can be valuable anywhere where the old Specification (query objects) pattern is useful, but here’s the cool part…
Compiled Queries are Faster
Linq has been an awesome addition to the .Net ecosystem, and it’s usually the very first thing I mention when someone asks me why they should consider .Net over Java or any other programming ecosystem. On the down side though, it’s complicated as hell, there’s some runtime overhead to generating and parsing Linq queries at runtime, and most .Net developers don’t actually understand how it works internally under the covers.
The best part of the compiled query feature in Marten is that on the first usage of a compiled query type, Marten memoizes its “query plan” for the represented Linq query so there’s significantly less overhead for subsequent usages of the same compiled query type within the same application instance.
To illustrate what’s happening when you issue a Linq query, consider the same logical query as above, but this time in inline Linq:
var issues = await session.Query<Issue>()
    .Where(x => x.AssigneeId == userId)
    .Where(x => x.Status == "Open")
    .ToListAsync();

// do whatever with the issues
When the Query() code above is executed, Marten is:
Building an entire object model in memory using the .Net Expression model.
Linq itself never executes any of the code within Where() or Select() clauses; instead, the Linq provider parses and interprets that Expression object model with a series of internal Visitor types.
The result of visiting the Expression model is a corresponding, internal IQueryHandler object that “knows” how to build up the SQL for the query, how to process the resulting rows returned by the database, and how to coerce the raw data into the desired results (JSON deserialization, stashing things in identity maps or dirty checking records, etc).
Executing the IQueryHandler, which in turn writes out the desired SQL query to the outgoing database command
Making the actual call to the underlying Postgresql database to return a data reader
Interpreting the data reader and coercing the raw records into the desired results for the Linq query
Sounds kind of heavyweight when you list it all out. When we move the same query to a compiled query, we only have to incur the cost of parsing the Linq query Expression model once, and Marten “remembers” the exact SQL statement, how to map query inputs like OpenIssuesAssignedToUser.UserId to the right database command parameter, and even how to process the raw database results. Behind the scenes, Marten is generating and compiling a new class at runtime to execute the OpenIssuesAssignedToUser query like this (I reformatted the generated source code just a little bit here):
using System.Collections.Generic;
using Marten.Internal;
using Marten.Internal.CompiledQueries;
using Marten.Linq;
using Marten.Linq.QueryHandlers;
using Marten.Testing.Documents;
using NpgsqlTypes;
using Weasel.Postgresql;

namespace Marten.Testing.Internals.Compiled
{
    public class OpenIssuesAssignedToUserCompiledQuery: ClonedCompiledQuery<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _inner;
        private readonly OpenIssuesAssignedToUser _query;
        private readonly QueryStatistics _statistics;

        public OpenIssuesAssignedToUserCompiledQuery(IMaybeStatefulHandler inner, OpenIssuesAssignedToUser query,
            QueryStatistics statistics, HardCodedParameters hardcoded): base(inner, query, statistics, hardcoded)
        {
            _inner = inner;
            _query = query;
            _statistics = statistics;
            _hardcoded = hardcoded;
        }

        public override void ConfigureCommand(CommandBuilder builder, IMartenSession session)
        {
            var parameters = builder.AppendWithParameters(
                @"select d.id, d.data from public.mt_doc_issue as d where (CAST(d.data ->> 'AssigneeId' as uuid) = ? and d.data ->> 'Status' = ?)");

            parameters[0].NpgsqlDbType = NpgsqlDbType.Uuid;
            parameters[0].Value = _query.UserId;
            _hardcoded.Apply(parameters);
        }
    }

    public class OpenIssuesAssignedToUserCompiledQuerySource: CompiledQuerySource<IEnumerable<Issue>, OpenIssuesAssignedToUser>
    {
        private readonly HardCodedParameters _hardcoded;
        private readonly IMaybeStatefulHandler _maybeStatefulHandler;

        public OpenIssuesAssignedToUserCompiledQuerySource(HardCodedParameters hardcoded,
            IMaybeStatefulHandler maybeStatefulHandler)
        {
            _hardcoded = hardcoded;
            _maybeStatefulHandler = maybeStatefulHandler;
        }

        public override IQueryHandler<IEnumerable<Issue>> BuildHandler(OpenIssuesAssignedToUser query,
            IMartenSession session)
        {
            return new OpenIssuesAssignedToUserCompiledQuery(_maybeStatefulHandler, query, null, _hardcoded);
        }
    }
}
What else can compiled queries do?
Besides being faster than raw Linq and being useful as the old reliable Specification pattern, compiled queries can be very valuable if you absolutely insist on mocking or stubbing the Marten IQuerySession/IDocumentSession. You should never, ever try to mock or stub the IQueryable interface with a dynamic mock library like NSubstitute or Moq, but mocking the IQuerySession.Query<T>(T query) method is pretty straightforward.
Hey, I blog a lot about the OSS tools I work on, so this week I’m going in a different direction and blogging about other OSS tools I use in daily development. In no small part, this blog post is a demonstration to some of my colleagues to get them to weigh in on the approach I took here.
I’ve been dragging my feet for way, way too long at work on what’s going to be our new centralized identity provider service based on Identity Server 5 from Duende Software. It is the real world, so for the first phase of things, the actual user credentials are stored in an existing Sql Server database, with roughly a database per client strategy of multi-tenancy. For this new server, I’m introducing a small lookup database to store the locations of the client specific databases. So the new server has this constellation of databases:
After some initial spiking, the first serious thing I did was to set up the automated developer build for the codebase. For local development, I need a script that:
Sets up multiple Sql Server databases for local development and testing
Restores Nuget dependencies
Builds the actual C# code (and might later delegate to NPM if there’s any JS/TS code in the user interface)
Runs all the tests in the codebase
For very simple projects I’ll just use the dotnet command line to run tests from the command line in CI builds or at Git commit time. Likewise in Node.js projects, npm by itself is frequently good enough. If all there was was the C# code, dotnet test would be enough of a build script in this Identity Server project, but the database requirements are enough to justify a more complex build automation approach.
Build Scripting with Bullseye
Until very recently, I still used the Ruby-based Rake tooling for build scripting, but Ruby as a scripting language has definitely fallen out of favor in .Net circles. After Babu Annamalai introduced Bullseye/SimpleExec into Marten, I’m now using Bullseye as my go-to build scripting tool.
At least in my development circles, make-like, task-oriented build automation tools have definitely lost popularity in recent years. But in this identity server project, that’s exactly what I want for build automation. My task-oriented build scripting tool of choice for .Net work is the combination of Bullseye with SimpleExec. Bullseye itself is very easy to use because you’re using C# in a small .Net project. Because it’s just a .Net console application, you also have complete access to Nuget libraries — as we’ll exploit in just a bit.
To get started with Bullseye, I created a folder called build off of the repository root of my identity server codebase, and created a small .Net console application that I also call build. You can see an example of this in the Lamar codebase.
Because we’ll need this in a minute, I’ll also place some wrapper scripts at the root directory of the repository to call the build project called build.cmd, build.ps1, and build.sh for Windows, Powershell, and *nix development. The build.cmd file is just delegating to the new build project and passing all the command line variables like so:
@echo off
dotnet run --project build/build.csproj -c Release -- %*
Back to the new build project, I added Nuget references to Bullseye and SimpleExec. In the Program.Main() function (this could be a little simpler with the new streamlined .Net 6 entry point), I’ll add a couple static namespace declarations:
using static Bullseye.Targets;
using static SimpleExec.Command;
Now we’re ready to write our first couple tasks directly into the Program code file. I still prefer to have separately executable tasks restoring Nugets, compiling, and running all the tests so you can run partial builds at will. In this case, using some sample code from the Oakton build script:
// Just delegating to the dotnet cli to restore nugets
Target("restore", () =>
{
    Run("dotnet", "restore src/Oakton.sln");
});

// compile the whole solution, but after running
// the restore task
Target("compile", DependsOn("restore"), () =>
{
    Run("dotnet", "build src/Oakton.sln --no-restore");
});

Target("test", DependsOn("compile"), () =>
{
    RunTests("Tests");
});

// Little helper function to execute tests by test project name
// still just delegating to the command line
private static void RunTests(string projectName, string directoryName = "src")
{
    Run("dotnet", $"test --no-build {directoryName}/{projectName}/{projectName}.csproj");
}
We’ve got a couple more steps to make this a full build script. We also need this code at the very bottom of our Program.Main() function to actually run tasks:
RunTargetsAndExit(args);
I typically have an explicit “default” task that gets executed when you just type build / ./build.sh that usually just includes other named tasks. In the case of Oakton, it runs the unit test task plus another task called “commands” that smoke tests several command line calls:
Target("default", DependsOn("test", "commands"));
Usually, I’ll also use a “ci” task that is intended for continuous integration builds that is a superset of the default build task with extra integration tests or Nuget publishing (this isn’t as common now that we tend to use separate GitHub actions for Nuget publishing). In Oakton’s case the “ci” task is exactly the same:
Target("ci", DependsOn("default"));
After all that is in place, and working in Windows at the moment, I like to make git commits with the old “check in dance” like this:
build && git commit -a -m "some commit message"
Less commonly, but still valuable, let’s say that Microsoft has just released a new version of .Net that causes a cascade of Nuget updates and other havoc in your projects. While working through that, I’ll frequently do something like this to work out Nuget resolution issues:
git clean -xfd && build restore
Or once in a while the IDE build error window can be misleading, so I’ll build from the command line with:
build compile
So yeah, most of the build “script” I’m showing here is just delegating to the dotnet CLI and it’s not very sophisticated. I still like having this file so I can jump between my projects and just type “build” or “build compile” without having to worry about what the solution file name is, or telling dotnet test which projects to run. That being said though, let’s jump into something quite a bit more complicated.
Adding Sql Server and EF Core into the Build Script
For the sake of testing my new little identity server, I need at least a couple different client databases plus the lookup database. Going back to first principles of Agile Development practices, it should be possible for a brand new developer to do a clean clone of the new identity server codebase and very quickly be running the entire service and all its tests. I’m going to pull that off by adding new tasks to the Bullseye script to set up databases and automate all the testing.
First up, I don’t need very much data for testing, so running Sql Server in Docker is more than good enough, and I’ll add this docker-compose.yml file to my repository:
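Roughly, that file looks like the sketch below. The image tag and SA password are placeholder values; the part that actually matters is the port mapping, which I’ll call out next:

version: '3'
services:
  sqlserver:
    image: "mcr.microsoft.com/mssql/server:2019-latest"
    ports:
      # Non-default host port so this container can run alongside a locally installed Sql Server
      - "1435:1433"
    environment:
      - "ACCEPT_EULA=Y"
      - "SA_PASSWORD=P@55w0rd"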
The only thing interesting to note is that I mapped a non-default port number (1435) to this container for the sole sake of being able to run this container in parallel to the Sql Server instance I have to have for other projects at work. Back to Bullseye, and I’ll add a new task to delegate to docker compose to start up Sql Server:
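That task doesn’t have to be anything more than a shell-out. Here’s a minimal sketch (the “docker-up” name is what the “database” task further down depends on; the body is just the obvious docker compose call):

// Stand up the Dockerized Sql Server defined in docker-compose.yml
Target("docker-up", async () =>
{
    await RunAsync("docker", "compose up -d");
});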
And trust me on this one, the Docker setup is asynchronous, so you actually need to make your build script wait a little bit until the new Sql Server database is accessible before doing anything else. For that purpose, I use this little function:
public static async Task WaitForDatabaseToBeReady()
{
    Console.WriteLine("Waiting for Sql Server to be available...");

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    while (stopwatch.Elapsed.TotalSeconds < 30)
    {
        try
        {
            // ConnectionSource is really just exposing a constant
            // with the known connection string to the Dockerized
            // Sql Server
            await using var conn = new SqlConnection(ConnectionSource.ConnectionString);
            await conn.OpenAsync();

            var cmd = conn.CreateCommand();
            cmd.CommandText = "select 1";
            await cmd.ExecuteReaderAsync();

            Console.WriteLine("Sql Server is up and ready!");
            return;
        }
        catch (Exception)
        {
            await Task.Delay(250);
            Console.WriteLine("Database not ready yet, trying again.");
        }
    }
}
Next, I need some code to create additional databases (I’m sure you can do this somehow in the docker compose file itself, but I didn’t know how at the time and this was easy). I’m going to omit the actual CREATE DATABASE calls, but just know there’s a method with this signature on a static class in my build project called Database:
public static async Task BuildDatabases()
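A purely illustrative version of that method might look something like this sketch; the IF DB_ID guard and the reuse of the ConnectionSource helper are my assumptions, not the actual code:

public static async Task BuildDatabases()
{
    // Connect to the master database on the Dockerized Sql Server
    await using var conn = new SqlConnection(ConnectionSource.ConnectionStringForDatabase("master"));
    await conn.OpenAsync();

    // The lookup database plus the three client credential databases used below
    foreach (var name in new[] { "identity", "environment1", "environment2", "environment3" })
    {
        // Only create each database if it doesn't already exist
        var cmd = conn.CreateCommand();
        cmd.CommandText = $"IF DB_ID('{name}') IS NULL CREATE DATABASE [{name}]";
        await cmd.ExecuteNonQueryAsync();
    }
}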
I’m using EF Core for data access in this project, and also using EF Core migrations to do database schema building, so we’ll want the dotnet ef tooling available, so I added a task for just that:
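That task is little more than a call to dotnet tool. Something along these lines (the task name and the choice of a global tool install are mine, not necessarily what the real script does):

// Make sure the dotnet-ef command line tool is available for the migration tasks below
Target("install-ef", () =>
{
    try
    {
        Run("dotnet", "tool install --global dotnet-ef");
    }
    catch (Exception)
    {
        // Most likely just means the tool is already installed, which is fine
        Console.WriteLine("dotnet-ef is already installed");
    }
});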
The dotnet ef command line usage has a less than memorable pattern of usage, so I made a little helper function that’s gonna get called for different combinations of EF Core context name and database connection strings:
public static async Task RunEfUpdate(string contextName, string databaseName)
{
    Console.WriteLine($"Running EF Migration for context {contextName} on database '{databaseName}'");

    // ConnectionSource is a little helper specific to my
    // identity server project
    var connection = ConnectionSource.ConnectionStringForDatabase(databaseName);

    await Command.RunAsync("dotnet",
        $"ef database update --project src/ProjectName/ProjectName.csproj --context {contextName} --connection \"{connection}\"");
}
For a little more context, I have two separate EF Core DbContext classes (obfuscated from the real code):
LookupDbContext — the “master” registry of client databases by client id
IdentityDbContext — addresses a single client database holding user credentials
And now, after all that work, here’s a Bullseye script that can stand up a new Sql Server database in Docker, build the required databases if necessary, establish baseline data, and run the correct EF Core migrations as needed:
Target("database", DependsOn("docker-up"), async () =>
{
    // "Database" is a static class in my build project where
    // I've dumped database helper code
    await Database.BuildDatabases();

    // RunEfUpdate is delegating to dotnet ef
    await Database.RunEfUpdate("LookupDbContext", "identity");

    // Not shown, but fleshing out some static lookup data
    // with straight up SQL calls

    // Running migrations on all three test databases for client
    // credential databases
    await Database.RunEfUpdate("IdentityDbContext", "environment1");
    await Database.RunEfUpdate("IdentityDbContext", "environment2");
    await Database.RunEfUpdate("IdentityDbContext", "environment3");
});
Now, the tests for this identity server are almost all going to be integration tests, so I won’t even bother separating out integration tests from unit tests. That being said, our main test library is going to require the Sql Server database built above to be available before the tests are executed, so I’m going to add a dependency to the test task like so:
// The database is required
Target("test", DependsOn("compile", "database"), () =>
{
    RunTests("Test Project Name");
});
Now, when someone does a quick clone of this codebase, they should be able to just run the build.cmd/ps1/sh script and, assuming that they already have the correct version of .Net and Docker Desktop installed:
Restore all the Nuget dependencies
Compile the entire solution
Start a new Sql Server instance in Docker with all the testing databases built out with the correct database structure and lookup data
Execute all the automated tests
Bonus Section: Integration with GitHub Actions
I’m a little bit old school with CI. I grew up in the age when you tried to keep your CI set up as crude as possible and mostly just delegated to a build script that did all the actual work. To that end, if I’m using Bullseye as my build scripting tool and GitHub Actions for CI, I delegate to Bullseye like this from the Oakton project:
The very bottom line of code is the pertinent part that delegates to our Bullseye script and runs the “ci” target that’s my own idiom. Part of the point here is to have the build script steps committed and versioned to source control — which these days is also done with the YAML GitHub action definition files, so that’s not as important as it used to be. What is still important today is that coding in YAML sucks, so I try to keep most of the actual functionality in nice, clean C#.
Bonus: Why didn’t you…????
Why didn’t you just use MSBuild? It’s possible to use MSBuild as a task runner, but no thank you. I was absolutely sick to death of coding via XML in NAnt when MSBuild was announced, and I’ll admit that I never gave MSBuild the time of day. I’ll pass on more coding in Xml.
Why didn’t you just use Nuke or Cake? I’ve never used Nuke and can’t speak to it. I’m not a huge Cake fan, and Bullseye is a simpler model to me.
Why didn’t you just use Powershell? You end up making powershell scripts call other scripts and it clutters the file system up.
Alba is a small open source library that is a helper for integration testing against ASP.Net Core HTTP methods that makes the underlying ASP.Net Core TestServer easier and more declarative to use within tests.
Continuing a busy couple weeks of OSS work getting tools on speaking terms with .Net 6, Alba v6.0 was released early this week with support for .Net 6 and the new WebApplication bootstrapping model within ASP.Net Core 6.0. Before I dive into the details, a big thanks to Hawxy who did most of the actual coding for this release.
The biggest change was getting Alba ready to work with the new WebApplicationBuilder and WebApplicationFactory models in ASP.Net Core such that Alba can be used with any typical way to bootstrap an ASP.Net Core project. See the Alba Setup page in the documentation for more details.
Using Alba with Minimal API Projects
From Alba’s own testing, let’s say you have a small Minimal API project that’s bootstrapped like this in your web services Program file:
using System;
using Microsoft.AspNetCore.Builder;
var builder = WebApplication.CreateBuilder(args);
// Add services to the container.
var app = builder.Build();
// Configure the HTTP request pipeline.
app.UseHttpsRedirection();
app.MapGet("/", () => "Hello World!");
app.MapGet("/blowup", context => throw new Exception("Boo!"));
app.Run();
Alba's old (and still supported) model of using the application's HostBuilder from the .Net 5 project templates is no help here, but that's okay, because Alba now also understands how to use WebApplicationFactory to bootstrap the application shown above. Here's some sample code to do just that in a small xUnit test:
// WebApplicationFactory can resolve old and new style of Program.cs
// .NET 6 style - the global:: namespace prefix would not be required in a normal test project
await using var host = await AlbaHost.For<global::Program>(x =>
{
x.ConfigureServices((context, services) =>
{
services.AddSingleton<IService, ServiceA>();
});
});
host.Services.GetRequiredService<IService>().ShouldBeOfType<ServiceA>();
var text = await host.GetAsText("/");
text.ShouldBe("Hello World!");
And you’re off to the races and authoring integration tests with Alba!
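Beyond one-off helpers like GetAsText(), Alba's Scenario API lets you declare the request and its assertions in one place. Here's a small sketch along the lines of Alba's documented usage, reusing the host from the test above (details can vary a bit by Alba version):
// Run an HTTP scenario end to end through the real ASP.Net Core pipeline
await host.Scenario(_ =>
{
    _.Get.Url("/");
    _.StatusCodeShouldBeOk();
    _.ContentShouldBe("Hello World!");
});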
How Alba and ASP.Net Have Evolved
The code that ultimately became Alba has been around for over a decade, and I think it's a little interesting to see the evolution of web development in .Net through Alba's history and my own.
1998: I built a couple internal, “shadow IT” applications for my engineering team with ASP “classic” and fell in love with web development
2003: I was part of a small team building a new system on the brand new ASP.Net WebForms application model and fell out of love with web development for several years
~2015: I ripped Alba into its own library and ported the code to work against the OWIN model
2017: Alba 1.0 was ported to the new ASP.Net Core, but used its own special sauce to run HTTP requests in memory with stubbed out HttpContext objects
2018: Alba 2.0 accommodated a world of changes from the release of ASP.Net Core 2.*. There were temporarily separate Nugets for ASP.Net Core 1 and ASP.Net Core 2 because the models were so different. That sucked.
2019: Alba 3.0 was released supporting ASP.Net Core 3.*, and ditched all support for anything on the full .Net framework. At this point Alba’s internals were changed to utilize the ASP.Net Core TestServer and HostBuilder models
2020: Alba 4.0 supported ASP.Net Core 5.0
August 2021: Alba 5.0 added a new extension model with initial extensions for testing applications secured by JWT bearer tokens
December 2021: .Net 6 came with a lot of changes to the ASP.Net Core bootstrapping model, so here we are with a brand new Alba 6.0.
Lots and lots of changes in the web development world within .Net, and I’m betting that’s not completely done changing. For my part, Alba isn’t the most widely used library, but there’s more than enough usage for me to feel good about a piece of fubumvc living on. Plus we use it at work for integration testing, so Alba is definitely going to live on.
It’s been a busy couple weeks in OSS world for me scurrying around and getting things usable in .Net 6. Today I’m happy to announce the release of Lamar 7.0. The Nuget for Lamar itself and Lamar.Microsoft.DependencyInjection with adjusted dependencies for .Net 6 went up yesterday, and I made some additions to the documentation website just now. There are no breaking changes in the API, but Lamar dropped all support for any version of .Net < .Net 5.0. Before I get into the highlights, I’d like to thank:
Babu Annamalai for making the docs so easy to re-publish
Andrew Lock for writing some very helpful blog posts about new .Net 6 internals that have helped me get through .Net 6 improvements to several tools the past couple weeks.
Lamar and Minimal API
Lamar v7 adds some specific support for better usability of the new Minimal API feature in ASP.Net Core. Below is the sample we use in the Lamar documentation and the internal tests:
var builder = WebApplication.CreateBuilder(args);
// use Lamar as DI.
builder.Host.UseLamar((context, registry) =>
{
// register services using Lamar
registry.For<ITest>().Use<MyTest>();
registry.IncludeRegistry<MyRegistry>();
// add the controllers
registry.AddControllers();
});
var app = builder.Build();
app.MapControllers();
// [FromServices] is NOT necessary when using Lamar v7
app.MapGet("/", (ITest service) => service.SayHello());
app.Run();
The Lamar IContainer itself, and all nested containers (scoped containers in .Net DI nomenclature), implement both IDisposable and IAsyncDisposable. It is not necessary to call both Dispose() and DisposeAsync(); calling either one will dispose all tracked IDisposable / IAsyncDisposable objects.
// Asynchronously disposing the container
await container.DisposeAsync();
The following table explains what method is called on a tracked object when the creating container is disposed:
If an object implements…         | Container.Dispose()                     | Container.DisposeAsync()
IDisposable                      | Dispose()                               | Dispose()
IAsyncDisposable                 | DisposeAsync().GetAwaiter().GetResult() | DisposeAsync()
IDisposable and IAsyncDisposable | DisposeAsync()                          | DisposeAsync()
If any objects are being created by Lamar that only implement IAsyncDisposable, it is probably best to strictly use Container.DisposeAsync() to avoid any problematic mixing of sync and async code.
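To make that concrete, here's a minimal sketch that leans on async disposal, reusing the hypothetical ITest / MyTest registrations from the Minimal API sample above:
// Prefer await using so that DisposeAsync() is what ultimately runs,
// which handles both sync and async disposables without blocking
await using var container = new Container(services =>
{
    services.For<ITest>().Use<MyTest>();
});

var test = container.GetInstance<ITest>();
// ... use the resolved services ...
// DisposeAsync() is called automatically at the end of the enclosing scope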
We’ve got an upcoming Marten 5.0 release ostensibly to support breaking changes related to .Net 6, but that also gives us an opportunity to consider work that would result in breaking API changes. A strong candidate for V5 right now is finally adding long delayed first class support for multi-tenancy through separate databases.
Let’s say that you’re building an online database-backed, web application of some sort that will be servicing multiple clients. At a minimum, you need to isolate data access so that client users can only interact with the data for the correct client or clients. Ideally, you’d like to get away with only having one deployed instance of your application that services the users of all the clients. In other words, you want to support “multi-tenancy” in your architecture.
For the rest of this post, I’m going to use the term “tenant” to refer to whatever the organizational entity is that owns separate database data. Depending on your business domain, that could be a client, a sub-organization, a geographic area, or some other organizational concept.
There are three basic approaches to segregating tenant data in a database:
Single database, single schema, but use a field or property in each table to denote the tenant. This is Marten’s approach today with what we call the “Conjoined” model. The challenge here is that all queries and writes to the database need to take into account the currently used tenant — and that’s where Marten’s multi-tenancy support helps a great deal. Database schema management is easier with this approach because there’s only one set of database objects to worry about. More on this later.
Separate schema per tenant in a single database. Marten does not support this model, and it doesn’t play well with Marten’s current internal design. I seriously doubt that Marten will ever support this.
Separate database per tenant. This has been in Marten’s backlog forever, and maybe now is the time this finally gets done (plenty of folks have used Marten this way already with custom infrastructure on top of Marten, but there’s some significant overhead). I’ll speak to this much more in the last section of this post.
Basic Multi-Tenancy Support in Marten
To enable multi-tenancy in your document storage with Marten, we can configure a document store with these options:
var store = DocumentStore.For(opts =>
{
opts.Connection("some connection string");
// Let's just say that each and every document
// type is going to be multi-tenanted
opts.Policies.AllDocumentsAreMultiTenanted();
// Or you can do this document type by document type
// if some document types are not related to a tenant
opts.Schema.For<User>().MultiTenanted();
});
There’s a couple other ways to opt document types into multi-tenancy, but you get the point. With just this, we can start a new Marten session for a particular tenant and carry out basic operations isolated to a single tenant like so:
// Open a session specifically for the tenant "tenant1"
await using var session = store.LightweightSession("tenant1");
// This would return *only* the admin users from "tenant1"
var users = await session.Query<User>().Where(x => x.Roles.Contains("admin"))
.ToListAsync();
// This user would automatically be tagged as belonging to "tenant1"
var user = new User {UserName = "important_guy", Roles = new string[] {"admin"}};
session.Store(user);
await session.SaveChangesAsync();
The key thing to note here is that other than telling Marten which tenant you want to work with as you open a new session, you don’t have to do anything else to keep the tenant data segregated as Marten is dealing with those mechanics behind the scenes on all queries, inserts, updates, and deletions from that session.
Awesome, except that some folks needed to occasionally do operations against multiple tenants at one time…
Tenant Spanning Operations in Marten V4
The big improvement in Marten V4 for multi-tenancy was making it much easier to work with data from multiple tenants in one document session. Marten has long had the ability to query data across tenants with the AnyTenant() or TenantIsOneOf() Linq extensions, like so:
var allAdmins = await session.Query<User>()
.Where(x => x.Roles.Contains("admin"))
// This is a Marten specific extension to Linq
// querying
.Where(x => x.AnyTenant())
.ToListAsync();
Which is great for what it is, but there wasn’t any way to know what tenant each document returned belonged to. We made a huge effort in V4 to expand Marten’s document metadata capabilities, and part of that is the ability to write the tenant id to a document being fetched from the database by Marten. The easiest way to do that is to have your document type implement the new ITenanted interface like so:
public class MyTenantedDoc: ITenanted
{
public Guid Id { get; set; }
// This property will be set by Marten itself
// when the document is persisted or loaded
// from the database
public string TenantId { get; set; }
}
So now we at least have the ability to know which documents we queried across the tenants belong to which tenant.
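For instance, here's a small hedged sketch that combines a cross-tenant query with the new metadata, assuming the User document from the earlier samples also implements ITenanted:
// Query admin users across every tenant...
var allAdmins = await session.Query<User>()
    .Where(x => x.Roles.Contains("admin"))
    .Where(x => x.AnyTenant())
    .ToListAsync();

// ...then use the TenantId metadata that Marten wrote onto each document
// to bucket the results by tenant
var adminsByTenant = allAdmins
    .GroupBy(x => x.TenantId)
    .ToDictionary(g => g.Key, g => g.ToList());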
The next thing folks wanted from V4 was the ability to make writes against multiple tenants with one document session in a single unit of work. To that end, Marten V4 introduced the ITenantOperations concept to record operations against specific tenants other than the tenant the current session was opened for, with all of those operations committed to the underlying Postgresql database as a single transaction.
To make that concrete, here's some sample code, but this time adding two new User documents with the same user name to two different tenants by tenant id:
// Same user name, but in different tenants
var user1 = new User {UserName = "bob"};
var user2 = new User {UserName = "bob"};
// This exposes operations against only tenant1
session.ForTenant("tenant1").Store(user1);
// This exposes operations that would apply to
// only tenant2
session.ForTenant("tenant2").Store(user2);
// And both operations get persisted in one transaction
await session.SaveChangesAsync();
So that’s the gist of the V4 multi-tenancy improvements. We also finally support multi-tenancy within the asynchronous projection support, but I’ll blog about that some other time.
Now though, it’s time to consider…
Database per Tenant
To be clear, I'm looking for any possible feedback about the requirements for this feature in Marten. Blast away in the comments here, on the GitHub issue for it, or on Gitter.
You can achieve multi-tenancy through a database per tenant today, and many folks have, just by keeping an otherwise identically configured DocumentStore per named tenant in memory, with the only difference being the connection string (see the naive sketch after this list). That certainly can work, especially with a low number of tenants. There are a few problems with that approach though:
You’re on your own to configure that in the DI container within your application
DocumentStore is a relatively expensive object to create, and it potentially generates a lot of runtime objects that get held in memory. You don’t really want a bunch of those hanging around
Going around AddMarten() negates the Marten CLI support, which is the easiest possible way to manage Marten database schema migrations. Now you’re completely on your own about how to do database migrations without using pure runtime database patching — which we do not recommend in production.
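To illustrate what that hand-rolled approach tends to look like, here's a naive, purely illustrative sketch; none of the type or method names below are Marten APIs beyond DocumentStore and IDocumentSession themselves:
using System;
using System.Collections.Concurrent;
using Marten;

// Hypothetical registry: one DocumentStore per tenant, differing only by connection string
public class NaiveTenantStores : IDisposable
{
    private readonly ConcurrentDictionary<string, IDocumentStore> _stores = new();
    private readonly Func<string, string> _connectionStringFor; // hypothetical lookup

    public NaiveTenantStores(Func<string, string> connectionStringFor)
        => _connectionStringFor = connectionStringFor;

    public IDocumentSession OpenSession(string tenantId)
    {
        // Each tenant pays the (expensive) DocumentStore build cost on first use,
        // and every store holds onto its own generated runtime objects
        var store = _stores.GetOrAdd(tenantId, id =>
            DocumentStore.For(opts => opts.Connection(_connectionStringFor(id))));

        return store.LightweightSession();
    }

    public void Dispose()
    {
        foreach (var store in _stores.Values) store.Dispose();
    }
}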
So let's just call it a given that we do want to add some formal support for multi-tenancy through separate databases per tenant to Marten. Moreover, database per tenant has been in our backlog forever, but it's been pushed off every time as we've struggled to get the big Marten releases out.
I think there’s some potential for this story to cause breaking API changes (I don’t have anything specific in mind, it’s just likely in my opinion), so that makes that story a very good candidate to get in place for Marten V5. From the backlog issue writeup I made back in 2017:
Have all tenants tracked in memory, such that a single DocumentStore can share all the expensive runtime built internal objects across tenants
A tenanting strategy that can look up the database connection string per tenant and create sessions for separate tenants (a hypothetical shape for that lookup is sketched after this list). There's actually an interface hook in Marten all ready to go that may serve out of the box when we do this (I meant to do this work years ago, but it just didn't happen).
At development time (AutoCreate != AutoCreate.None), be able to spin up a new database on the fly for a tenant if it doesn’t already exist
“Know” what all the existing tenants are so that we could apply database migrations from the CLI or through the DocumentStore schema migration APIs
Extend the CLI support to support multiple tenant databases
Make the database registry mechanism a little bit pluggable. I'm thinking that some folks will have only a few tenants and would be fine just writing everything into a static configuration file, while other folks may have a *lot* of tenants (I've personally worked on a system that had >100 separate tenant databases in one deployed application), so they may want a "master" database that knows about all the tenant databases.
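As a purely hypothetical illustration of the per-tenant lookup idea from the list above (this is not Marten's actual interface hook, just the shape of the problem):
using System.Collections.Generic;

// Hypothetical naming only; not a Marten API
public interface ITenantDatabaseLookup
{
    // Resolve the connection string for a single tenant
    string ConnectionStringFor(string tenantId);

    // "Know" what all the existing tenants are, e.g. so database
    // migrations can be applied from the CLI across every tenant database
    IReadOnlyList<string> AllKnownTenantIds();
}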
I'm going to have to admit that I got caught flat footed by the .Net 6 release a couple weeks ago. I hadn't really been paying much attention to the forthcoming changes, maybe got cocky by how easy the transition from netcoreapp3.1 to .Net 5 was, and have been unpleasantly surprised by how much work it's going to take to move some OSS projects up to .Net 6. All at the same time that the early adopters of the world are clamoring for all their dependencies to target .Net 6 yesterday.
All that being said, here’s my running list of plans to get the projects in the JasperFx GitHub organization successfully targeting .Net 6. I’ll make edits to this page as things get published to Nuget.
Baseline
Baseline is a grab bag utility library full of extension methods that I’ve relied on for years. Nobody uses it directly per se, but it’s a dependency of just about every other project in the organization, so it went first with the 3.2.2 release adding a .Net 6 target. No code changes were necessary other than adding .Net 6 to the CI testing. Easy money.
Oakton
EDIT: Oakton v4.0 is up on Nuget. WebApplication is supported, but you can’t override configuration in commands with this model like you can w/ HostBuilder only. I’ll do a follow up at some point to fill in this gap.
Oakton is a tool to add extensible command line options to .Net applications based on the HostBuilder model. Oakton is my problem child right now because it’s a dependency in several other projects and its current model does not play nicely with the new WebApplicationBuilder approach for configuring .Net 6 applications. I’d also like to get the Oakton documentation website moved to the VitePress + MarkdownSnippets model we’re using now for Marten and some of the other JasperFx projects. I think I’ll take a shortcut here and publish the Nuget and let the documentation catch up later.
Alba
Alba is an automated testing helper for ASP.Net Core. Just like Oakton, Alba worked very well with the HostBuilder model, but was thrown for a loop by the new WebApplicationBuilder configuration model that's the mechanism for using the new Minimal API (*cough* inevitable Sinatra copy *cough*) model. Fortunately though, Hawxy came through with a big pull request to make Alba finally work with the WebApplicationFactory model that can accommodate the new WebApplicationBuilder model, so we're back in business. Alba 5.1 will be published soon with that work after some documentation updates and hopefully some testing of the Oakton + WebApplicationBuilder + Alba combination.
EDIT: Alba 6.0 is up with the necessary changes, but the docs will come later this week
Lamar
Lamar is an IoC/DI container and the modern successor to StructureMap. The biggest issues with Lamar on .Net 6 were Nuget dependencies on the IServiceCollection model, plus needing some extra implementation to light up the implied service model of Minimal APIs. All the current unit tests and even the integration tests with ASP.Net Core are passing on .Net 6. What's left to finish up a new Lamar 7.0 release:
One .Net 6 related bug in the diagnostics
Better Minimal API support
Upgrade Oakton & Baseline dependencies in some of the Lamar projects
Documentation updates for the new IAsyncDisposable support and usage with WebApplicationBuilder with or without Minimal API usage
EDIT: Lamar 7.0 is up on Nuget with .Net 6 support
Marten/Weasel
We just made the gigantic V4 release a couple months ago knowing that we’d have to follow up quickly with a V5 release with a few breaking changes to accommodate .Net 6 and the latest version of Npgsql. We are having to make a full point release, so that opens the door for other breaking changes that didn’t make it into V4 (don’t worry, I think shifting from V4 to V5 will be easy for most people). The other Marten core team members have been doing most of the work for this so far, but I’m going to jump into the fray later this week to do some last minute changes:
Review some internal changes to Npgsql that might have performance impacts on Marten
Consider adding an event streaming model within the new V4 async daemon. For folks that wanna use that to publish events to some kind of transport (Kafka? Some kind of queue?) with strict ordering. This won’t be much yet, but it keeps coming up so we might as well consider it.
Multi-tenancy through multiple databases. It keeps coming up, and potentially causes breaking API changes, so we’re at least going to explore it
I’m trying not to slow down the Marten V5 release with .Net 6 support for too long, so this is all either happening really fast or not at all. I’ll blog more later this week about multi-tenancy & Marten.
Weasel is a spin off library from Marten for database change detection and ADO.Net helpers that are reused in other projects now. It will be published simultaneously with Marten.
Jasper
Oh man, I’d love, love, love to have Jasper 2.0 done by early January so that it’ll be available for usage at my company on some upcoming work. This work is on hold while I deal with the other projects, my actual day job, and family and stuff.