Real Life TDD Example

Continuing a new blog series that I started yesterday on the application and usage of Test Driven Development.

In this post I’m going to walk through how I used TDD myself to build a feature and try to explain why I wrote the tests I did and why I sequenced things the way I did. Along the way, I’ve dropped in short descriptions of ideas or techniques for getting the most out of TDD that I hope to revisit in longer form in subsequent posts.

I do generally use Test Driven Development (TDD) in the course of my own coding work, but these days the vast majority of that work is in open source projects off to the side. One of the open source projects I’m actively contributing to is a tool named “Wolverine” that is going to be a new command bus / mediator / messaging tool for .NET (it’s “Jasper” rebranded with a lot of improvements). I’ll be using Wolverine code for the code samples in this post going forward.

TDD’ing “Back Pressure” in Wolverine

One of the optional features of Wolverine is to buffer incoming messages from an external queue like Rabbit MQ in a local, in-process queue (through TPL Dataflow if you’re curious) before these messages are processed by the application’s message handlers. That’s sometimes great because it can speed up processing throughput quite a bit. It can also be bad if the local queue gets backed up and there are too many messages floating around creating memory pressure in your application.

To alleviate that concern, Wolverine uses the idea of “back pressure” to temporarily shut off local message listening from external message brokers if the local queue gets too big, and turn message listening back on only when the local queues get smaller as messages are successfully handled.

Here’s more information about “back pressure” from Derek Comartin.

Here’s a little diagram of the final structure of that back pressure subsystem and where it sits in the greater scope of things:

The diagram above reflects the final product only after I used Test Driven Development along the way to help shape the code. Rewinding a little bit, let me talk about the intermediate steps I took to get to this final, fully tested structure by going through some of my internal rules for TDD.

The first, most important step though is to just commit to actually doing TDD as you work. Everything else follows from that.

Writing that First Test

Like a lot of other things in life, coding is sometimes a matter of momentum or lack thereof. Developers can easily psych themselves into a state of analysis paralysis if they can’t immediately decide on exactly how the code should be designed from end to end. TDD can help here by letting you concentrate on a small area of the code you do know how to build, and verify that the new code works before you set it aside to work on the next step.

When I started the back pressure work, the very first test I wrote was to simply verify the ability for users to configure thresholds for when the message listener should be stopped and restarted on an endpoint-by-endpoint basis (think Rabbit MQ queue or a named, local in-memory queue). I also wrote a test for the default thresholds (which I made up on the spot) for cases where there was no explicit override.

Here’s the “Arrange” part of the first test suite:

public class configuring_endpoints : IDisposable
{
    private readonly IHost _host;
    private WolverineOptions theOptions;
    private readonly IWolverineRuntime theRuntime;

    public configuring_endpoints()
    {
        // This bootstraps a simple Wolverine system
        _host = Host.CreateDefaultBuilder().UseWolverine(x =>
        {
            // I'm configuring some known endpoints in the system. This is the "Arrange"
            // part of the tests
            x.ListenForMessagesFrom("local://one").Sequential().Named("one");
            x.ListenForMessagesFrom("local://two").MaximumParallelMessages(11);
            x.ListenForMessagesFrom("local://three").UseDurableInbox();
            x.ListenForMessagesFrom("local://four").UseDurableInbox().BufferedInMemory();
            x.ListenForMessagesFrom("local://five").ProcessInline();

            x.ListenForMessagesFrom("local://durable1").UseDurableInbox(new BufferingLimits(500, 250));
            x.ListenForMessagesFrom("local://buffered1").BufferedInMemory(new BufferingLimits(250, 100));
        }).Build();

        theOptions = _host.Get<WolverineOptions>();
        theRuntime = _host.Get<IWolverineRuntime>();
    }

I’m a very long-term user of ReSharper and now Rider from JetBrains, so I happily added the new BufferingLimits argument to the previously existing BufferedInMemory() method in the unit test and let Rider add the argument to the method based on its inferred usage within the unit test. It’s not really the point of this post, but absolutely lean on your IDE when writing code “test first” to generate stub methods or change existing methods based on the inferred usage from your test code. It’s frequently a way to go a little faster when doing TDD.
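For reference, BufferingLimits itself is presumably just a tiny value type carrying the two thresholds. Here’s a minimal sketch inferred from the usage in these tests rather than copied from the Wolverine source (the Default field is my own shorthand for the default limits the tests assert):

    // Sketch only -- inferred from the tests in this post, not the actual
    // Wolverine definition.
    // Maximum = local queue size at which listening should be paused
    // Restart = local queue size at which listening may resume
    public record BufferingLimits(int Maximum, int Restart)
    {
        // Hypothetical shorthand for the default limits asserted in the tests (1000/500)
        public static readonly BufferingLimits Default = new(1000, 500);
    }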

And next, here’s some of the little tests that I used to verify both the buffering limit defaults and overrides based on the new syntax above:

    [Fact]
    public void has_default_buffering_options_on_buffered()
    {
        var queue = localQueue("four");
        queue.BufferingLimits.Maximum.ShouldBe(1000);
        queue.BufferingLimits.Restart.ShouldBe(500);
    }

    [Fact]
    public void override_buffering_limits_on_buffered()
    {
        var queue = localQueue("buffered1");
        queue.BufferingLimits.Maximum.ShouldBe(250);
        queue.BufferingLimits.Restart.ShouldBe(100);
    }

It’s just a couple of simple tests with a little bit of admittedly non-trivial setup code, but you have to start somewhere. A few notes about why I started with those particular tests and how I decided to test that way:

  • Test Small before Testing Big — One of my old rules of doing TDD is to start by testing the building blocks of a new user story/feature/bug fix before attempting to write a test that spans the entire flow of the new code. In this case, I want to prove out that just the configuration element of this complicated new functionality works before I even think about running the full stack. Using this rule should help you keep your debugger on the sidelines. More on this in later posts.
  • Bottom Up or Top Down — You can start by trying to code the controlling workflow and create method stubs or interface stubs for dependencies as you discover exactly what’s necessary. That’s working top down. In contrast, I frequently work “bottom up” when I understand some of the individual tasks within the larger feature, but don’t yet understand how the entire workflow should fit together. More on this in a later post, but the key is always to start with what you already understand.
  • Sociable vs solitary tests — The tests above are “sociable” in that they use a “full” Wolverine application to test the new configuration code within the full cycle of the application bootstrapping process, as opposed to a “solitary” test that exercises a very small, isolated part of the code. My decision here was based on my feeling that the sociable test would be simple enough to write, and that a more isolated test in this particular case wasn’t really useful anyway.

The code tested by these first couple tests was pretty trivial, but it has to work before the whole feature can work, so it deserves a test. By and large, I like the advice that you write tests for any code that could conceivably be wrong.

I should also note that I did not, in this case, do a full upfront design of how this entire back pressure feature would be structured before I wrote those first couple of tests.

One of the advantages of working in a TDD style is that it forces you (or should) to work incrementally in smaller pieces of code, which can hopefully be rearranged later when your initial ideas about how the code should be structured turn out to be wrong.

Using Responsibility Driven Design

I don’t always do this in a formal way, but by and large my first step in developing a new feature is to just think through the responsibilities within the new feature. To help discover those responsibilities, I like to use object role stereotypes to quickly suggest how to split the feature into different elements of the code by responsibility, which makes the code easier to test, and proceed from there.

Back to building the back pressure feature, from experience I knew that it’s often helpful to separate the responsibility for deciding to take an action from actually performing that action. To that end I carved out a small, separate class called BackPressureAgent that would be responsible for deciding when to pause or restart listening based on the conditions of the current endpoint (how many messages are queued locally, and whether the listener is actively pulling in new messages from the external resource).

In object role stereotype terms, BackPressureAgent becomes a “controller” that directs the actions of other objects and decides what those other objects should be doing. In this case, BackPressureAgent is telling an IListeningAgent object whether to pause or restart, as shown in this “happy path, all is good, do nothing” test case below:

    [Fact]
    public async Task do_nothing_when_accepting_and_under_the_threshold()
    {
        theListeningAgent.Status
            .Returns(ListeningStatus.Accepting);
        theListeningAgent.QueueCount
            .Returns(theEndpoint.BufferingLimits.Maximum - 1);
        
        // Evaluate whether or not the listening should be paused
        // based on the current queued item count, the current status
        // of the listening agent, and the configured buffering limits
        // for the endpoint
        await theBackPressureAgent.CheckNowAsync();

        // Should decide NOT to do anything in this particular case
        theListeningAgent.DidNotReceive().MarkAsTooBusyAndStopReceivingAsync();
        theListeningAgent.DidNotReceive().StartAsync();
    }

In the test above, I’m using a dynamic mock created with NSubstitute for the listening agent just to simulate the current queue size and status, then evaluating whether or not the code under test decided to stop the listening. In this case, the listening agent is running fine, and no action should take place.

Some notes on the test above:

  • In object role stereotype terms, the IListeningAgent is both an “interfacer” that we can use to provide information about the local queue and a “service provider” that can in this case “mark a listening endpoint as too busy and stop receiving external messages” and also restart the message listening later
  • The test above is an example of “interaction-based testing” that I’ll expound on and contrast with “state-based testing” in the following section
  • IListeningAgent already existed at this point, but I added new elements for QueueCount and the clumsily named `MarkAsTooBusyAndStopReceivingAsync()` method while writing the test. Again, I defined the new method and property names within the test itself, then let Rider generate the methods behind the scenes. We’ll come back to those later.
  • Isolate the Ugly Stuff — Early on I decided that I’d probably have BackPressureAgent use a background timer to occasionally sample the state of the listening agent and take action accordingly. Writing tests against code that uses a timer or really any asynchronous code is frequently a pain, so I bypassed that for now by isolating the logic for deciding to stop or restart external message listening away from the background timer and the active message broker infrastructure (again, think Rabbit MQ or AWS SNS or Azure Service Bus).
  • Keep a Short Tail — Again, the decision making logic is easy to test without having to pull in the background timer, the local queue infrastructure, or any kind of external infrastructure. Another way to think about this, which I learned years ago, is a simple test of your code’s testability: “if I try to write a test for your code/method/function, what else do I have to pull off the shelf in order to run that test?” You ideally want that answer to be “not very much” or at least “nothing that’s hard to set up or control.”
  • Mocks are a red pepper flake test ingredient — Just like cooking with red pepper flakes, some judicious usage of dynamic mock objects can be a good thing, but using too many mock objects is pretty much always going to ruin the test in terms of readability, test setup work, and harmful coupling between the test and the implementation details.
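To make that controller role concrete, here’s a rough sketch of the decision logic the test above is driving toward. This is my own reconstruction from the behavior described in this post, with an assumed constructor, rather than the actual Wolverine implementation:

public class BackPressureAgent
{
    private readonly IListeningAgent _agent;
    private readonly Endpoint _endpoint;

    // Assumed constructor: the agent needs the listening agent it controls
    // and the endpoint configuration that holds the buffering limits
    public BackPressureAgent(IListeningAgent agent, Endpoint endpoint)
    {
        _agent = agent;
        _endpoint = endpoint;
    }

    public async ValueTask CheckNowAsync()
    {
        // Too many messages are backed up locally, so tell the listening
        // agent to stop pulling in new messages from the external broker
        if (_agent.Status == ListeningStatus.Accepting &&
            _agent.QueueCount > _endpoint.BufferingLimits.Maximum)
        {
            await _agent.MarkAsTooBusyAndStopReceivingAsync();
        }
        // The local queue has drained back down below the restart threshold,
        // so resume listening
        else if (_agent.Status == ListeningStatus.TooBusy &&
                 _agent.QueueCount <= _endpoint.BufferingLimits.Restart)
        {
            await _agent.StartAsync();
        }

        // In every other case, leave the listener alone -- the "do nothing"
        // path covered by the test above
    }
}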

I highly recommend Rebecca Wirfs-Brock’s online A Brief Tour of Responsibility-Driven Design for more background on this.

I didn’t test this

I needed to add an actual implementation of IListeningAgent.QueueCount that just reflected the current state of a listening endpoint based on the local queue within that endpoint like so:

    public int QueueCount => _receiver is ILocalQueue q ? q.QueueCount : 0;

I made the judgement call that the code above was simple enough — and also too much trouble to test anyway — that it was low risk to skip writing any test whatsoever.

Making a required code coverage number is not a first class goal. Neither is using pure, unadulterated TDD for every line of code you write (but definitely test as you work rather than waiting until the very end, no matter how you work). The real goal is to use TDD as a very rapid feedback cycle and as a way to arrive at code that exhibits the desirable qualities of high cohesion and low coupling.

Introducing the first integration test

Earlier I said that one of my rules was “test small before testing big.” At this point I still wasn’t ready to just code the rest of the back pressure feature and run it all end to end, because I hadn’t yet coded the functionality to actually pause listening to external messages. That new method on ListeningAgent is shown below:

    public async ValueTask MarkAsTooBusyAndStopReceivingAsync()
    {
        if (Status != ListeningStatus.Accepting || _listener == null) return;
        await _listener.StopAsync();
        await _listener.DisposeAsync();
        _listener = null;
        
        Status = ListeningStatus.TooBusy;
        _runtime.ListenerTracker.Publish(new ListenerState(Uri, Endpoint.Name, Status));

        _logger.LogInformation("Marked listener at {Uri} as too busy and stopped receiving", Uri);
    }

It’s not very much code, and to be honest, I sketched out the code without first writing a test. Now, I could have written a unit test for this method, but my ultimate “zeroth rule” of testing is:

Test with the finest grained mechanism that tells you something important

Me!

I did not believe that a “solitary” unit test — probably using mock objects? — would provide the slightest bit of value; it would simply replicate the implementation of the method in mock object expectations. Instead, I wrote an integration test in Wolverine’s “transport compliance” test suite like so:

[Fact]
public async Task can_stop_receiving_when_too_busy_and_restart_listeners()
{
    var receiving = (theReceiver ?? theSender);
    var runtime = receiving.Get<IWolverineRuntime>();

    foreach (var listener in runtime.Endpoints.ActiveListeners().Where(x => x.Endpoint.Role == EndpointRole.Application))
    {
        await listener.MarkAsTooBusyAndStopReceivingAsync();

        listener.Status.ShouldBe(ListeningStatus.TooBusy);
    }

    foreach (var listener in runtime.Endpoints.ActiveListeners().Where(x => x.Endpoint.Role == EndpointRole.Application))
    {
        await listener.StartAsync();

        listener.Status.ShouldBe(ListeningStatus.Accepting);
    }

    var session = await theSender.TrackActivity(Fixture.DefaultTimeout)
        .AlsoTrack(theReceiver)
        .DoNotAssertOnExceptionsDetected()
        .ExecuteAndWaitAsync(c => c.SendAsync(theOutboundAddress, new Message1()));


    session.FindSingleTrackedMessageOfType<Message1>(EventType.MessageSucceeded)
        .ShouldNotBeNull();
}

The test above reaches into the listening endpoints within a receiving Wolverine application and then:

  1. Pauses the external message listening
  2. Restarts the external message listening
  3. Publishes a new message from a sender to a receiving application
  4. Verifies that, yep, that message really got to where it was supposed to go

As the test above is applied to every current transport type in Wolverine (Rabbit MQ, Pulsar, TCP), I then had to run a whole bunch of integration tests against external infrastructure (running locally in Docker containers; isn’t it a great time to be alive?).

Once that test passed for all transports — and I felt that was important because there had been previous issues making a similar circuit breaker feature work without “losing” in-flight messages — I was able to move on.

Almost there, but when should back pressure be applied?

At this point I was so close to being ready to take that last step and finish it all off by running everything end to end! But then I remembered that back pressure should only be checked for certain types of messaging endpoints, with what ultimately became these rules:

  • It’s not a local queue. I know this might be a touch confusing, but Wolverine lets you use named, local queues as well as using local queues internally for the listening endpoints of external message brokers like Rabbit MQ queues. If the endpoint is a named, local queue, there’s no point in using back pressure (at least in its current incarnation).
  • The listening endpoint is configured to be in what Wolverine calls “buffered” mode, as opposed to “inline” mode where a message has to be completely processed inline with its delivery from the external message broker before receipt is acknowledged back to the broker
  • Or the listening endpoint is enrolled in Wolverine’s durable inbox

After fiddling with the logic to make that determination inline inside of ListeningAgent or BackPressureAgent, I decided for a variety of reasons that that little bit of logic really belonged in its own method on Wolverine’s Endpoint class, which is the configuration model for all communication endpoints. The base method is just this:

    public virtual bool ShouldEnforceBackPressure()
    {
        return Mode != EndpointMode.Inline;
    }

In this particular case, I probably jumped right into the code, but immediately wrote tests for the code for Rabbit MQ endpoints:

        [Theory]
        [InlineData(EndpointMode.BufferedInMemory, true)]
        [InlineData(EndpointMode.Durable, true)]
        [InlineData(EndpointMode.Inline, false)]
        public void should_enforce_back_pressure(EndpointMode mode, bool shouldEnforce)
        {
            var endpoint = new RabbitMqEndpoint(new RabbitMqTransport());
            endpoint.Mode = mode;
            endpoint.ShouldEnforceBackPressure().ShouldBe(shouldEnforce);
        }

and also for the endpoint type that models local queues, which should of course never have back pressure applied in the current model:

    [Theory]
    [InlineData(EndpointMode.Durable)]
    [InlineData(EndpointMode.Inline)]
    [InlineData(EndpointMode.BufferedInMemory)]
    public void should_not_enforce_back_pressure_no_matter_what(EndpointMode mode)
    {
        var endpoint = new LocalQueueSettings("foo")
        {
            Mode = mode
        };
        
        endpoint.ShouldEnforceBackPressure().ShouldBeFalse();
    }

That’s nearly trivial code, and I wasn’t that worried about it not working. I did write tests for that code — even if after the fact — because the tests make a statement about how the code should work and keep someone else from accidentally breaking the back pressure subsystem by changing that method. In a way, those tests act as documentation for later developers.
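For completeness, the LocalQueueSettings override implied by that last test is presumably just a hard-coded opt-out, something along these lines (a sketch, not necessarily the exact Wolverine code):

    // Named, local queues never participate in back pressure in the
    // current model, regardless of the endpoint mode
    public override bool ShouldEnforceBackPressure()
    {
        return false;
    }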

Before wrapping up with a giant integration test, let’s talk about…

State vs Interaction Testing

One way or another, most automated tests are going to fall into the rough structure of Arrange-Act-Assert where you connect known inputs to expected outcomes for some kind of action or determination within your codebase. Focusing on assertions, most of the time developers are using state-based testing where the tests are validating the expected value of:

  • A return value from a method or function
  • The state of an object
  • Changes to a database or file

Here’s a simple example from Wolverine that tests some exception handling code with a state-based test:

    [Fact]
    public void type_match()
    {
        var match = new TypeMatch<BadImageFormatException>();
        match.Matches(new BadImageFormatException()).ShouldBeTrue();
        match.Matches(new DivideByZeroException()).ShouldBeFalse();
    }

In contrast, interaction-based testing involves asserting on the expected signals or messages passed between two or more elements of code. You probably already know this from mock library usage. Here’s an example from Wolverine code that I’ll explain and discuss more below:

    [Fact]
    public async Task do_not_actually_send_outgoing_batched_when_the_system_is_trying_to_shut_down()
    {
        // This is a cancellation token for the subsystem being tested
        theCancellation.Cancel();

        // This is the "action"
        await theSender.SendBatchAsync(theBatch);

        // Do not send on the batch of messages if the
        // underlying cancellation token has been marked
        // as cancelled
        await theProtocol.DidNotReceive()
            .SendBatchAsync(theSenderCallback, theBatch);
    }

Part of Wolverine’s mission is to be a messaging tool between two or more processes. The code being tested above takes part in sending outgoing messages in a background task. When the application has signaled that it is shutting down through the usage of a CancellationToken, the BatchSender class being tested above should not send any more outgoing messages. I’m asserting that behavior by checking that BatchSender never passed a new batch on to the raw socket handling class, and therefore, no outgoing messages were sent.
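In code, the behavior being asserted likely amounts to a guard clause at the top of BatchSender.SendBatchAsync(). Here’s a sketch inferred from the test rather than lifted from Wolverine’s source, with field names that simply mirror the test’s theCancellation, theProtocol, and theSenderCallback:

    public async Task SendBatchAsync(OutgoingMessageBatch batch)
    {
        // If the application is shutting down, quietly drop the work
        // instead of pushing more messages onto the wire
        if (_cancellation.IsCancellationRequested) return;

        // Otherwise hand the batch off to the underlying socket protocol
        await _protocol.SendBatchAsync(_senderCallback, batch);
    }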

A common criticism of the testing technique I used above is something to the effect of “why do I care whether or not a method was called, I only care about the actual impact of the code!” This is a bit semantic, but my advice here is to say (and think to yourself) that you are asserting on the decision whether or not to send outgoing messages when the system itself is trying to shut down.

As to whether or not to use state-based vs interaction-based testing, I’d say that is a case-by-case decision. If you can easily verify the expected change of state or expected result of an action, definitely opt for state-based testing. I’d also use state-based testing anytime the necessary interactions are unclear or confusing, even if that means opting for a bigger, more “sociable” test or a full blown integration test.

However, to repeat an earlier theme, there are plenty of times when it’s easiest to separate the decision to take an action from the execution of that action. The back pressure protection I added to Wolverine’s message listening subsystem, walked through earlier in this post, is an example of exactly that from my own work just last week.

Summary of Test Driven Development So Far

My goal with this post was to introduce a lot of the ideas and concepts I like to use with TDD in the context of building a real life feature that was non-trivial, but still not too big.

I briefly mentioned some of my old “Jeremy’s Rules of Test Driven Development” that really just amount to some heuristic tools to think through separation of concerns through the lens of what makes unit testing easier or at least possible:

  • Test Small before Testing Big
  • Isolate the Ugly Stuff
  • Keep a Short Tail
  • Push, don’t Pull — I didn’t have an example for this in the back pressure work, but I’ll introduce this in its own post some day soon

I also discussed state-based vs interaction-based testing. I think you need both in your mental toolbox and should have some idea of when to apply each.

I also introduced responsibility driven design with an eye toward how that can help TDD efforts.

In my next post I think I’ll revisit the back pressure feature from Wolverine and show how I ultimately created an end to end integration test that got cut from this post because it’s big, hugely complicated, and worthy of its own little post.

After that, I’ll do some deeper dives on some of the design techniques and testing concepts that I touched on in this post.

Until later, Jeremy out…
