Testing effectively — with or without mocks or stubs

My team at MedeAnalytics is working with our development teams on a long-term initiative to improve the effectiveness of our developer testing practices and to encourage more Test Driven Development in daily development. As part of that effort, it’s time for my organization to talk about how, and when, we use test doubles like mock objects or stubs in our daily work.

I think most developers have probably heard the terms mock or stub. Mocks and stubs are examples of “test doubles,” a term that refers to any kind of testing object or function that is substituted for a production dependency while testing, or in early development before the real dependency is available. At the moment, I’m concerned that we may be overusing mock objects, and especially dynamic mocking tools, when other types of test doubles could be easier or when we should be switching to integration testing against the real dependencies.

Let’s first talk about the different kinds of tests that developers write, both as part of a Test Driven Development workflow and tests that you may add later to act as regression tests. When I was learning about developer testing, we talked about a pretty strict taxonomy of test types using the old Michael Feathers definition of what constituted a unit test. Unit tests typically meant that we tested one class at a time with all of its dependencies replaced with some kind of test double so we could isolate the functionality of just that one class. We’d also write some integration tests that ran part or all of the application stack, but that wasn’t emphasized very much at the time.

Truth be told, many of the mock-heavy unit tests I wrote back then didn’t provide a lot of value compared to the effort I put into authoring them. I learned the hard way in longer-lived codebases like StructureMap that I was better off in many cases breaking the “one class” rule of unit tests and writing far more coarse-grained tests focused on usage scenarios, because the fine-grained unit tests actually made it much harder to evolve the internal structure of the code. In my later efforts, I switched to mostly testing through the public APIs all the way down the stack and got much better results.

Flash forward to today, and we talk a lot about the testing pyramid concept where code really needs to be covered by different types of tests for maximum effective test coverage: lots of small unit tests, a medium amount of a nebulously defined middle ground of integration tests, and a handful of full-blown, end-to-end black box tests. The way that I personally over-simplify this concept is to say:

Test with the finest grained mechanism that tells you something important

Jeremy Miller (me!)

For any given scenario, I want developers to consciously choose the proper mix of testing techniques that really tells them that the code fulfills its requirements correctly. At the end of the day, the code passing all of its integration tests should go a long way toward making us confident that the code is ready to ship to production.

I also want developers to be aware that unit tests that become tightly coupled to the implementation details can make it quite difficult to refactor that code later. In one ongoing project, one of my team members is doing a lot of work to optimize an expensive process. We’ve already talked about the need to do the majority of his testing from outside-in with integration tests so he will have more freedom to iterate on completely different internal mechanisms while he pursues performance optimization.

I would recommend that we all stop thinking so much about unit vs integration tests and think more about tests being on a continuous spectrum between “sociable” and “solitary” tests, as Martin Fowler discusses in On the Diverse And Fantastical Shapes of Testing.

To sum up this section, I think there are two basic questions I want our developers to constantly ask themselves with regard to using any kind of test double like a mock object or a stub:

  1. Should we be using an integration or “sociable” test instead of a “solitary” unit test that uses test doubles?
  2. When writing a “solitary” unit test with a test double, which type of test double is easiest in this particular test?
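To make the first question a little more concrete, here is a minimal sketch of the same behavior tested both ways. Every type in it (InvoiceCalculator, ITaxRates, StateTaxRates, FixedTaxRates) is hypothetical and invented purely for illustration, and xUnit is assumed as the test framework.

```csharp
using System.Collections.Generic;
using Xunit;

// Hypothetical dependency of the code under test
public interface ITaxRates
{
    decimal RateFor(string state);
}

// Hypothetical "real" implementation. In production this might read from a
// database; here it is an in-memory table so the sketch stays runnable
public class StateTaxRates : ITaxRates
{
    private readonly Dictionary<string, decimal> _rates = new Dictionary<string, decimal>
    {
        ["TX"] = 0.0625m,
        ["OR"] = 0m
    };

    public decimal RateFor(string state) => _rates[state];
}

// Hypothetical code under test
public class InvoiceCalculator
{
    private readonly ITaxRates _rates;

    public InvoiceCalculator(ITaxRates rates) => _rates = rates;

    public decimal Total(decimal subtotal, string state)
        => subtotal + subtotal * _rates.RateFor(state);
}

public class InvoiceCalculatorTests
{
    // Hand-rolled test double that always answers with the same rate
    private class FixedTaxRates : ITaxRates
    {
        public decimal RateFor(string state) => 0.10m;
    }

    // "Solitary": the dependency is replaced with a test double
    [Fact]
    public void solitary_test_with_a_test_double()
    {
        var calculator = new InvoiceCalculator(new FixedTaxRates());

        Assert.Equal(110m, calculator.Total(100m, "anywhere"));
    }

    // "Sociable": the calculator runs with its real collaborator
    [Fact]
    public void sociable_test_with_the_real_dependency()
    {
        var calculator = new InvoiceCalculator(new StateTaxRates());

        Assert.Equal(106.25m, calculator.Total(100m, "TX"));
    }
}
```

The sociable version also exercises the real rate lookup, so it can catch problems in the collaboration between the two classes. The solitary version only tells you that the arithmetic in the calculator is right.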

In the past, we strongly preferred writing “solitary” tests with or without mock objects or other fakes because those tests were reliable and ran fast. That’s still a valid consideration, but I think these days it’s much easier to author more “sociable” tests that might even use infrastructure like databases or the file system than it was when I originally learned TDD and developer testing. Especially if a team is able to use something like Docker containers to quickly spin up local development environments, I would very strongly recommend writing tests that work through the data layer or call HTTP endpoints in place of pure, Feathers-compliant unit tests.

Black box or UI tests through tools like Selenium are just a different ball game altogether and I’m going to just say that’s out of the scope of this post and get on with things.

As for when mock objects or stubs or any other kind of test double are or are not appropriate, I’m going to stand pat on what I’ve written in the past:

After all of that, I’d finally like to talk about all the different kinds of test doubles, but first I’d like to make a short digression into…

The Mechanics of Automated Tests

I think that most developers writing any kind of tests today are familiar with the Arrange-Act-Assert pattern and nomenclature. To review, most tests will follow a structure of:

  1. “Arrange” any initial state and inputs to the test. One of the hallmarks of a good test is being repeatable and creating a clear relationship between known inputs and expected outcomes.
  2. “Act” by executing a function or a method on a class or taking some kind of action in code
  3. “Assert” that the expected outcome has happened

In the event sourcing support in Marten V4, we have a subsystem called the “projection daemon” that constantly updates projected views based on incoming events. In one of its simpler modes, there is a class called SoloCoordinator that is simply responsible for starting up every single configured projected view agent when the projection daemon is initialized. To test the start up code of that class, we have this small test:

    public class SoloCoordinatorTests
    {
        [Fact]
        public async Task start_starts_them_all()
        {
            // "Arrange" in this case is creating a test double object
            // as an input to the method we're going to call below
            var daemon = Substitute.For<IProjectionDaemon>();
            var coordinator = new SoloCoordinator();

            // This is the "Act" part of the test
            await coordinator.Start(daemon, CancellationToken.None);

            // This is the "Assert" part of the test
            await daemon.Received().StartAllShards();
        }
    }

and the little bit of code it’s testing:

        public Task Start(IProjectionDaemon daemon, CancellationToken token)
        {
            _daemon = daemon;
            return daemon.StartAllShards();
        }

In the code above, the real production code for the IProjectionDaemon interface is very complicated and setting up a real one would require a lot more code. To short circuit that set up in the “arrange” part of the test, I create a “test double” for that interface using the NSubstitute library, my dynamic mock/stub/spy library of choice.

In the “assert” phase of the test I needed to verify that all of the known projected views were started up, and I did that by asserting through the mock object that the IProjectionDaemon.StartAllShards() method was called. I don’t necessarily care here that that specific method was called so much as that the SoloCoordinator sent a logical message to start all the projections.

In the “assert” part of a test you can either verify some expected change of state in the system or return value (state-based testing), or use what’s known as “interaction testing” to verify that the code being tested sent the expected messages or invoked the proper actions to its dependencies.
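As a rough illustration of the two styles side by side, here is a minimal sketch. ShoppingCart, OrderSubmitter, IMessageBus, and RecordingBus are all hypothetical types made up for this example, the test double is rolled by hand rather than generated by a mocking library, and xUnit is assumed as the test framework.

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

// Hypothetical code under test for the state-based example
public class ShoppingCart
{
    private readonly List<decimal> _prices = new List<decimal>();

    public void Add(decimal price) => _prices.Add(price);

    public decimal Total => _prices.Sum();
}

// Hypothetical dependency and code under test for the interaction example
public interface IMessageBus
{
    void Publish(string message);
}

public class OrderSubmitter
{
    private readonly IMessageBus _bus;

    public OrderSubmitter(IMessageBus bus) => _bus = bus;

    public void Submit(int orderId) => _bus.Publish($"OrderSubmitted:{orderId}");
}

// Hand-rolled test double that remembers every message it was sent
public class RecordingBus : IMessageBus
{
    public List<string> Published { get; } = new List<string>();

    public void Publish(string message) => Published.Add(message);
}

public class StateVersusInteractionTests
{
    [Fact]
    public void state_based_test()
    {
        // Arrange
        var cart = new ShoppingCart();

        // Act
        cart.Add(10m);
        cart.Add(5.50m);

        // Assert on the resulting state of the object
        Assert.Equal(15.50m, cart.Total);
    }

    [Fact]
    public void interaction_based_test()
    {
        // Arrange
        var bus = new RecordingBus();
        var submitter = new OrderSubmitter(bus);

        // Act
        submitter.Submit(42);

        // Assert on the message that was sent to the dependency
        var message = Assert.Single(bus.Published);
        Assert.Equal("OrderSubmitted:42", message);
    }
}
```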

See an old post of mine from 2005 called TDD Design Starter Kit – State vs. Interaction Testing for more discussion on this subject. The old code samples and formatting are laughable, but I think the discussion about the concepts is still valid.

As an aside, you might ask why I bothered writing a test for such a simple piece of code. I honestly won’t bother writing unit tests in every case like this, but a piece of advice I read from (I think) Kent Beck was to write a test for any piece of code that could possibly break. Another rule of thumb is to write a test for any vitally important code regardless of how simple it is in order to remove project risk by locking it down through tests that could fail in CI if the code is changed. And lastly, I’d argue that it was worthwhile to write that test as documentation about what the code should be doing for later developers.

Mocks or Spies

Now that we’ve established the basic elements of automated testing and reviewed the difference between state-based and interaction-based testing, let’s review the different types of test doubles. I’m going to be using the nomenclature describing different types of test doubles from the xUnit Test Patterns book by Gerard Meszaros (I was one of the original technical reviewers of that book and used the stipend to buy a 30GB iPod that I had for years). You can find his table explaining the different kinds of test doubles here.

As I explain the differences between these concepts, I recommend that you focus more on the role of a test double within a test than get too worked up about how things happen to be implemented and if we are or are not using a dynamic mocking tool like NSubstitute.

The most commonly used test double term is a “mock.” Some people use this term to mean “stand-in object created dynamically by a mocking tool.” The original definition was an object that can be used to verify the calls or interactions that the code under test makes to its dependencies.

To illustrate the usage of mock objects, let’s start with a small feature in Marten to seed baseline data in the database whenever a new Marten DocumentStore is created. That feature allows users to register objects implementing this interface to the configuration of a new DocumentStore:

    public interface IInitialData
    {
        Task Populate(IDocumentStore store);
    }

Here’s a simple unit test from the Marten codebase that uses NSubstitute to do interaction testing with what I’d still call a “mock object” to stand in for IInitialData objects in the test:

        [Fact]
        public void runs_all_the_initial_data_sets_on_startup()
        {

            // These three objects are mocks or spies
            var data1 = Substitute.For<IInitialData>();
            var data2 = Substitute.For<IInitialData>();
            var data3 = Substitute.For<IInitialData>();

            // This is part of a custom integration test harness
            // we use in the Marten codebase. It's configuring
            // and spinning up a new DocumentStore. As part of the
            // DocumentStore getting initialized, we expect it to
            // execute all the registered IInitialData objects
            StoreOptions(_ =>
            {
                _.InitialData.Add(data1);
                _.InitialData.Add(data2);
                _.InitialData.Add(data3);
            });

            theStore.ShouldNotBeNull();

            // Verifying that the expected interactions
            // with the three mocks happened as expected
            data1.Received().Populate(theStore);
            data2.Received().Populate(theStore);
            data3.Received().Populate(theStore);
        }

In the test above, the “assertions” are just that at some point when a new Marten DocumentStore is initialized it will call the three registered IInitialData objects to seed data. This test would fail if the IInitialData.Populate(IDocumentStore) method was not called on any one of the three mock objects. Mock objects are used specifically to do assertions about the interactions between the code under test and its dependencies.

In the original xUnit Test Patterns book, the author also identified a slightly different test double called a “spy” that records the inputs it receives so that they can be interrogated in the “Assert” part of a test. That differentiation did make sense years ago when early mock tools like RhinoMocks or NMock worked very differently than today’s tools like NSubstitute or FakeItEasy.

I used NSubstitute in the sample above to build a mock dynamically, but at other times I’ll roll a mock object by hand when it’s more convenient. Consider the common case of needing to verify that an important message or exception was logged (I honestly won’t always write tests for this myself, but it makes for a good example here).

Using a dynamic mocking tool (Moq in this case) to mock the ILogger<T> interface from the core .NET abstractions just to verify that an exception was logged could result in code like this:

            // _loggerMock is a Moq Mock<ILogger<T>>
            _loggerMock.Verify(
                x => x.Log(
                    LogLevel.Error,
                    It.IsAny<EventId>(),
                    It.IsAny<It.IsAnyType>(),
                    It.IsAny<Exception>(),
                    It.IsAny<Func<It.IsAnyType, Exception, string>>()),
                Times.AtLeastOnce);

In this case you have to use an advanced feature of mocking libraries called argument matchers as a kind of wild card to match the expected call against data generated at runtime that you don’t really care about. As you can see, this code is a mess to write and to read. I wouldn’t say to never use argument matchers, but it’s a “guilty until proven innocent” kind of technique to me.

Instead, let’s write our own mock object for logging that will more easily handle the kind of assertions we need to do later (this wasn’t a contrived example, I really have used this):

    public class RecordingLogger<T> : ILogger<T>
    {
        public IDisposable BeginScope<TState>(TState state)
        {
            throw new NotImplementedException();
        }

        public bool IsEnabled(LogLevel logLevel)
        {
            return true;
        }

        public void Log<TState>(
            LogLevel logLevel, 
            EventId eventId, 
            TState state, 
            Exception exception, 
            Func<TState, Exception, string> formatter)
        {
            // Just add this object to the list of messages
            // received
            var message = new LoggedMessage
            {
                LogLevel = logLevel,
                EventId = eventId,
                State = state,
                Exception = exception,
                Message = formatter?.Invoke(state, exception)
            };

            Messages.Add(message);
        }

        public IList<LoggedMessage> Messages { get; } = new List<RecordingLogger<T>.LoggedMessage>();

        public class LoggedMessage
        {
            public LogLevel LogLevel { get; set; }

            public EventId EventId { get; set; }

            public object State { get; set; }

            public Exception Exception { get; set; }

            public string Message { get; set; }
        }

        public void AssertExceptionWasLogged()
        {
            // This uses Fluent Assertions to blow up if there
            // are no recorded errors logged
            Messages.Any(x => x.LogLevel == LogLevel.Error)
                .Should().BeTrue("No exceptions were logged");
        }
    }

With this hand-rolled mock object, the ugly code above that uses argument matchers just becomes this assertion in the tests:

// _logger is a RecordingLogger<T> that 
// was used as an input to the code under
// test
_logger.AssertExceptionWasLogged();

I’d argue that that’s much simpler to read and most certainly to write. I didn’t show it here, but you could have just interrogated the calls made to a dynamically generated mock object without using argument matchers. The syntax for that can be very ugly as well, though, and I don’t recommend it in most cases.

To sum up, “mock objects” are used in the “Assert” portion of your test to verify that expected interactions were made with the dependencies of the code under test. You also don’t have to use a mocking tool like NSubstitute, and sometimes a hand-rolled mock class might be easier to consume and lead to easier to read tests.
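To show how a hand-rolled mock gets wired into a test from end to end, here is a minimal sketch. ReportGenerator and ErrorRecordingLogger<T> are hypothetical names invented for this illustration (the logger is a trimmed-down variation on the recording idea, not the RecordingLogger<T> class shown earlier), and the sketch assumes that the Microsoft.Extensions.Logging abstractions and xUnit are referenced by the test project.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;
using Xunit;

// Hypothetical, trimmed-down recording logger. It only remembers the
// level and the exception of each logging call
public class ErrorRecordingLogger<T> : ILogger<T>
{
    public List<(LogLevel Level, Exception Exception)> Entries { get; }
        = new List<(LogLevel Level, Exception Exception)>();

    // Hands back a do-nothing scope so code that calls BeginScope() still runs
    public IDisposable BeginScope<TState>(TState state) => new NoOpScope();

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(
        LogLevel logLevel,
        EventId eventId,
        TState state,
        Exception exception,
        Func<TState, Exception, string> formatter)
    {
        Entries.Add((logLevel, exception));
    }

    private class NoOpScope : IDisposable
    {
        public void Dispose() { }
    }
}

// Hypothetical code under test: logs and swallows any failure to load data
public class ReportGenerator
{
    private readonly ILogger<ReportGenerator> _logger;
    private readonly Func<string> _loadData;

    public ReportGenerator(ILogger<ReportGenerator> logger, Func<string> loadData)
    {
        _logger = logger;
        _loadData = loadData;
    }

    public string Generate()
    {
        try
        {
            return "Report: " + _loadData();
        }
        catch (Exception e)
        {
            _logger.LogError(e, "Failed to generate the report");
            return string.Empty;
        }
    }
}

public class ReportGeneratorTests
{
    [Fact]
    public void logs_the_exception_when_loading_fails()
    {
        // Arrange: the hand-rolled mock goes in as a constructor argument
        var logger = new ErrorRecordingLogger<ReportGenerator>();
        var generator = new ReportGenerator(
            logger,
            () => throw new InvalidOperationException("boom"));

        // Act
        generator.Generate();

        // Assert against what the mock recorded
        var entry = Assert.Single(logger.Entries);
        Assert.Equal(LogLevel.Error, entry.Level);
        Assert.IsType<InvalidOperationException>(entry.Exception);
    }
}
```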

Pre-Canned Data with Stubs

“Stubs” are just a way to replace real services with some kind of stand-in that supplies pre-canned data as test inputs. Whereas “mocks” refer to interaction testing, “stubs” provide inputs in state-based testing. I won’t go into too much detail because I think this concept is pretty well understood, but here’s an example of using NSubstitute to whip up a stub in place of a full-blown Marten query session in a test:

        [Fact]
        public async Task using_a_stub()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(session);
            
            // carry out the Act and Assert parts of the test
        }

Again, “stub” refers to a role within the test and not how it was built. In-memory database stand-ins in tools like Entity Framework Core are another common example of using stubs.
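Because the role matters more than the construction, a stub does not even need to be an object. Here is a minimal sketch where the stub is just a lambda. DiscountService and Customer are hypothetical types made up for this example, and xUnit is assumed as the test framework.

```csharp
using System;
using Xunit;

// Hypothetical data type returned by the lookup
public class Customer
{
    public string Name { get; set; }
    public bool IsPreferred { get; set; }
}

// Hypothetical code under test. It takes its data lookup as a function,
// so a test can substitute a lambda for the real database query
public class DiscountService
{
    private readonly Func<int, Customer> _findCustomer;

    public DiscountService(Func<int, Customer> findCustomer)
        => _findCustomer = findCustomer;

    public decimal DiscountFor(int customerId)
        => _findCustomer(customerId).IsPreferred ? 0.15m : 0m;
}

public class DiscountServiceTests
{
    [Fact]
    public void preferred_customers_get_a_discount()
    {
        // Arrange: the stub is a lambda that returns pre-canned data
        var customer = new Customer { Name = "jmiller", IsPreferred = true };
        var service = new DiscountService(id => customer);

        // Act
        var discount = service.DiscountFor(1);

        // Assert on the return value (state-based testing)
        Assert.Equal(0.15m, discount);
    }
}
```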

Dummy Objects

A “dummy” is a test double whose only real purpose is to act as a stand-in service that does nothing, yet allows your test to run without constant NullReferenceException problems. Going back to the ServiceThatLooksUpUsers in the previous section, let’s say that the service also depends on the .NET ILogger<T> abstraction for tracing within the service. We may not care about the log messages happening in some of our tests, but ServiceThatLooksUpUsers will blow up if it doesn’t have a logger, so we’ll use the built-in NullLogger<T> that’s part of the .NET logging library as a “dummy” like so:

        [Fact]
        public async Task using_a_dummy()
        {

            var user = new User {UserName = "jmiller",};

            // I'm stubbing out Marten's IQuerySession
            var session = Substitute.For<IQuerySession>();
            session.LoadAsync<User>(user.Id).Returns(user);

            var service = new ServiceThatLooksUpUsers(
                session,
                
                // Using a dummy logger
                new NullLogger<ServiceThatLooksUpUsers>());

            // carry out the Act and Assert parts of the test
        }

Summing it all up

I tried to cover a lot of ground, and to be honest, this was meant to be the first cut at a new “developer testing best practices” guide at work so it meanders a bit.

There are a few things I would hope folks would get out of this post:

  • Mocks or stubs can sometimes be very helpful in writing tests, but can also cause plenty of heartburn in other cases.
  • Don’t hesitate to skip mock-heavy unit tests when some sort of integration test might be easier to write or do more to ascertain that the code actually works
  • Use the quickest feedback cycle you can get away with when trying to decide what kind of testing to do for any given scenario — and sometimes a judicious usage of a stub or mock object helps write tests that run faster and are easier to set up than integration tests with the real dependencies
  • Don’t get tunnel vision on using mocking libraries and forget that hand-rolled mocks or stubs can sometimes be easier to use within some tests
