Using Mocks or Stubs, Revisited

This post is a request from work. As part of our ongoing effort to convert a very large application from RavenDb to Marten for its backing persistence, we want to take some time to reconsider our automated testing strategies with regard to unit tests versus integration tests versus full end-to-end tests.

I originally wrote a series of blog posts on these subjects in my CodeBetter days about a decade ago (the formatting of the old posts has all been destroyed after several blog engine migrations, sorry about that):

I read over those posts just now and feel like the content mostly holds up today (concepts and fundamentals tend to be useful much longer than any specific technology).

What’s Different Today?

For one, a couple of years back there was a somewhat justified backlash against the overuse of mocking libraries in unit tests. For another, I think there’s more of a bias in favor of integration testing and a focus on “vertical slice” testing than there used to be. For my part, I use mock objects much more rarely now than I would have a decade ago.

Most of my coding efforts these days go into OSS projects, so a quick rundown:

Marten is mostly tested through integration tests all the way through to the underlying PostgreSQL database. I did a count today and only found about 5-6 test fixture classes that use NSubstitute for mocking or stubbing. This is mostly because I’m not confident enough in my knowledge of PostgreSQL JSON manipulation or the underlying Npgsql library mechanics to put much faith in intermediate tests that would just verify expected SQL strings or the “correct” interaction with ADO.Net objects.
StructureMap is an old codebase, and you can see many different styles of testing throughout its history. At this point, there are very, very few unit tests (~20 out of about 1100) that use mock objects. With StructureMap, it’s generally very fast to set up full end-to-end scenarios for new features or bug fixes, and that’s now what I prefer.

I’m not yet ready to say that mocking libraries belong in the waste bin, but I do use them much more rarely now. Some of that may be that I’m in very different problem domains than I was a decade ago, when we were still using the Model View Presenter architecture that made mocking more natural. What little client-side work I do today is all JavaScript in the browser, where integration testing is much easier than it ever was in WinForms or WPF, so I don’t feel the need for mocks nearly as often.

State vs Interaction Testing

I spent a lot of time in my summers growing up helping my grandfather around the farm. Since there’s nothing in this world that breaks more often than farming implements (hay balers were the worst, but I remember the combine being pretty nasty too), I spent a lot of time using wrenches of all sorts: ratchets when you could use them because they were faster, closed-ended wrenches if you had enough room to fit them on, or open-ended wrenches for tight spaces you just couldn’t get into otherwise. My point being that all those different tools for the same basic job were necessary to get at all the bolts and nuts in all the crazy angles and tight spaces we ran into working on my grandfather’s equipment.


Much like my old choice between open- and closed-ended wrenches, when you’re writing automated tests you measure the expected outcome of a test in one of two ways:

By measurable changes in the system state or a value returned from a function call. You might be verifying data written to the system database, or files being dropped, or content appearing on a screen. This is what’s known as state-based testing.
By watching the messages sent between classes or subsystems. This style of testing is interaction-based testing.
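To make the two styles concrete, here’s a minimal sketch in C# with no mocking library. `Account`, `IAlertSender`, and `RecordingAlertSender` are hypothetical types invented for this illustration: withdrawing money changes the balance, which you can check state-based, and overdrawing the account sends an alert, which you can only check interaction-based by recording what was sent.

```csharp
using System.Collections.Generic;

// Hypothetical collaborator that notifies some external system
public interface IAlertSender
{
    void SendOverdraftAlert(string accountId);
}

// Hypothetical class under test
public class Account
{
    private readonly IAlertSender _alerts;

    public Account(string id, decimal balance, IAlertSender alerts)
    {
        Id = id;
        Balance = balance;
        _alerts = alerts;
    }

    public string Id { get; }
    public decimal Balance { get; private set; }

    public void Withdraw(decimal amount)
    {
        // Outcome #1: a state change a test can check directly
        Balance -= amount;

        // Outcome #2: a decision that is only observable as a message
        // sent to a collaborator
        if (Balance < 0) _alerts.SendOverdraftAlert(Id);
    }
}

// Hand-written test double that just records the messages it receives
public class RecordingAlertSender : IAlertSender
{
    public List<string> AlertedAccounts { get; } = new List<string>();

    public void SendOverdraftAlert(string accountId) => AlertedAccounts.Add(accountId);
}
```

In a test, after calling `account.Withdraw(150m)` on an account that started at 100, the state-based check is that `account.Balance` equals -50, and the interaction-based check is that `alerts.AlertedAccounts` contains exactly that account’s id.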

Most developers are naturally more comfortable with state-based testing, and many will flat out refuse to use anything else. That being said, interaction testing is often the “open-ended wrench” of test automation: the tool you have to use when verifying some part of the system with state-based checks would be much more work.

A common objection I’ve seen with regard to mocking is “why would I ever care that a particular method was called?” I’m going to apply a little bit of sophistry here, because you don’t necessarily care that a method was called so much as whether the code “made the right decision,” and that decision just happens to be visible through which method was called.

Interaction testing is probably most effective when you purposely separate the code that decides what action to take next from the code that actually takes that action, especially if your options for verifying that the action was taken are to either:

Verify that a particular message was sent to initiate the action
Check through the state of an external database, file, or email inbox to see if we can spot some kind of evidence that the desired action took place

My preference in this kind of case has always been to use mock objects to test that the decision to carry out the action is made correctly, and then to test the action itself completely independently.

Some examples of when I would still reach for interaction testing, often with a mock object:

Routing type of logic like “send this box to this warehouse”
Deciding whether to carry out an action, such as this case from one of our systems at work: whether we should mark an active call as “on hold” or in an “active” state.
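Here’s a minimal sketch of that decision/action split using the “send this box to this warehouse” case. Everything in it is hypothetical (`ShipmentRouter`, `IShippingGateway`, `MockShippingGateway`, the warehouse names, and the routing rule): the router only decides, the gateway carries out the action, and a hand-written mock records the message so the test can check the decision without touching any real shipping system.

```csharp
using System.Collections.Generic;

// Hypothetical gateway that actually carries out the action
// (calls a shipping API, writes to a queue, etc.)
public interface IShippingGateway
{
    void SendTo(string boxId, string warehouse);
}

// Hypothetical decision-making code under test: it decides *where*
// a box goes, but delegates the actual shipping to the gateway
public class ShipmentRouter
{
    private readonly IShippingGateway _gateway;

    public ShipmentRouter(IShippingGateway gateway) => _gateway = gateway;

    public void Route(string boxId, string destinationState)
    {
        // Illustrative routing rule only
        var warehouse = (destinationState == "TX" || destinationState == "OK")
            ? "Dallas"
            : "Memphis";

        _gateway.SendTo(boxId, warehouse);
    }
}

// Hand-written mock: records every message so a test can verify the decision
public class MockShippingGateway : IShippingGateway
{
    public List<(string BoxId, string Warehouse)> Sent { get; } =
        new List<(string BoxId, string Warehouse)>();

    public void SendTo(string boxId, string warehouse) => Sent.Add((boxId, warehouse));
}
```

The real `IShippingGateway` implementation would then get its own tests, independent of any routing rules.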

To get more specific, and since this post was originally supposed to be specifically about how we should be testing with Marten, here are some examples of operations that I think are or are not appropriate to mock:

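[Fact]
public void good_usages()
{
    var session = Substitute.For<IDocumentSession>();
    var issue = new Issue {};

    // These are alternative checks, not one test: each belongs at the
    // end of a test that has already exercised the code under test
    // with this substitute session (and, where relevant, this issue)

    // Were the changes to a session saved?
    session.Received().SaveChanges();

    // or not
    session.DidNotReceive().SaveChanges();

    // Was this new Issue document stored into the session?
    session.Received().Store(issue);
}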

[Fact]
public void not_places_or_interfaces_you_should_mock()
{
    var session = Substitute.For<IDocumentSession>();

    User user = null;

    // No possible way you should stub this functionality.
    // Go to a full-blown integration test instead
    var issue = session
        .Query<Issue>()
        .Include<User>(x => x.AssigneeId, x => user = x)
        .FirstOrDefault(x => x.Title.StartsWith("some problem"));
}

Mocks versus Stubs

A lot of people get genuinely upset about even trying to make a distinction between mocks and stubs and some mocking tools even made it a point of pride to make absolutely no distinction between the two different things. Regardless of implementation, the important difference is strictly in the role of the testing “double” within the test.

“Stub” means that you are simply pre-canning the response to a request for data. You use stubs just to set up test inputs.
“Mock” objects are used to verify the interactions that the code under test has with them.
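Since the difference is the role a double plays rather than the tool that builds it, the same point can be made with a hand-written fake and no mocking library at all. In this sketch, `IIssueRepository`, `FakeIssueRepository`, `IssueCloser`, and this `Issue` class are hypothetical stand-ins, not Marten types: the fake acts as a stub when it hands back pre-canned issues from `Load()`, and as a mock when the test checks how many times `SaveChanges()` was called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical document type and persistence abstraction (not Marten's)
public class Issue
{
    public Guid Id { get; set; }
    public bool IsOpen { get; set; } = true;
}

public interface IIssueRepository
{
    Issue Load(Guid id);
    void SaveChanges();
}

// Hypothetical code under test: closes an open issue and commits the change
public class IssueCloser
{
    private readonly IIssueRepository _repository;

    public IssueCloser(IIssueRepository repository) => _repository = repository;

    public void Close(Guid issueId)
    {
        var issue = _repository.Load(issueId);
        if (issue == null || !issue.IsOpen) return;

        issue.IsOpen = false;
        _repository.SaveChanges();
    }
}

// One hand-written double that can play either role
public class FakeIssueRepository : IIssueRepository
{
    // Stub role: pre-canned data handed back by Load()
    public Dictionary<Guid, Issue> Issues { get; } = new Dictionary<Guid, Issue>();

    // Mock role: records the interaction so a test can verify it
    public int SaveChangesCalls { get; private set; }

    public Issue Load(Guid id) => Issues.TryGetValue(id, out var issue) ? issue : null;

    public void SaveChanges() => SaveChangesCalls++;
}
```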

To put that in context, here’s an example of stubbing and mocking using NSubstitute against Marten’s IDocumentSession interface:

[Fact]
public void stub_versus_mock()
{
    // Create a substitute IDocumentSession that could be used
    // as either a mock or stub
    var session = Substitute.For<IDocumentSession>();
    
    // Use session as a stub to return known data for
    // the given issue id
    var issueId = Guid.NewGuid();
    session.Load<Issue>(issueId).Returns(new Issue{});

    // Then inside the real code,
    var issue = session.Load<Issue>(issueId);

    // If the real code should be committing the current
    // Marten unit of work, we could verify that by seeing
    // if the SaveChanges() method was called.

    session.SaveChanges();

    // Using session as a mock object
    session.Received().SaveChanges();
}

I may be the last person left who thinks this, but I think it’s still valuable to be conscious of the different roles of mocks and stubs when you’re formulating a testing strategy. I also prefer that folks use the terminology correctly, just for better communication. In reality though, so many developers are confused by the difference that most have simply thrown their hands up in the air and use the terms interchangeably.

When are Stubs Desirable?

Automated tests are only effective when you can consistently create a known system state and inputs, then exercise the system to measure expected outputs. Quite often your system will have some architectural dependency on data provided by some kind of external system or subsystem that’s either impossible to control or just too much mechanical work to set up.

As I talked about last year in Succeeding with Automated Testing, I strongly believe in the effectiveness and efficiency of white-box testing over insisting that only black-box tests are valid. Following that philosophy, I’ll almost automatically prefer to use stubs in place of any external dependency that we can’t really control in the test data setup. Examples from our work include:

An identity service from a government entity that isn’t always available
A centralized database that we have no ability to set up on our own (using Docker might be an alternative here to start from a known state). Shared, centralized databases are hell in automated tests.
Authentication subsystems, especially Windows authentication
Active Directory access
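As a minimal sketch of the first example, here’s a stub standing in for an external identity service. `IIdentityService`, `IdentityRecord`, `StubIdentityService`, `EligibilityChecker`, and the returned status strings are all hypothetical: the stub only returns canned answers so the test fully controls its inputs, and the assertions are ordinary state-based checks on the return value. Nothing in the test verifies how the stub was called.

```csharp
using System.Collections.Generic;

// Hypothetical record returned by an external identity service
public class IdentityRecord
{
    public string PersonId { get; set; } = "";
    public bool IsVerified { get; set; }
}

// Hypothetical abstraction over a third-party service we don't control
public interface IIdentityService
{
    // Returns null when the person is unknown
    IdentityRecord Lookup(string personId);
}

// Stub: pre-canned responses only, no network calls, always "available"
public class StubIdentityService : IIdentityService
{
    private readonly Dictionary<string, IdentityRecord> _records =
        new Dictionary<string, IdentityRecord>();

    public StubIdentityService With(string personId, bool isVerified)
    {
        _records[personId] = new IdentityRecord { PersonId = personId, IsVerified = isVerified };
        return this;
    }

    public IdentityRecord Lookup(string personId) =>
        _records.TryGetValue(personId, out var record) ? record : null;
}

// Hypothetical code under test that depends on the identity service
public class EligibilityChecker
{
    private readonly IIdentityService _identity;

    public EligibilityChecker(IIdentityService identity) => _identity = identity;

    public string Check(string personId)
    {
        var record = _identity.Lookup(personId);
        if (record == null) return "Unknown";
        return record.IsVerified ? "Eligible" : "NeedsReview";
    }
}
```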

Avoid Chatty Interfaces

Avoid mocking any time you’re dealing with a chatty interface that takes a lot of interactions to get anything done. Taking our Marten project as an example, it takes a lot of calls against the ADO.Net objects to do the simple work of resolving a document from a single row returned from an ADO.Net reader:

public virtual T Resolve(int startingIndex, DbDataReader reader, IIdentityMap map)
{
    if (reader.IsDBNull(startingIndex)) return null;

    var json = reader.GetString(startingIndex);
    var id = reader[startingIndex + 1];

    return map.Get<T>(id, json);
}

I could technically test the Resolve() method shown above by using mock objects in place of the IIdentityMap and the underlying database reader, but my tests would require several mocking expectations just to get both the reader and identity map to behave the way the test needs. A mocked test would certainly run faster than an end-to-end test, but in this case, only the end-to-end test would tell me with any confidence that the Resolve() method truly works as it should.

Avoid Mocking Interfaces You Don’t Understand

I used to say that you shouldn’t mock an interface that you don’t control, but the real danger is mocking any interface that you don’t entirely understand. I say this because it’s perfectly common to write a unit test using mock objects that passes, only to find that the code fails when you run it for real or within some kind of integration test. Maybe the best way of saying this is that it’s only appropriate to use mocking when the interaction being measured will definitely tell you something useful about how the code is going to behave.
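A contrived sketch of how that goes wrong: assume (hypothetically) that the real implementation behind an `IPriceLookup` interface throws `KeyNotFoundException` for an unknown SKU, while the developer writing the test double believed it returns `null`. The double faithfully encodes that wrong belief, so the unit test passes while the same code blows up against the real implementation. Every name here is invented for illustration, and the same trap applies whether the double is used as a stub or a mock.

```csharp
using System.Collections.Generic;

// Hypothetical interface: its signature alone doesn't say
// what happens for an unknown SKU
public interface IPriceLookup
{
    decimal? PriceFor(string sku);
}

// Hypothetical "real" behavior: unknown SKUs throw, because
// Dictionary's indexer throws KeyNotFoundException for missing keys
public class RealPriceLookup : IPriceLookup
{
    private readonly Dictionary<string, decimal> _prices = new Dictionary<string, decimal>
    {
        ["apple"] = 0.50m
    };

    public decimal? PriceFor(string sku) => _prices[sku];
}

// Test double written from a wrong assumption: unknown SKUs return null
public class AssumedPriceLookup : IPriceLookup
{
    public decimal? PriceFor(string sku) => sku == "apple" ? 0.50m : (decimal?)null;
}

// Hypothetical code under test, written and unit tested against the wrong assumption
public class Cart
{
    private readonly IPriceLookup _prices;

    public Cart(IPriceLookup prices) => _prices = prices;

    public decimal Total(IEnumerable<string> skus)
    {
        decimal total = 0m;
        foreach (var sku in skus)
        {
            // Intended to skip unknown items, which only works if the lookup returns null
            total += _prices.PriceFor(sku) ?? 0m;
        }
        return total;
    }
}
```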