Atakua's Diary

Unit tests as finite state machine approximation of software

Finite state machine (FSM) is both a strict theoretical concept and a practically useful mechanism to build and reason about certain kinds of software. Unit tests is a category of tests that focuses on having the minimum of dependencies when making verifiable assertions about production software’s behavior. Neither is a silver bullet to all problems of computer science or software engineering, but within the areas of their applicability, they are indispensable.

I will try to demonstrate an interesting connection between FSMs and unit tests, and what effects this observation has on testability of production code. While FSMs are routinely used in production code, there are large classes of programs that cannot be based off them. Yet, we can still unit-test many aspects of them, unless we encounter the case of self-inflicted tautological test cases.

Isomorphism between FSM transition table and unit test suite

Isomorphism between FSM transition table and unit test suite

Let’s first define components of finite state machines and anatomy of unit tests. Then I will explain the bidirectional mapping between them.

Start state — event — end state

An FSM is defined by having a finite set of states, a finite set of events, and a graph of transitions between those states when the events arrive. The graph definition can always be represented by an equivalent table description: for each graph edge it will have exactly one table row starting state — event — end state, non-ambiguously describing that state transition.

Given-When-Then

A properly focused test case follows the structure of a well formulate use case. That use case is formulated as a triplet of three statements:

given the specific starting state of the system under test;
when the certain event happens;
then the system under test ends up in the specified final state.

For the automatic tests, this structure is more often called Arrange-Act-Assert. But the meaning is exactly the same here: we arrange a starting state, we act on it, and we assert that the final state is as expected.

To complete the preparation, let’s mention that a properly tested software comes with a suite of unit tests: a set of independent, non-duplicating test cases formulated as unit tests.

The isomorphic mapping between the two concepts

The next observation is that a suite of unit tests for some piece of production software maps neatly to a state machine related to behavior of that software. Given statements are isomorphic to starting states, when statements describe events and then correspond to end states.

The reverse also holds. If an FSM is hidden in production software, then we can trivially construct a unit test suite that would completely cover behavior of that FSM; each individual test case would map to a single FSM transition.

I can now formulate two statements that follow from everything stated above, which I will put into additional context.

S1. A unit test suite describes a finite state machine, defining some behavior of the associated production software under test.

S2. If your production software contains an FSM inside it, that FSM can be exhaustively tested by unit tests.

Let’s explore two corollaries from these two observations. The first one is connected to our desire to exhaustively test production software. The second one is about the risk of creating tautological tests that do more harm than good.

Impossibility to test all production behavior

Notice that S1 above mentions that “some” behavior of production software can be captured by unit tests. Can we somehow strengthen it to say all behavior? The answer is no.

Useful software often implements abstract machines that are more powerful than an FSM can express. Push-down automata (stack machines) and Turing machines are such classes of computational devices. Compared to state machines, the latter two have memory in forms of stacks or tapes.

It is proven that no amount of finite states can emulate a stack or tapes present in the Push-down and Turing machines. What that means for us is that certain aspects of production software operation, e.g., storing and recalling arbitrary amounts of data, making unbound number of iterations, — cannot be exhaustively tested by a suite of unit tests which are only as powerful as an equivalent FSM.

You would need tests that are Turing-complete themselves to do that. But you do not want to have too many of such tests for serious reasons. And even with tests allowed to be as powerful as they possibly can, not every desirable and definable aspect of production code’s behavior can be automatically tested, the halting problem commands us.

But let’s focus on the good opportunity in all of that.

While we cannot fully pin down the behavior of many kinds of useful production software by using an associated unit test suite, we can still approximate it and fixate many desirable aspects of it. If the range of behaviors that are not tested is purposefully made to be small, then we will reap most benefits of having a solid unit test suite, and keep the amount of higher-level and higher-cost testing (including manual testing) to the minimum.

Looking at it differently, we can also search for smaller state machines hiding in already written, legacy business logic, and re-express those machines as test suites of characterization tests. We won’t be able to capture every aspect of existing behavior here, but we will still reduce the risk of introducing an unexpected regression while maintaining this legacy production code, by having it fixed in the tests.

Tests repeating production software

Let’s consider the statement S2, narrowing it down to the case when the whole production system under test is also explicitly written as an FSM. Such hypothetical production software contains explicitly defined states and events; the transitions are expressed in one of few forms: as a graph, a table, a state pattern etc. When not abused, state machines are a useful and practical pattern to employ for designing many kinds of software.

Will an associated unit test suite be as useful as for other types of software? Not really. What we would achieve in such case in tests is a slightly different reimplementation of exactly what the production code already says. Each test case would be trivially traceable to its equivalent piece in the production code. Any change to the production code will predictably break a single unit test case, and any newly added failing test case will force us to add a new transition in production part to make it pass. This is a clear bad case of having tautological tests.

Why does this happen? Any explicit implementation of FSM factors into two clearly separable pieces: the transition graph/table and the transition function.

The transition function is always the same: find matching start state and event, return their final state. Such function is generic, it contains no traces of the domain-specific business logic. This function can usually be covered by a single test case that checks the fact that the transition indeed happens.
The transition table, on the opposite, contains all the business logic, but none of it is expressed as code. It’s all triplets of data values: (state, event, state). There is no behavior to be tested here. Surely, some higher relations of cross-entry consistency can be verified (e.g. that all transitions are defined once and only once, etc.) But it is not testing the domain behavior, it is type testing; it can be done statically, without running the application. Otherwise either the data table is correct, or it is not.

A test suite for such a construct will contain that single aforementioned test case for the transition function, followed by cases which are reiterations of those value triplets, to be subjected to that function. We can change the format in which those triples are stored (e.g., from a table to a graph, a domain-specific language, some other sparse structure etc.), but there will be no confusion on how those cases are mapped to the data lines in production code.

This is what makes those tests tautological: they restate the same things in the same manner.

Test tautology arising and disappearing in the TDD cycle

It is a bit ironic: we pushed our business domain analysis to the limit. We’ve discovered that the solution to the business problem can be expressed as a state machine. We’ve separated its transition table from the transition function. And by doing that, we made the problem uninteresting to test!

I definitely observed this to happen when writing software using test-driven development (TDD), in early iterations of a new project. When there is not much production code around yet, that code often tends to repeat much of the test case code which has just been written before it. After all, TDD laws tell us to write the absolute minimum of production code, enough to make the latest added test case pass. In many cases simply restating the same thing over makes the test accept it.

Note how it is the production code that is tautological to test code in this process, not the other way around. That is by order of construction: tests come before production code.

But then something else happens. After some time, you decide to refactor already written production code: remove duplication, introduce abstractions, generalize overly specific pieces, etc. If done properly, the production code no longer resembles the test code it was based off, while still conforming to the behavior pinned down in the test suite. The tautology has been expunged from it.

But something even bigger often happens as well. It is at this stage the production code often jumps in its complexity from a finite state machine to having a stack or another type of memory. This means, production code stops being equivalent to an FSM, and it happens during the refactoring. TDD laws do not prevent us from doing that: we are free to modify the production code however we want as long as all existing test cases pass. Generalization transformations is what is capable of adding new behavior.

We do not touch the existing tests during the refactoring: the TDD laws say nothing about when an existing test case may be modified. The tests stay preserved as a collection of given-when-then, a.k.a. start-event-end-state, triplets. The test suite remains to be a state machine.

I have demonstrated before that such FSM, hidden in the test suite, is no longer capable to fully capture all the behavior of now much more expressive production code. By the construction relation between them, however, the test suite still approximates that behavior for all original test inputs. If the tests pass, then the associated production code, no matter how many times refactored, still provably demonstrates the behaviors originally present in and dictated by that same test suite.