Test smells

The classification is based on the book:

Gerard Meszaros, xUnit Test Patterns: Refactoring Test Code, 2007. I have added a few additional test smells that I tend to see.

Each smell is accompanied by suggestions on how to avoid or address it.

Code smells

C1 Obscure test

It is difficult to understand the test at a glance.

How to avoid:

  1. Keep test files small

  2. Treat testing code the same way as production code, i.e. turn it into Clean Code via refactoring.

C2 Conditional test logic

A test contains code that may or may not be executed.

A typical situation: the test checks its own applicability before proceeding towards the "main" assertion. The symptom: an early return (of void) placed inside a conditional statement.
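
A minimal sketch of how this smell tends to look in pytest (all names here are invented for illustration): the first test silently skips its assertion via an early return, while the second states its applicability up front and keeps the body linear.

    import pytest

    class Device:
        """Hypothetical stand-in for the code under test."""
        def supports_compression(self):
            return False
        def compress(self, data):
            return data[: len(data) // 2]

    # Smelly: when compression is unsupported, the assertion never runs,
    # yet the test is reported as passing.
    def test_compression_ratio_conditional():
        device = Device()
        if not device.supports_compression():
            return                      # early exit hides the missing check
        assert len(device.compress(b"abc" * 100)) < 200

    # Less smelly: the applicability decision is declared up front and the
    # body stays linear; an inapplicable run is reported as skipped, not passed.
    @pytest.mark.skipif(not Device().supports_compression(),
                        reason="compression not supported in this environment")
    def test_compression_ratio_linear():
        assert len(Device().compress(b"abc" * 100)) < 200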

How to avoid:

  1. Write the test before the production code and ensure that it fails, i.e. that it does not skip the assertion logic via an early exit.

  2. Separate the decision about a test's applicability from the actual test, moving it into its own file.

  3. Pay close attention to testing code with many ifs, loops, and multiple returns.

  4. In general, the control flow of a test case should be as simple as possible: ideally fully linear, with implicit branching (throwing exceptions) only at assertion statements.

C3 Hard-to-test production code

Production code is difficult to test.

You want to test functionality that you have just prepared or are about to prepare. You discover that writing a test is problematic:

  1. You have no idea where to start. The functionality you are tasked with is not exposed to the outside world in a way that you can control; or,

  2. You know where to start, but reaching the code under test requires penetrating several layers of abstraction. None of these layers adds anything to the essence of the test; they only obscure its purpose.

How to avoid:

  1. Write tests before writing production code. This way, production code gets designed for testability.

How to address existing problems:

  1. Expose intermediate layers of the software for inspection or direct interaction.

  2. Mock dependencies that do not lie on the stimulus-reaction path exercised in the test (see the sketch after this list).

  3. Isolate global dependencies. Easier said than done.
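
As an illustration of point 2, here is a minimal sketch using Python's unittest.mock (ReportGenerator and its mailer dependency are invented): the mailer does not lie on the stimulus-reaction path exercised by the test, so it is replaced with a mock instead of being wired up for real.

    from unittest.mock import Mock

    class ReportGenerator:
        """Hypothetical production class with an injected dependency."""
        def __init__(self, mailer):
            self.mailer = mailer

        def build(self, rows):
            total = sum(rows)
            self.mailer.send(f"total={total}")  # side effect irrelevant to this test
            return total

    def test_build_sums_rows():
        mailer = Mock()                         # dependency off the tested path
        generator = ReportGenerator(mailer)
        assert generator.build([1, 2, 3]) == 6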

C4 Test code duplication

The same test code is repeated many times.

How to avoid:

  1. Use the rule of three strikes to tell which duplicated code to extract.

  2. Extract commonly repeated material (setup routines, architectural constants, groups of assertions checking target structures) into a file common to all test cases of a test suite, as sketched below. If/when the contents of the common file become useful across test suites, move it to testlib.
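
A minimal sketch of point 2 using a pytest fixture in a shared conftest.py (the store contents are made up): the setup that used to be copy-pasted lives in one place, and every test case of the suite simply declares that it needs it.

    # conftest.py -- shared by all test cases of the suite
    import pytest

    @pytest.fixture
    def populated_store():
        """Setup that used to be repeated in every test file."""
        return {"alice": 3, "bob": 5}

    # test_quota.py
    def test_total_quota(populated_store):
        assert sum(populated_store.values()) == 8

    def test_user_present(populated_store):
        assert "alice" in populated_store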

C5 Test logic in production

The code that is put into production contains logic that should be exercised only during tests.

How to avoid:

  1. Keep an eye on conditionally compiled code. It should not contain logic only reachable from tests.

  2. Find flag arguments that tell a function to do something different when it is called from a test (see the sketch after this list).

  3. Extract test logic from production files into ctests, or into separate Python files.
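
A minimal sketch of the flag-argument variant of this smell (names are invented): the production function takes a parameter whose only purpose is to behave differently when driven by a test, while the cleaner version keeps a single code path.

    # Smelly: production code branches on a test-only flag.
    def save_record(record, db, for_test=False):
        if for_test:
            db["_staging"] = record     # path reachable only from tests
            return
        db[record["id"]] = record

    # Less smelly: a single code path; tests that need isolation pass in a
    # dedicated in-memory database object instead of flipping a flag.
    def save_record_clean(record, db):
        db[record["id"]] = record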

Behavior smells

B1 Assertion roulette

It is hard to tell which of several assertions within the same test case caused a failure.

How to avoid:

  1. Follow the AAA (Arrange, Act, Assert) rule.

  2. Split existing tests into smaller cases, each ending with its own assertion (as sketched below).

  3. Make essential dependencies between cases explicit (when one case depends on artifacts/state created by the previous case) and get rid of artificial dependencies (two cases starting from the same initial state but ordered only because it was convenient).
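
A minimal sketch of point 2 (the parser is invented): the first test strings several anonymous assertions together, so the test name and failure report do not immediately say which expectation was violated; the split version names each case after the single expectation it checks.

    def parse(text):
        """Hypothetical stand-in for the code under test."""
        key, _, value = text.partition("=")
        return {"key": key, "value": value}

    # Smelly: three expectations share one test name and one failure report.
    def test_parse():
        result = parse("speed=42")
        assert result["key"] == "speed"
        assert result["value"] == "42"
        assert "comment" not in result

    # Less smelly: each case ends with its own assertion and is named after it.
    def test_parse_extracts_key():
        assert parse("speed=42")["key"] == "speed"

    def test_parse_extracts_value():
        assert parse("speed=42")["value"] == "42"

    def test_parse_adds_no_extra_fields():
        assert "comment" not in parse("speed=42")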

B2 Nondeterministic test

The test sometimes passes and sometimes fails, without any changes to the code under test.

How to avoid:

  1. Keep tests small

  2. If there are environmental dependencies, remove them or isolate them into their own prerequisite test cases.

  3. Instead of thinking "How would I reduce the rate of failures?", face the enemy and ask "What conditions make this case fail as often as possible?" The latter helps expose the real cause of the instability.

  4. Remove the flaky test. Fewer test cases mean lower coverage, but a single flaky test undermines trust in the whole test suite, which is often worse.

B3 Fragile test

A test fails to compile or run when the system under test is changed in ways that do not affect the part the test is exercising.

Reasons:

  1. Test looks at non-essential details together with essential ones. The approach of comparing against a pre-recorded trace often suffers from this.

  2. Tests written in compiled (i.e., not interpreted) languages have linking dependencies for all symbols, even for those which are not reached during the test operation.

  3. Test has a spurious dependency that it does not use. E.g., a simulation scenario may load (and thus depend on) a disk image from which not a single byte is ever read.

How to avoid:

  1. Keep tests small

  2. Keep test dependencies small and explicit. For example, unit tests for large files and classes are more likely to fail when something inside that file/class has changed, even if it is not relevant to this particular case.

  3. Avoid testing against a reference log, trace, etc. (see the sketch after this list).

  4. Extract dependencies into their own test cases.

  5. Do not add a new assertion line "just in case". If you need that assertion but it does not correlate with the existing assertions in the test case, add it in a new case focused on that specific condition.
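
A minimal sketch of point 3 (names invented): comparing against a full recorded trace couples the test to every incidental detail of the output, while asserting only on the essential fact does not.

    def run_transaction(log):
        """Hypothetical code under test that emits log lines as a side effect."""
        log.append("opening connection v2.1")   # incidental detail
        log.append("transaction committed")     # the essential fact
        return True

    # Fragile: any cosmetic change to logging breaks this test.
    def test_transaction_against_golden_trace():
        log = []
        run_transaction(log)
        assert log == ["opening connection v2.1", "transaction committed"]

    # More robust: only the property the test is actually about is asserted.
    def test_transaction_commits():
        log = []
        assert run_transaction(log)
        assert "transaction committed" in log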

B4 Frequent debugging

Manual debugging is required to determine the cause of most test failures.

How to avoid:

  1. Avoid eager, obscure, flaky tests (easier said than done)

  2. Avoid tests that generate a lot of logs.

  3. Make sure tests have good failure localization: an error message (including the call stack) should point to the failure reason as closely as possible; see the sketch after this list.

  4. A test that hangs is an incorrect test. Having a separate watchdog component does not make the test correct. Instead of fixing the test's input data (i.e. the production code), the test itself should be redesigned to detect invalid input data.
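
One inexpensive way to improve localization, sketched below with invented names, is to attach the compared values to the assertion itself, so the failure report states what was checked without a debugging session.

    def compute_checksum(payload):
        """Hypothetical stand-in for the code under test."""
        return sum(payload) % 256

    def test_checksum_of_known_payload():
        payload = bytes([1, 2, 3, 250])
        actual = compute_checksum(payload)
        # The message names the input and both values, so the report alone
        # points at the failure reason.
        assert actual == 0, f"checksum of {payload!r}: expected 0, got {actual}"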

B5 Manual intervention

A test requires a person to perform some manual action each time it is run.

How to avoid:

  1. If manual intervention is required to start a test, it has to be automated. The simplest option is to start it on a periodic basis. The best is event-driven initiation: run when conditions that affect the outcome of the test have changed, e.g. when a new commit has been added or prerequisite tests have generated the required artifacts.

  2. If manual intervention is required for in-depth analysis of test artifacts, it may be caused by B1, B2, B4 or P1. One can try to automate the analysis of test artifacts, but it is better to work on simplifying these artifacts so that no automated processing is required.

B6 Slow tests

The tests take too long to run.

A slow test is not run often enough, which hurts failure localization: multiple commits slip in between test runs.

Debugging a slow test means long iterations. This causes B4 Frequent debugging and P3 High test maintenance cost.

How to avoid:

  1. Write small tests

  2. Separate test phases into independent and explicitly dependent smaller test cases. Parallelize execution of independent cases, avoid running cases with broken prerequisites (fail fast).

  3. Do not depend on slow end-to-end tests alone. For situations discovered while debugging them, prepare small, fast, focused test cases.

Project-level smells

These smells are often caused by the previously described behavior and code smells, when the latter are introduced and ignored on a systematic level in the project.

P1 Buggy tests

Bugs are regularly found in the automated tests.

A bug in a test is one of the following:

  1. Failure to report a problem when it is present. Reasons: C2, C3.

  2. Failure to be silent when there is no problem. Reasons: B3, B2, B6

How to avoid:

  1. Code reviewers should call out smelly tests as a risk for test bugs. All smells should be considered.

  2. Use static code analysis with checks specific to test code.

P2 Developers not writing tests

Developers aren’t writing automated tests.

Among the reasons: 

  1. C3 Hard-to-test production code.

  2. Not writing tests before the production code means they will not be written later. Do not fool yourself: you will not write as many tests after the production code is prepared as you would have written before it.

  3. Writing tests, but at the wrong level. Focusing exclusively on end-to-end tests causes B1-B6 and C1-C5 (yes, all of them).

How to avoid:

  1. Raise awareness about different classes of tests.

  2. Educate about modern software development practices (test-first and TDD)

P3 High test maintenance cost

Too much effort is spent maintaining existing tests.

How to avoid:

  1. Do not write tests! No tests == no time spent on maintaining them. All this saved time can be spent on debugging and on building more advanced tools to help with debugging.

  2. But seriously: pay attention to the most common smells in high-maintenance test code and address them. Favor tests with good error localization capabilities.

P4 Production bugs

We find too many bugs during formal tests or in production.

How to avoid:

  1. Run tests frequently enough. All tests should pass before releasing anything to production.

  2. Keep an eye on lost tests: a test file being present does not mean it will be run.

  3. Invest in adding a matching regression test after fixing a customer's case. Best of all is a reproducer of the customer's complaint captured as a small case. Note: dragging in the whole customer environment and calling it a test causes P2, B6, B4, B3, B2, …

  4. Capture requirements as test cases. Otherwise, an untested requirement is more likely to hide a production code bug.

Miscellaneous smells

M1 Chatty test

The test generates a huge number of artifacts (e.g., log messages) even when it passes. When it fails, the actual error message is lost among all the irrelevant output (C1 Obscure test). Copious output may also slow the test down (B6 Slow tests).

How to avoid:

  1. Stick to the Unix principles for command line utilities: a successful execution should generate no diagnostic output.

M2 Test cries "wolf"

A variation of M1 in which the log file of a passing test contains many lines with the words "warning", "error", "fatal", or others with similar meaning. Typical for long and obscure tests, these lines indicate ignored or corrected problems. When an actual test failure happens and gets logged, its message is lost among all the other similar-looking error messages.

How to avoid:

  1. Having warnings is unprofessional; they must be fixed or at least kept to a minimum, especially if they come from components that do not play any role in the specific test.

  2. Having errors in the log is misleading. The customer will definitely see them as well and will be confused.

M3 Catch-all test behavior

Catch-all statements in production code are not welcome. The usual recommendation is to write explicit catch sections for expected problems and let the runtime deal with all the unexpected ones.

The same rule applies to tests. Negative effects depend on whether the exception is then reported as assertion failure or is ignored.

  1. Having a catch-all leads to printing a generic or misleading error message, worsens failure localization, and causes B4 Frequent debugging.

  2. If the exception caught by the catch-all is then ignored (because other, expected classes of exceptions were deemed harmless), it will lead to P1 Buggy tests (see the sketch below).
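
A minimal sketch in Python terms (the Controller is invented): the first test converts every possible failure into the same vague message, while the second catches nothing, so an unexpected exception surfaces with its own message and stack, and the assertion checks only the expected outcome.

    class Controller:
        """Hypothetical stand-in for the code under test."""
        def reset(self):
            return "ready"

    # Smelly: any exception, expected or not, turns into one generic failure.
    def test_reset_catch_all():
        try:
            status = Controller().reset()
        except Exception:
            status = None
        assert status == "ready", "something went wrong, go figure"

    # Less smelly: nothing is caught; only the expected outcome is asserted,
    # and an unexpected exception fails the test with its own stack trace.
    def test_reset():
        assert Controller().reset() == "ready"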

How to avoid:

  1. Don't use catch-all; catch specific exception classes instead.

  2. Don't use equivalents of catch-all. For example, waiting for a timeout and then reporting "something is wrong, go figure" should not be the normal and only assertion in a test case.

M4 Test case has no assertion statements

A case without assertions ensures only that no exceptions are thrown. It ensures nothing about the behavior of the code under test.

This may be considered a variation of M3. Not expecting anything is similar to catching all exceptions and reporting any situation as a generic error. This fails to localize any error and forces a human to debug the case.

Note that sometimes the expectation is that a certain piece of production code (a callback) will not be called in the test. The callback may be represented by a test double that unconditionally fails; if it does get called, the resulting failure is exactly how the test verifies its expectation. This still smells, unless it is obvious at a glance that the intention is for the double never to be invoked; a sketch follows below.
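
A minimal sketch of such a case (names invented): the double's name and failure message make it obvious at a glance that the verified behavior is "the error callback is never invoked for valid input", even though the test body contains no explicit assertion statement.

    import pytest

    def process(items, on_error):
        """Hypothetical code under test: reports only negative items."""
        for item in items:
            if item < 0:
                on_error(item)

    def test_no_error_callback_for_valid_input():
        def must_not_be_called(item):
            pytest.fail(f"on_error unexpectedly invoked with {item!r}")

        process([1, 2, 3], on_error=must_not_be_called)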

See the similar SonarQube rule: https://rules.sonarsource.com/java/tag/tests/RSPEC-2699

