Code coverage is a common metric used by teams who implement unit tests for their software. On the surface, coverage is a useful tool. However, misconceptions about and overreliance on coverage can lead to antipatterns that reduce a team’s effectiveness.

In this post I reference this repository, which contains a solution to the Game of Life kata in TypeScript. The codebase uses a few NPM packages for testing:

  • Mocha - Testing framework
  • Chai - Assertion library
  • Istanbul - JavaScript test coverage tool
  • NYC - Istanbul’s CLI

This is an opinion piece, not a tutorial, so prior knowledge of these tools is unnecessary. :)

What is Code Coverage?

Code coverage (also referred to as test coverage) tells you how much of your code was executed during testing. This is important because if code isn’t executed, then it isn’t being tested. A codebase with no tests would be considered 0% covered.

There are a few different ways to measure code coverage, such as:

  • Statement coverage tells you how many statements were executed.
  • Branch coverage tells you how many conditional branches were executed.
  • Function coverage tells you how many functions were called.
  • Line coverage tells you how many lines were executed (which is slightly different from statement coverage).

Together, these metrics provide a decent quantitative overview of your test coverage.
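
The difference between these measures is easiest to see when a branch is packed into a single line. Here’s a contrived example (not from the kata) where one test earns 100% statement, function, and line coverage but leaves branches unexercised:

// is-leap-year.ts - a contrived example, not part of the kata
export function isLeapYear (year: number): boolean {
  // one statement, one line - but several branches
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0)
}

// is-leap-year.spec.ts
import { expect } from 'chai'
import { isLeapYear } from './is-leap-year'

it('identifies a leap year', function () {
  // this single test yields 100% statement, function, and line coverage,
  // yet the century branches (e.g. 1900, 2000) are never evaluated
  expect(isLeapYear(2020)).to.be.true
})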

An Example Coverage Report

In the Game of Life codebase, the NPM script test:coverage can be used to generate a code coverage report. This script uses NYC to execute our Mocha tests and print a simple ASCII report after the test results. If there are uncovered lines, it lists those as well.
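
For reference, the relevant scripts in package.json presumably look something like this (reconstructed from the commands echoed in the output; check the repository for the exact setup):

{
  "scripts": {
    "test": "mocha -r ts-node/register **/*.spec.ts",
    "test:coverage": "nyc mocha -r ts-node/register **/*.spec.ts"
  }
}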

$ npm run test:coverage

> kata-js-game-of-life@1.0.0 test:coverage /Users/brittany/code/bannmoore/kata-js-game-of-life
> nyc mocha -r ts-node/register **/*.spec.ts



  Game of life
    main
      ✓ should increment the board by generation


  1 passing (12ms)

-----------------|----------|----------|----------|----------|-------------------|
File             |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
-----------------|----------|----------|----------|----------|-------------------|
All files        |      100 |    81.48 |      100 |      100 |                   |
 game-of-life.ts |      100 |    81.48 |      100 |      100 |    50,51,62,63,67 |
-----------------|----------|----------|----------|----------|-------------------|

Coverage Antipatterns

Now, we know what code coverage measures: the amount of code executed during testing. This by itself is a valuable metric, since it helps the team identify and close testing gaps. Coverage is also accessible to teams that are new to testing, since it’s relatively easy to measure.

The problem is that, like many tools in software, it can be adopted carelessly. Here are some common antipatterns I’ve seen arise when code coverage is misused or misunderstood.

Mistaking Test Coverage for Test Quality

Coverage is a quantitative metric, not a qualitative one. A test suite can be brittle, difficult to maintain, and overly complicated while maintaining high levels of test coverage. Think of your coverage reporter like this: its job is to tell you where potholes in the street are. It doesn’t care how well you fill them.

“The phrase, complete coverage, is misleading… Even if you achieve complete coverage for a given population of tests (such as, all lines of code tested), you have not done complete, or even adequate, testing.” - Cem Kaner
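
As a quick illustration, here’s a deliberately bad test that would still earn full marks from the coverage reporter (it borrows the input from the real test shown below):

// executes main (earning coverage) but asserts nothing, so it can never fail
it('exercises main without checking anything', function () {
  subject.main('Generation 1:\n4 8\n........\n....*...\n...**...\n........\n\n')
  // no expect() - the coverage reporter can't tell the difference
})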

Let’s take a look at my Game of Life tests. We only need two tests to achieve maximum coverage.

// this test on its own generates substantial coverage
it('should increment the board by generation', function () {
  const worldData = 'Generation 1:\n4 8\n........\n....*...\n...**...\n........\n\n'
  expect(subject.main(worldData)).to.equal('Generation 2:\n4 8\n........\n...**...\n...**...\n........\n\n')
})

// adding this test will bring us to 100% coverage
it('should handle empty string', function () {
  expect(subject.main('')).to.equal('Generation 0:\n-1 -1\n\n')
})

These two tests provide a basic happy path for two scenarios: a large board and an invalid (empty) board. Every line, branch, and function is executed. Can we pack up our laptops and go home now? Well, not exactly.

I’d expect two tests to catch more bugs than zero, which is good, but let’s consider what happens if one of these tests fails. There are over 10 different methods collaborating in this program. If a test fails, I’ll have no idea which function caused the failure. Coverage doesn’t tell me whether my tests will fail in a valuable way.

To be clear, the tests above are still useful; they provide assurance that my program is wired together properly. To make debugging easier on my future self, though, I should also add more granular tests for individual units of business logic.
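
For example, if the neighbor-counting logic were exported (countLiveNeighbors is a hypothetical name here; the kata’s real internals may be shaped differently), a granular test could point straight at the broken unit:

// assumes a hypothetical exported helper named countLiveNeighbors;
// the real module may structure this logic differently
it('counts the live neighbors of a cell', function () {
  const board = [
    ['.', '*'],
    ['*', '*']
  ]
  expect(subject.countLiveNeighbors(board, 0, 0)).to.equal(3)
})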

Overestimating Coverage’s Impact on Bugs

When using code coverage as a metric (particularly as a business-facing one), it’s important to be clear on what the benefits of coverage are and are not. I’ve seen non-technical team members look at code coverage and think it means “immunity to bugs”.

My Game of Life tests have 100% coverage, but they still miss some edge cases. To prove it, I’ll reintroduce a relatively common bug (one I actually wrote while developing this kata for the first time).

Line 106 of game-of-life.ts uses find to check whether we’ve already looked at a particular neighbor:

if (indices.find((i: number) => i === index) === undefined) { // correct

I’ll break this line by using a ! check instead of === undefined. This is a bug because find returns the matched element, and the elements here are indices: 0 is a valid index, and !0 evaluates to true (oops).

if (!indices.find((i: number) => i === index)) { // bug
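
To spell out why the ! check misfires:

// find returns the matched element, and here the elements are indices,
// so a stored index of 0 comes back as the falsy number 0
const indices = [0, 3, 7]

indices.find((i: number) => i === 0)               // 0 - the neighbor was found
!indices.find((i: number) => i === 0)              // true - 0 is falsy, so it looks "not found"
indices.find((i: number) => i === 0) === undefined // false - correctly reports "found"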

Now, I run my 100% coverage tests… and they pass. Uh-oh.

$ npm run test

> kata-js-game-of-life@1.0.0 test /Users/brittany/code/bannmoore/kata-js-game-of-life
> mocha -r ts-node/register **/*.spec.ts



  Game of life
    main
      ✓ should increment the board by generation
      ✓ should handle empty string


  2 passing (21ms)

My test cases are executing my code, but they aren’t comprehensive. A test using a smaller board with a similar layout will correctly fail while the bug is in place.

it('should handle a 2x2 grid', function () {
  const worldData = 'Generation 1:\n2 2\n*.\n**\n\n'
  expect(subject.main(worldData)).to.equal('Generation 2:\n2 2\n**\n**\n\n')
})

Reaching a high level of code coverage does not mean you’ll never add new tests or rework existing ones. And that’s okay! Just remember this: in the end, your code’s resilience is a measure of test design, not coverage.

Diminishing Returns of Coverage Efforts

It’s pretty easy to get 100% code coverage of a kata program like the Game of Life. But when you’re dealing with software out in the wild - a monolith, a distributed system, etc. - 100% coverage becomes a siren call: tempting, but often personally and professionally unhelpful.

The problem is that coverage tools view all code equally. They have no concept of churn or the 80:20 rule. As far as your coverage tool is concerned, all lines of code are equally worthy of coverage. This egalitarian worldview is admirable, but it’s not an accurate representation of your codebase’s intrinsic qualities, especially if that codebase is a legacy monolith or otherwise difficult to test.

“I don’t want to be pushed around by metrics, but I do want the insights they can give me.” - Sandi Metz

I tend to think that increasing code coverage is worth doing if it’s relatively straightforward. But if writing additional tests on your project starts to feel really painful, stop and assess. Painful testing often means that your code isn’t designed for testability or that you’re writing low-value tests (see below). A test that adds 1% of coverage but has to be updated in every single PR probably isn’t worth the effort.

When you can’t reach your coverage goal, treat that as valuable information about your codebase. A coverage gap may exist because you can’t test a section easily, or because the churn is so low that it’s not worth the refactoring required. Coverage as a metric should be a means to an end, not the end itself.
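
If you do want a shared target, one pragmatic option is to encode a realistic threshold rather than chasing 100%. NYC supports this kind of configuration under an "nyc" key in package.json (the numbers and exclusions below are illustrative placeholders, not recommendations):

"nyc": {
  "check-coverage": true,
  "lines": 80,
  "branches": 75,
  "exclude": ["**/*.spec.ts"]
}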

Incentivizing Low-Value Tests

Since code coverage is easy to measure, it’s tempting for leaders to put incentives on it. And we all know how developers respond to incentives, right?

Even if you don’t put a reward on coverage, be cautious about inflating its importance. If your testing conversations revolve around coverage and nothing else, you probably aren’t talking about how your test suite is helping the team. That’s a recipe for trouble. A common outcome of an over-emphasis on coverage is unhelpful tests.

…in recent times I have found myself saying more often, “Why did you write that test?” instead of, “You should write a test.” - Dan Lebrero

A low-value test may simply be one that never fails (because the code under test is never modified or is extremely simple), or it could be a strange test that doesn’t really tell you anything important.

My favorite example of this is in framework applications, like React or Angular. React projects generally have an index.js that loads your App component. There are StackOverflow questions out there from developers asking how to test this file, in order to achieve 100% code coverage. The correct answer (in my opinion) is “don’t”. An integration test is more than sufficient for framework boilerplate, especially when it contains no business logic.
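
For context, a React entry point is typically just a few lines of glue, roughly like this (details vary by project and React version):

// index.tsx - framework boilerplate with no business logic to test
import React from 'react'
import ReactDOM from 'react-dom'
import App from './App'

ReactDOM.render(<App />, document.getElementById('root'))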

I absolutely understand why a developer would ask this question, though - it’s a direct consequence of over-emphasizing code coverage, especially for junior devs who are still learning good testing practices. It’s gotten to the point where anything under 100% bothers us, and that energy is probably better spent on other things.

TL;DR

There are many ways to use code coverage incorrectly that hurt your code and your team in the long run. But it’s still a valuable tool when played to its strengths. The best defense against misusing coverage (or any testing tool) is to have open conversations about how it benefits your test suite and team goals.

Thanks for reading!