Environment & Risk Aware
Test Coverage
Every type of software testing has different strengths and risks. Manual, functional, integration, unit, static code analysis: each helps assure the quality of the system under test.
Test coverage attempts to quantify how completely a system is tested. Coverage metrics like "code coverage" can't prove a test is ever actually run against the code it covers, and offer no assurance that the test is a good one, only that it exists.
A test is itself a system, and that system's quality should be continually evaluated and its usefulness justified.
Principles
-
Toward this goal, ERA has three principles.
Undefined behavior is not testable and cannot count as coverage.
Tests that are never run should not be counted as coverage.
When a bug is found that a test should have caught, mark that test as ignored* and redesign it.
* Failed, broken, ignored or skipped tests do not count as coverage.
ERA principles improve test quality.
The ERA coverage score measures how many tests were run, when and where, and how many were not.
Score
-
The ERA coverage score is a ratio, passing:possible, where:
passing = executed tests - (failed + broken + ignored + skipped)
possible = defined scenarios - skipped
The ERA score shows the percentage of passing tests in any given environment, for a particular version of the system under test, at any given time.
To represent the score as a percentage, simply divide passing by possible and multiply by 100.
Adding the ratios across all environments provides a comprehensive lifecycle test coverage score as a single percentage.
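As a minimal sketch of the arithmetic (the function and variable names are illustrative, not part of ERA):

```python
def era_score(executed, failed, broken, ignored, skipped, defined):
    """Return the ERA coverage score, as a percentage, for one
    environment and one version of the system under test."""
    passing = executed - (failed + broken + ignored + skipped)
    possible = defined - skipped
    return 100.0 * passing / possible if possible > 0 else 0.0

# 40 scenarios defined; the runner reported 36 results, of which 2 failed,
# 1 was broken and 3 were skipped in this environment.
print(era_score(executed=36, failed=2, broken=1, ignored=0, skipped=3, defined=40))
# passing = 30, possible = 37, score ~ 81.1
```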
Coverage
For unit tests, each individual test represents a scenario. Coverage is determined by passing test executions in a particular environment, for a particular version.
Coverage is different from pass/fail results. ERA coverage shows how many tests have been run, and how many are missing.
An ERA score exposes the relationship between failed, broken and ignored tests in a way pass:fail cannot.
Environment
The software development lifecycle is a stepped process with continuous deployment across nodes, internal and external. Testing should match the cadence of a software system version on its way through the pipeline.
Some tests need to be skipped in certain environments.
Example: An engineer testing locally during development might avoid functional tests that run slowly.
Example: Automated test code should not be deployed to production or shipped to the customer.
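As one hedged illustration of environment-dependent skipping (assuming pytest and a TEST_ENV variable set by your pipeline; both are conventions of this sketch, not ERA requirements):

```python
import os
import pytest

# Hypothetical convention: the pipeline sets TEST_ENV to "local", "ci", "staging", ...
TEST_ENV = os.environ.get("TEST_ENV", "local")

# A slow functional test that engineers skip during local development.
@pytest.mark.skipif(TEST_ENV == "local", reason="slow functional test; run in CI and beyond")
def test_checkout_end_to_end():
    ...
```

Skipped executions are still recorded; they are excluded from both the passing and the possible counts, so each environment's score reflects the decision.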
-
Nodes vs. versions
In order to prove that a test is run in the software lifecycle, you must record when and where it ran.
If a version (say a branch of code) was a static codebase across development pipeline nodes, one coverage score for that version would suffice; but this is rarely the case in practice.
Patches, bug fixes, hotfixes, dependency churn and out-of-sync code make each version a mutating target, changing between environments.
Only by tracking test execution per environment can you see where to improve coverage across the software lifecycle.
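One possible shape for that record, sketched in Python (the field names and values are assumptions for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TestRun:
    """A single test execution, recorded with enough context to score
    coverage per environment and per version."""
    scenario: str       # which defined scenario this execution covers
    result: str         # "passed", "failed", "broken", "ignored" or "skipped"
    version: str        # build, tag or commit of the system under test
    environment: str    # "local", "ci", "staging", "production", ...
    executed_at: datetime

run = TestRun(
    scenario="customer can reset a forgotten password",
    result="passed",
    version="2.4.1+build.318",
    environment="staging",
    executed_at=datetime.now(timezone.utc),
)
```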
Risk
The direct risk of writing test code is producing poor, low-assurance tests.
The ERA test score exposes unused test scenarios, and shows the real effect these risks have on coverage.
-
Risk factors
Quality - does the test catch real bugs?
Maintenance - standard technical debt.
Execution frequency - how often tests are run.
Reusability/portability - code or scenario reuse across environments and platforms.
Performance cost (execution time) - slow tests are run less frequently.
These factors contribute to bad tests or reduced coverage.
-
Low Quality Tests
Tests should be peer-reviewed before commit.
Tests that never find bugs never break. Periodically evaluate them against their maintenance cost and execution time or effort, and reduce their scope where appropriate.
Tests that break frequently should be refactored: reduce their scope, or report flaws without hard assertions so that execution halts less often.
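A minimal sketch of reporting flaws without hard assertions (the page object and checks are hypothetical):

```python
def check_order_page(page):
    """Report every flaw seen on a (hypothetical) order page, without
    halting at the first one."""
    problems = []
    if page.title != "Your order":
        problems.append(f"unexpected title: {page.title!r}")
    if not page.items:
        problems.append("order page rendered with no items")
    if page.total < 0:
        problems.append(f"negative total: {page.total}")
    return problems  # an empty list means the checks passed
```

The test reports every flaw it saw in one run, instead of stopping at the first failed assertion.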
-
False Positives
When a test breaks, it might have found a bug, or the test code itself might be at fault. A broken test could point to a defect, to other unexpected system behavior, or to bad test code.
For the purpose of measuring coverage, it doesn't matter why a test is broken. A failed test is not functional, and cannot provide coverage.
False positives are part of maintenance risk.
How To
Measuring an ERA coverage score is easy. Every time you run tests, manual or automated, record the result, the version and the environment (the context) in which they ran.
Compare the passing tests to all defined test scenarios in that context.
-
Define Expected Behavior
Tests without defined behavior are vague; their value is hard or impossible to quantify.
ERA is only aware of the scenarios that you have defined. Undefined behavior cannot count as coverage. Record all system behavior as test scenarios.
Test scenarios *can* be defined by fully automated test code alone (unit tests), but that impedes code reuse, and is opaque to anyone besides coders (often even to them).
Further, when an automated test breaks (and to be useful it must break), code-defined test scenarios leave no option for manual testing. No safety net.
Fall back on manual testing when needed. When an automated test breaks, it no longer covers that system. System behavior should be defined and testable at any level of automation.
The best solution is to define the test scenario in a common, human-readable language anyone can understand.
This is where Behavior Driven Development (BDD) behavior specification helps.
Given-When-Then is an idiomatic grammar in which system behavior is defined and automated code can be directly linked and executed, but the scenario stands alone with or without automation.
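As a sketch of the idea, independent of any particular BDD tool (the scenario and step functions below are hypothetical):

```python
# The scenario is plain language first; it can be executed manually exactly as written.
SCENARIO = """
Scenario: Customer resets a forgotten password
  Given a registered customer who has forgotten their password
  When they request a password reset email
  Then they receive a reset link that expires within one hour
"""

# When automation exists, it is attached to the same steps.
def given_a_registered_customer():
    ...  # create or look up a test customer

def when_they_request_a_reset_email(customer):
    ...  # drive the system under test

def then_they_receive_an_expiring_link(customer):
    ...  # assert on the observable behavior
```

Tools such as Cucumber or pytest-bdd link Given-When-Then text to executable steps like these, but the scenario remains a valid, countable test even with no automation attached.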
-
Measure Test Coverage Per Environment
Each time a test of any type is run, record the result, the version and the environment details.
Every test execution happens in a particular environment; record that context alongside the results.
Keep a running total of possible tests in each environment: decide which scenarios are applicable, denote unused scenarios as skipped, then total the possible test scenarios.
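A hedged sketch of that bookkeeping, building on a simple (scenario, result) record (names are illustrative):

```python
from collections import Counter

def environment_score(runs, defined_scenarios):
    """Tally passing:possible for one environment and version.
    `runs` is a list of (scenario, result) pairs; `defined_scenarios`
    is every scenario defined for the system under test."""
    results = Counter(result for _, result in runs)
    executed = sum(results.values())
    passing = executed - (results["failed"] + results["broken"]
                          + results["ignored"] + results["skipped"])
    possible = len(defined_scenarios) - results["skipped"]
    return passing, possible

defined = {"reset password", "checkout", "refund", "checkout slow path", "bulk import"}
runs = [
    ("reset password", "passed"),
    ("checkout", "passed"),
    ("refund", "failed"),
    ("checkout slow path", "skipped"),  # deliberately skipped in this environment
    ("bulk import", "skipped"),         # unused scenario, denoted skipped
]
passing, possible = environment_score(runs, defined)
print(f"{passing}:{possible} = {100 * passing / possible:.0f}%")  # 2:3 = 67%
```

Combining these counts across environments gives the lifecycle-wide score described in the Score section.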
-
React to Bugs
Bugs are unexpected behavior of the system under test.
Every bug found by your tests, or found outside them, is evidence of your tests' quality.
When bugs surface that a test should have caught, mark that test as ignored and refactor it.
Become suspicious of tests that never seem to break.
Consider a suite of tests that report bugs without assertions.
Think about test reuse. Where in your software lifecycle can you increase coverage? How can code be leveraged and repurposed?
Track your test maintenance. Though outside the scope of measuring coverage, it is an important risk to understand. ERA coverage scores will help clarify how many of your tests are in a broken state at any point in time.
-
ERA is free for anyone to use, forever. Please send questions or reactions to:
Ross Radford -
"Testing shows the presence, not the absence of bugs."
Dijkstra (1969)