Oct 03, 2023

A Simpler Testing Pyramid: Getting the Most out of Your Tests


Apr 10, 2023 10 min read


Tyson Gern

reviewed by

Matt Campbell

Developers use many different labels to describe their automated tests (unit, integration, acceptance, component, service, end-to-end, UI, database, system, functional, or API). Each of these labels has a different semantic meaning, either describing the scope of the test, the types of actions that the test takes, the subject of the test, or the subject’s collaborators. We usually don’t agree on what each of these labels means, and the discussions about their definition tend to be futile.

Rather than arguing over which labels to use and how to define them, I’ve found it more helpful to use one of two adjectives to label each test: slow or fast. These labels can be just as useful when deciding the makeup of a test suite while allowing developers to objectively classify tests without unproductive arguments.


The choice of test labels is an important influence on the makeup of a test suite. Developers use them to know when to write a test for a given behavior, to know which type of test to write, and to assess the balance of the test suite as a whole. When we get this wrong, we end up with a test suite that either doesn’t provide accurate coverage or provides coverage at an unacceptable cost.

When should you write a test for a given piece of production code? Developers who, like me, practice Extreme Programming (XP) or Test Driven Development (TDD) often answer this question with “always”. However, not every piece of code should automatically be tested. For each proposed test, first, weigh the costs of writing the test against the benefits.

I’m not advocating against writing tests. Indeed, for most tests, this is a quick check and the answer is yes. However, the check is especially useful when a test is slow to run, slow to write, or difficult to maintain. In these cases, ask yourself a few questions.

Is the test costly because of a design decision? Can the code be refactored to better accommodate testing? Your tests are the first consumers of your production code. Making code easier to test often makes it easier to consume, improving the quality of your codebase.

Is the test costly because of the testing approach? Would a different testing approach make this test easier to write? Consider using test doubles like fakes or mocks in place of collaborators. If your tests need a complicated setup, extract it into a test scenario that can be reused between tests.

Be careful not to overuse test doubles, as they don’t provide as much confidence as real collaborators. Sometimes this drop in confidence is worth the ease of setup, decrease in test duration, or increase in reliability. However, too much reliance on test doubles may couple your tests to your implementation, resulting in a test suite that provides low confidence and that inhibits refactoring.
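As an illustration of the trade-off, here is a minimal sketch of a fake standing in for a slow collaborator. The names `FakeEmailGateway` and `RegistrationService` are hypothetical and are not taken from any real codebase. The test asserts on the observable outcome rather than on the exact calls made, which is one way to limit coupling between the test and the implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEmailGateway:
    """In-memory fake (hypothetical): records messages instead of sending them."""

    sent: list = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


class RegistrationService:
    """Hypothetical production class that depends on an email gateway."""

    def __init__(self, gateway) -> None:
        self._gateway = gateway

    def register(self, email: str) -> None:
        self._gateway.send(email, "Welcome")


def test_registration_sends_welcome_email() -> None:
    gateway = FakeEmailGateway()
    RegistrationService(gateway).register("ada@example.com")
    # Check what was sent, not how the service talked to its collaborator.
    assert gateway.sent == [("ada@example.com", "Welcome")]
```

A fake like this keeps the test fast and reliable, but it says nothing about whether the real gateway works, so some coverage with the real collaborator is usually still needed.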

Is the test costly because the behavior is inherently difficult to test? If so, consider the importance of the feature you’re testing. If it’s a critical feature involving processing payments, then the test might be worth the cost. If it’s a quirky edge case in your display logic, then you should reconsider whether or not to write the test.

Is the test costly because it fails unpredictably? If so, you must remove it, rewrite it to be more reliable, or separate it from the rest of your test suite. For a test suite to provide useful feedback, you must be confident that test failures represent undesired behavior. If you find that a test is necessary and cannot be made predictable, move it to another test suite that is run less frequently.

To help with the decision of when to write a test, and what type of test to write, developers often place test labels on a testing pyramid in order to communicate the importance of having more of one type of test than another.

Given the many different labels used to describe tests, every testing pyramid looks a bit different from the others. Try running an image search for “testing pyramid” and you will find only a few duplicate pyramids on the first page of the results. Each pyramid typically has low-cost unit tests at the bottom, high-cost system tests at the top, and several layers of medium-cost tests in the middle.

Before a team can benefit from the testing pyramid, the team must decide on which labels to include in the testing pyramid, what the definition of each label is, and in what order to include the labels on the pyramid.

This is often a contentious decision, as each developer in a team tends to use a different set of labels to describe tests, and there is not wide agreement on what each label means. Indeed, almost every testing pyramid includes unit tests at the bottom, but there is wide disagreement on what the word “unit” refers to. This disagreement reduces the usefulness of the testing pyramid since discussions tend to revolve around the labels rather than reducing the cost of the test suite.

Speed tends to be the highest contributor to the cost of a test suite. To get rapid feedback, developers should run the test suite multiple times per hour, so even a small increase in the time it takes to run the suite can add up to a lot of waiting over time.

Time spent waiting for the tests to run is unproductive time. When a test suite is very slow (taking longer than five minutes to run), developers often work on other tasks while the tests are running. This task switching is harmful, as it decreases focus and causes the developer to lose context. Once the slow test suite has finished, the developer must take additional time to regain context before continuing with their original task.

When we focus on test speed, a simpler testing pyramid emerges: a broad base of fast tests topped by a thin layer of slow tests.

This pyramid sends a clear message that a test suite should have as many fast tests as possible and just enough slow tests to provide full coverage of desired behavior. It communicates the same message as the more common (and complicated) testing pyramids, but is far easier for developers to understand and agree upon.

While different developers might not agree upon where to place a certain test in a common testing pyramid, it’s easy to know where a given test fits in the pyramid above. Teams only have to agree upon what is a fast test, and what is a slow test. While the threshold may be different depending on the business domain, language, or framework, the speed of tests can be measured objectively.

Test suites always start out fast, but rarely stay that way. More tests get added over time and developer tolerance for a slow test suite increases. Many developers don’t realize that a fast test suite is a possibility because they have never worked in a codebase where the test suite stays fast.

Keeping a test suite fast takes discipline. Developers must scrutinize any time they add to a test suite, and realize the large benefit gained from even a small decrease in duration. For example, if one member of a team of six developers spends four hours speeding up the tests by 10 seconds, that investment will pay off in just six weeks (assuming each developer runs the tests once per hour during an eight-hour working day).
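The payback estimate can be checked with a few lines of arithmetic. This sketch assumes an eight-hour working day (so eight runs per developer per day) and a five-day working week. Neither figure is stated outright in the example, so adjust them for your own team.

```python
def payback_working_days(developers, runs_per_day, seconds_saved_per_run, hours_invested):
    """Working days until the time saved equals the time invested."""
    seconds_saved_per_day = developers * runs_per_day * seconds_saved_per_run
    return hours_invested * 3600 / seconds_saved_per_day


days = payback_working_days(
    developers=6,
    runs_per_day=8,  # assumption: one run per hour over an eight-hour day
    seconds_saved_per_run=10,
    hours_invested=4,
)
print(days)      # 30.0 working days
print(days / 5)  # 6.0 weeks, assuming a five-day week
```

A team that runs the tests several times per hour would recover the investment proportionally sooner.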

When left unchecked, the duration of a test suite tends to grow exponentially over time. That is, each increase tends to be proportional to the current duration. When the suite runs in 10 seconds, a developer might agonize over adding just one second to the build, but once the test suite grows to 3 minutes, they might not even notice.

One method to prevent exponential growth is to set a hard limit on your test suite duration: fail the build if your test suite takes longer than, for example, one minute to run. If a test run takes too long, the build will fail and the developer must take some time to speed up the tests before continuing. Don’t fix the build by simply increasing this limit. Rather, take the time to understand why the tests are slow and how you can make them faster.
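One possible way to enforce such a limit is a small wrapper around the test command. This is a sketch only: `run_with_time_budget` is a hypothetical helper, and the test command and budget are whatever your build uses.

```python
import subprocess
import sys
import time


def run_with_time_budget(command, budget_seconds):
    """Run the test command; return a non-zero exit code if it fails or overruns the budget."""
    started = time.monotonic()
    completed = subprocess.run(command)
    elapsed = time.monotonic() - started
    if completed.returncode != 0:
        # The tests themselves failed; pass their exit code through unchanged.
        return completed.returncode
    if elapsed > budget_seconds:
        print(
            f"Test suite took {elapsed:.1f}s, over the {budget_seconds}s budget",
            file=sys.stderr,
        )
        return 1
    return 0
```

A build step could then end with something like `sys.exit(run_with_time_budget([sys.executable, "-m", "unittest", "discover"], budget_seconds=60))`, where the command shown is only an assumed example. Note that this measures wall-clock time, so the budget should leave some headroom for noisy build machines.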

Test code must be treated with the same care and scrutiny as production code. Refactor continuously to keep your test code well-structured and fast, therefore minimizing the cost of maintaining and running your test suite. Keep in mind that refactoring tests should not modify the behavior of either the test code or the production code. Rather, it should change your code to be more readable, more maintainable, and faster to run.

If you can’t avoid having a few slow tests, add them to a separate test suite. This slow test suite isn’t meant to be run as often as your main test suite but is there to provide some additional coverage. It should not block the build process but should be run periodically to ensure the behavior it tests is still functioning correctly.

It’s not too late to change your approach if you’ve used a different testing pyramid to shape your current test suite. If you’ve followed a more complicated testing pyramid, it’s likely that many of your test names contain your testing pyramid’s label names.

As a first step, take some time to rename your tests. The new test names should reflect the behavior under test rather than the test label. For example, you might rename the UserIntegrationTest to the UserAuthenticationTest or the RegistrationApiTest to the AddPaidUserTest.

During this process, you’ll likely find some collisions among the new names. These collisions are a warning that you may have multiple tests that cover the same behavior. Take some time to move, combine, rename, or remove these tests to address the duplication.
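Collisions can be spotted mechanically before any file is renamed. The sketch below assumes the planned renames are kept in a simple mapping from old name to new name. `UserServiceTest` is a hypothetical third test added to show a collision.

```python
from collections import defaultdict


def find_collisions(renames):
    """Return each proposed name that more than one existing test would receive."""
    by_new_name = defaultdict(list)
    for old_name, new_name in renames.items():
        by_new_name[new_name].append(old_name)
    return {new: olds for new, olds in by_new_name.items() if len(olds) > 1}


renames = {
    "UserIntegrationTest": "UserAuthenticationTest",
    "RegistrationApiTest": "AddPaidUserTest",
    "UserServiceTest": "UserAuthenticationTest",  # hypothetical, for illustration
}
print(find_collisions(renames))
# {'UserAuthenticationTest': ['UserIntegrationTest', 'UserServiceTest']}
```

Each entry in the result is a pair of tests worth reading side by side to decide whether to merge, rename, or remove one of them.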

Once your tests are renamed, reorganize the test directory structure to group tests according to behavior. This organization will keep tests that change at the same time close to each other in your codebase and will help you to catch new tests that cover duplicate behavior.

A slow test suite must be addressed right away. Immediately set a limit on the test suite duration so it doesn’t get any slower. Next, add some instrumentation to help you find the slowest tests by listing the execution time for each test or group of tests. You’ll likely find some tests during this process that are easy to speed up.
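As one example of such instrumentation, the sketch below uses Python’s standard `unittest` module and a custom result class to record how long each test takes. `TimingResult`, `slowest`, and `ExampleTest` are hypothetical names. Other test frameworks typically offer their own hooks or reporting options for the same purpose.

```python
import time
import unittest


class TimingResult(unittest.TextTestResult):
    """Test result that records the wall-clock duration of each test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = []  # list of (seconds, test id)

    def startTest(self, test):
        self._started = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        self.timings.append((time.perf_counter() - self._started, test.id()))


def slowest(result, count=5):
    """Return the `count` slowest tests, slowest first."""
    return sorted(result.timings, reverse=True)[:count]


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.05)  # stands in for a slow collaborator


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(resultclass=TimingResult, verbosity=0).run(suite)
for seconds, test_id in slowest(result):
    print(f"{seconds:.3f}s  {test_id}")
```

The measured time includes each test’s `setUp` and `tearDown`, which is usually what you want when hunting for slow tests.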

Once you fix these, you’ll be left with another group of slow tests that are more difficult to improve. Separate your fast tests so you can run them independently of the remaining slow tests. This will give you an immediate speed boost for some test runs, which will buy you more time to make improvements.
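Given measured durations, the split itself can be as simple as applying the threshold your team has agreed on. In this sketch the test names, the durations, and the 0.1-second threshold are all made up for illustration.

```python
def partition_by_speed(durations, threshold_seconds):
    """Split test names into (fast, slow) lists using an agreed threshold."""
    fast = sorted(name for name, secs in durations.items() if secs <= threshold_seconds)
    slow = sorted(name for name, secs in durations.items() if secs > threshold_seconds)
    return fast, slow


# Hypothetical measurements, in seconds.
durations = {
    "UserAuthenticationTest": 0.02,
    "AddPaidUserTest": 0.04,
    "CheckoutInBrowserTest": 12.5,
}
fast, slow = partition_by_speed(durations, threshold_seconds=0.1)
print(fast)  # ['AddPaidUserTest', 'UserAuthenticationTest']
print(slow)  # ['CheckoutInBrowserTest']
```

The two lists can then drive two separate test runs: the fast one on every change, and the slow one on a schedule that does not block the build.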

Dedicate time to speeding up your tests on a regular basis. Investigate whether the behavior covered by these slow tests can be covered (or is already covered) by faster tests. A common example of this is covering many edge cases with tests that drive a browser. Using a browser to run tests is time-intensive, and the behaviors can often be covered by lower-level tests, which tend to run faster.

Before your next discussion over whether to write, for example, a system test or an integration test, take a minute to think. You’re likely to find that the distinction between the two matters little. If your goal is to provide high confidence while minimizing cost, then your argument is really about how you can test the desired behavior with the lowest cost test possible. Steer the discussion in this direction and you’ll have a more productive outcome.

Rather than focusing on test labels, focus on what’s important: Write fast tests. If your test is slow, make it faster. If you can’t, try to provide the same coverage with a few tests with a narrower scope. If that fails, ask yourself if the benefit that the test provides is worth the substantial cost of a slow test. If it is worth it, consider moving your slow tests to a separate test suite that doesn’t block the build.

Follow this new testing pyramid and focus on test speed to keep your test suite fast and your confidence high.


by Joan Comas,


Thanks a lot for this very detailed article. I would like to add that I am also evaluating the return on investment (ROI) of every test.

In particular, I have been questioning the need for tests for classes that do one single thing that we have been doing forever. For example, I don't find any ROI in testing that:

- When adding an entity to a Repository, the entity is stored in the DB.
- When calling the Send() method of a class that should send data over HTTP, the data is sent over HTTP.
- When calling the Publish() method of a class that should publish an event to a queue, the event is published to the queue.

Those repositories, clients, and publishers should never contain any actual business rules, and their functionality should be trivial nowadays. Yes, some SQL queries are going to be more complex, and there are going to be transactions and that sort of thing, but then it is all about creating infrastructure to take care of it, and THAT is what gets properly tested in one single place.

Furthermore, I question the need for "tests to forbid", where developers write tests to make sure some dependency is not used. I have seen far too many of them.

Finally, I would like to mention that, when possible, you should organize the code by feature (as is done in Vertical Slice and DDD) so that the group of tests covering it can be run in isolation, saving lots of time. Making a test suite fast is not just about slow versus fast tests, but also about isolation of tests. There are tools nowadays to run only the impacted tests, so make use of them and save even more time. This way, developers won't feel the need to cut important slow tests just for the sake of time.
