218: Balancing test coverage with test costs - Nicole Tietz-Sokolskaya

2024/4/18

Test & Code

People

Brian

Python developer and podcast host focused on testing and software development education.

Nicole Tietz-Sokolskaya

Topics

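Nicole Tietz-Sokolskaya: I think software engineers' discussions about testing often lack nuance, and frequently end in a stalemate where product managers want to move faster and engineers want more testing. We need to rethink the purpose of testing, which is to reduce risk, weigh the costs and benefits of tests, and think about their marginal benefit. Blindly chasing code coverage can backfire: people write unnecessary tests just to raise the number, or coverage drops after a refactor, which does not necessarily mean there is too little testing and may instead reflect leaner code. I personally no longer use code coverage as a metric, and focus instead on the context of the tests and the end result. The context of the code that tests cover matters more than the raw coverage number. Not all code needs tests; some tests have low value, such as tests of CSS code. I mainly look at code coverage from end-to-end tests and pay less attention to unit-test coverage. In end-to-end tests, error-handling code needs testing too, and mocks or architectural design can be used to simulate failure scenarios. How the system handles expected errors is part of its behavior and needs to be tested. Deciding whether to test a case means weighing how likely it is and what its consequences are. For some low-probability errors, you can set up monitoring and alerts rather than writing many test cases. Deciding what to test means considering business impact, so testing resources should go first to the features that affect the business most. Performance tests need to simulate realistic load, otherwise the results are meaningless. Ideally, they simulate end-to-end load on the whole system, and monitoring data can be used to check how accurate the results are. The cost of a service outage has to be weighed against the cost of writing and maintaining tests. Maintaining test code is itself expensive, so test scope has to be balanced against maintenance cost. End-to-end tests are less affected by refactoring than unit tests. Test suite run time is also a cost to consider. High code coverage can help identify and delete dead code.

Brian: A test suite should not take too long to run; ideally it finishes within a few minutes, because long run times hurt team productivity. Test suites should be modular so they can be run and debugged quickly. Testing should be layered: tests run during development should be fast, while tests in CI can be more comprehensive.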

Deep Dive

Key Insights

What is the main trade-off discussed in Nicole's blog post about testing?

The main trade-off discussed is balancing the cost of testing (time, resources, maintenance) against the risk of not testing enough (potential bugs, downtime, and business impact). The post emphasizes the need to critically evaluate how much testing is necessary and where to focus testing efforts to maximize value.

Why can refactoring code reduce code coverage percentages?

Refactoring can reduce code coverage percentages because it often results in fewer lines of code. If the lines it removes were covered by tests and the untested lines stay the same, the covered share of the total shrinks and the percentage drops. This creates a paradox: making code more concise can appear to reduce test coverage even though the code is better.
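A quick arithmetic sketch of that effect. The line counts are made up for illustration and do not come from the episode; the point is only that shrinking covered code lowers the percentage while the untested code is unchanged.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage of all lines."""
    return 100.0 * covered_lines / total_lines


# Hypothetical module: 80 tested lines plus 20 untested lines.
before = coverage_percent(covered_lines=80, total_lines=100)

# Refactor the tested part from 80 lines down to 40.
# The 20 untested lines are left exactly as they were.
after = coverage_percent(covered_lines=40, total_lines=60)

print(f"before: {before:.1f}%")  # before: 80.0%
print(f"after:  {after:.1f}%")   # after:  66.7%
```

No test was deleted and no new untested code was added, yet the number fell by about 13 points.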

What is the issue with aiming for 100% code coverage in a React app with styled components?

Aiming for 100% code coverage in a React app with styled components can be problematic because it requires testing every line of CSS. This often leads to low-value tests that don't meaningfully improve code reliability, while consuming significant time and effort that could be better spent on higher-impact testing.

How does Nicole suggest deciding what to test and what not to test?

Nicole suggests focusing testing efforts on the most critical parts of the system, such as features that directly impact revenue or user experience. For example, live interaction features that could result in significant financial loss if they fail should be prioritized over less critical features like analysis tools, which can tolerate occasional downtime.
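One way to make that prioritization concrete is to rank features by a rough expected loss, likelihood of failure times business cost. This is a minimal sketch and an assumption on our part, not a method described in the episode; the `Feature` class, the feature names, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    failure_likelihood: float  # guessed chance of a bug per release, 0..1
    impact_cost: float         # guessed business cost if the feature breaks

    @property
    def risk(self) -> float:
        # Rough expected loss: likelihood times impact.
        return self.failure_likelihood * self.impact_cost


# Hypothetical estimates; real values would come from the team's own judgment.
features = [
    Feature("analysis_tools", failure_likelihood=0.20, impact_cost=1_000),
    Feature("live_interaction", failure_likelihood=0.10, impact_cost=50_000),
]

# Highest risk first: this is where testing effort would go first.
by_risk = sorted(features, key=lambda f: f.risk, reverse=True)
for feature in by_risk:
    print(f"{feature.name}: expected loss {feature.risk:.0f}")
```

Even with a lower guessed failure rate, the live feature ranks first because a failure there costs far more.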

What is the challenge with performance testing, according to Nicole?

Performance testing is challenging because it must closely match real-world workloads to be meaningful. Simulating realistic user behavior is difficult, especially before deployment, and testing isolated components doesn't capture the non-linear interactions between different parts of the system. Monitoring production behavior can help refine performance tests over time.

What is the cost of maintaining a large test suite?

Maintaining a large test suite can be costly because it requires ongoing effort to update tests as the codebase evolves. Refactoring becomes more difficult, and the time to run tests increases, which can slow down development workflows. Additionally, tightly coupled unit tests can break frequently during refactoring, adding to maintenance overhead.

What is Nicole's opinion on the ideal length of a test suite?

Nicole believes a test suite should ideally run in single-digit minutes, with five minutes being the upper limit for a reasonable development workflow. Longer test suites can significantly impact productivity, especially if developers get distracted while waiting for tests to complete.

How does Nicole use code coverage to improve code quality?

Nicole uses code coverage to identify and delete unreachable code. By analyzing coverage reports, she can pinpoint code that isn't being executed and remove it, which improves code quality and reduces unnecessary complexity. This approach also helps ensure that the remaining code is well-tested and functional.

Shownotes

Nicole is a software engineer and writer, and recently wrote about the trade-offs we make when deciding which tests to write and how much testing is enough.

We talk about:

Balancing schedule vs testing
How much testing is the right amount of testing
Should code coverage be measured and tracked
Good refactoring can reduce code coverage
Is it worth testing error conditions?
Are rare error codes ok to just monitor?
API drift and autospec
Mitigating risk
Deciding what to test and what not to test
Focus testing on key money-making features
If there's a bug in this part of the code, how much business impact is there?
Performance testing needs to approximately match real world workloads
Cost of a service breaking vs the cost of creating, maintaining, and running tests
Keeping test suites quick to minimize getting distracted
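On the "API drift and autospec" item above: a small sketch of how the standard library's `unittest.mock.create_autospec` can catch a test that has drifted from the real interface. The `PaymentClient` class and its `charge` method are hypothetical stand-ins, not something discussed in the episode.

```python
from unittest import mock


class PaymentClient:
    """Hypothetical dependency whose signature may change over time."""

    def charge(self, account_id: str, amount_cents: int) -> bool:
        raise NotImplementedError("talks to a real service")


# A plain Mock accepts any call, so a stale test keeps passing
# even after the real signature has changed.
loose = mock.Mock()
loose.charge("acct-1")  # missing amount_cents, but no error is raised

# create_autospec copies the real signature, so the same stale call fails.
strict = mock.create_autospec(PaymentClient, instance=True)
try:
    strict.charge("acct-1")
    drift_detected = False
except TypeError:
    drift_detected = True

# A call that matches the real signature still works as a normal mock.
strict.charge.return_value = True
result = strict.charge("acct-1", 500)

print("drift detected:", drift_detected)  # drift detected: True
```

The trade-off is that the spec'd mock is only as current as the class it was built from, so it guards against drift in the test, not against changes in a remote service.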

Links:

Too much of a good thing: the trade-off we make with tests
Load testing is hard, and the tools are... not great. But why?
Yet Another Rust Resource (YARR!)
Goodhart's law - "When a measure becomes a target, it ceases to be a good measure"

Learn pytest

pytest is the number one test framework for Python.
Learn the basics super fast with Hello, pytest!
Then later you can become a pytest expert with The Complete pytest Course
Both courses are at courses.pythontest.com
