Time Travelling and Fixing Bugs with Property-Based Testing (2019)

Original link: https://wickstrom.tech/2019-11-17-time-travelling-and-fixing-bugs-with-property-based-testing.html

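## Property-Based Testing of User Signup Validation: Summary

This tutorial demonstrates property-based testing (PBT) with Haskell and Hedgehog to identify and fix bugs in a user signup validation system. The system validates usernames (length 0-50) and ages (18-150). The core principle is to define properties, general rules that the software *should* always satisfy, and to let the PBT framework generate many test cases to check them.

Initially, positive and negative tests are written, checking valid and invalid forms respectively. This process highlights the importance of testing the generators themselves, and uncovers an age generator that does not cover the full range of invalid ages. Coverage checks are then added to ensure that the generators produce a representative sample of inputs.

When the requirements change to use a birth date instead of an age, the validation function is modified to take today's date as a parameter, which keeps it deterministic and testable. A single, comprehensive property test is developed and combined with coverage checks for leap-year edge cases, which eventually reveals a bug in the date calculation logic.

The tutorial emphasizes the value of deterministic functions for testing, the power of coverage checks in guiding generator design, and the trade-off between several simple properties and a single complex one. Ultimately, PBT combined with careful generator design and coverage analysis proves effective at finding and fixing subtle bugs.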


Original article
Time Travelling and Fixing Bugs with Property-Based Testing

Property-based testing (PBT) is a powerful testing technique that helps us find edge cases and bugs in our software. A challenge in applying PBT in practice is coming up with useful properties. This tutorial is based on a simple but realistic system under test (SUT), aiming to show some ways you can test and find bugs in such logic using PBT. It covers refactoring, dealing with non-determinism, testing generators themselves, number of examples to run, and coupling between tests and implementation. The code is written in Haskell and the testing framework used is Hedgehog.

This tutorial was originally written as a book chapter, and later extracted as a standalone piece. Since I’m not expecting to finish the PBT book any time soon, I decided to publish the chapter here.

System Under Test: User Signup Validation

The business logic we’ll test is the validation of a website’s user signup form. The website requires users to sign up before using the service. When signing up, a user must pick a valid username. Users must be between 18 and 150 years old.

Stated formally, the validation rules are:

$$
\begin{aligned}
0 &\leq \text{length}(\text{name}) \leq 50 \\
18 &\leq \text{age} \leq 150
\end{aligned}
\qquad (1)
$$

The signup and its validation have already been implemented by previous programmers. There have been user reports of strange behaviour, and we’re going to locate and fix the bugs using property tests.

Poking around the codebase, we find the data type representing the form:

The Validation type comes from the validation package. It’s parameterized by two types:

  1. the type of validation failures
  2. the type of a successfully validated value

The Validation type is similar to the Either type. The major difference is that it accumulates failures, rather than short-circuiting on the first failure. Failures are accumulated when combining multiple Validation values using Applicative.

Using a non-empty list for failures in the Validation type is common practice. It means that if the validation fails, there’s at least one error value.
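To make the difference from Either concrete, here is a minimal sketch of failure accumulation applied to the rules in (1). It deliberately does not use the validation package: the Validation type below is a hand-rolled stand-in that needs only base, and the names SignupError, Signup, validateName, validateAge and validateSignup are assumptions made for illustration, not the code under test.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Hand-rolled stand-in for the package's Validation type (assumption).
data Validation e a = Failure e | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, (<*>) combines the failures from both sides.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical error and result types, named for illustration only.
data SignupError = NameTooLong Int | InvalidAge Int
  deriving (Eq, Show)

data Signup = Signup { name :: String, age :: Int }
  deriving (Eq, Show)

-- length is never negative, so only the upper bound needs a check.
validateName :: String -> Validation (NonEmpty SignupError) String
validateName n
  | length n > 50 = Failure (NameTooLong (length n) :| [])
  | otherwise     = Success n

validateAge :: Int -> Validation (NonEmpty SignupError) Int
validateAge a
  | a >= 18 && a <= 150 = Success a
  | otherwise           = Failure (InvalidAge a :| [])

validateSignup :: String -> Int -> Validation (NonEmpty SignupError) Signup
validateSignup n a = Signup <$> validateName n <*> validateAge a
```

With this sketch, validating a 51-character name together with an age of 17 reports both NameTooLong 51 and InvalidAge 17 in one non-empty list, where an Either-based version would stop at the first failure.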

Validation Property Tests

Let’s add some property tests for the form validation, and explore the existing implementation. We begin in a new test module, and we’ll need a few imports:

Remember how, in Negative Property Tests, we noted that there’s a problem? The issue is that we’re not covering all validation rules in our tests. But the problem is not in our property definitions. It’s in one of our generators, namely genInvalidAge. We’re now in a peculiar situation: we need to test our tests.

One way to test a generator is to define a property specifically testing the values it generates. For example, if we have a generator positive that is meant to generate only positive integers, we can define a property that asserts that all generated integers are positive:
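A minimal sketch of such a generator test in Hedgehog follows. The definition of positive and the property name prop_positive_is_positive are assumptions for illustration; only the idea of asserting on generated values is taken from the text.

```haskell
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- The generator under test: meant to produce only positive integers.
positive :: Gen Int
positive = Gen.int (Range.linear 1 maxBound)

-- A property about the generator itself, not about application code.
prop_positive_is_positive :: Property
prop_positive_is_positive = property $ do
  n <- forAll positive
  assert (n > 0)
```

Running check prop_positive_is_positive in GHCi exercises the generator with the default number of tests. A property like this only shows that no generated value is invalid; it says nothing about whether the generator covers the whole intended range.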

In his talk “Building on developers’ intuitions to create effective property-based tests”, John Hughes talks about “one property to rule them all.” Similarly, we’ll define a single property prop_validates_age for birth date validation. We’ll base our new property on prop_invalid_age_fails, but generalize to cover both positive and negative tests:
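The original listing is not part of this copy. As a rough sketch of what a single combined property could look like, the block below assumes a hypothetical pure validator validateBirthDate that takes today’s date as its first argument, and a hypothetical generator genAnyDay. The oracle is phrased with date arithmetic from the time package instead of an age computation, so that the test does not simply repeat the implementation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Time.Calendar (Day, addGregorianYearsRollOver, fromGregorian, toGregorian)
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical validator: today's date is a parameter, so it stays pure.
validateBirthDate :: Day -> Day -> Either String Integer
validateBirthDate today birth
  | age < 18  = Left "too young"
  | age > 150 = Left "too old"
  | otherwise = Right age
  where
    (ty, tm, td) = toGregorian today
    (by, bm, bd) = toGregorian birth
    -- Subtract one year if this year's birthday has not been reached yet.
    age = (ty - by) - (if (tm, td) < (bm, bd) then 1 else 0)

-- Any day from 1900 to 2100; fromGregorian clips days such as February 31st.
genAnyDay :: Gen Day
genAnyDay =
  fromGregorian
    <$> Gen.integral (Range.linear 1900 2100)
    <*> Gen.int (Range.linear 1 12)
    <*> Gen.int (Range.linear 1 31)

prop_validates_age :: Property
prop_validates_age = property $ do
  today <- forAll genAnyDay
  birth <- forAll genAnyDay
  -- Oracle: compare today's date against the 18th and 151st birthdays.
  let tooYoung = today < addGregorianYearsRollOver 18 birth
      tooOld   = today >= addGregorianYearsRollOver 151 birth
  -- Report how the generated inputs are distributed over the three cases.
  classify "too young" tooYoung
  classify "too old" tooOld
  classify "valid" (not tooYoung && not tooOld)
  case validateBirthDate today birth of
    Left _  -> assert (tooYoung || tooOld)
    Right _ -> assert (not tooYoung && not tooOld)
```

The sketch uses classify, which only reports percentages. Replacing it with cover and a minimum percentage would turn the report into a requirement, at the cost of having to tune the generators until each case is reached often enough.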

If today’s date is February 29th, and the person would turn 18 years old the day after (on March 1st), the validation function incorrectly considers the person old enough. We’ve found a bug.

Test Count and Coverage

Two things led us to find this bug:

  1. Most importantly, that we generate today’s date and pass it as a parameter. Had we used the actual date, retrieved with an IO action, we’d only be able to find this bug every 1461 days. Pure functions are easier to test.
  2. That we ran more tests than the default of 100. We might not have found this bug until much later, when the generated dates happened to trigger this particular bug. In fact, running 20000 tests does not always trigger the bug.
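The figure of 1461 days in the first point is the length of a four-year leap cycle. A small sketch with the time package confirms it; the names below are chosen for illustration.

```haskell
import Data.Time.Calendar (diffDays, fromGregorian)

-- Three common years plus one leap year.
leapCycleDays :: Integer
leapCycleDays = 3 * 365 + 366

-- The same figure computed from two consecutive leap days.
daysBetweenLeapDays :: Integer
daysBetweenLeapDays =
  diffDays (fromGregorian 1908 2 29) (fromGregorian 1904 2 29)

-- Around 1900 the gap is longer, because 1900 was not a leap year.
daysAcross1900 :: Integer
daysAcross1900 =
  diffDays (fromGregorian 1904 2 29) (fromGregorian 1896 2 29)
```

Both leapCycleDays and daysBetweenLeapDays evaluate to 1461. A test that read the real clock would therefore hit a leap day only about once every four years, and even less often across a century boundary such as 1900.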

Our systems are often too complex to be tested exhaustively. Let’s use our form validation as an example. Between 1900-01-01 and 2100-12-31 there are 73,413 days. Selecting today’s date and the birth date from that range, we have more than five billion combinations. Running that many Hedgehog tests in GHCi on my laptop (based on some quick benchmarks) would take about a month. And this is a simple pure validation function!
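The numbers above can be reproduced with a few lines using the time package. The throughput of 2000 tests per second is an assumption picked only to make the arithmetic concrete; the benchmark figure behind the “about a month” estimate is not stated.

```haskell
import Data.Time.Calendar (diffDays, fromGregorian)

-- Days from 1900-01-01 to 2100-12-31.
daysInRange :: Integer
daysInRange = diffDays (fromGregorian 2100 12 31) (fromGregorian 1900 1 1)

-- Every pairing of today's date with a birth date.
combinations :: Integer
combinations = daysInRange * daysInRange

-- Assumed throughput, purely illustrative.
assumedTestsPerSecond :: Integer
assumedTestsPerSecond = 2000

-- Whole days needed to run every combination at the assumed rate.
estimatedDays :: Integer
estimatedDays = combinations `div` (assumedTestsPerSecond * 60 * 60 * 24)
```

Here daysInRange is 73413 and combinations is 5389468569, a little under 5.4 billion. At the assumed rate, estimatedDays comes out at 31, which is in line with the estimate of about a month.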

To increase coverage, even if it’s not going to be exhaustive, we can increase the number of tests we run. But how many should we run? On a continuous integration server we might be able to run more than we do locally, but we still want to keep a tight feedback loop. And what if our generators never produce inputs that reveal existing bugs, regardless of the number of tests we run?

If we can’t test exhaustively, we need to ensure our generators cover interesting combinations of inputs. We need to carefully design and measure our tests and generators, based on the edge cases we already know of, as well as the ones that we discover over time. PBT without measuring coverage easily turns into a false sense of security.

In the case of our leap day bug, we can catch it with fewer tests, and on every test run. We need to make sure we cover leap days, used both as today’s date and as the birth date, even with a low number of tests.

Covering Leap Days

To generate inputs that cover certain edge cases, we combine specific generators using Gen.frequency:

The combined generator also covers the two cases where a person legally turns 18 around a leap day: a person born on a common day turning 18 on a leap day (➌), and a leapling turning 18 on a common day (➍).

Running the modified property test, we get the leap day counter-example every time, even with as few as a hundred tests. For example, we might see today’s date being 1904-02-29 and the birth date being 1886-03-01. The validation function deems the person old enough. Again, this is incorrect.
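As a sketch of how Gen.frequency can force leap days into the input, the block below builds a weighted generator of (today, birth date) pairs and checks its coverage with cover. It is not the original listing: the names genAnyDay, genLeapDay and genTodayAndBirthDate, the weights, and the 5% thresholds are all assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Time.Calendar (Day, addDays, addGregorianYearsClip, fromGregorian, toGregorian)
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Any day from 1900 to 2100; fromGregorian clips days such as February 31st.
genAnyDay :: Gen Day
genAnyDay =
  fromGregorian
    <$> Gen.integral (Range.linear 1900 2100)
    <*> Gen.int (Range.linear 1 12)
    <*> Gen.int (Range.linear 1 31)

-- Leap days from 1904 to 2080, so that adding 18 years stays before 2100.
genLeapDay :: Gen Day
genLeapDay = do
  year <- Gen.element [1904, 1908 .. 2080]
  pure (fromGregorian year 2 29)

isLeapDay :: Day -> Bool
isLeapDay day = let (_, m, d) = toGregorian day in (m, d) == (2, 29)

-- (today, birth date) pairs, weighted so that leap-day cases show up often.
genTodayAndBirthDate :: Gen (Day, Day)
genTodayAndBirthDate =
  Gen.frequency
    [ (5, (,) <$> genAnyDay <*> genAnyDay)
    , (2, do today <- genLeapDay
             -- Born on March 1st, so the 18th birthday is the day after today.
             pure (today, addDays 1 (addGregorianYearsClip (-18) today)))
    , (2, do birth <- genLeapDay
             -- Today is February 28th, 18 years after a leapling's birth.
             pure (addGregorianYearsClip 18 birth, birth))
    ]

-- Fails if fewer than 5% of the generated pairs fall into either class.
prop_covers_leap_days :: Property
prop_covers_leap_days = property $ do
  (today, birth) <- forAll genTodayAndBirthDate
  cover 5 "today is a leap day" (isLeapDay today)
  cover 5 "birth date is a leap day" (isLeapDay birth)
```

With these weights, roughly two in nine pairs have a leap day as today’s date, and the second branch produces pairs of the same shape as the 1904-02-29 and 1886-03-01 counter-example. The coverage property is a test of the generator only; it would be used alongside the validation property, not instead of it.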

Now that we can quickly and reliably reproduce the failing example, we are in a great position to find the error. While we could use a fixed seed to reproduce the particular failing case from the 20000 tests run, we are now more confident that the property test would catch future leap day-related bugs, if we were to introduce new ones. Digging into the implementation, we’ll find a boolean expression in a pattern guard being the culprit:
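The original guard is not shown in this copy. Purely as an illustration of how a guard can produce this exact symptom, the sketch below compares day-of-year numbers: February 29th in a leap year and March 1st in a common year are both day 60. Both ageBuggy and ageFixed are hypothetical functions and may differ from the actual implementation and its fix.

```haskell
import Data.Time.Calendar (Day, fromGregorian, toGregorian)
import Data.Time.Calendar.OrdinalDate (toOrdinalDate)

-- Hypothetical buggy version: the guard compares day-of-year numbers,
-- which are shifted by one after February in leap years.
ageBuggy :: Day -> Day -> Integer
ageBuggy today birth
  | todayOrdinal >= birthOrdinal = years
  | otherwise                    = years - 1
  where
    (todayYear, todayOrdinal) = toOrdinalDate today
    (birthYear, birthOrdinal) = toOrdinalDate birth
    years = todayYear - birthYear

-- Hypothetical fix: the guard compares (month, day) pairs instead.
ageFixed :: Day -> Day -> Integer
ageFixed today birth
  | (todayMonth, todayDay) >= (birthMonth, birthDay) = years
  | otherwise                                        = years - 1
  where
    (todayYear, todayMonth, todayDay) = toGregorian today
    (birthYear, birthMonth, birthDay) = toGregorian birth
    years = todayYear - birthYear
```

For today’s date 1904-02-29 and birth date 1886-03-01, ageBuggy returns 18 while ageFixed returns 17. Note that ageFixed treats a leapling as turning a year older on March 1st in common years, which is one possible convention and an assumption of this sketch.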