伪造：受假设启发的 Haskell 收缩

原文

Consider this falsify property test that tries to verify the (obviously false) property that all elements of all lists of up to 10 binary digits are the same (we will explain the details below; hopefully the intent is clear):

prop_list :: Property ()
prop_list = do
    n  <- gen $ Gen.integral $ Range.between (0, 10)
    xs <- gen $ replicateM n $ Gen.int $ Range.between (0, 1)
    assert $ P.pairwise P.eq .$ ("xs", xs)

we might get a counter-example such as this:

failed after 9 shrinks
(xs !! 0) /= (xs !! 1)
xs     : [0,1]
xs !! 0: 0
xs !! 1: 1

More interesting than the counter-example itself is how falsify arrived at that counter-example; if we look at the shrink history (--falsify-verbose), we see that the list shrunk as follows:

   [1,1,0,1,0,1]
~> [1,1,0]       -- shrink the list length
~> [0,1,0]       -- shrink an element of the list
~> [0,1]         -- shrink the list length again

Hypothesis library showed the world how to achieve this. In this blog post we will introduce falsify, a new library that provides property based testing in Haskell and has an approach to shrinking that is inspired by Hypothesis. As we shall see, however, the details are quite different.

Background

In this first section we will discuss some of the background behind falsify; the next section will be a more tutorial-style introduction on how to use it. This section is not meant to an exhaustive discussion of the theory behind falsify, or how the theory differs from that of Hypothesis; both of those topics will be covered in a paper, currently under review. However, a basic understanding of these principles will help to use the library more effectively, and so that will be our goal in this first section.

Unit testing versus property based testing

In unit testing (for example using tasty-hunit), a test for a function f might look something like this:

test :: Assertion
test =
    unless (f input == expected) $
      assertFailure "not equal"

test_property :: Property ()
test_property = do
    input <- gen $ genInput
    unless (prop input (f input)) $
      testFailed "property not satisfied"

Testing the Hard Stuff and Staying Sane, and then reading How to Specify It!: A Guide to Writing Properties of Pure Functions, both by the world’s foremost property-based testing guru John Hughes.

The importance of shrinking

Suppose we want to test the (false) property that for all numbers x and y, x - y == y - x:

prop_shrinking :: Property ()
prop_shrinking = do
    x <- gen $ Gen.int $ Range.between (0, 99)
    y <- gen $ Gen.int $ Range.between (0, 99)
    unless (x - y == y - x) $
      testFailed "property not satisfied"

Since the property is false, we will get a counter-example. Without shrinking, one such a counter-example might be

x = 38
y = 23

Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane:

Random tests contain a great deal of junk—that is their purpose! Junk provokes unexpected behaviour and tests scenarios that the developer would never think of. But tests usually fail because of just a few features of the test case. Debugging a test that is 90% irrelevant is a nightmare; presenting the developer with a test where every part is known to be relevant to the failure, simplifies the debugging task enormously.

(emphasis mine). For our example, the numbers 38 and 23 are not particularly relevant to the failure; with shrinking, however, the counter-example we will get is

x = 0
y = 1

Indeed, this is the only counter-example we will ever get: 0 is the smallest number possible (“least amount of detail”), and the only thing that is relevant about the second number is that it’s not equal to the first.

Parsing versus generation

Generation of inputs relies on pseudo-random number generators (PRNGs). The typical interface to a PRNGs is something like this:

nextSample :: PRNG -> (Word, PRNG)

newtype Gen a = Gen (PRNG -> (a, PRNG))

QuickCheck to shrinking is to pair a generator with a shrinking function, a function of type

This works, but it’s not without its problems; see my blog post Integrated versus Manual Shrinking for an in-depth discussion. The key insight of the Hypothesis library is that instead of shrinking generated values, we instead shrink the samples produced by the PRNG. Suppose we unfold a PRNG to a stream of random samples:

unfoldLinear :: PRNG -> [Word]
unfoldLinear prng =
    let (s, prng') = next prng
    in s : unfoldLinear prng'

newtype Parser a = Parser ([Word] -> (a, [Word])

parseBool :: Parser Bool
parseBool = Parser $ \(s:ss) -> (
    if s >= maxBound `div` 2 then True else False
  , ss
  )

definition of Gen in QuickCheck, you will see it’s actually different to the definition we showed above:

newtype Gen a = Gen (PRNG -> a)

next  :: PRNG -> (Word, PRNG)  -- as before
split :: PRNG -> (PRNG, PRNG)  -- new

both :: QcGen a -> QcGen b -> QcGen (a, b)
both (QcGen g1) (QcGen g2) = QcGen $ \prng ->
    let (l, r) = split prng
    in (g1 l, g2 r)

Generate Functions.

The `falsify` definition of `Gen`

If we apply the insight from Hypothesis (that is, parse samples rather than generate using PRNGs) to this new setting where splitting PRNGs is a fundamental operation, we arrive at the definition of Gen in falsify. First, unfolding a PRNG does not give us an infinite stream of samples, but rather an infinite tree of samples:

data STree = STree Word STree STree

unfold :: PRNG -> STree
unfold prng =
    let (s, _) = next  prng
        (l, r) = split prng
    in STree s (unfold l) (unfold r)

newtype Gen a = Gen (STree -> (a, [STree]))

both :: Gen a -> Gen b -> Gen (a, b)
both (Gen g1) (Gen g2) = Gen $ \(STree s l r) ->
    let (a, ls) = g1 l
        (b, rs) = g2 r
    in ( (a, b)
       ,    [STree s l' r  | l' <- ls]
         ++ [STree s l  r' | r' <- rs]
       )

actual definition in falsify has an additional constructor Minimal, which is a finite representation of the infinite tree that is zero everywhere. This is a key component in making this work with infinite data structures; see upcoming paper for an in-depth discussion. Users of the library however generally do not need to be aware of this (indeed, the sample tree abstraction is not part of the public API).

Consequences of using sample trees

Arguably all of the key differences between Hypothesis and falsify stem from the difference in representation of samples: a linear stream in Hypothesis and an infinite tree in falsify. In this section we will discuss two consequences of this choice.

Shrinking the sample tree

First, we need to decide how to shrink a sample tree. In Hypothesis, the sample stream (known as a “choice sequence”) is subjected all kinds of passes (15 and counting, according to Test-Case Reduction via Test-Case Generation: Insights from the Hypothesis Reducer), which shrink the sample stream according to lexicographical ordering; for example:

..¸ x, ..        < .., x', ..       -- shrink an element (x' < x)
.., x, y, z, ..  < .., x, z, ..     -- drop an element from the stream
.., x, y, z, ..  < .., y, z, x, ..  -- sort part of the stream (y < z < x)

When we are dealing with infinite sample trees, such a total ordering does not exist. For example, consider the following two trees:

tree1 = STree ..         tree2 = STree ..
         (STree 1 ..)              (STree 2 ..)
         (STree 4 ..)              (STree 3 ..)

Sample 1 in tree1 is less than the corresponding sample 2 in tree2, but sample 4 in tree1 is greater than the corresponding sample 3 in tree2. Hence, we have neither tree1 < tree2 nor tree2 < tree1: these two trees are incomparable. Instead, falsify works with a partial ordering; instead of the multitude of shrinking passes of Hypothesis, falsify has precisely one pass: shrink an individual sample in the tree.

Distributing samples to parsers

When we have a stream of values that we need to use for multiple parsers, we need to decide which samples go to which parser. In Hypothesis, this essentially happens on a first-come-first-served basis: any samples left unused by the first parser will be used by the next. As discussed, falsify parsers do not return “samples left unused”. Instead, the sample tree is split each time we compose parsers, like we did in both, shown above. In practice, this happens primarily when using applicative <*> or monadic >>=.

Predictability

These two differences are rather technical in nature; how do they affect users? Suppose we have a generator that produces a list and then a number:

listThenNum :: Gen ([Bool], Int)
listThenNum = do
    xs <- Gen.list ..
    n  <- Gen.int  ..
    return (xs, n)

If we are using a stream of samples, Hypothesis style, and then drop a random sample from that stream, the generator for int might suddenly be run against an entirely different sample; it might increase in value! Similarly, if we run that int generator against the first sample left over by the list generator, and if that list generator uses fewer samples as it shrinks, we might also run int against an unrelated sample, and its value might again increase.

This is not necessarily a problem; after all, we can then start to decrease that new int value again. However, that is only possible if the generated value with the larger int is still a counter-example to whatever property is being tested. If that is not the case, then we might not be able to shrink the list, and we might end up with a non-minimal counter-example. That can make debugging more difficult (we haven’t gotten rid of all the “junk”), and it can be difficult for users to understand why this might not shrink any further; even if the library offers facilities for showing why shrinking stopped (for example, showing which shrunk examples were rejected; verbose mode in falsify), it can still be quite puzzling why the library is trying to increase a value during shrinking.

Neither of these problems can arise in falsify: it never drops samples at all (instead, only shrinking individual samples), and since monadic bind splits the sample tree, we are guaranteed that the behaviour of int is entirely unaffected by the behaviour of list. This makes the shrinking behaviour in falsify more predictable and easier to understand.

Monadic bind

Integrated versus Manual Shrinking; this blog post discusses not only the problems in QuickCheck, but also shows one alternative approach, known as integrated shrinking, used by QuviQ QuickCheck and made popular in the Haskell world by the library hedgehog.

The problem with integrated shrinking is that it does not work across monadic bind. The linked blogpost explains this in great detail, but the essence of the problem is not hard to see. Consider the type of monadic bind:

(>>=) :: Gen a -> (a -> Gen b) -> Gen b

We cannot shrink the right hand side of (>>=) independent from the left hand side, because the right hand side is not a generator. We only have a generator once we apply the supplied function to the result of the first generator. This means that we cannot shrink these two generators independently: if, after shrinking the right hand side, we go back and then shrink the left hand side, we get an entirely different generator, and the shrinking we did previously is just wasted.

In practice what this means is that once we start shrinking the right hand side, we will never go back anymore and shrink the left hand side. In the example from the introduction we first generated a list length, and then the elements of the list:

prop_list :: Property ()
prop_list = do
    n  <- gen $ Gen.integral $ Range.between (0, 10)
    xs <- gen $ replicateM n $ Gen.int $ Range.between (0, 1)
    assert $ P.pairwise P.eq .$ ("xs", xs)

With integrated shrinking, once we start shrinking elements from the list, we will never go back anymore and shrink the list length. With internal shrinking, however, we can go back and forth across monadic bind. This is the raison d’être of internal shrinking: it doesn’t matter that we cannot shrink the two generators independently, because we are not shrinking generators! Instead, we just shrink the samples that feed into those generators.

Selective functors

It is important to understand the limitations of internal shrinking: it is certaintly not a silver bullet. For example, consider this combinator that takes two generators, flips a coin (generates a boolean, shrinking towards True), and then executes one of the two generators:

choose :: Gen a -> Gen a -> Gen a -- Suboptimal definition
choose g g' = do
    b <- Gen.bool True
    if b then g else g'

This combinator works, but it’s not optimal. Suppose the initial value of b is False, and so we use g'; and let’s suppose furthermore that we spend some time shrinking the sample tree using g'. Consider what happens if b now shrinks to True. When this happens we will now run g against the sample tree as it was left after shrinking with g'. Although we can do that, it very much depends on the specific details of g and g' whether it’s useful to do it, and we will certainly lose the predictability we discussed above.

We could try to make the two generators shrink independent from each other by simply running both of them, and using the boolean only to choose which result we want. After all, Haskell is lazy, and so this should be fine:

choose :: Gen a -> Gen a -> Gen a -- Bad definition!
choose g g' = do
    x <- g
    y <- g'
    b <- Gen.bool True
    return $ if b then x else y

While is is true that generation using this definition of choose will work just fine (and laziness ensures that we will in fact only run whatever generator is used), this combinator shrinks very poorly. The problem is that if we generate a value but they don’t use it, the (part of) the sample tree that we used to produce that value is irrelevant, and so by definition we can always replace it by the sample tree that is zero everywhere. This means that if we later want to switch to that generator, we will only be able to do so if the absolute minimum value that the generator can produce happens to work for whatever property we’re testing. This is an important lesson to remember:

Do not generate values and then discard them: such values will always shrink to their minimum. (Instead, don’t generate the value at all.)

selective applicative functors give us. A detailed discussion of this topic would take us well outside the scope of this blog post; in the remainder of this section we will discuss the basics only.

Gen is a selective functor, which means that it is an instance of Selective, which has a single method called select:

select :: Gen (Either a b) -> Gen (a -> b) -> Gen b

The intuition is that we run the first generator; if that produces Left a, we run the second generator to get a b; if the first generator produces Right b, we skip the second generator completely. Like for applicative <*> and monadic (>>=), the two generators are run against different subtrees of the sample tree, but the critical difference is that we will not try to shrink the right subtree for the second generator unless that generator is used.

If that all sounds a bit abstract, perhaps suffices to say that any selective functor supports

ifS :: Selective f => f Bool -> f a -> f a -> f a

choose :: Gen a -> Gen a -> Gen a
choose = ifS (bool True)

definition in the falsify library itself.

Tutorial

With the background out of the way, let’s now consider how to actually use the library. Probably the easiest way to get started is to use the tasty integration. Here is a minimal template to get you started:

module Main (main) where

import Test.Tasty
import Test.Tasty.Falsify

main :: IO ()
main = defaultMain $ testGroup "MyTestSuite" [
      testProperty "myFirstProperty" prop_myFirstProperty
    ]

prop_myFirstProperty :: Property ()
prop_myFirstProperty = return ()

tasty package, as well as falsify of course. If you want, you can also use the Test.Falsify.Interactive module to experiment with falsify in ghci.

Getting started

Suppose we want to test that if we multiply a number by two, the result must be even. Here’s how we could do it:

prop_multiply2_even :: Property ()
prop_multiply2_even = do
    x <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
    unless (even (x * 2)) $ testFailed "not even"

unless) for monads are available
gen runs a generator, and adds the output of the generator to the test log. (The test log is only shown when the property fails.)
Gen.int is an alias for Gen.integral, which can produce values for any Integral type. There is no analogue of QuickCheck’s Arbitrary class in falsify: like in hedgehog and in Hypothesis, every generator must be explicitly specified. For a justification of this choice, see Jacob Stanley’s excellent Lambda Jam 2017 presentation Gens N’ Roses: Appetite for Reduction (Jacob is the author of hedgehog).
The specified Range tells the generator two things: in which range to produce a value, and how to shrink that value. In our example, withOrigin takes an “origin” as explicit value (here, 0), and the generator will shrink towards that origin.
testFailed is the primitive way to make a test fail, but we shall see a better way momentarily.

Predicates

Suppose we mistakingly think we need to multiply a number by three to get a even number:

prop_multiply3_even :: Property ()
prop_multiply3_even = do
    x <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
    unless (even (x * 3)) $ testFailed "not even"

If we run this test, we will get a counter-example:

multiply3_even: FAIL
  failed after 14 shrinks
  not even
  Logs for failed test run:
  generated 1 at CallStack (from HasCallStack):
    gen, called at demo/Demo/Blogpost.hs:217:10 in main:Demo.Blogpost

Predicate. A predicate of type

is essentially a function

a -> b -> c -> .. -> Bool

prop_multiply3_even_pred :: Property ()
prop_multiply3_even_pred = do
    x <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
    assert $ P.even `P.dot` P.fn ("multiply3", (* 3)) .$ ("x", x)

P.even, like even from the prelude. is a predicate that checks its argument is even
P.dot, like (.) from the prelude, composes a predicate with a function. In addition to the function itself, you also specify the name of the function, so that that name can be used in error messages.
[(.)][docs:falsify:dollar], like `()` from the prelude, applies a predicate to a named argument.

The use of predicates is not required, but can be very helpful indeed. For our running example, this will produce this test failure message instead:

multiply3_even_pred: FAIL
  failed after 2 successful tests and 13 shrinks
  not (even (multiply3 x))
  x          : 1
  multiply3 x: 3

Ranges, Labelling

We saw the use of withOrigin already, and earlier in this blog post we used between; a generator such as

Gen.integral $ Range.between (10, 100)

skewedBy. A generator such as

Gen.integral $ Range.skewedBy 5 (0, 100)

prop_skew :: Double -> Property ()
prop_skew skew = do
    xs <- gen $ Gen.list rangeListLen $ Gen.integral rangeValues
    x  <- gen $ Gen.integral rangeValues
    collect "elem" [x `elem` xs]
  where
    rangeListLen, rangeValues :: Range Word
    rangeListLen = Range.between (0, 10)
    rangeValues  = Range.skewedBy skew (0, 100)

collect to collect some statistics; specifically, in what percentage of the tests x is an element of xs. If we run this with a skew of 0, we might see something like:

100000 successful tests

Label "elem":
   94.6320% False
    5.3680% True

In only 5% of cases the element appears in the list. There are various ways in which we could change that distribution of test data, but the simplest way is simply to generate more values towards the lower end of the range; if we run the test with a skew of 5 we get

Label "elem":
   41.8710% False
   58.1290% True

Generators

Nearly all generators are built using prim as their basic building block, which returns the next sample from the sample tree. Higher-level generators split into two categories: “simple” (non-compound) generators that produce a value given some arguments, and generator combinators that take generators as input and produce new generators. Some important examples in the first category are:

integral, which we discussed already
bool, which produces a Bool, shrinking towards a choice of True or False
elem, which picks a random element from a list, and shuffle, which shuffles a list
etc.

The library also offers a number of generator combinators; here we list the most important ones:

choose we saw when we discussed Selective functors, and chooses (uniformly) between two generators, shrinking towards the first.
list takes a range for the list length and a generator and produces a list of values. Unlike the simple “pick a length and then call replicateM” approach from the example from the introduction, this generator can drop elements anywhere in the list (it does this by using the combinator mark to mark elements in the list; as the marks shrink towards “drop”, the element is removed, up to the specified Range).
frequency, similar to the like-named function in QuickCheck, takes a list of generators and frequencies, and chooses a generator according to the specified frequency. This is another way in which we can tweak the distribution of test data.

The implementation of frequency ensures that the generators can shrink indepedently from each other. This could be defined just using Selective, but for improved performance it makes use of a low-level combinator called perturb; see also bindIntegral, which generalizes Selective bindS, and has significantly better performance than bindS.

Generating functions

One of the most impressive aspects of QuickCheck is that it can generate, show and shrink functions. This is due to a functional pearl by Koen Claessen called Shrinking and showing functions; the presentation is available on YouTube and is well worth a watch. We have adapted the QuickCheck approach (and simplified it slightly) to falsify; the generator is called fun. Here is an example (Fn is a pattern synonym; you will essentially always need it when generating functions):

prop_fn1 :: Property ()
prop_fn1 = do
    Fn (f :: [Int] -> Bool) <- gen $ Gen.fun $ Gen.bool False
    assert $
         P.eq
         `P.on` P.fn ("f", f)
      .$ ("x", [1, 2, 3])
      .$ ("y", [4, 5, 6])

This property says that for any function f :: [Int] -> Bool, if we apply that function to the list [1, 2, 3] we must get the same result as when we apply it to the list [4, 5, 6]. Of course, that is not true, and when we run this test, falsify will give us a counter-example:

failed after 53 shrinks
(f x) /= (f y)
x  : [1,2,3]
y  : [4,5,6]
f x: True
f y: False

Logs for failed test run:
generated {[1,2,3]->True, _->False} at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:244:32 in main:Demo.Blogpost

Notice the counter-example we get: a function that returns True for the list [1, 2, 3], and False everywhere else. It truly is quite astonishing that this works: we can see that the list [1, 2, 3] is special by inspecting the source code, but of course falsify (or indeed, QuickCheck; this is not unique to falsify) cannot! Instead, falsify will generate a random infinitely large description of functions from [Int] -> Bool, that covers all possible input lists, and then start shrinking this description, throwing away values for inputs it doesn’t need, until a minimal test case remains. Truly a testament to the power of laziness in Haskell.

For a more realistic example, let’s port an example from Koen Claessen’s presentation to falsify. This example is testing the (wrong) property that for all functions f and predicates p,

map f . filter p == filter p . map f

In falsify, we might express this as:

prop_mapFilter :: Property ()
prop_mapFilter = do
    Fn (f :: Int -> Int)  <- gen $ Gen.fun genInt
    Fn (p :: Int -> Bool) <- gen $ Gen.fun genBool
    xs :: [Int] <- gen $ Gen.list (Range.between (0, 100)) genInt
    assert $
       P.eq
      `P.split` (P.fn ("map f", map f), P.fn ("filter p", filter p))
      `P.split` (P.fn ("filter p", filter p), P.fn ("map f", map f))
      .$ ("xs", xs)
      .$ ("xs", xs)
  where
    genInt :: Gen Int
    genInt = Gen.int $ Range.between (0, 100)

    genBool :: Gen Bool
    genBool = Gen.bool False

We generate a random function f, a random predicate p, a random list xs, and then assert the property; and of course, falsify will happily give us a counter-example:

failed after 25 shrinks
(map f (filter p xs)) /= (filter p (map f xs))
xs                 : [96]
xs                 : [96]
filter p xs        : [96]
map f xs           : [0]
map f (filter p xs): [0]
filter p (map f xs): []

Logs for failed test run:
generated {_->0} at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:254:30 in main:Demo.Blogpost
generated {96->True, _->False} at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:255:30 in main:Demo.Blogpost
generated [96] at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:256:20 in main:Demo.Blogpost

It generated a function that maps anything to 0, a predicate that is True for 96 and False for everything else, and a list containing only the value 96; this is indeed a nice counter-example, as the output from the assert explains.

Dependencies between commands of my blog post “An in-depth look at quickcheck-state-machine” discusses this kind of problem in the context of quickcheck-state-machine.

Testing shrinking

When we use internal (or indeed, integrated) shrinking, we don’t write a separate shrinking function, but that doesn’t mean we cannot get shrinking wrong. Shrinking never truly comes for free! As a simple example, consider writing a generator that produces any value below a given maximum (essentially, a more limited form of integral). A first attempt might be:

below :: Word64 -> Gen Word64
below n = (`mod` n) <$> Gen.prim

testShrinkingOfGen:

prop_below_shrinking :: Property ()
prop_below_shrinking = do
    n <- gen $ Gen.integral $ Range.between (1, 1_000)
    testShrinkingOfGen P.ge $ below n

This property will fail:

failed after 4 successful tests and 14 shrinks
original < shrunk
original: 0
shrunk  : 1

Logs for failed test run:
generated 2 at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:281:10 in main:Demo.Blogpost

testMinimum. Let’s consider the naive list generator from the introduction one more time:

naiveList :: Range Int -> Gen a -> Gen [a]
naiveList r g = do
    n  <- Gen.integral r
    replicateM n g

prop_naiveList_minimum :: Property ()
prop_naiveList_minimum =
    testMinimum (P.elem .$ ("expected", [[0,1], [1,0]])) $ do
      xs <- gen $ naiveList
                    (Range.between (0, 10))
                    (Gen.int (Range.between (0, 1)))
      case P.eval $ P.pairwise P.eq .$ ("xs", xs) of
        Left _   -> testFailed xs
        Right () -> return ()

The counter-example reported by falsify is (somewhat shortened):

naiveList_minimum: FAIL
  failed after 0 shrinks
  minimum `notElem` expected
  minimum : [0,0,1]
  expected: [[0,1],[1,0]]

Logs for failed test run:
generated [0,0,1] at CallStack (from HasCallStack):
  gen, called at demo/Demo/Blogpost.hs:294:13 in main:Demo.Blogpost

Logs for rejected potential next shrinks:

** Rejected run 0
generated [] at CallStack (from HasCallStack):

** Rejected run 3
generated [0] at CallStack (from HasCallStack):

** Rejected run 4
generated [0,0] at CallStack (from HasCallStack):

** Rejected run 8
generated [0,0,0] at CallStack (from HasCallStack):

This is telling us that the minimum value it produced was [0, 0, 1], instead of one of the two lists that we expected. It also tells us what shrink steps were rejected (because they weren’t counter-examples). This is informative, because _none of those shrink steps is [0, 1]: the naive list generator, unlike the real one (which does pass this property) cannot drop elements from the start of the list.

testShrinkingOfGen (and its generalization testShrinking) as well as testMinimum extensively to test falsify’s own generators.

Compatibility

Finally, falsify offers two combinators shrinkWith and fromShrinkTree which provide compatibility with QuickCheck style shrinking

shrinkWith :: (a -> [a]) -> Gen a -> Gen a

fromShrinkTree :: Tree a -> Gen a

respectively. The implementation of these combinators depends on a minor generalization of the sample tree representation; the details are discussed in the paper.

Conclusions

Shrinking is an essential component of any approach to property based testing. In the Haskell world, two libraries offered two competing approaches to shrinking: manual shrinking offered by QuickCheck, where users are entirely responsible for writing shrinkers for their generators, and integrated shrinking, offered by hedgehog. Integrated shrinking is nice, but does not work well with monadic bind. The Python Hypothesis library taught us how we can have “internal” shrinking: like in integrated shrinking, we do not write a separate generator and shrinker, but unlike in integrated shrinking, this approach does work across monadic bind.

The Haskell falsify library takes the core ideas of Hypothesis and applies them in the context of Haskell. As we have seen, however, the actual details of how these two libraries work differ quite significantly. The falsify approach is more suitable to Haskell where we might deal with infinite data structures, provides the user with more predictable shrinking, and provides the user with tools for controlling generator independence (through the use of selective functors).