To return to the theoretical considerations.
The free monad gets a lot of attention; the free applicative, much less so.
Well, one reason is that the free applicative requires the Day convolution, which requires existential types.
And we have already established that programmers avoid thinking about existential types.
Monoid objects in monoidal categories
However, those are specifically the formulation of monads and applicatives as monoid objects (in a monoidal category). That is the source of the “a monad is just a monoid in the [monoidal] category of endofunctors [under composition]” meme, while an applicative functor is “just” a monoid in the [monoidal] category of endofunctors under Day convolution.
I argue that it is important to include “under composition” in the description of the monad, because there are many monoidal products available on functors, and functor composition is a surprising choice, since it is not symmetric at all!
In fact, you can think of the Day convolution as a symmetrization of the composition tensor: there is a natural map Day f g ~> Compose f g, and since Day is symmetric, it also has a map Day f g ~> Compose g f via Day g f ~> Day f g.
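Both natural maps can be written out directly. Here is a self-contained sketch, repeating the usual definition of Day (as in Data.Functor.Day from kan-extensions):

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Functor.Compose (Compose (..))

-- The usual Day convolution (as in Data.Functor.Day from kan-extensions)
data Day f g a = forall b c. Day (f b) (g c) (b -> c -> a)

-- The natural map Day f g ~> Compose f g
dayToCompose :: (Functor f, Functor g) => Day f g a -> Compose f g a
dayToCompose (Day fb gc op) = Compose (fmap (\b -> fmap (op b) gc) fb)

-- Day is symmetric: just flip the arguments...
swapDay :: Day f g a -> Day g f a
swapDay (Day fb gc op) = Day gc fb (flip op)

-- ...which also yields the map Day f g ~> Compose g f
dayToComposeFlipped :: (Functor f, Functor g) => Day f g a -> Compose g f a
dayToComposeFlipped = dayToCompose . swapDay
```

Note that Compose has no such swap in general, which is exactly the asymmetry being discussed.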
The monoid operation is an arrow in the functor category: that is, a natural transformation.
So for monads it is join :: Compose f f ~> f, which is just forall r. f (f r) -> f r; and for applicatives it is liftA2 :: Day f f ~> f, which expands to forall r x y. (x -> y -> r) -> f x -> f y -> f r. At that point we can either take x = (y -> r) and apply id to get back (<*>), or we can take r = (x, y) and apply (,) to get a tupling function below.
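Both specializations can be written as one-liners in terms of the standard liftA2 (the names here are mine, for illustration):

```haskell
import Control.Applicative (liftA2)

-- Taking x = (y -> r) and applying id recovers (<*>)
apFromLiftA2 :: Applicative f => f (y -> r) -> f y -> f r
apFromLiftA2 = liftA2 id

-- Taking r = (x, y) and applying (,) gives the tupling function
tupled :: Applicative f => f x -> f y -> f (x, y)
tupled = liftA2 (,)
```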
Applicative functors can also be thought of as lax monoidal functors (between two monoidal categories), which come with an operation combining f x and f y into f (x ⊗ y). (In the case of applicative functors, both monoidal products are the cartesian product, so the operation is f x -> f y -> f (x, y).)
As we just said, it comes from the fact that a universal choice for the (x -> y -> r) function is the tuple constructor: because that preserves information, any other choice factors through it.
Arrows
We can formulate them differently. Instead of thinking of monads and applicatives as monoid objects or lax monoidal functors, we can think of them as arrows.
As a warmup, a free category can be expressed as
data FreeCategory p i o where
  Id :: FreeCategory p y y
  -- Existential `y`
  Compose :: p x y -> FreeCategory p y z -> FreeCategory p x z

(This is an analogous structure to the humble cons list, since it has the non-recursive structure p _ _ on the left and the recursion FreeCategory p _ _ on the right.)
(Note that there’s some trickery around categories that are also profunctors here [indeed, profunctors with strength], that we will sidestep by only working in categories based on functions.)
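For intuition, here is a hedged sketch of how FreeCategory composes (it is list append, with Id as the empty list) and folds into any target category; the name foldFree is introduced here for illustration:

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Repeating the definition from above, for self-containedness
data FreeCategory p i o where
  Id :: FreeCategory p y y
  Compose :: p x y -> FreeCategory p y z -> FreeCategory p x z

-- Composition is list append; `Id` is the empty list
instance Category (FreeCategory p) where
  id = Id
  g . f = go f g where
    go :: FreeCategory p x y -> FreeCategory p y z -> FreeCategory p x z
    go Id yz = yz
    go (Compose p rest) yz = Compose p (go rest yz)

-- Interpret into any target category, given an interpretation of `p`
foldFree :: Category q => (forall x y. p x y -> q x y) -> FreeCategory p i o -> q i o
foldFree _ Id = id
foldFree k (Compose p rest) = foldFree k rest . k p
```

Interpreting into (->) (with p instantiated to plain functions) just composes the steps in order.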
newtype MonadArrow f i r = MonadArrow (i -> f r)
newtype FreeMonad f r = FreeMonad
  (FreeCategory (MonadArrow f) Unit r)

newtype ApplicativeArrow f i r = ApplicativeArrow (f (i -> r))
newtype FreeApplicative f r = FreeApplicative
  (FreeCategory (ApplicativeArrow f) Unit r)

The “monad arrow” here is also known as the Kleisli category of the monad f.
It turns out that the “applicative arrow” is explained as a change of enriching category (via f being a lax monoidal functor), even though we are only going from Set (which is itself a Set-enriched category) back to a different Set-enriched category (whose arrows are f (i -> r)).
We can form an arrow to model the free selective applicative by using CaseTree as our arrow type, as it already includes ApplicativeArrows as a special case (OneCase).
type SelectiveArrow = CaseTree
newtype FreeSelective f r = FreeSelective
  (FreeCategory (SelectiveArrow f) Unit r)

However, it is nice to write out the cases like this, to really grasp how they model control flow for an applicative functor:
data ControlFlow f i r where
  Action :: f (i -> r) -> ControlFlow f i r
  Pure :: (i -> r) -> ControlFlow f i r
  CaseFlow :: CaseTree (ControlFlow f Unit) i r -> ControlFlow f i r
  Sequencing :: ControlFlow f i x -> ControlFlow f x r -> ControlFlow f i r

One function that is important to convince ourselves that ControlFlow really captures control flow is this sequencing function, which distributes the remaining control flow across each branch of a CaseTree (like a category – matching up output and input).
sequenceCaseTree ::
  forall i j f r.
  Functor f =>
  CaseTree (ControlFlow f Unit) i j ->
  ControlFlow f j r ->
  CaseTree (ControlFlow f Unit) i r

It is used to implement uncons (Sequencing _ _), which says that we can think of each ControlFlow as starting with a CaseTree if we want.
(For Pure and Action, this is a single-branch case tree.)
uncons ::
  forall f i r.
  Functor f =>
  ControlFlow f i r ->
  CaseTree (ControlFlow f Unit) i r
uncons (Pure ir) = OneCase (Pure (const ir))
uncons (Action fir) = OneCase (Action (const <$> fir))
uncons (CaseFlow cases) = cases
uncons (Sequencing ab bc) =
  sequenceCaseTree (uncons ab) bc

Finally, this tells us that we can split control flow over ControlFlow f (Either x y) just like we do for CaseTree.
bifurcate ::
  forall x y i f r.
  Functor f =>
  (i -> Either x y) ->
  ControlFlow f x r ->
  ControlFlow f y r ->
  ControlFlow f i r
bifurcate fg x y = CaseFlow $ TwoCases fg (uncons x) (uncons y)

Notice that we can essentially inline CaseTree into this data type too:
data ControlFlow f i r where
  Action :: f (i -> r) -> ControlFlow f i r
  Pure :: (i -> r) -> ControlFlow f i r
  -- `j` is existential here
  Sequencing :: ControlFlow f i j -> ControlFlow f j r -> ControlFlow f i r
  Absurd :: (i -> Void) -> ControlFlow f i r
  -- `x` and `y` are existential here
  CaseBranch ::
    (i -> Either x y) ->
    ControlFlow f x r ->
    ControlFlow f y r ->
    ControlFlow f i r

However, this is not incredibly useful: many forms of static analysis really want to deal with all of the case branches at once, so they would need to detect CaseBranch and re-gather all the branches.
Note that even if f is the type of functor where you could statically analyze the values in it (e.g. List as opposed to IO), we are instantiating it with an unknown function type.
I say unknown function type because, even if you could technically analyze the first action of a ControlFlow f i r when i is finite, you should not: Sequencing (Pure id) _ constructs another equivalent (equal?) ControlFlow f i r whose first action can no longer be analyzed on that basis, since it is now hidden behind an existential.
This hints to us that we can construct another type that consists of all the static information of ControlFlow, forgetting all of the functions (as they are unanalyzable) and thus all of the types:
data FlowInfo f where
  Info :: f () -> FlowInfo f
  Pure :: FlowInfo f
  Sequencing :: FlowInfo f -> FlowInfo f -> FlowInfo f
  Absurd :: FlowInfo f
  CaseBranch :: FlowInfo f -> FlowInfo f -> FlowInfo f

-- In fact, it is an ordinary ADT now
data FlowInfo f
  = Info (f ())
  | Pure
  | Sequencing (FlowInfo f) (FlowInfo f)
  | Absurd
  | CaseBranch (FlowInfo f) (FlowInfo f)

Notice how this looks like an untyped AST for a program now! We removed the existentials and the polymorphic recursion: it is the plainest of plain data types now.
In fact, you might spy what it has turned into: it is a free semiring, or something close to it. (Specifically it should be a near-semiring.)
summarize :: NearSemiring m => (f () -> m) -> FlowInfo f -> m
summarize f2m (Info f) = f2m f
summarize _ Pure = one
summarize f2m (Sequencing l r) = summarize f2m l * summarize f2m r
summarize _ Absurd = zero
summarize f2m (CaseBranch l r) = summarize f2m l + summarize f2m r

From the laws for a category, we know that Sequencing should be associative.
We also have that Sequencing (Pure id) f ~ f ~ Sequencing f (Pure id) should all be equivalent.
So once we forget the functions (by going from ControlFlow f i r to FlowInfo f), we need to treat all Pure :: FlowInfo f as equivalent.
From a computational point of view, this is because we allow arbitrary type shuffling, while still treating them as trivial programs from the point of view of the computation that f encodes.
So Pure is the identity for whatever operation we map Sequencing to.
Similarly we have that CaseBranch is associative, and its identity is Absurd.
We also would like to have that CaseBranch is commutative.
The order of cases should not matter, since only one will be taken – in the monadic interpretation, at least.
But we should be careful about making such strong determinations about what should and should not count, since that is what has led previous lines of reasoning about selective applicative functors astray.
In particular, imposing this law on all instances would mean that a lot of useful instances derived from applicatives are not allowed.
Note that for rings, and for semirings with additive cancellation, commutativity of addition follows from two-sided distributivity; but this is not an issue for near-semirings, as they only have one-sided distributivity.
Finally, we have the matter of distributivity.
It turns out that we should only have one sided distributivity: (x + y) * z = x * z + y * z, which implies 0 * z = 0.
(This is what makes it a near-semiring instead of a semiring.)
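To make this concrete, here is a hedged sketch of a NearSemiring class (with the zero/one/+/* names that summarize above assumes) together with one simple analysis: counting execution paths, where choice adds and sequencing multiplies. FlowInfo and summarize are repeated from above for self-containedness.

```haskell
import Prelude hiding ((+), (*))
import qualified Prelude

-- A minimal sketch of the NearSemiring class assumed by `summarize`
class NearSemiring m where
  zero :: m
  one :: m
  (+) :: m -> m -> m
  (*) :: m -> m -> m

-- Counting execution paths: choice adds, sequencing multiplies
newtype Paths = Paths Int deriving (Eq, Show)

instance NearSemiring Paths where
  zero = Paths 0
  one = Paths 1
  Paths a + Paths b = Paths (a Prelude.+ b)
  Paths a * Paths b = Paths (a Prelude.* b)

-- Repeating FlowInfo and summarize from above
data FlowInfo f
  = Info (f ())
  | Pure
  | Sequencing (FlowInfo f) (FlowInfo f)
  | Absurd
  | CaseBranch (FlowInfo f) (FlowInfo f)

summarize :: NearSemiring m => (f () -> m) -> FlowInfo f -> m
summarize f2m (Info f) = f2m f
summarize _ Pure = one
summarize f2m (Sequencing l r) = summarize f2m l * summarize f2m r
summarize _ Absurd = zero
summarize f2m (CaseBranch l r) = summarize f2m l + summarize f2m r

-- A two-way branch followed by one more action has two execution paths
example :: FlowInfo Maybe
example = Sequencing (CaseBranch (Info (Just ())) (Info (Just ()))) (Info (Just ()))
```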
The other direction of distributivity, z * (x + y) = z * x + z * y, does not hold so well for programs: pushing an action z into both branches of a later choice would duplicate its effects, and the choice may well depend on the result of z.
This makes sense, especially because we are encoding determined choice, where earlier results determine later control flow. We should not be surprised if an arrow of time occurs in our equations!
Arrow transformer
You might note that weʼve barely been using f here, just as a placeholder for a convenient arrow type.
This suggests that we could formulate ControlFlow as an arrow transformer, and require the right kind of arrow p (a strong profunctor and category, at least) to make ControlFlowT p behave as we want.
data ControlFlowT p i r where
  Action :: p i r -> ControlFlowT p i r
  Pure :: (i -> r) -> ControlFlowT p i r
  Sequencing :: ControlFlowT p i j -> ControlFlowT p j r -> ControlFlowT p i r
  Absurd :: (i -> Void) -> ControlFlowT p i r
  CaseBranch ::
    (i -> Either x y) ->
    ControlFlowT p x r ->
    ControlFlowT p y r ->
    ControlFlowT p i r

I have not thought about this much.
Alternative
Finally it is time to address the elephant in the room: what does this have to do with <|> from Alternative?
As I mentioned in the overview, (<|>) :: f x -> f x -> f x (alternatively: f x -> f y -> f (Either x y)) is a combinator for nondeterministic choice, where the combinator itself does not give any information about which branch to take, it is completely up to the implementation.
The main examples of this are LogicT (a monad transformer that encodes nondeterminism, layering List semantics on top of another monad), and parser combinator transformers, the simplest of which would be a hypothetical ParserT.
You can think of the basic monads Maybe, Either, List as simplifications of these ideas.
newtype ParserT m a =
ParserT { unParserT :: String -> m (Either Bool (String, a)) }
instance Monad m => Alt (ParserT m) where
  ParserT l <|> ParserT r = ParserT \input -> do
    parsedL <- l input
    case parsedL of
      Right parsed ->
        pure (Right parsed)
      Left _ ->
        -- This encodes backtracking: restarting on the same input.
        -- But usually parser combinators check that the input was
        -- not consumed at all, and use a combinator to enable
        -- backtracking in those cases.
        r input
--------------------
newtype LogicT m a =
LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }
instance Functor m => Alt (LogicT m) where
  LogicT l <|> LogicT r = LogicT \cons nil ->
    l cons (r cons nil)

observeAllT :: Applicative m => LogicT m a -> m [a]
observeManyT :: Monad m => Int -> LogicT m a -> m [a]

ParserT is the most important example of the usefulness of <|>, where even something as simple as taking different actions based on different characters of input is encoded using <|>.
LogicT is a monad transformer with pervasive backtracking, modeling nondeterminism of the kind “run every possible computation to find every possible result”, essentially.
It can also be lazily explored with observeManyT, to only obtain the first few results.
Still, the common theme is that to obtain even one successful result, backtracking deeply through the computation may be necessary.
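To see the Church-encoded list at work, here is a self-contained sketch of that LogicT with an implementation of observeAllT (the real logict library differs in details, and I rename pure and <|> to avoid instance boilerplate):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

newtype LogicT m a =
  LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }

-- One result, consed onto the given continuation
pureL :: a -> LogicT m a
pureL a = LogicT (\cons nil -> cons a nil)

-- (<|>): run `l`, with `r` as its tail, concatenating the result streams
altL :: LogicT m a -> LogicT m a -> LogicT m a
altL (LogicT l) (LogicT r) = LogicT (\cons nil -> l cons (r cons nil))

-- Collect every result into a list
observeAllT :: Applicative m => LogicT m a -> m [a]
observeAllT (LogicT k) = k (\a rest -> (a :) <$> rest) (pure [])
```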
Because they are both monads, they support: f (Maybe x) -> f x, via (_ >>= maybe empty pure).
This operation makes sense in some applicatives, too, but it needs a specialized implementation.
So generally we will talk about mapMaybe :: (x -> Maybe y) -> f x -> f y.
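For monads the two formulations coincide; a quick sketch (helper names are mine):

```haskell
import Control.Applicative (Alternative, empty)

-- The monadic pruning operation: f (Maybe x) -> f x
catMaybesM :: (Monad f, Alternative f) => f (Maybe x) -> f x
catMaybesM fx = fx >>= maybe empty pure

-- mapMaybe in terms of it
mapMaybeM :: (Monad f, Alternative f) => (x -> Maybe y) -> f x -> f y
mapMaybeM f = catMaybesM . fmap f
```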
Indeed, we want to talk about structures on applicative functors simply because ParserT and LogicT have no capability for static analysis, as they are monads.
So can we combine mapMaybe and <|> to get an explanation of select, branch, and so on?
The answer is yes: If we have an applicative parser without other side effects (i.e. m = Identity) and with enough backtracking, we could encode determined choice via nondeterministic choice.
selectLeft :: Filterable f => f (Either x y) -> f x
selectLeft = mapMaybe \case
Left x -> Just x
Right _ -> Nothing
selectRight :: Filterable f => f (Either x y) -> f y
selectRight = mapMaybe \case
Left _ -> Nothing
Right y -> Just y
branch :: Applicative f => Filterable f => Alternative f => f (Either x y) -> f (x -> r) -> f (y -> r) -> f r
branch determiner leftCase rightCase =
liftA2 (&) (selectLeft determiner) leftCase
<|>
liftA2 (&) (selectRight determiner) rightCase
-- ^ This easily generalizes to n-ary branches, unlike `branch` itself:
-- | Use `partitionMap`, `<*>`, and `<|>` to derive `Casing` for idempotent effects.
caseTreeOnPlus :: forall f i r. Apply f => Filterable f => Plus f => f i -> CaseTree f i r -> f r

Because this duplicates determiner, it would only be safe for something like ParserT Identity, which has no observable side effects from failed branches. (Strictly speaking, you could have idempotent effects: f *> f = f.)
And it requires backtracking all the way back to before determiner to even catch a Right case out of it, which is something that parser combinators generally avoid for efficiency (for good reason!).
However, this makes sense for LR parser combinators (yes those are a thing! they are necessarily applicative parsers and not monadic).
All of the backtracking is resolved ahead of time in LR parser combinators, thanks to the powers of static analysis.
It just requires the capability to prune branches at runtime for mapMaybe, which not all LR parsers support.
And importantly: despite the fact that the branching can be explained via <|> and mapMaybe, it is also incredibly useful to have information about CaseTree, because that can be used to help ensure that the parser is not ambiguous.
Finally, it is interesting to note that Alternative functors have also been explained as categorified near-semirings (and thus support static analysis via near-semirings): From monoids to near-semirings: the essence of MonadPlus and Alternative by Exequiel Rivas, Mauro Jaskelioff, and Tom Schrijvers explains the typeclass laws, free constructions, and Cayley constructions (à la difference lists) from this perspective.
Note that we could produce a similar datatype for CaseTree and ControlFlow, that is, a right-associated free construction with an efficient Cayley construction; but in the case of determined choice, this would turn binary search of the case tree into linear search, compromising efficiency.
So the conclusion is that the branching structures of Alternative and Selective are closely related, but not identical.
The Alternative can be thought of as a functor with nondeterministic branching, while Selective needs to be explained via arrows to encode finite deterministic branching.
They both allow retaining the structure of branches as independent from the sequential structure, and the selective structure can be encoded via <|> and mapMaybe in some well-behaved cases.