To return to the theoretical considerations.
The free monad gets a lot of attention; the free applicative, much less so.
Well, one reason is that the free applicative requires the Day convolution, which requires existential types.
And we have already established that programmers avoid thinking about existential types.
Monoid objects in monoidal categories
However, those are specifically the formulation of monads and applicatives as monoid objects (in a monoidal category). That is the source of the “a monad is just a monoid in the [monoidal] category of endofunctors [under composition]” meme, while an applicative functor is “just” a monoid in the [monoidal] category of endofunctors under Day convolution.
I argue that it is important to include “under composition” in the description of the monad, because there are many monoidal products available on functors, and functor composition is a surprising choice, since it is not symmetric at all!
In fact, you can think of the Day convolution as a symmetrization of the composition tensor: there is a natural map Day f g ~> Compose f g, and since Day is symmetric, it also has a map Day f g ~> Compose g f via Day g f ~> Day f g.
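Both natural maps can be written out directly. Here is a self-contained sketch, repeating the usual definition of Day (as in Data.Functor.Day from kan-extensions):

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Functor.Compose (Compose (..))

-- The usual Day convolution (as in Data.Functor.Day from kan-extensions)
data Day f g a = forall b c. Day (f b) (g c) (b -> c -> a)

-- The natural map Day f g ~> Compose f g
dayToCompose :: (Functor f, Functor g) => Day f g a -> Compose f g a
dayToCompose (Day fb gc op) = Compose (fmap (\b -> fmap (op b) gc) fb)

-- Day is symmetric: just flip the arguments...
swapDay :: Day f g a -> Day g f a
swapDay (Day fb gc op) = Day gc fb (flip op)

-- ...which also yields the map Day f g ~> Compose g f
dayToComposeFlipped :: (Functor f, Functor g) => Day f g a -> Compose g f a
dayToComposeFlipped = dayToCompose . swapDay
```

Note that Compose has no such swap in general, which is exactly the asymmetry being discussed.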
The monoid operation is an arrow in the functor category: that is, a natural transformation.
So for monads it is join :: Compose f f ~> f, which is just forall r. f (f r) -> f r; and for applicatives it is liftA2 :: Day f f ~> f, which expands to forall r x y. (x -> y -> r) -> f x -> f y -> f r. At that point we can either take x = (y -> r) and apply id to get back (<*>), or we can take r = (x, y) and apply (,) to get a tupling function below.
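Both specializations can be written as one-liners in terms of the standard liftA2 (the names here are mine, for illustration):

```haskell
import Control.Applicative (liftA2)

-- Taking x = (y -> r) and applying id recovers (<*>)
apFromLiftA2 :: Applicative f => f (y -> r) -> f y -> f r
apFromLiftA2 = liftA2 id

-- Taking r = (x, y) and applying (,) gives the tupling function
tupled :: Applicative f => f x -> f y -> f (x, y)
tupled = liftA2 (,)
```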
Applicative functors can also be thought of as lax monoidal functors (between two monoidal categories), which come with an operation combining f x and f y into f (x ⊗ y). (In the case of applicative functors, both monoidal products are the cartesian product, so the operation is f x -> f y -> f (x, y).)
As we just said, it comes from the fact that a universal choice for the (x -> y -> r) function is the tuple constructor: because that preserves information, any other choice factors through it.
Arrows
We can formulate them differently. Instead of thinking of monads and applicatives as monoid objects or lax monoidal functors, we can think of them as arrows.
As a warmup, a free category can be expressed as
data FreeCategory p i o where
  Id :: FreeCategory p y y
  -- Existential `y`
  Compose :: p x y -> FreeCategory p y z -> FreeCategory p x z

(This is an analogous structure to the humble cons list, since it has the non-recursive structure p _ _ on the left and the recursion FreeCategory p _ _ on the right.)
(Note that there’s some trickery around categories that are also profunctors here [indeed, profunctors with strength], that we will sidestep by only working in categories based on functions.)
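For intuition, here is a hedged sketch of how FreeCategory composes (it is list append, with Id as the empty list) and folds into any target category; the name foldFree is introduced here for illustration:

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Repeating the definition from above, for self-containedness
data FreeCategory p i o where
  Id :: FreeCategory p y y
  Compose :: p x y -> FreeCategory p y z -> FreeCategory p x z

-- Composition is list append; `Id` is the empty list
instance Category (FreeCategory p) where
  id = Id
  g . f = go f g where
    go :: FreeCategory p x y -> FreeCategory p y z -> FreeCategory p x z
    go Id yz = yz
    go (Compose p rest) yz = Compose p (go rest yz)

-- Interpret into any target category, given an interpretation of `p`
foldFree :: Category q => (forall x y. p x y -> q x y) -> FreeCategory p i o -> q i o
foldFree _ Id = id
foldFree k (Compose p rest) = foldFree k rest . k p
```

Interpreting into (->) (with p instantiated to plain functions) just composes the steps in order.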
newtype MonadArrow f i r = MonadArrow (i -> f r)
newtype FreeMonad f r = FreeMonad
  (FreeCategory (MonadArrow f) Unit r)

newtype ApplicativeArrow f i r = ApplicativeArrow (f (i -> r))
newtype FreeApplicative f r = FreeApplicative
  (FreeCategory (ApplicativeArrow f) Unit r)

The “monad arrow” here is also known as the Kleisli category of the monad f.
It turns out that the “applicative arrow” is explained as a change of enriching category (via f being a lax monoidal functor), even though we are only going from Set (which is itself a Set-enriched category) back to a different Set-enriched category (whose arrows are f (i -> r)).
We can form an arrow to model the free selective applicative by using CaseTree as our arrow type, as it already includes ApplicativeArrows as a special case (OneCase).
type SelectiveArrow = CaseTree
newtype FreeSelective f r = FreeSelective
  (FreeCategory (SelectiveArrow f) Unit r)

However, it is nice to write out the cases like this, to really grasp how they model control flow for an applicative functor:
data ControlFlow f i r where
  Action :: f (i -> r) -> ControlFlow f i r
  Pure :: (i -> r) -> ControlFlow f i r
  CaseFlow :: CaseTree (ControlFlow f Unit) i r -> ControlFlow f i r
  Sequencing :: ControlFlow f i x -> ControlFlow f x r -> ControlFlow f i r

One function that is important to convince ourselves that ControlFlow really captures control flow is this sequencing function, which distributes the remaining control flow across each branch of a CaseTree (like a category – matching up output and input).
sequenceCaseTree ::
  forall i j f r.
  Functor f =>
  CaseTree (ControlFlow f Unit) i j ->
  ControlFlow f j r ->
  CaseTree (ControlFlow f Unit) i r

It is used to implement uncons (Sequencing _ _), which says that we can think of each ControlFlow as starting with a CaseTree if we want.
(For Pure and Action, this is a single-branch case tree.)
uncons ::
  forall f i r.
  Functor f =>
  ControlFlow f i r ->
  CaseTree (ControlFlow f Unit) i r
uncons (Pure ir) = OneCase (Pure (const ir))
uncons (Action fir) = OneCase (Action (const <$> fir))
uncons (CaseFlow cases) = cases
uncons (Sequencing ab bc) =
  sequenceCaseTree (uncons ab) bc

Finally, this tells us that we can split control flow over ControlFlow f (Either x y) just like we do for CaseTree.
bifurcate ::
  forall x y i f r.
  Functor f =>
  (i -> Either x y) ->
  ControlFlow f x r ->
  ControlFlow f y r ->
  ControlFlow f i r
bifurcate fg x y = CaseFlow $ TwoCases fg (uncons x) (uncons y)

Notice that we can essentially inline CaseTree into this data type too:
data ControlFlow f i r where
  Action :: f (i -> r) -> ControlFlow f i r
  Pure :: (i -> r) -> ControlFlow f i r
  -- `j` is existential here
  Sequencing :: ControlFlow f i j -> ControlFlow f j r -> ControlFlow f i r
  Absurd :: (i -> Void) -> ControlFlow f i r
  -- `x` and `y` are existential here
  CaseBranch ::
    (i -> Either x y) ->
    ControlFlow f x r ->
    ControlFlow f y r ->
    ControlFlow f i r

However, this is not incredibly useful: many forms of static analysis really want to deal with all of the case branches at once, so they would need to detect CaseBranch and re-gather all the branches.
Note that even if f is the type of functor where you could statically analyze the values in it (e.g. List as opposed to IO), we are instantiating it with an unknown function type.
I say unknown function type because, even if you could technically analyze the first action of a ControlFlow f i r when i is finite, you should not: Sequencing (Pure id) _ constructs another equivalent (equal?) ControlFlow f i r whose first action can no longer be analyzed on that basis, since it is now hidden behind an existential.
This hints to us that we can construct another type that consists of all the static information of ControlFlow, forgetting all of the functions (as they are unanalyzable) and thus all of the types:
data FlowInfo f where
  Info :: f () -> FlowInfo f
  Pure :: FlowInfo f
  Sequencing :: FlowInfo f -> FlowInfo f -> FlowInfo f
  Absurd :: FlowInfo f
  CaseBranch :: FlowInfo f -> FlowInfo f -> FlowInfo f

-- In fact, it is an ordinary ADT now
data FlowInfo f
  = Info (f ())
  | Pure
  | Sequencing (FlowInfo f) (FlowInfo f)
  | Absurd
  | CaseBranch (FlowInfo f) (FlowInfo f)

Notice how this looks like an untyped AST for a program now! We removed the existentials and the polymorphic recursion: it is the plainest of plain data types now.
In fact, you might spy what it has turned into: it is a free semiring, or something close to it. (Specifically it should be a near-semiring.)
summarize :: NearSemiring m => (f () -> m) -> FlowInfo f -> m
summarize f2m (Info f) = f2m f
summarize _ Pure = one
summarize f2m (Sequencing l r) = summarize f2m l * summarize f2m r
summarize _ Absurd = zero
summarize f2m (CaseBranch l r) = summarize f2m l + summarize f2m r

From the laws for a category, we know that Sequencing should be associative.
We also have that Sequencing (Pure id) f ~ f ~ Sequencing f (Pure id) should all be equivalent.
So once we forget the functions (by going from ControlFlow f i r to FlowInfo f), we need to treat all Pure :: FlowInfo f as equivalent.
From a computational point of view, this is because we allow arbitrary type shuffling, while still treating them as trivial programs from the point of view of the computation that f encodes.
So Pure is the identity for whatever operation we map Sequencing to.
Similarly we have that CaseBranch is associative, and its identity is Absurd.
We also would like to have that CaseBranch is commutative.
The order of cases should not matter, since only one will be taken – in the monadic interpretation, at least.
But we should be careful about making such strong determinations about what should and should not count, since that is what has led previous lines of reasoning about selective applicative functors astray.
In particular, imposing this law on all instances would mean that a lot of useful instances derived from applicatives are not allowed.
Note that for rings, and for semirings with additive cancellation, commutativity of addition follows from two-sided distributivity; but this is not an issue for near-semirings, as they only have one-sided distributivity.
Finally, we have the matter of distributivity.
It turns out that we should only have one sided distributivity: (x + y) * z = x * z + y * z, which implies 0 * z = 0.
(This is what makes it a near-semiring instead of a semiring.)
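To make this concrete, here is a hedged sketch of a NearSemiring class (with the zero/one/+/* names that summarize above assumes) together with one simple analysis: counting execution paths, where choice adds and sequencing multiplies. FlowInfo and summarize are repeated from above for self-containedness.

```haskell
import Prelude hiding ((+), (*))
import qualified Prelude

-- A minimal sketch of the NearSemiring class assumed by `summarize`
class NearSemiring m where
  zero :: m
  one :: m
  (+) :: m -> m -> m
  (*) :: m -> m -> m

-- Counting execution paths: choice adds, sequencing multiplies
newtype Paths = Paths Int deriving (Eq, Show)

instance NearSemiring Paths where
  zero = Paths 0
  one = Paths 1
  Paths a + Paths b = Paths (a Prelude.+ b)
  Paths a * Paths b = Paths (a Prelude.* b)

-- Repeating FlowInfo and summarize from above
data FlowInfo f
  = Info (f ())
  | Pure
  | Sequencing (FlowInfo f) (FlowInfo f)
  | Absurd
  | CaseBranch (FlowInfo f) (FlowInfo f)

summarize :: NearSemiring m => (f () -> m) -> FlowInfo f -> m
summarize f2m (Info f) = f2m f
summarize _ Pure = one
summarize f2m (Sequencing l r) = summarize f2m l * summarize f2m r
summarize _ Absurd = zero
summarize f2m (CaseBranch l r) = summarize f2m l + summarize f2m r

-- A two-way branch followed by one more action has two execution paths
example :: FlowInfo Maybe
example = Sequencing (CaseBranch (Info (Just ())) (Info (Just ()))) (Info (Just ()))
```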
The other direction of distributivity, z * (x + y) = z * x + z * y, does not hold so well for programs: pushing an action z into both branches of a later choice would duplicate its effects, and the choice may well depend on the result of z.
This makes sense, especially because we are encoding determined choice, where earlier results determine later control flow. We should not be surprised if an arrow of time occurs in our equations!
Arrow transformer
You might note that weʼve barely been using f here, just as a placeholder for a convenient arrow type.
This suggests that we could formulate ControlFlow as an arrow transformer, and require the right kind of arrow p (a strong profunctor and category, at least) to make ControlFlowT p behave as we want.
data ControlFlowT p i r where
  Action :: p i r -> ControlFlowT p i r
  Pure :: (i -> r) -> ControlFlowT p i r
  Sequencing :: ControlFlowT p i j -> ControlFlowT p j r -> ControlFlowT p i r
  Absurd :: (i -> Void) -> ControlFlowT p i r
  CaseBranch ::
    (i -> Either x y) ->
    ControlFlowT p x r ->
    ControlFlowT p y r ->
    ControlFlowT p i r

I have not thought about this much.
Alternative
Finally it is time to address the elephant in the room: what does this have to do with <|> from Alternative?
As I mentioned in the overview, (<|>) :: f x -> f x -> f x (alternatively: f x -> f y -> f (Either x y)) is a combinator for nondeterministic choice, where the combinator itself does not give any information about which branch to take, it is completely up to the implementation.
The main examples of this are LogicT (a monad transformer that encodes nondeterminism, layering List semantics on top of another monad), and parser combinator transformers, the simplest of which would be a hypothetical ParserT.
You can think of the basic monads Maybe, Either, List as simplifications of these ideas.
newtype ParserT m a =
ParserT { unParserT :: String -> m (Either Bool (String, a)) }
instance Monad m => Alt (ParserT m) where
  ParserT l <|> ParserT r = ParserT \input -> do
    parsedL <- l input
    case parsedL of
      Right parsed ->
        pure (Right parsed)
      Left _ ->
        -- This encodes backtracking: restarting on the same input.
        -- But usually parser combinators check that the input was
        -- not consumed at all, and use a combinator to enable
        -- backtracking in those cases.
        r input
--------------------
newtype LogicT m a =
LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }
instance Functor m => Alt (LogicT m) where
  LogicT l <|> LogicT r = LogicT \cons nil ->
    l cons (r cons nil)

observeAllT :: Applicative m => LogicT m a -> m [a]
observeManyT :: Monad m => Int -> LogicT m a -> m [a]

ParserT is the most important example of the usefulness of <|>, where even something as simple as taking different actions based on different characters of input is encoded using <|>.
LogicT is a monad transformer with pervasive backtracking, modeling nondeterminism of the kind “run every possible computation to find every possible result”, essentially.
It can also be lazily explored with observeManyT, to only obtain the first few results.
Still, the common theme is that to obtain even one successful result, backtracking deeply through the computation may be necessary.
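To see the Church-encoded list at work, here is a self-contained sketch of that LogicT with an implementation of observeAllT (the real logict library differs in details, and I rename pure and <|> to avoid instance boilerplate):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

newtype LogicT m a =
  LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }

-- One result, consed onto the given continuation
pureL :: a -> LogicT m a
pureL a = LogicT (\cons nil -> cons a nil)

-- (<|>): run `l`, with `r` as its tail, concatenating the result streams
altL :: LogicT m a -> LogicT m a -> LogicT m a
altL (LogicT l) (LogicT r) = LogicT (\cons nil -> l cons (r cons nil))

-- Collect every result into a list
observeAllT :: Applicative m => LogicT m a -> m [a]
observeAllT (LogicT k) = k (\a rest -> (a :) <$> rest) (pure [])
```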
Because they are both monads, they support: f (Maybe x) -> f x, via (_ >>= maybe empty pure).
This operation makes sense in some applicatives, too, but it needs a specialized implementation.
So generally we will talk about mapMaybe :: (x -> Maybe y) -> f x -> f y.
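For monads the two formulations coincide; a quick sketch (helper names are mine):

```haskell
import Control.Applicative (Alternative, empty)

-- The monadic pruning operation: f (Maybe x) -> f x
catMaybesM :: (Monad f, Alternative f) => f (Maybe x) -> f x
catMaybesM fx = fx >>= maybe empty pure

-- mapMaybe in terms of it
mapMaybeM :: (Monad f, Alternative f) => (x -> Maybe y) -> f x -> f y
mapMaybeM f = catMaybesM . fmap f
```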
Indeed, we want to talk about structures on applicative functors simply because ParserT and LogicT have no capability for static analysis, as they are monads.
So can we combine mapMaybe and <|> to get an explanation of select, branch, and so on?
The answer is yes: If we have an applicative parser without other side effects (i.e. m = Identity) and with enough backtracking, we could encode determined choice via nondeterministic choice.
selectLeft :: Filterable f => f (Either x y) -> f x
selectLeft = mapMaybe \case
Left x -> Just x
Right _ -> Nothing
selectRight :: Filterable f => f (Either x y) -> f y
selectRight = mapMaybe \case
Left _ -> Nothing
Right y -> Just y
branch :: Applicative f => Filterable f => Alternative f => f (Either x y) -> f (x -> r) -> f (y -> r) -> f r
branch determiner leftCase rightCase =
liftA2 (&) (selectLeft determiner) leftCase
<|>
liftA2 (&) (selectRight determiner) rightCase
-- ^ This easily generalizes to n-ary branches, unlike `branch` itself:
-- | Use `partitionMap`, `<*>`, and `<|>` to derive `Casing` for idempotent effects.
caseTreeOnPlus :: forall f i r. Apply f => Filterable f => Plus f => f i -> CaseTree f i r -> f r

Because this duplicates determiner, it would only be safe for something like ParserT Identity, which has no observable side effects from failed branches. (Strictly speaking, you could have idempotent effects: f *> f = f.)
And it requires backtracking all the way back to before determiner to even catch a Right case out of it, which is something that parser combinators generally avoid for efficiency (for good reason!).
However, this makes sense for LR parser combinators (yes those are a thing! they are necessarily applicative parsers and not monadic).
All of the backtracking is resolved ahead of time in LR parser combinators, thanks to the powers of static analysis.
It just requires the capability to prune branches at runtime for mapMaybe, which not all LR parsers support.
And importantly: despite the fact that the branching can be explained via <|> and mapMaybe, it is also incredibly useful to have information about CaseTree, because that can be used to help ensure that the parser is not ambiguous.
Finally, it is interesting to note that Alternative functors have also been explained as categorified near-semirings (and thus support static analysis via near-semirings): From monoids to near-semirings: the essence of MonadPlus and Alternative by Exequiel Rivas, Mauro Jaskelioff, and Tom Schrijvers explains the typeclass laws, free constructions, and Cayley constructions (à la difference lists) from this perspective.
Note that we could produce a similar datatype for CaseTree and ControlFlow, that is, a right-associated free construction with an efficient Cayley construction; but in the case of determined choice, this would turn binary search of the case tree into linear search, compromising efficiency.
So the conclusion is that the branching structures of Alternative and Selective are closely related, but not identical.
The Alternative can be thought of as a functor with nondeterministic branching, while Selective needs to be explained via arrows to encode finite deterministic branching.
They both allow retaining the structure of branches as independent from the sequential structure, and the selective structure can be encoded via <|> and mapMaybe in some well-behaved cases.