Lotteries: a constructive version of the distributions monad

probability

polynomial functors

category theory

Author

Published

2023-03-23

Abstract

Many category theorists have heard of the distributions monad \mathsf{dist}\colon\mathbf{Set}\to\mathbf{Set}, which sends a set X to the set of finitely supported probability distributions on X. Many have also heard of the operad \Delta of simplices, for which an n-ary operation is a probability distribution on n elements. This operad \Delta corresponds to a certain polynomial monad on \mathbf{Set}, which I’ll denote \mathsf{lott}. What monad is it, and how does it compare to \mathsf{dist}? In this post I’ll explain how to think of it as a monad of lotteries, where the number of tickets—like sides of a coin or faces of a die—are known in advance. I’ll also explain its relation to \mathsf{dist} and how conditioning looks in this setting.

This post used to be called “The distributions monad is a retract of the lotteries monad”, but that statement is not in fact true: the previously-proposed transformation \mathsf{dist}\to\mathsf{lott} is not natural. The post has been edited to correct this error.

1 Introduction

Polynomial functors are nice: they preserve connected limits, such as pullbacks, and they’re pretty easy to think about. A polynomial functor p=\sum_{i\in I}X^{A_i} sends a set X to a disjoint union of powers of X. Polynomial monads are just monads whose underlying functor is polynomial, i.e. preserves connected limits.

On the other hand, probability monads, do not preserve connected limits. For example, consider the monad \mathsf{dist} that sends a set X to the set \mathsf{dist}(X) of finitely-supported probability distributions on X. ¹ Then if there were an isomorphism \mathsf{dist}(X\times_1Y)\cong^?\mathsf{dist}(X)\times_{\mathsf{dist}(1)}\mathsf{dist}(Y) it would imply that every joint distribution on X\times Y is independent.

So you may find it weird to know that there is a polynomial monad \mathsf{lott}, of which \mathsf{dist} is a quotient! That is, there is a surjective natural transformation \mathsf{lott}\to\mathsf{dist}. We will see that \mathsf{lott} comes from the operad of simplices; if you haven’t heard of it, don’t worry: I will explain.

I think of \mathsf{lott} as being more “natural” than \mathsf{dist} is, i.e. closer to the way probability shows up in nature; I’ll explain why when I introduce \mathsf{lott} in the next section. The fact that \mathsf{dist} is an abstraction of (think: syntax for) this more natural thing, means that not so much is really lost if we think in terms of \mathsf{lott}. Now I’m not a probabilist—I’m just a \mathbf{Poly}_{}-lover—so I don’t know how to evaluate how useful \mathsf{lott} really is, but colleagues I’ve explained it to say that \mathsf{lott} is in fact closer to how they think of distributions anyway. Much or all of the stuff below is probably well-known, but it wasn’t to me; if you know a reference, please feel free to leave it in the comments below.

In the remainder of this post, I’ll first talk about the monad \mathsf{lott} underlying the operad of simplices. Then I’ll talk about how \mathsf{dist} is a quotient of it. Finally I’ll explain how one does conditioning with \mathsf{lott} and then very briefly conclude.

2 The monad of lotteries

For any finite set N, define \Delta_N to be the set of probability distributions on N, \tag{$\Delta$} \Delta_N\coloneqq\left\{P\colon N\to[0,1]\;\middle|\; 1=\sum_{n:N}P(n)\right\} These actually form the operations of an operad. You can think of an element of \Delta_N as an N-ary operation. To think about composing these operations, here’s the explanation I learned from Tom Leinster. Suppose we can flip a two-sided coin, and then if we get heads we roll a six-sided die and if we get tails we pick a card from a 52-card deck. This corresponds to composing operations, and the result is a probability distribution on 6+52 elements, \frac{1}{12},\ldots,\frac{1}{12},\frac{1}{104},\ldots,\frac{1}{104}. This composition is represented as a function \Delta_2\times(\Delta_6\times\Delta_{52})\to\Delta_{58}. In general, composition in the operad is given by functions \Delta_N\times(\Delta_{M_1}\times\cdots\times\Delta_{M_N})\to\Delta_{M_1+\cdots+M_N}

Any operad like this (any \Sigma-free operad) has an underlying polynomial monad. ² We’ll denote the polynomial monad corresponding to \Delta by \mathsf{lott}; its underlying functor is: \mathsf{lott}\coloneqq\sum_{N\in\mathbb{N}}\Delta_N\,\mathcal{y}^N. As a polynomial, a position in \mathsf{lott} is a finite set N and a distribution P\colon N\to[0,1] on it; a direction there is an element of N.

I want to think about this in terms of “lotteries” because the Wikipedia article on the Von Neumann–Morgenstern utility theorem says that a lottery is a set of mutually exclusive outcomes together with a probability distribution on them. That sounds exactly like the summands of \mathsf{lott}: \mathsf{lott}(1)=\sum_{N\in\mathbb{N}}\Delta_N.

Here’s how to think of \mathsf{lott} as a functor. Given a set X, an element of \mathsf{lott}(X) consists of a tuple (N,P,f), where (N,P) is a lottery and f\colon N\to X is a function that takes every outcome n:N to an element of X. Let’s call this a lottery with returns in X. So if you say “I’m going to flip a coin, and if it comes up heads I’ll get a coffee, and if it comes up tails I’ll keep walking”, then out of all the possible actions X you can take, you’ve given a function f\colon\{H,T\}\to X, i.e. you’ve given yourself an element (N,P,f):\mathsf{lott}(X). This coin-flip is a lottery that returns an element of X, i.e. an action.

We know this functor \mathsf{lott} is a monad because it comes to us from the operad \Delta. But just to be explicit, we need natural maps X\to\mathsf{lott}(X) \qquad\text{and}\qquad \mathsf{lott}(\mathsf{lott}(X))\to\mathsf{lott}(X) These are not hard; I’ll drop them in a footnote, ³ to let things flow better.

Let’s return to something we said before, that \mathsf{lott} preserves connected limits. For simplicity, let’s just consider the case of pullbacks. Suppose given sets and functions X\to Z\leftarrow Y. Because \mathsf{lott} is a polynomial functor and polynomial functors commute with connected limits, we must have that a lottery on a pullback X\times_ZY is just a pullback of lotteries \mathsf{lott}(X\times_ZY)\cong\mathsf{lott}(X)\times_{\mathsf{lott}(Z)}\mathsf{lott}(Y). The reason it works is that on the right-hand side, the maps \mathsf{lott}(X)\to\mathsf{lott}(Z)\leftarrow\mathsf{lott}(Y) preserve the set of lottery tickets and the distribution on them, and only change the way those tickets return as elements of X, Y, and Z.

I should also mention that, like \mathsf{dist}, the monad \mathsf{lott} is lax monoidal: ⁴ \mathsf{lott}(X)\times\mathsf{lott}(Y)\to\mathsf{lott}(X\times Y) meaning that we can construct independent distributions.

3 The surjection of monads \mathsf{lott}\to\mathsf{dist}

Our next goal is to understand the difference between \mathsf{lott} and \mathsf{dist}.

The distributions monad \mathsf{dist}\colon\mathbf{Set}\to\mathbf{Set} can also be defined in terms of \Delta: \mathsf{dist}(X)\coloneqq\{P\colon X\to[0,1]\mid\exists X':\mathbb{N}, \textnormal{ injection } i\colon X'\rightarrowtail X, P'\in\Delta_{X'}, P=i_*(P')\} in other words, a distribution on X is an element of \Delta_{X'} for some some finite subset X'\subseteq X. The i_*(P) notation for push-forward measure is easy when i is injective, like it is above: to each x:X we assign i_*(P)\coloneqq P(x) if x\in X' and i_*(P)\coloneqq 0 otherwise. But in general, the f_* construction requires an uncomputable aspect—a “finite fiber” condition—indicating another advantage of \mathsf{lott}, which is fully computable. ⁵

The lotteries monad and the distributions monad can feel surprisingly similar. How do we feel into the difference between choosing a finite subset of X and a distribution on it versus choosing a distribution on some set of tickets and then assigning each ticket to an element of X?

The easiest way to distinguish the distributions monad from the lotteries monad is to see that \mathsf{dist}(1)=1 whereas \mathsf{lott}(1)=\sum_{N\in\mathbb{N}}\Delta_N is the union of the points in every simplex. Again, \mathsf{lott} tracks the set of tickets. Another way to distinguish them is to consider the existential \exists in the definition of \mathsf{dist}.

The categorical relationship between \mathsf{lott} and \mathsf{dist} is that there is a natural transformation \mathsf{lott}\xrightarrow{\alpha}\mathsf{dist} which quantifies out the set of tickets. Let’s explain this map \alpha.

Suppose X:\mathbf{Set} is a set. The X-component of \alpha\colon\mathsf{lott}\to\mathsf{dist} sends (N,P,f) to f_*(P), i.e. to the distribution x\mapsto\sum_{\{n:N\mid f(n)=x\}}P(n). Our functor \alpha extends to a map of monads because the unit X\to\mathsf{lott}(X) is the unique lottery on 1 element and this is sent to the dirac distribution, and the multiplication is similarly sent to the appropriate integration of probability distributions.

It can be tempting ⁶ to consider a section \mathsf{dist}\xrightarrow{\beta?}\mathsf{lott} of \alpha, defined as follows. For any element of P\in\mathsf{dist}(X), i.e. finitely-supported distribution on X, we can find some (X', i, P'), where i\colon X'\subseteq X is a finite subset (with inclusion denoted i) and P':\Delta_{X'} is a distribution on it such that i_*(P')=P. Thinking of (X',P') as a lottery and of i as how it returns to X, we could try to write \beta(N,i,P)\coloneqq^?(X',P',i). But the problem is that N was only existentially defined: there is no uniform way to pick N. Any such way to choose N will not be natural with respect to functions X\to Y. So we cannot define such a \beta.

4 Conditioning with \mathsf{lott}

Given a probability distribution P on X, one conditions it on a subset X'\subseteq X by restricting to that subset and renormalizing, i.e. picking a scaling factor and scaling all the probabilities P(x') up by that amount, so that in the end they sum to 1. Sometimes that’s impossible, because for each x'\in X' we have P(x')=0, but other than that case it all makes sense. ⁷

We can do something similar using \mathsf{lott}. The interesting part is a map \tag{\textbf{Cond}} \mathsf{lott}\mathbin{\triangleleft}(\mathcal{y}+ 1)\xrightarrow{\varphi} (\mathcal{y}+ 1) \mathbin{\triangleleft}\mathsf{lott} that is like conditioning the lottery itself, before we include anything about X. An input position to \varphi can be identified with a tuple (N,P, N'), where N:\mathbb{N} is a natural, P:N\to[0,1] is a distribution on it, and N'\subseteq N is a subset. ⁸ For notation reasons, let’s use the isomorphism (\mathcal{y}+1)\mathbin{\triangleleft}\mathsf{lott}\cong\mathsf{lott}+\{\texttt{Nothing}\}. Then \varphi’s job is to send (N,P,N') either to a new lottery or to \texttt{Nothing}. Letting T\coloneqq\sum_{n':N'}P(n') be the total probability mass on N', we define the forward direction of \varphi by \varphi_1(N,P, N')\coloneqq \begin{cases} \texttt{Nothing}&\textnormal{ if }T=0\\ \left(N',n'\mapsto \frac{P(n')}{T}\right)&\textnormal{ if }T\neq 0 \end{cases} The backwards direction of \varphi is straightforward, so I’ll leave that to the interested reader.

Note that the form of (Cond) looks like a distributive law, but it’s not one. ⁹ Three of the four laws hold, but the most crucial of them, about the equality of two maps \mathsf{lott}\mathbin{\triangleleft}\mathsf{lott}\mathbin{\triangleleft}(\mathcal{y}+1)\to(\mathcal{y}+1)\mathbin{\triangleleft}\mathsf{lott}, doesn’t hold.

Now let me get to the part that’s more like conditioning. Suppose you have a lottery on X and a subset X'\subseteq X that you want to condition on. What you do is use the subset to form a map f\colon X\to X'+\{\texttt{Nothing}\}, given by f(x)\coloneqq \begin{cases} x&\textnormal{ if } x\in X'\\ \texttt{Nothing}&\textnormal{ if } x\not\in X' \end{cases} Thinking of this as a map f\colon X\to (\mathcal{y}+1)\mathbin{\triangleleft}X', we’re ready to condition! Indeed, we use the following function: \mathsf{lott}(X)\cong\mathsf{lott}\mathbin{\triangleleft}X\xrightarrow{\mathsf{lott}\mathbin{\triangleleft}f} \mathsf{lott}\mathbin{\triangleleft}(\mathcal{y}+1)\mathbin{\triangleleft}X'\xrightarrow{\varphi\mathbin{\triangleleft}X'} (\mathcal{y}+1)\mathbin{\triangleleft}\mathsf{lott}\mathbin{\triangleleft}X'\cong \mathsf{lott}(X')+1 Given a lottery on X, we get either a lottery on X', the correctly conditioned thing, or if there isn’t one, we get Nothing. This is what we wanted! ¹⁰

5 Conclusion

All of the above could just be seen as a way to move ideas from very basic probability theory into \mathbf{Poly}_{}. To my naive eye, not much is lost by doing this, since the relevant monad \mathsf{dist} is a quotient of \mathsf{lott}. And something is gained: polynomial monads are nice things, and somehow I find it easier to think about lotteries, which feel a little more natural to me than abstract probability distributions do, because there are actual cards or dice or coins involved. Moreover, \mathsf{lott} is more constructive: the finiteness condition is replaced by structure; the existential is replaced by a witness. Finally we obtain conditioning using a structure on \mathsf{lott}, rather than as emerging from a special factorization that may or may not exist. What do you think?

Footnotes

An element of \mathsf{dist}(X) consists of a finite subset X'\subseteq X and a function P\colon X'\to[0,1] such that 1=\sum_{x'\in X'}P(x').↩︎
Verity Scheel told me that this monad has (roughly-speaking) been implemented in Haskell. Namely, \mathsf{lott}(X) is implemented as the set of lists [(x_1,p_1),\ldots,(x_N,p_N)], where x_i:X and the p_i’s are any numeric type. Nothing in this implementation forces the x_i’s to be distinct (which is good from the \mathsf{lott} point of view), and nothing forces the p_i to add up to 1, which is bad from the \mathsf{lott} point of view.↩︎
The map X\to\mathsf{lott}(X) sends x:X to (1,!,x):\mathsf{lott}(X), the unique probability distribution on 1 outcome, and the assignment of that outcome to x. An input to the second map \mathsf{lott}(\mathsf{lott}(X))\to\mathsf{lott}(X) consists of a lottery (M,P,f) where M:\mathbb{N}, P:\Delta_M, and f\colon M\to\mathsf{lott}(X) is a function sending each m:M to a lottery (N_m,Q_m,g_m) where N_m:\mathbb{N}, Q_m:\Delta_{N_m}, and g_m\colon N_m\to X. Sounds complicated, but it’s just like Leinster’s explanation of a coin, a die, and a deck of cards, and you deal with it the same way. Namely, you send the above input to (O,R,h), where O\coloneqq\sum_{m\in M}N_m \qquad\text{and}\qquad R(m,n)=P(m)*Q(n) \qquad\text{and}\qquad h(m,n)=g_m(n).↩︎
Indeed, every polynomial monad is lax monoidal.↩︎
Given a function f\colon X\to Y, we say it has finite fibers if f^{-1}(y) is finite for each y\in Y. Given such a function and a function P\colon X\to[0,1], we define f_*(P)\colon Y\to \mathbb{R}, the push-forward of P along f by y\mapsto\sum_{x\in f^{-1}(y)}P(x)↩︎
Indeed, I posted a previous version of these ideas under the false impression that such a map of monads exists.↩︎
Given a distribution P on X\times Y, one obtains a conditional map from X to distributions on Y using the above means. Given any x_0\in X, renormalize on the subset \{(x,y)\in X\times Y\mid x=x_0\}.↩︎
Here we’ve identified a map N\to 2 with a subset N'\subseteq N.↩︎
For any polynomial monad t, there is a distributive law (\mathcal{y}+1)\mathbin{\triangleleft}t\to t\mathbin{\triangleleft}(\mathcal{y}+1), but that goes oppositely to \varphi. In fact \varphi is a retraction of that map.↩︎
Given a lottery \ell:\mathsf{lott}(X\times Y), we can get a map X\to\mathsf{lott}(Y)+1 as follows. We have X\xrightarrow{X\times\ell} X\times\mathsf{lott}(X\times Y)\to\mathsf{lott}(X)\times\mathsf{lott}(X\times Y)\to\mathsf{lott}(X\times X\times Y). The diagonal X\subseteq X\times X gives rise to a subset that we can condition on as above.↩︎