# Towards a Research Program on Compositional World-Modeling

In this post, we lay out a vision and challenge for using category theory to build tools for understanding systems at a global scope.

# 1 The problem

In scientific applications (biology, ecology, epidemiology, chemistry, physics, economics, etc.) and related engineering domains, full modeling of a complex system frequently involves the following considerations:

**Multi-disciplinarity**. Problems may show up in a certain domain that cannot be solved with the customary mathematical tools and software frameworks of that domain, and the mathematical, scientific, technical, and cultural assumptions of another scientific domain may be illegible from the perspective of the original domain. For instance, modeling hydraulic systems often involves both rigid body physics and fluid physics.

**Openness**. One is not always lucky enough that one part of the world can be fully isolated from the rest. Rather, systems tend to have interfaces or boundaries, on which (and beyond which) one's model does not make endogenous predictions, but on which its predictions must depend. For instance, this is typically the case of electronic circuits; they have terminals that are intended to be connected to other systems.

**Continuity in time**. The real world is typically not parceled into discrete time steps. While a sufficiently small time step can generally be chosen, this comes at a great computational cost. We would prefer to be able to reason flexibly about timescales, with a guarantee that for any nonnegative rationals s,t \in \mathbb{Q}^{\geq 0}, the forward propagation T_{s+t} is compatible with T_s ; T_t in the sense that, for any initial knowledge K, T_s T_t K \Rightarrow T_{s+t} K.

**Continuity in space**. We are often interested in systems whose states have continuous extent, e.g. electric fields, fluids, deformable bodies, etc., meaning that the state space is naturally an infinite-dimensional function space.

**Stochasticity**. We often observe systems at a scale where many degrees of freedom are not visible, and even when they are visible, it is frequently advantageous to ignore them. However, these hidden degrees of freedom affect the larger scale behavior in the form of *noise*, and cause the system to behave stochastically rather than deterministically. The weather is one good example of a system which has some amount of regularity, but also a good deal of noise.

**Nondeterminism**. Even stochasticity makes strong claims about uncertain behavior, by asserting precisely known probabilities of any observable. In many cases, we only know that probabilities lie within a certain range, or we do not know them at all. This motivates “imprecise probability,” which we believe is best understood as a monad which naturally generalizes both the probability monad and the nondeterminism monad (such as the monad of convex subsets of probability-distribution space). The behavior of your opponent in chess is nondeterministic; it doesn’t make sense to have a full probability model for the state of your opponent’s brain.

**Partiality**. Probabilistic programs may in general be nonterminating, or they may reject inconsistent inputs; these both motivate a general notion of partiality, or the ability of processes to result in \bot (a state representing that something went wrong). For instance, you might want to sample from the inside of a disk by sampling from a square, and throwing away points not inside the disk.

**Hybridness**. Complex systems often involve interactions between continuous and discrete (or discretely modeled) components. Discrete components may transition on a fixed timescale, or at times generated by a point process that depends on the outputs of continuous components. Continuous components may depend on the outputs of discrete components. A classical example of this is a hybrid digital-analogue circuit.
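The disk example of partiality is easy to make concrete. In the sketch below (the function names are ours), \bot is represented by `None`; a single partial step may reject, and the retry loop terminates with probability 1 but has no finite worst-case bound:

```python
import random

def sample_square(rng):
    """Propose a uniform point in the square [-1, 1]^2."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

def sample_disk_partial(rng):
    """One partial sampling step: a point in the unit disk, or None
    (our stand-in for the 'bottom' outcome) if the proposal is rejected."""
    x, y = sample_square(rng)
    return (x, y) if x * x + y * y <= 1.0 else None

def sample_disk(rng):
    """Retry the partial step until it succeeds: termination happens with
    probability 1, but there is no finite worst-case bound, which is why
    nontermination and rejection motivate the same notion of partiality."""
    while True:
        p = sample_disk_partial(rng)
        if p is not None:
            return p

rng = random.Random(0)
x, y = sample_disk(rng)
```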

Many of these considerations greatly enlarge the space of solutions: the space of time-varying probability distributions over piecewise-continuous vector fields on a manifold is incredibly “large”, in the sense that *multiple* levels of discretization are required to convert this into something that can be represented on a computer.

Analytic or numerical solutions to problems which are “that large” are, in general, intractable. However, a logically-specified “full model” of the situation is still useful, as a ground truth that more tractable approximations can be compared to, or that behavioral guarantees can be extracted from.

It is for this reason that we would like a compositional, formal framework for such systems, which could be used to design and analyze complex models. We expect that category theory is a fruitful method for finding such a framework.

Solving such “large systems” is a problem that has been attacked by brilliant mathematical physicists for generations; the usefulness of our approach is not necessarily predicated on the idea that a compositional framework would significantly advance the state of the art with regard to concrete solutions or theorems for such systems.

However, having a formal framework would organize the field in such a way as to make it easier to produce rigorously verified and interpretable computer-aided analyses. Moreover, organizing software implementation makes it possible to take existing techniques much farther in less time and code.

In the next section, we discuss applications of such a framework, and then in the last section we will discuss some concrete criteria we have for such a framework, as well as tentative approaches we are pursuing.

Finally, we should say that by “formal framework”, we do not mean that all “large systems” will necessarily be captured by a single definition. Rather, we expect that we will capture certain classes of large systems with different definitions, but that we will be able to relate and translate these different definitions formally. The reason this is desirable is that we want to be able to take advantage of the special properties of different classes of system.

For instance, we might have some definition for systems whose solutions are smooth, and be able to prove some theorems that wouldn’t hold for systems with non-smooth solutions. However, we should be able to also translate a smooth system into a more general class of systems, which we might want to do in order to compose it with a non-smooth system or stochastic system.

Some examples of types of systems that we would possibly like to fit within our framework are the following.

- Differential equations
  - ordinary
  - partial
  - stochastic
  - random
  - jump-diffusion

- Markov processes
  - discrete-time Markov chains
  - continuous-time Markov chains
  - Markov decision processes
  - open games
  - (open) mean-field games
  - Markov automata

- “Hybrid systems”
- Probabilistic models
  - probabilistic graphical models / Bayesian networks / causal graphs
  - corecursive programs in a functional probabilistic programming language (including, notably, large language models based on Transformer neural networks)
  - generative probabilistic logic programs
  - score-based generative models (including, notably, diffusion models based on U-Net neural networks)
Classically, these are all different paradigms of system. However, redefining these systems categorically will mean that it will be easier to relate them to each other. This doesn’t necessarily mean that there will be categories for each type of system; we don’t yet know the precise type of categorical gadget we will need for modeling. It could be monoidal categories, it could be pseudo lax virtual triple categories (whatever that is). But morally speaking, there should be some sort of functorial interpretation of certain types of systems into others. And additionally, we should be able to “bottom out” in a behavioral interpretation in which systems are described merely as joint epistemic states regarding their observables, which can be composed by intersection.

# 2 Applications

Decision-making in complex environments is often critically dependent on coherent predictions, informed by observational data (o : O), of how actions (a : A) affect the probability of queries Q (each Boolean-valued) pertaining to a future trajectory x(t):

\mathrm{Pr}\!\left( \mathrm{Q}(x(t)) \;\middle|\; o : O, \text{do } a : A \right)
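As a toy illustration of such a query (the dynamics, names, and parameters below are all invented by us), one can estimate \mathrm{Pr}(Q(x(t)) \mid \text{do } a) by plain Monte Carlo over a simple stochastic system:

```python
import random

def trajectory(a, rng, steps=20):
    """Toy stochastic dynamics (invented for illustration): the state
    drifts toward the action value a, with additive Gaussian noise."""
    x, xs = 0.0, []
    for _ in range(steps):
        x += 0.1 * (a - x) + rng.gauss(0.0, 0.05)
        xs.append(x)
    return xs

def pr_query(a, query, n=2000, seed=0):
    """Monte Carlo estimate of Pr(Q(x(t)) | do a) for a Boolean-valued Q."""
    rng = random.Random(seed)
    return sum(query(trajectory(a, rng)) for _ in range(n)) / n

q = lambda xs: xs[-1] > 0.5   # the query: does the trajectory end above 0.5?
p1 = pr_query(1.0, q)         # under the intervention a = 1
p0 = pr_query(0.0, q)         # under the intervention a = 0
```

Comparing p1 and p0 is exactly the kind of comparison a decision-maker needs, and the point of the program sketched in this post is to make such queries well-defined even when the model is far more complex than this toy.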

Some examples include:

- Deciding what policy interventions to perform during a pandemic:
  - observations include recorded cases and outcomes
  - the complex environment includes asymptomatic cases, the details of mobility, the virulence of different strains, etc.

- Deciding when to switch on and off power-generating stations, and when to charge or discharge grid-scale energy storage:
  - observations include voltages (magnitude and phase) and currents
  - the complex environment includes transmission lines, predictable and unpredictable sources of power demand (at various price levels), weather (affecting renewable capacity), etc.

- Deciding what incentives to provide for reducing the emissions of climate-affecting gases:
  - observations include weather data, sea levels, sea temperatures, energy supply levels, recorded material flows, etc.
  - the complex environment includes atmospheric advection, cloud processes and albedo, radiative forcing, oceanic eddies, etc.

- Planning a project with many subgoals (and sub-subgoals, etc.) for which various alternative approaches may be viable:
  - observations include status reports, estimates, milestone completion events, etc.
  - the complex environment includes the changing availability of human resources with varying levels of skills and experience, technical risks and uncertain feasibilities, potential synergies between efforts, etc.

- Deciding which combination of treatments to suggest to a cancer patient:
  - observations include biomarkers, imaging scans, expert assessments, DNA sequencing, proteomics, metabolomics, etc.
  - the complex environment includes the immune system, cellular regulatory networks, the diffusion of blood in and around tumors, etc.

In many of these cases, we need to integrate information from quantitative models across multiple scientific ontologies and scales (in addition to expert opinions where scientific data is sparse) in a computationally tractable way. In the hopes that machine learning may be increasingly able to provide computationally feasible answers to questions that are sufficiently well-defined, our focus here is on being able to make such boundary-transcending questions well-defined, by defining a sufficiently flexible and rigorous meta-ontology that can encompass all necessary ontologies and adapters between them.

The point is not to figure out how to do traditional-style modeling and simulation of complex systems (although that might be a component); the point is for any model (even a highly complex and multidisciplinary one) to have a well-defined semantics to which efficient approximation methods can be rigorously compared. This is in contrast to the common state of current practice, in which the best known models are defined only by their custom and informal implementations (especially when machine learning is involved), making it very difficult to reason about a model’s behavior on inputs with which it has never specifically been run. If your model is more than 100,000 lines of mathematically ill-specified Python code, it is hard to have confidence in what it is saying.

# 3 Litmus tests

In the first section, we expressed our problem somewhat vaguely. To a certain extent, this vagueness reflects how well we currently understand the problem: our technical approaches may fail, in which case these goals are all we would have to fall back on. However, we have some litmus tests that would indicate we are on the path to a “good” solution.

First of all, David Jaz Myers has constructed a “framework of dynamical systems frameworks” (Myers 2022), which he calls “dynamical systems doctrines”, and we expect that a good solution to our problem should fit within one or more dynamical systems doctrines. The formal definition is technical, but informally, a dynamical system doctrine answers the questions

- What is a system?
- What does it mean to compare systems?
- What does it mean to compose systems?

When these questions have been answered in a suitably precise manner, the systems under study form a category theoretic structure that has a variety of nice properties, and can be used to organize the study of the systems at hand.

More technically, for example, a jump-drift-diffusion equation models a system with state space \mathbb{R}^n whose behavior is driven by a combination of

- Jumps: discontinuous movement from one point to another point
- Drift: continuous movement following a vector field
- Diffusion: continuous movement that is Brownian in character

This covers a large class of “stochastic ODEs”. We conjecture that there is a dynamical systems doctrine whose systems include jump-drift-diffusion equations of this kind.
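A heavily discretized sketch of these three ingredients (with arbitrary drift, noise, and jump parameters of our own choosing) might look like the following Euler-Maruyama-style simulator:

```python
import random

def simulate_jdd(x0=0.0, t_end=1.0, dt=0.001, seed=0,
                 drift=lambda x: -x,   # drift: a (mean-reverting) vector field
                 sigma=0.3,            # diffusion: Brownian intensity
                 jump_rate=2.0,        # jumps: Poisson rate of discontinuities
                 jump_size=lambda rng: rng.gauss(0.0, 1.0)):
    """Euler-Maruyama discretization of a scalar jump-drift-diffusion:
    dX = drift(X) dt + sigma dW + (Poisson jumps)."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    while t < t_end:
        x += drift(x) * dt                       # drift: continuous, deterministic
        x += sigma * rng.gauss(0.0, dt ** 0.5)   # diffusion: Brownian increment
        if rng.random() < jump_rate * dt:        # jump: discontinuous move
            x += jump_size(rng)
        t += dt
    return x

samples = [simulate_jdd(seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
```

This kind of ad-hoc discretization is exactly what we would like a compositional semantics to stand behind: the simulator is one of many possible approximations of the “full model”, and we want to be able to say precisely what it approximates.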

Secondly, models are useful to us in as much as they help us make decisions about the world. And in order to choose between different actions, we need to be able to evaluate *counterfactual interventions*. The philosophy of causality is complex, but from a mathematical perspective, we hope to model and subsume popular framings for causality such as Pearl’s do-calculus (Pearl 2009) (see also (Hitchcock 2018), and likely building on the categorical notion of d-separation developed by (Fritz and Klingler 2023)).

This conjecture seems plausible because dynamical systems doctrines have a good notion of control, and “observations and actions” are a central part of control. Moreover, dynamical systems are “intensional” systems: because their state spaces are decomposable, they have a native notion of interventional query (one can factor the state space, force one factor to a certain value, and resume time-evolution).
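The parenthetical recipe (factor the state space, force one factor to a value, resume time-evolution) can be sketched on a toy two-component discrete-time system; all dynamics here are our own invention:

```python
def step(state):
    """Toy dynamics on a factored state space X = A x B:
    the A-component feeds into B, and both decay."""
    a, b = state
    return (0.5 * a, 0.5 * b + a)

def run(state, n):
    """Ordinary (observational) time-evolution for n steps."""
    for _ in range(n):
        state = step(state)
    return state

def run_do(state, n, a_value):
    """Interventional run: after each step, force the A-factor to a_value
    ('do a := a_value'), then let time-evolution resume."""
    for _ in range(n):
        state = step(state)
        state = (a_value, state[1])
    return state

observational = run((1.0, 0.0), 10)
interventional = run_do((1.0, 0.0), 10, a_value=2.0)
```

Holding the A-factor at 2.0 drives the B-component far above its observational value, which is the intended sense in which decomposable state spaces give a native notion of interventional query.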

This is more speculative, but it captures the direction we are aiming for: constructing a “big tent” in which all such models can be made “compatible”.

# 5 Conclusion

What we have outlined in this post is not just a series of conjectures or a simple project outline, but a full research program, which could supply decades of work across a large community of mathematicians, scientists, and engineers. However, that is not to say that we will only see benefits from this research program after decades of work. We expect that progress can be made in many ways along these lines, but more importantly we see that progress *is already* being made, throughout both the applied category theory and alignment communities, as evidenced by our bibliography.

Finally, there are plenty of mathematical problems that need to be solved in order to lay the foundations that we need for this program. But the larger challenge is to integrate all of the progress being made, not into a seamless edifice, but into a cohesive patchwork whole, where each abstract concept learned is exploited in as many ways as possible across all domains. This efficient use of concepts is demanded by the twin requirements that it is possible for humans to use the fruits of this program, and that this program has a truly *global* scope. If you have not yet been convinced of the use of category theory, then hear this: Category theory is the only conceptual technology and intellectual tradition of sufficient maturity and breadth to have a chance of solving this larger challenge.

# 6 Appendix: Some initial threads of research

The general hope here is that by suitably clever foundations, we can relegate some of the tricky analysis necessary in dealing with large state spaces into some (large) constant amount of work, which will then allow us to work synthetically when we go to actually model systems. However, this constant amount of work still has to be done; we need to find “well-behaved” categories to have semantics in. In this section, we talk about some approaches we hope would lead to such categories.

## 6.1 Von Neumann algebra approach

One of the original applications of category theory was algebraic geometry, where category theory allowed the duality between algebra and geometry to be written down explicitly: the category of affine varieties (spaces described by the zeros of polynomials) is identified with the opposite of a suitable category of commutative rings. Since then, many other duality results have been discovered, as tabulated at the nLab (nLab authors 2023). We hope to use similar ideas to get a handle on some of the challenges of stochastic dynamical systems in category theory, using functional analysis.

The direct connection between stochastic dynamical systems and functional analysis is that jump-drift-diffusion systems are often modeled by Markov semigroups acting on function spaces (Applebaum 2019). In this setting, the Hille-Yosida theorem tells us how to write down the “derivative” of such a system as an unbounded operator. With an eye towards formalizing this, we have a couple conjectures related to this setup.
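In the finite-state case this picture is elementary and can serve as a sanity check: the generator is just a rate matrix Q, the semigroup is T_t = e^{tQ}, and the compatibility T_s T_t = T_{s+t} from the first section can be verified directly. A minimal pure-Python sketch, with an arbitrary rate matrix of our choosing:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(Q, t, terms=60):
    """Compute exp(t*Q) by truncated power series; adequate for the
    small matrices and times used here."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        scaled = [[t * q / k for q in row] for row in Q]
        term = mat_mul(term, scaled)   # term is now (t*Q)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Rate matrix (the 'unbounded operator', here very bounded) of a
# two-state continuous-time Markov chain: rows sum to zero.
Q = [[-1.0, 1.0],
     [2.0, -2.0]]

s, t = 0.3, 0.5
lhs = mat_mul(mat_exp(Q, s), mat_exp(Q, t))  # T_s ; T_t
rhs = mat_exp(Q, s + t)                       # T_{s+t}
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-9 for i in range(2) for j in range(2))
```

The analytic subtlety in the infinite-dimensional setting is precisely that the generator is unbounded, so this power series is no longer available; Hille-Yosida is what replaces it.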

Commutative von Neumann algebras are well-studied objects within functional analysis, known to have deep connections to probability theory. Surprisingly, it seems that although the basic theory of von Neumann algebras and the analogies to probability theory have been known for a long time, a formal proof that the category of commutative von Neumann algebras is actually dual to some category of measurable spaces was not available until quite recently (Pavlov 2022) (specifically, the category of compact strictly localizable enhanced measurable spaces). We conjecture that there should be a generalization of the theorem proved in that paper.

This can be thought of as a category-theoretic version of the Riesz-Markov-Kakutani theorem, which states that for a compact Hausdorff space X, unital positive linear maps from L^\infty(X) to \mathbb{C} correspond to probability measures on X. More generally, unital positive linear maps from L^\infty(X) to L^\infty(Y) correspond to Markov kernels from Y to X.

We further conjecture that one of the tensor products on von Neumann algebras (there are several) gives a symmetric monoidal structure on \mathsf{PCVNA}^\mathrm{op} equivalent to the one given by the cartesian product on the Kleisli category of the Giry monad.

Then we have a Markov category structure on \mathsf{PCVNA}^\mathrm{op}, because each von Neumann algebra has a multiplication A \otimes A \to A, which behaves like the diagonal in the dual category. A map A \to B is deterministic if and only if it preserves multiplication, which corresponds to the result that a homomorphism of von Neumann algebras L^\infty(X) \to L^\infty(Y) is equivalent to a measurable map Y \to X.
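In the finite case this determinism criterion can be checked by hand: a Markov kernel is a row-stochastic matrix, its dual map sends f \in L^\infty(X) to Kf \in L^\infty(Y), and the dual map preserves pointwise multiplication exactly when every row is a point mass. A small sketch, with data invented by us:

```python
def dual_map(kernel):
    """Send f in L^inf(X) (a list of values over a finite X) to Kf in
    L^inf(Y): (Kf)(y) = sum_x kernel[y][x] * f(x).
    This map is positive and unital by construction."""
    def K(f):
        return [sum(k * fx for k, fx in zip(row, f)) for row in kernel]
    return K

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
fg = [a * b for a, b in zip(f, g)]  # pointwise product in the algebra

# Deterministic kernel: each row is a point mass (i.e. a measurable map Y -> X).
det = [[0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
K = dual_map(det)
assert K(fg) == [a * b for a, b in zip(K(f), K(g))]  # multiplicative

# Genuinely noisy kernel: its dual map fails to preserve multiplication.
noisy = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5]]
K2 = dual_map(noisy)
assert K2(fg) != [a * b for a, b in zip(K2(f), K2(g))]
```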

The nice thing about this result is that we did not have to “add” anything to the von Neumann algebras; we just had to relax the notion of morphism to not preserve multiplication.

Finally, we conjecture that we can again relax the notion of morphism, to be only convex instead of linear, and then recover a category similar to the Kleisli category of an infradistribution monad (see (Diffractor and Kosoy n.d.) for an introduction to the idea of infradistributions, and (Topos Institute 2023) for the relation to convex maps C(X) \to \mathbb{R}_{\geq 0}).
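A finite stand-in for this picture (not the actual infradistribution monad, just an illustration with data of our own choosing) represents an imprecise belief as a finite set of candidate distributions; an observable then gets an interval of expectations rather than a single number:

```python
def expectation(dist, f):
    """Expectation of observable f (a list of values) under a finite
    distribution dist (a list of probabilities)."""
    return sum(p * fx for p, fx in zip(dist, f))

def expectation_interval(credal_set, f):
    """Under an imprecise belief, given here as a finite set of candidate
    distributions (a stand-in for a convex subset of distribution space),
    an observable only gets an interval of expectations."""
    values = [expectation(d, f) for d in credal_set]
    return min(values), max(values)

# Two candidate models of an opponent choosing among three moves;
# the numbers are invented for illustration.
credal = [
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
]
payoff = [1.0, 0.0, -1.0]
lo, hi = expectation_interval(credal, payoff)  # an interval, not a point
```

The map taking an observable to its lower (or upper) expectation is convex rather than linear, which is the relaxation of morphisms the conjecture refers to.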

Apart from the technically interesting merits of this thread of research, this also leads us to an “observables-first” viewpoint on dynamical systems, which we explore more in the next thread.

## 6.2 Feynman–Kac approach

Functional analysis has some nice results that lend themselves to category theory, but the theory of unbounded operators is both difficult and subtle. Fortunately, we are typically *actually* working with probabilistic observables on some smooth space, which gives us some handles to make our problem more well-behaved. Stochastic processes on manifolds have associated Feynman-Kac PDEs (Owen learned this from (Spreij 2023); the original theory goes back to (Kac 1949)). Once we are in PDE land, we have an opportunity to use synthetic differential geometry, which is known to play nicely with category theory (Lavendhomme 1996).

More concretely, a well-behaved way of dealing with PDEs on manifolds is via the theory of *derivations* on C^\infty rings. Roughly speaking, C^\infty rings are an axiomatization of the properties that the ring of smooth functions X \to \mathbb{R} has for a manifold X.

If R = C^\infty(X) for X a manifold, then first-order derivations on R are in bijective correspondence with vector fields on X. We can then compose first-order derivations as operators and get higher-order spatial derivatives. This allows us to do the second-order differential equations necessary for the evolution of probability distributions in an “algebraic” way.
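Concretely (in our notation, for a drift-diffusion process with drift coefficients b^i and diffusion coefficients a^{ij}), the generator is assembled by composing first-order derivations \partial_i = \partial/\partial x^i and multiplying by elements of R:

L = \sum_i b^i \, \partial_i + \tfrac{1}{2} \sum_{i,j} a^{ij} \, \partial_i \partial_j

The associated backward Feynman-Kac equation for an observable u is then \partial_t u = L u. Every ingredient on the right-hand side is an operator built from derivations on R, which is what makes the “algebraic” treatment possible.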

The disadvantage is that this is not so “natively” probabilistic, and thus might not play well with, for instance, Bayesian inference (or infra-Bayesian inference). Additionally, the classical theory of Feynman-Kac PDEs does not cover jump processes; there may be an extension to the jump case, but we are not aware of one.

In both this approach and the previous approach, there have been interesting insights made by moving to the “algebra of observables” point of view. This point of view is also desirable because symbolic computational systems also benefit from taking an “algebraic” as opposed to a geometric point of view, and as we hope to eventually implement these systems, the fact that the theory meshes well with computational practice is important.

## 6.3 Epistemically generalized behavioral approach

The **behavioral approach to systems theory** (Willems and Polderman 1997) models a system with observable states X as a subset of trajectory space, i.e. a map C(\mathbb{R}^+,X)\rightarrow \mathbf{2}. This reflects a “nondeterministic” approach to uncertainty: a subset is an epistemic state which picks out certain trajectories as possible. Given a way of relating two interfaces X \xrightarrow{f_X} R \xleftarrow{f_Y} Y, one can compose systems by pullback.
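In a finite toy case (all data below is ours), a behavior is literally a set of possible trajectories, and composition by pullback keeps exactly the pairs of trajectories that agree on the shared interface R:

```python
from itertools import product

# Toy behavioral systems: a 'behavior' is a set of possible trajectories,
# here length-3 sequences over a finite state space.
sys_x = {(0, 1, 2), (0, 0, 0), (1, 2, 0)}   # possible trajectories in X
sys_y = {('a', 'b', 'c'), ('a', 'a', 'a')}  # possible trajectories in Y

# Interface maps into a shared resource R: expose a coarse-grained pattern.
f_x = lambda tr: tuple(s % 2 for s in tr)                  # X -> R
f_y = lambda tr: tuple(0 if c == 'a' else 1 for c in tr)   # Y -> R

def compose_by_pullback(bx, by, fx, fy):
    """The composite behavior: pairs of trajectories whose images in R
    agree -- composing epistemic states by intersection over the interface."""
    return {(tx, ty) for tx, ty in product(bx, by) if fx(tx) == fy(ty)}

combined = compose_by_pullback(sys_x, sys_y, f_x, f_y)
```

Here only one pair survives the pullback, so the composite system considers exactly one joint trajectory possible.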

Probabilistic theories do not have such a simple composition, because of the “disintegration” difficulties around normalization after conditioning on equality.

## 6.4 Generalized spaces for probability

There is a general approach to building nice categories of spaces. What you do is you start with the nice examples of the type of space you care about. For instance, if you want smooth spaces, these are open subsets of \mathbb{R}^n. If you want spaces locally described by polynomial equations, these are affine varieties. If you want spaces suitable for probability theory, these are standard Borel spaces. Then the category of your “nice spaces” with suitable maps may not be nice, in that it might not have all the colimits you want. In order to “complete it”, you put a suitable Grothendieck topology on it, and then consider the corresponding category of sheaves, or possibly the subcategory of concrete sheaves.

This is the approach taken for diffeological spaces, schemes, and quasi-Borel spaces.

We have been thinking about using this approach to make a category of spaces that are suitable for both stochastic behavior and continuous behavior. More technically, a category of spaces with both a tangent structure and a Giry monad.

One approach is to work with spaces modeled after convex subsets of \mathbb{R}^n, which we call convological spaces in analogy to diffeological spaces. Convex spaces have a natural notion of “tangent cone”, which models the fact that convex spaces often have boundaries, so you might only be able to go in one direction from a point and not in the reverse of that direction. Moreover, convex spaces are deeply connected to probability theory, because the space of probability distributions on a measurable space has natural convex structure. However, more work is needed to understand exactly what the properties and use of these convological spaces are.

Another approach, which is not due to us, is to work with what are called PAP spaces, which are modeled after subsets of \mathbb{R}^n that are the support of analytic functions (Huot et al. 2023). These have been used to provide semantics for probabilistic programs which are also differentiable, which is needed to run inference algorithms like Hamiltonian Monte Carlo on them.

Building nice categories of spaces via sheaves is such a general technique that even if PAP spaces or convological spaces turn out to not be useful for what we want to do, it is likely that some category of spaces defined in this way will show up at some point.

## References

Applebaum, David. 2019. *Semigroups of Linear Operators: With Applications to Analysis, Probability and Physics*. London Mathematical Society Student Texts. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108672641.

Fritz, Tobias, and Andreas Klingler. 2023. “The d-Separation Criterion in Categorical Probability.” *Journal of Machine Learning Research* 24 (46): 1–49. http://jmlr.org/papers/v24/22-0916.html.

Kac, M. 1949. “On Distributions of Certain Wiener Functionals.” *Transactions of the American Mathematical Society* 65 (1): 1–13. https://doi.org/10.1090/S0002-9947-1949-0027960-X.

*Algebraic Methodology and Software Technology*, 307–21. Springer. https://doi.org/10.1007/bfb0000479.

Lavendhomme, René. 1996. *Basic Concepts of Synthetic Differential Geometry*. Vol. 13. Kluwer Texts in the Mathematical Sciences. Boston, MA: Springer US. https://doi.org/10.1007/978-1-4757-4588-7.

*Software and Systems Modeling*22 (2): 495–520. https://doi.org/10.1007/s10270-023-01092-7.

Myers, David Jaz. 2022. *Categorical Systems Theory*. https://github.com/DavidJaz/DynamicalSystemsBook.

Pavlov, Dmitri. 2022. “Gelfand-Type Duality for Commutative von Neumann Algebras.” *Journal of Pure and Applied Algebra* 226 (4): 106884. https://doi.org/10.1016/j.jpaa.2021.106884.

Pearl, Judea. 2009. *Causality*. Cambridge University Press. https://doi.org/10.1017/cbo9780511803161.

Schultz, Patrick, and David I. Spivak. 2019. *Temporal Type Theory: A Topos-Theoretic Approach to Systems and Behavior*. Progress in Computer Science and Applied Logic. Springer International Publishing. https://books.google.co.uk/books?id=bs6FDwAAQBAJ.

Spreij, Peter. 2023. *Stochastic Integration*. Lecture notes, Universiteit van Amsterdam. https://staff.fnwi.uva.nl/p.j.c.spreij/onderwijs/master/si.pdf.

Topos Institute. 2023. *Nisan Stiennon: Metagames and Imprecise Probability*. https://www.youtube.com/watch?v=2HF9wJ88shM.

Willems, Jan C., and Jan Willem Polderman. 1997. *Introduction to Mathematical Systems Theory: A Behavioral Approach*. Vol. 26. Springer Science & Business Media.