Introducing CatColab

technology

categorical logic

double categories

Topos

Author

Kevin Carlson

Published

2024-10-02

Today we’re excited to announce the first pre-alpha release of our new software CatColab 0.1: Hummingbird. CatColab is software for making models of the world together. We aim to build a tool that can guide your thinking across different domains from unstructured notes all the way to an elaborate model composed of contributions from a large team and quantitative enough to allow for numerical simulations.

Your model is a formal object—it can be visualized, but as an emergent view on its more fundamental, mathematical semantics. You build your model in your preferred domain-specific logic, which you can think of as a kind of programming language, the first few of which are available for experimentation today. Depending on the logic, there are various well-specified analyses that let you compute things about your model, such as finding potentially powerful feedback loops in a social network or simulating the development of an ecosystem over time.

Soon, we’re going to implement capabilities to migrate models among various logics, so that you can communicate with your friends and collaborators without everyone being forced to settle on a single language. CatColab aims to be your universal translator for expressing precise structural thoughts about our world.

1 Why are we building this?

Here at Topos, one of the main things we do is put our and our friends’ mathematical ideas into software, which so far has mainly meant the AlgebraicJulia scientific computing ecosystem. Roughly speaking, the vision of the AlgebraicJulia project has been to write libraries implementing applied category theory papers relevant to scientific computing and make these tools available to working scientists. AlgebraicJulia continues to be the basis of various ongoing projects; our differential equations ecosystem and our agent-based modeling system are two of the most happening parts of the project at the moment.

On the other hand, we’ve become convinced more recently that:

Most AlgebraicJulia libraries require learning quite a bit of the category-theoretic background to use (along with Julia and general scientific computing!)
This learning curve keeps the audience for our software substantially smaller than it could be.

In particular, one can imagine a world in which people built more approachable apps on top of AlgebraicJulia, apps that a third circle of users could engage with without either category-theoretic or programming expertise. With the highly honorable exception of ModelCollab, though, we’ve found that the kind of people who might build such apps are largely excluded by the very learning curve they would be helping others to circumvent.

So, we’ve decided to bite the bullet and build out the first approachable, useful, flexible killer app of category theory ourselves.

2 What can it do right now?

CatColab Hummingbird includes five logics for modeling:

Ologs
Schemas
Regulatory networks
Causal loop diagrams
Stock-and-flow diagrams.

Here’s an example of Hummingbird in use building an olog of its own currently available logics.

As seen above, you click to instantiate a new notebook cell, which could be of various types depending on your current logic, then fill in the boxes, making sure that the domain and codomain of arrow-type boxes match up with objects (e.g., species, entities, or types, depending on the logic) you’ve already declared. After clicking the button in the upper right (next to the help button), you’ll find the Analysis tab, where you can visualize your model and, for some logics, run more complex analyses.

Ologs (ontology logs) are well-known to applied category theorists from Spivak and Kent’s 2011 paper on the subject; they can represent a class of things with relationships pointing from one thing to another, which at Topos we’re likely to think of as a category.

Schemas are a formulation of database schemas that upgrade ologs to differentiate database tables from their columns. Schemas are the basis of acsets (attributed C-sets), the data structure that forms the foundation of AlgebraicJulia, developed by Evan and Owen in collaboration with James Fairbanks at Florida.

Regulatory networks are used in molecular biology to explain how various molecules interact to decide which genes will be expressed by bits of RNA transcribed from our DNA. Evan and various collaborators have written about their category theory here. They’re basically graphs with some of the edges flavored positive, or “promoting”, and some negative, or “inhibitory”, though see the math section below for an explanation of how they’re more than just graphs to CatColab.

Causal loop diagrams come from a completely different field, system dynamics, and can be used to study, for instance, the qualitative outcomes of a carbon credit system (see below). One reason to have them in Hummingbird is that, mathematically, they’re no different from regulatory networks! The backend of the software sees no difference between these logics, but we’re able to have the front-end present them differently. This shows how spending time developing the mathematical underpinnings of this kind of software becomes tangibly useful: each bit of theoretical work can pay off quite disproportionately in simplifying the implementation later on.

A causal loop diagram modeling various effects of a cap-and-trade system

Finally, stock-and-flow diagrams are the acme of system dynamics modeling, a rich logic for epidemiological and population modeling, among many other areas of application. They posit a collection of “stocks”, such as populations of workers in various geographic locales, and “flows”, such as workers moving to neighboring areas, together with links that describe how the level of certain stocks can influence the rate of certain flows. Sophie and Evan, along with collaborators John Baez, Xiaoyan Li, and Nate Osgood, have also written about stock-and-flow diagrams here.

A stock-and-flow SEIR model including ICU and vaccination populations
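
To make the stocks, flows, and links vocabulary concrete, here is a minimal Python sketch of a stock-and-flow model with a crude Euler simulation. It is not how CatColab represents or simulates these diagrams: the names `Flow`, `Link`, `flow_rates`, and `simulate`, the rate constants, and the mass-action rule (a flow’s rate is its constant times the product of the levels of the stocks linked to it) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    source: str   # stock the flow drains
    target: str   # stock the flow fills
    rate: float   # hypothetical rate constant


@dataclass(frozen=True)
class Link:
    stock: str    # the stock whose level influences...
    flow: str     # ...the rate of this flow


def flow_rates(levels, flows, links):
    """Assumed mass-action rule: constant times the product of linked stock levels."""
    rates = {}
    for f in flows:
        r = f.rate
        for link in links:
            if link.flow == f.name:
                r *= levels[link.stock]
        rates[f.name] = r
    return rates


def euler_step(levels, flows, links, dt):
    """Move dt * rate out of each flow's source stock and into its target stock."""
    rates = flow_rates(levels, flows, links)
    new_levels = dict(levels)
    for f in flows:
        new_levels[f.source] -= dt * rates[f.name]
        new_levels[f.target] += dt * rates[f.name]
    return new_levels


def simulate(levels, flows, links, dt, steps):
    for _ in range(steps):
        levels = euler_step(levels, flows, links, dt)
    return levels


# A toy susceptible/infected/recovered model (simpler than the SEIR figure).
stocks = {"S": 0.99, "I": 0.01, "R": 0.0}
flows = [Flow("infection", "S", "I", 0.5), Flow("recovery", "I", "R", 0.1)]
links = [Link("S", "infection"), Link("I", "infection"), Link("I", "recovery")]

final = simulate(stocks, flows, links, dt=0.1, steps=1000)
```

Because every flow removes from one stock exactly what it adds to another, the total across all stocks stays constant during the simulation, which is a handy sanity check on any model of this shape.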

Today, you can build models in these five logics, including collaboratively in real-time by simply sharing the URL of your model (which will end in a complicated secret hash) with anybody you want to collaborate with. By opening the analysis sidebar, you can add a visualization of your model in any logic. In causal loop diagrams, you can get a taste of further possibilities for more semantically rich analyses, including searching for feedback loops and also simulating the evolution of a system with a population at each node of the network, flowing around according to parameters you choose for the strength of each edge.
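
As a rough idea of what searching for feedback loops involves, here is a small Python sketch that enumerates the simple cycles of a signed graph and labels each one by the product of its edge signs: positive for a reinforcing loop, negative for a balancing one. This is not CatColab’s implementation; the function `feedback_loops` and the example nodes and edges are hypothetical.

```python
def feedback_loops(edges):
    """Enumerate simple cycles in a signed graph.

    `edges` is a list of (source, target, sign) triples with sign +1 or -1.
    Each cycle is reported once, starting from its alphabetically first node,
    together with "reinforcing" (sign product +1) or "balancing" (-1).
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    order = {n: i for i, n in enumerate(nodes)}
    out = {}
    for s, t, sign in edges:
        out.setdefault(s, []).append((t, sign))

    loops = []

    def extend(start, node, path, sign, visited):
        for nxt, edge_sign in out.get(node, []):
            if nxt == start:
                kind = "reinforcing" if sign * edge_sign > 0 else "balancing"
                loops.append((path + [start], kind))
            elif order[nxt] > order[start] and nxt not in visited:
                # Only walk through later nodes, so each cycle is found once.
                extend(start, nxt, path + [nxt], sign * edge_sign, visited | {nxt})

    for start in nodes:
        extend(start, start, [start], 1, {start})
    return loops


# A made-up causal loop diagram, loosely in the spirit of a cap-and-trade model.
edges = [
    ("emissions", "permit price", +1),
    ("permit price", "clean investment", +1),
    ("clean investment", "emissions", -1),
    ("clean investment", "clean tech cost", -1),
    ("clean tech cost", "clean investment", -1),
]

for path, kind in feedback_loops(edges):
    print(" -> ".join(path), "is", kind)
```

The exhaustive search is fine for diagrams of the size a person draws by hand; a larger network would call for a more careful cycle-enumeration algorithm.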

3 What will it be able to do in the future?

Lots! First of all, while we encourage you to play around with Hummingbird right now, we’re quite aware that it’s still missing some key quality-of-life features. For instance, models are being saved to a database, but we haven’t given you much ability to interact with that database yet, except by remembering your model’s URL; you’ll soon be able to navigate a library of pre-built models as well as your own saved models, duplicate and modify from those starting points, and import pre-built models as pieces of the bigger one you’re working on. This lack of permissioning structure on the database backend is the only reason we aren’t linking to live examples in this blog post; we don’t yet have a way to send you an example that isn’t editable, so another reader might blow it up before you got to it!

Related to importing models, we will soon have capabilities to compose models in various ways: you can imagine gluing together overlapping models of interacting subsystems to get a super-model, or replacing a node of a model with a mini-model refining that node. (Think of replacing the step “make lemon meringue pie” in your plan for preparing a dinner with the actual recipe.)

Right now, CatColab essentially has two levels: you choose a logic, and then build a model in that logic. A key step forward in functionality will be adding the third level of instances of models. An instance of a model, roughly speaking, assigns an actual set of data to each element of the model, sets related according to the various paths between elements. In the case of schemas, such instances are precisely the acsets we mentioned above, so that databases are a special case of instances!
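
For readers who think in code, here is a minimal Python sketch of an instance of a schema-like model: one set per object, one function per arrow, with paths followed by composing functions. The schema, the data, and the helper names `is_valid_instance` and `follow_path` are invented for illustration, and the sketch ignores attributes and any equations between paths.

```python
# A hypothetical model: two objects and two arrows between them.
schema = {
    "objects": ["Employee", "Department"],
    "arrows": {
        "works_in": ("Employee", "Department"),
        "managed_by": ("Department", "Employee"),
    },
}

# An instance: a set of data for each object and a function for each arrow.
instance = {
    "sets": {
        "Employee": {"alice", "bob", "carol"},
        "Department": {"math", "software"},
    },
    "functions": {
        "works_in": {"alice": "math", "bob": "math", "carol": "software"},
        "managed_by": {"math": "bob", "software": "carol"},
    },
}


def is_valid_instance(schema, instance):
    """Check that every arrow is a total function between the right sets."""
    sets, functions = instance["sets"], instance["functions"]
    if set(sets) != set(schema["objects"]):
        return False
    if set(functions) != set(schema["arrows"]):
        return False
    for name, (dom, cod) in schema["arrows"].items():
        f = functions[name]
        if set(f) != sets[dom]:               # defined on exactly the domain
            return False
        if not set(f.values()) <= sets[cod]:  # lands inside the codomain
            return False
    return True


def follow_path(instance, path, element):
    """Follow a path of arrows from an element by composing the functions."""
    for arrow in path:
        element = instance["functions"][arrow][element]
    return element


print(is_valid_instance(schema, instance))                          # True
print(follow_path(instance, ["works_in", "managed_by"], "alice"))   # bob
```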

3.1 Translations

Perhaps an even more fundamental next step to CatColab’s vision will be the ability to move between logics: migration or translation. We are not claiming that you’ll be able to translate faithfully between different logics; people who speak in different languages have different things they’re able to say and understand. But the mathematical formalism behind CatColab does permit migrations that capture certain pieces of a model in one language in another language. For a very simple concrete example, imagine a signed stock-flow diagram, where each flow is marked with + or - depending on whether increasing the stock at its source tends to increase or decrease, respectively, the flow’s rate. (We haven’t implemented this logic yet but it’s a simple modification of the basic stock-flow.) A signed stock-flow diagram has an underlying causal loop diagram given essentially by throwing away the links, and in the opposite direction there are two universal ways to upgrade a causal loop diagram to a stock-flow diagram, either by linking every flow to every stock or by adding no links at all.
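
Here is one naive reading of that example as a Python sketch, assuming stocks become nodes and signed flows become signed edges. The dictionary layout and the function names `underlying_cld` and `upgrade` are hypothetical, and nothing here is meant to capture the universal properties that make the two upgrades canonical; it only shows the shape of the data moving in each direction.

```python
def underlying_cld(stock_flow):
    """Forget the links: stocks become nodes and signed flows become signed edges."""
    return {
        "nodes": set(stock_flow["stocks"]),
        "edges": [(f["source"], f["target"], f["sign"]) for f in stock_flow["flows"]],
    }


def upgrade(cld, link_everything):
    """Turn a causal loop diagram back into a signed stock-flow diagram.

    With link_everything=True every stock is linked to every flow;
    with link_everything=False no links are added at all.
    """
    flows = [
        {"name": f"flow{i}", "source": s, "target": t, "sign": sign}
        for i, (s, t, sign) in enumerate(cld["edges"])
    ]
    if link_everything:
        links = [(stock, f["name"]) for f in flows for stock in sorted(cld["nodes"])]
    else:
        links = []
    return {"stocks": set(cld["nodes"]), "flows": flows, "links": links}


# A toy causal loop diagram with made-up node names.
cld = {
    "nodes": {"prey", "predator"},
    "edges": [("prey", "predator", +1), ("predator", "prey", -1)],
}

sparse = upgrade(cld, link_everything=False)
dense = upgrade(cld, link_everything=True)

# Either upgrade followed by forgetting the links recovers the original diagram.
print(underlying_cld(sparse) == cld, underlying_cld(dense) == cld)
```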

These kinds of operations are related to the functorial data migrations introduced by David, though mathematically they live one categorical level farther up. With a rich enough network of translations, we can treat a whole family of logics as if it formed one big happy super-logic, in which various people can use the pieces they prefer. Another use case for translations that I’m particularly excited about is letting you upgrade the complexity of your logic as your model gets bigger and richer. Here’s a just-so story about how you might use these for a basic daily activity. The character first uses migrations to upgrade the model complexity, and only upon wanting to collaborate goes to the deeper level of modifying the logic:

Kevin wants a to-do list app, but without too much bloat. He starts with a plain list of to-dos and check-boxes; this is an instance of a simple schema:

A schema describing a basic to-do list as a set of items with a true/false “done” attribute and a string “name” attribute.

After using this simple app for a while, Kevin finds several features missing; he knows these upgrades are relevant to him, because he’s the user for this app! He migrates his to-do list to the following richer schema, a one-click change once the schema is written:

A more complicated to-do list schema with fields for due dates and urgencies.
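
In everyday programming terms, the migration between these two schemas might look like the following Python sketch. The class names, the field names `due` and `urgency`, and the choice to leave new fields empty are assumptions for illustration; in CatColab the migration would be derived from a mapping between the schemas rather than written by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BasicItem:
    name: str
    done: bool


@dataclass
class RichItem:
    name: str
    done: bool
    due: Optional[date] = None       # hypothetical new field
    urgency: Optional[int] = None    # hypothetical new field


def migrate(items):
    """Carry each basic item into the richer schema, leaving new fields empty."""
    return [RichItem(name=item.name, done=item.done) for item in items]


def forget(items):
    """Go back the other way by dropping the extra fields."""
    return [BasicItem(name=item.name, done=item.done) for item in items]


todo = [BasicItem("write blog post", True), BasicItem("water plants", False)]
rich = migrate(todo)
rich[1].due = date(2024, 10, 4)      # Kevin fills in the new fields later
rich[1].urgency = 2

# Nothing from the original list was lost along the way.
print(forget(rich) == todo)
```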

Later still, Kevin and Kris are working on a collaboration and want to share some to-dos. Kris also has a CatColab-based to-do list app of his own, with a similar but distinct schema. There is no need to refine the two schemas to match up, adding features each ‘K’ might not want for himself in the process!

Instead, they migrate their apps to a new logic, the logic whose models are two schemas with a mapping (formally, a functor or even a profunctor) between them in each direction. So Kevin can interact with Kris’s to-do list in Kevin’s language, for instance by fusing Kris’s two different kinds of items together, throwing away or adding new fields, or performing more complicated database queries to select exactly the info he needs from Kris’s list.

Finally, we’re going to need a lot more logics! Imagine logics for any kind of graph-like or category-like data structure you can think of. Coming up next are Petri nets, monoidal categories and multicategories (and their instances!), graphs with more complicated coloring structure than just signed edges, or even with variable coloring structure, and data structures with a temporal evolution built in. Pending ongoing research, we hope before long to be able to produce a logic whose models include things like monoidal closed categories, at which point we’re really specifying what a PL researcher would recognize as a full programming language. In fact, there is a logic for logics, and you had better believe we’re going to implement it, so that the user can customize their own environment even without understanding the math.

4 What’s up with the math?

Evan, in collaboration with former Topos summer research associate Michael Lambert, has been working really hard on double category theory over the last couple of years, following on from a relatively recent growth of the field by a group centered around Robert Paré. In an unusually crisp case of software growing out of fresh basic research, CatColab was actually unimaginable until these papers came out. A logic, as I’ve been vaguely referring to them, is nothing more than a double theory in an appropriate doctrine in Patterson and Lambert’s terms. So a simple logic is a double category, a tabulator logic is a double category with tabulators, a cartesian logic is a cartesian double theory, and so on.

In this sense, doctrine is a zeroth level of the CatColab design, beneath even “logic”, but we don’t expect to expose doctrines to the user any time soon. So far we have simple logics and one tabulator logic: the links in stock-flow diagrams, pointing between an “object” (stock) and an “arrow” (flow), use tabulators. Cartesian logics are the next big doctrinal step, since they’ll allow for arrows with multiple inputs, critical for so many applications. We currently expect to use an appropriate doctrine of compact closed double categories to get our hands on things like cartesian closed categories as models; that’s still very much ongoing research.

A model, mathematically, is a lax double functor to the double category of sets, functions, and spans. These gadgets are studied very thoroughly in the papers cited above. Such a model, because of how lax functors work, actually always contains a category for every object in the theory, not just a set; the models always form a double category of their own.

The tight arrows in this double category are the simplest kind of morphism of models, but if you aren’t clued in on double category theory, you’ll likely guess they’re simpler than they actually turn out to be! An olog, for instance, looks like just a graph, but the morphisms when ologs are seen as models of the trivial double theory are morphisms between the corresponding free categories. This is how we can do motif-finding for causal loop diagrams: a reinforcing loop of any finite length is a homomorphism from the single CLD shown below.

A loop from A to B and back, with both links marked positive
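
To see why morphisms between free categories matter here, the following Python sketch checks a candidate morphism out of the two-node motif by hand. The assumption, which is my informal reading rather than the precise definition, is that each motif edge may be sent to a whole path in the target diagram, as long as the endpoints match and the product of signs along the path equals the sign of the motif edge. The names `path_sign` and `is_morphism` and the example diagram are hypothetical.

```python
def path_sign(cld, path, start, end):
    """Return the sign of `path` if it runs from `start` to `end`, else None.

    `cld` maps edge names to (source, target, sign); `path` is a list of
    edge names. Signs multiply along the path.
    """
    here, sign = start, 1
    for name in path:
        source, target, edge_sign = cld[name]
        if source != here:
            return None          # the edges do not chain together
        here, sign = target, sign * edge_sign
    return sign if here == end else None


def is_morphism(motif, cld, node_map, edge_map):
    """Check that every motif edge lands on a path with matching ends and sign."""
    for name, (source, target, sign) in motif.items():
        image = path_sign(cld, edge_map[name], node_map[source], node_map[target])
        if image != sign:
            return False
    return True


# The motif: a loop from A to B and back, with both links positive.
motif = {"ab": ("A", "B", +1), "ba": ("B", "A", +1)}

# A made-up diagram with a loop of length three containing two negative links.
cld = {
    "xy": ("x", "y", -1),
    "yz": ("y", "z", -1),
    "zx": ("z", "x", +1),
}

# Send A to x and B to z: the edge "ab" goes to the two-step path x -> y -> z,
# whose sign is (-1) * (-1) = +1, and "ba" goes to the single edge z -> x.
good = is_morphism(motif, cld, {"A": "x", "B": "z"}, {"ab": ["xy", "yz"], "ba": ["zx"]})

# Send B to y instead: the path x -> y has sign -1, so this is not a morphism.
bad = is_morphism(motif, cld, {"A": "x", "B": "y"}, {"ab": ["xy"], "ba": ["yz", "zx"]})

print(good, bad)
```

The length-three loop is reinforcing even though no single pair of its nodes is joined by two positive edges, which is exactly what a plain graph homomorphism from the motif would miss.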

This phenomenon basically arises because a category is a lax double functor from the point to the double category of spans, analogously to the more familiar phenomenon that a monad in a bicategory is a lax 2-functor from the point. This ability to get maps of objects with compositional structure from such a short axiomatization (I promise, it’s really quite efficient!) is at the very heart of the power of this approach.

The loose arrows in the double category of models of a theory, called bimodules between models, are certain families of profunctors between those categories arising as values of the models the bimodule is between. The instances mentioned above are a special kind of bimodule that hasn’t been written up yet. Categories of instances of models of simple double theories are presheaf categories; categories of instances of models of cartesian double theories are algebraic categories; and we hope this charming pattern continues. If you’re curious there will soon be a provisional definition in the developer documentation.

Any morphism of double theories will give a pullback functor between double categories of models, the simplest case of a translation. We expect such pullbacks to have left adjoints in more or less all cases and sometimes right adjoints, and those adjoints will give interesting translations analogous to the more familiar Σ- and Π-migrations of categorical database theory. There may be a more modern approach to such migrations involving double profunctors, but that remains to be seen.

5 How about the software architecture?

5.1 Design

The first thing you’ll notice when interacting with CatColab is the notebook-style interface. You build a model by defining its elements, each in a different notebook cell. The visualization, if you want one, happens later in the analysis tab, and you can’t interact with the model directly via the visualization. The interface is thus somewhere between a text editor and a graphical editor; it’s closer to a text editor, but you can’t write just any old text that then gets checked by a compiler. Rather, you’re guided to work within the bounds of the current logic, editing the structure of the model as directly as we’re able to define the interface to do. We thus think of CatColab’s interface as a structure editor.

We think this is the right synthesis of the antithesis between writing plain text in a programming language, which has proven over 70 years to be too steep a learning curve for most people, even quite a few scientists, to readily climb, and designing models graphically. The latter doesn’t scale well as the logic becomes more and more complex or as we want to meaningfully track structural changes (i.e., not modifications to which pixels contain an edge). It also takes a whole lot of programmer time to build a good graphical editor, and such editors have to be customized almost from scratch for each new logic.

We think the interface should be functorial in the language, and CatColab comes pretty close to realizing this ideal: very little in the model interface has to be customized each time we introduce a new logic.

5.2 Tech stack

We’re writing the category theory logic for CatColab in Rust, compiled to WebAssembly in a neat little package that runs in the client. Most of the front-end is currently TypeScript via Solid, although we can dream of the day when it’s all Rust-WASM. The back-end database is just a simple PostgreSQL setup that stores JSON blobs snapshotting your model and analyses; there’s already some ability to track a model through its versions in the back-end that’s not yet exposed to the front-end. Real-time collaboration is enabled by Automerge.

6 Thanks for reading!

If you’re interested, curious, excited, scared, amused, bored, inspired, or have any other reaction to this post or, better yet, to your initial toying around with CatColab you’d like to share, feel free to leave me a comment below or drop me a line. If you like to code, you’re very welcome to come check out (pun intended) our repo. Talk soon!