Double categories for databases and knowledge representation

Evan Patterson
(joint with Michael Lambert)

September 6, 2022

Background

Category theory for knowledge representation:

  1. Functional ologs
  2. Relational ologs
  3. Functional and relational ologs?

Databases vs knowledge representation

Databases and knowledge representation (KR) are

  • very distinct as fields of research and practice nowadays
  • essentially the same mathematically

Distinguishing between schemas and instance data,

  • databases have inexpressive schemas but handle large data
  • KR systems have large, expressive schemas but sparse data

This matters for implementation! But the unity is more important.

Functional ologs

Ontology logs (ologs), introdued by Spivak and Kent, are

  • finitely presented categories
  • or, more generally, sketches with limits and/or colimits

In an olog:

  • objects are entities
  • morphisms are functional relations
  • commutative diagrams are facts

Example fact in an olog

I’ll call these functional ologs.

Relational ologs

Because KR systems like OWL are relational, I proposed relational ologs:

  • finitely presented bicategories of relations
  • based on seminal work by Carboni and Walters (1987)

In a relation olog:

  • objects are entities
  • morphisms are relations
  • 2-morphisms are implications (facts)

\(\Downarrow\)

Example fact: transitivity of ancestor relation

Comparison of ologs

Functional ologs Relational ologs
Structure Category \(\cat{C}\)
(simple)
Bicategory of relations \(\bicat{B}\)
(not so simple)
Instance data \(\cat{C} \to \Set\) \(\bicat{B} \to \RelBi\)
Functions ✅ (via “maps”, not first class)
Relations ❌ (only via spans,
cannot compose)
Internal logic ❌ (only via sketches) regular logic
Data migration

Functional and relational ologs?

Can we get the best of both worlds by working in a double category?

Data lives in double category \(\Rel\):

  • objects are sets
  • arrows are functions
  • proarrows are relations
  • cells are implications

Generic cell in \(\Rel\):

\(R(x,y) \implies S(f(x), g(y))\) for all \(x \in X, y \in Y\)

Double categories of relations (Lambert 2021)

  • make both functions and relations be first-class citizens
  • have technical advantages compared to bicategories of relations

Double categories

To make a KR system, we need to understand the abstract structure of \(\Rel\):

  • double category
  • cartesian double category
  • equipment
  • tabulators

Double categories in the abstract

Briefly, a double category \(\dbl{D}\) is a (pseudo)category internal to \(\Cat\):

\[ \dbl{D}_1 \overset{s,t}{\rightrightarrows} \dbl{D}_0, \qquad \dbl{D}_1 \times_{\dbl{D}_0} \dbl{D}_1 \xrightarrow{\odot} \dbl{D}_1, \qquad \dbl{D}_0 \xrightarrow{\id} \dbl{D}_1 \]

  • objects of \(\dbl{D}_0\) are objects of \(\dbl{D}\), denoted \(x, y, z, \dots\)
  • morphisms of \(\dbl{D}_0\) are arrows of \(\dbl{D}\), denoted \(f: x \to y\)
  • objects of \(\dbl{D}_1\) are proarrows of \(\dbl{D}\), denoted \(m: x \proTo y\) when \((s(m),t(m)) = (x,y)\)
  • morphisms of \(\dbl{D}_1\) are cells of \(\dbl{D}\): shown below when \((s(\alpha),t(\alpha)) = (f,g)\)

Cartesian double categories

A double category \(\dbl{D}\) is cartesian if the diagonal and terminal double functors

\[ \Delta: \dbl{D} \to \dbl{D} \times \dbl{D}, \qquad !: \dbl{D} \to \dbl{1} \]

have right adoints in \(\Dbl\). Essentially, this means that

  • both categories \(\dbl{D}_0\) and \(\dbl{D}_1\) have products
  • compatible with category structure, e.g., if \(m: x \proTo y\) and \(m': x' \proTo y'\), then \(m \times m': x \times x' \proTo y \times y'\)

\(\Rel\) as a cartesian double category

Recover the usual cartesian product of sets and relations:

\[ (R \times R')((x,x'),(y,y')) \iff R(x,y) \wedge R'(x',y'). \]

\(\Rel\) as an equipment

A crucial feature of \(\Rel\) is that functions (arrows) can viewed as relations (proarrows) in two different ways:

  • A function \(f: X \to Y\) has a graph \(f_!: X \proTo Y\): \[ f_!(x,y) \iff f(x) = y \]
  • It also has a cograph \(f^*: Y \proTo X\), the converse relation

Moreover, these come with certain cells:

Equipments

An equipment is a double category such that

  • for every arrow \(f: X \to Y\), there are proarrows
    • \(f_!: X \proTo Y\), the companion of \(f\)
    • \(f^*: Y \proTo X\), the conjoint of \(f\)
  • together with four cells as on previous slide
  • which satisfy certain equations (omitted).

Examples: double categories of…

  • relations
  • spans
  • profunctors
  • bimodules over rings
  • \(\mathcal{V}\)-matrices
  • structured cospans

Characterizing equipments

Theorem (Shulman 2008)

Let \(\dbl{D}\) be a double category. The following are equivalent:

  • \(\dbl{D}\) is an equipment (has companions and conjoints)
  • \(\langle s,t \rangle: \dbl{D}_1 \to \dbl{D}_0 \times \dbl{D}_0\) is a fibration
  • \(\langle s,t \rangle: \dbl{D}_1 \to \dbl{D}_0 \times \dbl{D}_0\) is an opfibration

The last two statements mean that niches and coniches have universal fillings:

Later, we’ll see how to use restriction and extension for data manipulation.

Cartesian equipments

A cartesian equipment is a cartesian double category that is an equipment.

Examples:

  • \(\Rel\)
  • \(\Prof\)
  • \(\Span(\cat{S})\) for finitely complete category \(\cat{S}\)
  • \(\Mat{\mathcal{V}}\) for distributive category \(\mathcal{V}\)

Comparison with cartesian bicategories

Carboni, Walters, et al defined cartesian bicategories

The horizontal bicategory of a locally posetal cartesian equipment is a cartesian bicategory (Lambert 2021).

But definition of cartesian equipment is much simpler because it involves only universal properties.

Data munging in double categories

  • We can do “data munging” or “queries” inside a cartesian equipment
  • Examples are inspired by the TV show Stargate SG1

Example data

Suppose our KB, a cartesian equipment \(\dbl{D}\), includes the 4-ary relation

Instance data, a cartesian double functor \(\dbl{D} \to \Rel\), can be shown as a table:

date location purpose team
1998-02-06 P41-771 search & rescue SG3
1998-07-31 Cimmeria assist Cimmerians SG1
1999-01-02 P3R-272 investigate inscriptions SG1
1999-10-22 Netu search & rescue SG1
2004-08-06 Tegalus negotiation SG9

Selecting columns

To select columns, extend along projection maps.

Query:

Result:

date team
1998-02-06 SG3
1998-07-31 SG1
1999-01-02 SG1
1999-10-22 SG1
2004-08-06 SG9

Filtering rows

To filter rows, restrict along an individual.

Given:

Query:

Result:

date location purpose team
1998-07-31 Cimmeria assist Cimmerians SG1
1999-01-02 P3R-272 investigate inscriptions SG1
1999-10-22 Ne’tu search & rescue SG1

Inner joins

To compute an inner join, say of

along \(B\), restrict the product \(p \times q\) along the diagonal \(\Delta: B \to B \times B\):

Inner join: example data

Suppose have relations

with data

person skill
Hammond command
Kovacek law
Maybourne chicanery
Morrison combat
Rothman archaeology
person team
Kovacek SG9
Morrison SG3
Rothman SG11

Inner join: example

Query:

Result:

person skill team
Kovacek law SG9
Morrison combat SG3
Rothman archaeology SG11

Comparison with \(\cat{C}\)-sets

In categorical databases based on functors \(\cat{C} \to \Set\):

  • querying and data migration must be done outside of \(\cat{C}\)
  • because a bare category \(\cat{C}\) has no internal logic

As examples show, in double-categorical databases \(\dbl{D} \to \Rel\):

  • data operations can be done inside \(\dbl{D}\), using its internal logic
  • functorial semantics gives the result in \(\Rel\)

Thanks!

In future work, we hope to implement double categories of relations as

  • a knowledge representation system
  • an in-memory database, extending implementation of \(\cat{C}\)-sets in Catlab.jl

Further reading

Blog post by Michael Lambert: Data Operations are Functorial Semantics

  • types versus propositions
  • tabulators in double categories

References

Aleiferi, Evangelia. 2018. “Cartesian Double Categories with an Emphasis on Characterizing Spans.” PhD thesis, Dalhousie University. https://arxiv.org/abs/1809.06940.
Carboni, Aurelio, G. Max Kelly, Robert F. C. Walters, and Richard J. Wood. 2008. “Cartesian Bicategories II.” Theory and Applications of Categories 19 (6): 93–124. http://www.tac.mta.ca/tac/volumes/19/6/19-06abs.html.
Carboni, Aurelio, and Robert F. C. Walters. 1987. “Cartesian Bicategories I.” Journal of Pure and Applied Algebra 49 (1-2): 11–32. https://doi.org/10.1016/0022-4049(87)90121-6.
Lambert, Michael. 2021. “Double Categories of Relations.” https://arxiv.org/abs/2107.07621.
Patterson, Evan. 2017. “Knowledge Representation in Bicategories of Relations.” https://arxiv.org/abs/1706.00526.
Shulman, Michael. 2008. “Framed Bicategories and Monoidal Fibrations.” Theory and Applications of Categories 20 (18): 650–738. http://www.tac.mta.ca/tac/volumes/20/18/20-18abs.html.
Spivak, David I. 2012. “Functorial Data Migration.” Information and Computation 217: 31–51. https://doi.org/10.1016/j.ic.2012.05.001.
Spivak, David I., and Robert E. Kent. 2012. “Ologs: A Categorical Framework for Knowledge Representation.” PloS One 7 (1): e24274. https://doi.org/10.1371/journal.pone.0024274.