When you light up, I light up

A dynamical monoidal category of Hebbian learners

polynomial functors
dynamical systems
applied category theory
Published

2022-10-28

Abstract

“Cells that fire together, wire together.” This slogan for Hebbian learning evokes a strategy for reorganization in which an individual strengthens their connection with another if they have similar behavior. Here we give a mathematical account of Hebbian learning as a dynamic monoidal category.

1 Introduction

Here’s how a lot of my friendships begin. Someone at a party is talking about what it means for a calculator to compute, and I think “Hey, I love thinking about dynamics and computation. I’d like to talk to that person more”. Or Kate, the girl in my math class, pulls out her knitting project just when I’m wondering whether it would be okay to knit during seminar. Before long we’re sharing guidance and opinions not just about math and knitting, but about Masterpiece Theater, relationships, and metaphysics. I recently asked my friend Eliana how she knew that she wanted to be friends with me, and she summed it up like this: “when you light up, I light up.”

Hebbian learning is a neuroscientific theory of how neurons develop their interactions that echoes this social hypothesis of how people develop their friendships. The theory is captured by Donald Hebb’s famous quote “Cells that fire together, wire together”, which refers to what neuroscientists call spike-timing-dependent plasticity. Roughly speaking, suppose neuron A and neuron B are connected; if neuron B fires very soon after neuron A fires, then neuron B strengthens its connection to neuron A. The intuitive idea is that neurons A and B understood the situation similarly. Yet neuron A figured it out first and caused neuron B to figure it out as well. Hence, neuron B sees neuron A as a good guide and wants to hear more from neuron A in the future. In the social analogue, Kate and I both saw the potential for knitting in seminar. Yet she started doing it first, giving me the courage to also pull out my project. So I see her as a good guide and want to hear more from her in the future. I want to strengthen our connection. The above explanation is far from enough to capture what science knows about spike-timing-dependent plasticity, but it is enough to understand our mathematical account of it.

In (Shapiro and Spivak 2022) the authors introduce a new notion of dynamic categorical structure, including dynamic categories, dynamic operads, and dynamic monoidal categories. These structures exhibit responsive cohesion in the sense that the morphisms—roughly speaking, the wiring diagrams or interaction patterns between boxes—change based on what flows between those boxes. Other than degenerate cases, we know of only four examples of this structure to date: deep learning, prediction markets, strategic games, and Hebbian learning. The first two were explained in op. cit., the strategic games example was explained by Brandon here, and in the remainder of this post we’ll explain the Hebbian learning case.

2 A dynamic monoidal category of Hebbian learners

A dynamic categorical structure can be seen as forming a kind of bureaucracy. That is, it specifies:

  • The input-output interface of each worker.
  • What each manager obtains from her set of workers.
  • What workers hear from each other and from management.
  • A policy by which the above interactions are updated.

For example, in deep learning, each worker is capable of outputting a real number value and inputting a gradient. The workers send their numerical values “up the corporate ladder”. Each manager obtains an aggregate of the numerical values from her reports. This aggregate is a weighted sum of the reported values, where the weights are based on the various reputations—encoded as multiplicative weights—or trust-levels that the manager has for her reports. The workers hear nothing from each other, but feedback from management is relayed in the form of gradients. Finally, the policy for updating these interactions is to adjust the weights according to the gradient.

In prediction markets, each worker can output elements of a set X (predictions) and confidence levels, and can input elements of X (actual results). The workers send predictions “up the corporate ladder”. Managers obtain an aggregate of these predictions, weighted according to the workers’ relative wealths. Workers again do not hear each other, but feedback from management is relayed simply as the correct answer to the prediction problem. Finally, the policy for updating these interactions is to update relative wealths according to Bayes’ rule.
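To make these two policies concrete, here is a rough NumPy sketch of the aggregation and update steps just described. The function names, array shapes, default values, and exact formulas are illustrative simplifications of ours, not constructions taken from (Shapiro and Spivak 2022).

```python
import numpy as np

def deep_learning_step(weights, worker_values, grad, lr=0.01):
    """The manager reports a trust-weighted sum of worker values; feedback
    arrives as a gradient, and the weights are nudged against it."""
    aggregate = weights @ worker_values                 # weighted sum sent up the ladder
    new_weights = weights - lr * grad * worker_values   # adjust trust levels by the gradient
    return aggregate, new_weights

def prediction_market_step(wealths, predictions, likelihood_of_outcome):
    """The manager reports a wealth-weighted aggregate of predictions; once the
    outcome is known, relative wealths are updated by Bayes' rule."""
    relative = wealths / wealths.sum()
    aggregate = relative @ predictions                  # wealth-weighted prediction
    posterior = wealths * likelihood_of_outcome         # reweight by how well each worker predicted
    return aggregate, posterior / posterior.sum()
```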

Now we come to the new mathematics: a dynamic monoidal category of Hebbian learners. In Hebbian learning, the peer components communicate responsively with each other: the degree to which different components "hear each other" is increased or decreased according to their relative responses to input. The bureaucracy of Hebbian learners will have the following structure.

  • There is a worker p_N for each natural number N\in\mathbb{N}; its interface is \mathbb{R}^N\times\mathbb{R}^N\mathcal{y}^{\mathbb{R}^N}. That is, it outputs both its current and previous value, a pair (O_0,O_1)\in\mathbb{R}^N\times\mathbb{R}^N, and it inputs a single value I_0\in\mathbb{R}^N.
  • The manager will receive everything that her workers produce.
  • The workers hear everything that the manager says, added to a weighted sum of what their peers say.
  • The above interaction—in particular the weighted sum—is updated according to a “spike-timing-dependent plasticity” formula, given in (1).

Let’s make all of this precise, starting with the definition of a dynamic monoidal category (Definition 3.7 in (Shapiro and Spivak 2022)).

A screenshot of Definition 3.7 from (Shapiro and Spivak 2022)

We construct a dynamic monoidal category with monoid (\mathbb{N}, 0, +) and interfaces p_N \in \mathbf{Poly} defined by p_N \coloneqq \mathbb{R}^N \times \mathbb{R}^N \mathcal{y}^{\mathbb{R}^N} for each N \in \mathbb{N}.

We interpret the positions (O_0, O_1) \in p_N(1) = \mathbb{R}^N \times \mathbb{R}^N as pairs where O_0 is the current output and O_1 is the previous output. The directions I_0 \in p_N[(O_0, O_1)] represent the current input to the system. We think of p_N as having N communication channels, where for i \in \{1, \ldots, N\}, the real numbers I_0(i), O_0(i), and O_1(i) are respectively the current input, current output, and previous output on the ith channel.

A box representing the polynomial interface of a Hebbian learner

For each M, N \in \mathbb{N} there is an obvious isomorphism of polynomials p_M \otimes p_{N} \to p_{M + N}, which gives the monoidal structure on interfaces. The monoidal unit is p_0\cong\mathcal{y}.
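For readers who like to see the data spelled out in code, here is a small Python sketch of one possible representation of these interfaces. The class and function names are ours, and the representation (plain NumPy arrays) is only one of many reasonable choices.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HebbianInterface:
    N: int  # number of communication channels

    def is_position(self, O0: np.ndarray, O1: np.ndarray) -> bool:
        # a position of p_N is a pair (O_0, O_1) in R^N x R^N
        return O0.shape == (self.N,) and O1.shape == (self.N,)

    def is_direction(self, I0: np.ndarray) -> bool:
        # at every position, the directions form R^N
        return I0.shape == (self.N,)

def tensor(pM: HebbianInterface, pN: HebbianInterface) -> HebbianInterface:
    # the monoidal product p_M ⊗ p_N ≅ p_{M+N}: just concatenate the channels
    return HebbianInterface(pM.N + pN.N)
```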

With the interfaces in place, we are now ready to introduce the dynamics. For each pair M, N \in \mathbb{N} we define a [p_M, p_N]-coalgebra with states S_{M, N} \coloneqq \begin{cases} \mathbb{R}^{N \times N} & M = N\\ \emptyset & \text{else} \end{cases}.

The N \times N matrix W \in S_{N, N} defines a communication network between the N channels of the interface p_N. This communication network defines the interaction pattern—what the manager hears from her workers and what the workers hear from each other—and will fluctuate through a process inspired by Hebbian learning.

First, let’s specify how a communication network W \in S_{N, N} defines an interaction pattern. Formally, we give a map of polynomials p_N \to p_N. On positions (O_0, O_1) \in \mathbb{R}^N \times \mathbb{R}^N, it is the identity. This is what we mean by “the manager will hear everything that her workers produce”. On directions, it sends I_0' \in \mathbb{R}^N to I_0' + WO_0. This is what we mean by “the workers hear everything that the manager says (I_0'), added to a weighted sum of what their peers say (O_0)”. We can picture this wiring pattern like this:

A wiring diagram representing the interaction pattern specified by a communication network W
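In code, the map of polynomials determined by W splits into its action on positions and its action on directions. A minimal NumPy sketch (our names, assuming W is an N × N array and O_0, I_0' are length-N arrays):

```python
import numpy as np

def on_positions(O0: np.ndarray, O1: np.ndarray):
    # the manager hears everything the workers produce, unchanged
    return O0, O1

def on_directions(W: np.ndarray, I0_from_manager: np.ndarray, O0: np.ndarray):
    # each worker hears the manager's message I_0' plus a W-weighted sum of
    # what its peers are currently saying
    return I0_from_manager + W @ O0
```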

Next, let’s specify how the communication matrix updates. For a current output O_0 \in \mathbb{R}^N and a previous output O_1 \in \mathbb{R}^N, the product O_0(i)O_1(j) \in \mathbb{R} captures

  1. How much channel i’s current output agrees with channel j’s last output. If O_0(i) is in agreement with O_1(j), then channel j “figured it out” right before channel i.
  2. How vocal they both are about their output. Even if both channels are whispering identical numbers (i.e. outputting identical values close to zero), we count that only as weak evidence of "firing together".

We modulate this agreement by how much channel i actually heard channel j. Even if channel i and channel j are in lockstep agreement, their connection will only be strengthened if they can hear that they are in agreement. The entry W_{ij} encodes how much channel i listens to what channel j says. Therefore W_{ij} O_0(i) O_1(j) is the amount by which channel i wants to strengthen (if the sign is positive) or weaken (if the sign is negative) how much it listens to channel j. Fortunately, there's a nice linear algebra way to capture this data: W \odot (O_0 \otimes O_1), where \odot is the Hadamard product (i.e. entry-wise product) and \otimes is the Kronecker (outer) product, so that (O_0 \otimes O_1)_{ij} = O_0(i)O_1(j). So, given a communication network W and the outputs of each channel, we update the communication network to be \beta(W + \alpha( W \odot ( O_0 \otimes O_1))) \tag{1} where \alpha is the learning rate and \beta is a regularization term.
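Written out in NumPy, the update (1) is a one-liner; here alpha and beta are the learning rate and regularization term from the text (their default values below are placeholders of ours), and np.outer computes (O_0 ⊗ O_1)_{ij} = O_0(i)O_1(j).

```python
import numpy as np

def update_network(W: np.ndarray, O0: np.ndarray, O1: np.ndarray,
                   alpha: float = 0.1, beta: float = 0.99) -> np.ndarray:
    # Equation (1): beta * (W + alpha * (W ⊙ (O_0 ⊗ O_1)))
    return beta * (W + alpha * (W * np.outer(O0, O1)))
```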

Together with the polynomial functor p_N \to p_N defined by W, this specifies the [p_N, p_N]-coalgebra. If M \neq N, then S_{M, N} = \emptyset and so the coalgebra is trivial.

Finally, the identor, compositor, and productor squares are straightforward to define, as is proving the unit, associativity, and interchange equations. And thus, we have defined a dynamic monoidal category of Hebbian learners!

3 A few tidbits before we go

3.1 Allo-Hebbian learners

You may have noticed something strange in the definition above. If the entry W_{ii} is not zero, then channel i is listening to itself. If it repeats the same output over and over, then it will always agree with what it just said. The connection with itself will be strengthened, creating an echo-chamber of one. While this feature may be appropriate in some domains—for example, neurons can (although generally do not) have axons that synapse onto their own dendrites—we may want to avoid it in certain contexts. To do so, we can define a sub-dynamic monoidal category of allo-Hebbian learners by changing the story above so that the diagonal entries of W must be zero. Note that this property of W will be preserved by the update rule in Equation (1).
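Here is a quick numerical check, using the same update_network sketch as above with made-up values, that the update rule (1) really does preserve a zero diagonal: a zero entry of W is only ever rescaled, never created afresh, so allo-Hebbian learners are closed under the dynamics.

```python
import numpy as np

def update_network(W, O0, O1, alpha=0.1, beta=0.99):
    return beta * (W + alpha * (W * np.outer(O0, O1)))  # Equation (1)

rng = np.random.default_rng(0)
W = rng.random((4, 4))
np.fill_diagonal(W, 0.0)                      # an allo-Hebbian communication network
W_new = update_network(W, rng.standard_normal(4), rng.standard_normal(4))
assert np.allclose(np.diag(W_new), 0.0)       # the diagonal stays zero
```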

3.2 Implementing dynamic categorical structures in AlgebraicJulia

Developing the math is all good fun, but there is something unfinished about the static-ness of the formulas. I want to see these dynamic structures run! I want to see the connections strengthen and weaken! Together with Kris Brown, we have been implementing the deep learning dynamic multicategory in AlgebraicDynamics, and we hope to continue this work for Hebbian learners and prediction markets.
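Until then, here is a throwaway NumPy toy (ours, not the AlgebraicDynamics implementation) that simply iterates the update rule (1): two channels that fire together in bursts strengthen their mutual connection, while every connection involving a silent third channel fades away.

```python
import numpy as np

N, alpha, beta = 3, 0.1, 0.99
W = np.full((N, N), 0.1)          # initial communication network
np.fill_diagonal(W, 0.0)          # allo-Hebbian: no channel listens to itself

prev = np.zeros(N)
for t in range(100):
    burst = 1.0 if t % 5 < 2 else 0.0        # channels 0 and 1 fire together in short bursts
    curr = np.array([burst, burst, 0.0])     # channel 2 stays silent
    W = beta * (W + alpha * (W * np.outer(curr, prev)))   # update rule (1)
    prev = curr

print(np.round(W, 3))   # W[0,1] and W[1,0] have grown; every other entry has decayed
```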

3.3 A mystery

Just as individuals can follow their “when you light up, I light up” nose, so can groups of people. Maybe you peruse Topos’s internet presence (an aggregate of the outputs of the individuals who make up Topos), it causes you to figure something out, and you strengthen your connection to Topos. Unfortunately, our dynamic monoidal category of Hebbian learners cannot account for this sort of aggregate-level reorganization. In our first attempts at creating a dynamic structure for Hebbian learning, we tried to define the interaction pattern so that the manager could receive an aggregate of what her workers produce—as in deep learners and prediction markets—instead of a direct copy of what all of the workers said. However, we were forced into the latter situation in order to make composition work. This leaves some open questions. Can we extend the state space of Hebbian learners so that aggregate-level reorganization can be accounted for? If not, how can we formalize what happens when individuals form a community, and how that affects the flexibility of the entire system to reorganize?

References

Shapiro, Brandon, and David I. Spivak. 2022. “Dynamic Categories, Dynamic Operads: From Deep Learning to Prediction Markets.” arXiv:2205.03906. https://doi.org/10.48550/arXiv.2205.03906.