Computation and Category Theory

Joshua Meyers




In a recent talk, David Spivak, my advisor at Topos Institute, described Poly as “the language of computation”, due to its facility in describing concepts in computer science such as data migration, dependent types, and Turing machines. But is Poly really the language of computation?

To address this question, I decided first to take a step back and ask, “what is computation?”

This question has been considered for a long time and answered in many different ways: in terms of Turing machines, the lambda calculus, and general recursive functions. Remarkably, all of these descriptions turn out to be equivalent, a theorem which points toward an underlying concept; the famous Church-Turing thesis identifies this common class with the informally “effectively calculable” functions. In Kurt Gödel’s Remarks before the Princeton bicentennial conference on problems in mathematics (1946, Collected Works v. II, p. 150), he highlights this convergence:

Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing’s computability). It seems to me that this importance is largely due to the fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen.

Indeed, the functions which can be computed by a Turing machine are exactly those which are definable in the lambda calculus, which are in turn exactly the general recursive functions. And no system of computation has been devised which can compute any functions beyond this class. This situation is troubling, as we apparently have a singular notion, computability, which does not depend on the formalism we use, but we have no definition of it except by means of particular formalisms.

What is the essence of computation then, which lurks behind these diverse definitions? This problem, hitherto unsolved, seems apt for category theory, with its spirit of abstraction across multiple frameworks. It is frustrating to me as a mathematician to so often have to make recourse to a concept which is known to be absolute but can only be grasped by means of highly idiosyncratic formalisms. (Think of how many arbitrary choices are made when defining a Turing machine, which nonetheless have no effect whatsoever on the class of functions which can be computed!) If only I could grasp the concept itself!

If category theory can be utilized to understand computation itself, fundamental computer science will be rendered as a field of mathematics, broadening the line of communication between category theorists and computer scientists and facilitating the deployment of category theory in hardware and software, as per Topos Institute’s strategic plan. Moreover, if applied category theorists truly understand computation, we will be able to use its power more intentionally, empowering us to create systems that benefit us all.

Others have attempted this program before. Robin Gandy proposed a generalization of the Turing machine, called a “Gandy machine”, in his 1980 paper “Church’s Thesis and Principles for Mechanisms”. Gandy describes four principles which purportedly capture the essence of a “computing machine”. He then shows that any function which can be computed by a machine satisfying these four principles is Turing-computable. Furthermore, he constructs counterexamples “which show that if the principles be weakened in almost any way, then there will be devices which satisfy the weakened principles and which can calculate any number-theoretic function.”

Gandy’s paper is very technically involved, and his analysis was subsequently streamlined by Wilfried Sieg and John Byrnes in their 1999 paper “An Abstract Model for Parallel Computations: Gandy’s Thesis”. In “A Category Theoretic Interpretation of Gandy’s Principles for Mechanisms”, Joseph Razavi and Andrea Schalk give a categorical description of what Gandy is doing.

All three of these papers are difficult, and I have not worked through all of the details, but I doubt that the essence of computation is really something so technical, as it seems a quite natural notion, albeit hard to pin down. Moreover, in my skimming of the papers, I have not found any sections which claim to justify that the formalisms they choose are the right ones to capture the concepts at hand. If anyone thinks that Gandy machines do in fact get to the essence of computation and can explain to me how, I would be very interested.

In 2008, Nachum Dershowitz and Yuri Gurevich published “A Natural Axiomatization of Computability and Proof of Church’s Thesis”. This paper is very long and full of quotes from famous logicians discussing their concept of computability. The authors then state principles for computability, seemingly justified by appeal to the word of these great masters. Perhaps the paper succeeds as a theory of how these logicians conceived of computability, but that is not the same as getting to the essence of computability itself.

The fact that we still do not understand computability is poignantly illustrated by the existence of a 2020 paper by Matteo De Benedetto, “Explication as a Three-Step Procedure: the case of the Church-Turing Thesis”. In this paper, De Benedetto compares two recent approaches to explicating computability: that of Sieg (who went on to do much more work on computability after streamlining Gandy’s) and that of Dershowitz and Gurevich. To do this, he interrogates the concept of explication itself, arguing that Carnap’s “two-step” picture of explication is not sufficient to understand the differences between these two conceptions of computability, and that really we need a “three-step” picture. That such an involved meta-analysis is needed casts doubt on whether these authors are on the right track at all.

Let us begin our own investigation.

Firstly, we center the process of computation over the property of computability. Looking at “the class of computable functions” and attempting to analyze its properties is tantamount to a project to understand 3-dimensional figures solely by looking at their 2-dimensional projections. The essential fact that makes computable functions computable is the process of computation, so this process is what we must understand, not merely its result.

Historically, this centering of the process of computation was what made Turing’s 1936 paper “On Computable Numbers”, in which he introduces the Turing machine, such a breakthrough. As Turing wrote in 1937, “The identification of ‘effectively calculable’ functions with computable functions is possibly more convincing than an identification with the λ-definable or general recursive functions.” Church agreed, writing in his review of Turing’s paper,

As a matter of fact, there is … equivalence of three different notions: computability by a Turing machine, general recursiveness in the sense of Herbrand–Gödel–Kleene, and λ-definability in the sense of Kleene and [myself]. Of these, the first has the advantage of making the identification with effectiveness in the ordinary (not explicitly defined) sense evident immediately … The second and third have the advantage of suitability for embodiment in a system of symbolic logic.

Gödel too was convinced by Turing’s formulation. Kleene wrote:

According to a November 29, 1935, letter from Church to me, Gödel ‘regarded as thoroughly unsatisfactory’ Church’s proposal to use λ-definability as a definition of effective calculability … It seems that only after Turing’s formulation appeared did Gödel accept Church’s thesis.

[all three quotes from The Essential Turing ed. B. Jack Copeland, pp. 44-45]

Hence we deny that “Turing categories” get to the heart of computation, since they lack an account of the process of computation, but merely describe general recursiveness. We must instead describe the process of computation.

What made Turing machines in particular so compelling to Turing, Church, and Gödel as an explication of “effective calculation”? I find Martin Davis’ 1978 paper “What is a computation?” very illuminating on this point (Davis actually describes a Post-Turing machine, which is equivalent to a Turing machine). He writes (p. 243):

Turing based his precise definition of computation on an analysis of what a human being actually does when he computes. Such a person is following a set of rules which must be carried out in a completely mechanical manner. Ingenuity may well be involved in setting up these rules so that a computation may be carried out efficiently, but once the rules are laid down, they must be carried out in a mercilessly exact way. If we watch a human being calculating something (whether he is carrying out a long division, performing an algebraic manipulation, or doing a calculus problem), we observe symbols being written, say on a piece of paper, and the behavior of the person doing the calculating changing as he notes various specific symbols appearing as results of computation steps.

The problem which Turing faced and solved was this: how can one extract from this process what is essential and eliminate what is irrelevant? Of course some things are clearly irrelevant; obviously it does not matter whether our calculator is or is not drinking coffee as he works, whether he is using pencil or pen, or whether his paper is lined, unlined, or quadruled. Turing’s method was to introduce a series of restrictions on the calculator’s behavior, each of which could clearly be seen to be inessential. However, when he was done all that was left were a few very simple basic steps performed over and over again many times.

Davis goes on to argue that without loss of computational power, the calculator could be restricted to write on a 1-dimensional tape rather than a 2-dimensional sheet of paper; they could be restricted to only use the symbols 0 and 1; and they could be restricted to only look at one square of the tape at a time. Furthermore, their possible actions could be restricted to changing the symbol on the tape and shifting their attention to an adjoining square. With these restrictions, the calculator then follows a finite list of instructions telling them what action they must take, given the symbol on the square they are looking at. To see this whole argument, I recommend reading Davis’ paper.
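Davis’ stripped-down calculator is concrete enough to sketch in code. The following is a minimal sketch, assuming a two-way-infinite tape of 0s and 1s, a single scanned square, and a finite numbered program of basic steps; the instruction names and the `run` helper are my own labels for illustration, not notation from Davis’ paper.

```python
from collections import defaultdict

def run(program, tape_ones, max_steps=10_000):
    """Run a Post-Turing-style program until STOP.

    program: a finite numbered list of basic steps, each one of
        ("PRINT",)              write 1 on the scanned square
        ("ERASE",)              write 0 on the scanned square
        ("LEFT",) / ("RIGHT",)  shift attention to an adjoining square
        ("IF", s, n)            if the scanned square holds s, go to step n
        ("STOP",)
    tape_ones: positions initially holding 1 (all other squares hold 0).
    Returns the set of positions holding 1 when the machine stops.
    """
    tape = defaultdict(int, {p: 1 for p in tape_ones})
    head, pc = 0, 0
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "PRINT":
            tape[head] = 1
        elif op[0] == "ERASE":
            tape[head] = 0
        elif op[0] == "LEFT":
            head -= 1
        elif op[0] == "RIGHT":
            head += 1
        elif op[0] == "IF" and tape[head] == op[1]:
            pc = op[2]
            continue
        elif op[0] == "STOP":
            return {p for p, s in tape.items() if s == 1}
        pc += 1
    raise RuntimeError("step budget exhausted")

# Unary successor: a number n is written as n+1 consecutive 1s.
# Walk right past the block of 1s, then print one more.
succ = [
    ("IF", 0, 3),   # 0: past the block? go print
    ("RIGHT",),     # 1: still on a 1: step right
    ("IF", 1, 0),   # 2: loop back (falls through when a 0 is scanned)
    ("PRINT",),     # 3: write the extra 1
    ("STOP",),      # 4
]

assert run(succ, {0, 1, 2}) == {0, 1, 2, 3}   # 2 becomes 3
```

Each of Davis’ restrictions is visible here: one scanned square (`head`), two symbols, and actions limited to writing a symbol or shifting attention to an adjoining square.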

So Turing machines (or, Post-Turing machines) seem to essentialize the process of human computation, that is, a human following a “mercilessly exact” procedure for calculation. Each successive restriction Davis puts on this human does not reduce their computational power, although they may make the computation a great deal more arduous (imagine trying to multiply two numbers in binary on a 1-dimensional tape, where you can only look at one square at a time! You could do it but it wouldn’t be fun). The insight of Turing which so resonated with the logicians of his day was to put the human into the analysis. It is not enough to just consider which functions are effectively calculable (i.e. computable), says Turing, we must consider the process by which a human calculates them.

All computations can be executed by Turing machines, just as all of mathematics can be constructed from set theory. But as category theorists, we know that a particular set-theoretic implementation of a concept does not qualify as the essence of the concept, since it could just as easily be implemented infinitely many other ways. The true essence of a concept is its internal structure, and how it relates to other concepts (its “external structure”, perhaps). Thus from the categorical viewpoint, Turing machines do not succeed in essentializing computation any more than von Neumann ordinals essentialize the naturals, the Sheffer stroke essentializes Boolean algebra (you can get all operations from it! :P), or a particular basis of a vector space essentializes that vector space. Category theory agrees with Davis’ question: “how can one extract from this process what is essential and eliminate what is irrelevant?” But then it departs from his answer, as the particular choices he makes of how to restrict the calculator are introductions rather than eliminations of irrelevancy. What is the categorical essence of computation?
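The parenthetical claim about the Sheffer stroke is easy to check concretely. Here is a quick sketch (the helper names are mine, chosen for illustration): negation, conjunction, and disjunction, and hence every Boolean operation, built from NAND alone.

```python
def nand(a, b):
    """The Sheffer stroke: a | b, i.e. not (a and b)."""
    return not (a and b)

# The basic connectives, each defined using nand only.
def NOT(a):    return nand(a, a)
def AND(a, b): return nand(nand(a, b), nand(a, b))
def OR(a, b):  return nand(nand(a, a), nand(b, b))

# Check against Python's built-in connectives on all inputs.
for a in (False, True):
    for b in (False, True):
        assert NOT(a) == (not a)
        assert AND(a, b) == (a and b)
        assert OR(a, b) == (a or b)
```

And yet, as the paragraph above argues, this generative power does not make the Sheffer stroke the essence of Boolean algebra; it is one presentation among many.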

Let us start from Turing’s basic insight, that to understand computation we must understand it as a process involving a human. And not just any process, but one in which the human carries out a “mercilessly exact” procedure, i.e. an algorithm. So now we are led to the concept of “algorithm”. If we can understand this concept formally, as well as what it means to “execute” an algorithm, then we can rest easy that computation is simply the execution of an algorithm. (Needless to say, the fact that the human writes symbols on paper is inessential. They could just as well use an abacus, a clay tablet, or do everything in their head.)

In my next post on this topic, I will consider the question “What is an Algorithm?”