Since joining foursquare, I’ve been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our scala code very functionally.
This has increased my interest in category theory in an interesting way. As a programming language geek, I’m obviously fascinated by data structures. Category theory provides a really interesting handle on a way of looking at a kind of generic data structures.
Historically (as much as that word can be used for anything in computer science), we’ve thought about data structures primarily in a algorithmic and structural ways.
For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.
In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they’re built.
For example, there’s a structure called a monoid. A monoid is a very simple idea: it’s an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these objects must satisfy some rules called the monad laws:
There are some really obvious examples of monoids – like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.
Lists with concatenation and the empty list are a monoid: for any list,
l ++  == l,  + l == l, and concatenation is associative.
Why should we care if data structures like are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different operations. Monoids provide the tools you need to build fold operations. Every kind of fold – that is, operations that collapse a sequence of other operations into a single value – can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.
And that’s where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that’s exactly what you want for talking about algebraic data structures.
The categorical construction of a monoid is, alas, pretty complicated. It’s a simple idea – but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it’s really worth it: when you see a monoidal category, it’s obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.
A monoidal category is a category with a functor, where the functor has the basic properties of a algebraic monoid. So it’s a category C, paired with a bi-functor – that is a two-argument functor ⊗:C×C→C. This is the categorical form of the tensor operation from the algebraic monoid. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They’re called its coherence conditions, and basically, they’re the properties that are needed to make the diagrams that we’re going to use commute.
So – the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.
Associativity is expressed categorically using a natural isomorphism, which we’ll name α. For any three object X, Y, and Z, α includes a component αX,Y,Z (which I’ll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the the two objects on either side of its mappings are equivalent.
The identity property is again expressed via natural isomorphism. The category must include an object I (called the unit), and two natural isomorphisms, called &lamba; and ρ. For any arrow X in C, &lamba; and ρ contain components λX and ρX such that λX maps from I⊗X→X, and ρX maps from X⊗I to X.
Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together – what kinds of properties these pieces need to have for this to – that is, for these definitions to give us a structure that looks like the algebraic notion of monoidal structures, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.
The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.
Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.
Like I said, the definition is a lot more complicated. But look at the diagram: you can see folding in it, in the chains of arrows in the commutative diagram.