Introducing Algebraic Data Structures via Category Theory: Monoids

Since joining foursquare, I’ve been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our scala code very functionally.

This has increased my interest in category theory in an interesting way. As a programming language geek, I’m obviously fascinated by data structures. Category theory provides a really interesting handle on a way of looking at a kind of generic data structures.

Historically (as much as that word can be used for anything in computer science), we’ve thought about data structures primarily in a algorithmic and structural ways.

For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.

In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they’re built.

For example, there’s a structure called a monoid. A monoid is a very simple idea: it’s an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these objects must satisfy some rules called the monad laws:

$\forall s \in S: s * i = s, i * s = s$
$\forall x, y, z \in S: (x * y) * z = x * (y * z)$

There are some really obvious examples of monoids – like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.

Lists with concatenation and the empty list are a monoid: for any list,
l ++ [] == l, [] + l == l, and concatenation is associative.

Why should we care if data structures like are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different operations. Monoids provide the tools you need to build fold operations. Every kind of fold – that is, operations that collapse a sequence of other operations into a single value – can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.

And that’s where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that’s exactly what you want for talking about algebraic data structures.

The categorical construction of a monoid is, alas, pretty complicated. It’s a simple idea – but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it’s really worth it: when you see a monoidal category, it’s obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.

A monoidal category is a category with a functor, where the functor has the basic properties of a algebraic monoid. So it’s a category C, paired with a bi-functor – that is a two-argument functor ⊗:C×C→C. This is the categorical form of the tensor operation from the algebraic monoid. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They’re called its coherence conditions, and basically, they’re the properties that are needed to make the diagrams that we’re going to use commute.

So – the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.

Associativity is expressed categorically using a natural isomorphism, which we’ll name α. For any three object X, Y, and Z, α includes a component α_X,Y,Z (which I’ll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the the two objects on either side of its mappings are equivalent.

The identity property is again expressed via natural isomorphism. The category must include an object I (called the unit), and two natural isomorphisms, called &lamba; and ρ. For any arrow X in C, &lamba; and ρ contain components λ_X and ρ_X such that λ_X maps from I⊗X→X, and ρ_X maps from X⊗I to X.

Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together – what kinds of properties these pieces need to have for this to – that is, for these definitions to give us a structure that looks like the algebraic notion of monoidal structures, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.

The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.

Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.

Like I said, the definition is a lot more complicated. But look at the diagram: you can see folding in it, in the chains of arrows in the commutative diagram.

10 thoughts on “Introducing Algebraic Data Structures via Category Theory: Monoids”

Cyan May 13, 2012 at 10:13 pm

I found the description of folds a bit too terse to really understand it. I gather from the post that foldability is (one of?) the point(s) of monoids, so by failing to grok foldability, I’m missing out on the motivation. An example would probably help…

Loading...

Reply ↓
Rob Renaud May 13, 2012 at 11:51 pm

I’d love to see a 4sq vs Google post.

What is to like and dislike about both of them.

Loading...

Reply ↓
John Armstrong May 14, 2012 at 8:56 am

Cyan: the point of a fold — at least of this simple form — is that it’s an operation on a List[Foo] that gives back a Foo in a certain way.

Examples:
1) Sum up all members of this List[Number] to get a Number
2) Concatenate all members of this List[String] to get a String
3) Compose all members of this List[Function[A,A]] to get a Function[A,A]

The common thread in all of these is that the sets and the operations we’re looking at — (Number, sum), (String, concatenate), (Function[A,A], compose) — satisfy some nice properties.

Specifically, they all have an identity — sum has 0, concatenate has “”, compose has the identity function — that gives us a starting place when the list we hand in is empty. What’s the sum of no Numbers? 0; the concatenation of no strings? “”; the composition of no functions? the identity function.

Also, the operations are all associative. This means that it doesn’t really matter what order we group the List elements in, as long as they are all presented in the list order. This is Helpful for computations and leads to some nice flexibility in reasoning about the results.

Loading...

Reply ↓
rolak May 18, 2012 at 1:45 pm

Again an interesting post. Had to be said after reading here that long already…

And now for something completely different: How about s/&lamba;/λ/g ? 😉

Loading...

Reply ↓
billy June 4, 2012 at 1:41 pm

I can’t see your monad laws in Google Chrome browser. 🙁
Now I must go on in life wondering what these 2 laws could have been. 🙁

To be a monoid, these objects must satisfy some rules called the monad laws:

1.?
2.?

-A sad mathematician 🙁

Loading...

Reply ↓
Alex Fowler June 19, 2012 at 11:55 am

A nice post, but I want to join to what Billy is saying – I can’t see any images in your posts that are stored at “scientopia.org/blogs/” but can se those stored on “scientopia.org/img-archive/”.

Loading...

Reply ↓
附中长老 June 30, 2012 at 10:24 pm

the monad laws are visible in firefox, what a disappointment with chrome.

Loading...

Reply ↓
mkl July 16, 2012 at 6:33 am

Just a minor nitpick: shouldn’t the monoid axioms be called “monoid laws” instead of “monad laws”?

Loading...

Reply ↓
1. John Armstrong July 16, 2012 at 7:33 am
  
  Probably should, though a monad is a monoid so it all works out.
  
  Loading...
  
  Reply ↓
wvxvw August 6, 2013 at 5:21 am

Any reason why do you need identity element to be able to fold a sequence? This is really confusing… Do you mean you cannot fold even integers using multiplication, for example?

Loading...

Reply ↓