# Introduction and Motivation

One thing that I keep bumping up against as an engineer who loves functional a programming is category theory. It often seems like there are two kinds of functional programmers: people who came into functional programming via engineering, and people who came into functional programming via math. The problem is that a lot of the really interesting work in languages and libraries for functional programming are being built from the mathematical side, but for people on the engineering side, it’s impenetrable: it’s like it’s written in a whole different language, and even basic discussions about programming go off the rails, because the basic abstractions don’t make any sense if you don’t know category theory.

But how do you learn category theory? It seems impenetrable to mere humans. For example, one of the textbooks on category theory that several people told me was the most approachable starts chapter one with the line:

A group extension of an abelian group $H$ by an abelian group $G$ consists of a group $E$ together with an inclusion of $G \hookrightarrow E$ as a normal subgroup and a surjective homomorphism $E \twoheadrightarrow H$ that displays $H$ as the quotient group $E|G$.

If you’re not a professional mathematician, then that is pure gobbledigook. But that seems to be typical of how initiates of category theory talk about it. But the basic concepts, while abstract, really aren’t all that tricky. In many ways, it feels a lot like set theory: there’s a simple conceptual framework, on which you can build extremely complicated formalisms. The difference is that while many people have spent years figuring out how to make the basics of set theory accessible to lay-people, but that effort hasn’t been applied to set theory.

## What’s the point?

Ok, so why should you care about category theory?

Category theory is a different way of thinking, and it’s a language for talking about abstractions. The heart of engineering is abstraction. We take problems, and turn them into abstract structures. We look at the structures we create, and recognize commonalities between those structures, and then we create new abstractions based on the commonalities. The hardest part of designing a good library is identifying the right abstractions.

Category theory is a tool for talking about structures, which is particularly well suited to thinking about software. In category theory, we think in terms of arrows, where arrows are mappings between objects. We’ll see what that means in detail later, but the gist of it is that one example of arrows mapping between objects is functions mapping between data types in a computer program.

Category theory is built on thinking with orrows, and building structures using arrows. It’s about looking at mathematical constructions built with arrows, and in those structures, figuring out what the fundamental parts are. When we abstract enough, we can start to see that things that look very different are really just different realizations of the same underlying structure. Category theory gives us a language and a set of tools for doing that kind of abstraction – and then we can take the abstract structures that we identify, and turn them into code – into very generic libraries that express deep, fundamental structure.

## Monoids in Code

We’ll get started by looking at a simple mathematical structure called a monoid, and how we can implement it in code; and then, we’ll move on to take an informal look at how it works in terms of categories.

Most of the categorical abstractions in Scala are implemented using something called a typeclass, so we’ll start by looking at typeclasses. Typeclasses aren’t a category theoretical notion, but they make it much, much easier to build categorical structures. And they do give us a bit of categorical flavor: a typeclass defines a kind of metatype – that is, a type of type – and we’ll see, that kind of self-reflective abstraction is a key part of category theory.

The easiest way to think about typeclasses is that they’re a kind of metatype – literally, as the name suggests, they define classes where the elements of those classes are types. So a typeclass provides an interface that a type must provide in order to be an instance of the metatype. Just like you can implement an interface in a type by providing implementations of its methods, you can implement a typeclass by providing implementations of its operations.

In Scala, you implement the operations of a typeclasses using a language construct called an implicit parameter. The implicit parameter attaches the typeclass operations to a meta-object that can be passed around the program invisibly, providing the typeclass’s operations.

Let’s take a look at an example. An operation that comes up very frequently in any kind of data processing code is reduction: taking a collection of values of some type, and combining them into a single value. Taking the sum of a list of integers, the product of an array of floats, and the concatenation of a list of strings are all examples of reduction. Under the covers, these are all similar: they’re taking an ordered group of values, and performing an operation on them. Let’s look at a couple of examples of this:

def reduceFloats(floats: List[Float]): Float =
floats.foldRight(0)((x, y) => x + y)

def reduceStrings(strings: Seq[String]): String =
strings.foldRight("")((x, y) => x.concat(y))

When you look at the code, they look very similar. They’re both just instantiations of the same structural pattern:

def reduceX(xes: List[X]): X =
xes.foldRight(xIdentity)((a, b) => Xcombiner(a, b))

The types are different; the actual operation used to combine the values is different; the base value in the code is different. But they’re both built on the same pattern:

• There’s a type of values we want to combine: Float or String. Everything we care about in reduction is a connected with this type.
• There’s a collection of values that we want to combine, from left to right. In one case, that’s a List[Float], and in the other, it’s a Seq[String]. The type doesn’t matter, as long as we can iterate over it.
• There’s an identity value that we can use as a starting point for building the result; 0 for the floats, and "" (the empty string) for the strings.
• There’s an operation to combine two values: + for the floats, and concat for the strings.

We can capture that concept by writing an interface (a trait, in Scala terms) that captures it; that interface is called a typeclass. It happens that this concept of reducible values is called a monoid in abstract algebra, so that’s the name we’ll use.

trait Monoid[A]  {
def empty: A
def combine(x: A, y: A): A
}

We can read that as saying “A is a monoid if there are implementations of empty and combine that meet these constraints”. Given the declaration of the typeclass, we can implement it as an object which provides those operations for a particular type:

def empty: Float = 0.0
def combine(x: Float, y: Float): Float = x + y
}

object StringConcatMonoid extends Monoid[String] {
def empty: String = ""
def combine(x: String, y: String): String = x.concat(y)
}

FloatAdditionMonoid implements the typeclass Monoid for the type Float. And since we can write an implementation of Monoid for Float or String, we can say that the types Float and String are instances of the typeclass Monoid.

Using our implementation of Monoid, we can write a single, generic reduction operator now:

def reduce[A](values: Seq[A], monoid: Monoid[A]): A =
values.foldRight(monoid.empty)(monoid.combine)

We can use that to reduce a list of floats:

reduce([1.0, 3.14, 2.718, 1.414, 1.732], FloatAdditionMonoid)

And we can do a bit better than that! We can set up an implicit, so that we don’t need to pass the monoid implementation around. In Scala, an implicit is a kind of dynamically scoped value. For a given type, there can be one implicit value of that type in effect at any point in the code. If a function takes an implicit parameter of that type, then the nearest definition in the execution stack will automatically be inserted if the parameter isn’t passed explicitly.

def reduce[A](values: Seq[A])(implicit A: Monoid[A]): A =
list.foldRight(A.empty)(A.combine)

And as long as there’s a definition of the Monoid for a type A in scope, we can can use that now by just writing:

implicit object FloatAdditionMonoid extends Monoid[Float] {
def empty: Float = 0.0
def combine(x: Float, y: Float): Float = x + y
}

val floats: List[Float] = ...
val result = reduce(floats)

Now, anywhere that the FloatAdditionMonoid declaration is imported, you can call reduce on any sequence of floats, and the implicit value will automatically be inserted.

Using this idea of a monoid, we’ve captured the concept of reduction in a common abstraction. Our notion of reduction doesn’t care about whether we’re reducing strings by concatenation, integers by addition, floats by multiplication, sets by union. Those are all valid uses of the concept of a monoid, and they’re all easy to implement using the monoid typeclass. The concept of monoid isn’t a difficult one, but at the same time, it’s not necessarily something that most of us would have thought about as an abstraction.

We’ve got a typeclass for a monoid; now, we’ll try to connect it into category theory. It’s a bit tricky, so we won’t cover it all at once. We’ll look at it a little bit now, and we’ll come back to it in a later lesson, after we’ve absorbed a bit more.

### From Sets to Arrows

For most of us, if we’ve heard of monoids, we’ve heard of them in terms of set theory and abstract algebra. So in that domain, what’s a monoid?

A monoid is a triple $(V, 1, *)$, where:

• $V$ is a set of values;
• $1$ is a value in $V$;
• $*$ is a total binary operator where:
• $1$ is an identity of $*$: For any value $v \in V: v*1 = 1*v = v$.
• $*$ is associative: for any values $v, w, x \in V: (v * w) * x = v * (w * x)$

That’s all just a formal way of saying that a monoid is a set with a binary associative operator and an identity value. The set of integers can form a monoid with addition as the operator, and 0 as identity. Real numbers can be a monoid with multiplication and 1. Strings can be a monoid with concatenation as the operator, and empty string as identity.

But we can look at it in a different way, too, by thinking entirely in terms of function.
Let’s forget about the numbers as individual values, and instead, let’s think about them in functional terms. Every number is a function which adds itself to its parameter. So “2” isn’t a number, it’s a function which adds two to anything.

How can we tell that 2 is a function which adds two to things?

If we compose it with 3 (the function that adds three to things), we get 5 (the function that adds five to things). And how do we know that? Because it’s the same thing that we get if we compose 3 with 1, and then compose the result of that with 1 again. 3+1+1=5, and 3+2=5. We can also tell that it’s 2, because if we just take 1, and compose it with itself, what we’ll get back is the object that we call 2.

In this scheme, all of the numbers are related not by arithmetic, not by an underlying concept of quantity or cardinality or ordinality, but only by how they compose with each other. We can’t see anything else – all we have are these functions. But we can recognize that they are the natural numbers that we’re familiar with.

Looking at it this way, we can think of the world of natural numbers as a single point, which represents the set of all natural numbers. And around that point, we’ve got lots and lots of arrows, each of which goes from that point back to itself. Each of those arrows represents one number. The way we tell them apart is by understanding which arrow we get back when we compose them. Take any arrow from that point back to that point, and compose it with the arrow 0, and what do you get? The arrow you started with. Take any arrow that you want, and compose it with 2. What do you get? You get the same thing that you’d get if you composed it with 1, and then composed it with one again.

That dot, with those arrows, is a category.

What kind of advantage do we get in going from the algebraic notion of a set with a binary operation, to the categorical notion of an object with a bunch of composable arrows? It allows to understand a monoid purely as a structure, without having the think about what the objects are, or what the operator means.

Now, let’s jump back to our monoid typeclass for a moment.

trait Monoid[A]  {
def empty: A
def combine(x: A, y: A): A
}

We can understand this as being a programmable interface for the categorical object that we just described. All we need to do is read “:” as “is an arrow in”: It says that A is a monoid if:

• It has an element called empty which is an arrow in A.
• It has an operation called combine which, given any two arrows in A, composes them into a new arrow in A.

There are, of course, other conditions – combine needs to be associative, and empty needs to behave as the identity value. But just like when we write an interface for, say, a binary search tree, the interface only defines the structure not the ordering condition, the typeclass defines the functional structure of the categorical object, not the logical conditions.

This is what categories are really all about: tearing things down to a simple core, where everything is expressed in terms of arrows. It’s almost reasoning in functions, except that it’s even more abstract than that: the arrows don’t need to be functions – they just need to be composable mappings from things to things.

## Deeper Into Arrows

We can abstract a bit more, and look at the entire construction, including the identity and associativity constraints entirely in terms of arrows. To really understand this, we’ll need to spend some time diving deeper into the actual theory of categories, but as a preview, we can describe a monoid with the following pair of diagrams (copied from wikipedia):

In these diagrams, any two paths between the same start and end-nodes are equivalent (up to isomorphism). When you understand how to read this diagrams, these really do define everything that we care about for monoids.

For now, we’ll just run through and name the parts – and then later, in another lesson, we’ll come back, and we’ll look at this in more detail.

• $\mu$ is an arrow from $M\times M \rightarrow M$, which we’ll call a multiplication operator.
• $\eta$ is an arrow from $I \rightarrow M$, called unit.
• $\alpha$ is an arrow from $(M\times M)\times M \rightarrow M \times (M\times M)$ which represents the associativity property of the monoid.
• $\lambda$ is a morphism which represents the left identity property of the monoid (that is, $1*x=x$), and $\rho$ is a morphism representing the right identity property $(x*1=x)$.

This diagram, using these arrows, is a way of representing all of the key properties of a monoid via nothing but arrows and composition. It says, among other things, that:

• $(M \times M) \times M$ composes with multiplication to be $M \times M$.
That is, applying multiplication to $(M \times M) \times M$ evaluates to (M \times M).
• $(M \times M) \times M$ composed with associativity can become $M \times (M \times M)$.

So it’s a monoid – but it’s a higher level monoid. In this, $M$ isn’t just an object in a category: it’s an entire category. These arrows are arrows between categories in a category of categories.

What we’ll see when we get deeper into category theory is how powerful this kind of abstraction can get. We’ll often see a sequence of abstractions, where we start with a simple concept (like monoid), and find a way to express it in terms of arrows between objects in a category. But then, we’ll lift it up, and look at how we can see in not just as a relation between objects in a category, but as a different kind of relation between categories, by constructing the same thing using a category of categories. And then we’ll abstract even further, and construct the same thing using mappings between categories of categories.

Sorry things have been so slow around here. I know I keep promising that I’m going to post more frequently, but it’s hard. Life as an engineer at a startup is exhausting. There’s so much work to do! And the work is so fun – it’s easy to let it eat up all of your time.

Anyway… last good-math post ’round these parts was about monoids and programming. Today, I’m going to talk about monads and programming!

If you recall, monoids are an algebraic/categorical construct that, when implemented in a programming language, captures the abstract notion of foldability. For example, if you’ve got a list of numbers, you can fold that list down to a single number using addition. Folding collections of values is something that comes up in a lot of programming problems – capturing that concept with a programming construct allows you to write code that exploits the general concept of foldability in many different contexts.

Monads are a construct that have become incredibly important in functional programming, and they’re very, very poorly understood by most people. That’s a shame, because the real concept is actually simple: a monad is all about the concept of sequencing. A monad is, basically, a container that you can wrap something in. Once it’s wrapped, you can form a sequence of transformations on it. The result of each step is the input to the next. That’s really what it’s all about. And when you express it that way, you can begin to see why it’s such an important concept.

I think that people are confused by monads for two reasons:

1. Monads are almost always described in very, very abstract terms. I’ll also get into the abstract details, but I’ll start by elaborating on the simple description I gave above.
2. Monads in Haskell, which are where most people are introduced to them, are very confusing. The basic monad operations are swamped with tons of funny symbols, and the basic monad operations are named in incredibly misleading ways. (“return” does almost the exact opposite of what you expect return to do!)

In programming terms, what’s a monad?

Basically, a monadic computation consists of three pieces:

1. A monadic type, M which is a parametric type wrapper that can wrap a value of any type.
2. An operation which can wrap a value in M.
3. An operation which takes a function that transforms a value wraped in M into another value (possibly with a different type) wrapped in M.

Whenever you describe something very abstractly like this, it can seem rather esoteric. But this is just a slightly more formal way of saying what I said up above: it’s a wrapper for a series of transformations on the wrapped value.

Let me give you an example. At foursquare, we do all of our server programming in Scala. In a year at foursquare, I’ve seen exactly one null pointer exception. That’s amazing – NPEs are ridiculously common in Java programming. But in Scala at foursquare, we don’t allow nulls to be used at all. If you have a value which could either be an instance of A, or no value, we use an option type. An Option[T] can be either Some(t: T) or None.

So far, this is nice, but not really that different from a null. The main difference is that it allows you to say, in the type system, whether or not a given value might be null. But: Option is a monad. In Scala, that means that I can use map on it. (map is one of the transformation functions!)

val t: Option[Int] = ...
val u: Option[String] = t.map( _ + 2 ).map(_.toString)

What this does is: if t is Some(x), then it adds two to it, and returns Some(x+2); then it takes the result of the first map, and converts it toa string, returning an Option[String]. If t is None, then running map on it always returns None. So I can write code which takes care of the null case, without having to write out any conditional tests of nullness – because optionality is a monad.

In a good implementation of a monad, I can do a bit more than that. If I’ve got a Monad[T], I can use a map-like operation with a function that takes a T and returns a Monad[T].

For an example, we can look at lists – because List is a monad:

val l: List[Int] = List(1, 2, 3)
l.flatMap({ e => List( (e, e), (e+1, e+2) )  })
res0: List[(Int, Int)] = List((1,1), (2,3), (2,2), (3,4), (3,3), (4,5))

The monad map operation does a flatten on the map steps. That means a lot of things. You can see one in the rather silly example above.

You can take values, and wrap them as a list. THen you can perform a series of operations on those elements of a list – sequencing over the elements of the list. Each operation, in turn, returns a list; the result of the monadic computation is a single list, concatenating, in order, the lists returned by each element. In Scala, the flatMap operation captures the monadic concept: basically, if you can flatmap something, it’s a monad.

Let’s look at it a bit more specifically.

2. A function to wrap a value into the monad: the constructor function from List def apply[T](value: T): List[T]
3. The map operation: def flatMap[T, U](op: T => List[U]): List[U].

(In the original version of this post, the I put the wrong type in flatMap in the list above. In the explanation demonstrating flatMap, the type is correct. Thanks to John Armstrong for catching that!)

You can build monads around just about any kind of type wrapper where it makes sense to map over the values that it wraps: collections, like lists, maps, and options. Various kinds of state – variable environments (where the wrapped values are, essentially, functions from identifiers to values), or IO state. And plenty of other things. Anything where you perform a sequence of operations over a wrapped value, it’s a monad.

Now that we have some understanding of what this thing we’re talking about it, what is it in mathematical terms? For that, we turn to category theory.

Fundamentally, in category theory a monad is a category with a particular kind of structure. It’s a category with one object. That category has a collection of arrows which (obviously) are from the single object to itself. That one-object category has a functor from the category to itself. (As a reminder, a functor is an arrow between categories in the category of (small) categories.)

The first trick to the monad, in terms of theory, is that it’s fundamentally about the functor: since the functor maps from a category to the same category, so you can almost ignore the category; it’s implicit in the definition of the functor. So we can almost treat the monad as if it were just the functor – that is, a kind of transition function.

The other big trick is closely related to that. For the programming language application of monads, we can think of the single object in the category as the set of all possible states. So we have a category object, which is essentially the collection of all possible states; and there are arrows between the states representing possible state transitions. So the monad’s functor is really just a mapping from arrows to different arrows – which basically represents the way that changing the state causes a change in the possible transitions to other states.

So what a monad gives us, in terms of category theory, in a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that’s translated into programming languages, that becomes a value that implicitly takes an input state, possibly updates it, and returns an output state. Sound familiar?

Let’s take a moment and get formal. As usual for category theory, first there are some preliminary definitions.

1. Given a category, C, 1C is the identity functor from C to C.
2. Given a category C with a functor T : CC, T2 = T º T.
3. Given a functor T, 1T : TT is the natural transformation from T to T.

Now, with that out of the way, we can give the complete formal definition of a monad. Given a category C, a monad on C is a triple: (T:CC, η:1CT, μ:T2T), where T is a functor, and η and μ are natural transformations. The members of the triple must make the following two diagrams commute.

Commutativity of composition with μ

Commutativity of composition with η

What these two diagrams mean is that successive applications of the state-transition functor over C behave associatively – that any sequence of composing monadic functors will result in a functor with full monadic structure; and that the monadic structure will always preserve. Together, these mean that any sequence of operations (that is, applications of the monad functor) are themselves monad functors – so the building a sequence of monadic state transformers is guaranteed to behave as a proper monadic state transition – so what happens inside of the monadic functor is fine – to the rest of the universe, the difference between a sequence and a single simple operation is indistinguishable: the state will be consistently passed from application to application with the correct chaining behavior, and to the outside world, the entire monadic chain looks like like a single atomic monadic operation.

Now, what does this mean in terms of programming? Each element of a monadic sequence in Haskell is an instantiation of the monadic functor – that is, it’s an arrow between states – a function, not a simple value – which is the basic trick to monads. They look like a sequence of statements; in fact, each statement in a monad is actually a function from state to state. And it looks like we’re writing sequence code – when what we’re actually doing is writing function compositions – so that when we’re done writing a monadic sequence, what we’ve actually done is written a function definition in terms of a sequence of function compositions.

Understanding that, we can now clearly understand why we need the return function to use a non-monad expression inside of a monadic sequence – because each step in the sequence needs to be an instance of the monadic functor; an expression that isn’t an instance of the monadic functor couldn’t be composed with the functions in the sequence. The return function is really nothing but a function that combines a non-monadic expression with the id functor.

In light of this, let’s go back and look at the definition of Monad in the Haskell standard prelude.

class  Functor f  where
fmap              :: (a -> b) -> f a -> f b

(>>=)  :: m a -> (a -> m b) -> m b
(>>)   :: m a -> m b -> m b
return :: a -> m a
fail   :: String -> m a

-- Minimal complete definition:
--      (>>=), return
m >> k  =  m >>= _ -> k
fail s  = error s

The declaration of monad is connected with the definition of functor – if you look, you can see the connection. The fundamental operation of Monad is “>>=” – the chaining operation, which is basically the haskell version of the map operation, which is type m a -> (a -> m b) -> m b is deeply connected with the fmap operation from Functor‘s fmap operation – the a in m a is generally going to be a type which can be a Functor. (Remember what I said about haskell and monads? I really prefer map and flatMap to >> and >>=).

So the value type wrapped in the monad is a functor – in fact, the functor from the category definition! And the “>>=” operation is just the functor composition operation from the monad definition.

There are two simple laws for Functor, and it’s pretty obvious why they’re fundamentally just strucure-preservation requirements. The functor class only has one operation, called fmap, and the two functor laws are about how it must behave.

1. fmap id = id
(Mapping ID over any structured sequence results in an unmodified sequence)
2. fmap (f . g) = (fmap f) . (fmap g)
(“.” is the function composition operation; this just says that fmap preserves the structure to ensure that that mapping is associative with composition.)

The monad laws are a bit harder, but not much. The mainly govern how monadic operations interact with non-monadic operations in terms of the “return” and “>>=” operations of the Monad class.

1. return x >>= f = f x
(injecting a value into the monad is basically the same as passing it as a parameter down the chain – return is really just the identity functor passing its result on to the next step. I hate the use of “return”. In a state functor, in exactly the right context, it does sort-of look like a return statement in an imperative language. But in pretty much all real code, return is the function that wraps a value into the monad.)
2. f >>= return = f
(If you don’t specify a value for a return, it’s the same as just returning the result of the previous step in the sequence – again, return is just identity, so passing something into return shouldn’t affect it.)
3. seq >>= return . f = fmap f seq
(composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad – in other words, it’s just composition with identity.)
4. seq >>= (x -> f x >>= g) = (seq >>= f) >>= g
(Your implementation of “>>=” needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)

# Introducing Algebraic Data Structures via Category Theory: Monoids

Since joining foursquare, I’ve been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our scala code very functionally.

This has increased my interest in category theory in an interesting way. As a programming language geek, I’m obviously fascinated by data structures. Category theory provides a really interesting handle on a way of looking at a kind of generic data structures.

Historically (as much as that word can be used for anything in computer science), we’ve thought about data structures primarily in a algorithmic and structural ways.

For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.

In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they’re built.

For example, there’s a structure called a monoid. A monoid is a very simple idea: it’s an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these objects must satisfy some rules called the monad laws:

1. $forall s in S: s * i = s, i * s = s$
2. $forall x, y, z in S: (x * y) * z = x * (y * z)$

There are some really obvious examples of monoids – like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.

Lists with concatenation and the empty list are a monoid: for any list,
l ++ [] == l, [] + l == l, and concatenation is associative.

Why should we care if data structures like are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different operations. Monoids provide the tools you need to build fold operations. Every kind of fold – that is, operations that collapse a sequence of other operations into a single value – can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.

And that’s where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that’s exactly what you want for talking about algebraic data structures.

The categorical construction of a monoid is, alas, pretty complicated. It’s a simple idea – but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it’s really worth it: when you see a monoidal category, it’s obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.

A monoidal category is a category with a functor, where the functor has the basic properties of a algebraic monoid. So it’s a category C, paired with a bi-functor – that is a two-argument functor ⊗:C×C→C. This is the categorical form of the tensor operation from the algebraic monoid. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They’re called its coherence conditions, and basically, they’re the properties that are needed to make the diagrams that we’re going to use commute.

So – the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.

Associativity is expressed categorically using a natural isomorphism, which we’ll name α. For any three object X, Y, and Z, α includes a component αX,Y,Z (which I’ll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the the two objects on either side of its mappings are equivalent.

The identity property is again expressed via natural isomorphism. The category must include an object I (called the unit), and two natural isomorphisms, called &lamba; and ρ. For any arrow X in C, &lamba; and ρ contain components λX and ρX such that λX maps from I⊗X→X, and ρX maps from X⊗I to X.

Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together – what kinds of properties these pieces need to have for this to – that is, for these definitions to give us a structure that looks like the algebraic notion of monoidal structures, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.

The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.

Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.

Like I said, the definition is a lot more complicated. But look at the diagram: you can see folding in it, in the chains of arrows in the commutative diagram.

# Categorical Equalizers

Category theory is really all about building math using composition. Everything we do, we do it by defining things completely in terms of composition. We’ve seen a lot of that. For example, we defined subclasses (and other sub-things) in terms of composition. It sounds strange, but it took me an amazingly long time to really grasp that. (I learned category theory from some not-very-good textbooks, which were so concerned with the formalisms that they never bothered to explain why any of it mattered, or to build any intuition.)

One thing that’s really important that we haven’t talked about yet is equality. In category theory, we define equality on arrows using something called a pullback. We’ll use that notion of equality for a lot of other things – like natural transformations. But before we can do pullbacks, we need to look at a weaker notion of arrow equality – something called the equalizer of a pair of arrows.

As part of my attempt to be better than those books that I complained about, we’ll start with intuition, by looking at what an equalizer is in terms of sets. Suppose we have two functions $f, g : A rightarrow B$.

Now, in addition, suppose that they have a not-empty intersection: that is, that there’s some set of values $x in A$ for which $f(x) == g(x)$. We can take that set of values, and call it $C$. $C$ is the equalizer of the functions $f$ and $g$. It’s the largest set $C$ where if you restrict the inputs to the functions
to members of $C$, then $f$ and $g$ are equal.

Now, let’s look at the category theoretic version of that. We have objects $A$ and $B$. We have two arrows $f, g : A rightarrow B$. This is the category analogue of the setup of sets and functions from above.

To get to the equalizer, we need to add an object C which is a subobject of A (which corresponds to the subset of A on which f and g agree in the set model).

The equalizer of $A$ and $B$ is the pair of the object $C$, and an arrow $i : C rightarrow A$. (That is, the object and arrow that define C as a subobject of A.) This object and arrow must satisfy the following conditions:

1. $f circ i = g circ i$
2. $(forall j: D rightarrow A)$ if $f circ j = g circ j$ then $(exists 1 k : D rightarrow C)$ such that $i circ k = j$

That second one is the mouthful. What it says is:

• Suppose that I have any arrow j from some other object D to A:
• if f and g agree on composition about j, then there can only be one unique arrow from C to D which composes with j to get to A.

In other words, (C, i) is a selector for the arrows on which A and B agree; you can only compose an arrow to A in a way that will compose equivalently with f and g to B if you go through (C, i).

Or in diagram form, k in the following diagram is necessarily unique:

There are a couple of interesting properties of equalizers that are worth mentioning. The morphism in an equalizer is a *always* monic arrow (monomorphism); and if it’s epic (an epimorphism), then it must also be iso (an isomorphism).

The pullback is very nearly the same construction as the equalizer we just looked at; except it’s abstracting one step further.

Suppose we have two arrows pointing to the same target: $f : B rightarrow A$ and $g : C rightarrow A$. Then the pullback of of f and g is the triple of an object and two arrows $(B times_A C, p : B times_A C rightarrow B, q : Btimes_A C rightarrow C)$. The elements of this triple must meet the following requirements:

1. $f circ p = g circ q$
2. $(f circ p) : Btimes_A C rightarrow A$
3. for every triple $(D, h : D rightarrow B , k : D rightarrow C)$, there is exactly one unique arrow $langle h,k rangle_A : D rightarrow B times_A C$ where $p circ langle h,k rangle_A = h$, and ${ (x,y) in A times B : f(x) = g(y) }$

Right back where we started, almost. The pullback is an equalizer; working it back shows that.

# Substuff

What’s a subset? That’s easy: if we have two sets A and B, A is a subset of B if every member of A is also a member of B.

We can take the same basic idea, and apply it to something which a tad more structure, to get subgroups. What’s a subgroup? If we have two groups A and B, and the values in group A are a subset of the values in group B, then A is a subgroup of B.

The point of category theory is to take concepts like “subset” and generalize them so that we can apply the same idea in many different domains. In category theory, we don’t ask “what’s a subset?”. We ask, for any structured THING, what does it mean to be a sub-THING? We’re being very general here, and that’s always a bit tricky. We’ll start by building a basic construction, and look at it in terms of sets and subsets, where we already understand the specific concept.

In terms of sets, the most generic way of defining subsets is using functions. Suppose we have a set, A. How can we define all of the subsets of A, in terms of functions? We can do it using injective functions, as follows. (As a reminder, a function from X to Y where every value in X is mapped to a distinct function in y.)

For the set, A, we can take the set of all injective functions to A. We’ll call that set of functions Inj(A).

Given Inj(A), we can define equivalence classes over Inj(A), so that $f: X rightarrow A$ and $g: Y rightarrow A$ are equivalent if there is an isomorphism between X and Y.

The domain of each function in one of the equivalence classes in Inj(A) is a function isomorphic to a subset of A. So each equivalence class of injective functions defines a subset of A.

And there we go: we’ve got a very abstract definition of subsets.

Now we can take that, and generalize that function-based definition to categories, so that it can define a sub-object of any kind of object that can be represented in a category.

Before we jump in, let me review one important definition from before; the monomorphism, or monic arrow.

A monic arrow is an arrow $f : a rightarrow b$ such that
$f$ in $f circ g$ end up at the same object only if they are the same.)

So, basically, the monic arrow is the category theoretic version of an injective function. We’ve taken the idea of what an injective function means, in terms of how functions compose, and when we generalized it, the result is the monic arrow.

Suppose we have a category $C$, and an object $a in mbox{Obj}(C)$. If there are are two monic arrows $f : x rightarrow a$ and $g : y rightarrow a$, and
there is an arrow $h$ such that $g circ h = f$, then we say $f le g$ (read “f factors through g”). Now, we can take that “≤” relation, and use it to define an equivalence class of morphisms using $f equiv g LeftRightArrow f le g land g le f$.

What we wind up with using that equivalence relation is a set of equivalence classes of monomorphisms pointing at A. Each of those equivalence classes of morphisms defines a subobject of A. (Within the equivalence classes are objects which have isomorphisms, so the sources of those arrows are equivalent with respect to this relation.) A subobject of A is the sources of an arrow in one of those equivalence classes.

It’s exactly the same thing as the function-based definition of sets. We’ve created a very general concept of sub-THING, which works exactly the same way as sub-sets, but can be applied to any category-theoretic structure.

# Interpreting Lambda Calculus using Closed Cartesian Categories

Today I’m going to show you the basic idea behind the equivalency of closed cartesian categories and typed lambda calculus. I’ll do that by showing you how the λ-theory of any simply typed lambda calculus can be mapped onto a CCC.

First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of base types – the types of simple atomic values that can appear in lambda expressions. A lambda theory is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.

So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:

1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’

So. Suppose we have a lambda-theory $L$. We can construct a corresponding category $C(L)$. The objects in $C(L)$ are the types in $L$. The arrows in $C(L)$ correspond to families of expressions in $L$; an arrow
$f : A rightarrow B$ corresponds to the set of expressions of type $B$ that contain a single free variable of type $A$.

The semantics of the lambda-theory can be defined by a functor; in particular, a cartesian closed functor $F$ that maps from $C(L)$ to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is much simpler.)

We describe how we build the category for the lambda theory in terms of a CCC using something called an interpretation function. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: $[A]$ is the categorical interpretation of the type $A$ from lambda calculus.

So, first, we define an object for each type in $L$. We need to include a special
type, which we call unit. The idea behind unit is that we need to be able to talk about “functions” that either don’t take any real paramaters, or functions that don’t return anything. Unit is a type which contains exactly one atomic value. Since there’s only one possible value for unit, and unit doesn’t have any extractable sub-values, conceptually, it doesn’t ever need to be passed around. So it’s a “value” that never needs to get passed – perfect for a content-free placeholder.

Anyway, here we go with the base rules:

• $forall A in mbox{basetypes}(L), [A] = A_C in C(L)$
• $[mbox{unit}] = 1_C$

Next, we need to define the typing rules for complex types:

• $[ A times B] == [A] times [B]$
• $[A rightarrow B] = [B]^{[A]}$

Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Just like we did in lambda calculus, the type derivations are done with a context $Gamma$, containing a set of type judgements. Each type judgement assigns a type to a lambda term. There are two translation rules for contexts:

• $[ emptyset ] = 1_C$
• $[Gamma, x: A] = [Gamma] times [A]$

We also need to describe what to do with the values of the primitive types:

• For each value $v : A$, there is an arrow $v : 1 rightarrow A_C$.

And now the rest of the rules. Each of these is of the form $[Gamma :- x : A] = mbox{arrow}$, where we’re saying that $Gamma$ entails the type judgement $x : A$. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in $C(L)$.

• Unit evaluation: $[ Gamma :- mbox{unit}: mbox{Unit}] = !: [Gamma] rightarrow [mbox{Unit}]$. (A unit expression is a special arrow “!” to the unit object.)
• Simple Typed Expressions: $[Gamma :- a: A_C] = a circ ! : [Gamma] rightarrow [A_C]$. (A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)
• Free Variables: $[Gamma x: A :- x : A] = pi_2 : ([Gamma] times [A]) rightarrow [A]$ (A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)
• Inferred typed expressions: $[Gamma, x:A :- x$, where $pi_1: ([Gamma] times [A])rightarrow [A$ (If the type rules of Γ plus the judgement $x : A$ gives us $x$, then the term $x$ is an arrow starting from the product of the interpretation of the full type context with $A$), and ending at $A$. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type $A$.)
• Function Abstraction: $[Gamma :- lambda x:A . M : A rightarrow B] = mbox{curry}([Gamma, x:A :- M:B]) : [Gamma] rightarrow B^{[A]}$. (A function maps to an arrow from the type context to an exponential $[B]^{[A]}$, which is a function from $A$ to $B$.)
• Function application: $[Gamma :- (M M$, $[Gamma :- (M M$, $[Gamma :- M$. (function evaluation takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)

There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.

The intuition behind this is:

• arrows between types are families of values. A particular value is a particular arrow from unit to a type object.
• the categorical exponent in a CC is exactly the same thing as a function type in λ-calculus; and an arrow to an exponent is the same thing as a function value. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; that arrow is the value that the function evaluates to.
• And the semantics – called functorial semantics – maps from the objects in this category, $C(L)$ to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object in $C(L)$, and the set of natural numbers in the sets category would be the target of the functor.)

Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the Categorical Abstract Machine. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), Objective-CAML had its first implementation based on the CAM. (CAML stands for categorical abstract machine language.).

From this, you can see how the CCCs and λ-calculus are related. It turns out that that relation is not just cool, but downright useful. Concepts from category theory – like monads, pullbacks, and functors are really useful things in programming languages! In some later posts, I’ll talk a bit about that. My current favorite programming language, Scala, is one of the languages where there’s a very active stream of work in applying categorical ideas to real-world programming problems.

# Sidetrack from the CCCs: Lambda Calculus

So, last post, I finally defined closed cartesian categories. And I alluded to the fact that the CCCs are, essentially, equivalent to the simply typed λ calculus. But I didn’t really talk about what that meant.

Before I can get to that, you need to know what λ calculus is. Many readers are probably familiar, but others aren’t. And as it happens, I absolutely love λ calculus.

In computer science, especially in the field of programming languages, we tend to use λ calculus a whole lot. It’s also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. λ calculus is great for a lot of reasons, among them:

1. It’s very simple.
2. It’s Turing complete: if a function can be computed by any possible computing device, then it can be written in λ-calculus.
3. It’s easy to read and write.
4. Its semantics are strong enough that we can do reasoning from it.
5. It’s got a good solid model.
6. It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing λ calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the λ calculus: Lisp, ML, Haskell, and my current favorite, Scala, are very strongly λ calculus based.

The λ calculus is based on the concept of functions. In the pure λ calculus, everything is a function; there are no values except for functions. In fact, we can pretty much build up all of mathematics using λ-calculus.

With the lead-in out of the way, let’s dive in a look at λ-calculus. To define a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

### Lambda Calculus Syntax

The λ calculus has exactly three kinds of expressions:

1. Function definition: a function in λ calculus is an expression, written: λ param . body, which defines a function with one parameter.
2. Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
3. Function application: applying a function is written by putting the function value in front of its parameter, as in x y to apply the function x to the value y.

There’s a trick that we play in λ calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that functions are, themselves, values. Instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function, which can then operate on the second parameter. In the end, it’s effectively the same thing as a two parameter function. Taking a two-parameter function, and representing it by two one-parameter functions is called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.

Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter: λ x . (λ y . x + y).

Now that we know that adding multiple parameter functions doesn’t really add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.

One important syntactic issue that I haven’t mentioned yet is closure or complete binding. For a λ calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it a parameter in an enclosing λ expression; if an identifier is not bound in any enclosing context, then it is called a free variable. Let’s look quickly at a few examples:

• λ x . p x y: in this expression, y and p are free, because they’re not the parameter of any enclosing λ expression; x is bound because it’s a parameter of the function definition enclosing the expression p x y where it’s referenced.
• λ x y.y x: in this expression both x and y are bound, because they are parameters of the function definition, and there are no free variables.
• λ y . (λ x . p x y). This one is a tad more complicated, because we’ve got the inner λ. So let’s start there. In the inner λ, λ x . p x y, y and p are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner λ, and y is bound by the other λ. “p” is still free.

We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.

A λ calculus expression is valid (and thus evaluatable) only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.

### Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in λ calculus; they’re called α and β. α is also called “conversion”, and β is also called “reduction”.

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in λ calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So – for instance, if we had an expression like:

λ x . if (= x 0) then 1 else x^2

We can do an α to replace X with Y (written “α[x/y]” and get):

λ y . if (= y 0) then 1 else y^2

Doing α does not change the meaning of the expression in any way. But as we’ll see later, it’s important because without it, we’d often wind up with situations where a single variable symbol is bound by two different enclosing λs. This will be particularly important when we get to recursion.

β reduction is where things get interesting: this single rule is all that’s needed to make the λ calculus capable of performing any computation that can be done by a machine.

β basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.

Suppose we have the application expression: (λ x . x + 1) 3. By performing a beta reduction, we can replace the application by taking the body x + 1 of the function, and substituting (or αing) the value of the parameter (3) for the parameter variable symbol (x). So we replace all references to x with 3. So the result of doing a beta reduction xs 3 + 1.

A slightly more complicated example is the expression:

λ y . (λ x . x + y)) q

It’s an interesting expression, because it’s a λ expression that when applied, results in another λ expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter y with the identifier q; so, the result is λ x . x + q.

One more example, just for the sake of being annoying. Suppose we have: (λ x y. x y) (λ z . z * z) 3

That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter x in the body of the first function with λ z . z * z; and we replace the parameter y with 3, getting: (λ z . z * z) 3. And we can perform beta on that, getting 3 * 3.

Written formally, beta says: λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e])

That condition on the end, “if free(e) ⊂ free(B[x := e]” is why we need α: we can only do beta reduction if doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.

As usual, an example will make that clearer: Suppose we have a expression defining a function, λ z . (λ x . x+z). Now, suppose we want to apply it: (λ z . (λ x . x + z)) (x + 2). In the parameter (x + 2), x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “λ x . x + x + 2“. The variable that was free in x + 2 is now bound! We’ve changed the meaning of the function, which we shouldn’t be able to do. If we were to apply that function after the incorrect β, we’d get (λ x . x + x + 2) 3. By beta, we’d get 3 + 3 + 2, or 8.

What if we did α the way we were supposed to?

First, we’d do an α to prevent the name overlap. By α[x/y], we would get λ z . (λ y . y + z) (x+2).

Then by β, we’d get “λ y . y + x + 2“. If we apply this function the way we did above, then by β, we’d get 3+x+2.
3+x+2 and 3+3+2 are very different results!

And that’s pretty much it. There’s another optional rule you can add called η-conversion. η is a rule that adds extensionality, which provides a way of expressing equality between functions.

η says that in any λ expression, I can replace the value f with the value g if/f for all possible parameter values x, f x = g x.

What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.

It’l also important to point out that while I’ve gone through a basic definition of λ calculus, and described its mechanics, I haven’t yet defined a model for λ-calculus. That’s quite an important omission! λ-calculus was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although it looked correct, the early attempts to define a model for it were failures! And without a valid model, the results of the system are meaningless. An invalid model in a logical system like calculus is like a contradiction in axioms: it means that nothing that it produces is valid.

# Categorical Computation Characterized By Closed Cartesian Categories

One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I’m a computer scientist/software engineer, it’s a natural: CCCs are, basically, the categorical structure of lambda calculus – and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need – what else? – more definitions.

### Cartesian Categories

A cartesian category $C$ (note not cartesian closed category) is a category:

1. With a terminal object $t$, and
2. $forall a, b in Obj(C)$, the objects and arrows of the categorical product $a times b in C$.

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, And of course, the meaning of the categorical product in set? Cartesian product of sets.

### Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to included in a category. The exponential is a complicated definition, and it’s a bit hard to really get your head around, but it’s well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of “all functions from A to B”.

Given two objects x and y from a category C, their categorical exponential xy, if it exists in the category, is defined by a set of values:

• An object $x^y$,
• An arrow $mbox{eval}_{y,x}: x^y times y rightarrow x$, called an evaluation map.
• $forall z in Obj(C)$, an operation $Lambda_C: (z times y rightarrow x) rightarrow (z rightarrow x^y)$. (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

1. $forall f : z times y rightarrow x, g : z rightarrow x^y$:
• $mbox{val}_{y,x} circ (Lambda_C(f)times 1_y)$
• $forall f : z times y rightarrow x, g : z rightarrow x^y: Lambda_C(mbox{eval}_{y,x} circ (z times 1_y) = z$

To make that a bit easier to understand, let’s turn it into a diagram.

As I alluded to earlier, you can also think of it as a generalization of a function space.$x^y$ is the set of all functions from y to x. The evaluation map is simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in an a value from b.

So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then $X^Y$ is the set of functions from Y to X.

Now, look at the diagram:

1. The top part says, basically, that $g$ is a function from $Z$ to to $X^Y$: so $g$ takes a member of $Z$, and uses it to select a function from $Y$ to $X$.
2. The vertical arrow says:
1. given the pair $(z,y)$, $f(z,y)$ maps $(z,y)$ to a value in $X$.
2. given a pair $(z,y)$, we’re going through a function. It’s almost like currying:
1. The vertical arrow going down is basically taking $g(z,y)$, and currying it to $g(z)(y)$.
2. Per the top part of the diagram, $g(z)$ selects a function from $y$ to $x$. (That is, a member of $X^Y$.)
3. So, at the end of the vertical arrow, we have a pair $(g(z), y)$.
3. The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.

### Cartesian Closed Categories

Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.

With exponentials and products, we can finally say what the cartesian closed categories (CCCs). A Cartesian closed category is a category that is closed with respect to both products and exponentials.

Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system. So next, we’ll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

# Building Structure in Category Theory: Definitions to Build On

The thing that I think is most interesting about category theory is that what it’s really fundamentally about is structure. The abstractions of category theory let you talk about structures in an elegant way; and category diagrams let you illustrate structures in a simple visual way. Morphisms express the structure of a category; functors are higher level morphisms that express the structure of relationships between categories.

In my last category theory post, I showed how you can use category theory to describe the basic idea of symmetry and group actions. Symmetry is, basically, an immunity to transformation – that is, a kind of structural property of an object or system where applying some kind of transformation to that object doesn’t change the object in any detectable way. The beauty of category theory is that it makes that definition much simpler.

Symmetry transformations are just the tip of the iceberg of the kinds of structural things we can talk about using categories. Category theory lets you build up pretty much any mathematical construct that you’d like to study, and describe transformations on it in terms of functors. In fact, you can even look at the underlying conceptual structure of category theory using category theory itself, by creating a category in which categories are objects, and functors are the arrows between categories.

So what happens if we take the same kind of thing that we did to get group actions, and we pull out a level, so that instead of looking at the category of categories, focusing on arrows from the specific category of a group to the category of sets, we do it with arrows between members of the category of functors?

We get the general concept of a natural transformation. A natural transformation is a morphism from functor to functor, which preserves the full structure of morphism composition within the categories mapped by the functors. The original inventor of category theory said that natural transformations were the real point of category theory – they’re what he wanted to study.

Suppose we have two categories, C and D. And suppose we also have two functors, F, G : C → D. A natural transformation from F to G, which we’ll call η maps every object x in C to an arrow ηx : F(x) → G(x). ηx has the property that for every arrow a : x → y in C, ηy º F(a) = G(a) º ηx. If this is true, we call ηx the component of η for (or at) x.

That paragraph is a bit of a whopper to interpret. Fortunately, we can draw a diagram to help illustrate what that means. The following diagram commutes if η has the property described in that paragraph.

I think this is one of the places where the diagrams really help. We’re talking about a relatively straightforward property here, but it’s very confusing to write about in equational form. But given the commutative diagram, you can see that it’s not so hard: the path ηy º F(a) and the path G(a) º η<sub compose to the same thing: that is, the transformation η hasn’t changed the structure expressed by the morphisms.

And that’s precisely the point of the natural transformation: it’s a way of showing the relationships between different descriptions of structures – just the next step up the ladder. The basic morphisms of a category express the structure of the category; functors express the structure of relationships between categories; and natural transformations express the structure of relationships between relationships.

Of course, this being a discussion of category theory, we can’t get any further without some definitions. To get to some of the interesting material that involves things like natural transformations, we need to know about a bunch of standard constructions: initial and final objects, products, exponentials… Then we’ll use those basic constructs to build some really fascinating constructs. That’s where things will get really fun.

An initial object is a pretty simple idea: it’s an object with exactly one arrow to each of the other objects in the category. To be formal, given a category $C$, an object $o \in Obj(C)$ is an initial object if and only if $\forall b \in Obj(c): \exists_1 f: o \rightarrow b \in Mor(C)$. We generally write $0_c$ for the initial object in a category. Similarly, there’s a dual concept of a terminal object $1_c$, which is object for which there’s exactly one arrow from every object in the category to $1_c$.

Given two objects in a category, if they’re both initial, they must be isomorphic. It’s pretty easy to prove: here’s the sketch. Remember the definition of isomorphism in category theory. An isomorphism is an arrow $f : a \rightarrow b$, where $\exists g : b \rightarrow a)$ such that $f \circ g = 1_b$ and $g \circ f = 1_a$. If an object is initial, then there’s an arrow from it to every other object — including the other initial object. And there’s an arrow back, because the other one is initial. The iso-arrows between the two initials obviously compose to identities.

Now, let’s move on to categorical products. Categorical products define the product of two objects in a category. The basic concept is simple – it’s a generalization of cartesian product of two sets. It’s important because products are one of the major ways of building complex structures using simple categories.

Given a category $C$, and two objects $a,b \in Obj(C)$, the categorical product $a times b$ consists of:

• An object $p$, often written $a \times b$;
• two arrows $p_a$ and $p_b$, where $p \in Obj(C)$, $p_a : p \rightarrow a$, and $p_b : p \rightarrow b$.
• a “pairing” operation, which for every object $c \in C$, maps the pair of arrows $f : c \rightarrow a$ and
$g : c \rightarrow b$ to an arrow $\langle f,g \rangle : c \rightarrow a\times b$, where $\langle f, g \rangle$ has the
following properties:

1. $p_a \circ \langle f,g \rangle = f$
2. $p_b \circ \langle f,g \rangle = g$
3. $\forall h : c \rightarrow a \times b: \langle p_a \circ h, p_b \circ h \rangle = h$

The first two of those properties are the separation arrows, to get from the product to its components; and the third is the merging arrow, to get from the components to the product. We can say the same thing about the relationships in the product in an easier way using a commutative diagram:

One important thing to understand is that categorical products do not have to exist. This definition doen not say that given any two objects $a$ and $b$, that $a times b$ is a member of the category. What it says is what the categorical product
looks like if it exists. If, for a given pair a and b of objects, there is an object that meets this definition, then the product of a and b exists in the category. If not, it doesn’t. For many categories, the products don’t exist for some or even all of the objects in the category. But as we’ll see later, the categories for which the products do exist have some really interesting properties.

# Fun with Functors

So far, we’ve looked at the minimal basics of categories: what they are, and how to categorize the kinds of arrows that exist in categories in terms of how they compose with other arrows. Just that much is already enlightening about the nature of category theory: the focus is always on composition.

But to get to really interesting stuff, we need to build up a bit more, so that we can look at more interesting constructs. So now, we’re going to look at functors. Functors are one of the most fundamental constructions in category theory: they give us the ability to create multi-level constructions.

What’s a functor? Well, it’s basically a structure-preserving mapping between categories. So what does that actually mean? Let’s be a bit formal:

A functor $F$ from a category $C$ to a category $D$ is a mapping from $C$ to $D$ that:

• Maps each member $m in Obj(C)$ to an object $F(m) in Obj(D)$.
• Maps each arrow $a : x rightarrow y in Mor(C)$ to an arrow $F(a) : F(x) rightarrow F(y)$, where:
• $forall o in Obj(C): F(1_o) = 1_{F(o)}$. (Identity is preserved by the functor mapping of morphisms.)
• $forall m,n in Mor(C): F(n circ o) = F(n) circ F(o)$. (Commutativity is preserved by the Functor mapping of morphisms.)

Note: The original version of this post contained a major typo. In the second condition on functors, the “n” and the “o” were reversed. With them in this direction, the definition is actually the definition of something called a covariant functor. Alas, I can’t even pretend that I mixed up covariant and contravariant functors; the error wasn’t nearly so intelligent. I just accidentally reversed the symbols, and the result happened to make sense in the wrong way.

That’s the standard textbook gunk for defining a functor. But if you look back at the original definition of a category, you should notice that this looks familiar. In fact, it’s almost identical to the definition of the necessary properties of arrows!

We can make functors much easier to understand by talking about them in the language of categories themselves. Functors are really nothing but morphisms – they’re morphisms in a category of categories.

There’s a kind of category, called a small category. (I happen to dislike the term “small” category, but I don’t get a say!) A small category is a category whose collections of objects and arrows are sets, not proper classes.

(As a quick reminder: in set theory, a class is a collection of sets that can be defined by a non-paradoxical property that all of its members share. Some classes are sets of sets; some classes are not sets; they lack some of the required properties of sets – but still, the class is a collection with a well-defined, non-paradoxical, unambiguous property. If a class isn’t a set of sets, but just a collection that isn’t a set, then it’s called a proper class.)

Any category whose collections of objects and arrows are sets, not proper classes, are called small categories. Small categories are, basically, categories that are well-behaved – meaning that their collections of objects and arrows don’t have any of the obnoxious properties that would prevent them from being sets.

The small categories are, quite beautifully, the objects of a category called Cat. (For some reason, category theorists like three-letter labels.) The arrows of Cat are all functors – functors really just morphisms between categories. Once you wrap you head around that, then the meaning of a functor, and the meaning of a structure-preserving transformation become extremely easy to understand.

Functors come up over and over again, all over mathematics. They’re an amazingly useful notion. I was looking for a list of examples of things that you can describe using functors, and found a really wonderful list on wikipedia.. I highly recommend following that link and taking a look at the list. I’ll just mention one particularly interesting example: groups and group actions.

If you’ve been reading GM/BM for a very long time, you’ll remember my posts on group theory. In a very important sense, the entire point of group theory is to study symmetry. But working from a set theoretic base, it takes a lot of work to get to the point where you can actually define symmetry. It took many posts to build up the structure – not to present set theory, but just to present the set theoretic constructs that you need to define what symmetry means, and how a symmetric transformation was nothing but a group action. Category theory makes that so much easier that it’s downright dazzling. Ready?

Every group can be represented as a category with a single object. A functor from the category of a group to the category of Sets is a group action on the set that is the target of the functor. Poof! Symmetry.

Since symmetry means structure-preserving transformation; and a functor is a structure preserving transformation – well, they’re almost the same thing. The functor is an even more general abstraction of that concept: group symmetry is just one particular case of a functor transformation. Once you get functors, understanding symmetry is easy. And so are lots of other things.

And of course, you can always carry these things further. There is a category of functors themselves; and notions which can be most easily understood in terms of functors operating on the category of functors!

This last bit should make it clear why category theory is affectionately known as abstract nonsense. Category theory operates at a level of abstraction where almost anything can be wrapped up in it; and once you’ve wrapped something up in a category, almost anything you can do with it can itself be wrapped up as a category – levels upon levels, categories of categories, categories of functors on categories of functors on categories, ad infinitum. And yet, it makes sense. It captures a useful, comprehensible notion. All that abstraction, to the point where it seems like nothing could possibly come out of it. And then out pops a piece of beautiful crystal. It’s really remarkable.