Category Archives: Category Theory

Towards a Model for Linear Logic: Monoidal Categories

Time to come back to category theory from our side-trip. Category theory provides a good framework for defining linear logic – and for building a Curry-Howard style type system for describing computations with *state* that evolves over time. Category theory provides a way of defining a valid *model* for the part of linear logic (the multiplicative operators) that isn’t tractable using other modeling techniques.
I’m going to show you the relationship between models for linear logic and category theory. It’s going to take a couple of days to go through the whole shebang, but even the parts that we need to go through to get there are fun.
The first step is to define a *monoidal category*, also known as a *tensor* category. We’ve already done most of that when we built [monads][monads] earlier this week; a monad is a closely related structure (a monoid in a monoidal category of endofunctors).
A monoidal category is a category C with a distinguished *unit object* I, and a *binary functor* ⊗ : C × C → C. This binary functor is called *tensor*. Tensor has three required properties defined using *natural isomorphisms*, called α, λ, and ρ.
α says that tensor must be associative: α(A,B,C) : (A ⊗ B) ⊗ C → A ⊗ (B ⊗ C).
λ says that tensor has a *left identity*: λA : (I ⊗ A) → A.
ρ says that tensor has a *right identity*, which is the same object as the left identity: ρA : (A ⊗ I) → A.
And finally, the natural transformations need to make the following diagrams commute for all values of A, B, and C. These are known as the *coherence conditions* for the monoidal natural transformations.
monoid.jpg
A monoidal category is said to be *strict* if α, λ, and ρ are all identities. It turns out that every monoidal category is *equivalent* (in the sense of natural isomorphism) to a strict monoidal category.
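To make α, λ, and ρ concrete, here is a minimal Haskell sketch. It assumes we treat Haskell types as objects, pairs `(a, b)` as the tensor, and `()` as the unit object; that reading (and every name in the block) is an illustration of mine, not part of the definition above. It also checks one coherence condition, the triangle identity, on a sample value.

```haskell
-- Associator: regroup nested pairs. alphaInv is its inverse.
alpha :: ((a, b), c) -> (a, (b, c))
alpha ((a, b), c) = (a, (b, c))

alphaInv :: (a, (b, c)) -> ((a, b), c)
alphaInv (a, (b, c)) = ((a, b), c)

-- Left and right unitors: drop the unit component.
lambda :: ((), a) -> a
lambda ((), a) = a

rho :: (a, ()) -> a
rho (a, ()) = a

-- The tensor acting on arrows (it is a functor in both arguments).
tensor :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
tensor f g (a, b) = (f a, g b)

-- Triangle coherence condition, checked at one point:
-- (id ⊗ lambda) . alpha  ==  rho ⊗ id
triangleHolds :: (Int, Char) -> Bool
triangleHolds (a, b) =
  tensor id lambda (alpha ((a, ()), b)) == tensor rho id ((a, ()), b)
```

Pairs are not a strict tensor: `((a, b), c)` and `(a, (b, c))` are different types, so α here is a genuine isomorphism rather than an identity.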
And now, here comes an example of just why category theory is useful. In some of the [detailed models of quantum physics][quantum-condense], they try to describe the structure of different kinds of matter using what they call *topological orders*. The standard theory for describing the orders of different states of matter is called *Landau* theory. It turns out that Landau theory doesn’t describe the topological order of high temperature superconductors or very-low-temperature condensate states. Category theory – in particular, the theory surrounding strict monoidal categories – does a better job of describing the topological order of the different states of matter than any other mathematical approach that’s been tried so far.
[monads]: http://scienceblogs.com/goodmath/2006/07/monads_and_programming_languag_1.php
[quantum-condense]: http://dao.mit.edu/~wen/pub/qorder.html

Linear Logic

[Monday][yesterday], I said that I needed to introduce the sequent calculus, because it would be useful for describing things like linear logic. Today we’re going to take a quick look at linear logic – in particular, at *propositional* linear logic; you can expand to predicate linear logic in a way very similar to the way we went from propositional logic to first order predicate logic.
So first: what the heck *is* linear logic?
The answer is, basically, logic where statements are treated as *resources*. So using a statement in an inference step in linear logic *consumes* the resource. This is a strange notion if you’re coming from a regular predicate logic. For example, in regular predicate logic, if we have the statements: “A”, “A ⇒ B”, and “A ⇒ C”, we know that we can conclude “B ∧ C”. In linear logic, that’s not true: using either implication statement would *consume* the “A”. So we could infer “B”, or we could infer “C”, but we could *not* infer both.
When people talk about linear logic, and why it makes sense, they almost always use a vending machine analogy. Suppose I walk up to a vending machine, and I want to buy a soda and a candy bar. I’ve got 8 quarters in my pocket; the soda costs $1.50; the candy bar costs $.75.
In linear logic, I’d say something like the following (the syntax is wrong, but we’ll get to syntax later): (Q,Q,Q,Q,Q,Q,Q,Q), (Q,Q,Q,Q,Q,Q) ⇒ Soda, (Q,Q,Q) ⇒ Candy.
Using the rules, I can buy a soda by “spending” 6 of my Qs. I wind up with “(Q,Q) ∧ Soda”, and “(Q,Q,Q) ⇒ Candy”. I’ve consumed 6 Qs, and I’ve consumed the “(Q,Q,Q,Q,Q,Q) ⇒ Soda” implication. I can’t do anything else; I don’t have enough Qs.
The basic statements in linear logic, with their intuitive meanings, are:
1. A ⊗ B. This is called *multiplicative conjunction*, also known as *simultaneous occurrence*. This means that I definitely have both A and B. This has an *identity unit* called “1”, such that A ⊗ 1 ≡ 1 ⊗ A ≡ A. 1 represents the idea of the absence of any resource.
2. A & B : *additive conjunction*, aka *internal choice*. I can have either A *or* B, and I get to pick which one. The unit is ⊤, pronounced “top”, and represents a sort of “I don’t care” value.
3. A ⊕ B. This is called *additive disjunction*, also known as *external choice*. It means that I get either A or B, but I don’t get to pick which one. The unit here is 0, and represents the lack of an outcome.
4. A ⅋ B : *multiplicative disjunction*, aka *parallel occurrence*; I *need to have* both A and B at the same time. The unit for this is ⊥, pronounced “bottom”, and represents the absence of a goal. In the vending machine metaphor, think of it as the “cancel” button on the vending machine: I decided I don’t want any, so I’m not going to spend my resources.
5. A -o B : Linear implication. Consume resource A to *produce* resource B. The normal symbol for this looks like an arrow with a circle instead of an arrowhead; this operator is often called “lolly” because of what the normal symbol looks like. I’m stuck writing it as “-o”, because there’s no HTML entity for the symbol.
6. !A : Positive exponentiation, pronounced “Of course A”. This *produces* an arbitrary number of As. Equivalent to A ⊗ !A.
7. ?A : Negative exponentiation, pronounced “Why not A?”. This *consumes* As.
Ok. So, suppose I want to talk about buying lunch. I’ve got 10 dollars to buy lunch. I’ll be full if I have a salad, a coke, and a tuna sandwich. I’ll write “I’ve got a dollar” as “D”, “I have a salad” as “S”, “I have a coke” as “C”, “I have a tuna sandwich” as “T”, and finally, “I’m full” as “F”. Then:
* I can write “I have 10 dollars” in LL as: “(D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D⊗ D ⊗ D)”.
* I can write “Tuna sandwich and salad and coke” as a group of things that I want to have all of as: “T ⅋ S ⅋ C”.
* I can say that I’ll be full if I have lunch as “T ⅋ S ⅋ C -o F”
If I want to talk about buying lunch, I can describe the prices of the things I want using implication:
* A coke costs one dollar: “D -o C”; I can spend one dollar, and in return I get one coke.
* A salad costs 3 dollars: “(D ⊗ D ⊗ D) -o S”
* A tuna sandwich also costs three dollars: “(D ⊗ D ⊗ D) -o T”
Now, I can do some reasoning with these.
* By taking 1 of the dollars, I can get one C. That leaves me with “D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D⊗ D ⊗ C”
* By taking 3 D, I can get one S. “D ⊗ D ⊗ D ⊗ D ⊗ D⊗ D ⊗ C ⊗ S”.
* By taking 3 D, I can get one T. “D ⊗ D⊗ D ⊗ C ⊗ S ⊗ T”.
* Now I’ve got my lunch. I can eat it and be full, with three dollars left: “D ⊗ D⊗ D ⊗ F”.
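The lunch derivation above can be sketched in Haskell by treating the resources as a multiset (here, just a list) and each linear implication as a rule that removes what it needs and adds what it produces. This is only a toy model under my own assumptions: the types `Res` and `Rule` and the functions `consume` and `apply` are hypothetical names, and unlike real linear logic the rules themselves are not consumed when used.

```haskell
import Data.List (delete)

-- The atomic resources from the lunch example.
data Res = D | C | S | T | F
  deriving (Eq, Show)

-- A linear implication: consume everything in 'needs', produce 'gives'.
data Rule = Rule { needs :: [Res], gives :: Res }

-- Remove each needed resource exactly once; Nothing if one is missing.
consume :: [Res] -> [Res] -> Maybe [Res]
consume [] have = Just have
consume (n : ns) have
  | n `elem` have = consume ns (delete n have)
  | otherwise     = Nothing

-- Apply a rule to the resources we currently hold.
apply :: Rule -> [Res] -> Maybe [Res]
apply (Rule ns g) have = fmap (g :) (consume ns have)

coke, salad, tuna, lunch :: Rule
coke  = Rule [D] C
salad = Rule [D, D, D] S
tuna  = Rule [D, D, D] T
lunch = Rule [T, S, C] F

-- Start with ten dollars and replay the four steps of the derivation.
buyLunch :: Maybe [Res]
buyLunch =
  Just (replicate 10 D) >>= apply coke >>= apply salad >>= apply tuna >>= apply lunch
```

Evaluating `buyLunch` gives `Just [F,D,D,D]`: full, with three dollars left. Trying `apply salad [D, D]` gives `Nothing`, which is the "not enough resources" case.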
Just from this trivial example, you should be able to see why linear logic is cool: the idea of being able to talk about *how* resources are used in an inference process or a computation is really valuable, and linear logic gives you the ability to really work with the concept of resource in a solid, formal way. If you think of it in terms of the Curry-Howard isomorphism [types-as-proofs concept from the simply typed lambda calculus][types-as-proofs], you can imagine using linear logic for types of values that are *consumed* by a computation – i.e., they’re no longer available once they’ve been used.
I’m going to adopt a slightly different format for the sequents for working in linear logic. The way that I produced the center bars in yesterday’s post was really painful to write, and didn’t even end up looking particularly good. So, the way that I’m going to write the sequents in this post is to wrap the “top” and “bottom” of the sequent in curly braces, and separate them by a “⇒”, as in:

{GivenContext :- GivenEntailment } ⇒ { InferredContext :- InferredEntailment}

Now, let’s take a look at the sequent rules for propositional linear logic. I’m using the version of these rules from [Patrick Lincoln’s SIGACT ’92 paper][sigact92]. Yeah, I know that’s a bit of a big load there. Don’t worry about it too much; the important part is the concept described up above; the sequents are useful to look at when you have a hard time figuring out what some operator means in inference. For example, you can see the difference between & and ⊕ (which I found confusing at first) by looking at their sequents, to see what they do.
1. **Identity**: { } ⇒ { A :- A }
2. **Cut**: { Γ1 :- A, Σ1   Γ2, A :- Σ2 } ⇒ { Γ1, Γ2 :- Σ1, Σ2}
3. **Exchange Left**: { Γ1, A, B, Γ2 :- Σ } ⇒ { Γ1, B, A, Γ2 :- Σ }
4. **Exchange Right**: { Γ :- Σ1, A, B, Σ2 } ⇒ { Γ :- Σ1, B, A, Σ2}
5. **⊗ Left**: {Γ, A, B :- Σ} ⇒ { Γ, A ⊗ B :- Σ }
6. **⊗ Right**: { Γ1 :- A, Σ1   Γ2 :- B, Σ2} ⇒ { Γ1, Γ2 :- (A ⊗ B), Σ1, Σ2}
7. **-o Left**: { Γ1 :- A, Σ1  Γ2, B :- Σ2 } ⇒ { Γ1, Γ2, (A -o B) :- Σ1, Σ2}
8. **-o Right**: { Γ, A :- B, Σ} ⇒ { Γ :- A -o B, Σ}
9. **⅋ Left**: { Γ1,A :- Σ1  Γ2 :- B, Σ2 } ⇒ { Γ1, Γ2, (A ⅋ B) :- Σ1, Σ2}
10. **⅋ Right**: { Γ :- A, B, Σ} ⇒ { Γ :- A ⅋ B, Σ}
11. **& Left**: { Γ, A :- Σ } ⇒ { Γ,A & B :- Σ}/{ Γ, B :- Σ } ⇒ { Γ,A & B :- Σ}
12. **& Right**: { Γ :- A,Σ   Γ :- B,Σ} ⇒ { Γ :- (A & B), Σ }
13. **⊕ Left**: { Γ,A :- Σ   Γ,B :- Σ} ⇒ { Γ,A ⊕ B :- Σ}
14. **⊕ Right**: {Γ :- A,Σ} ⇒ {Γ :- A ⊕ B, Σ}/{Γ :- B,Σ} ⇒ {Γ :- A ⊕ B, Σ}
15. **!W**: {Γ :- Σ} ⇒ {Γ,!A :- Σ}
16. **!C**: {Γ,!A,!A :- Σ} ⇒ { Γ,!A :- Σ }
17. **!D**: { Γ, A :- Σ} ⇒ { Γ,!A :- Σ}
18. **!S**: { !Γ :- A, ?Σ} ⇒ { !Γ :- !A, ?Σ}
19. **?W**: {Γ :- Σ} ⇒ {Γ:- ?A, Σ}
20. **?C**: {Γ :- ?A,?A, Σ} ⇒ { Γ :- ?A, Σ }
21. **?D**: { Γ :- A, Σ} ⇒ { Γ :- ?A, Σ}
22. **?S**: { !Γ, A :- ?Σ} ⇒ { !Γ,?A :- ?Σ}
23. **⊥ Exp Left**: { Γ :- A, Σ} ⇒ { Γ, A^⊥ :- Σ} (where A^⊥ is the linear negation of A)
24. **⊥ Exp Right**: { Γ, A :- Σ} ⇒ { Γ :- A^⊥, Σ}
25. **0 Left**: { Γ,0 :- Σ } *(Nothing can be inferred)*
26. **⊤ Right**: { Γ :- ⊤,Σ}
27. **⊥ Left**: { ⊥ :- }
28. **⊥ Right**: {Γ :- Σ} ⇒ {Γ :- ⊥,Σ}
29. **1 Left**: { Γ :- Σ} ⇒ { Γ,1 :- Σ}
30. **1 Right**: { :- 1}
This is long enough that I’m not going to get into how this hooks into category theory today, except to point out that if you look carefully at the multiplication and exponentiations, they might seem a tad familiar.
[types-as-proofs]: http://goodmath.blogspot.com/2006/06/finally-modeling-lambda-calculus.html
[sigact92]: http://www.csl.sri.com/~lincoln/papers/sigact92.ps
[yesterday]: http://scienceblogs.com/goodmath/2006/07/a_brief_diversion_sequent_calc.php

A Brief Diversion: Sequent Calculus

*(This post has been modified to correct some errors and add some clarifications in response to comments from alert readers. Thanks for the corrections!)*
Today, we’re going to take a brief diversion from category theory to play with some logic. There are some really neat connections between variant logics and category theory. I’m planning on showing a bit about the connections between category theory and one of those, called *linear logic*. But the easiest way to present things like linear logic is using a mechanism based on sequent calculus.
Sequent calculus is a deduction system for performing reasoning in first order predicate logic. But its notation and general principles are useful for all sorts of reasoning systems, including many different logics, all sorts of type theories, etc. The specific sequent calculus that I’m going to talk about is sometimes called system-LK; the general category of things that use this basic kind of rules is called Gentzen systems.
The sequent calculus consists of a set of rules called *sequents*, each of which is normally written like a fraction: the top of the fraction is what you know before applying the sequent; the bottom is what you can conclude. The statements in the sequents are always of the form:

CONTEXTS, Predicates :- CONTEXTS, Predicates

The “CONTEXTS” are sets of predicates that you already know are true. The “:-” is read “entails”; it means that the *conjunction* of the statements and contexts to the left of it can prove the *disjunction* of the statements to the right of it. In predicate logic, the conjunction is logical and, and disjunction is logical or, so you can read the statements as if “,” is “∧” on the left of the “:-”, and “∨” on the right. *(Note: this paragraph was modified to correct a dumb error that I made that was pointed out by commenter Canuckistani.)*
Contexts are generally written using capital greek letters; predicates are generally written using uppercase english letters. We often put a name for an inference rule to the right of the separator line for the sequent.
For example, look at the following sequent:
Γ :- Δ
————— Weakening-Left
Γ,A :- Δ
This sequent is named Weakening-left; the top says that “Given Γ everything in Δ can be proved.”; and the bottom says “Using Γ plus the fact that A is true, everything in Δ can be proved”. The full sequent basically says: if Δ is provable given Γ, then it will still be provable when A is added to Γ; in other words, adding a true fact won’t invalidate any proofs that were valid before the addition of A. *(Note: this paragraph was modified to correct an error pointed out by a commenter.)*
The sequent calculus is nothing but a complete set of rules that you can use to perform any inference in predicate calculus. A few quick syntactic notes, and I’ll show you the full set of rules.
1. Uppercase greek letters are contexts.
2. Uppercase english letters are *statements*.
3. Lowercase english letters are *terms*; that is, the objects that predicates can reason about, or variables representing objects.
4. A[b] is a statement A that contains the term b in some way.
5. A[b/c] means A with the term “b” replaced by the term “c”.
——-
First, two very basic rules:
1.    
———— (Identity)
A :- A
2. Γ :- A, Δ     Σ, A :- Π
—————————————— (Cut)
Γ,Σ :- Δ, Π
Now, there’s a bunch of rules that have right and left counterparts. They’re duals of each other – move terms across the “:-” and switch from ∧ to ∨ or vice-versa.
3. Γ, A :- Δ
————————— (Left And 1)
Γ, A ∧ B :- Δ
4. Γ :- A, Δ
——————— ——— (Right Or 1)
Γ :- A ∨ B, Δ
5. Γ, B :- Δ
——————— ——(Left And 2)
Γ,A ∧ B :- Δ
6. Γ :- B, Δ
——————— ——— (Right Or 2)
Γ :- A ∨ B, Δ
7. Γ, A :- Δ    Σ,B :- Π
————————————— (Left Or)
Γ,Σ, A ∨ B :- Δ,Π
8. Γ :- A,Δ   Σ :- B,Π
—————————— ——(Right And)
Γ,Σ :- A ∧ B, Δ,Π
9. Γ :- A,Δ
————— —— (Left Not)
Γ, ¬A :- Δ
10. Γ,A :- Δ
——————— (Right Not)
Γ :- ¬A, Δ
11. Γ :- A,Δ    Σ,B :- Π
————————————— (Left Implies)
Γ, Σ, A → B :- Δ,Π
12. Γ,A[y] :- Δ *(y bound)*
————————————— (Left Forall)
Γ,∀x A[x/y] :- Δ
13. Γ :- A[y],Δ *(y free)*
————————————— (Right Forall)
Γ :- ∀x A[x/y],Δ
14. Γ, A[y] :- Δ *(y bound)*
———————————— (Left Exists)
Γ,∃x A[x/y] :- Δ
15. Γ :- A[y], Δ *(y free)*
————————————(Right Exists)
Γ :- ∃x A[x/y], Δ
16. Γ :- Δ
—————— (Left Weakening)
Γ, A :- Δ
17. Γ :- Δ
—————— (Right Weakening)
Γ :- A, Δ
18. Γ, A, A :- Δ
——————— (Left Contraction)
Γ,A :- Δ
19. Γ :- A, A, Δ
——————— (Right Contraction)
Γ :- A, Δ
20. Γ, A, B, Δ :- Σ
————————— (Left Permutation)
Γ,B, A, Δ :- Σ
21. Γ :- Δ, A, B, Σ
————————— (Right Permutation)
Γ :- Δ, B, A, Σ
Here’s an example of how we can use sequents to derive A ∨ ¬ A:
1. Context empty. Apply Identity.
2. A :- A. Apply Right Not.
3. empty :- ¬ A, A. Apply Right Or 2.
4. empty :- A ∨ ¬A, A. Apply Right Permutation.
5. empty :- A, A ∨ ¬ A. Apply Right Or 1.
6. empty :- A ∨ ¬ A, A ∨ ¬ A. Right Contraction.
7. empty :- A ∨ ¬ A
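The derivation above can be replayed mechanically. The following is a rough Haskell sketch, assuming a sequent is just a pair of formula lists and that each rule acts on the formula nearest the “:-” (the last one on the left, the first one on the right). The `Prop` and `Sequent` types and all function names are my own illustrative choices, and only the handful of rules used in this proof are modeled.

```haskell
data Prop = Atom String | Not Prop | Or Prop Prop
  deriving (Eq, Show)

-- A sequent: left-hand formulas :- right-hand formulas.
infix 4 :-
data Sequent = [Prop] :- [Prop]
  deriving (Eq, Show)

-- Identity: A :- A
identity :: Prop -> Sequent
identity a = [a] :- [a]

-- Right Not: from Γ,A :- Δ conclude Γ :- ¬A,Δ
rightNot :: Sequent -> Maybe Sequent
rightNot (l :- r)
  | null l    = Nothing
  | otherwise = Just (init l :- (Not (last l) : r))

-- Right Or 1: from Γ :- A,Δ conclude Γ :- A ∨ B,Δ
rightOr1 :: Prop -> Sequent -> Maybe Sequent
rightOr1 b (g :- (a : d)) = Just (g :- (Or a b : d))
rightOr1 _ _              = Nothing

-- Right Or 2: from Γ :- B,Δ conclude Γ :- A ∨ B,Δ
rightOr2 :: Prop -> Sequent -> Maybe Sequent
rightOr2 a (g :- (b : d)) = Just (g :- (Or a b : d))
rightOr2 _ _              = Nothing

-- Right Permutation: swap the first two formulas on the right.
rightPermute :: Sequent -> Maybe Sequent
rightPermute (g :- (a : b : s)) = Just (g :- (b : a : s))
rightPermute _                  = Nothing

-- Right Contraction: merge two identical formulas on the right.
rightContract :: Sequent -> Maybe Sequent
rightContract (g :- (a : a' : d)) | a == a' = Just (g :- (a : d))
rightContract _                             = Nothing

-- The seven-step derivation of A ∨ ¬A.
excludedMiddle :: Maybe Sequent
excludedMiddle =
  Just (identity a)
    >>= rightNot
    >>= rightOr2 a
    >>= rightPermute
    >>= rightOr1 (Not a)
    >>= rightContract
  where
    a = Atom "A"
```

Each rule returns `Nothing` when the sequent does not have the shape the rule needs, so a mistaken step shows up as a failed derivation rather than a wrong conclusion.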
If you look *carefully* at the rules, they actually make a lot of sense. The only ones that look a bit strange are the “forall” rules; and for those, you need to remember that the variable is *free* on the top of the sequent.
A lot of logics can be described using Gentzen systems; from type theory, to temporal logics, to all manner of other systems. They’re a very powerful tool for describing all manner of inference systems.

Monads and Programming Languages

One of the questions that a ton of people sent me when I said I was going to write about category theory was “Oh, good, can you please explain what the heck a monad is?”

The short version is: a monad is a functor from a category to itself, together with a little extra structure that we’ll get to below. The way that this works in a programming language is that you can view many things in programming languages in terms of monads. In particular, you can take things that involve mutable state, and magically hide the state.

How? Well – the state (the set of bindings of variables to values) is an object in a category, State. The monad is a functor from State → State. Since the functor is a functor from a category to itself, the value of the state is implicit – they’re the object at the start and end points of the functor. From the viewpoint of code outside of the monad functor, the states are indistinguishable – they’re just something in the category. For the functor itself, the value of the state is accessible.

So, in a language like Haskell with a State monad, you can write functions inside the State monad; and they are strictly functions from State to State; or you can write functions outside the state monad, in which case the value inside the state is completely inaccessible. Let’s take a quick look at an example of this in Haskell. (This example came from an excellent online tutorial which, sadly, is no longer available.)

Here’s a quick declaration of a State monad in Haskell:

class MonadState m s | m -> s where
  get :: m s
  put :: s -> m ()

instance MonadState (State s) s where
  get   = State $ \s -> (s,s)
  put s = State $ \_ -> ((),s)

This is Haskell syntax saying we’re defining a state as an object which stores one value. It has two functions: get, which retrieves the value from a state; and put, which updates the value hidden inside the state.

Now, remember that Haskell has no actual assignment statement: it’s a pure functional language. So what “put” actually does is create a new state with the new value in it.

How can we use it? We can only access the state from a function that’s inside the monad. In the example, they use it for a random number generator; the state stores the value of the last random number generated, which will be used as a seed for the next. Here we go:

getAny :: (Random a) => State StdGen a
getAny = do
  g <- get
  (x,g') <- return $ random g
  put g'
  return x

Now – remember that the only functions that exist *inside* the monad are "get" and "put". "do" is a syntactic sugar for inserting a sequence of statements into a monad. What actually happens inside of a do is that *each expression* in the sequence is a functor from a State to State; each expression takes as an input parameter the output from the previous. "getAny" takes a state monad as an input; and then it implicitly passes the state from expression to expression.

"return" is how a plain value gets *into* the monad: it wraps the result of an ordinary expression as a computation in the monad that leaves the state untouched. So, "return $ random g" is saying, roughly, "evaluate random g" as an ordinary expression outside of the monad; then apply the monad constructor to the result. The return is necessary there because the full expression on the line *must* be a computation in the monad; if we just say "(x,g') <- random g", we'd get a type error, because we're inside of a monad construct and "random g" is an ordinary pair, not a State computation. The "$" is just low-precedence function application, so "return $ random g" means "return (random g)". Finally, "return x" makes "x" the result of the whole computation; without the "return", the last line would be a plain value rather than a computation in the monad.

The really important thing here is to recognize that each line inside of the "do" is a functor from State → State; and since the start and end points of the functor are implicit in the structure of the functor itself, you don't need to write it. So the state is passed down the sequence of instructions – each of which maps State back to State.
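To see the state being threaded from line to line, here is a self-contained sketch that defines a small State type by hand instead of importing one. Two things are assumptions of mine: the instances below are one common way to write a state monad (not necessarily how the tutorial's library defines it), and `step` is a hypothetical stand-in for `random`, a toy linear congruential generator used so the example needs nothing outside the Prelude.

```haskell
-- A state computation: takes a state, returns a result and a new state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  -- Run the first computation, then feed its result and new state onward.
  State g >>= f = State $ \s ->
    let (a, s') = g s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Hypothetical stand-in for 'random': returns a number in [0,99] and a new seed.
step :: Int -> (Int, Int)
step seed = (next `mod` 100, next)
  where
    next = (seed * 1103515245 + 12345) `mod` 2147483648

-- Same shape as getAny in the text, with 'step' in place of 'random'.
getAny :: State Int Int
getAny = do
  g <- get
  (x, g') <- return $ step g
  put g'
  return x

-- Two draws in a row: the seed is passed along implicitly.
twoRandoms :: State Int (Int, Int)
twoRandoms = do
  a <- getAny
  b <- getAny
  return (a, b)
```

Calling `runState twoRandoms 1` runs the whole sequence from seed 1 and returns both numbers along with the final seed; nothing in `twoRandoms` mentions the seed explicitly.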

Let's get to the formal part of what a monad is. There's a bit of funny notation we need to define for it. (You can't do anything in category theory without that never-ending stream of definitions!)

  1. Given a category C, 1_C is the *identity functor* from C to C.
  2. For a category C, if T is a functor C → C, then T² is T º T (and so on for the other powers of T).
  3. For a given functor T, the identity natural transformation T → T is written 1_T.

Suppose we have a category, C. A *monad on C* is a triple (T,η,μ), where T is a functor from C → C, and η and μ are natural transformations; η: 1_C → T, and μ: (TºT) → T. (1_C is the identity functor for C in the category of categories.) These must have the following properties:

First, μ º Tμ = μ º μT. Or in diagram form:

monad-prop1.jpg

Second, μ º Tη = μ º ηT = 1_T. In diagram form:

monad-prop2.jpg

Basically, what these really come down to is an associative property ensuring that T behaves properly over composition, and that there is an identity transformation that behaves as we would expect. These two properties together add up to mean that any order of applications of T will behave properly, preserving the structure of the category underlying the monad.
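As a concrete check of the two properties, here is a small Haskell sketch using the list functor as T, with `return` playing the role of η and `join` (flattening one level of nesting) playing the role of μ. The names `eta`, `mu`, `assocHolds`, and `unitHolds` are mine, and testing a law on sample values is of course not a proof.

```haskell
import Control.Monad (join)

-- η : 1_C → T, wrapping a value in a one-element list.
eta :: a -> [a]
eta = return

-- μ : T º T → T, flattening a list of lists.
mu :: [[a]] -> [a]
mu = join

-- First property: μ º Tμ = μ º μT.
-- Tμ flattens the inner layers first (fmap mu); μT flattens the outer layers first.
assocHolds :: [[[Int]]] -> Bool
assocHolds xsss = mu (fmap mu xsss) == mu (mu xsss)

-- Second property: μ º Tη = μ º ηT = 1_T.
unitHolds :: [Int] -> Bool
unitHolds xs = mu (fmap eta xs) == xs && mu (eta xs) == xs
```

For example, `assocHolds [[[1],[2,3]],[[4]]]` compares the two ways of flattening a triply nested list down to `[1,2,3,4]`.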

Yoneda's Lemma

So, at last, we can get to Yoneda’s lemma, as I [promised earlier][yoneda-promise]. What Yoneda’s lemma does is show us how for many categories (in fact, most of the ones that are interesting) we can take the category C, and understand it using a structure formed from the functors from C to the category of sets. (From now on, we’ll call the category of sets **Set**.)
So why is that such a big deal? Because the functors from C to the **Set** define a *structure* formed from sets that represents the properties of C. Since we have a good intuitive understanding of sets, that means that Yoneda’s lemma gives us a handle on how to understand all sorts of difficult structures by looking at the mapping from those structures onto sets. In some sense, this is what category theory is really all about: we’ve taken the intuition of sets and functions; and used it to build a general way of talking about structures. Our knowledge and intuition for sets can be applied to all sorts of structures.
As usual for category theory, there’s yet another definition we need to look at, in order to understand the categories for which Yoneda’s lemma applies.
If you recall, a while ago, I talked about something called *[small categories][smallcats]*: a small category is a category for which the class of objects is a set, and not a proper class. Yoneda’s lemma applies to a class of categories slightly less restrictive than the small categories, called the *locally small categories*.
The definition of locally small categories is based on something called the Hom-classes of a category. Given a category C, the hom-classes of C are a partition of the morphisms in the category. Given any two objects a and b in Obj(C), the hom-class **Hom**(a,b) is the class of all morphisms f : a → b. If **Hom**(a,b) is a set (instead of a proper class), then it’s called the hom-set of a and b.
A category C is *locally small* iff all of the hom-classes of C are sets: that is, if for every pair of objects in Obj(C), the morphisms between them form a set, and not a proper class.
So, on to the lemma.
Suppose we have a locally small category C. Then for each object a in Obj(C), there is a *natural functor* corresponding to a mapping to **Set**. This is called the hom-functor of a, and it’s generally written: *h*a = **Hom**(a,-). *h*a is a functor which maps each object x in C to the set of morphisms **Hom**(a,x).
If F is a functor from C to **Set**, then for all a ∈ Obj(C), the set of natural transformations from *h*a to F is in one-to-one correspondence with the elements of F(a): that is, a natural transformation – a structure preserving mapping – from the hom-functor *h*a to F is completely determined by picking an element of F(a).
So the functors from C to **Set** provide all of the structure preserving mappings from C to **Set**.
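Haskell gives a handy way to see the correspondence in the lemma, if we assume the usual reading where types are objects, `(->) a` plays the role of the hom-functor Hom(a,-), and a polymorphic function `forall x. (a -> x) -> f x` plays the role of a natural transformation from it to a functor `f`. The sketch below uses the `RankNTypes` extension; the names `toNat` and `fromNat` are mine.

```haskell
{-# LANGUAGE RankNTypes #-}

-- From an element of F(a), build a natural transformation Hom(a,-) → F:
-- given any arrow k out of a, push the element along it with fmap.
toNat :: Functor f => f a -> (forall x. (a -> x) -> f x)
toNat fa = \k -> fmap k fa

-- From a natural transformation Hom(a,-) → F, recover the element of F(a):
-- just feed it the identity arrow on a.
fromNat :: (forall x. (a -> x) -> f x) -> f a
fromNat nat = nat id
```

Going from an element to a transformation and back, as in `fromNat (toNat [1,2,3])`, returns the original element, which is one half of the one-to-one correspondence.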
Yesterday, we saw how mapping *up* the abstraction hierarchy can make some kinds of reasoning easier. Yoneda says that for some things where we’d like to use our intuitions about sets and functions, we can also *map down* the abstraction hierarchy.
(If you saw my posts on group theory back at GM/BM’s old home, this is a generalization of what I wrote about [the symmetric groups][symmetry]: the fact that every group G is isomorphic to a subgroup of the symmetric group on G.)
Coming up next: why computer science geeks like me care about this abstract nonsense? What does all of this gunk have to do with programming and programming languages? What the heck is a Monad? and more.
[symmetry]: http://goodmath.blogspot.com/2006/04/permutations-and-symmetry-groups.html
[yoneda-promise]: http://scienceblogs.com/goodmath/2006/06/category_theory_natural_transf.php
[smallcats]: http://scienceblogs.com/goodmath/2006/06/more_category_theory_getting_i.php

Using Natural Transformations: Recreating Closed Cartesian Categories

Today’s contribution on category theory is going to be short and sweet. It’s an example of why we really care about [natural transformations][nt]. Remember the trouble we went through working up to define [cartesian categories and cartesian closed categories][ccc]?
As a reminder: a [functor][functor] is a structure preserving mapping between categories (functors are the morphisms of the category of small categories); natural transformations are structure-preserving mappings between functors (and are morphisms in the category of functors).
Since we know that the natural transformation can be viewed as a kind of arrow, then we can take the definitions of iso-, epi-, and mono-morphisms, and apply them to natural transformations, resulting in *natural isomorphisms*, *natural monomorphisms*, and *natural epimorphisms*.
Expressed this way, a cartesian category is a category C where:
1. C contains a terminal object t; and
2. (∀ a,b ∈ Obj(C)), C contains a product object a×b; and a *natural isomorphism* Δ, which maps each pair of arrows ((x → a), (x → b)) to an arrow (x → (a×b)).
What this really says is: if we look at categorical products, then for a cartesian category, there’s a way of understanding the product as a structure-preserving transformation from pairs of arrows into a and b to arrows into the product a×b; that is, as a pairing structure over arrows within the category.
The closed cartesian category is just the same exact trick using the exponential: A CCC is a category C where:
1. C is a cartesian category, and
2. (∀ a,b ∈ Obj(C)), C contains an object b^a, and a natural isomorphism Λ, where (∀ y ∈ Obj(C)) Λ : (y×a → b) → (y → b^a).
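Both isomorphisms have familiar faces in Haskell, if we assume the usual reading of pairs as products and function types as exponentials. In the sketch below, `pair` and `unpair` are my names for the two directions of the product isomorphism, and `lambda` and `unlambda` are simply the Prelude's `curry` and `uncurry` under names matching Λ.

```haskell
-- Product isomorphism: a pair of arrows out of x is the same thing
-- as a single arrow from x into the product.
pair :: (x -> a) -> (x -> b) -> (x -> (a, b))
pair f g = \x -> (f x, g x)

unpair :: (x -> (a, b)) -> (x -> a, x -> b)
unpair h = (fst . h, snd . h)

-- Exponential isomorphism Λ : (y×a → b) → (y → b^a), and its inverse.
lambda :: ((y, a) -> b) -> (y -> (a -> b))
lambda = curry

unlambda :: (y -> (a -> b)) -> ((y, a) -> b)
unlambda = uncurry
```

For instance, `pair (+1) (*2) 3` is `(4,6)`, and `unlambda (lambda f)` behaves exactly like `f` for any `f` on pairs.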
Look at these definitions; then go back and look at the old definitions that we used without the new constructions of the natural transformation. That will let you see what all the work to define natural transformations buys us. Category theory is all about structure; with categories, functors, and natural transformations, we have the ability to talk about extremely sophisticated structures and transformations using a really simple, clean abstraction.
[functor]: http://scienceblogs.com/goodmath/2006/06/more_category_theory_getting_i.php
[nt]: http://scienceblogs.com/goodmath/2006/06/category_theory_natural_transf.php
[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php

Arrow Equality and Pullbacks

We’re almost at the end of this run of category definitions. We need to get to the point of talking about something called a *pullback*. A pullback is a way of describing a kind of equivalence of arrows, which gets used a lot in things like interesting natural transformations. But, before we get to pullbacks, it helps to understand the equalizer of a pair of morphisms, which is a weaker notion of arrow equivalence.
We’ll start with sets and functions again to get an intuition; and then we’ll work our way back to categories and categorical equalizers.
Suppose we have two functions mapping from members of set A to members of set B.

f, g : A → B

Suppose that they have a non-empty intersection: that is, that there is some set of values x ∈ A for which f(x) = g(x). The set of values C from A on which f and g return the same result (*agree*) is called the *equalizer* of f and g. Obviously, C is a subset of A.
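For finite sets, the set-level equalizer is a one-liner. The sketch below assumes we represent the set A as a Haskell list; the name `equalizer` is mine.

```haskell
-- The equalizer of f and g over a finite set A (given as a list):
-- the members of A on which f and g agree.
equalizer :: Eq b => (a -> b) -> (a -> b) -> [a] -> [a]
equalizer f g as = [x | x <- as, f x == g x]
```

For example, with f(x) = x·x and g(x) = x + 2 over the integers from -2 to 3, `equalizer (\x -> x * x) (\x -> x + 2) [-2 .. 3]` is `[-1,2]`.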
Now, let’s look at the category theoretic version of that. We have *objects* A and B.
We have two arrows f, g : A → B. This is the category analogue of the setup of sets and functions from above. To get to the equalizer, we need to add an object C which is a *subobject* of A (which corresponds to the subset of A on which f and g agree in the set model).
The equalizer of f and g is the pair of the object C, and an arrow i : C → A. (That is, the object and arrow that define C as a subobject of A.) This object and arrow must satisfy the following conditions:
1. f º i = g º i
2. (∀ j : D → A) f º j = g º j ⇒ (∃! k : D → C) i º k = j.
That second one is the mouthful. What it says is: if I have any arrow j from some other object D to A, and f and g agree on composition with j, then there is exactly *one* arrow k from D to C which composes with i to give j. In other words, (C, i) is a *selector* for the arrows on which f and g agree; you can only compose an arrow to A in a way that will compose equivalently with f and g to B if you go through (C, i). Or in diagram form, k in the following diagram is necessarily unique:

equalizer.jpg

There are a couple of interesting properties of equalizers that are worth mentioning. The morphism in an equalizer is *always* a monic arrow (monomorphism); and if it’s epic (an epimorphism), then it must also be iso (an isomorphism).
The pullback is very nearly the same construction as the equalizer we just looked at; except it’s abstracting one step further.
Suppose we have two arrows pointing to the same target, f : B → A and g : C → A. Then the pullback of f and g is the triple of an object and two arrows (B×_A C, p : B×_A C → B, q : B×_A C → C). The elements of this triple must meet the following requirements:
1. f º p = g º q
2. (f º p) : B×_A C → A
3. For every triple (D, h : D → B , k : D → C) with f º h = g º k, there is exactly one arrow u : D → B×_A C where p º u = h, and q º u = k.
As happens so frequently in category theory, this is clearer using a diagram.

pullback.jpg

If you look at this, you should definitely be able to see how this corresponds to the categorical equalizer. If you’re careful and clever, you can also see the resemblance to categorical product (which is why we use the ×_A syntax). It’s a general construction that says that f and g are equivalent with respect to the product-like object B×_A C.
Here’s the neat thing. Work backwards through this abstraction process to figure out what this construction means if objects are sets and arrows are functions, and what’s the pullback of f and g?

{ (x,y) ∈ B × C : f(x) = g(y) }

Right back where we started, almost. The pullback is an equalizer; working it back shows that.
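The set-level pullback is just as short in Haskell. This sketch assumes f : B → A and g : C → A as in the definition above, with the finite sets B and C given as lists; the names `pullback`, `p`, and `q` mirror the text but the code itself is only an illustration.

```haskell
-- The pullback of f : B → A and g : C → A over finite sets:
-- the pairs (x, y) that f and g send to the same element of A.
pullback :: Eq a => (b -> a) -> (c -> a) -> [b] -> [c] -> [(b, c)]
pullback f g bs cs = [(x, y) | x <- bs, y <- cs, f x == g y]

-- The two projection arrows p and q out of the pullback object.
p :: (b, c) -> b
p = fst

q :: (b, c) -> c
q = snd
```

With both f and g taking a number to its parity, `pullback (`mod` 2) (`mod` 2) [1,2,3] [3,4]` is `[(1,3),(2,4),(3,3)]`, and on every one of those pairs `f . p` and `g . q` agree, which is requirement 1.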

Categories and SubThings

What’s a subset? That’s easy: if we have two sets A and B, A is a subset of B if every member of A is also a member of B.
What’s a subgroup? If we have two groups A and B, and the values in group A are a subset of the values in group B, then A is a subgroup of B.
For any kind of thing **X**, what does it mean to be a sub-X? Category theory gives us a way of answering that in a generic way. It’s a bit hard to grasp at first, so let’s start by looking at the basic construction in terms of sets and subsets.
The most generic way of defining subsets is using functions. Suppose we have a set, A. How can we define all of the subsets of A, *in terms of functions*?
We can take the set of all *injective* functions to A (an injective function from X to Y is a function that maps distinct members of X to distinct members of Y). Let’s call that set of injective functions **Inj**(A). Now, we can define equivalence classes over **Inj**(A), where two functions f : X → A and g : Y → A are equivalent if there is an isomorphism h : X → Y such that g º h = f.
The domain of each function in one of the equivalence classes in **Inj**(A) is a set isomorphic to a subset of A, namely the image that all of the functions in the class share. So each equivalence class of injective functions defines a subset of A.
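As a rough check of this idea on a small case, the sketch below enumerates injective functions into a three-element set and groups them into classes. It treats two injections as equivalent when a bijection between their domains carries one onto the other, which for sets amounts to having the same image; grouping by image is that shortcut, not a general categorical algorithm. The sample domains and the helper names are assumptions chosen for illustration.

```python
from itertools import permutations

def injections(X, A):
    """All injective functions X -> A, each as a tuple of (x, f(x)) pairs."""
    X = sorted(X)
    return [tuple(zip(X, values)) for values in permutations(sorted(A), len(X))]

def image(f):
    """The set of values that the function f actually hits."""
    return frozenset(value for _, value in f)

A = {"a", "b", "c"}
# Hypothetical sample domains of sizes 0..3; any sets of these sizes would do.
domains = [set(), {0}, {0, 1}, {0, 1, 2}]

# Group the injections by image: one group per equivalence class.
classes = {}
for X in domains:
    for f in injections(X, A):
        classes.setdefault(image(f), []).append(f)
```

The keys of `classes` are exactly the subsets of `A`, one per equivalence class of injections.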
We can generalize that function-based definition to categories, so that it can define a sub-object of any kind of object that can be represented in a category.
Before we jump in, let me review one important definition from before; the monomorphism, or monic arrow.
>A *monic arrow* is an arrow f : a → b such that (∀ g1,
>g2 : x → a) f º g1 = f º g2
>⇒ g1 = g2. (That is, two arrows composed with
>f give the same composite only if they are the same arrow.)
The monic arrow is the category theoretic version of an injective function.
Suppose we have a category C, and an object a ∈ Obj(C).
If there are two monic arrows f : x → a and g : y → a, and
there is an arrow h : x → y such that g º h = f, then we say f ≤ g (read “f factors through g”). Now, we can take that “≤” relation, and use it to define an equivalence relation on morphisms using f ≡ g ⇔ f ≤ g ∧ g ≤ f.
What we wind up with using that equivalence relation is a set of equivalence classes of monomorphisms pointing at a. Each of those equivalence classes of morphisms defines a subobject of a. (Within an equivalence class, the sources of the arrows are all isomorphic to one another, so they are equivalent with respect to this relation.) A subobject of a is the source of an arrow in one of those equivalence classes.
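In the category of finite sets, the “factors through” relation can be tested directly. The following is a small sketch under that assumption, with arrows represented as dicts; the function names and the example arrows are hypothetical and only meant to illustrate the definitions.

```python
def is_monic(f):
    """In finite sets, an arrow (a dict) is monic exactly when it is injective."""
    return len(set(f.values())) == len(f)

def factor_through(f, g):
    """Return h with g . h == f if one exists, else None. Assumes g is monic."""
    inverse_g = {value: key for key, value in g.items()}
    if not all(value in inverse_g for value in f.values()):
        return None
    return {x: inverse_g[fx] for x, fx in f.items()}

def leq(f, g):
    """f <= g: f factors through g."""
    return factor_through(f, g) is not None

def equivalent(f, g):
    """f and g represent the same subobject."""
    return leq(f, g) and leq(g, f)

# Hypothetical monic arrows into the object a = {1, 2, 3}.
f = {"u": 1}            # f : {u} -> a
g = {"p": 1, "q": 2}    # g : {p, q} -> a
k = {"s": 2, "t": 1}    # k : {s, t} -> a, same image as g

h = factor_through(f, g)  # the arrow h with g . h == f
```

Here `f` factors through `g` but not the other way around, while `g` and `k` factor through each other, so they land in the same equivalence class and name the same subobject.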

Categories: Products, Exponentials, and the Cartesian Closed Categories

Before I dive into the depths of today’s post, I want to clarify something. Last time, I defined categorical products. Alas, I neglected to mention one important point, which led to a bit of confusion in the comments, so I’ll restate the important omission here.
The definition of categorical product defines what the product looks like *if it’s in the category*. There is no requirement that a category include the products for all, or indeed for any, of its members. Categories are *not closed* with respect to categorical product.
That point leads up to the main topic of this post. There’s a special group of categories called the **Cartesian Closed Categories** that *are* closed with respect to product; and they’re a very important group of categories indeed.
However, before we can talk about the CCCs, we need to build up a bit more.
Cartesian Categories
--------------------
A *cartesian category* C (note **not** cartesian *closed* category) is a category:
1. With a terminal object t, and
2. ∀ a, b ∈ Obj(C), the objects and arrows of the categorical product a×b are in C.
So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets and the category of enumerable sets, for example. And of course, the meaning of the categorical product in the category of sets? Cartesian product of sets.
Categorical Exponentials
------------------------
Like categorical product, the value of a categorical exponential is not *required* to be included in a category. Given two objects x and y from a category C, their *categorical exponential* x^y, if it exists in the category, is defined by a set of values:
* An object x^y,
* An arrow eval_{y,x} : x^y×y → x, called an *evaluation map*.
* ∀ z ∈ Obj(C), an operation Λ_C : (z×y → x) → (z → x^y). (That is, an operation mapping from arrows to arrows.)
These values must have the following properties:
1. ∀ f : z×y → x, g : z → x^y:
* eval_{y,x} º (Λ_C(f)×1_y) = f
2. ∀ f : z×y → x, g : z → x^y:
* Λ_C(eval_{y,x} º (g×1_y)) = g
To make that a bit easier to understand, let’s turn it into a diagram.
exponent.jpg
You can also think of it as a generalization of a function space. x^y is the set of all functions from y to x. The evaluation map is a simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in a value from b.
(I added the following section after this was initially posted; a commenter asked a question, and I realized that I hadn’t explained enough here, so I’ve added the explanation.)
So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.
If X and Y are sets, then X^Y is the set of functions from Y to X.
Now, look at the diagram:
* The top part says, basically, that g is a function from Z to X^Y: so g takes a member of Z, and uses it to select a function from Y to X.
* The vertical arrow says:
  * given the pair (z,y), f maps (z,y) to a value in X.
  * given a pair (z,y), we’re going through a function. It’s almost like currying:
    * The vertical arrow going down is basically taking f(z,y), and currying it to g(z)(y).
    * Per the top part of the diagram, g(z) selects a function from Y to X. (That is, a member of X^Y.)
    * So, at the end of the vertical arrow, we have a pair (g(z), y).
* The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.
Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.
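For the sets-and-functions reading, the currying story can be sketched in a few lines of Python, where functions are first-class values and so play the role of members of X^Y. The names `curry`, `eval_map`, and `times_identity`, and the example function `f`, are assumptions introduced for illustration only.

```python
def curry(f):
    """The operation Lambda: turn f : Z x Y -> X into g : Z -> X^Y."""
    return lambda z: (lambda y: f(z, y))

def eval_map(pair):
    """eval : X^Y x Y -> X, applying a function to a value."""
    function, y = pair
    return function(y)

def times_identity(g):
    """g x 1_Y : Z x Y -> X^Y x Y, the vertical arrow described above."""
    return lambda z, y: (g(z), y)

# Hypothetical example: f(z, y) = z * 10 + y.
def f(z, y):
    return z * 10 + y

g = curry(f)

def via_exponential(z, y):
    """Go down the vertical arrow, then along eval."""
    return eval_map(times_identity(g)(z, y))
```

Going the long way round, through the pair (g(z), y) and then `eval_map`, gives the same answer as applying `f` directly, which is what it means for the diagram to commute.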
Cartesian Closed Categories
---------------------------
With exponentials and products, we’re ready for the cartesian closed categories (CCCs):
A Cartesian closed category is a category that is closed with respect to both products and exponentials.
Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The *structure* of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system.
We’re getting close to being able to get to some really interesting things. Probably one more set of definitions; and then we’ll be able to do things like show a really cool version of the classic halting problem proof.

Category Theories: Some definitions to build on

Sorry, but I actually jumped the gun a bit on Yoneda’s lemma.

As I’ve mentioned, one of the things that I don’t like about category theory is how definition-heavy it is. So I’ve been trying to minimize the number of definitions at any time, and show interesting results of using the techniques of category theory as soon as I can.

Well, there are some important definitions which I haven’t talked about yet. And for Yoneda’s lemma to make sense, you really need to see some more examples of how categories express structure. And to be able to show how category theory lets you talk about structure, we need to plough our way through a few more definitions.

Initial Objects

Suppose we have a category, C. An object o ∈ Obj(C) is called an initial object in C iff (∀ b ∈ Obj(C)), (∃! f : o → b ∈ Arrows(C)).

In English: an object o is initial in C if there’s exactly one arrow from o to every object in C. We often write “0_C” for the initial object of a category C, or just “0” if it’s obvious what category we’re talking about.

There’s a dual notion of a terminal object: an object is terminal if there’s exactly one arrow from every object in the category to it. Terminals are written “1_C” or just “1”.

Given two objects in a category, if they’re both initial, they must be isomorphic. It’s pretty easy to prove: here’s the sketch. Remember the definition of isomorphism in category theory. An isomorphism is an arrow f : a → b, where (∃ g : b → a) such that f º g = 1_b and g º f = 1_a. If an object is initial, then there’s exactly one arrow from it to every object, including the other initial object. And there’s an arrow back, because the other one is initial too. Composing the two gives an arrow from an initial object to itself; since there is exactly one such arrow, and the identity is one, the composite must be the identity. So the arrows between the two initials compose to identities in both directions.
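In the category of finite sets, the initial object is the empty set and the terminal objects are the one-element sets. The sketch below checks the “exactly one arrow” conditions by brute force on a few small sets; the helper `all_functions` and the sample sets are assumptions chosen for illustration.

```python
from itertools import product

def all_functions(X, Y):
    """Every function X -> Y between finite sets, each as a dict."""
    X = sorted(X)
    return [dict(zip(X, values)) for values in product(sorted(Y), repeat=len(X))]

# Hypothetical sample objects to test against.
samples = [set(), {1}, {1, 2}, {1, 2, 3}]

# Number of arrows out of the empty set, and into a one-element set.
arrows_from_empty = [len(all_functions(set(), B)) for B in samples]
arrows_to_singleton = [len(all_functions(B, {"*"})) for B in samples]
```

Both lists contain only ones: the single arrow out of the empty set is the empty function, and the single arrow into `{"*"}` sends everything to `"*"`.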

Categorical Products

The product of two objects is a generalization of the cartesian product of two sets. It’s important because products are one of the major ways of building complex structures using simple categories.

Given a category C, and two objects a,b ∈ Obj(C), the categorical product a × b consists of:

  1. An object p, often written a×b;
  2. two arrows p_a and p_b, where p ∈ Obj(C), p_a : p → a, and p_b : p → b.
  3. a “pairing” operation, which for every object c ∈ Obj(C), maps the pair of arrows f : c → a and g : c → b to an arrow Pair_c(f,g) : c → a×b. Pair_c(f,g) is often written <f,g>_c.

Pair_c must have three properties.

  1. p_a º Pair_c(f,g) = f.
  2. p_b º Pair_c(f,g) = g.
  3. (∀ h : c → a×b) Pair_c(p_a º h, p_b º h) = h.

The first two of those properties say that the projection arrows separate a pairing back into its components; the third says that pairing up the two projections of an arrow into the product rebuilds that arrow. We can say the same thing about the relationships in the product in an easier way using a commutative diagram:

catprod.jpg
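For sets, where the categorical product is the ordinary cartesian product, the pairing operation and its three properties can be sketched as follows. The function names and the example arrows `f`, `g`, and `h` are hypothetical, picked only to exercise the laws on a small object c.

```python
def p_a(pair):
    """Projection a x b -> a."""
    return pair[0]

def p_b(pair):
    """Projection a x b -> b."""
    return pair[1]

def pair_c(f, g):
    """Pair_c(f, g) : c -> a x b, sending x to (f(x), g(x))."""
    return lambda x: (f(x), g(x))

# Hypothetical object c and arrows out of it.
c = range(5)

def f(x):
    return x % 2          # f : c -> a

def g(x):
    return x * x          # g : c -> b

def h(x):
    return (x + 1, str(x))  # h : c -> a x b

fg = pair_c(f, g)
# Pair the two projections of h; property 3 says this rebuilds h.
rebuilt_h = pair_c(lambda x: p_a(h(x)), lambda x: p_b(h(x)))
```

Checking `p_a(fg(x)) == f(x)`, `p_b(fg(x)) == g(x)`, and `rebuilt_h(x) == h(x)` for every `x` in `c` exercises the three properties in turn.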