
The Categorical Model of Linear Logic

Today we’ll finally get to building the categories that provide the model for
multiplicative linear logic. Before we jump into that, I want to explain why we separate out the multiplicative part.
Remember from the simply typed lambda calculus, that [we showed that the type system was precisely a subset of intuitionistic logic][lambda-type], and that programs in the lambda calculus corresponded to proofs in the type system. In particular, it was propositional intuitionistic logic using only the “→” operation. Similarly, if you build up a more interesting typed lambda calculus like [System-F][systemf], the type system is intuitionistic logic *with* the “∀” quantifier and the “∧” operator, but without “∨”, negation, or “∃”. Why do we eliminate the existentials and friends? Because if we left them in, then the *type system* would become Turing complete and undecidable. If we’re careful, we can analyze a program using universal types, logical and (composition in lambda), and implication, and infer the types if it’s a valid program. (Actually, even System-F is undecidable without any type annotations; but the Hindley-Milner subset *is* decidable. NP-hard, but decidable.)
The main reason that people like me really care about linear logic is because there is a variant of lambda calculus that includes *linear types*. Linear types are types where referencing the value corresponding to the type *consumes* it. For dealing with the semantics of real programming languages, there are many things that make sense as linear types. For example, the values of an input stream behave in a linear way: read something from an input stream, and it’s not there anymore: unless you copied it to put it someplace safe, it’s gone.
For typing linear lambda calculus, we use *intuitionistic linear logic* with the *multiplicative* operators. They give us a *decidable* type system with the capability to express linear type constraints.
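To make that consume-on-use behavior concrete, here’s a minimal runtime sketch in Python. The `LinearValue` class is purely illustrative (my own invention for this post); a real linear type system would reject the second use statically, at compile time, rather than catching it when the program runs:

```python
class LinearValue:
    """Simulate a linear value at runtime: it may be consumed exactly once.

    A linear type system enforces this statically; here we can only
    detect the violation dynamically.
    """
    def __init__(self, value):
        self._value = value
        self._consumed = False

    def consume(self):
        if self._consumed:
            raise RuntimeError("linear value consumed twice")
        self._consumed = True
        return self._value

item = LinearValue("first item on the input stream")
print(item.consume())          # first use: fine; the value is now gone
try:
    item.consume()             # second use: a linearity violation
except RuntimeError as err:
    print("rejected:", err)
```

This is exactly the input-stream behavior from above: once you read the item, it isn’t there anymore.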
Enough side-tracking; back to the categories.
We can now finally define a *linear category*. A category C is a linear category if it is both a [cartesian category][cartesian] and a [monoidal][monoidal] [closed category][monclose] with a *dualizing functor*. A dualizing functor is a *contravariant* closed functor defining properties of the categorical exponential, written (-)* : C → C. (That should be read with “-” as a placeholder for an object, and * as the exponent notation.) (-)* has the property that there is a natural isomorphism d : Id ≅ (-)** (that is, d witnesses that the identity functor is naturally isomorphic to the result of applying (-)* twice) such that the following commutes:


So, what does this mean? Take the linear implication operator; mix it with the categorical exponent; and what you get *still* behaves *linearly*: that is, if “X -o Y” (one X can be consumed to produce one Y), then 2 Xs can be consumed to produce 2 Ys; and N Xs can be consumed to produce N Ys.
So a linear category behaves cleanly with exponents; it has a linear implication; it has an *eval* operator (from its closed structure) to perform the linear implications; and it has tensor for producing groups of resources. That’s it; that’s everything we need from categories for a model of linear logic.
Now, there’s just one more question: are there any real linear categories that make sense as a model for linear logic? And obviously, the answer is yes. There are two of them, called **CPO** and **Lin**.
**CPO** is the category with complete partial orders as objects, and continuous functions as morphisms. Think about that, and it makes a whole lot of sense for linear logic; the equivalence classes of the partial order are resources of the same “value” (that is, that can be exchanged for one another); and you can’t climb *upwards* in the partial order without adding something.
The other one, **Lin**, is a lot more complicated. I’m not going to explain it in detail; that would lead into another two-month-long series! But to put it briefly, **Lin** is the category with *coherent domains* as objects, and linear maps as morphisms. To make that make sense, I would have to explain domain theory, which (to put it mildly) is complicated; the short version is that a domain is a kind of CPO. But the reason we care about **Lin** is that programming language semantics are generally described using [*denotational semantics*][denote]; and denotational semantics are built using a kind of domain, called a Scott domain. So this gives us a domain that we can use in programming language semantics with exactly the properties we need to explain how linear types work.

Why model evolution as search? (updated)

In the comments on [my post mocking Granville Sewell’s dreadful article][sewell], one of the commenters asked me to write something about why evolution is frequently modeled as a search process: since there is no goal or objective in evolution, search seems like a strange approach. It’s a very good question, and so I’m going to do my best to give you a very good answer.
As I’ve said before, math is all about *abstraction*: taking something, stripping it to its bare-bones essentials, and seeing what it means: for example category theory, which I’ve been writing about for the last month, is about taking the basic idea of a function as a thing that maps things to other things, and seeing what that means when it’s stripped of everything but the most basic properties of that notion of mapping.
When we build mathematical models of real-world phenomena, we’re doing the same kind of thing: we’re looking at some particular aspect of the phenomena, and trying to see what it means when the complexities of reality are abstracted away to let us see what’s going on in a fundamental sense. When we build bridges, we do static-mechanical analysis of the bridge structures; the most basic static mechanical analysis is remarkably silly in some ways, but it does manage to capture the basic properties of tension and force distribution through the structure. It lets us understand the most basic properties of the structure without having to worry about wind, about gravitational bowing of metal rods, vibration, etc. (Obviously, when a bridge gets built, all of those things need to get factored in; but you start with the basic analysis.)
So when a mathematician looks at modeling evolution, the way that we approach it is to ask: What are the basic fundamental things that define the process of evolution?
The answer, roughly, is two (or three, depending how you count them 🙂 ) things:
1. Reproduction with mutation;
2. Selection
Evolution is a continual process in which those steps are repeated: you reproduce a pool of individuals, either mutating while you copy, or copying then mutating. Then you select some subset of them as “winners” and keep them to copy for the next generation, and you discard the rest.
What’s the selection criterion? From the standpoint of a mathematical model, *it doesn’t matter*. What matters is that it’s a cyclic process of reproduction with mutation followed by selection.
So we have a basic model. Now we look at it and ask: does this resemble any problem that I know how to solve? Let’s try drawing a picture of that: we’ll draw labelled dots for individuals; with arrows from a parent to a child. The children that were selected and reproduced, we’ll paint green; the children that were not selected we’ll mark red; and the current generation, which we haven’t selected yet, we’ll leave white.
So, does this look familiar to a mathematician?
Yes: it’s a mathematical structure called a graph. We know a lot about the mathematics of graphs. And evolution looks a lot like a process that’s building a graph. Each node in the graph is an individual; each edge in the graph is a reproduce with mutation step. The leaf nodes of the graph are either successful individuals (in which case the graph is going to continue to grow from that leaf), or it’s an unsuccessful one, in which case that branch is dead.
When you describe it that way – it sounds like you’ve described a classic graph search problem. To complete the model, you need to define the mutation and selection processes; the criteria by which successful children are selected and future generations are created are, of course, a crucial piece of the model. And to be complete, I’ll note that the mutation process is a *random* process, and so the mutation paths generated by running a simulation of this model will be different in each run; and further, selection is undirected: the selection process does *not* have a specific goal. (Note: this paragraph was edited from the original to clarify some points and to expand on the properties of mutation and selection.)
There are other possibilities for models of evolution; but the most obvious mathematical model for evolution is search: either a graph search or a terrain search, depending on exactly how you want to model the process of mutation.
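The reproduce-mutate-select cycle described above can be sketched in a few lines of Python. Everything concrete here (individuals as numbers, Gaussian mutation, a toy fitness function) is an arbitrary placeholder chosen just to make the sketch runnable; the structure of the loop is the point, and note that the fitness function is a *parameter*: the loop doesn’t care what the selection criterion is.

```python
import random

def evolve(population, fitness, mutate, generations, keep=10):
    """One abstract evolution cycle: reproduce with mutation, then select."""
    for _ in range(generations):
        # reproduction with mutation: each survivor yields mutated children
        children = [mutate(p) for p in population for _ in range(4)]
        # selection: keep the `keep` fittest children; discard the rest
        population = sorted(children, key=fitness, reverse=True)[:keep]
    return population

# toy instance: individuals are numbers; "fit" means close to 42
random.seed(1)
pop = evolve(
    population=[0.0] * 10,
    fitness=lambda x: -abs(x - 42.0),
    mutate=lambda x: x + random.gauss(0, 1.0),
    generations=200,
)
print(round(pop[0], 2))
```

Run it, and the population drifts toward the fit region of the landscape, exactly the kind of search process described above.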

Zeros in Category Theory

Things are a bit busy at work on my real job lately, and I don’t have time to put together as detailed a post for today as I’d like. Frankly, looking at it, my cat theory post yesterday was half-baked at best; I should have held off until I could polish it a bit and make it more comprehensible.
So I’m going to avoid that error today. Since we’ve had an interesting discussion focusing on the number zero, I thought it would be fun to show what zero means in category theory.
There are two zeros in category theory: the zero object, and the zero arrow.
Zero Objects
The zero object is easier. Suppose we’ve got a category, C. Suppose that C has a *terminal object* – that is, an object t for which every object x in C has *exactly one* arrow f : x → t. And suppose that C also has an *initial object*: that is, an object i for which every object x in C has *exactly one* arrow, f : i → x. If C has both an initial object i, and a terminal object t, *and* i = t, then it’s a *zero object* for the category. A category can actually have *many* zero objects. For example, in the category of groups, any *trivial* group (that is, a group which contains only one element) is a zero object. For an intuition of why this is called “zero”, think of the number zero. It’s a strange little number that sits dead in the middle of everything. It’s the central point on the complex plane; it’s the center of the number line. A zero *object* sits in the middle of a category: everything has exactly one arrow *to* it, and one arrow *from* it.
Zero Arrows
The idea of a zero *arrow* comes roughly from the basic concept of a zero *function*. A zero function is a function f where for all x, f(x) = 0. One of the properties of a zero function is that composing a zero function with any other function results in a zero function. (That is, if f is a zero function, f(g(x)) is also a zero function.)
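You can check that composition property mechanically; here’s a quick Python sketch (the function names are just for illustration):

```python
def zero(x):
    """The zero function: f(x) = 0 for all x."""
    return 0

def compose(f, g):
    """Ordinary function composition: compose(f, g)(x) = f(g(x))."""
    return lambda x: f(g(x))

g = lambda x: x * x + 3

# zero composed with any g is again the zero function
zg = compose(zero, g)
for x in [-2, 0, 5]:
    assert zg(x) == 0
print("zero . g is still the zero function")
```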
A zero morphism in a category C is part of a set of arrows, called the *zero family* of morphisms of C, where composing *any* morphism with a member of the zero family results in a morphism in the zero family.
To be precise: suppose we have a category C. For any pair of objects a and b ∈ Obj(C), there is a morphism 0a,b : a → b.
The morphism 0d,e must satisfy one important property:
For any two morphisms m : d → e, and n : f → g, the following diagram must commute:
To see why this means that any composition with a 0 arrow will give you a zero arrow, look at what happens when we start with an ID arrow. Let’s make n : f → g an id arrow, by making f=g. We get the following diagram:
So – any zero arrow composed with an id arrow is a zero arrow. Now use induction to step up from ID arrows by looking at one arrow composed with an ID arrow and a zero arrow. You’ve still got a zero. Keep going – you’ll keep getting zeros.
Zero arrows are actually very important buggers: the algebraic and topological notions of a kernel can be defined in category theory using the zero morphisms.
One last little note: in general, it looks like the zero objects and the zero morphisms are unrelated. They’re not. In fact, if you have a category with a zero object *and* a family of zero morphisms, then you can find the zero morphisms by using the zero object. For any objects a and b, 0a,b = (0 → b) º (a → 0): the unique arrow from a to the zero object, followed by the unique arrow from the zero object to b.

The Category Structure for Linear Logic

So, we’re still working towards showing the relationship between linear logic and category theory. As I’ve already hinted, linear logic has something to do with certain monoidal categories. So today, we’ll get one step closer, by talking about just what kind of monoidal category. As I keep complaining, the problem with category theory is that anytime you want to do something interesting, you have to wade through a bunch of definitions to set up your structures. I’ll do my best to keep it short and sweet; as short as it is, it’ll probably take a bit of time to sink in.
First, we need a symmetry property. A monoidal category C (with terminal object t, tensor ⊗, and natural isomorphisms λ, ρ) is *symmetric* if/f for all objects a and b in C, there is a natural isomorphism γa,b : (a ⊗ b) → (b ⊗ a); and the category’s monoidal natural isomorphism satisfies ρb = λb º γb,t : (b ⊗ t) → b.
In other words, if you swap the order of the objects in a tensor, but leave everything else the same, you still wind up with the same monoidal structure.
In a similar vein: regular functors are also sometimes called *covariant* functors. If you take a functor and *reverse* the directions of all the arrows, you get what’s known an a *contravariant* functor, or *cofunctor*.
We also need a *closure* property. Suppose C is a symmetric monoidal category. C is *monoidally closed* if/f there is a functor ⇒ : C × C → C, such that for every object b in C, there is an isomorphism Λ : (a⊗b → c) → (a → (b ⇒ c)), and Λ is natural in a and c. (Remember that a and c are objects of C here; Λ is a family of isomorphisms, one for each choice of a and c.)
Basically, this just says that the tensor stays cleanly in the monoid, and there’s always a natural transformation corresponding to any use of tensor.
So what these two really do is pull up some of the basic category properties (closure with respect to composition, symmetry) and apply them to categories and natural transformations themselves.
Finally, we need to be able to talk about what it means for a *functor* to be closed. So suppose we have a monoidally closed category, with the usual components, up to and including the Λ natural transformation. A functor F : C → C is *closed* if, for all a,b in C, there is an arrow fa,b : b^a → F(b)^F(a) such that for all g : a → b, fa,b º Λ(g º λa) = Λ(F(g) º λF(a)).
In simple English, a closed functor is a functor that can be *represented by* an arrow in the category itself. That’s what all of that gobbledygook really means.
So… as a quick preview, to give you an idea of why we just waded through all of this gunk: if you take a monoidally closed category of the appropriate type, then collections of resources are going to be categories; ⊗ is the tensor to join collections of resources; and “-o” implications are the closed functors “⇒”.


Back during the DonorsChoose fundraiser, I promised a donor that I’d write an article about the math of zero. I haven’t done it yet, because zero is actually a surprisingly deep subject, and I haven’t really had time to do the research to do it justice. But in light of the comment thread that got started around [this post][fspog] yesterday, I think it’s a good time to do it with whatever I’ve accumulated now.
We’ll start with a bit of history. Yes, there’s an actual history to zero!
In general, most early number systems didn’t have any concept of “zero”. Numbers, in early mathematical systems, were measurements of quantity. They were used to ask questions like “How much grain do we have stored away? If we eat this much now, will we have enough to plant crops next season?” A measurement of zero doesn’t really mean much; even when math is applied to measurements in modern math, leading zeros in a number – even if they’re *measured* – don’t count as significant digits in the measurement. (So if I’m measuring some rocks, and one weighs 99 grams, then that measurement has only two significant digits. If I use the same scale to weigh a very slightly larger rock, and it weighs 101 grams, then my measurement of the second rock has *three* significant digits. The leading zeros don’t count!) *(In the original version of this post, I managed to stupidly blow my explanation of significant digits, which several alert commenters pointed out. As usual, my thanks for the correction.)*
Aristotle is pretty typical of the reasoning behind why zero wasn’t part of most early number systems: he believed that zero was like infinity: an *idea* related to numbers, but not an actual number itself. After all, you can’t *have* 0 of anything; zero of something isn’t *any* something: you *don’t have* anything. And you can’t really *get* to zero as he understood it. Take any whole number, and divide it into parts; you’ll eventually get a part of size “1”. You can get to any number by dividing something bigger. But not zero: zero, you can never get to by dividing things. You can spend eternity cutting numbers in half, and you’ll still never get to zero.
The first number system that we know of to have any notion of zero is the Babylonians’; but they still didn’t really quite treat it as a genuine number. They had a base-60 number system, and for digit-places that didn’t have a number, they left a space: the space was the zero. (They later adopted a placeholder that looked something like “//”.) It was never used *by itself*; it just kept the space open to show that there was nothing there. And if the last digit was zero, there was no indication. So, for example, 2 and 120 looked exactly the same – you needed to look at the context to see which it was.
The first real zero came from an Indian mathematician named Brahmagupta in the 7th century. He was quite a fascinating guy: he didn’t just invent zero, but arguably he also invented the idea of negative numbers and algebra! He was the first to use zero as a real number, and work out a set of algebraic rules about how zero, positive, and negative numbers worked. The formulation he worked out is very interesting; he allowed zero as a numerator or a denominator in a fraction.
From Brahmagupta, zero spread both east (to the Chinese and Vietnamese) and west (to the Arabs). Europeans were just about the last to get it; they were so attached to their wonderful Roman numerals that it took quite a while to penetrate: zero didn’t make the grade in Europe until about the 13th century, when Fibonacci (he of the series) translated the works of a Persian mathematician named al-Khwarizmi (from whose name sprang the word “algorithm” for a mathematical procedure). As a result, Europeans called the new number system “arabic”, and credited it to the Arabs; but as I said above, the Arabs didn’t create it; it originally came from India. (But the Arabic scholars, including the famous poet Omar Khayyam, are the ones who adopted Brahmagupta’s notions *and extended them*.)
Why is zero strange?
Even now, when we recognize zero as a number, it’s an annoyingly difficult one. It’s neither positive nor negative; it’s neither prime nor composite. If you include it, the real numbers don’t form a group under multiplication (even though the concept of a group is built on multiplication!), because zero has no multiplicative inverse. It’s not a unit; and it breaks the closure of division on the real numbers. It’s a real obnoxious bugger in a lot of ways. One thing Aristotle was right about: zero is a kind of counterpart to infinity: a concept, not a quantity. But infinity, we can generally ignore in our daily lives. Zero, we’re stuck with.
Still, it’s there, and it’s a real, inescapable part of our entire concept of numbers. It’s just an oddball – the dividing line that breaks a lot of rules. But without it, a lot of rules fall apart. Addition isn’t a group without 0. Subtraction isn’t closed without zero.
Our notation for numbers is also totally dependent on zero; and it’s hugely important to making a positional number system work. Try looking at the [algorithm for multiplying roman numerals][roman-mult] sometime!
Because of the strangeness of zero, people make a lot of mistakes involving it.
For example, based on that idea of zero and infinities as relatives, a lot of people believe that 1/0=infinity. It doesn’t. 1/0 doesn’t equal *anything*; it’s meaningless. You *can’t* divide by 0. The intuition behind this fact comes from the Aristotelean idea about zero: concept, not quantity. Division is a concept based on quantity: Asking “What is x divided by y” is asking “What quantity of stuff is the right size so that if I take y of it, I’ll get x?”
So: what quantity of apples can I take 0 of to get 1 apple? The question makes no sense; and that’s exactly right: it *shouldn’t* make sense, because dividing by zero *is meaningless*.
There’s a cute little algebraic pun that can show that 1 = 2, which is based on hiding a division by zero.
1. Start with “x = y”
2. Multiply both sides by x: “x² = xy”
3. Subtract “y²” from both sides: “x² – y² = xy – y²”
4. Factor: “(x+y)(x-y) = y(x-y)”
5. Divide both sides by the common factor “x-y”: “x + y = y”
6. Since x=y, we can substitute y for x: “y + y = y”
7. Simplify: “2y=y”
8. Divide both sides by y: “2 = 1”
The problem, of course, is step 5: x-y = 0, so step five is dividing by zero. Since that’s a meaningless thing to do, everything based on getting a meaningful result from that step is wrong – and so we get to “prove” false facts.
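You can watch the pun fall apart by replaying it numerically; here’s a quick Python sketch using a concrete x = y (the choice of 5 is arbitrary):

```python
# walk the "proof" with a concrete x = y, checking each step
x = y = 5

assert x**2 == x * y                        # step 2: multiply both sides by x
assert x**2 - y**2 == x * y - y**2          # step 3: subtract y² from both sides
assert (x + y) * (x - y) == y * (x - y)     # step 4: factor; both sides are 0
# step 5 divides both sides by (x - y), which is zero:
try:
    (x + y) * (x - y) / (x - y)
except ZeroDivisionError as err:
    print("step 5 is invalid:", err)
```

Steps 2 through 4 are perfectly fine; every equation really is true, because both sides of step 4 are just 0. The whole thing dies at step 5.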
Anyway, if you’re interested in reading more, the best source of information that I’ve found is an online article called [“The Zero Saga”][saga]. It covers not just a bit of history and random chit-chat like this article, but a detailed presentation of everything you could ever want to know, from the linguistics of words meaning zero or nothing to cultural impacts of the concept, to detailed mathematical explanation of how zero fits into algebras and topologies.

Towards a Model for Linear Logic: Monoidal Categories

Time to come back to category theory from our side-trip. Category theory provides a good framework for defining linear logic – and for building a Curry-Howard style type system for describing computations with *state* that evolves over time. It also provides a way of defining a valid *model* for the part of linear logic (the multiplicative operators) that isn’t tractable using other modeling techniques.
I’m going to show you the relationship between models for linear logic and category theory. It’s going to take a couple of days to go through the whole shebang, but even the parts that we need to go through to get there are fun.
The first step is to define a *monoidal category*, also known as a *tensor* category. We’ve already done most of that when we built [monads][monads] earlier this week; a monad is a kind of monoidal category.
A monoidal category is a category C with a distinguished *identity object* I, and a *binary functor* ⊗ : C × C → C. This binary functor is called *tensor*. Tensor has three required properties defined using *natural isomorphisms*, called α, λ, and ρ.
α says that tensor must be associative: α(A,B,C) : (A ⊗ B) ⊗ C → A ⊗ (B ⊗ C).
λ says that tensor has a *left identity*: λA : (I ⊗ A) → A.
ρ says that tensor has a *right identity*, which is the same object as the left identity: ρA : (A ⊗ I) → A.
And finally, the natural transformations need to make the following diagrams commute for all values of A, B, and C. These are known as the *coherence conditions* for the monoidal natural transformations.
A monoidal category is said to be *strict* if α, λ, and ρ are all identities. It turns out that every monoidal category is *equivalent* (in the sense of natural isomorphism) to a strict monoidal category.
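Here’s a rough set-level analogy in Python, ignoring all of the real categorical structure, just to illustrate the difference between “associative up to isomorphism” and “associative on the nose”:

```python
# tensor as pairing: associative only up to the isomorphism α
a, b, c = (1,), (2,), (3,)
assert ((a, b), c) != (a, (b, c))            # not equal, merely isomorphic

# tensor as list concatenation: strictly associative
la, lb, lc = [1], [2], [3]
assert (la + lb) + lc == la + (lb + lc)      # α is the identity
assert [] + la == la == la + []              # λ and ρ are identities too
print("concatenation is strict; pairing is not")
```

Pairing and concatenation carry “the same” information, which is the flavor of the equivalence theorem: the non-strict structure can be traded for a strict one.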
And now, here comes an example of just why category theory is useful. In some of the [detailed models of quantum physics][quantum-condense], they try to describe the structure of different kinds of matter using what they call *topological orders*. The standard theory for describing the topological orders of different states of matter is called *Landau* theory. It turns out that Landau theory doesn’t describe the topological order of high-temperature superconductors or very-low-temperature condensate states. Category theory – in particular, the theory surrounding strict monoidal categories – does a better job of describing the topological order of the different states of matter than any other mathematical approach that’s been tried so far.

A Brief Diversion: Sequent Calculus

*(This post has been modified to correct some errors and add some clarifications in response to comments from alert readers. Thanks for the corrections!)*
Today, we’re going to take a brief diversion from category theory to play with
some logic. There are some really neat connections between variant logics and category theory. I’m planning on showing a bit about the connections between category theory and one of those, called *linear logic* . But the easiest way to present things like linear logic is using a mechanism based on sequent calculus.
Sequent calculus is a deduction system for performing reasoning in first order predicate logic. But its notation and general principles are useful for all sorts of reasoning systems, including many different logics, all sorts of type theories, etc. The specific sequent calculus that I’m going to talk about is sometimes called system-LK; the general category of things that use this basic kind of rules is called Gentzen systems.
The sequent calculus consists of a set of inference rules, each of which is normally written like a fraction: the top of the fraction is what you know before applying the rule; the bottom is what you can conclude. The statements in the rules, which are called *sequents*, are always of the form:

CONTEXTS, Predicates :- CONTEXTS, Predicates

The “CONTEXTS” are sets of predicates that you already know are true. The “:-” is read “entails”; it means that the *conjunction* of the statements and contexts to the left of it can prove the *disjunction* of the statements to the right of it. In predicate logic, the conjunction is logical and, and disjunction is logical or, so you can read the statements as if “,” is “∧” on the left of the “:-“, and “∨” on the right. *(Note: this paragraph was modified to correct a dumb error that I made that was pointed out by commenter Canuckistani.)*
Contexts are generally written using capital Greek letters; predicates are generally written using uppercase English letters. We often put a name for an inference rule to the right of the separator line for the sequent.
For example, look at the following sequent:
Γ :- Δ
————— Weakening-Left
Γ,A :- Δ
This sequent is named Weakening-Left; the top says “Given Γ, everything in Δ can be proved”; and
the bottom says “Using Γ plus the fact that A is true, everything in Δ can be proved”. The full sequent basically says: if Δ is provable given Γ, then it will still be provable when A is added to Γ; in other words, adding a true fact won’t invalidate any proofs that were valid before the addition of A. *(Note: this paragraph was modified to correct an error pointed out by a commenter.)*
The sequent calculus is nothing but a complete set of rules that you can use to perform any inference in predicate calculus. A few quick syntactic notes, and I’ll show you the full set of rules.
1. Uppercase Greek letters are contexts.
2. Uppercase English letters are *statements*.
3. Lowercase English letters are *terms*; that is, the objects that predicates
can reason about, or variables representing objects.
4. A[b] is a statement A that contains the term b in some way.
5. A[b/c] means A with the term “c” replaced by the term “b” (that is, b substituted for c).
First, two very basic rules:
———— (Identity)
A :- A
2. Γ :- A, Δ     Σ, A :- Π
—————————————— (Cut)
Γ,Σ :- Δ, Π
Now, there’s a bunch of rules that have right and left counterparts. They’re duals of each other – move terms across the “:-” and switch from ∧ to ∨ or vice-versa.
3. Γ, A :- Δ
————————— (Left And 1)
Γ, A ∧ B :- Δ
4. Γ :- A, Δ
——————— ——— (Right Or 1)
Γ :- A ∨ B, Δ
5. Γ, B :- Δ
——————— ——(Left And 2)
Γ,A ∧ B :- Δ
6. Γ :- B, Δ
——————— ——— (Right Or 2)
Γ :- A ∨ B, Δ
7. Γ, A :- Δ    Σ,B :- Π
————————————— (Left Or)
Γ,Σ, A ∨ B :- Δ,Π
8. Γ :- A,Δ   Σ :- B,Π
—————————— ——(Right And)
Γ,Σ :- A ∧ B, Δ,Π
9. Γ :- A,Δ
————— —— (Left Not)
Γ, ¬A :- Δ
10. Γ,A :- Δ
——————— (Right Not)
Γ :- ¬A, Δ
11. Γ :- A,Δ    Σ,B :- Π
————————————— (Left Implies)
Γ, Σ, A → B :- Δ,Π
12. Γ,A[y] :- Δ *(y bound)*
————————————— (Left Forall)
Γ,∀x A[x/y] :- Δ
13. Γ :- A[y],Δ *(y free)*
————————————— (Right Forall)
Γ :- ∀x A[x/y],Δ
14. Γ, A[y] :- Δ *(y bound)*
———————————— (Left Exists)
Γ,∃x A[x/y] :- Δ
15. Γ :- A[y], Δ *(y free)*
————————————(Right Exists)
Γ :- ∃x A[x/y], Δ
16. Γ :- Δ
—————— (Left Weakening)
Γ, A :- Δ
17. Γ :- Δ
—————— (Right Weakening)
Γ :- A, Δ
18. Γ, A, A :- Δ
——————— (Left Contraction)
Γ,A :- Δ
19. Γ :- A, A, Δ
——————— (Right Contraction)
Γ :- A, Δ
20. Γ, A, B, Δ :- Σ
————————— (Left Permutation)
Γ,B, A, Δ :- Σ
21. Γ :- Δ, A, B, Σ
————————— (Right Permutation)
Γ :- Δ, B, A, Σ
Here’s an example of how we can use sequents to derive A ∨ ¬ A:
1. Context empty. Apply Identity.
2. A :- A. Apply Right Not.
3. empty :- ¬ A, A. Apply Right Or 2.
4. empty :- A ∨ ¬A, A. Apply Right Permutation.
5. empty :- A, A ∨ ¬ A. Apply Right Or 1.
6. empty :- A ∨ ¬ A, A ∨ ¬ A. Right Contraction.
7. empty :- A ∨ ¬ A
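That derivation can be replayed mechanically. Here’s a small Python sketch that encodes a sequent as a pair (antecedents, succedents) and implements just the rules that the derivation uses; the encoding and function names are mine, not standard:

```python
# a sequent is a pair (antecedents, succedents), each a list of formulas
def identity(a):                         # rule 1: A :- A
    return ([a], [a])

def right_not(seq):                      # rule 10: Γ,A :- Δ  =>  Γ :- ¬A, Δ
    left, right = seq
    return (left[:-1], ["¬" + left[-1]] + right)

def right_or(seq, other, other_first):   # rules 4 and 6: add a disjunct
    left, right = seq
    b, rest = right[0], right[1:]
    disj = f"({other} ∨ {b})" if other_first else f"({b} ∨ {other})"
    return (left, [disj] + rest)

def right_permute(seq):                  # rule 21: swap first two succedents
    left, right = seq
    return (left, [right[1], right[0]] + right[2:])

def right_contract(seq):                 # rule 19: Γ :- A,A,Δ  =>  Γ :- A,Δ
    left, right = seq
    assert right[0] == right[1]
    return (left, right[1:])

s = identity("A")                        # A :- A
s = right_not(s)                         #   :- ¬A, A
s = right_or(s, "A", other_first=True)   #   :- (A ∨ ¬A), A
s = right_permute(s)                     #   :- A, (A ∨ ¬A)
s = right_or(s, "¬A", other_first=False) #   :- (A ∨ ¬A), (A ∨ ¬A)
s = right_contract(s)                    #   :- (A ∨ ¬A)
print(s)
```

Each assignment corresponds to one numbered step in the derivation above, ending with the empty context entailing A ∨ ¬A.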
If you look *carefully* at the rules, they actually make a lot of sense. The only ones that look a bit strange are the “forall” rules; and for those, you need to remember that the variable is *free* on the top of the sequent.
A lot of logics can be described using Gentzen systems; from type theory, to temporal logics, to all manner of other systems. They’re a very powerful tool for describing all manner of inference systems.

Using Good Math to Study Evolution Using Fitness Landscapes

Via [Migrations][migrations], I’ve found out about a really beautiful computational biology paper that very elegantly demonstrates how, contrary to the [assertions of bozos like Dembski][dembski-nfl], an evolutionary process can adapt to a fitness landscape. The paper was published in the PLOS journal “Computational Biology”, and it’s titled [“Evolutionary Potential of a Duplicated Repressor-Operator Pair: Simulating Pathways Using Mutation Data”][plos].
Here’s their synopsis of the paper:
>The evolution of a new trait critically depends on the existence of a path of
>viable intermediates. Generally speaking, fitness decreasing steps in this path
>hamper evolution, whereas fitness increasing steps accelerate it.
>Unfortunately, intermediates are hard to catch in action since they occur only
>transiently, which is why they have largely been neglected in evolutionary studies.
>The novelty of this study is that intermediate phenotypes can be predicted
>using published measurements of Escherichia coli mutants. Using this approach,
>the evolution of a small genetic network is simulated by computer. Following
>the duplication of one of its components, a new protein-DNA interaction
>develops via the accumulation of point mutations and selection. The resulting
>paths reveal a high potential to obtain a new regulatory interaction, in which
>neutral drift plays an almost negligible role. This study provides a
>mechanistic rationale for why such rapid divergence can occur and under which
>minimal selective conditions. In addition it yields a quantitative prediction
>for the minimum number of essential mutations.
And one more snippet, just to show where they’re going, and to try to encourage you to make the effort to get through the paper. This isn’t an easy read, but it’s well worth the effort.
>Here we reason that many characteristics of the adaptation of real protein-DNA
>contacts are hidden in the extensive body of mutational data that has been
>accumulated over many years (e.g., [12-14] for the Escherichia coli lac
>system). These measured repression values can be used as fitness landscapes, in
>which pathways can be explored by computing consecutive rounds of single base
>pair substitutions and selection. Here we develop this approach to study the
>divergence of duplicate repressors and their binding sites. More specifically,
>we focus on the creation of a new and unique protein-DNA recognition, starting
>from two identical repressors and two identical operators. We consider
>selective conditions that favor the evolution toward independent regulation.
>Interestingly, such regulatory divergence is inherently a coevolutionary
>process, where repressors and operators must be optimized in a coordinated
>fashion.
This is a gorgeous paper, and it shows how to do *good* math in the area of search-based modeling of evolution. Instead of the empty refrain of “it can’t work”, this paper presents a real model of a process, shows what it can do, and *makes predictions* that can be empirically checked against observations. This, folks, is how it *should* be done.
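To give a feel for the kind of computation involved, here’s a toy Haskell sketch of the paper’s basic move: exploring a fitness landscape by rounds of single-base substitution plus selection. The fitness function below is invented purely for illustration – the paper uses measured repression values – and the greedy selection here is far cruder than their simulation:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

bases :: String
bases = "ACGT"

-- All single base-pair substitutions of a sequence.
mutants :: String -> [String]
mutants s = [ take i s ++ [b] ++ drop (i + 1) s
            | i <- [0 .. length s - 1], b <- bases, b /= s !! i ]

-- Hypothetical fitness: how many positions match a target operator.
-- (Stand-in for the measured repression values in the paper.)
fitness :: String -> String -> Int
fitness target s = length (filter id (zipWith (==) target s))

-- One greedy walk: keep taking the fittest single mutant until no
-- single substitution improves fitness.
climb :: String -> String -> String
climb target s
  | fitness target best > fitness target s = climb target best
  | otherwise                              = s
  where best = maximumBy (comparing (fitness target)) (mutants s)
```

Even this trivial version shows the shape of the approach: the landscape is just a scoring function over sequences, and evolutionary pathways are walks through it.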

Monads and Programming Languages

One of the questions that a ton of people sent me when I said I was going to write about category theory was “Oh, good, can you please explain what the heck a monad is?”

The short version is: a monad is a structure built on a functor from a category to itself. The way that this works in a programming language is that you can view many things in programming languages in terms of monads. In particular, you can take things that involve mutable state, and magically hide the state.

How? Well – the state (the set of bindings of variables to values) is an object in a category, State. The monad is a functor from State → State. Since the functor is a functor from a category to itself, the value of the state is implicit – it’s the object at the start and end points of the functor. From the viewpoint of code outside of the monad functor, the states are indistinguishable – they’re just something in the category. For the functor itself, the value of the state is accessible.

So, in a language like Haskell with a State monad, you can write functions inside the State monad; and they are strictly functions from State to State; or you can write functions outside the state monad, in which case the value inside the state is completely inaccessible. Let’s take a quick look at an example of this in Haskell. (This example came from an excellent online tutorial which, sadly, is no longer available.)

Here’s a quick declaration of a State monad in Haskell:

class MonadState m s | m -> s where
  get :: m s
  put :: s -> m ()

instance MonadState (State s) s where
  get   = State $ \s -> (s,s)
  put s = State $ \_ -> ((),s)

This is Haskell syntax defining the interface of a state: an object which stores one value. It has two operations: get, which retrieves the value from a state; and put, which updates the value hidden inside the state.

Now, remember that Haskell has no actual assignment statement: it’s a pure functional language. So what “put” actually does is create a new state with the new value in it.
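To make that concrete, here’s a minimal, self-contained sketch of a State monad – not the real Control.Monad.State, just enough machinery to run. The `tick` function at the end is a hypothetical example, not from the tutorial; it shows how “put” builds a new state rather than mutating anything:

```haskell
-- A state computation is just a function from a state to a
-- (result, new state) pair, wrapped in a newtype.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  -- Chaining threads the state from one step into the next.
  State g >>= f = State $ \s -> let (a, s') = g s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- A hypothetical example: return the current counter, "updating" it
-- by producing a new state value one larger.
tick :: State Int Int
tick = do n <- get
          put (n + 1)
          return n
```

Running `runState tick 41` produces the pair of the old value and the new state; no variable was ever overwritten along the way.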

How can we use it? We can only access the state from a function that’s inside the monad. In the example, they use it for a random number generator; the state stores the last random value generated, which will be used as the seed for the next. Here we go:

getAny :: (Random a) => State StdGen a
getAny = do g <- get
            (x,g') <- return $ random g
            put g'
            return x

Now – remember that the only functions that exist *inside* the monad are "get" and "put". "do" is syntactic sugar for inserting a sequence of statements into a monad. What actually happens inside of a "do" is that *each expression* in the sequence is a functor from State to State; each expression takes as an input parameter the output of the previous one. "getAny" takes a state monad as an input; and then it implicitly passes the state from expression to expression.

"return" takes an ordinary value and injects it *into* the monad. So "return $ random g" is saying, roughly, "evaluate random g as an ordinary, non-monadic expression; then wrap the resulting pair back up in the monad", so that it can be bound with the "(x,g') <-" on the left. The return is necessary there because the full expression on the right of the "<-" *must* be an instance of the monad; if we just wrote "(x,g') <- random g", we'd get a type error, because "random g" produces a plain pair, not a monadic value. (The "$" is just low-precedence function application – it saves a pair of parentheses around "random g".) Finally, "return x" wraps the result "x" back up in the monad, making it the value that "getAny" produces.

The really important thing here is to recognize that each line inside of the "do" is a functor from State → State; and since the start and end points of the functor are implicit in the structure of the functor itself, you don't need to write it. So the state is passed down the sequence of instructions – each of which maps State back to State.
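To see that threading explicitly, here’s a sketch of what the "do" block desugars into, written with a bare state-transformer type instead of the newtype, and with a made-up `fakeRandom` (a pure stand-in of my own invention, not System.Random) so the example is self-contained:

```haskell
-- A state computation, unwrapped: a function from state to (result, state).
type St s a = s -> (a, s)

-- Chain two steps: the output state of the first feeds the second.
bindSt :: St s a -> (a -> St s b) -> St s b
bindSt m f = \s -> let (a, s') = m s in f a s'

returnSt :: a -> St s a
returnSt a = \s -> (a, s)

getSt :: St s s
getSt = \s -> (s, s)

putSt :: s -> St s ()
putSt s' = \_ -> ((), s')

-- Hypothetical stand-in for "random": produces a value and a new seed.
fakeRandom :: Int -> (Int, Int)
fakeRandom seed = (seed * 1103515245 + 12345, seed + 1)

-- The "do" block from getAny, with the sugar removed: every line
-- becomes one link in a bindSt chain, and the state rides along
-- invisibly from link to link.
getAnyDesugared :: St Int Int
getAnyDesugared =
  getSt                   `bindSt` \g ->
  returnSt (fakeRandom g) `bindSt` \(x, g') ->
  putSt g'                `bindSt` \_ ->
  returnSt x
```

Each `bindSt` is exactly the composition step: it maps State to State while passing the intermediate result to the next lambda.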

Let's get to the formal part of what a monad is. There's a bit of funny notation we need to define for it. (You can't do anything in category theory without that never-ending stream of definitions!)

  1. Given a category C, 1C is the *identity functor* from C to C.
  2. For a category C, if T is a functor C → C, then T2 is TºT. (And so on for higher powers: T3 = TºTºT, etc.)
  3. For a given functor T, the *identity natural transformation* from T to T is written 1T.

Suppose we have a category, C. A *monad on C* is a triple (T,η,μ), where T is a functor from C → C, and η and μ are natural transformations; η: 1C → T, and μ: (TºT) → T. (1C is the identity functor for C in the category of categories.) These must have the following properties:

First, μ º Tμ = μ º μT. Or in diagram form:


Second, μ º Tη = μ º ηT = 1T. In diagram form:


Basically, what these really come down to is an associativity property ensuring that T behaves properly over composition, and an identity transformation that behaves as we would expect. Together, these two properties mean that any order of applications of T will behave properly, preserving the structure of the category underlying the monad.
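To connect this back to Haskell: for the list monad, T is the list functor, η is the function wrapping a value in a singleton list, and μ is concat. The two laws can then be written down as testable properties – a sketch, checking the laws on sample inputs rather than proving them:

```haskell
-- η: inject a value into the monad (a singleton list).
eta :: a -> [a]
eta x = [x]

-- μ: flatten one layer of the monad (concat for lists).
mu :: [[a]] -> [a]
mu = concat

-- Associativity law: μ º Tμ = μ º μT.
-- "fmap mu" applies μ inside T; the outer mu is μ applied after.
assocHolds :: Eq a => [[[a]]] -> Bool
assocHolds xs = (mu . fmap mu) xs == (mu . mu) xs

-- Unit laws: μ º Tη = μ º ηT = 1T.
unitHolds :: Eq a => [a] -> Bool
unitHolds xs = (mu . fmap eta) xs == xs && (mu . eta) xs == xs
```

For instance, `assocHolds [[[1,2],[3]],[[4]]]` checks that flattening the inner layers first and flattening the outer layers first both yield the same list.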

Yoneda's Lemma

So, at last, we can get to Yoneda’s lemma, as I [promised earlier][yoneda-promise]. What Yoneda’s lemma does is show us how for many categories (in fact, most of the ones that are interesting) we can take the category C, and understand it using a structure formed from the functors from C to the category of sets. (From now on, we’ll call the category of sets **Set**.)
So why is that such a big deal? Because the functors from C to **Set** define a *structure* formed from sets that represents the properties of C. Since we have a good intuitive understanding of sets, that means that Yoneda’s lemma
gives us a handle on how to understand all sorts of difficult structures by looking at the mapping from those structures onto sets. In some sense, this is what category theory is really all about: we’ve taken the intuition of sets and functions; and used it to build a general way of talking about structures. Our knowledge and intuition for sets can be applied to all sorts of structures.
As usual for category theory, there’s yet another definition we need to look at, in order to understand the categories for which Yoneda’s lemma applies.
If you recall, a while ago, I talked about something called *[small categories][smallcats]*: a small category is a category for which the class of objects is a set, and not a proper class. Yoneda’s lemma applies to a class of categories slightly less restrictive than the small categories, called the *locally small categories*.
The definition of locally small categories is based on something called the Hom-classes of a category. Given a category C, the hom-classes of C are a partition of the morphisms in the category. Given any two objects a and b in Obj(C), the hom-class **Hom**(a,b) is the class of all morphisms f : a → b. If **Hom**(a,b) is a set (instead of a proper class), then it’s called the hom-set of a and b.
A category C is *locally small* if/f all of the hom-classes of C are sets: that is, if for every pair of objects in Obj(C), the morphisms between them form a set, and not a proper class.
So, on to the lemma.
Suppose we have a locally small category C. Then for each object a in Obj(C), there is a *natural functor* corresponding to a mapping to **Set**. This is called the hom-functor of a, and it’s generally written: *h*a = **Hom**(a,-). *h*a is a functor which maps an object x in C to the set of morphisms **Hom**(a,x).
If F is a functor from C to **Set**, then for each a ∈ Obj(C), the natural transformations from *h*a to F have a one-to-one correspondence with the elements of F(a): that is, the set of natural transformations – the structure-preserving mappings – from *h*a to F is isomorphic to the set F(a).
So the hom-functors give us a handle on all of the structure-preserving mappings from C into the world of sets.
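For Haskell programmers, the lemma has a well-known rendering as code – a sketch, assuming the `RankNTypes` extension. A natural transformation from **Hom**(a,-) to a functor f is a polymorphic function of type `forall b. (a -> b) -> f b`, and the two conversion functions below witness the one-to-one correspondence with values of type `f a`:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A natural transformation from Hom(a,-) to f: for every object b,
-- it turns a morphism (a -> b) into an element of f b.
newtype Yoneda f a = Yoneda { runYoneda :: forall b. (a -> b) -> f b }

-- One direction of the correspondence: an element of f a determines
-- a natural transformation, by mapping each morphism over it.
toYoneda :: Functor f => f a -> Yoneda f a
toYoneda fa = Yoneda (\g -> fmap g fa)

-- The other direction: apply the transformation to the identity
-- morphism on a to recover the element of f a.
fromYoneda :: Yoneda f a -> f a
fromYoneda (Yoneda t) = t id
```

The trick in `fromYoneda` – probing the natural transformation with the identity morphism – is exactly the heart of the lemma’s proof.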
Yesterday, we saw a way how mapping *up* the abstraction hierarchy can make some kinds of reasoning easier. Yoneda says that for some things where we’d like to use our intuitions about sets and functions, we can also *map down* the abstraction hierarchy.
(If you saw my posts on group theory back at GM/BMs old home, this is a generalization of what I wrote about [the symmetric groups][symmetry]: the fact that every group G is isomorphic to a subgroup of the symmetric group on G.)
Coming up next: why do computer science geeks like me care about this abstract nonsense? What does all of this gunk have to do with programming and programming languages? What the heck is a monad? And more.