This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)

Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.

First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.

So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:

1. x = y if x=0 and y=0; and

2. x = y if x=s(x’) and y=s(y’) and x’ = y’

So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.

The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)

We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.

So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:

* ∀ A ∈ basetypes(**L**), [A] = A_{C} ∈ C(**L**)

* [unit] = 1_{C}

Next, we need to define the typing rules for complex types:

* [ A × B ] = [A] × [B]

* [ A → B ] = [B]^{[A]}

Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:

* [∅] = 1_{C}

* [Γ, x : A] = [Γ] × [A]

We also need to describe what to do with the values of the primitive types:

* For each value v : A, there is an arrow v : 1 → A_{C}.

And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).

* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*

* [Γ :- a : A_{C}] = a º ! : [Γ] → [A_{C}] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*

* [Γ x : A :- x : A] = π_{2} : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*

* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π_{1} where π_{1} : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*

* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B]^{[A]} *(A function typing rule: the function maps to an arrow from the type context to an exponential [B]^{[A]}, which is a function from A to B.)*

* [Γ :- (M M’) : B] = eval_{C,B} º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*

There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.

The intuition behind this is:

* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.

* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.

And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)

Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)

[cam]: http://portal.acm.org/citation.cfm?id=34883&dl=ACM&coll=portal

[caml]: http://caml.inria.fr/

[lambda]: http://goodmath.blogspot.com/2006/06/lamda-calculus-index.html

[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php

# Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.

[metamath]: http://us.metamath.org/mpegif/mmset.html#overview

[meta-peano]: http://us.metamath.org/mpegif/peano5.html

# More Loony Christian Math

Yesterday, I posted [this article][bozo] about the bozo who didn’t like his college calculus course because it wasn’t Christian enough. One of the commenters pointed out that there’s actually a site online where a college professor from a Christian college actually has [a collection of “devotionals”][devotional] to be presented along with the lecture in a basic calculus course.

They’re sufficiently insane that I have to quote a couple of them. No comment that I could possibly make could add anything to the sheer goofiness of these.

For the lesson on “Function Operations”:

>**God’s Surgical Improvements of our Actions**

>

>Genesis 50:15-21, Romans 3:9-10, 21-24

>

>One of the more horrible images in the book of Genesis is that of Joseph being

>sold by his brothers into slavery. This type of hate turned into evil act is a

>common occurrence in our world, too. In the Genesis situation, though, we are

>given the gift of 20-20 hindsight because we know the end of the story. God

>used the brothers’ evil action to prevent starvation of the descendants of

>Abraham. Joseph says, “You intended to harm me, but God intended it for

>good. ..” [Gen. 50:20] In the same way, God makes our unrighteous actions

>righteous through Christ. He surgically improves our actions to his own

>purpose.

>

>This idea of twisting something from one form into another is what happens when

>function operations work on elementary functions. You can start with two

>ordinary benign functions, the reciprocal function 1/x and sin(x), say, and put

>them together. Depending on how you put them together, you can create

>something interesting and easily understood, like sin(x)/x, or something with

>wild behavior, like sin(1/x). Either way, you have twisted one object into

>something very different.

For the lesson on the Limit definition of the derivative:

>**Secant Lines and Sanctification**

>

>Ps. 119:33-40

>In differential calculus we study how a slope of a linear function can be

>generalized to the slope of a function whose graph is curved, creating the

>derivative of the original function. The definition of derivative uses a

>sequence of lines (secant lines) drawn through two points on a function that

>are approaching each other and a single point on the function curve. The

>derivative value or tangent line slope is defined to be the limiting slope

>value of this sequence of secant lines. (See the figures below.)

>

> Figure 1 : Secant line between 1 and 1.8

> Figure 2 : Secant line between 1 and 1.5

> Figure 3: Tangent line to f at x=1

>

>Once a person has been called to be a Christian, we are redeemed by Christ but

>not released from following the law of God. We are justified once but continue

>with the process of sanctification for the remainder of our lives. This

>sanctification process is like the limit process of the secant lines

>approaching the tangent line. There is one distinction between the concepts of

>sanctification and secant line limits, however. In the mathematical contexts,

>we accept results that are “sufficiently close,” results that are in an

>epsilon-neighborhood of the desired quantity. While in our quest for

>perfection, the “better” we get the further we realize we are from satisfying

>all aspects of the law.

Ok, just one more. This one just about had me rolling on the floor! For the lesson on the chain rule:

>**Chain Reactions**

>

> 1 Corin 5:16-21

>

>Once students have seen the chain rule for differentiation of composed

>functions, it is natural to extend the chain rule to nested functions, where

>there is more than two functions that are composed. Fun problems to

>investigate are ones that are repeated applications of the same function. Try

>differentiating tan(tan(tan(tan(tan x)))) or ln(ln(ln(ln(ln(ln x))))), for

>example. Working your way from the outside to the inside yields a derivative

>which is product chain of related functions.

>

>In a similar way, when we interact with other people there is a chain reaction

>to our behavior. Most people believe that abused children are more likely to

>become abusers themselves someday, for example. Less dramatic behavior also

>can have a reaction that extends beyond the initial engagement. A popular

>Warner Brothers film of 2000 “Pay it Forward” (based on a novel with the same

>name by Catherine Ryan Hyde) depicts how a chain of reactions to an initial act

>of kindness can change an entire community. Christians need to be specially

>mindful of this chain reaction, since we are ambassadors for Christ. Our

>verbal and nonverbal witness can yield unexpected results, especially under the

>influence of the Holy Spirit.

This, apparently, is how “real” Christians are supposed to teach math classes. I seriously wonder how they have time to actually cover the *math* in class if they spend time on this gibberish!

[bozo]: http://scienceblogs.com/goodmath/2006/08/math_is_bad_because_it_isnt_ch.php

[devotional]: http://www.trnty.edu/faculty/robbert/SRobbertWebFolder/ChristianityMath/Calculus.html#Chain

# Categorical Numbers

Sorry for the delay in the category theory articles. I’ve been busy with work, and haven’t had time to do the research to be able to properly write up the last major topic that I plan to cover in cat theory. While doing my research on closed cartesian categories and lambda calculus, I came across this little tidbit that I hadn’t seen before, and I thought it would be worth taking a brief diversion to look at.

Category theory is sufficiently powerful that you can build all of mathematics from it. This is something that we’ve known for a long time. But it’s not really the easiest thing to demonstrate.

But then, I came across this lovely little article on wikipedia. It turns out that you can [build the natural numbers entirely in terms of categories][wikipedia] in a beautifully simple way.

Before we dive into categorical numbers, let’s take a moment to remember what we really mean by natural numbers.

Peano Arithmetic

——————

The meanings of natural numbers are normally defined in terms of what we call *Peano arithmetic*. Peano arithmetic defines the natural numbers, **N** in terms of five fundamental axioms:

1. 0 ∈ **N** (0 is a natural number.)

2. ∀ a ∈ **N** ∃ a’=s(n), a’ ∈ **N**. (Every natural number a has a successor, s(a))

3. ¬ ∃ a ∈ **N** : s(a)=0. (No natural number has 0 as its successor.)

4. ∀ a,b ∈ **N**: s(a)=s(b) ⇒ a=b. (Each number has a unique successor.)

5. (P(0) ∧ ∀ a : P(a) ⇒ P(s(a))) ⇒ ∀ b ∈ **N** : P(b). (If a property P is true for 0; and it can be shown that for any number a, if it’s true for a, then it’s true for b, then that means that it’s true for all natural numbers. This is the *axiom of induction*.)

To get to arithmetic, we need to add two operations: addition and multiplication.

We can define addition recursively quite easily:

1. ∀ x ∈ **N**: x+0 = x

2. ∀ x,y ∈ **N**: s(x)+y=x+s(y)

And once we have addition, we can do multiplication:

1. ∀ x ∈ **N**: x * 0 = 0 ∧ 0 * x = 0

2. ∀ x ∈ **N**: 1 * x = x

3. ∀ x,y ∈ **N**: s(x)*y = (x*y) + y

To sum up: Peano arithmetic is based on a definition of the natural numbers in terms of five fundamental properties, called the Peano axioms, built on “0”, a successor function, and induction. If we have those five properties, we can define addition and multiplication; and given those, we have the ability to build up arbitrary functions, sets, and the other fundamentals of mathematics in terms of the natural numbers.

Peano Naturals and Categories

———————————

The peano axioms can be captured in a category diagram for a relatively simple object called a *natural number object (NNO)*. A natural number object can exist in many categories: basically any category with a terminal object *can* have NNOs. Any closed cartesian category *does* have an NNO.

There are two alternate constructions; a general construct that works for all NNO-containing categories; and a more specific one for closed cartesian categories. (You can show that they are ultimately the same in a CCC.)

I actually think that the general one is easier to understand, so that’s what I’ll describe. This is basically a construction that captures the ideas of successor and recursion.

Given a category C with a terminal object **1**. A *natural number object* is an object *N* with two arrows z : 1 → *N*, and s : *N* → *N*, where z and s have the following property.

(∀ a ∈ Obj(C), (q : 1 → a, f : a → a) ∈ Arrows(C)) : u º z = q ∧ u º s = f º u.

That statement means the same thing as saying that the following diagram commutes:

Let’s just walk through that a bit. Think of the object **N** as a set. “s” is successor. So S is an arrow from *N* to *N* that maps each number to its successor. *N* is “rooted” by the terminal object; that gives it its basic structure. The “z” arrow from the terminal object maps to the zero value in *N*. So we’ve got an arrow that selects zero; and we’ve got an arrow that defines the successor.

Think of “q” as a predicate, and “u” as a statement that a value in *N* satisfies the predicate.

So u º z, that is the path from 1 to a by stepping through z and u is a statement that the predicate “q” is true for z (zero); because composing u (the number I’m composed with satisfies q) and z (the number zero) gives us “q” (zero satifies q).

Now the induction rule part. *N* maps to *N* by s – that’s saying that s is an arrow from a number in *N* to its successor. If you have an arrow composition that says “q is true for n”, then f can be composed onto that giving you “q is true for n + 1”.

If for all q, u is unique – that is, there is only *one* mapping where induction works; then you’ve got a categorical structure that satisfies the Peano axioms.

[wikipedia]: http://en.wikipedia.org/wiki/Natural_number_object

# Math is Bad, Because it isn't Christian!

By way of Pharyngula, I saw something that I simply had to repeat here.

Every august, James Kennedy – a thoroughly repulsive ultra-fundy preacher from Coral Ridge Ministries – runs a conference called “Reclaiming America for Christ”. At this years conference, he featured a speech by Paul Jehle about “Evaluating your Philosophy of Education”.

Jehle is… umm… how do we say this politely?….

Ah, screw it. Jehle is a fucking frothing at the mouth nutjob lunatic asshole.

His basic argument – the argument that he *expects people to take seriously* – is that *everything* is either christian or non-christian. And if it’s non-christian, then christians shouldn’t look at it, listen to it, or study it. And you can’t *ever* make anything that started out non-christian christian.

How far does he go with this? Pretty damned far. Right into the domain of this here blog. In his talk, he uses the following story as an example:

>I was taking calculus. I was a mathematics major and I was at a Christian

>college that was called Christian, but was not Christian….

>

>I asked a question to my calculus professor: “What makes this course distinctly

>Christian?” He stopped. He said no one has ever asked that question before…

>

>He said, “Okay, I’m a Christian; you’re a Christian.”

>

>I said, “That’s not what I asked! What makes this calculus course distinctly

>Christian? What makes this different from the local secular university? Are we

>using the same text? Yes. Are you teaching it the same way? Yes. Then why is

>this called a Christian college and that one a non-Christian college?”

Yeah. Seriously. **Math is Bad**, because it’s *not explicitly christian*. I mean, it uses *zero*, which was invented by a *hindu*, and brought to europe by *muslims*. Algebra was invented by muslims! The word “algorithm” comes from the name of a *muslim* mathematician!

Uh-oh… I just realized that the alleged “Doctor” Jehle has a very serious problem. The way that we geeks heard his talk to write about it is because it was *digitized* – using a thoroughly non-christian technology – and posted on the *internet*, which is built using those non-christian *algorithms*. And to quite Jehle himself:

>But the issue is you cannot combine something by its nature which is pagan and

>built on humanistic principles and make it Christian by a magic wand.

So the internet, and computers, and digital recording, and the data compression that makes streaming audio work – they’re *non-christian*. And you cannot combine something non-Christian with something Christian.

# φ, the Golden Ratio

Lots of folks have been asking me to write about φ, the golden ratio. I’m finally giving up and doing it. I’m not a big fan of φ. It’s a number

which has been adopted by all sorts of flakes and crazies, and there are alleged sightings of it in all sorts of strange places that are simply *not* real. For example, pyramid loons claim that the great pyramids in Egypt have proportions that come from φ – but they don’t. Animal mystics claim that the ratio of drones to larvae in a beehive is approximately φ – but it isn’t.

But it is an interesting number. My own personal reason for thinking it’s cool is representational: if you write φ as a continued fraction, it’s [1;1,1,1,1…]; and if you write it as a continued square root, it’s 1 + sqrt(1+sqrt(1+sqrt(1+sqrt(1…)))).

What is φ?

—————–

φ is the value so-called “golden ratio”: it’s the number that is a solution for the equation (a+b)/a = (a/b). (1+sqrt(5))/2. It’s the ratio where if you take a rectangle where the ratio of the length of the sides is 1:φ, then if you remove the largest possible square from it, you’ll get another rectangle whose sides have the ration φ:1. If you take the largest square from that, you’ll get a rectangle whose sides have the ratio 1:φ. And so on:

Allegedly, it’s the proportion of sides of a rectangle that produce the most aesthetically beautiful appearance. I’m not enough of a visual artist to judge that, so I’ve always just taken that on faith. But it *does* shows up in many places in geometry. For example, if you draw a five-pointed star, the ratio of the length of one of the point-to-point lines of the star to the length of the sides of the pentagon inside the star are φ:1:

φ is also related to the fibonacci series. In case you don’t remember, the fibonacci series is the set of numbers where each number in the series is the sum of the two previous: 1,1,2,3,5,8,13,… If Fib(n) is the *n*th number in the series, you can compute it as:

History

# Quaternions: upping the dimensions of complex numbers

Quaternions

Last week, after I wrote about complex numbers, a bunch of folks wrote and said “Do quaternions next!” My basic reaction was “Huh?” I somehow managed to get by without ever being exposed to quaternions before. They’re quite interesting things.

The basic idea behind quaterions is: we see some amazing things happen when we expand the dimensionality of numbers from 1 (the real numbers) to 2 (the complex numbers). What if we add *more* dimensions?

It doesn’t work for three dimensions But you *can* create a system of numbers in *four* dimensions. As with complex numbers, you need a distinct unit for each dimension. In complex, those were 1 and i. For quaternions, you need for units: we’ll call them “1”, “i”, “j”, and “k”. In quaternion math, the units have the following properties:

1. i^{2} = j^{2} = k^{2} = ijk = -1

2. ij = k, jk = i, and ki=j

3. ji = -k, kj = -i, and ik=-j

No, that last one is *not* an error. Quaternion multiplication is *not* commutative: ab &neq; ba. It *is* associative over multiplication, meaning (a*b)*c = a*(b*c). And fortunately, it’s both commutative and associative over addition.

Anyway, we write a quaternion as “a + bi + cj + dk”; as in “2 + 4i -3j + 7k”.

Aside from the commutativity thing, quaternions work well as numbers; they’re a non-abelian group over multiplication (assuming you omit 0); they’re an abelian group over addition; they form a division algebra *over* the reals (meaning that you can define an algebra with operations up to multiplication and division over the quaternions, and operations in this algebra will generate the same results as normal real algebra if the i,j, and k components are 0.)

History

———

Quaternions are a relatively recent creation, so we know the history quite precisely. Quaternions were invented by a gentleman named William Hamilton on October 16th, 1843, while walking over the Broom bridge with his wife. He had been working on trying to work out something like the complex numbers with more dimensions for a while, and the crucial identity “ijk=-1” struck him while he was walking over the bridge. He immediately *carved it into the bridge* before he forgot it. Fortunately, this act of vandalism wasn’t taken seriously: the Broom bridge now has a plaque commemorating it!

Quaternions were controversial from the start. The fundamental step that Hamilton had taken in making quaternions work was abandoning commutativity, and many people believed that this was simply *wrong*. For example, Lord Kelvin said “Quaternions came from Hamilton after his really good work had been done; and, though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.”

Why are they non-commutative?

——————————-

Quaternions are decidedly strange things. As I mentioned above, they’re non-commutative. Doing math with things where ab &neq; ba was a *huge* step, and it can be strange. But it is necessary for quaternions. Here’s why.

Let’s take the fundamental identity of quaternions: ijk = -1

Now, let’s multiply both sides by k: ij(k^{2}) = -1k

So, ij(-1)=-k, and ij=k.

Now, multiply by *i* on the left on both sides: i^{2}j = ik

And so, -1j = ik, and ik=-j.

Keep playing with multiplication in the identities, and you find that the identities simple *do not work* unless it’s non-commutative.

What are they good for?

————————-

For the most part, quaternions aren’t used much anymore. Initially, they were used in many places where we now use complex vectors, but that’s mostly faded away. What they *are* used for is all sorts of math that in any way involves *rotation*. Quaternions capture the essential properties of rotation.

To describe a rotation well, you actually need to have *extra* dimensions. To describe an *n*-dimensional rotation, you need *n+2* dimensions. Since in our three dimensional world, we rotate things around a *line* which is two dimensional, the description of the rotation needs *four* dimensions. And further, rotation itself is non-commutative.

Take a book and put it flat on the table, with the binding to the left. Flip it so that the binding stays on the same side, but now you’ve got the back cover on top, upside-down. Then rotate it clockwise 90 degrees on the table. You now have the book with its back cover up, and the binding facing away from you.

Now, start over: book right side up, binding facing left. Rotate it clockwise 90 degrees. Now flip it the same way you did before (over the x axis if you think of left as -x, right as +x, away from you as +y, up as +z, etc.)

What’s the result? You get the book with its back cover up, and the binding factor *towards* you. It’s 180 degrees different depending on which order you do it in. And 180 degrees is a reverse equivalent to the sign difference that you get in quaternion multiplication! Quaternions have exactly the right properties for describing rotations – and in particular, the *composition* of rotations.

The use of quaternions for rotations is used in computer graphics (when something rotates on the screen of your PS2, it’s doing quaternion math!); in satellite navigation; and in relativity equations (for symmetry wrt rotation).

# Ω: my favorite strange number

Ω is my own personal favorite transcendental number. Ω isn’t really a specific number, but rather a family of related numbers with bizzare properties. It’s the one real transcendental number that I know of that comes from the theory of computation, that is important, and that expresses meaningful fundamental mathematical properties. It’s also deeply non-computable; meaning that not only is it non-computable, but even computing meta-information about it is non-computable. And yet, it’s *almost* computable. It’s just all around awfully cool.

So. What is it Ω?

It’s sometimes called the *halting probability*. The idea of it is that it encodes the *probability* that a given infinitely long random bit string contains a prefix that represents a halting program.

It’s a strange notion, and I’m going to take a few paragraphs to try to elaborate on what that means, before I go into detail about how the number is generated, and what sorts of bizzare properties it has.

Remember that in the theory of computation, one of the most fundamental results is the non-computability of *the halting problem*. The halting problem is the question “Given a program P and input I, if I run P on I, will it ever stop?” You cannot write a program that reads P and I and gives you the answer to the halting problem. It’s impossible. And what’s more, the statement that the halting problem is not computable is actually *equivalent* to the fundamental statement of Gödel’s incompleteness theorem.

Ω is something related to the halting problem, but stranger. The fundamental question of Ω is: if you hand me a string of 0s and 1s, and I’m allowed to look at it one bit at a time, what’s the probability that *eventually* the part that I’ve seen will be a program that eventually stops?

When you look at this definition, your reaction *should* be “Huh? Who cares?”

The catch is that this number – this probability – is a number which is easy to define; it’s not computable; it’s completely *uncompressible*; it’s *normal*.

Let’s take a moment and look at those properties:

1. Non-computable: no program can compute Ω. In fact, beyond a certain value N (which is non-computable!), you cannot compute the value of *any bit* of Ω.

2. Uncompressible: there is no way to represent Ω in a non-infinite way; in fact, there is no way to represent *any substring* of Ω in less bits than there are in that substring.

3. Normal: the digits of Ω are completely random and unpatterned; the value of and digit in Ω is equally likely to be a zero or a one; any selected *pair* of digits is equally likely to be any of the 4 possibilities 00, 01, 10, 100; and so on.

So, now we know a little bit about why Ω is so strange, but we still haven’t really defined it precisely. What is Ω? How does it get these bizzare properties?

Well, as I said at the beginning, Ω isn’t a single number; it’s a family of numbers. The value of *an* Ω is based on two things: an effective (that is, turing equivalent) computing device; and a prefix-free encoding of programs for that computing device as strings of bits.

(The prefix-free bit is important, and it’s also probably not familiar to most people, so let’s take a moment to define it. A prefix-free encoding is an encoding where for any given string which is valid in the encoding, *no prefix* of that string is a valid string in the encoding. If you’re familiar with data compression, Huffman codes are a common example of a prefix-free encoding.)

So let’s assume we have a computing device, which we’ll call φ. We’ll write the result of running φ on a program encoding as the binary number p as &phi(p). And let’s assume that we’ve set up φ so that it only accepts programs in a prefix-free encoding, which we’ll call ε; and the set of programs coded in ε, we’ll call Ε; and we’ll write φ(p)↓ to mean φ(p) halts. Then we can define Ω as:

Ω_{φ,ε} = **Σ**_{p ∈ Ε|p↓} 2^{-len(p)}

So: for each program in our prefix-free encoding, if it halts, we *add* 2^{-length(p)} to Ω.

Ω is an incredibly odd number. As I said before, it’s random, uncompressible, and has a fully normal distribution of digits.

But where it gets fascinatingly bizzare is when you start considering its computability properties.

Ω is *definable*. We can (and have) provided a specific, precise definition of it. We’ve even described a *procedure* by which you can conceptually generate it. But despite that, it’s *deeply* uncomputable. There are procedures where we can compute a finite prefix of it. But there’s a limit: there’s a point beyond which we cannot compute *any* digits of it. *And* there is no way to compute that point. So, for example, there’s a very interesting paper where the authors [computed the value of Ω for a Minsky machine][omega]; they were able to compute 84 bits of it; but the last 20 are *unreliable*, because it’s uncertain whether or not they’re actually beyond the limit, and so they may be wrong. But we can’t tell!

What does Ω mean? It’s actually something quite meaningful. It’s a number that encodes some of the very deepest information about *what* it’s possible to compute. It gives a way to measure the probability of computability. In a very real sense, it represents the overall nature and structure of computability in terms of a discrete probability. It’s an amazingly dense container of information – as a n infinitely long number and so thoroughly non-compressible, it contains an unmeasurable quantity of information. And we can’t get near most of it!

[omega]: http://www.expmath.org/expmath/volumes/11/11.3/Calude361_370.pdf

# Q&A: What is information?

I received an email from someone with some questions about information theory; they relate to some sufficiently common questions/misunderstandings of information theory that I thought it was worth turning the answer into a post.

There are two parts here: my correspondent started with a question; and then after I answered it, he asked a followup.

The original question:

————————

>Recently in a discussion group, a member posted a series of symbols, numbers,

>and letters:

>

>`+

>The question was what is its meaning and whether this has information or not.

>I am saying is does have information and an unknown meaning (even though I am

>sure they were just characters randomly typed by the sender), because of the

>fact that in order to recognize a symbol, that is information. Others are

>claiming there is no information because there is no meaning. Specifically that

>while the letters themselves have meaning, together the message or “statement”

>does not, and therefore does not contain information. Or that the information

>content is zero.

>

>Perhaps there are different ways to define information?

>

>I think I am correct that it does contain information, but just has no meaning.

This question illustrates one of the most common errors made when talking about information. Information is *not* language: it does *not* have *any* intrinsic meaning. You can describe information in a lot of different ways; in this case, the one that seems most intuitive is: information is something that reduces a possibility space. When you sit down to create a random string of characters, there is a huge possibility space of the strings that could be generated by that. A specific string narrows the space to one possibility. Anything that reduces possibilities is something that *generates* information.

For a very different example, suppose we have a lump of radium. Radium is a radioactive metal which breaks down and emits alpha particles (among other things). Suppose we take out lump of radium, and put it into an alpha particle detector, and record the time intervals between the emission of alpha particles.

The radium is *generating information*. Before we started watching it, there were a huge range of possibilities for exactly when the decays would occur. Each emission – each decay event – narrows the possibility space of other emissions. So the radium is generating information.

That information doesn’t have any particular *meaning*, other than being the essentially random time-stamps at which we observed alpha particles. But it’s information.

A string of random characters may not have any *meaning*; but that doesn’t mean it doesn’t contain information. It *does* contain information; in fact, it *must* contain information: it is a *distinct* string, a *unique* string – one possibility out of many for the outcome of the random process of generation; and as such, it contains information.

The Followup

—————

>The explanation I have gotten from the person I have been debating, as to what

>he says is information is:

>

>I = -log_{2} P(E)

>

>where:

>I: information in bits

>P: probability

>E: event

>

>So for the example:

>`+

>He says:

>”I find that there is no meaning, and therefore I infer no information. I

>calculate that the probability of that string occuring was ONE (given no

>independent specification), and therefore the amount of information was ZERO. I

>therefore conclude it has no meaning.”

>

>For me, even though that string was randomly typed, I was able to look at the

>characters, and find there placement on a QWERTY keyboard, compare the

>characters to the digits on the hand, and found that over all, the left hand

>index finger keys were used almost twice as much. I could infer that a person

>was left handed tried to type the “random” sequence. So to me, even though I

>don’t know the math, and can’t measure the amount of information, the fact I

>was able to make that inference of a left handed typer tells me that there is

>information, and not “noise”.

The person quoted by my correspondent is an idiot; and clearly one who’s been reading Dembski or one of his pals. In my experience, they’re the only ones who continually stress that log-based equation for information.

But even if we ignore the fact that he’s a Dembski-oid, he’s *still* an idiot. You’ll notice that nowhere in the equation does *meaning* enter into the definition of information content. What *does* matter is the *probability* of the “event”; in this case, the probability of the random string of characters being the result of the process that generated it.

I don’t know how he generated that string. For the sake of working things through, let’s suppose that it was generated by pounding keys on a minimal keyboard; and let’s assume that the odds of hitting any key on that keyboard are equal (probably an invalid assumption, but it will only change the quantity of information, not its presence, which is the point of this example.) A basic minimal keyboard has about sixty keys. (I’m going by counting the keys on my “Happy Hacking Keyboard”. Of those, seven are shifts of various sorts: 2 shift, 2 control, one alt, one command, one mode), and one is “return”. So we’re left with 52 keys. To make things simple, let’s ignore the possibility of shifts affecting the result (this will result in us getting a *lower* information content, but that’s fine). So, it’s 80 characters; each *specific* character generated is an event with probability 1/52. So each *character* of the string has, by the Shannon-based definition quoted above, about 5.7 bits of information. The string as a whole has about 456 bits of information.

The fact that the process that generated it is random doesn’t make the odds of a particular string be one. For a trivial example, I’m going to close my eyes and pound on my keyboard with both hands twice, and then select the first 20 characters from each:

First: “`;kldsl;ksd.z.mdΩ.l.x`”

Second: “`lkficeewrflk;erwm,.r`”

Quite different results overall? Seems like I’m a bit heavy on the right hand on the k/l area. But note how different the outcomes are. *That* is information. Information without *semantics*; without any intrinsic meaning. But that *doesn’t matter*. Information *is not language*; the desire of [certain creationist morons][gitt] to demand that information have the properties of language is nonsense, and has *nothing* to do with the mathematical meaning of information.

The particular importance of this fact is that it’s a common creationist canard that information *must* have meaning; that therefore information can’t be created by a natural process, because natural processes are random, and randomness has no meaning; that DNA contains information; and therefore DNA can’t be the result of a natural process.

The sense in which DNA contains information is *exactly* the same as that of the random strings – both the ones in the original question, and the ones that I created above. DNAs information is *no different*.

*Meaning* is something that you get from language; information *does not* have to be *language*. Information *does not* have to have *any* meaning at all. That’s why we have distinct concepts of *messages*, *languages*, *semantics*, and *information*: because they’re all different.

[gitt]: http://scienceblogs.com/goodmath/2006/07/bad_bad_bad_math_aig_and_infor.php

# Irrational and Transcendental Numbers

If you look at the history of math, there’ve been a lot of disappointments for mathematicians. They always start off with an idea of math as a clean, beautiful, elegant thing. And they seem to often wind up disappointed.

Which leads us into todays strange numbers: irrational and transcendental numbers. Both of them were huge disappointments to the mathematicians who discovered them.

So what are they? We’ll start with the irrationals. They’re numbers that aren’t integers, and that aren’t a ratio of any two integers. So you can’t write them as a normal fraction. If you write them as a continued fraction, then they go on forever. If you write them in decimal form, they go on forever without repeating. π, √2, *e* – they’re all irrational. (Incidentally, the reason that they’re called “irrational” isn’t because they don’t make sense, or because their decimal representation is crazy, but because they can’t be written *as ratios*. My high school algebra teacher who first talked about irrational numbers said that they were irrational because numbers that never repeated in decimal form weren’t sensible.)

The transcendentals are even worse. Transcendental numbers are irrational; but not only can transcendental numbers not be written as a ratio of integers; not only do their decimal forms go on forever without repeating; transcendental numbers are numbers that *can’t* be described by algebraic operations: there’s no finite sequence of multiplications, divisions, additions, subtractions, exponents, and roots that will give you the value of a transcendental number. √2 is not transcendental; *e* is.

History

———–

The first disappointment involving the irrational numbers happened in Greece, around 500 bce. A rather brilliant man by the name of Hippasus, who was part of the school of Pythagoras, was studying roots. He worked out a geometric proof of the fact that √2 could not be written as a ratio of integers. He showed it to his teacher, Pythagoras. Pythagoras was convinced that numbers were clean and perfect, and could not accept the idea of irrational numbers. After analyzing Hippasus’s proof, and being unable to find any error in it, he became so enraged that he *drowned* poor Hippasus.

A few hundred years later, Eudoxus worked out the basic theory of irrationals; and it was published as a part of Euclid’s mathematical texts.

From that point, the study of irrationals pretty much disappeared for nearly 2000 years. It wasn’t until the 17th century that people started really looking at them again. And once again, it led to disappointment; but at least no one got killed this time.

Mathematicians had come up with yet another idea of what the perfection of math meant – this time using algebra. They decided that it made sense that algebra could describe all numbers; so you could write an equation to define any number using a polynomial with rational coefficients; that is, an equation using addition, subtraction, multiple, division, exponents, and roots.

Leibniz was studying algebra and numbers, and he’s the one who made the unfortunate discovery: lots of irrational numbers are algebraic; but lots of them *aren’t*. He discovered it indirectly, by way of the sin function. You see, sin(x) *can’t* be computed from x using algebra. There’s no algebraic function that can compute it. Leibniz called sin a transcendental function, since it went beyond algebra.

Building on the work of Leibniz, Liouville worked out that you could easily construct numbers that couldn’t be computed using algebra. For example, the constant named after Liouville consists of a string of 0s and 1s where 10^{-i} is 1 if/f there is some integer n such that n!=i.

Not too long after, it was discovered that *e* was transcendental. The main reason for this being interesting is because up until that point, every transcendental number known was *constructed*; *e* is a natural, unavoidable constant. Once *e* was shown irrational, others followed; in one neat side-note, π was shown to be transcendental *using e*. One of the properties that they discovered after the transcendence of *e* was that any transcendental number raised to a non-transcendental power was transcendental. Since e^{iπ} is *not* transcendental – it’s 1 – then π must be transcendental.

The final disappointment in this area came soon after; Cantor, studying the irrationals, came up with the infamous “Cantor’s diagonalization” argument, which shows that there are *more* transcendental numbers than there are algebraic ones. *Most* numbers are not only irrational; they’re transcendental.

What does it mean, and why does it matter?

———————————————-

Irrational and transcendental numbers are everywhere. Most numbers aren’t rational. Most numbers aren’t even algebraic. That’s a very strange notion: *we can’t write most numbers down*.

Even stranger, even though we know, per Cantor, that most numbers are transcendental, it’s *incredibly* difficult to prove that any particular number is transcendental. Most of them are; but we can’t even figure out *which ones*.

What does that mean? That our math-fu isn’t nearly as strong as we like to believe. Most numbers are beyond us.

Some interesting numbers that we know are either irrational or transcendental:

1. *e*: transcendental.

2. π: transcendental.

3. √2 : irrational, but algebraic,

4. √x, for all x that are not perfect squares are irrational.

5. log_{2}3 is irrational.

6. Ω, Chaitin’s constant, is transcendental.

What’s interesting is that we really don’t know very much about how transcendentals interact; and given the difficulty of proving that something is transcendental, even for the most well-known transcendentals, we don’t know much of what happens when you put them together. π+*e*; π×*e*; π^{π}; *e*^{*e*}, π^{e} are all numbers that we *don’t know* if they;’re transcendental. In fact, for π+*e* (and π-*e*) we don’t even know if it’s *irrational*.

That’s the thing about these number. We have *such* a weak grasp of them that even things that seem like they should be easy and fundamental, *we don’t know how to do*.