A Brilliant φ link

In the comments onmy post about φ, Polymath, (whose [blog][polymath] is well worth checking out) provided a really spectacular [link involving φ][desert]. It’s an excerpt from a book called “[Mathematical Gems 2][gems]”, showing a problem that came from John Conway, called the “Sending Scouts into the Desert” problem.
The problem is:
Suppose you’re given a checkerboard with all of the squares on the bottom filled. You’re allowed to do standard checks jumps (jump over a man and remove it), but you can’t jump diagonally, only up, left, or right. How far *up* can you get a man? How many men do you need to move to get that far?
To get to the first row above your men, you need to jump one man. To get two rows, you need to jump four men; three rows, you’ll need 8 men; four rows, you’ll need 20 men. How about five rows?
You *can’t* get five rows doing up and sideways jumps. Even if you had an infinitely wide checkerboard, with as many full rows of mens as you want.
The proof, along with a nice Java applet to let you try the solvable ones, is at the link. And φ is an inextricable part of the proof!
[polymath]: http://polymathematics.typepad.com/
[desert]: http://www.cut-the-knot.org/proofs/checker.shtml
[gems]: http://www.amazon.com/gp/redirect.html?link_code=ur2&tag=goodmathbadma-20&camp=1789&creative=9325&location=%2Fgp%2Fproduct%2F0883853027%2Fsr%3D8-2%2Fqid%3D1155392829%2Fref%3Dpd_bbs_2%3Fie%3DUTF8

Friday Pathological Programming: Unlambda, or Programming Without Variables

Todays tasty treat: a variable-free nearly-pure functional programming language: Unlambda.
Unlambda is based on the SKI combinator calculus. The [SKI calculus][ski] is a way of writing lambda calculus without variables, and without lambda. It’s based on three combinators: S, K, and I:
1. S = λ x y z . x z (y z)
2. K = λ x y . x
3. I = λ x . x
Given only S and K, you can actually write *any* lambda calculus expression – and therefore, you can write any computable function using nothing but S and K. I is often added for convenience; it’s just a shorthand for “SKK”.
You can do recursion in SKI; the Y combinator of lambda calculus is:
Y = S S K (S (K (S S (S (S S K)))) K)
Unlambda is based on SKI; but it extends it with a few additional kinds of sugar to make things easier to write. (Nothing makes Unlambda easier to *read*!) The basic idea is that *everything* in unlambda is a function; and in particular, everything is a function that takes a *single* variable: you don’t need more than one parameter, because you can always using [currying][curry] to express any multiparameter function as a series of single-parameter functions.
(As a refresher, if you have a two-parameter function in lambda calculus, like “λ x y . x + y”; *currying* is translating it into a one parameter function that returns a one parameter function: “λ x .(λ y. x + y)”.)
There are eight basic instructions in Unlambda:
1. ““”` : the application operator. “`’AB`” applies the function `A` to the parameter `B`. You can think of this as being something like the “(” in Lisp; there is no close paren because since every function takes only one parameter, there is no *need* for a close paren. And hey, adding a syntactically unnecessary close paren would only make the code easier to read, and we can’t have that!
2. “`k`” : the K combinator.
3. “`s`” : the S combinator.
4. “`i`” : the I combinator.
5. “`v`” : a “drop” combinator: takes a parameter, discards it, and returns “`v`”
6. “`d`” : the “delay” operator; a way of doing *lazy evaluation* in Unlambda. Unlambda normally evaluates the argument before it
applies a function. `’dA` is the same thing as `A`, except that `A` isn’t evaluated until “`’dA`” is applied to something else.
7. “`c`” : the “call-with-current-continuation” combinator. This is equivalent to [“call/cc”][callcc] in Scheme. What it does is call some function with the current state of the computation captured as a function as its parameter. What this means is that “`’cAB`” calls “`A`”. “`A`” receives the value “`cAB`” as a parameter; during A, it can return a value, *or* it can call the continuation, which will evaluate “`cAB`” again. So this is the function-based equivalent of a loop. “c” turns the “loop body” into a function; so at the end of the loop, you re-invoke the continuation with a new parameter to run the next iteration.
8. “`.`” : The “.” operator is actually a meta-operator. “`.x`” is a function which returns its parameter, *and* prints the character “x”.
9. “`r`” is a shorthand for “`.CR`”, where CR is the carriage return character.
And that is it. The entire language.
Note that there are no numbers! We need to build numbers using church numerals.
* The church numeral “0” is “λ f. (λ x . x)”. In Unlambda, that’s “`’ki`”.
* The church numeral “1” is “λ f . (λ x . f x)”; in unlambda, that’s “`i`”.
* The church numeral “2” is “λ f . (λ x. f f x)”; or “`”s”s’kski`”.
* The church numeral “3” is “λ f . (λ x . f f f x)”; or “`”s”s’ksk”s”s’kski`”.
* And so on.. Just keep appending “`”s”skski`” for each successive number.
So… On to a couple of programs!
Hello world in Unlambda is:
`r“““““`.H.e.l.l.o. .w.o.r.l.di
It’s pretty straightforward. It’s actually sort of easier to understand if you look from right to left. The “`.H.e.l.l.o…`” is basically a series of “`i`” combinators, which print out the characters of hello world. The trailing “`i`” is there because “`.d`” needs a parameter in order to do anything. And the list of “`’`”s at the beginning are there to make the “.” functions be applied. Finally, apply “`’r`” to the whole thing – which prints a newline at the end.
Next, a counter: starts with “N=1”; print “N” asterisks, add one, and then repeat.
If you look at it, you can see the number pattern in there; “`”s”s’ksk`”. That fragment is an SKI function to add one. You capture a continuation for the main iteration; run “`i”s’k’cN`”; that’s a fragment that adds “1” to N, and then prints N. Since you’ve captured the continuation, you can re-invoke it on 1+n; the rest captures *another* continuation which is the inner loop for printing “*”s; it does that, and then it invokes the out continuation to go back and do it for “N+1”.
One last, which I’ll leave for you to trace through yourself: this one generates the fibonacci sequence; and for each element of it, it prints out the numbers using a line of asterisks containing “N” asterisks for the number “N”:
I’ll also point out that there are a ton of variants on Unlambda, ranging from the really interesting ([LazyK][lazyk]) to the bizzarely minimal ([Iota][iota]) to the downright goofy ([Jot][jot]).
[callcc]: http://community.schemewiki.org/?call-with-current-continuation
[ski]: http://goodmath.blogspot.com/2006/05/from-lambda-calculus-to-combinator.html
[curry]: http://goodmath.blogspot.com/2006/05/my-favorite-calculus-lambda-part-1.html
[iota]: http://ling.ucsd.edu/~barker/Iota/
[lazyk]: http://homepages.cwi.nl/~tromp/cl/lazy-k.html
[jot]: http://ling.ucsd.edu/~barker/Iota/#Goedel

From Lambda Calculus to Cartesian Closed Categories

This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)
Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.
First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.
So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:
1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’
So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.
The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)
We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.
So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:
* ∀ A ∈ basetypes(**L**), [A] = AC ∈ C(**L**)
* [unit] = 1C
Next, we need to define the typing rules for complex types:
* [ A × B ] = [A] × [B]
* [ A → B ] = [B][A]
Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:
* [∅] = 1C
* [Γ, x : A] = [Γ] × [A]
We also need to describe what to do with the values of the primitive types:
* For each value v : A, there is an arrow v : 1 → AC.
And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).
* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*
* [Γ :- a : AC] = a º ! : [Γ] → [AC] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*
* [Γ x : A :- x : A] = π2 : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*
* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π1 where π1 : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*
* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B][A] *(A function typing rule: the function maps to an arrow from the type context to an exponential [B][A], which is a function from A to B.)*
* [Γ :- (M M’) : B] = evalC,B º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*
There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
The intuition behind this is:
* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.
* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)
Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)
[cam]: http://portal.acm.org/citation.cfm?id=34883&dl=ACM&coll=portal
[caml]: http://caml.inria.fr/
[lambda]: http://goodmath.blogspot.com/2006/06/lamda-calculus-index.html
[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php

Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.
[metamath]: http://us.metamath.org/mpegif/mmset.html#overview
[meta-peano]: http://us.metamath.org/mpegif/peano5.html

More Loony Christian Math

Yesterday, I posted [this article][bozo] about the bozo who didn’t like his college calculus course because it wasn’t Christian enough. One of the commenters pointed out that there’s actually a site online where a college professor from a Christian college actually has [a collection of “devotionals”][devotional] to be presented along with the lecture in a basic calculus course.
They’re sufficiently insane that I have to quote a couple of them. No comment that I could possibly make could add anything to the sheer goofiness of these.
For the lesson on “Function Operations”:
>**God’s Surgical Improvements of our Actions**
>Genesis 50:15-21, Romans 3:9-10, 21-24
>One of the more horrible images in the book of Genesis is that of Joseph being
>sold by his brothers into slavery. This type of hate turned into evil act is a
>common occurrence in our world, too. In the Genesis situation, though, we are
>given the gift of 20-20 hindsight because we know the end of the story. God
>used the brothers’ evil action to prevent starvation of the descendants of
>Abraham. Joseph says, “You intended to harm me, but God intended it for
>good. ..” [Gen. 50:20] In the same way, God makes our unrighteous actions
>righteous through Christ. He surgically improves our actions to his own
>This idea of twisting something from one form into another is what happens when
>function operations work on elementary functions. You can start with two
>ordinary benign functions, the reciprocal function 1/x and sin(x), say, and put
>them together. Depending on how you put them together, you can create
>something interesting and easily understood, like sin(x)/x, or something with
>wild behavior, like sin(1/x). Either way, you have twisted one object into
>something very different.
For the lesson on the Limit definition of the derivative:
>**Secant Lines and Sanctification**
>Ps. 119:33-40
>In differential calculus we study how a slope of a linear function can be
>generalized to the slope of a function whose graph is curved, creating the
>derivative of the original function. The definition of derivative uses a
>sequence of lines (secant lines) drawn through two points on a function that
>are approaching each other and a single point on the function curve. The
>derivative value or tangent line slope is defined to be the limiting slope
>value of this sequence of secant lines. (See the figures below.)
> Figure 1 : Secant line between 1 and 1.8
> Figure 2 : Secant line between 1 and 1.5
> Figure 3: Tangent line to f at x=1
>Once a person has been called to be a Christian, we are redeemed by Christ but
>not released from following the law of God. We are justified once but continue
>with the process of sanctification for the remainder of our lives. This
>sanctification process is like the limit process of the secant lines
>approaching the tangent line. There is one distinction between the concepts of
>sanctification and secant line limits, however. In the mathematical contexts,
>we accept results that are “sufficiently close,” results that are in an
>epsilon-neighborhood of the desired quantity. While in our quest for
>perfection, the “better” we get the further we realize we are from satisfying
>all aspects of the law.
Ok, just one more. This one just about had me rolling on the floor! For the lesson on the chain rule:
>**Chain Reactions**
> 1 Corin 5:16-21
>Once students have seen the chain rule for differentiation of composed
>functions, it is natural to extend the chain rule to nested functions, where
>there is more than two functions that are composed. Fun problems to
>investigate are ones that are repeated applications of the same function. Try
>differentiating tan(tan(tan(tan(tan x)))) or ln(ln(ln(ln(ln(ln x))))), for
>example. Working your way from the outside to the inside yields a derivative
>which is product chain of related functions.
>In a similar way, when we interact with other people there is a chain reaction
>to our behavior. Most people believe that abused children are more likely to
>become abusers themselves someday, for example. Less dramatic behavior also
>can have a reaction that extends beyond the initial engagement. A popular
>Warner Brothers film of 2000 “Pay it Forward” (based on a novel with the same
>name by Catherine Ryan Hyde) depicts how a chain of reactions to an initial act
>of kindness can change an entire community. Christians need to be specially
>mindful of this chain reaction, since we are ambassadors for Christ. Our
>verbal and nonverbal witness can yield unexpected results, especially under the
>influence of the Holy Spirit.
This, apparently, is how “real” Christians are supposed to teach math classes. I seriously wonder how they have time to actually cover the *math* in class if they spend time on this gibberish!
[bozo]: http://scienceblogs.com/goodmath/2006/08/math_is_bad_because_it_isnt_ch.php
[devotional]: http://www.trnty.edu/faculty/robbert/SRobbertWebFolder/ChristianityMath/Calculus.html#Chain

Categorical Numbers

Sorry for the delay in the category theory articles. I’ve been busy with work, and haven’t had time to do the research to be able to properly write up the last major topic that I plan to cover in cat theory. While doing my research on closed cartesian categories and lambda calculus, I came across this little tidbit that I hadn’t seen before, and I thought it would be worth taking a brief diversion to look at.
Category theory is sufficiently powerful that you can build all of mathematics from it. This is something that we’ve known for a long time. But it’s not really the easiest thing to demonstrate.
But then, I came across this lovely little article on wikipedia. It turns out that you can [build the natural numbers entirely in terms of categories][wikipedia] in a beautifully simple way.
Before we dive into categorical numbers, let’s take a moment to remember what we really mean by natural numbers.
Peano Arithmetic
The meanings of natural numbers are normally defined in terms of what we call *Peano arithmetic*. Peano arithmetic defines the natural numbers, **N** in terms of five fundamental axioms:
1. 0 ∈ **N** (0 is a natural number.)
2. ∀ a ∈ **N** ∃ a’=s(n), a’ ∈ **N**. (Every natural number a has a successor, s(a))
3. ¬ ∃ a ∈ **N** : s(a)=0. (No natural number has 0 as its successor.)
4. ∀ a,b ∈ **N**: s(a)=s(b) ⇒ a=b. (Each number has a unique successor.)
5. (P(0) ∧ ∀ a : P(a) ⇒ P(s(a))) ⇒ ∀ b ∈ **N** : P(b). (If a property P is true for 0; and it can be shown that for any number a, if it’s true for a, then it’s true for b, then that means that it’s true for all natural numbers. This is the *axiom of induction*.)
To get to arithmetic, we need to add two operations: addition and multiplication.
We can define addition recursively quite easily:
1. ∀ x ∈ **N**: x+0 = x
2. ∀ x,y ∈ **N**: s(x)+y=x+s(y)
And once we have addition, we can do multiplication:
1. ∀ x ∈ **N**: x * 0 = 0 ∧ 0 * x = 0
2. ∀ x ∈ **N**: 1 * x = x
3. ∀ x,y ∈ **N**: s(x)*y = (x*y) + y
To sum up: Peano arithmetic is based on a definition of the natural numbers in terms of five fundamental properties, called the Peano axioms, built on “0”, a successor function, and induction. If we have those five properties, we can define addition and multiplication; and given those, we have the ability to build up arbitrary functions, sets, and the other fundamentals of mathematics in terms of the natural numbers.
Peano Naturals and Categories
The peano axioms can be captured in a category diagram for a relatively simple object called a *natural number object (NNO)*. A natural number object can exist in many categories: basically any category with a terminal object *can* have NNOs. Any closed cartesian category *does* have an NNO.
There are two alternate constructions; a general construct that works for all NNO-containing categories; and a more specific one for closed cartesian categories. (You can show that they are ultimately the same in a CCC.)
I actually think that the general one is easier to understand, so that’s what I’ll describe. This is basically a construction that captures the ideas of successor and recursion.
Given a category C with a terminal object **1**. A *natural number object* is an object *N* with two arrows z : 1 → *N*, and s : *N* → *N*, where z and s have the following property.
(∀ a ∈ Obj(C), (q : 1 → a, f : a → a) ∈ Arrows(C)) : u º z = q ∧ u º s = f º u.
That statement means the same thing as saying that the following diagram commutes:


Let’s just walk through that a bit. Think of the object **N** as a set. “s” is successor. So S is an arrow from *N* to *N* that maps each number to its successor. *N* is “rooted” by the terminal object; that gives it its basic structure. The “z” arrow from the terminal object maps to the zero value in *N*. So we’ve got an arrow that selects zero; and we’ve got an arrow that defines the successor.
Think of “q” as a predicate, and “u” as a statement that a value in *N* satisfies the predicate.
So u º z, that is the path from 1 to a by stepping through z and u is a statement that the predicate “q” is true for z (zero); because composing u (the number I’m composed with satisfies q) and z (the number zero) gives us “q” (zero satifies q).
Now the induction rule part. *N* maps to *N* by s – that’s saying that s is an arrow from a number in *N* to its successor. If you have an arrow composition that says “q is true for n”, then f can be composed onto that giving you “q is true for n + 1”.
If for all q, u is unique – that is, there is only *one* mapping where induction works; then you’ve got a categorical structure that satisfies the Peano axioms.
[wikipedia]: http://en.wikipedia.org/wiki/Natural_number_object

Math is Bad, Because it isn't Christian!

By way of Pharyngula, I saw something that I simply had to repeat here.
Every august, James Kennedy – a thoroughly repulsive ultra-fundy preacher from Coral Ridge Ministries – runs a conference called “Reclaiming America for Christ”. At this years conference, he featured a speech by Paul Jehle about “Evaluating your Philosophy of Education”.
Jehle is… umm… how do we say this politely?….
Ah, screw it. Jehle is a fucking frothing at the mouth nutjob lunatic asshole.
His basic argument – the argument that he *expects people to take seriously* – is that *everything* is either christian or non-christian. And if it’s non-christian, then christians shouldn’t look at it, listen to it, or study it. And you can’t *ever* make anything that started out non-christian christian.
How far does he go with this? Pretty damned far. Right into the domain of this here blog. In his talk, he uses the following story as an example:
>I was taking calculus. I was a mathematics major and I was at a Christian
>college that was called Christian, but was not Christian….
>I asked a question to my calculus professor: “What makes this course distinctly
>Christian?” He stopped. He said no one has ever asked that question before…
>He said, “Okay, I’m a Christian; you’re a Christian.”
>I said, “That’s not what I asked! What makes this calculus course distinctly
>Christian? What makes this different from the local secular university? Are we
>using the same text? Yes. Are you teaching it the same way? Yes. Then why is
>this called a Christian college and that one a non-Christian college?”
Yeah. Seriously. **Math is Bad**, because it’s *not explicitly christian*. I mean, it uses *zero*, which was invented by a *hindu*, and brought to europe by *muslims*. Algebra was invented by muslims! The word “algorithm” comes from the name of a *muslim* mathematician!
Uh-oh… I just realized that the alleged “Doctor” Jehle has a very serious problem. The way that we geeks heard his talk to write about it is because it was *digitized* – using a thoroughly non-christian technology – and posted on the *internet*, which is built using those non-christian *algorithms*. And to quite Jehle himself:
>But the issue is you cannot combine something by its nature which is pagan and
>built on humanistic principles and make it Christian by a magic wand.
So the internet, and computers, and digital recording, and the data compression that makes streaming audio work – they’re *non-christian*. And you cannot combine something non-Christian with something Christian.

φ, the Golden Ratio

Lots of folks have been asking me to write about φ, the golden ratio. I’m finally giving up and doing it. I’m not a big fan of φ. It’s a number
which has been adopted by all sorts of flakes and crazies, and there are alleged sightings of it in all sorts of strange places that are simply *not* real. For example, pyramid loons claim that the great pyramids in Egypt have proportions that come from φ – but they don’t. Animal mystics claim that the ratio of drones to larvae in a beehive is approximately φ – but it isn’t.
But it is an interesting number. My own personal reason for thinking it’s cool is representational: if you write φ as a continued fraction, it’s [1;1,1,1,1…]; and if you write it as a continued square root, it’s 1 + sqrt(1+sqrt(1+sqrt(1+sqrt(1…)))).
What is φ?
φ is the value so-called “golden ratio”: it’s the number that is a solution for the equation (a+b)/a = (a/b). (1+sqrt(5))/2. It’s the ratio where if you take a rectangle where the ratio of the length of the sides is 1:φ, then if you remove the largest possible square from it, you’ll get another rectangle whose sides have the ration φ:1. If you take the largest square from that, you’ll get a rectangle whose sides have the ratio 1:φ. And so on:


Allegedly, it’s the proportion of sides of a rectangle that produce the most aesthetically beautiful appearance. I’m not enough of a visual artist to judge that, so I’ve always just taken that on faith. But it *does* shows up in many places in geometry. For example, if you draw a five-pointed star, the ratio of the length of one of the point-to-point lines of the star to the length of the sides of the pentagon inside the star are φ:1:


φ is also related to the fibonacci series. In case you don’t remember, the fibonacci series is the set of numbers where each number in the series is the sum of the two previous: 1,1,2,3,5,8,13,… If Fib(n) is the *n*th number in the series, you can compute it as:

Quaternions: upping the dimensions of complex numbers

Last week, after I wrote about complex numbers, a bunch of folks wrote and said “Do quaternions next!” My basic reaction was “Huh?” I somehow managed to get by without ever being exposed to quaternions before. They’re quite interesting things.
The basic idea behind quaterions is: we see some amazing things happen when we expand the dimensionality of numbers from 1 (the real numbers) to 2 (the complex numbers). What if we add *more* dimensions?
It doesn’t work for three dimensions But you *can* create a system of numbers in *four* dimensions. As with complex numbers, you need a distinct unit for each dimension. In complex, those were 1 and i. For quaternions, you need for units: we’ll call them “1”, “i”, “j”, and “k”. In quaternion math, the units have the following properties:
1. i2 = j2 = k2 = ijk = -1
2. ij = k, jk = i, and ki=j
3. ji = -k, kj = -i, and ik=-j
No, that last one is *not* an error. Quaternion multiplication is *not* commutative: ab &neq; ba. It *is* associative over multiplication, meaning (a*b)*c = a*(b*c). And fortunately, it’s both commutative and associative over addition.
Anyway, we write a quaternion as “a + bi + cj + dk”; as in “2 + 4i -3j + 7k”.
Aside from the commutativity thing, quaternions work well as numbers; they’re a non-abelian group over multiplication (assuming you omit 0); they’re an abelian group over addition; they form a division algebra *over* the reals (meaning that you can define an algebra with operations up to multiplication and division over the quaternions, and operations in this algebra will generate the same results as normal real algebra if the i,j, and k components are 0.)
Quaternions are a relatively recent creation, so we know the history quite precisely. Quaternions were invented by a gentleman named William Hamilton on October 16th, 1843, while walking over the Broom bridge with his wife. He had been working on trying to work out something like the complex numbers with more dimensions for a while, and the crucial identity “ijk=-1” struck him while he was walking over the bridge. He immediately *carved it into the bridge* before he forgot it. Fortunately, this act of vandalism wasn’t taken seriously: the Broom bridge now has a plaque commemorating it!
Quaternions were controversial from the start. The fundamental step that Hamilton had taken in making quaternions work was abandoning commutativity, and many people believed that this was simply *wrong*. For example, Lord Kelvin said “Quaternions came from Hamilton after his really good work had been done; and, though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.”
Why are they non-commutative?
Quaternions are decidedly strange things. As I mentioned above, they’re non-commutative. Doing math with things where ab &neq; ba was a *huge* step, and it can be strange. But it is necessary for quaternions. Here’s why.
Let’s take the fundamental identity of quaternions: ijk = -1
Now, let’s multiply both sides by k: ij(k2) = -1k
So, ij(-1)=-k, and ij=k.
Now, multiply by *i* on the left on both sides: i2j = ik
And so, -1j = ik, and ik=-j.
Keep playing with multiplication in the identities, and you find that the identities simple *do not work* unless it’s non-commutative.
What are they good for?
For the most part, quaternions aren’t used much anymore. Initially, they were used in many places where we now use complex vectors, but that’s mostly faded away. What they *are* used for is all sorts of math that in any way involves *rotation*. Quaternions capture the essential properties of rotation.
To describe a rotation well, you actually need to have *extra* dimensions. To describe an *n*-dimensional rotation, you need *n+2* dimensions. Since in our three dimensional world, we rotate things around a *line* which is two dimensional, the description of the rotation needs *four* dimensions. And further, rotation itself is non-commutative.
Take a book and put it flat on the table, with the binding to the left. Flip it so that the binding stays on the same side, but now you’ve got the back cover on top, upside-down. Then rotate it clockwise 90 degrees on the table. You now have the book with its back cover up, and the binding facing away from you.
Now, start over: book right side up, binding facing left. Rotate it clockwise 90 degrees. Now flip it the same way you did before (over the x axis if you think of left as -x, right as +x, away from you as +y, up as +z, etc.)
What’s the result? You get the book with its back cover up, and the binding factor *towards* you. It’s 180 degrees different depending on which order you do it in. And 180 degrees is a reverse equivalent to the sign difference that you get in quaternion multiplication! Quaternions have exactly the right properties for describing rotations – and in particular, the *composition* of rotations.
The use of quaternions for rotations is used in computer graphics (when something rotates on the screen of your PS2, it’s doing quaternion math!); in satellite navigation; and in relativity equations (for symmetry wrt rotation).

Ω: my favorite strange number

Ω is my own personal favorite transcendental number. Ω isn’t really a specific number, but rather a family of related numbers with bizzare properties. It’s the one real transcendental number that I know of that comes from the theory of computation, that is important, and that expresses meaningful fundamental mathematical properties. It’s also deeply non-computable; meaning that not only is it non-computable, but even computing meta-information about it is non-computable. And yet, it’s *almost* computable. It’s just all around awfully cool.
So. What is it Ω?
It’s sometimes called the *halting probability*. The idea of it is that it encodes the *probability* that a given infinitely long random bit string contains a prefix that represents a halting program.
It’s a strange notion, and I’m going to take a few paragraphs to try to elaborate on what that means, before I go into detail about how the number is generated, and what sorts of bizzare properties it has.
Remember that in the theory of computation, one of the most fundamental results is the non-computability of *the halting problem*. The halting problem is the question “Given a program P and input I, if I run P on I, will it ever stop?” You cannot write a program that reads P and I and gives you the answer to the halting problem. It’s impossible. And what’s more, the statement that the halting problem is not computable is actually *equivalent* to the fundamental statement of Gödel’s incompleteness theorem.
Ω is something related to the halting problem, but stranger. The fundamental question of Ω is: if you hand me a string of 0s and 1s, and I’m allowed to look at it one bit at a time, what’s the probability that *eventually* the part that I’ve seen will be a program that eventually stops?
When you look at this definition, your reaction *should* be “Huh? Who cares?”
The catch is that this number – this probability – is a number which is easy to define; it’s not computable; it’s completely *uncompressible*; it’s *normal*.
Let’s take a moment and look at those properties:
1. Non-computable: no program can compute Ω. In fact, beyond a certain value N (which is non-computable!), you cannot compute the value of *any bit* of Ω.
2. Uncompressible: there is no way to represent Ω in a non-infinite way; in fact, there is no way to represent *any substring* of Ω in less bits than there are in that substring.
3. Normal: the digits of Ω are completely random and unpatterned; the value of and digit in Ω is equally likely to be a zero or a one; any selected *pair* of digits is equally likely to be any of the 4 possibilities 00, 01, 10, 100; and so on.
So, now we know a little bit about why Ω is so strange, but we still haven’t really defined it precisely. What is Ω? How does it get these bizzare properties?
Well, as I said at the beginning, Ω isn’t a single number; it’s a family of numbers. The value of *an* Ω is based on two things: an effective (that is, turing equivalent) computing device; and a prefix-free encoding of programs for that computing device as strings of bits.
(The prefix-free bit is important, and it’s also probably not familiar to most people, so let’s take a moment to define it. A prefix-free encoding is an encoding where for any given string which is valid in the encoding, *no prefix* of that string is a valid string in the encoding. If you’re familiar with data compression, Huffman codes are a common example of a prefix-free encoding.)
So let’s assume we have a computing device, which we’ll call φ. We’ll write the result of running φ on a program encoding as the binary number p as &phi(p). And let’s assume that we’ve set up φ so that it only accepts programs in a prefix-free encoding, which we’ll call ε; and the set of programs coded in ε, we’ll call Ε; and we’ll write φ(p)↓ to mean φ(p) halts. Then we can define Ω as:

Ωφ,ε = Σp ∈ Ε|p↓ 2-len(p)

So: for each program in our prefix-free encoding, if it halts, we *add* 2-length(p) to Ω.
Ω is an incredibly odd number. As I said before, it’s random, uncompressible, and has a fully normal distribution of digits.
But where it gets fascinatingly bizzare is when you start considering its computability properties.
Ω is *definable*. We can (and have) provided a specific, precise definition of it. We’ve even described a *procedure* by which you can conceptually generate it. But despite that, it’s *deeply* uncomputable. There are procedures where we can compute a finite prefix of it. But there’s a limit: there’s a point beyond which we cannot compute *any* digits of it. *And* there is no way to compute that point. So, for example, there’s a very interesting paper where the authors [computed the value of Ω for a Minsky machine][omega]; they were able to compute 84 bits of it; but the last 20 are *unreliable*, because it’s uncertain whether or not they’re actually beyond the limit, and so they may be wrong. But we can’t tell!
What does Ω mean? It’s actually something quite meaningful. It’s a number that encodes some of the very deepest information about *what* it’s possible to compute. It gives a way to measure the probability of computability. In a very real sense, it represents the overall nature and structure of computability in terms of a discrete probability. It’s an amazingly dense container of information – as a n infinitely long number and so thoroughly non-compressible, it contains an unmeasurable quantity of information. And we can’t get near most of it!
[omega]: http://www.expmath.org/expmath/volumes/11/11.3/Calude361_370.pdf