Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.

More Loony Christian Math

Yesterday, I posted [this article][bozo] about the bozo who didn’t like his college calculus course because it wasn’t Christian enough. One of the commenters pointed out that there’s actually a site online where a college professor from a Christian college actually has [a collection of “devotionals”][devotional] to be presented along with the lecture in a basic calculus course.
They’re sufficiently insane that I have to quote a couple of them. No comment that I could possibly make could add anything to the sheer goofiness of these.
For the lesson on “Function Operations”:
>**God’s Surgical Improvements of our Actions**
>Genesis 50:15-21, Romans 3:9-10, 21-24
>One of the more horrible images in the book of Genesis is that of Joseph being
>sold by his brothers into slavery. This type of hate turned into evil act is a
>common occurrence in our world, too. In the Genesis situation, though, we are
>given the gift of 20-20 hindsight because we know the end of the story. God
>used the brothers’ evil action to prevent starvation of the descendants of
>Abraham. Joseph says, “You intended to harm me, but God intended it for
>good. ..” [Gen. 50:20] In the same way, God makes our unrighteous actions
>righteous through Christ. He surgically improves our actions to his own
>This idea of twisting something from one form into another is what happens when
>function operations work on elementary functions. You can start with two
>ordinary benign functions, the reciprocal function 1/x and sin(x), say, and put
>them together. Depending on how you put them together, you can create
>something interesting and easily understood, like sin(x)/x, or something with
>wild behavior, like sin(1/x). Either way, you have twisted one object into
>something very different.
For the lesson on the Limit definition of the derivative:
>**Secant Lines and Sanctification**
>Ps. 119:33-40
>In differential calculus we study how a slope of a linear function can be
>generalized to the slope of a function whose graph is curved, creating the
>derivative of the original function. The definition of derivative uses a
>sequence of lines (secant lines) drawn through two points on a function that
>are approaching each other and a single point on the function curve. The
>derivative value or tangent line slope is defined to be the limiting slope
>value of this sequence of secant lines. (See the figures below.)
> Figure 1 : Secant line between 1 and 1.8
> Figure 2 : Secant line between 1 and 1.5
> Figure 3: Tangent line to f at x=1
>Once a person has been called to be a Christian, we are redeemed by Christ but
>not released from following the law of God. We are justified once but continue
>with the process of sanctification for the remainder of our lives. This
>sanctification process is like the limit process of the secant lines
>approaching the tangent line. There is one distinction between the concepts of
>sanctification and secant line limits, however. In the mathematical contexts,
>we accept results that are “sufficiently close,” results that are in an
>epsilon-neighborhood of the desired quantity. While in our quest for
>perfection, the “better” we get the further we realize we are from satisfying
>all aspects of the law.
Ok, just one more. This one just about had me rolling on the floor! For the lesson on the chain rule:
>**Chain Reactions**
> 1 Corin 5:16-21
>Once students have seen the chain rule for differentiation of composed
>functions, it is natural to extend the chain rule to nested functions, where
>there is more than two functions that are composed. Fun problems to
>investigate are ones that are repeated applications of the same function. Try
>differentiating tan(tan(tan(tan(tan x)))) or ln(ln(ln(ln(ln(ln x))))), for
>example. Working your way from the outside to the inside yields a derivative
>which is product chain of related functions.
>In a similar way, when we interact with other people there is a chain reaction
>to our behavior. Most people believe that abused children are more likely to
>become abusers themselves someday, for example. Less dramatic behavior also
>can have a reaction that extends beyond the initial engagement. A popular
>Warner Brothers film of 2000 “Pay it Forward” (based on a novel with the same
>name by Catherine Ryan Hyde) depicts how a chain of reactions to an initial act
>of kindness can change an entire community. Christians need to be specially
>mindful of this chain reaction, since we are ambassadors for Christ. Our
>verbal and nonverbal witness can yield unexpected results, especially under the
>influence of the Holy Spirit.
This, apparently, is how “real” Christians are supposed to teach math classes. I seriously wonder how they have time to actually cover the *math* in class if they spend time on this gibberish!

Categorical Numbers

Sorry for the delay in the category theory articles. I’ve been busy with work, and haven’t had time to do the research to be able to properly write up the last major topic that I plan to cover in cat theory. While doing my research on closed cartesian categories and lambda calculus, I came across this little tidbit that I hadn’t seen before, and I thought it would be worth taking a brief diversion to look at.
Category theory is sufficiently powerful that you can build all of mathematics from it. This is something that we’ve known for a long time. But it’s not really the easiest thing to demonstrate.
But then, I came across this lovely little article on wikipedia. It turns out that you can [build the natural numbers entirely in terms of categories][wikipedia] in a beautifully simple way.
Before we dive into categorical numbers, let’s take a moment to remember what we really mean by natural numbers.
Peano Arithmetic
The meanings of natural numbers are normally defined in terms of what we call *Peano arithmetic*. Peano arithmetic defines the natural numbers, **N** in terms of five fundamental axioms:
1. 0 ∈ **N** (0 is a natural number.)
2. ∀ a ∈ **N** ∃ a’=s(n), a’ ∈ **N**. (Every natural number a has a successor, s(a))
3. ¬ ∃ a ∈ **N** : s(a)=0. (No natural number has 0 as its successor.)
4. ∀ a,b ∈ **N**: s(a)=s(b) ⇒ a=b. (Each number has a unique successor.)
5. (P(0) ∧ ∀ a : P(a) ⇒ P(s(a))) ⇒ ∀ b ∈ **N** : P(b). (If a property P is true for 0; and it can be shown that for any number a, if it’s true for a, then it’s true for b, then that means that it’s true for all natural numbers. This is the *axiom of induction*.)
To get to arithmetic, we need to add two operations: addition and multiplication.
We can define addition recursively quite easily:
1. ∀ x ∈ **N**: x+0 = x
2. ∀ x,y ∈ **N**: s(x)+y=x+s(y)
And once we have addition, we can do multiplication:
1. ∀ x ∈ **N**: x * 0 = 0 ∧ 0 * x = 0
2. ∀ x ∈ **N**: 1 * x = x
3. ∀ x,y ∈ **N**: s(x)*y = (x*y) + y
To sum up: Peano arithmetic is based on a definition of the natural numbers in terms of five fundamental properties, called the Peano axioms, built on “0”, a successor function, and induction. If we have those five properties, we can define addition and multiplication; and given those, we have the ability to build up arbitrary functions, sets, and the other fundamentals of mathematics in terms of the natural numbers.
Peano Naturals and Categories
The peano axioms can be captured in a category diagram for a relatively simple object called a *natural number object (NNO)*. A natural number object can exist in many categories: basically any category with a terminal object *can* have NNOs. Any closed cartesian category *does* have an NNO.
There are two alternate constructions; a general construct that works for all NNO-containing categories; and a more specific one for closed cartesian categories. (You can show that they are ultimately the same in a CCC.)
I actually think that the general one is easier to understand, so that’s what I’ll describe. This is basically a construction that captures the ideas of successor and recursion.
Given a category C with a terminal object **1**. A *natural number object* is an object *N* with two arrows z : 1 → *N*, and s : *N* → *N*, where z and s have the following property.
(∀ a ∈ Obj(C), (q : 1 → a, f : a → a) ∈ Arrows(C)) : u º z = q ∧ u º s = f º u.
That statement means the same thing as saying that the following diagram commutes:


Let’s just walk through that a bit. Think of the object **N** as a set. “s” is successor. So S is an arrow from *N* to *N* that maps each number to its successor. *N* is “rooted” by the terminal object; that gives it its basic structure. The “z” arrow from the terminal object maps to the zero value in *N*. So we’ve got an arrow that selects zero; and we’ve got an arrow that defines the successor.
Think of “q” as a predicate, and “u” as a statement that a value in *N* satisfies the predicate.
So u º z, that is the path from 1 to a by stepping through z and u is a statement that the predicate “q” is true for z (zero); because composing u (the number I’m composed with satisfies q) and z (the number zero) gives us “q” (zero satifies q).
Now the induction rule part. *N* maps to *N* by s – that’s saying that s is an arrow from a number in *N* to its successor. If you have an arrow composition that says “q is true for n”, then f can be composed onto that giving you “q is true for n + 1”.
If for all q, u is unique – that is, there is only *one* mapping where induction works; then you’ve got a categorical structure that satisfies the Peano axioms.

Math is Bad, Because it isn't Christian!

By way of Pharyngula, I saw something that I simply had to repeat here.
Every august, James Kennedy – a thoroughly repulsive ultra-fundy preacher from Coral Ridge Ministries – runs a conference called “Reclaiming America for Christ”. At this years conference, he featured a speech by Paul Jehle about “Evaluating your Philosophy of Education”.
Jehle is… umm… how do we say this politely?….
Ah, screw it. Jehle is a fucking frothing at the mouth nutjob lunatic asshole.
His basic argument – the argument that he *expects people to take seriously* – is that *everything* is either christian or non-christian. And if it’s non-christian, then christians shouldn’t look at it, listen to it, or study it. And you can’t *ever* make anything that started out non-christian christian.
How far does he go with this? Pretty damned far. Right into the domain of this here blog. In his talk, he uses the following story as an example:
>I was taking calculus. I was a mathematics major and I was at a Christian
>college that was called Christian, but was not Christian….
>I asked a question to my calculus professor: “What makes this course distinctly
>Christian?” He stopped. He said no one has ever asked that question before…
>He said, “Okay, I’m a Christian; you’re a Christian.”
>I said, “That’s not what I asked! What makes this calculus course distinctly
>Christian? What makes this different from the local secular university? Are we
>using the same text? Yes. Are you teaching it the same way? Yes. Then why is
>this called a Christian college and that one a non-Christian college?”
Yeah. Seriously. **Math is Bad**, because it’s *not explicitly christian*. I mean, it uses *zero*, which was invented by a *hindu*, and brought to europe by *muslims*. Algebra was invented by muslims! The word “algorithm” comes from the name of a *muslim* mathematician!
Uh-oh… I just realized that the alleged “Doctor” Jehle has a very serious problem. The way that we geeks heard his talk to write about it is because it was *digitized* – using a thoroughly non-christian technology – and posted on the *internet*, which is built using those non-christian *algorithms*. And to quite Jehle himself:
>But the issue is you cannot combine something by its nature which is pagan and
>built on humanistic principles and make it Christian by a magic wand.
So the internet, and computers, and digital recording, and the data compression that makes streaming audio work – they’re *non-christian*. And you cannot combine something non-Christian with something Christian.

φ, the Golden Ratio

Lots of folks have been asking me to write about φ, the golden ratio. I’m finally giving up and doing it. I’m not a big fan of φ. It’s a number
which has been adopted by all sorts of flakes and crazies, and there are alleged sightings of it in all sorts of strange places that are simply *not* real. For example, pyramid loons claim that the great pyramids in Egypt have proportions that come from φ – but they don’t. Animal mystics claim that the ratio of drones to larvae in a beehive is approximately φ – but it isn’t.
But it is an interesting number. My own personal reason for thinking it’s cool is representational: if you write φ as a continued fraction, it’s [1;1,1,1,1…]; and if you write it as a continued square root, it’s 1 + sqrt(1+sqrt(1+sqrt(1+sqrt(1…)))).
What is φ?
φ is the value so-called “golden ratio”: it’s the number that is a solution for the equation (a+b)/a = (a/b). (1+sqrt(5))/2. It’s the ratio where if you take a rectangle where the ratio of the length of the sides is 1:φ, then if you remove the largest possible square from it, you’ll get another rectangle whose sides have the ration φ:1. If you take the largest square from that, you’ll get a rectangle whose sides have the ratio 1:φ. And so on:


Allegedly, it’s the proportion of sides of a rectangle that produce the most aesthetically beautiful appearance. I’m not enough of a visual artist to judge that, so I’ve always just taken that on faith. But it *does* shows up in many places in geometry. For example, if you draw a five-pointed star, the ratio of the length of one of the point-to-point lines of the star to the length of the sides of the pentagon inside the star are φ:1:


φ is also related to the fibonacci series. In case you don’t remember, the fibonacci series is the set of numbers where each number in the series is the sum of the two previous: 1,1,2,3,5,8,13,… If Fib(n) is the *n*th number in the series, you can compute it as:

Quaternions: upping the dimensions of complex numbers

Last week, after I wrote about complex numbers, a bunch of folks wrote and said “Do quaternions next!” My basic reaction was “Huh?” I somehow managed to get by without ever being exposed to quaternions before. They’re quite interesting things.
The basic idea behind quaterions is: we see some amazing things happen when we expand the dimensionality of numbers from 1 (the real numbers) to 2 (the complex numbers). What if we add *more* dimensions?
It doesn’t work for three dimensions But you *can* create a system of numbers in *four* dimensions. As with complex numbers, you need a distinct unit for each dimension. In complex, those were 1 and i. For quaternions, you need for units: we’ll call them “1”, “i”, “j”, and “k”. In quaternion math, the units have the following properties:
1. i2 = j2 = k2 = ijk = -1
2. ij = k, jk = i, and ki=j
3. ji = -k, kj = -i, and ik=-j
No, that last one is *not* an error. Quaternion multiplication is *not* commutative: ab &neq; ba. It *is* associative over multiplication, meaning (a*b)*c = a*(b*c). And fortunately, it’s both commutative and associative over addition.
Anyway, we write a quaternion as “a + bi + cj + dk”; as in “2 + 4i -3j + 7k”.
Aside from the commutativity thing, quaternions work well as numbers; they’re a non-abelian group over multiplication (assuming you omit 0); they’re an abelian group over addition; they form a division algebra *over* the reals (meaning that you can define an algebra with operations up to multiplication and division over the quaternions, and operations in this algebra will generate the same results as normal real algebra if the i,j, and k components are 0.)
Quaternions are a relatively recent creation, so we know the history quite precisely. Quaternions were invented by a gentleman named William Hamilton on October 16th, 1843, while walking over the Broom bridge with his wife. He had been working on trying to work out something like the complex numbers with more dimensions for a while, and the crucial identity “ijk=-1” struck him while he was walking over the bridge. He immediately *carved it into the bridge* before he forgot it. Fortunately, this act of vandalism wasn’t taken seriously: the Broom bridge now has a plaque commemorating it!
Quaternions were controversial from the start. The fundamental step that Hamilton had taken in making quaternions work was abandoning commutativity, and many people believed that this was simply *wrong*. For example, Lord Kelvin said “Quaternions came from Hamilton after his really good work had been done; and, though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.”
Why are they non-commutative?
Quaternions are decidedly strange things. As I mentioned above, they’re non-commutative. Doing math with things where ab &neq; ba was a *huge* step, and it can be strange. But it is necessary for quaternions. Here’s why.
Let’s take the fundamental identity of quaternions: ijk = -1
Now, let’s multiply both sides by k: ij(k2) = -1k
So, ij(-1)=-k, and ij=k.
Now, multiply by *i* on the left on both sides: i2j = ik
And so, -1j = ik, and ik=-j.
Keep playing with multiplication in the identities, and you find that the identities simple *do not work* unless it’s non-commutative.
What are they good for?
For the most part, quaternions aren’t used much anymore. Initially, they were used in many places where we now use complex vectors, but that’s mostly faded away. What they *are* used for is all sorts of math that in any way involves *rotation*. Quaternions capture the essential properties of rotation.
To describe a rotation well, you actually need to have *extra* dimensions. To describe an *n*-dimensional rotation, you need *n+2* dimensions. Since in our three dimensional world, we rotate things around a *line* which is two dimensional, the description of the rotation needs *four* dimensions. And further, rotation itself is non-commutative.
Take a book and put it flat on the table, with the binding to the left. Flip it so that the binding stays on the same side, but now you’ve got the back cover on top, upside-down. Then rotate it clockwise 90 degrees on the table. You now have the book with its back cover up, and the binding facing away from you.
Now, start over: book right side up, binding facing left. Rotate it clockwise 90 degrees. Now flip it the same way you did before (over the x axis if you think of left as -x, right as +x, away from you as +y, up as +z, etc.)
What’s the result? You get the book with its back cover up, and the binding factor *towards* you. It’s 180 degrees different depending on which order you do it in. And 180 degrees is a reverse equivalent to the sign difference that you get in quaternion multiplication! Quaternions have exactly the right properties for describing rotations – and in particular, the *composition* of rotations.
The use of quaternions for rotations is used in computer graphics (when something rotates on the screen of your PS2, it’s doing quaternion math!); in satellite navigation; and in relativity equations (for symmetry wrt rotation).

Ω: my favorite strange number

Ω is my own personal favorite transcendental number. Ω isn’t really a specific number, but rather a family of related numbers with bizzare properties. It’s the one real transcendental number that I know of that comes from the theory of computation, that is important, and that expresses meaningful fundamental mathematical properties. It’s also deeply non-computable; meaning that not only is it non-computable, but even computing meta-information about it is non-computable. And yet, it’s *almost* computable. It’s just all around awfully cool.
So. What is it Ω?
It’s sometimes called the *halting probability*. The idea of it is that it encodes the *probability* that a given infinitely long random bit string contains a prefix that represents a halting program.
It’s a strange notion, and I’m going to take a few paragraphs to try to elaborate on what that means, before I go into detail about how the number is generated, and what sorts of bizzare properties it has.
Remember that in the theory of computation, one of the most fundamental results is the non-computability of *the halting problem*. The halting problem is the question “Given a program P and input I, if I run P on I, will it ever stop?” You cannot write a program that reads P and I and gives you the answer to the halting problem. It’s impossible. And what’s more, the statement that the halting problem is not computable is actually *equivalent* to the fundamental statement of Gödel’s incompleteness theorem.
Ω is something related to the halting problem, but stranger. The fundamental question of Ω is: if you hand me a string of 0s and 1s, and I’m allowed to look at it one bit at a time, what’s the probability that *eventually* the part that I’ve seen will be a program that eventually stops?
When you look at this definition, your reaction *should* be “Huh? Who cares?”
The catch is that this number – this probability – is a number which is easy to define; it’s not computable; it’s completely *uncompressible*; it’s *normal*.
Let’s take a moment and look at those properties:
1. Non-computable: no program can compute Ω. In fact, beyond a certain value N (which is non-computable!), you cannot compute the value of *any bit* of Ω.
2. Uncompressible: there is no way to represent Ω in a non-infinite way; in fact, there is no way to represent *any substring* of Ω in less bits than there are in that substring.
3. Normal: the digits of Ω are completely random and unpatterned; the value of and digit in Ω is equally likely to be a zero or a one; any selected *pair* of digits is equally likely to be any of the 4 possibilities 00, 01, 10, 100; and so on.
So, now we know a little bit about why Ω is so strange, but we still haven’t really defined it precisely. What is Ω? How does it get these bizzare properties?
Well, as I said at the beginning, Ω isn’t a single number; it’s a family of numbers. The value of *an* Ω is based on two things: an effective (that is, turing equivalent) computing device; and a prefix-free encoding of programs for that computing device as strings of bits.
(The prefix-free bit is important, and it’s also probably not familiar to most people, so let’s take a moment to define it. A prefix-free encoding is an encoding where for any given string which is valid in the encoding, *no prefix* of that string is a valid string in the encoding. If you’re familiar with data compression, Huffman codes are a common example of a prefix-free encoding.)
So let’s assume we have a computing device, which we’ll call φ. We’ll write the result of running φ on a program encoding as the binary number p as &phi(p). And let’s assume that we’ve set up φ so that it only accepts programs in a prefix-free encoding, which we’ll call ε; and the set of programs coded in ε, we’ll call Ε; and we’ll write φ(p)↓ to mean φ(p) halts. Then we can define Ω as:

Ωφ,ε = Σp ∈ Ε|p↓ 2-len(p)

So: for each program in our prefix-free encoding, if it halts, we *add* 2-length(p) to Ω.
Ω is an incredibly odd number. As I said before, it’s random, uncompressible, and has a fully normal distribution of digits.
But where it gets fascinatingly bizzare is when you start considering its computability properties.
Ω is *definable*. We can (and have) provided a specific, precise definition of it. We’ve even described a *procedure* by which you can conceptually generate it. But despite that, it’s *deeply* uncomputable. There are procedures where we can compute a finite prefix of it. But there’s a limit: there’s a point beyond which we cannot compute *any* digits of it. *And* there is no way to compute that point. So, for example, there’s a very interesting paper where the authors [computed the value of Ω for a Minsky machine][omega]; they were able to compute 84 bits of it; but the last 20 are *unreliable*, because it’s uncertain whether or not they’re actually beyond the limit, and so they may be wrong. But we can’t tell!
What does Ω mean? It’s actually something quite meaningful. It’s a number that encodes some of the very deepest information about *what* it’s possible to compute. It gives a way to measure the probability of computability. In a very real sense, it represents the overall nature and structure of computability in terms of a discrete probability. It’s an amazingly dense container of information – as a n infinitely long number and so thoroughly non-compressible, it contains an unmeasurable quantity of information. And we can’t get near most of it!

Q&A: What is information?

I received an email from someone with some questions about information theory; they relate to some sufficiently common questions/misunderstandings of information theory that I thought it was worth turning the answer into a post.
There are two parts here: my correspondent started with a question; and then after I answered it, he asked a followup.
The original question:
>Recently in a discussion group, a member posted a series of symbols, numbers,
>and letters:
>The question was what is its meaning and whether this has information or not.
>I am saying is does have information and an unknown meaning (even though I am
>sure they were just characters randomly typed by the sender), because of the
>fact that in order to recognize a symbol, that is information. Others are
>claiming there is no information because there is no meaning. Specifically that
>while the letters themselves have meaning, together the message or “statement”
>does not, and therefore does not contain information. Or that the information
>content is zero.
>Perhaps there are different ways to define information?
>I think I am correct that it does contain information, but just has no meaning.
This question illustrates one of the most common errors made when talking about information. Information is *not* language: it does *not* have *any* intrinsic meaning. You can describe information in a lot of different ways; in this case, the one that seems most intuitive is: information is something that reduces a possibility space. When you sit down to create a random string of characters, there is a huge possibility space of the strings that could be generated by that. A specific string narrows the space to one possibility. Anything that reduces possibilities is something that *generates* information.
For a very different example, suppose we have a lump of radium. Radium is a radioactive metal which breaks down and emits alpha particles (among other things). Suppose we take out lump of radium, and put it into an alpha particle detector, and record the time intervals between the emission of alpha particles.
The radium is *generating information*. Before we started watching it, there were a huge range of possibilities for exactly when the decays would occur. Each emission – each decay event – narrows the possibility space of other emissions. So the radium is generating information.
That information doesn’t have any particular *meaning*, other than being the essentially random time-stamps at which we observed alpha particles. But it’s information.
A string of random characters may not have any *meaning*; but that doesn’t mean it doesn’t contain information. It *does* contain information; in fact, it *must* contain information: it is a *distinct* string, a *unique* string – one possibility out of many for the outcome of the random process of generation; and as such, it contains information.
The Followup
>The explanation I have gotten from the person I have been debating, as to what
>he says is information is:
>I = -log2 P(E)
>I: information in bits
>P: probability
>E: event
>So for the example:
>He says:
>”I find that there is no meaning, and therefore I infer no information. I
>calculate that the probability of that string occuring was ONE (given no
>independent specification), and therefore the amount of information was ZERO. I
>therefore conclude it has no meaning.”
>For me, even though that string was randomly typed, I was able to look at the
>characters, and find there placement on a QWERTY keyboard, compare the
>characters to the digits on the hand, and found that over all, the left hand
>index finger keys were used almost twice as much. I could infer that a person
>was left handed tried to type the “random” sequence. So to me, even though I
>don’t know the math, and can’t measure the amount of information, the fact I
>was able to make that inference of a left handed typer tells me that there is
>information, and not “noise”.
The person quoted by my correspondent is an idiot; and clearly one who’s been reading Dembski or one of his pals. In my experience, they’re the only ones who continually stress that log-based equation for information.
But even if we ignore the fact that he’s a Dembski-oid, he’s *still* an idiot. You’ll notice that nowhere in the equation does *meaning* enter into the definition of information content. What *does* matter is the *probability* of the “event”; in this case, the probability of the random string of characters being the result of the process that generated it.
I don’t know how he generated that string. For the sake of working things through, let’s suppose that it was generated by pounding keys on a minimal keyboard; and let’s assume that the odds of hitting any key on that keyboard are equal (probably an invalid assumption, but it will only change the quantity of information, not its presence, which is the point of this example.) A basic minimal keyboard has about sixty keys. (I’m going by counting the keys on my “Happy Hacking Keyboard”. Of those, seven are shifts of various sorts: 2 shift, 2 control, one alt, one command, one mode), and one is “return”. So we’re left with 52 keys. To make things simple, let’s ignore the possibility of shifts affecting the result (this will result in us getting a *lower* information content, but that’s fine). So, it’s 80 characters; each *specific* character generated is an event with probability 1/52. So each *character* of the string has, by the Shannon-based definition quoted above, about 5.7 bits of information. The string as a whole has about 456 bits of information.
The fact that the process that generated it is random doesn’t make the odds of a particular string be one. For a trivial example, I’m going to close my eyes and pound on my keyboard with both hands twice, and then select the first 20 characters from each:
First: “`;kldsl;ksd.z.mdΩ.l.x`”
Second: “`lkficeewrflk;erwm,.r`”
Quite different results overall? Seems like I’m a bit heavy on the right hand on the k/l area. But note how different the outcomes are. *That* is information. Information without *semantics*; without any intrinsic meaning. But that *doesn’t matter*. Information *is not language*; the desire of [certain creationist morons][gitt] to demand that information have the properties of language is nonsense, and has *nothing* to do with the mathematical meaning of information.
The particular importance of this fact is that it’s a common creationist canard that information *must* have meaning; that therefore information can’t be created by a natural process, because natural processes are random, and randomness has no meaning; that DNA contains information; and therefore DNA can’t be the result of a natural process.
The sense in which DNA contains information is *exactly* the same as that of the random strings – both the ones in the original question, and the ones that I created above. DNAs information is *no different*.
*Meaning* is something that you get from language; information *does not* have to be *language*. Information *does not* have to have *any* meaning at all. That’s why we have distinct concepts of *messages*, *languages*, *semantics*, and *information*: because they’re all different.

Irrational and Transcendental Numbers

If you look at the history of math, there’ve been a lot of disappointments for mathematicians. They always start off with an idea of math as a clean, beautiful, elegant thing. And they seem to often wind up disappointed.
Which leads us into todays strange numbers: irrational and transcendental numbers. Both of them were huge disappointments to the mathematicians who discovered them.
So what are they? We’ll start with the irrationals. They’re numbers that aren’t integers, and that aren’t a ratio of any two integers. So you can’t write them as a normal fraction. If you write them as a continued fraction, then they go on forever. If you write them in decimal form, they go on forever without repeating. π, √2, *e* – they’re all irrational. (Incidentally, the reason that they’re called “irrational” isn’t because they don’t make sense, or because their decimal representation is crazy, but because they can’t be written *as ratios*. My high school algebra teacher who first talked about irrational numbers said that they were irrational because numbers that never repeated in decimal form weren’t sensible.)
The transcendentals are even worse. Transcendental numbers are irrational; but not only can transcendental numbers not be written as a ratio of integers; not only do their decimal forms go on forever without repeating; transcendental numbers are numbers that *can’t* be described by algebraic operations: there’s no finite sequence of multiplications, divisions, additions, subtractions, exponents, and roots that will give you the value of a transcendental number. √2 is not transcendental; *e* is.
The first disappointment involving the irrational numbers happened in Greece, around 500 bce. A rather brilliant man by the name of Hippasus, who was part of the school of Pythagoras, was studying roots. He worked out a geometric proof of the fact that √2 could not be written as a ratio of integers. He showed it to his teacher, Pythagoras. Pythagoras was convinced that numbers were clean and perfect, and could not accept the idea of irrational numbers. After analyzing Hippasus’s proof, and being unable to find any error in it, he became so enraged that he *drowned* poor Hippasus.
A few hundred years later, Eudoxus worked out the basic theory of irrationals; and it was published as a part of Euclid’s mathematical texts.
From that point, the study of irrationals pretty much disappeared for nearly 2000 years. It wasn’t until the 17th century that people started really looking at them again. And once again, it led to disappointment; but at least no one got killed this time.
Mathematicians had come up with yet another idea of what the perfection of math meant – this time using algebra. They decided that it made sense that algebra could describe all numbers; so you could write an equation to define any number using a polynomial with rational coefficients; that is, an equation using addition, subtraction, multiple, division, exponents, and roots.
Leibniz was studying algebra and numbers, and he’s the one who made the unfortunate discovery: lots of irrational numbers are algebraic; but lots of them *aren’t*. He discovered it indirectly, by way of the sin function. You see, sin(x) *can’t* be computed from x using algebra. There’s no algebraic function that can compute it. Leibniz called sin a transcendental function, since it went beyond algebra.
Building on the work of Leibniz, Liouville worked out that you could easily construct numbers that couldn’t be computed using algebra. For example, the constant named after Liouville consists of a string of 0s and 1s where 10-i is 1 if/f there is some integer n such that n!=i.
Not too long after, it was discovered that *e* was transcendental. The main reason for this being interesting is because up until that point, every transcendental number known was *constructed*; *e* is a natural, unavoidable constant. Once *e* was shown irrational, others followed; in one neat side-note, π was shown to be transcendental *using e*. One of the properties that they discovered after the transcendence of *e* was that any transcendental number raised to a non-transcendental power was transcendental. Since e is *not* transcendental – it’s 1 – then π must be transcendental.
The final disappointment in this area came soon after; Cantor, studying the irrationals, came up with the infamous “Cantor’s diagonalization” argument, which shows that there are *more* transcendental numbers than there are algebraic ones. *Most* numbers are not only irrational; they’re transcendental.
What does it mean, and why does it matter?
Irrational and transcendental numbers are everywhere. Most numbers aren’t rational. Most numbers aren’t even algebraic. That’s a very strange notion: *we can’t write most numbers down*.
Even stranger, even though we know, per Cantor, that most numbers are transcendental, it’s *incredibly* difficult to prove that any particular number is transcendental. Most of them are; but we can’t even figure out *which ones*.
What does that mean? That our math-fu isn’t nearly as strong as we like to believe. Most numbers are beyond us.
Some interesting numbers that we know are either irrational or transcendental:
1. *e*: transcendental.
2. π: transcendental.
3. √2 : irrational, but algebraic,
4. √x, for all x that are not perfect squares are irrational.
5. log23 is irrational.
6. Ω, Chaitin’s constant, is transcendental.
What’s interesting is that we really don’t know very much about how transcendentals interact; and given the difficulty of proving that something is transcendental, even for the most well-known transcendentals, we don’t know much of what happens when you put them together. π+*e*; π×*e*; ππ; *e**e*, πe are all numbers that we *don’t know* if they;’re transcendental. In fact, for π+*e* (and π-*e*) we don’t even know if it’s *irrational*.
That’s the thing about these number. We have *such* a weak grasp of them that even things that seem like they should be easy and fundamental, *we don’t know how to do*.

Friday Pathological Programming: Programming Through Grammars in Thue

Today’s treat: [Thue][thue], pronounced two-ay, after a mathematician named Axel Thue.
Thue is a language based on [semi-Thue][semi-thue] grammars. semi-Thue grammars are equivalent to Chosmky-0 grammars, except that they’re even more confusing: a semi-Thue grammar doesn’t distinguish between terminal and non-terminal symbols. (Since Chomsky-0 is equivalent to a turing machine, we know that Thue is turing complete.) The basic idea of it is that a program is a set of rewrite rules that tell you how to convert one string into another. The language itself is incredibly simple; programs written in it are positively mind-warping.
There are two parts to a Thue program. First is a set of grammar rules, which are written in what looks like pretty standard BNF: “*<lhs> ::= <rhs>*”. After that, there is a section which is the initial input string for the program. The two parts are separated by the “::=” symbol by itself on a line.
As I said, the rules are pretty much basic BNF: whatever is on the left-hand side can be replaced with whatever is on the right-hand side of the rule, whenever a matching substring is found. There are two exceptions, for input and output. Any rule whose right hand side is “:::” reads a line of input from the user, and replaces its left-hand-side with whatever it read from the input. And any rule that has “~” after the “::=” prints whatever follows the “~” to standard out.
So, a hello world program:
x ::=~ Hello world
Pretty simple: the input string is “x”; there’s a replacement for “x” which replaces “x” with nothing, and prints out “Hello world”.
How about something much more interesting. Adding one in binary. This assumes that the binary number starts off surrounded by underscore characters to mark the beginning and end of a line.
Let’s tease that apart a bit.
1. Start at the right hand side:
1. “1_::=1++” – If you see a “1” on the right-hand end of the number,
replace it with “1++”, removing the “_” on the end of the line. The rest
of the computation is going to use the “++” to mark the point in
string where the increment is going to alter the string next.
2. “0_::=1” – if you see a “0” on the right-hand-end, then replace it with 1. This doesn’t insert a “++”, so none of the other rules are going to do anything: it’s going to stop. That’s what you want: if the string ended in a zero, then adding one to it just changes that 0 to a 1.
2. Now, we get to something interesting: doing the addition:
1. If the string by the “++” is “01”, then adding one will change it to “10”, and that’s the end of the addition. So we just change it to 10, and get rid of the “++. ”
2. If the string by the “++” is “11”, then we change the last digit to “0”, and move the “++” over to the left – that’s basically like adding 1 to 9 in addition: write down a zero, carry the one to the left.
3. Finally, some termination cleanup. If we get to the left edge, and there’s a zero, after the “_”, we can get rid of it. If we get to the left edge, and there’s a “1++”, then we replace it with 10 – basically adding a new digit to the far left; and get rid of the “++”, because we’re done.
3. Finally, inputting the number “//::=:::” says “If you see “//”, then read some input from the user, and replace the “//” with whatever you read.
4. Now we have the marker for the end of the rules section, and the string to start the program is “_//_”. So that triggers the input rule, which inputs a binary number, and puts it between the “_”s.
Let’s trace this on a couple of smallish numbers.
* Input: “111”
1. “_111_” → “_111++”
2. “_111++_” → “_11++0”
3. “_11++0” → “_1++00”
4. “_1++00” → “1000”
* Input: “1011”
1. “_1011_” → “_1011++”
2. “_1011++” → “_101++0”
3. “_101++0” → “_1100”.
Not so hard, eh?
Decrement is the same sort of thing; here’s decrement with an argument hardcoded instead of reading from input:
Now, something more complicated. From [here][vogel], a very nicely documented program that computes fibonacci numbers. It’s an interesting mess – it converts things into a unary form, does the computation in unary, then converts back. This program uses a fairly neat trick to get comments. Thue doesn’t have any comment syntax. But they use “#” on the left hand side as a comment marker, and then make sure that it never appears anywhere else – so none of the comment rules can ever be invoked.
#::= fibnth.t – Print the nth fibonacci number, n input in decimal
#::= (C) 2003 Laurent Vogel
#::= GPL version 2 or later (
#::= Thue info at:
#::= This is modified thue where only foo::=~ outputs a newline.
#::= ‘n’ converts from decimal to unary-coded decimal (UCD).
#::= ‘.’ prints an UCD number and eats it.
#::= messages moving over ‘,’, ‘*’, ‘|’
#::= Decrement n. if n is >= 0, send message ‘2>’;
#::= when n becomes negative, we delete the garbage using ‘z’ and
#::= print the left number using ‘.’.
#::= move the left number to the right, reversing it (0 becomes 9, …)
#::= reversal is performed to help detect the need for carry. As the
#::= order of rule triggering is undetermined in Thue, a rule matching
#::= nine consecutive * would not work.
#::= when the copy is done, ‘d’ is at the right place to begin the sum.
#::= add left to right. 'f' moves left char by char when prompted by a '<'.
#::= it tells 'g' to its right what the next char is.
#::= when done for this sum, decrement nth.
|f’ ‘g’ drops an ‘i’ that increments the current digit.
|i::=’ tells ‘g’ to move left to the next decimal position,
#::= adding digit 0 if needed (i.e. nine ‘*’ in reversed UCD)
,1>g::=g::=|’ tells ‘g’ that the sum is done. We then prepare for copy (moving
#::= a copy cursor ‘c’ to the correct place at the beginning of the left
#::= number) and reverse (using ‘j’) the right number back.
#::= ‘j’ reverses the right number back, then leaves a copy receiver ‘d’
#::= and behind it a sum receiver ‘g’.
j,|::=’ sent by ‘-‘
#::= will start when hitting ‘g’ the first swap + sum cycle
#::= for numbers 0 (left) and 1 (reversed, right).
Ok. Now, here’s where we go *really* pathological. Here, in all it’s wonderful glory, is a [Brainfuck][brainfuck] interpreter written in Thue, written by Mr. Frederic van der Plancke. (You can get the original version, with documentation and example programs, along with a Thue interpreter written in Python from [here][vdp-thue])
#::=# BRAINFUNCT interpreter in THUE
#::=# by Frederic van der Plancke; released to the Public Domain.
WARN]WARN::=~ERROR: missing ‘]’ in brainfunct program; inserted at end
WARNDEC0WARN::=~WARNING: attempt at decrementing zero (result is still zero)