Monthly Archives: August 2006

The Poincarė Conjecture

The Poincarė conjecture has been in the news lately, with an article in the Science Times today. So I’ve been getting lots of mail from people asking me to explain what the Poincarė conjecture is, and why it’s a big deal lately?
I’m definitely not the best person to ask; the reason for the recent attention to the Poincarė conjecture is deep topology, which is not one of my stronger fields. But I’ll give it my best shot. (It’s actually rather bad timing. I’m planning on starting to write about topology later this week; and since the Poincarė conjecture is specifically about topology, it really wouldn’t have hurt to have introduced some topology first. But that’s how the cookie crumbles, eh?)
So what is it?
In 1904, the great mathematician Henri Poincarė was studying topology, and came up with an interesting question.
We know that if we look at closed two-dimensional surfaces forming three dimensional shapes (manifolds), that if the three dimensional shape has no holes in it, then it’s possible to transform it by bending, twisting, and stretching – but *without tearing* – into a sphere.
Poincarė wondered about higher dimensions. What about a three dimensional closed surface in a four-dimensional space (a 3-manifold)? Or a closed 4-manifold?
The conjecture, expressed *very* loosely and imprecisely, was that in any number of dimensions *n*, any figure without holes could be reduced to an *n*-dimensional sphere.
It’s trivial to show that that’s true for 2-dimensional surfaces in a three dimensional space; that is, that all closed 2-dimensional surfaces without holes can be transformed without tearing into our familiar sphere (which topologists call a 2-sphere, because it’s got a two dimensional surface).
For surfaces with more than two dimensions, it becomes downright mind-bogglingly difficult. And in fact, it turns out to be *hardest* to prove this for the 3-sphere. Nearly every famous mathematician of the 20th century took a stab at it, and all of them failed. (For example, Whitehead of the infamous Russell & Whitehead “Principia” published an incorrect proof in 1934.)
Why is it so hard?
Visualizing the shapes of closed 2-manifolds is easy. They form familiar figures in three dimensional space. We can imagine grabbing them, twisting them, stretching them. We can easily visualize almost anything that you can do with a closed two-dimensional surface. So reasoning about them is very natural to us.
But what about a “surface” that is itself three dimensional, forming a figure that takes 4 dimensions. What does it look like? What does *stretching* it mean? What is does a hole in a 4-dimensional shape look like? How can I tell if a particular complicated figure is actually just something tied in knots to make it look complicated, or if it actually has holes in it? What are the possible shapes of things in 4, 5, 6 dimensions?
That’s basically the problem. The math of it is generally expressed rather differently, but what it comes down to is that we don’t have a good intuitive sense of what transformations and what shapes really work in more than three dimensions.
What’s the big deal lately?
The conjecture was proved for all surfaces with seven or more dimensions in 1960. Five and six dimensions followed only two years later, proven in 1962. It took another 20 years to find a proof for 4 dimensions, which was finally done in 1982. Since 1982, the only open question was the 3- manifold. Was the Poincarė conjecture true for all dimensions?
There’s a million dollar reward for answer to that question with a correct proof; and each of the other proofs of the conjecture for higher dimensions won the mathematical equivalent of the Nobel Prize. So the rewards for figuring out the answer and proving it are enormous.
In 2003, a rather strange reclusive Russian mathematician named Grigory Perelman published a proof of a *stronger* version of the Poincarė conjecture under the incredibly obvious title “The Entropy Formula for the Ricci Flow and Its Geometric Application”.
It’s taken 3 years for people to work through the proof and all of its details in order to verify its correctness. In full detail, it’s over 1000 pages of meticulous mathematical proof, so verifying its correctness is not exactly trivial. But now, three years later, to the best of my knowledge, pretty much everyone is pretty well convinced of its correctness.
So what’s the basic idea of the proof? This is *so* far beyond my capabilities that it’s almost laughable for me to even attempt to explain it, but I’ll give it my best shot.
The Ricci flow is a mathematical transformation which effectively causes a *shrinking* action on a closed metric 3-surface. As it shrinks, it “pinches off” irregularities or kinks in the surface. The basic idea behind the proof is that it shows that the Ricci flow applied to metric 3-surfaces will shrink to a 3-sphere. The open question was about the kinks: will the Ricci flow eliminate all of them? Or are there structures that will *continually* generate kinks, so that the figure never reduces to a 3-sphere?
What Perelman did was show that all of the possible types of kinks in the Ricci flow of a closed metric 3-surface would eventually all disappear into either a 3-sphere, or a 3-surface with a hole.
So now that we’re convinced of the proof, and people are ready to start handing out the prizes, where’s Professor Perelman?
*No one knows*.
He’s a recluse. After the brief burst of fame when he first published his proof, he disappeared into the deep woods in the hinterlands of Russia. The speculation is that he has a cabin back there somewhere, but no one knows. No one knows where to find him, or how to get in touch with him.

Messing with big numbers: using probability badly

After yesterdays post about the sloppy probability from ann coulter’s chat site, I thought it would be good to bring back one of the earliest posts on Good Math/Bad Math back when it was on blogger. As usual with reposts, I’ve revised it somewhat, but the basic meat of it is still the same.
There are a lot of really bad arguments out there written by anti-evolutionists based on incompetent use of probability. A typical example is [this one][crapcrap]. This article is a great example of the mistakes that commonly get made with probability based arguments, because it makes so many of them. (In fact, it makes every single category of error that I list below!)
Tearing down probabilistic arguments takes a bit more time than tearing down the information theory arguments. 99% of the time, the IT arguments are built around the same fundamental mistake: they’ve built their argument on an invalid definition of information. But since they explicitly link it to mathematical information theory, all you really need to do is show why their definition is wrong, and then the whole thing falls apart.
The probabilistic arguments are different. There isn’t one mistake that runs through all the arguments. There’s many possibly mistakes, and each argument typically stacks up multiple errors.
For the sake of clarity, I’ve put together a taxonomy of the basic probabilistic errors that you typically see in creationist screeds.
Big Numbers
This is the easiest one. This consists of using our difficulty in really comprehending how huge numbers work to say that beyond a certain probability, things become impossible. You can always identify these argument, by the phrase “the probability is effectively zero.”
You typically see people claiming things like “Anything with a probability of less than 1 in 10^60 is effectively impossible”. It’s often conflated with some other numbers, to try to push the idea of “too improbable to ever happen”. For example, they’ll often throw in something like “the number of particles in the entire universe is estimated to be 3×10^78, and the probability of blah happening is 1 in 10^100, so blah can’t happen”.
It’s easy to disprove. Take two distinguishable decks of cards. Shuffle them together. Look at the ordering of the cards – it’s a list of 104 elements. What’s the probability of *that particular ordering* of those 104 elements?
The likelihood of the resulting deck of shuffled cards having the particular ordering that you just produced is roughly 1 in 10166. There are more possible unique shuffles of two decks of cards than there are particles in the entire universe.
If you look at it intuitively, it *seems* like something whose probability is
100 orders of magnitude worse than the odds of picking out a specific particle in the entire observable universe *should* be impossible. Our intuition says that any probability with a number that big in its denominator just can’t happen. Our intuition is wrong – because we’re quite bad at really grasping the meanings of big numbers.
Perspective Errors
A perspective error is a relative of big numbers error. It’s part of an argument to try to say that the probability of something happening is just too small to be possible. The perspective error is taking the outcome of a random process – like the shuffling of cards that I mentioned above – and looking at the outcome *after* the fact, and calculating the likelihood of it happening.
Random processes typically have a huge number of possible outcomes. Anytime you run a random process, you have to wind up with *some* outcome. There may be a mind-boggling number of possibilities; the probability of getting any specific one of them may be infinitessimally small; but you *will* end up with one of them. The probability of getting an outcome is 100%. The probability of your being able to predict which outcome is terribly small.
The error here is taking the outcome of a random process which has already happened, and treating it as if you were predicting it in advance.
The way that this comes up in creationist screeds is that they do probabilistic analyses of evolution built on the assumption that *the observed result is the only possible result*. You can view something like evolution as a search of a huge space; at any point in that spaces, there are *many* possible paths. In the history of life on earth, there are enough paths to utterly dwarf numbers like the card-shuffling above.
By selecting the observed outcome *after the fact*, and then doing an *a priori* analysis of the probability of getting *that specific outcome*, you create a false impression that something impossible happened. Returning to the card shuffling example, shuffling a deck of cards is *not* a magical activity. Getting a result from shuffling a deck of cards is *not* improbable. But if you take the result of the shuffle *after the fact*, and try to compute the a priori probability of getting that result, you can make it look like something inexplicable happened.
Bad Combinations
Combining the probabilities of events can be very tricky, and easy to mess up. It’s often not what you would expect. You can make things seem a lot less likely than they really are by making a easy to miss mistake.
The classic example of this is one that almost every first-semester probability instructor tries in their class. In a class of 20 people, what’s the probability of two people having the same birthday? Most of the time, you’ll have someone say that the probability of any two people having the same birthday is 1/3652; so the probability of that happening in a group of 20 is the number of possible pairs over 3652, or 400/3652, or about 1/3 of 1 percent.
That’s the wrong way to derive it. There’s more than one error there, but I’ve seen three introductory probability classes where that was the first guess. The correct answer is very close to 50%.
Fake Numbers
To figure out the probability of some complex event or sequence of events, you need to know some correct numbers for the basic events that you’re using as building blocks. If you get those numbers wrong, then no matter how meticulous the rest of the probability calculation is, the result is garbage.
For example, suppose I’m analyzing the odds in a game of craps. (Craps is a casino dice game using six sided dice.) If I say that in rolling a fair die, the odds of rolling a 6 is 1/6th the odds of rolling a one, then any probabilistic prediction that I make is going to be wrong. It doesn’t matter that from that point on, I do all of the analysis exactly right. I’ll get the wrong results, because I started with the wrong numbers.
This one is incredibly common in evolution arguments: the initial probability numbers are just pulled out of thin air, with no justification.
Misshapen Search Space
When you model a random process, one way of doing it is by modeling it as a random walk over a search space. Just like the fake numbers error, if your model of the search space has a different shape than the thing you’re modeling, then you’re not going to get correct results. This is an astoundingly common error in anti-evolution arguments; in fact, this is the basis of Dembski’s NFL arguments.
Let’s look at an example to see why it’s wrong. We’ve got a search space which is a table. We’ve got a marble that we’re going to roll across the table. We want to know the probability of it winding up in a specific position.
That’s obviously dependent on the surface of the table. If the surface of the table is concave, then the marble is going to wind up in nearly the same spot every time we try it: the lowest point of the concavity. If the surface is bumpy, it’s probably going to wind up a concavity between bumps. It’s *not* going to wind up balanced on the tip of one of the bumps.
If we want to model the probability of the marble stopping in a particular position, we need to take the shape of the surface of the table into account. If the table is actually a smooth concave surface, but we build our probabilistic model on the assumption that the table is a flat surface covered with a large number of uniformly distributed bumps, then our probabilistic model *can’t* generate valid results. The model of the search space does not reflect the properties of the actual search space.
Anti-evolution arguments that talk about search are almost always built on invalid models of the search space. Dembski’s NFL is based on a sum of the success rates of searches over *all possible* search spaces.
False Independence
If you want to make something appear less likely than it really is, or you’re just not being careful, a common statistical mistake is to treat events as independent when they’re not. If two events with probability p1 and p2 are independent, then the probability of both p1 and p2 is p1×p2. But if they’re *not* independent, then you’re going to get the wrong answer.
For example, take all of the spades from a deck of cards. Shuffle them, and them lay them out. What are the odds that you laid them out in numeric order? It’s 1/13! = 1/6,227,020,800. That’s a pretty ugly number. But if you wanted to make it look even worse, you could “forget” the fact that the sequential draws are dependent, in which case the odds would be 1/1313 – or 1/3×1014 – about 50,000 times worse.

Big Numbers: Bad Anti-Evolution Crap from

A reader sent me a copy of an article posted to “”. I can’t see the original article; is a subscriber-only site, and I’ll be damned before I *register* with that site.
Fortunately, the reader sent me the entire article. It’s another one of those stupid attempts by creationists to assemble some *really big* numbers in order to “prove” that evolution is impossible.
>One More Calculation
>The following is a calculation, based entirely on numbers provided by
>Darwinists themselves, of the number of small selective steps evolution would
>have to make to evolve a new species from a previously existing one. The
>argument appears in physicist Lee Spetner’s book “Not By Chance.”
>At the end of this post — by “popular demand” — I will post a bibliography of
>suggested reading on evolution and ID.
>Problem: Calculate the chances of a new species emerging from an earlier one.
>What We Need to Know:
>(1) the chance of getting a mutation;
>(2) the fraction of those mutations that provide a selective advantage (because
>many mutations are likely either to be injurious or irrelevant to the
>(3) the number of replications in each step of the chain of cumulative >selection;
>(4) the number of those steps needed to achieve a new species.
>If we get the values for the above parameters, we can calculate the chance of
>evolving a new species through Darwinian means.
Fairly typical so far. Not *good* mind you, but typical. Of course, it’s already going wrong. But since the interesting stuff is a bit later, I won’t waste my time on the intro 🙂
Right after this is where this version of this argument turns particularly sad. The author doesn’t just make the usual big-numbers argument; they recognize that the argument is weak, so they need to go through some rather elaborate setup in order to stack things to produce an even more unreasonably large phony number.
It’s not just a big-numbers argument; it’s a big-numbers *strawman* argument.
>(1) we will reckon the odds of evolving a new horse species from an earlier
>horse species.
>(2) we assume only random copying errors as the source of Darwinian variation.
>Any other source of variation — transposition, e.g., — is non-random and
>therefore NON-DARWINIAN.
This is a reasonable assumption, you see, because we’re not arguing against *evolution*; we’re arguing against the *strawman* “Darwinism”, which arbitrarily excludes real live observed sources of variation because, while it might be something that really happens, and it might be part of real evolution, it’s not part of what we’re going to call “Darwinism”.
Really, there are a lot of different sources of variation/mutation. At a minimum, there are point mutations, deletions (a section getting lost while copying), insertions (something getting inserted into a sequence during copying), transpositions (something getting moved), reversals (something get flipped so it appears in the reverse order), fusions (things that were separate getting merged – e.g., chromasomes in humans vs. in chimps), and fissions (things that were a single unit getting split).
In fact, this restriction *a priori* makes horse evolution impossible; because the modern species of horses have *different numbers of chromasomes*. Since the only change he allows is point-mutation, there is no way that his strawman Darwinism can do the job. Which, of course, is the point: he *wants* to make it impossible.
>(3) the average mutation rate for animals is 1 error every 10^10 replications
>(Darnell, 1986, “Molecular Cell Biology”)
Nice number, shame he doesn’t understand what it *means*. That’s what happens when you don’t bother to actually look at the *units*.
So, let’s double-check the number, and discover the unit. Wikipedia reports the human mutation rate as 1 in 108 mutations *per nucleotide* per generation.
He’s going to build his argument on 1 mutation in every 10^10 reproductions *of an animal*, when the rate is *per nucleotide*, *per cell generation*.
So what does that tell us if we’re looking at horses? Well, according to a research proposal to sequence the domestic horse genome, it consists of 3×109 nucleotides. So if we go by wikipedia’s estimate of the mutation rate, we’d expect somewhere around 30 mutations per individual *in the fertilized egg cell*. Using the numbers by the author of this wretched piece, we’d still expect to see 1 out of every three horses contain at least one unique mutation.
The fact is, pretty damned nearly every living thing on earth – each and every human being, every animal, every plant – each contains some unique mutations, some unique variations in their genetic code. Even when you start with a really big number – like one error in every 1010 copies; it adds up.
>(4) To be part of a typical evolutionary step, the mutation must: (a) have a
>positive selective value; (b) add a little information to the genome ((b) is a
>new insight from information theory. A new species would be distinguished from
>the old one by reason of new abilities or new characteristics. New
>characteristics come from novel organs or novel proteins that didn’t exist in
>the older organism; novel proteins come from additions to the original genetic
>code. Additions to the genetic code represent new information in the genome).
I’ve ripped apart enough bullshit IT arguments, so I won’t spend much time on that, other to point out that *deletion* is as much of a mutation, with as much potential for advantage, as *addition*.
A mutation also does not need to have an immediate positive selective value. It just needs to *not* have negative value, and it can propagate through a subset of the population. *Eventually*, you’d usually (but not always! drift *is* an observed phenomenon) expect to see some selective value. But that doesn’t mean that *at the moment the mutation occurs*, it must represent an *immediate* advantage for the individual.
>(5) We will also assume that the minimum mutation — a point mutation — is
>sufficient to cause (a) and (b). We don’t know if this is n fact true. We don’t
>know if real mutations that presumably offer positive selective value and small
>information increases can always be of minimum size. But we shall assume so
>because it not only makes the calculation possible, but it also makes the
>calculation consistently Darwinian. Darwinians assume that change occurs over
>time through the accumulation of small mutations. That’s what we shall assume,
>as well.
Note the continued use of the strawman. We’re not talking about evolution here; We’re talking about *Darwinism* as defined by the author. Reality be damned; if it doesn’t fit his Darwinism strawman, then it’s not worth thinking about.
>Q: How many small, selective steps would we need to make a new species?
>A: Clearly, the smaller the steps, the more of them we would need. A very
>famous Darwinian, G. Ledyard Stebbins, estimated that to get to a new species
>from an older species would take about 500 steps (1966, “Processes of Organic
>So we will accept the opinion of G. Ledyard Stebbins: It will take about 500
>steps to get a new species.
Gotta love the up-to-date references, eh? Considering how much the study of genetics has advanced in the last *40 years*, it would be nice to cite a book younger than *me*.
But hey, no biggie. 500 selective steps between speciation events? Sounds reasonable. That’s 500 generations. Sure, we’ve seen speciation in less than 500 generations, but it seems like a reasonable guestimate. (But do notice the continued strawman; he reiterates the “small steps” gibberish.)
>Q: How many births would there be in a typical small step of evolution?
>A: About 50 million births / evolutionary step. Here’s why:
>George Gaylord Simpson, a well known paleontologist and an authority on horse
>evolution estimated that the whole of horse evolution took about 65 million
>years. He also estimated there were about 1.5 trillion births in the horse
>line. How many of these 1.5 trillion births could we say represented 1 step in
>evolution? Experts claim the modern horse went through 10-15 genera. If we say
>the horse line went through about 5 species / genus, then the horse line went
>through about 60 species (that’s about 1 million years per species). That would
>make about 25 billion births / species. If we take 25 billion and divided it by
>the 500 steps per species transition, we get 50 million births / evolutionary
>So far we have:
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
Here we see some really stupid mathematical gibberish. This is really pure doubletalk – it’s an attempt to generate *another* large number to add into the mix. There’s no purpose in it: we’ve *already* worked out the mutation rate and the number of mutations per speciation. This gibberish is an alternate formulation of essentially the same thing; a way of gauging how long it will take to go through a sequence of changes leading to speciation. So we’re adding an redundant (and meaningless) factor in order to inflate the numbers.
>Q: What’s the chance that a mutation in a particular nucleotide will occur and
>take over the population in one evolutionary step?
>A: The chance of a mutation in a specific nucleotide in one birth is 10^-10.
>Since there are 50 million births / evolutionary step, the chance of getting at
>least one mutation in the whole step is 50 million x 10^-10, or 1-in-200
>(1/200). For the sake of argument we can assume that there is an equal chance
>that the base will change to any one of the other three (not exactly true in
>the real world, but we can assume to make the calculation easier – you’ll see
>that this assumption won’t influence things so much in the final calculation);
>so the chance of getting specific change in a specific nucleotide is 1/3rd of
>1/200 or 1-in-600 (1/600).
>So far we have:
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
>1/600 chance of a point mutation taking over the population in 1 evolutionary >step (derived from numbers by Darnell in his standard reference book)
This is pure gibberish. It’s so far away from being a valid model of things that it’s laughable. But worse, again, it’s redundant. Because we’ve already introduced a factor based on the mutation rate; and then we’ve introduced a factor which was an alternative formulation of the mutation rate; and now, we’re introducing a *third* factor which is an even *worse* alternative formulation of the mutation rate.
>Q: What would the “selective value” have to be of each mutation?
>A: According to the population-genetics work of Sir Ronald Fisher, the chances
>of survival for a mutant is about 2 x (selective value).
>”Selective Value” is a number that is ASSIGNED by a researcher to a species in
>order to be able to quantify in some way its apparent fitness. Selective Value
>is the fraction by which its average number of surviving offspring exceeds that
>of the population norm. For example, a mutant whose average number of surviving
>offspring is 0.1% higher than the rest of the population would have a Selective
>Value = 0.1% (or 0.001). If the norm in the population were such that 1000
>offspring usually survived from the original non-mutated organism, 1001
>offspring would usually survive from the mutated one. Of course, in real life,
>we have no idea how many offspring will, IN FACT, survive any particular
>organism – which is the reason that Survival Value is not something that you go
>into the jungle and “measure.” It’s a special number that is ASSIGNED to a
>species; not MEASURED in it (like a species’ average height, weight, etc.,
>which are objective attributes that, indeed, can we can measure).
>Fisher’s statistical work showed that a mutant with a Selective Value of 1% has
>a 2% chance of survival in a large population. A chance of 2-in-100 is that
>same as a chance of 1-in-50. If the Selective Value were 1/10th of that, or
>0.1%, the chance would be 1/10th of 2%, or about 0.2%, or 1-in-500. If the
>Selective Value were 1/100th of 1%, the chance of survival would be 1/100th of
>2%, or 0.02%, or 1-in-5000.
>We need a Selection Value for our calculation because it tells us what the
>chances are that a mutated species will survive. What number should we use? In
>the opinion of George Gaylord Simpson, a frequent value is 0.1%. So we shall
>use that number for our calculation. Remember, that’s a 1-in-500 chance of
>So far we have:
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
>1/600 chance of a point mutation taking over the population in 1 evolutionary
>step (derived from numbers by Darnell in his standard reference book)
>1/500 chance that a mutant will survive (as per G. G. Simpson)
And, once again, *another* meaningless, and partially redundant factor added in.
Why meaningless? Because this isn’t how selection works. He’s using his Darwinist strawman again: everything must have *immediate* *measurable* survival advantage. He also implicitly assumes that mutation is *rare*; that is, a “mutant” has a 1-in-500 chance of seeing its mutated genes propagate and “take over” the population. That’s not at all how things work. *Every* individual is a mutant. In reality, *every* *single* *individual* possesses some number of unique mutations. If they reproduce, and the mutation doesn’t *reduce* the likelihood of its offspring’s survival, the mutation will propagate through the generations to some portion of the population. The odds of a mutation propagating to some reasonable portion of the population over a number of generations is not 1 in 500. It’s quite a lot better.
Why partially redundant? Because this. once again, factors in something which is based on the rate of mutation propagating through the population. We’ve already included that twice; this is a *third* variation on that.
>Already, however, the numbers don’t crunch all that well for evolution.
>Remember, probabilities multiply. So the probability, for example, that a point
>mutation will BOTH occur AND allow the mutant to survive is the product of the
>probabilities of each, or 1/600 x 1/500 = 1/300,000. Not an impossible number,
>to be sure, but it’s not encouraging either … and it’s going to get a LOT
>worse. Why? Because…
**Bzzt. Bad math alert!**
No, these numbers *do not multiply*. Probabilities multiply *when they are independent*. These are *not* independent factors.
>Q. What are the chances that (a) a point mutation will occur, (b) it will add
>to the survival of the mutant, and (c) the last two steps will occur at EACH of
>the 500 steps required by Stebbins’ statement that the number of evolutionary
>steps between one species and another species is 500?
See, this is where he’s been going all along.
* He created the darwinian strawman to allow him to create bizzare requirements.
* Then he added a ton of redundant factors.
* Then he combined probabilities as if they were independent when they weren’t.
* and *now* he adds a requirement for simultaneity which has no basis in reality.
>A: The chances are:
>The product of 1/600 x 1/500 multiplied by itself 500 times (because it has to
>happen at EACH evolutionary step). Or,
>Chances of Evolutionary Step 1: 1/300,000 x
>Chances of Evolutionary Step 2: 1/300,000 x
>Chances of Evolution Step 3: 1/300,000 x …
>. . . Chances of Evolution Step 500: 1/300,000
*Giggle*, *snort*. I seriously wonder if he actually believe this gibberish. But this is just silly. For the reasons mentioned above: this is taking the redundant factors that he already pushed into each step, inflating them by adding the simultaneity requirement, and then *exponentiating* them.
>This is approximately equal to:
>2.79 x 10^-2,739
>A number that is effectively zero.
As I’ve said before: no one who understands math *ever* uses the phrase *effectively zero* in a mathematical argument. There is no such thing as effectively zero.
On a closing note, this entire thing, in addition to being both an elaborate strawman *and* a sloppy big numbers argument is also an example of another kind of mathematical error, which I call a *retrospective error*. A retrospective error is when you take the outcome of a randomized process *after* it’s done, treat it as the *only possible outcome*, and compute the probability of it happening.
A simple example of this is: shuffle a deck of cards. What’s the odds of the particular ordering of cards that you got from the shuffle? 1/52! = 1/(8 * 1067). If you then ask “What was the probability of a shuffling of cards resulting in *this order*?”, you get that answer: 1 in 8 * 1067 – an incredibly unlikely event. But it *wasn’t* an unlikely event; viewed from the proper perspective, *some* ordering had to happen: any result of the shuffling process would have the same probability – but *one* of them had to happen. So the odds of getting a result whose *specific* probability is 1 in 8 * 1067 was actually 1 in 1.
The entire argument that our idiot friend made is based on this kind of an error. It assumes a single unique path – a single chain of specific mutations happening in a specific order – and asks about the likelihood that *single chain* leading to a *specific result*.
But nothing ever said that the primitive ancestors of the modern horse *had* to evolve into the modern horse. If they weren’t to just go extinct, they would have to evolve into *something*; but demanding that the particular observed outcome of the process be the *only possibility* is simply wrong.


How can you talk about interesting numbers without bringing up π?
The oldest value we know for π comes from the Babylonians. (Man, but those guys were impressive mathematicians; almost any time you look at the history of fundamental numbers and math, you find the Babylonians in the roots.) They tended to work in ratios, and the approximation that they used 25/8s (3.125), which is not a terribly bad approximation. Especially when you realize *when* they came up with this approximation: 1900BC!
The next best approximation came from Egypt, around the time of Pharaoh Amenemhat in the mid 17th century BC, where it had been refined to 256/81 (3.1605). Which isn’t such a great step forward; it’s actually a hair *farther* from the true value of π than the Babylonian approximation.
We don’t see any real progress until we get to the Greek. Archimedes (yes, *that* Archimedes) worked out a better approximation. He used a *really* neat trick. He worked out how to compute the perimeter of a 96-sided polygon; and then worked out the perimeter of the largest 96-gon that could be drawn inside the circle; and the smallest 96-gon that the circle could be drawn inside. Here’s a quick diagram using octagons to give you a clearer idea of what he did:


That gives you an approximation of π as the average of 223/71 and 22/7 – or 3.14185.
And next, we find progress in India, where the mathematician Madhava worked out a power series definition of π, which allowed him to compute π to *13* decimal places. *13* decimal places, computing a power series completely by hand! Astounding!
Even better, during the same century, when this work made its way to the great Persian Arabic mathematicians, they worked it out to 9 digits in base-60 (base-60 was in inheritance from the Babylonians). 9 digits in base 60 is roughly 16 digits in decimal!
And finally, we get back to Europe; in the 17th century, van Ceulen used the power series to work out 35 decimal places of π. Alas, the publication of it was on his tombstone.
Then we get to the 19th century, when William Rutherford calculuated *208* decimal places of π. The real pity of that is that he made an error in the 153rd digit, and so only the first 152 digits were correct. (Can you imagine the amount of time he wasted?)
That was pretty much it until the first computers came along, and once that happened, the fun went out of trying to calculate it, since any bozo could write a program to do it. There’s a website that will let you look at its computation of [the first *2 hundred million* digits of π][digits].
The *name* of π came from Euler (he of the great equation, e + 1 = 0). It’s an abbreviation for *perimeter* in Greek.
There’s also one bit of urban myth about π that is, alas, not true. The story goes that some state in the American midwest (Indiana, Iowa, Ohio, Illinois in various versions) passed a law that π=3. Didn’t happen.
What is π?
Pretty much everyone is familiar with what π is. Take a circle on a plane. Measure the distance around the outside of it, which is called the circumference. Divide that by the diameter of the circle. That’s π.


It also shows up in almost anything else involving measurements of circles and angles, from things like the sin function to the area of a circle to the volume of a sphere.
Where it gets interesting is when you start to ask about how to compute it. You get the relatively obvious things – like equations based on integrals to calculate the area of a circle. But then, you get the surprising ones. After all, π is a fundamental geometric number; it comes from the circumference of a circle.
So why in the world is the radius of a circle related to an infinite sum of the reciprocals of odd numbers? It is.
π/4 is the sum of the infinite series 1/1 – 1/3 + 1/5 – 1/7 + 1/9 – 1/11 + 1/13 …., or more formally:
Or how about this? π/2 =
What about a bit of probability? Pick any integer at random. What’s the probability that neither it nor any of its factors is a perfect square? 6/π2.
How about a a connection between circles and prime numbers? Take the set of all of the prime numbers P, then the *product* of all factors (1-1/p2) is.. 6/π2.
What’s the average number of ways to write an integer as the sum of two perfect squares? π/4.
There’s also a funny little trick for memorizing π, called a piem (argh!). A piem is a little poem where the length of each word is a digit of π.
How I need a drink, (3 1 4 1 5)
alcoholic of course, (9 2 6)
after the heavy lectures (5 3 5 8)
involving quantum mechanics. (9 7 9)
It’s a strange, fascinating little number.

Innumerate Fundamentalists and π

The stupidity and innumeracy of Americans, and in particular American fundamentalists, never ceases to astound me.
Recently on Yahoo, some bozo posted [something claiming that the bible was all correct][yahoo], and that genetics would show that bats were actually birds. But that’s not the real prize. The *real* prize of the discussion was in the ensuing thread.
A doubter posted the following question:
>please explain 1 kings 7.23 and how a circle can have a circumference of 30 of
>a unit and a radiius of 10 of a unit and i will become a christian
>23 And he made the Sea of cast bronze, ten cubits from one brim to the other;
>it was completely round. Its height was five cubits, and a line of thirty
>cubits measured its circumference. (1 Kings 7:23, NKJV)
And the answer is one of the all-time greats of moronic innumeracy:
>Very easy. You are talking about the value of Pi.
>That is actually 3 not 3.14…….
>The digits after the decimal forms a geometric series and
>it will converge to the value zero. So, 3.14…..=3.00=3.
>Nobody still calculated the precise value of Pi. In future
>they will and apply advenced Mathematics to prove the value of Pi=3.

Return of the Bible Code Bozos

Remember back in the end of june, when I [talked about these insane bozos][code] who were [predicting that a nuclear bomb would be blown up in the UN plaza?][firsttime] And they were on their fourth prediction about when this would happen? And how each time they blew it, [they came up with an excuse for why they were wrong?][update]
I thought it would be fun to take a look back at them, and see what they’re up to six weeks later. Naturally, [they’re still at it.][bozos] They’ve updated their prediction 5 more times now.
But there are a couple of really amusing things about what they’re up to now.
First, they’ve declared victory:
>We correctly predicted that the UN would lose its headship in 2006Tammuz (this
>being the 2nd head of the image of the Beast of Revelation 13) which gets a
>death stroke but then recovers. It lost its head on 2006July12 when Israel
>invaded Lebanon without a UN mandate. The UN lost credibility and lost control >for a month. It lost headship over Israel for one month. But the image of the
>Beast does not lose two heads, it only loses one head. Each of the 7 heads of
>the image of the UN Beast stands for one month of military headship over the
>governments of the world, just as the 7 heads of the UN Beast itself each stand
>for one year of military headship over the governments of the world. So we knew
>it had to regain headship this month and it did by virtue of the multilaterally
>agreed UN Security Council ceasefire resolution of Friday 2006August11. Now the
>UN will declare Peace and Security and then sudden destruction will befall them
>according to 1 Thessalonians 5:3. So please leave NYC for the sabbath!!!
And second, they’ve worked *their errors* into their code. That is, they’re now saying that their bible code *predicts* that they’d get it wrong seven times, so that their *eighth* prediction will be right. But they’ve made 9 predictions! So what to do? Well, hedge of course. You see, only the *correct* incorrect predictions count. So they have an explanation for why *some* of the incorrect predictions were *correct* incorrect predictions, and some were *incorrect* incorrect predictions.
>On April 29th we started predicting dates for a terrorist Nuclear Bomb at the
>UN in midtown. After making several mistakes we realised that 1 Kings 18:43
>declared we would get it right at the 8th attempt (Since Elijah asked his
>attendant to go and look for a man made mushroom cloud 7 times after the first
>no show, making 8 attempts in all). The trouble is that we have found it hard
>to decide just what a valid attempt is. Here are all the incorrect dates we
>have so far proposed…
>2006Iyyar21 (May 19/20) [7 days after 2006Iyyar14]
>2006Iyyar28 (May 26/27) [7 days after first mistaken date]
>2006Iyyar11 (June 8/9) [First day of the 2,000 pigs of Mark 5 incorrectly calculated]
>2006Sivan12 (June 9/10) [First day of the 2,000 pigs of Mark 5 correctly calculated but misinterpreted]
>2006Tammuz2/3 (June30-July2) [7th sabbath after first mistaken date/7th sabbath omitting 2006Sivan5]
>2006Tammuz4-6 (July2 – July 4) [7th sabbath day when we asked people to lookout]
>2006Tammuz28/29 (July 25 – 27) [Assumed contest began on 911]
>2006Ab3/4 (July 30 – August 1) [Assumed second ‘day’ of contest began when wheat went limit up in Chicago]
>2006Ab8 (August 4/5) [Assumed second ‘day’ of contest began on non BLC day of 2006Adar28 so that 1750th day is sabbath]

>What we now propose is…
>2006Ab15 (August 11/12) [7th sabbath lookout period after first mistake]
>So one could say you have had 9 attempts, you screwed up each time, give up and
>try doing something less dramatic! But although our repeated failures would at
>first sight take more and more credibility away from our work, the world
>security situation is for a fact moving in a direction that lends more and more
>credibility to our work. We started behaving as prophets of Doom for the UN and
>for Manhattan on April 29th 2006, when we proposed 2006Iyyar11 (June 8/9) as
>the date of the first nuclear terrorist attack. At that time the following
>global security situation existed…
Gotta love it, eh?
I’ll give ’em credit for their persistence, if not for their intelligence, or their sanity, or their rationality, or even their theology.

A Brilliant φ link

In the comments onmy post about φ, Polymath, (whose [blog][polymath] is well worth checking out) provided a really spectacular [link involving φ][desert]. It’s an excerpt from a book called “[Mathematical Gems 2][gems]”, showing a problem that came from John Conway, called the “Sending Scouts into the Desert” problem.
The problem is:
Suppose you’re given a checkerboard with all of the squares on the bottom filled. You’re allowed to do standard checks jumps (jump over a man and remove it), but you can’t jump diagonally, only up, left, or right. How far *up* can you get a man? How many men do you need to move to get that far?
To get to the first row above your men, you need to jump one man. To get two rows, you need to jump four men; three rows, you’ll need 8 men; four rows, you’ll need 20 men. How about five rows?
You *can’t* get five rows doing up and sideways jumps. Even if you had an infinitely wide checkerboard, with as many full rows of mens as you want.
The proof, along with a nice Java applet to let you try the solvable ones, is at the link. And φ is an inextricable part of the proof!

Friday Pathological Programming: Unlambda, or Programming Without Variables

Todays tasty treat: a variable-free nearly-pure functional programming language: Unlambda.
Unlambda is based on the SKI combinator calculus. The [SKI calculus][ski] is a way of writing lambda calculus without variables, and without lambda. It’s based on three combinators: S, K, and I:
1. S = λ x y z . x z (y z)
2. K = λ x y . x
3. I = λ x . x
Given only S and K, you can actually write *any* lambda calculus expression – and therefore, you can write any computable function using nothing but S and K. I is often added for convenience; it’s just a shorthand for “SKK”.
You can do recursion in SKI; the Y combinator of lambda calculus is:
Y = S S K (S (K (S S (S (S S K)))) K)
Unlambda is based on SKI; but it extends it with a few additional kinds of sugar to make things easier to write. (Nothing makes Unlambda easier to *read*!) The basic idea is that *everything* in unlambda is a function; and in particular, everything is a function that takes a *single* variable: you don’t need more than one parameter, because you can always using [currying][curry] to express any multiparameter function as a series of single-parameter functions.
(As a refresher, if you have a two-parameter function in lambda calculus, like “λ x y . x + y”; *currying* is translating it into a one parameter function that returns a one parameter function: “λ x .(λ y. x + y)”.)
There are eight basic instructions in Unlambda:
1. ““”` : the application operator. “`’AB`” applies the function `A` to the parameter `B`. You can think of this as being something like the “(” in Lisp; there is no close paren because since every function takes only one parameter, there is no *need* for a close paren. And hey, adding a syntactically unnecessary close paren would only make the code easier to read, and we can’t have that!
2. “`k`” : the K combinator.
3. “`s`” : the S combinator.
4. “`i`” : the I combinator.
5. “`v`” : a “drop” combinator: takes a parameter, discards it, and returns “`v`”
6. “`d`” : the “delay” operator; a way of doing *lazy evaluation* in Unlambda. Unlambda normally evaluates the argument before it
applies a function. `’dA` is the same thing as `A`, except that `A` isn’t evaluated until “`’dA`” is applied to something else.
7. “`c`” : the “call-with-current-continuation” combinator. This is equivalent to [“call/cc”][callcc] in Scheme. What it does is call some function with the current state of the computation captured as a function as its parameter. What this means is that “`’cAB`” calls “`A`”. “`A`” receives the value “`cAB`” as a parameter; during A, it can return a value, *or* it can call the continuation, which will evaluate “`cAB`” again. So this is the function-based equivalent of a loop. “c” turns the “loop body” into a function; so at the end of the loop, you re-invoke the continuation with a new parameter to run the next iteration.
8. “`.`” : The “.” operator is actually a meta-operator. “`.x`” is a function which returns its parameter, *and* prints the character “x”.
9. “`r`” is a shorthand for “`.CR`”, where CR is the carriage return character.
And that is it. The entire language.
Note that there are no numbers! We need to build numbers using church numerals.
* The church numeral “0” is “λ f. (λ x . x)”. In Unlambda, that’s “`’ki`”.
* The church numeral “1” is “λ f . (λ x . f x)”; in unlambda, that’s “`i`”.
* The church numeral “2” is “λ f . (λ x. f f x)”; or “`”s”s’kski`”.
* The church numeral “3” is “λ f . (λ x . f f f x)”; or “`”s”s’ksk”s”s’kski`”.
* And so on.. Just keep appending “`”s”skski`” for each successive number.
So… On to a couple of programs!
Hello world in Unlambda is:
`r“““““`.H.e.l.l.o. .w.o.r.l.di
It’s pretty straightforward. It’s actually sort of easier to understand if you look from right to left. The “`.H.e.l.l.o…`” is basically a series of “`i`” combinators, which print out the characters of hello world. The trailing “`i`” is there because “`.d`” needs a parameter in order to do anything. And the list of “`’`”s at the beginning are there to make the “.” functions be applied. Finally, apply “`’r`” to the whole thing – which prints a newline at the end.
Next, a counter: starts with “N=1”; print “N” asterisks, add one, and then repeat.
If you look at it, you can see the number pattern in there; “`”s”s’ksk`”. That fragment is an SKI function to add one. You capture a continuation for the main iteration; run “`i”s’k’cN`”; that’s a fragment that adds “1” to N, and then prints N. Since you’ve captured the continuation, you can re-invoke it on 1+n; the rest captures *another* continuation which is the inner loop for printing “*”s; it does that, and then it invokes the out continuation to go back and do it for “N+1”.
One last, which I’ll leave for you to trace through yourself: this one generates the fibonacci sequence; and for each element of it, it prints out the numbers using a line of asterisks containing “N” asterisks for the number “N”:
I’ll also point out that there are a ton of variants on Unlambda, ranging from the really interesting ([LazyK][lazyk]) to the bizzarely minimal ([Iota][iota]) to the downright goofy ([Jot][jot]).

From Lambda Calculus to Cartesian Closed Categories

This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)
Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.
First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.
So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:
1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’
So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.
The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)
We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.
So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:
* ∀ A ∈ basetypes(**L**), [A] = AC ∈ C(**L**)
* [unit] = 1C
Next, we need to define the typing rules for complex types:
* [ A × B ] = [A] × [B]
* [ A → B ] = [B][A]
Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:
* [∅] = 1C
* [Γ, x : A] = [Γ] × [A]
We also need to describe what to do with the values of the primitive types:
* For each value v : A, there is an arrow v : 1 → AC.
And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).
* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*
* [Γ :- a : AC] = a º ! : [Γ] → [AC] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*
* [Γ x : A :- x : A] = π2 : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*
* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π1 where π1 : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*
* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B][A] *(A function typing rule: the function maps to an arrow from the type context to an exponential [B][A], which is a function from A to B.)*
* [Γ :- (M M’) : B] = evalC,B º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*
There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
The intuition behind this is:
* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.
* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)
Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)

Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.