i : the Imaginary Number

After the amazing response to my post about zero, I thought I’d do one about something that’s fascinated me for a long time: the number *i*, the square root of -1. Where’d this strange thing come from? Is it real (not in the sense of real numbers, but in the sense of representing something *real* and meaningful)?
The number *i* has its earliest roots in some of the work of early arabic mathematicians; the same people who really first understood the number 0. But they weren’t quite as good with *i* as they were with 0: they didn’t really get it. They had some concept of roots of a cubic equation, where sometimes the tricks for finding the roots of the equation *just didn’t work*. They knew there was something going on, some way that the equation needed to have roots, but just what that really mean, they didn’t get.
Things stayed that way for quite a while. Various others, like the Greeks, encountered them in various ways when things didn’t work, but no one *really* grasped the idea that algebra required numbers that were more than just points on a one-dimensional number-line.
The next step was in Italy, over 1000 years later. During the 16th century, people were searching for solutions to the cubic equations – the same thing that the arabic scholars were looking at. But getting some of the solutions – even solutions to equations with real roots – required playing with the square root of -1 along the way. It was first really described by Rafael Bombelli in the context of the solutions to the cubic; but Bombello didn’t really think that they were *real*, *meaningful* numbers: it was viewed as a useful artifact of the process of solving the equations, but it wasn’t accepted.
It got its name as the *imaginary number* as a result of a diatribe by Rene Descartes, who believed it was a phony artifact of sloppy algebra. He did not accept that it had any meaning at all: thus it was an “imaginary” number.
They finally came into wide acceptance as a result of the work of Euler in the 18th century. Euler was probably the first to really, fully comprehend the complex number system created by the existence of *i*. And working with that, he discovered one of the most fascinating and bizzare mathematical discoveries ever, known as *Euler’s equation*. I have no idea how many years it’s been since I was first exposed to this, and I *still* have a hard time wrapping my head around *why* it’s true.

e = cos θ + i sin θ

And what *that* really means is:

e = -1

That’s just astonishing. The fact that there is *such* a close relationship between i, π, and e is just shocking to me.
What *i* does
Once the reality of *i* as a number was accepted, mathematics was changed irrevocably. Instead of the numbers described by algebraic equations being points on a line, suddenly they become points *on a plane*. Numbers are really *two dimensional*; and just like the integer “1” is the unit distance on the axis of the “real” numbers, “i” is the unit distance on the axis of the “imaginary” numbers. As a result numbers *in general* become what we call *complex*: they have two components, defining their position relative to those two axes. We generally write them as “a + bi” where “a” is the real component, and “b” is the imaginary component.


The addition of *i* and the resulting addition of complex numbers is a wonderful thing mathematically. It means that *every* polynomial equation has roots; in particular, a polynomial equation in “x” with maximum exponent “n” will always have exactly “n” complex roots.
But that’s just an effect of what’s really going on. The real numbers are *not* closed algebraically under multiplication and addition. With the addition of *i*, multiplicative algebra becomes closed: every operation, every expression in algebra becomes meaningful: nothing escapes the system of the complex numbers.
Of course, it’s not all wonderful joy and happiness once we go from real to complex. Complex numbers aren’t ordered. There is no < comparison for complex numbers. The ability to do meaningful inequalities evaporates when complex numbers enter the system in a real way.
What *i* means
But what do complex numbers *mean* in the real world? Do they really represent real phenomena? Or are they just a mathematical abstraction?
They’re very real. There’s one standard example that everyone uses: and the reason that we all use it is because it’s such a perfect example. Take the electrical outlet that’s powering your computer. It’s providing alternating current. What does that mean?
Well, the *voltage* – which (to oversimplify) can be viewed as the amount of force pushing the current – is complex. In fact, if you’ve got a voltage of 110 volts AC at 60 hz (the standard in the US), what that means is that the voltage is a number of magnitude “110”. If you were to plot the “real” voltage on a graph with time on the X axis and voltage of the Y, you’d see a sine wave:


But that’s not really accurate. If you grabbed the wire when the voltage is supposedly zero on that graph, *you’d still get a shock*! Take the moment marked “t1” on the graph above. The voltage at time t1 on the complex plane is a point at “110” on the real axis. At time t2, the voltage on the “real” axis is zero – but on the imagine axis it’s 110. In fact, the *magnitude* of the voltage is *constant*: it’s always 110 volts. But the vector representing that voltage *is rotating* through the complex plane.


You also see it in the Fourier transform: when we analyze sound using a computer, one of the tricks we use is decomposing a complex waveform (like a human voice speaking) into a collection of basic sine waves, where the sine waves added up equal the wave at a given point in time. The process by which we
do that decomposition is intimately tied with complex numbers: the fourier transform, and all of the analyses and transformations built on it are dependent on the reality of complex numbers (and in particular on the magnificent Euler’s equation up above).

Bad, bad, bad math! AiG and Information Theory

While taking a break from some puzzling debugging, I decided to hit one of my favorite comedy sites, Answers in Genesis. I can pretty much always find something sufficiently stupid to amuse me on their site. Today, I came across a gem called [“Information, science and biology”][gitt], by the all too appropriately named “Werner Gitt”. It’s yet another attempt by a creationist twit to find some way to use information theory to prove that life must have been created by god.
It looks like the Gitt hasn’t actually *read* any real information theory, but has rather just read Dembski’s wretched mischaracterizations, and then regurgitated and expanded upon them. Dembski was bad enough; building on an incomplete understand of Dembski’s misrepresentations and errors is just astonishing.
Anyway, after butchering an introduction to Shannon theory, he moves onward.
>The highest information density known to us is that of the DNA
>(deoxyribonucleic acid) molecules of living cells. This chemical storage medium
>is 2 nm in diameter and has a 3.4 NM helix pitch (see Figure 1). This results
>in a volume of 10.68 x 10-21 cm3 per spiral. Each spiral contains ten chemical
>letters (nucleotides), resulting in a volumetric information density of 0.94 x
>1021 letters/cm3. In the genetic alphabet, the DNA molecules contain only the
>four nucleotide bases, that is, adenine, thymine, guanine and cytosine. The
>information content of such a letter is 2 bits/nucleotide. Thus, the
>statistical information density is 1.88 x 1021 bits/cm3.
This is, of course, utter gibberish. DNA is *not* the “highest information density known”. In fact, the concept of *information density* is not well-defined *at all*. How do you compare the “information density” of a DNA molecule with the information density of an electromagnetic wave emitted by a pulsar? It’s meaningless to compare. But even if we accept information as only physically encoded, Consider the information density of a crystal, like a diamond. A diamond is an incredibly compact crystal of carbon atoms. There are no perfect diamonds: all crystals contain irregularities and impurities. Consider how dense the information of that crystal is: the position of every flaw, every impurity, the positions of the subset of carbon atoms in the crystal that are carbon-14 as opposed to carbon-12. Considerably denser than DNA, huh?
After this is where it *really* starts to get silly. Our Gitt claims that Shannon theory is incomplete, because after all, it’s got a strictly *quantitative* measure of information: it doesn’t care about what the message *means*. So he sets out to “fix” that problem. He proposes five levels of information: statistics, syntax, semantics, pragmatics, and apobetics. He claims that Shannon theory (and in fact information theory *as a whole*) only concerns itself with the first; because it doesn’t differentiate between syntactically valid and invalid information.
Let’s take a quick run through the five, before I start mocking them.
1. Statistics. This is what information theory refers to as information content, expressed in terms of an event sequence (as I said, he’s following Dembski); so we’re looking at a series of events, each of which is receiving a character of a message, and the information added by each event is how surprising that event was. That’s why he calls it statistical.
2. Syntax. The structure of the language encoded by the message. At this level, it is assumed that every message is written in a *code*; you can distinguish between “valid” and “invalid” messages by checking whether they are valid strings of characters for the given code.
3. Semantics. What the message *means*.
4. Pragmatics. The *primitive intention* of the transmitter of the message; the specific events/actions that the transmitter wanted to occur as a result of sending the message.
5. Apobetics: The *purpose* of the message.
According to him, level 5 is the most important one.
Throughout the article, he constantly writes “theorems”. He clearly doesn’t understand what the word “theorem” means, because these things are just statements that he would *like* to be true, but which are unproven, and often unprovable. A few examples?
For example, if we look at the section about “syntax”, we find the following as theorems:
>Theorem 4: A code is an absolutely necessary condition for the representation
>of information.
>Theorem 5: The assignment of the symbol set is based on convention and
>constitutes a mental process.
>Theorem 6: Once the code has been freely defined by convention, this definition
>must be strictly observed thereafter.
>Theorem 7: The code used must be known both to the transmitter and receiver if
>the information is to be understood.
>Theorem 8: Only those structures that are based on a code can represent
>information (because of Theorem 4). This is a necessary, but still inadequate,
>condition for the existence of information.
>These theorems already allow fundamental statements to be made at the level of
>the code. If, for example, a basic code is found in any system, it can be
>concluded that the system originates from a mental concept.
How do we conclude that a code is a necessary condition for the representation of information? We just assert it. Worse, how do we conclude that *only* things that are based on a code represent information? Again, just an assertion – but an *incredibly* strong one. He is asserting that *nothing* without a
structured encoding is information. And this is also the absolute crux of his argument: information only exists as a part of a code *designed by an intelligent process*.
Despite the fact that he claims to be completing Shannon theory, there is *nothing* to do with math in the rest of this article. It’s all words. Theorems like the ones quoted above, but becoming progressively more outrageous and unjustified.
For example, his theorem 11:
>The apobetic aspect of information is the most important, because it embraces
>the objective of the transmitter. The entire effort involved in the four lower
>levels is necessary only as a means to an end in order to achieve this
After this, we get to his conclusion, which is quite a prize.
>On the basis of Shannon’s information theory, which can now be regarded as
>being mathematically complete, we have extended the concept of information as
>far as the fifth level. The most important empirical principles relating to the
>concept of information have been defined in the form of theorems.
See, to him, a theorem is nothing but a “form”: a syntactic structure. And this whole article, to him, is mathematically complete.
>The Bible has long made it clear that the creation of the original groups of
>fully operational living creatures, programmed to transmit their information to
>their descendants, was the deliberate act of the mind and the will of the
>Creator, the great Logos Jesus Christ.
>We have already shown that life is overwhelmingly loaded with information; it
>should be clear that a rigorous application of the science of information is
>devastating to materialistic philosophy in the guise of evolution, and strongly
>supportive of Genesis creation.
That’s where he wanted to go all through this train-wreck. DNA is the highest-possible density information source. It’s a message originated by god, and transmitted by each generation to its children.
And as usual for the twits (or Gitts) that write this stuff, they’re pretending to put together logical/scientific/mathematical arguments for god; but they can only do it by specifically including the necessity of god as a premise. In this case, he asserts that DNA is a message; and a message must have an intelligent agent creating it. Since living things cannot be the original creators of the message, since the DNA had to be created before us. Therefore there must be a god.
Same old shit.
[gitt]: http://www.answersingenesis.org/tj/v10/i2/information.asp

The Categorical Model of Linear Logic

Today we’ll finally get to building the categories that provide the model for
the multiplicative linear logic. Before we jump into that, I want to explain why it is that we separate out the multiplicative part.
Remember from the simply typed lambda calculus, that [we showed that the type system was precisely a subset of intuitionistic logic][lambda-type], and that programs in the lambda calculus corresponded to proofs in the type system. In particular, it was propositional intuitionistic logic using only the “→” operation. Similarly, if you build up a more interesting typed lambda calculus like [System-F][systemf], the type system is intuitionist logic *with* the “∀” quantifer and the “∧” operator, but without “∨”, negation, or “∃”. Why do we eliminate the existentials and friends? Because if we left them in, then the *type system* becomes Turing complete and undecidable. If we’re careful, we can analyze a program using universal types, logical and (composition in lambda), and implication and infer the types if it’s a valid program. (Actually, even System-F is undecidable without any type annotations, but HMF – the Hindley-Milner subset, *is* decidable. NP-hard, but decidable.)
The main reason that people like me really care about linear logic is because there is a variant of lambda calculus that includes *linear types*. Linear types are types where referencing the value corresponding to the type *consumes* it. For dealing with the semantics of real programming languages, there are many things that make sense as linear types. For example, the values of an input stream behave in a linear way: read something from an input stream, and it’s not there anymore: unless you copied it, to put it someplace safe, it’s gone.
For typing linear lambda calculus, we use *intuitionistic linear logic* with the *multiplicative* operators. They give us a *decidable* type system with the capability to express linear type constraints.
Enough side-tracking; back to the categories.
We can now finally define a *linear category*. A category C is a linear category if it is both a [cartesian category][cartesian] and a [monoidal][monoidal] [closed category][monclose] with a *dualizing functor*. A dualizing functor is a *contravariant* closed functor defining properties of the categorical exponential, written (-)* : C → C. (That should be read with “-” as a placeholder for an object; and * as a placeholder for an exponent.) (-)* has the property that there is a natural isomorphism d : Id ≡ (-)** (That is, d is an identity natural isomorphism, and it’s equivalent to applying (-)* to the result of applying (-)* ) such that the following commutes:


So, what does this mean? Take the linear implication operator; mix it with the categorical exponent; and what you get *still* behaves *linearly*: that is, if “X -o Y”; that is, one X can be consumed to produce one Y, then 2 Xs can be consumed to produce 2 Ys; and N Xs can be consumed to produce N Ys.
So a linear category behaves cleanly with exponents; t has a linear implication; it has an *eval* operator (from the fact that it’s a cartesian category) to perform the linear implications; it has tensor for producing groups of resources. That’s it; that’s everything we need from categories for a model of linear logic.
Now, there’s just one more question: are there any real linear categories that make sense as a model for linear logic? And obviously, the answer is yes. There are two of them, called **CPO** and **Lin**.
**CPO** is the category with continuous partial orders as objects, and monotonic functions as morphisms. Think about that, and it makes a whole lot of sense for linear logic; the equivalence classes of the partial order are resources of the same “value” (that is, that can be exchanged for one another); and you can’t climb *upwards* in the partial order without adding something.
The other one, **Lin**, is a lot more complicated. I’m not going to explain it in detail; that would lead into another two-month-long series! But to put it briefly, **Lin** is the category with *coherent domains* as objects, and linear maps as morphisms. To make that make sense, I would have to explain domain theory, which (to put it mildly) is complicated; the short version is that a domain is a kind of CPO. But the reason we care about **Lin** is that programming language semantics are generally described using [*denotational semantics*][denote]; and denotational semantics are built using a kind of domain, called a Scott domain. So this gives us a domain that we can use in programming language semantics with exactly the properties we need to explain how linear types work.
[lambda-type]: http://goodmath.blogspot.com/2006/06/finally-modeling-lambda-calculus.html
[systemf]: http://en.wikipedia.org/wiki/System_F
[cartesian]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php
[monclose]: http://scienceblogs.com/goodmath/2006/07/towards_a_model_for_linear_log_1.php
[monoidal]: http://scienceblogs.com/goodmath/2006/07/the_category_structure_for_lin_1.php
[denote]: http://en.wikipedia.org/wiki/Denotational_semantics

Mocking a Silly Anti-Relativity Rant

I was reading an article on Slashdot the other day about a recent discovery of what might be a MECO. A [MECO][wiki-meco] is a “magnetospheric eternally collapsing object”; if this were true, it would be a big deal because according to relativity, either black holes exist and MECOs don’t, or MECOs exist and black holes don’t.
I have no intention of getting into the MECO vs. black hole argument. But a commenter there put down a link to something that he seemed to think was a [reasonable argument against relativity][nastytruth]. I took a look, and it’s just *hysterically* funny. The author of the site is a total crackpot; not only does he propose a way of totally redefining physics, but he also proposes an explanation for everything that’s wrong with modern software, and exactly how to build a real, proper AI.
One of my mantras for dealing with crackpots is: “The very worst math is no math”. This guy does a spectacular job of demonstrating that.
Just for fun, I’ve got to quote the beginning of his diatribe. There’s nothing more fun than watching a crackpot rant about how it’s the *rest* of the world that are crackpots.
>The Crackpottery
>We have all been taught that there is no such thing as absolute motion or
>position or that every motion and position in the universe is relative. This
>unsubstantiated belief, which I have named exclusive relativity, has been
>around for centuries, even before the advent of Albert Einstein and the theory
>of relativity. It was not until early in the twentieth century, however, that
>exclusive relativity became in vogue. Nowadays most physicists consider the
>concept of absolute motion to be no more credible than the flat earth.
>Simple Proof #1 That Exclusive Relativity Is Bogus
>If all positions are relative, then we have a self-referential system in which
>every position is ultimately relative to itself. For example, suppose we have a
>two-body universe. Body A’s position is relative to body B’s position and vice
>versa. Since both positions are relative to the other and there are no other
>bodies, each body’s position is ultimately relative to itself. Of course, it
>does not matter whether there are only two bodies or a billion.
>Exclusive relativity amounts to saying things like, “you are as tall as you
>are” or “this sound is as loud as itself” or “pick yourself up by your own
>bootstraps.” Of course this is silly but this is the sort of silliness we have
>to believe in if we accept exclusive relativity.
If you have two particles and nothing else, you can define their *positions* relative to each other in terms of their *distance* from each other. It’s not circular. Distance is the important fact. In a relativistic universe, there is no special *distinguished* reference point where the “real” position of objects is defined relative to that reference. Everything is described relative to *a* reference; but that reference can be pretty much any location you choose.
This doesn’t mean that measurements or positions are meaningless. It just means that they’re *relative*.
There’s actually a whole field of mathematics that studies things like this: it’s called metric topology. Speaking *very* loosely, metric topology looks at what kinds of *shapes* a continuous surface can take, and how to measure distance in those different kinds of spaces.
For example, if we lived in a two dimensional world, we could imagine that the world was a flat plane. In that case, the distance between two points is defined in one way. And it doesn’t matter *where* you put your reference point on the plane; the distance between two objects on that surface will be the same. We could also imagine a two dimensional world that was the surface of a torus. The distance between objects would be rather different there; but still, you could measure the distance between two objects on the surface of the torus. And no matter what point of reference you choose, the torus looks the same.
But if you’re a clueless twit who doesn’t understand what “relative position” means, then you can end up with the argument that this guy just presented.
>Simple Proof #2 That Exclusive Relativity Is Bogus
>Suppose there is a force acting on a particle so as to accelerate it. The
>particle has as many relative velocities as there are possible frames of
>reference, an infinite number in fact. Which of the myriads of relative
>velocities does the force change? How does the accelerating agent know about
>them so as to change them all? Answer, it does not. Only one velocity is
>changed by the force because it has no access to the others. The others are
>abstract, i.e., non-physical.
Once again, nope.
One of the things that’s beautiful about relativity is that it provides a set of equations that make this all work. From one point of reference, it may appear that an object is accelerating at rate X; from another point of view, it may appear that it’s accelerating at rate Y; work out the relativity equations, and they’re *both* right. Time dilation and relativistic mass shift makes it all work. (If fact, if you were around to read [my series on group theory][groups], you can see [where Blake Stacey explained in a comment][relativity] how relativity describes a lot of things as groups that are symmetric over the kinds of transformations that we’re discussing.)
The problem with the author of this piece is that *he’s not doing math*. Relativity isn’t just a theory with a bunch of words that say “position is relative”, etc. It’s a set of mathematical equations that define in a very precise way what that means, and how it works. Like I said: the worst math is no math. If he’d tried to understand the math, he’d know that there’s no problem here.
>Simple Proof #3 That Exclusive Relativity Is Bogus
>Let’s consider the motion of a particle. How does a particle “know” about its
>motion or rest relative to extrinsic frames of references so as to move or be
>at rest relative to them? Are particles psychic? I think not. No particle in
>the universe can make use of the relative because it has no access to it. It
>follows that the universe does not use the relative. The only properties that
>it can use are absolute ones.
Same exact problem as his “simple proof #2”. He didn’t do the math, and so he drew a really stupid invalid conclusion. The math of relativity explains how this works: the apparent velocity and acceleration of a particle in all frames of reference are equally valid; and the reason that they’re equally valid is because if you do the math for shifting the reference frame, you find that the different apparent values are really just different views of the same thing.
[nastytruth]: http://pages.sbcglobal.net/louis.savain/Crackpots/nasty.htm
[groups]: http://goodmath.blogspot.com/2006/06/group-theory-index.html
[relativity]: http://goodmath.blogspot.com/2006/04/some-applications-of-group-theory.html
[wiki-meco]: http://en.wikipedia.org/wiki/Magnetospheric_eternally_collapsing_object
[slashdot-meco]: http://science.slashdot.org/article.pl?sid=06/07/28/0543250

Friday Pathological Programming Language: Whenever

I was hoping for a bit of a vanity post for todays pathological programming language in honor of my 40th birthday (tomorrow), but I didn’t have time to finish implementing my own little piece of insanity. So it’ll have to wait for some other occasion.
Todays pathological programming language is a really simple monstrosity called [“Whenever”][whenever]. Whenever is a programming language where programs get executed in *random* order, and there are *no* variables. The only state, the only way of manipulating information in a Whenever program is by manipulating the collection of executable statements itself: the program is both the code *and* the data at the same time.
The basic program model of Wheneveris: you write a set of statements. The statements are inserted into a grab-bag by the interpreter. The interpreter then repeatedly picks a statement out of the bag *at random* and executes it. It keeps doing that until there are no more statements in the bag.
Everything in Whenever is done by manipulating the statement bag.
Each statement is labeled by a number. The number has no direct meaning; it’s just an identifier that will be used to reference the statement. The numbers assigned to lines don’t have to match the order in which the lines were written in the source file; and they have no effect on the order by which statements are pulled out of the bag.
So how do you do anything in this crazy language?
There’s a print statement for printing things out. So, for example,
23 print(“Hello worldn”);
is the hello world program. Since there’s only one statement, it’ll get grabbed from the statement bag, and executed.
There’s also a read statement, which reads a number from standard input, and then acts as if the statement were the number that it read.
The simplest statement is just a number. A number statement *adds* a copy of the line identifier by that number to the bag. If the number is negative, then it *removes* a copy of the statement from the bag. So, for example:
23 print(“Hello worldn”);
11 23;
would print “Hello world” twice. If 23 were executed first, it would print “Hello world”, and then only 11 would be in the bag, so it would execute 11, which would add 23 to the bag. If 11 went first, then it would add 23 to the bag, and there would be two 23s in the bag, which would get executed.
You can add multiple copies of a line to the bag: 5#3 adds 3 copies of statement 5 to the bag. And you can add multiple statements to the bag at once, by separating them with a comma. So:
17 21, -2, 3#4, -5#2;
Would insert one copy of statement 21 and four copies of statement 3 to the bag; and remove one copy of statement 2, and two copies of statement 5.
You can also do any normal arithmetic operation on numbers. The result of the arithmetic operation is interpreter as a line number.
There’s also two kinds of conditionals, “defer” and “again”. Defer takes a parameter which is evaluated to a line number, and if there are any copies of that line number in the bag, then it reinserts itself, and doesn’t do anything else. If there are no copies of the parameter in the bag, then the statement on the rest of the line is executed.
Again, an example:
1 print(“Hello”);
2 defer(1) print(“Worldn”);
is a more complicated version of hello world.
The “again” statement is very similar to the “defer” statement; but if its argument is true (i.e., evaluates to a statement that is present in the bag), then it adds a copy of itself to the bag; whether the parameter is true or false, it then executes the rest of the statement.
There’s one helpful built-in function: N(x) returns the number of copies of statement x in the bag.
So, a couple of interesting example programs:
1 defer (4 || N(1)<N(2) && N(2)<N(3)) print(N(1)+" bottles of beer on the wall, "+N(1)+" bottles of beer,");
2 defer (4 || N(1)==N(2)) print("Take one down and pass it around,");
3 defer (4 || N(2)==N(3)) print(N(1)+" bottles of beer on the wall.");
4 1#98,2#98,3#98;
This first ensures that statement four runs first: statements 1, 2, and 3 will all defer until 4 has been executed. Once four is run, there are 99 copies of statements 1, 2, and 3 in the bag. The rest of the defer statement makes sure that 1 executes before 2, and 2 before 3; so it cycles through 1, 2, 3 99 times. Pretty simple, right?
How about this?
1 again (1) defer (3 || N(1)99) 2#N(1),3,7;
2 again (2) defer (3 || N(2)99) 1#N(2),3,7;
3 defer (5) print(N(1)+N(2));
4 defer (5) print(“1”);
5 4,-3,7;
6 defer (4) 3;
7 7;
8 defer (N(7)<100) -1#N(1),-2#N(2),-7#100;
9 defer (3 || 6) 1,3;
If you look carefully.. It generates the first 100 fibonacci numbers.
It's an incredibly simple language. Simple, but quite twisted, and seriously mind-warping to try to program: you need to always keep track of the fact that the statements that represent your data could get selected for execution, which will modify the data unless you used a "defer" as a guard, but then you need to make sure that the guard gets preserved correctly… It's quite devious.
[whenever]: http://www.dangermouse.net/esoteric/whenever.html

Dembski on Non-Evolutionary Math

I’ve been taking a look at William Dembski’s paper, “[Information as a Measure of Variation][imv]”. It was recommended to me as a paper demonstrating Demsbki’s skill as a mathematician that isn’t aimed at evolution-bashing. I’m not going to go into too much detail about it; it’s just not that good. If this is the best work he’s done as a mathematician, well, that’s pretty sad.
The main problems with the paper are:
1. He either doesn’t understand or misrepresents some of the fundamentals of the field he’s allegedly discussing;
2. He presents many of the central ideas of the paper (that is, describing information content of an event in an event sequence in terms of how it affects the probabilities of events that occur after it) as if they were novel when they really are not (in fact, this idea dates back to Shannon’s [self-information][self-info]); and
3. Much of the paper is very unclear, even obfuscatory.
The second two are typical of Dembski’s writing, and not particularly interesting. I’m going to focus on three egregious cases of misrepresentation.
### Misrepresentation One: IT = Shannon IT
The misrepresentations start from quite literally the first line of the paper. The first two paragraphs of the body of the paper are:
>Ordinarily, information refers to the meaning or semantic content of a message.
>Getting a handle on the meaning of a message, however, has proven difficult
>mathematically. Thus, when mathematicians speak of information, they are
>concerned not so much with the meaning of a message as with the vehicle by
>which the message is transmitted from sender to receiver.
>The most common vehicle for transmitting messages is the character string.
>The mathematical theory of information is largely about quantifying the com-
>plexity of such strings, characterizing their statistical properties when they
>are sent across a noisy communication channel (noise being represented as a
>stochastic process that disrupts the strings in statistically well-defined
>ways), preserving the strings despite the presence of noise (i.e., the theory
>of error-correcting codes), compressing the strings to improve efficiency, and
>transforming the strings into other strings to maintain their security (i.e.,
This is wrong, and it’s deliberately wrong.
The reason it’s wrong? As I described in my [introduction to information theory][intro-it] (IT), there are two main branches of IT: Shannon and Kolmogorov-Chaitin. Demsbki is definitely aware of both; he references the work of Chaitin in papers written *before* this one. But in his description of information theory here, he focuses exclusively on Shannon theory, and presents it as if it were the entirety of mathematical IT.
Why would he do that? Well, because it makes it easier to make his argument about why it makes sense to view information in terms of a series of events. Later in the paper, he’s going to argue that information entropy is the *wrong* measure for information; but that argument can *only* make sense in Shannon theory. And even for the Shannon IT that he uses, the way that he characterizes it is sloppy.
### Misrepresentation Two: Information Content in Poker Hands
Moving on, here’s another really dreadful passage:
>Consider, for instance, the following individuation of poker hands: RF (a
>royal flush) and ¬RF (all other poker hands). To learn that something other
>than a royal flush was dealt (i.e., possibility ¬RF ) is clearly to acquire
>less information than to learn that a royal flush was dealt (i.e., possibility
>RF ). A royal flush is highly specific. We have acquired a lot of information
>when we learn that a royal flush was dealt. On the other hand, we have acquired >hardly any information when we learn that something other than a royal flush
>was dealt. Most poker hands are not royal flushes, and we expect to be dealt
>them only rarely. Nevertheless, if our measure of information is simply an
>enumeration of eliminated possibilities, the same numerical value must be
>assigned in both instances since, in each instance, a single possibility is
Looking at this, it’s hard to tell if he just really *doesn’t get it*, or if he’s trying to slip something past the readers. It’s sufficiently messed up that it’s hard to determine exactly what he means. I can see two very different readings.
1. The two possibilities are, as he says, RF and ¬ RF. That is, we’re going to be told whether or not a hand is a royal flush. We are *not* going to be told what the cards in the hand are: we are simple going to get a yes/no answer to the question “Is the hand a royal flush?” If that’s the case, then this is completely wrong: being told “Yes, it’s an RF” gives you enough information to determine that the hand is one of four sets of cards: (the RF in each of the four suits); being told “No, it’s not an RF” is only enough information to determine that the hand is one of 52! – 4 possible sets of cards. And *in either case*, the information content is determined *not solely by the statement that the cards are, or are not, a royal flush*. The information content of those statements (per Kolmogorov-Chaitin, which is the kind of IT applicable to analyzing statements of this sort) is based on the statement *plus* the rules of poker that define the meaning of a royal flush.
2. We are told exactly what cards are in the hand. In that case, we know whether or not it’s a RF because we know what cards are there. In that case, whether you’ve got an RF or not, *you have a precise description of exactly one poker hand*. No matter what variant of IT you’re using; no matter whether you’ve got a flush or not; the amount of information that you have *is exactly the same*: it’s the identity of the 5 cards in your hand.
### Misrepresentation Three: Explaining Entropy as a Measure
On more example of the misleadingness: the beginning of section two tries to explain why information theorists use entropy. He presents an explanation of information associated with an event; and then he explains entropy *in terms of* that presentation of information as events; and then presents the IT notion of entropy as something *mysterious*:
>Nonetheless, this this notion is almost entirely passed over in favor of a
>different notion, called entropy. Entropy, rather than being associated with a
>particular event, is associated with a partition of events for a given
>reference class of possibilities Ω….
Followed by an rather obfuscatory presentation of the equation for the Shannon IT entropy of an event sequence. So far, this part *could* be taken as just typical of Dembski’s dreadfully obscure writing style. But the next step he takes is where he’s deliberately being misleading. He asks why entropy rather than event probability is the preferred measure for information content?
But he shifts the goalposts. Up until now, he’s been talking about mathematicians and IT theorists. Now suddenly, the question isn’t about why entropy is the preferred measure among *information theorists*; it’s why it’s the preferred measure among *communication theorists* (which are a different group of people); and the *answer* that he quotes is about *communication engineers*.
So: again, he’s deliberately being misleading. He’s trying to convince you that you should think of information content in terms of probability. But instead of making that argument, he presents the definitions in a very strange way, and uses subtle unjustified changes to allow him to present the *weakest possible explanation* for why his preferred measure is not widely accepted.
### Conclusion
Dembski’s a damned lousy mathematician. Even when he isn’t trying to mislead people about evolution, he’s sloppy, obfuscatory, and prone to arguments based on strawmen. If this is an example of his best work as a mathematician, then someone really screwed up in giving him a PhD.
[imv]: http://www.iscid.org/pcid/2005/4/2/dembski_information_variation.php
[intro-it]: http://scienceblogs.com/goodmath/2006/06/an_introduction_to_information.php
[self-info]: http://en.wikipedia.org/wiki/Self-information

Why model evolution as search? (updated)

In the comments on [my post mocking Granville Sewell’s dreadful article][sewell], one of the commenters asked me to write something about why evolution is frequently modeled as a search process: since there is no goal or objective in evolution, search seems like a strange approach. It’s a very good question, and so I’m going to do my best to give you a very good answer.
As I’ve said before, math is all about *abstraction*: taking something, stripping it to its bare-bones essentials, and seeing what it means: for example category theory, which I’ve been writing about for the last month, is about taking the basic idea of a function as a thing that maps things to other things, and seeing what that means when it’s stripped of everything but the most basic properties of that notion of mapping.
When we build mathematical models of real-world phenomena, we’re doing the same kind of thing: we’re looking at some particular aspect of the phenomena, and trying to see what it means when the complexities of reality are abstracted away to let us see what’s going on in a fundamental sense. When we build bridges, we do static-mechanical analysis of the bridge structures; the mast basic static mechanical analysis is remarkably silly in some ways, but it does manage to capture the basic properties of tension and force distribution through the structure. It lets us understand the most basic properties of the structure without having to worry about wind, about gravitational bowing of metal rods, vibration, etc. (Obviously, when a bridge gets built, all of those things need to get factored in; but you start with the basic analysis.)
So when a mathematician looks at modeling evolution, the way that we approach it is to ask: What are the basic fundamental things that define the process of evolution?
The answer, roughly, is two (or three, depending how you count them 🙂 ) things:
1. Reproduction with mutation;
2. Selection
Evolution is a continual process in which those steps are repeated: you reproduce a pool of individual, either mutating while you copy, or copying then mutating. Then you select some subset of them as “winners” and keep them to copy for the next generation, and you discard the rest.
What’s the selection criteria? From the starting point of a mathematical model, *it doesn’t matter*. What matters is that it’s a cyclic process of reproduction with mutation followed by selection.
So we have a basic model. Now we look at it and ask: does this resemble any problem that I know how to solve? Let’s try drawing a picture of that: we’ll draw labelled dots for individuals; with arrows from a parent to a child. The children that were selected and reproduced, we’ll paint green; the children that were not selected we’ll mark red; and the current generation, which we haven’t selected yet, we’ll leave white.
So, does this look familiar to a mathematician?
Yes: it’s a mathematical structure called a graph. We know a lot about the mathematics of graphs. And evolution looks a lot like a process that’s building a graph. Each node in the graph is an individual; each edge in the graph is a reproduce with mutation step. The leaf nodes of the graph are either successful individuals (in which case the graph is going to continue to grow from that leaf), or it’s an unsuccessful one, in which case that branch is dead.
When you describe it that way – it sounds like you’ve described a classic graph search problem. To complete the model, you need to define the mutation and selection processes; the criteria by which successful children are selected and future generations are created are, of course, a crucial piece of the model. And to be complete, I’ll note that the mutation process is a *random* process, and so the mutation paths generated by running a simulation of this model will be different in each run; and further, selection is undirected: the selection process does *not* have a specific goal. (Note: this paragraph was edited from the original to clarify some points and to expand on the properties of mutation and selection.)
There are other possibilities for models of evolution; but the most obvious mathematical model for evolution is search: either a graph search or a terrain search, depending on exactly how you want to model the process of mutation.
[sewell]: http://scienceblogs.com/goodmath/2006/07/debunking_a_mathematicians_vie.php

Zeros in Category Theory

Things are a bit busy at work on my real job lately, and I don’t have time to put together as detailed a post for today as I’d like. Frankly, looking at it, my cat theory post yesterday was half-baked at best; I should have held off until I could polish it a bit and make it more comprehensible.
So I’m going to avoid that error today. Since we’ve had an interesting discussion focusing on the number zero, I thought it would be fun to show what zero means in category theory.
There are two zeros it category theory: the zero object, and the zero arrow.
Zero Objects
The zero object is easier. Suppose we’ve got a category, C. Suppose that C has a *terminal object* – that is, an object t for which other object x in C has *exactly* *one* arrow f : x → t. And suppose that C also has an *initial object*: that is, an object i for which every object x in C has *exactly one* arrow, f : i → x. If C has both an initial object i, and a terminal object t, *and* i = t, then it’s a *zero object* for the category. A category can actually have *many* zero objects. For example, in the category of groups, any *trivial* group (that is, a group which contains only one element) is a zero object in the category of groups. For an intuition of why this is called “zero” think of the number zero. It’s a strange little number that sits dead in the middle of everything. It’s the central point on the complex plane, it’s the center of the number line. A zero *object* sits in the middle of a category: everything has exactly one arrow *to* it, and one arrow *from* it.
Zero Arrows
The idea of a zero *arrow* comes roughly from the basic concept of a zero *function*. A zero function is a function f where for all x, f(x) = 0. One of the properties of a zero function is that composing a zero function with any other function results in a zero function. (That is, if f is a zero function, f(g(x))) is also a zero function.)
A zero morphism in a category C is part of a set of arrows, called the *zero family* of morphisms of C, where composing *any* morphism with a member of the zero family results in a morphism in the zero family.
To be precise: suppose we have a category C, and for any pair b of objects a and b ∈ Obj(C), there is a morphism 0a,b : a → .
The morphism 0d,e must satisfy one important property:
For any two morphisms m : d → e, and n : f → g, the following diagram must commute:
To see why this means that any composition with a 0 arrow will give you a zero arrow, look at what happens when we start with an ID arrow. Let’s make n : f → g an id arrow, by making f=g. We get the following diagram:
So – any zero arrow composed with an id arrow is a zero arrow. Now use induction to step up from ID arrows by looking at one arrow composed with an ID arrow and a zero arrow. You’ve still got a zero. Keep going – you’ll keep getting zeros.
Zero arrows are actually very important buggers: the algebraic and topographical notions of a kernel can be defined in category theory using the zero morphisms.
One last little note: in general, it looks like the zero objects and the zero morphisms are unrelated. They’re not. In fact, if you have a category with a zero object *and* a family of zero morphisms, then you can find the zero morphisms by using the zero object. For any objects a and b, 0a,b = (a → 0) º (0 → b).

The Category Structure for Linear Logic

So, we’re still working towards showing the relationship between linear logic and category theory. As I’ve already hinted, linear logic has something to do with certain monoidal categories. So today, we’ll get one step closer, by talking about just what kind of monoidal category. As I keep complaining, the problem with category theory is that anytime you want to do something interesting, you have to wade through a bunch of definitions to set up your structures. I’ll do my best to keep it short and sweet; as short as it is, it’ll probably take a bit of time to sink in.
First, we need a symmetry property. A monoidal category C (with terminal object t, tensor ⊗, and natural isomorphisms λ, ρ) is *symmetric* if/f for all objects a and b in C, there is a natural isomorphism γa,b : (a ⊗ b) → (b ⊗ a); and the categories monoidal natural isomorphism ρ = λb º γb,t : (b ⊗ t) → b.
In other words, if you reverse the arrows, but leave everything else the same, you still wind up with a monoidal category.
In a similar vein: regular functors are also sometimes called *covariant* functors. If you take a functor and *reverse* the directions of all the arrows, you get what’s known an a *contravariant* functor, or *cofunctor*.
We also need a *closure* property. Suppose C is a symmetric monoidal category. C is *monoidally closed* if/f there is a functor ⇒ : C × C → C, such that for every object b in C, there is an isomorphism Λ : (a⊗b → c) → (a → (b ⇒ c)), and Λ is a natural transformation for a and c. (Remember that a and c are categories here; this is a natural transformation from category to category.)
Basically, this just says that the tensor stays cleanly in the monoid, and there’s always a natural transformation corresponding to any use of tensor.
So what these two really do is pull up some of the basic category properties (closure with respect to composition, symmetry) and apply them to categories and natural transformations themselves.
Finally, we need to be able to talk about what it means for a *functor* to be closed. So suppose we have a monoidally closed category, with the usual components, up to and including the Λ natural transformation. A functor F : C → C is *closed* if, for all a,b in C, there is an arrow fa,b : ba → F(b)F(a) such that for all g : a → b, fa,b º Λ(gº λa) = Λ(F(g) º λF(a)).
In simple english, what a closed functor is is a functor that can be *represented by* an arrow in the category itself. That’s what all of that gobbledygook really means.
So… as a quick preview, to give you an idea of why we just waded through all of this gunk: if you take a monoidally closed category of the appropriate type, then collections of resources are going to be categories; ⊗ is the tensor to join collections of resources; and “-o” implications are the closed functors “⇒”.

Yet Another Crappy Bayesian Argument

A reader sent me a link to yet another purported Bayesian argument for the existence of god, this time by a physicist named Stephen Unwin. It’s actually very similar to Swinburne’s argument, which I discussed back at the old home of this blog. The difference is the degree of *dishonesty* demonstrated by the author.

As usual, you can only see the entire argument if you buy his book. But from a number of reviews of the book, and a self-interview posted on his personal website, we can get the gist. Scientific American’s review has the best concise description of his argument that I could find: (the equation in it is retyped by me.)

Unwin rejects most scientific attempts to prove the divine–such as the anthropic principle and intelligent design–concluding that this “is not the sort of evidence that points in either direction, for or against.” Instead he employs Bayesian probabilities, a statistical method devised by 18th-century Presbyterian minister and mathematician Reverend Thomas Bayes. Unwin begins with a 50 percent probability that God exists (because 50-50 represents “maximum ignorance”), then applies a modified Bayesian theorem:

Pafter = Pbefore×D/(Pbefore×D + 100% -Pbefore)

The probability of God’s existence after the evidence is considered is afunction of the probability before times D (“Divine Indicator Scale”): 10 indicates the evidence is 10 times as likely to be produced if God exists, 2 is two times as likely if God exists, 1 is neutral, 0.5 is moderately more likely if God does not exist, and 0.1 is much more likely if God does not exist. Unwin offers the following figures for six lines of evidence: recognition of goodness (D = 10), existence of moral evil (D = 0.5), existence of natural evil (D = 0.1), intranatural miracles (prayers) (D = 2), extranatural miracles (resurrection) (D = 1), and religious experiences (D = 2).

Plugging these figures into the above formula (in sequence, where the Pafter figure for the first computation is used for the Pbefore figure in the second computation, and so on for all six Ds), Unwin concludes: “The probability that God exists is 67%.” Remarkably, he then confesses: “This number has a subjective element since it reflects my assessment of the evidence. It isn’t as if we have calculated the value of pi for the first time.”

It’s pretty clear looking at this that the argument is nothing more than “I assert God exists, therefore God exists”. The “probability” result is generated by pulling numbers at random for his D-value. Even he admits that the numbers are “subjective”, but I would go much further than that: the numbers are fundamentally built on the assumption of the existence of god. How can you pretend that you haven’t already accepted the assumption that god exists, and then use stories about the occurrence of divine interventions as facts?

But this doesn’t touch on the reason that I call him dishonest. So far, it’s just sloppiness; typical of the sloppy reasoning of religious people trying to make arguments for the existence of god. But then, on his website, there’s a little self-interview:

Q: So does He exist?

SDU: God?

Q: Yes.

SDU: I don’t know. Although my book does expand on this response.

It goes on like that. He claims to not know; to not have a belief about whether or not there is a god; that his book is an honest enquiry by someone uncertain, trying to use evidence to reason about whether or not god exists.

He’s lying. Plain and simple. Everything about his argument is completely predicated on his acceptance of the existence of god. And there’s no way that he’s dumb enough to not know that. But the argument seems so much more convincing to a layman if the author isn’t sure, but is just carefully working through the probabilities. And that final figure: exactly 2/3s… It’s nicely convenient. After all, he’s not saying he’s sure; but he’s saying that an objective review of the evidence gives a number that makes it look good, while not certain – it preserves that illusion of objectivity.

This guy is using his scientific background to give him authority as someone who understands how this kind of math works; and then he’s lying about his intentions in order to increase the credibility of his argument.