Dembski on Non-Evolutionary Math

I’ve been taking a look at William Dembski’s paper, “[Information as a Measure of Variation][imv]”. It was recommended to me as a paper demonstrating Demsbki’s skill as a mathematician that isn’t aimed at evolution-bashing. I’m not going to go into too much detail about it; it’s just not that good. If this is the best work he’s done as a mathematician, well, that’s pretty sad.
The main problems with the paper are:
1. He either doesn’t understand or misrepresents some of the fundamentals of the field he’s allegedly discussing;
2. He presents many of the central ideas of the paper (that is, describing information content of an event in an event sequence in terms of how it affects the probabilities of events that occur after it) as if they were novel when they really are not (in fact, this idea dates back to Shannon’s [self-information][self-info]); and
3. Much of the paper is very unclear, even obfuscatory.
The second two are typical of Dembski’s writing, and not particularly interesting. I’m going to focus on three egregious cases of misrepresentation.
### Misrepresentation One: IT = Shannon IT
The misrepresentations start from quite literally the first line of the paper. The first two paragraphs of the body of the paper are:
>Ordinarily, information refers to the meaning or semantic content of a message.
>Getting a handle on the meaning of a message, however, has proven difficult
>mathematically. Thus, when mathematicians speak of information, they are
>concerned not so much with the meaning of a message as with the vehicle by
>which the message is transmitted from sender to receiver.
>
>The most common vehicle for transmitting messages is the character string.
>The mathematical theory of information is largely about quantifying the com-
>plexity of such strings, characterizing their statistical properties when they
>are sent across a noisy communication channel (noise being represented as a
>stochastic process that disrupts the strings in statistically well-defined
>ways), preserving the strings despite the presence of noise (i.e., the theory
>of error-correcting codes), compressing the strings to improve efficiency, and
>transforming the strings into other strings to maintain their security (i.e.,
>cryptography).
This is wrong, and it’s deliberately wrong.
The reason it’s wrong? As I described in my [introduction to information theory][intro-it] (IT), there are two main branches of IT: Shannon and Kolmogorov-Chaitin. Demsbki is definitely aware of both; he references the work of Chaitin in papers written *before* this one. But in his description of information theory here, he focuses exclusively on Shannon theory, and presents it as if it were the entirety of mathematical IT.
Why would he do that? Well, because it makes it easier to make his argument about why it makes sense to view information in terms of a series of events. Later in the paper, he’s going to argue that information entropy is the *wrong* measure for information; but that argument can *only* make sense in Shannon theory. And even for the Shannon IT that he uses, the way that he characterizes it is sloppy.
### Misrepresentation Two: Information Content in Poker Hands
Moving on, here’s another really dreadful passage:
>Consider, for instance, the following individuation of poker hands: RF (a
>royal flush) and ¬RF (all other poker hands). To learn that something other
>than a royal flush was dealt (i.e., possibility ¬RF ) is clearly to acquire
>less information than to learn that a royal flush was dealt (i.e., possibility
>RF ). A royal flush is highly specific. We have acquired a lot of information
>when we learn that a royal flush was dealt. On the other hand, we have acquired >hardly any information when we learn that something other than a royal flush
>was dealt. Most poker hands are not royal flushes, and we expect to be dealt
>them only rarely. Nevertheless, if our measure of information is simply an
>enumeration of eliminated possibilities, the same numerical value must be
>assigned in both instances since, in each instance, a single possibility is
>eliminated.
Looking at this, it’s hard to tell if he just really *doesn’t get it*, or if he’s trying to slip something past the readers. It’s sufficiently messed up that it’s hard to determine exactly what he means. I can see two very different readings.
1. The two possibilities are, as he says, RF and ¬ RF. That is, we’re going to be told whether or not a hand is a royal flush. We are *not* going to be told what the cards in the hand are: we are simple going to get a yes/no answer to the question “Is the hand a royal flush?” If that’s the case, then this is completely wrong: being told “Yes, it’s an RF” gives you enough information to determine that the hand is one of four sets of cards: (the RF in each of the four suits); being told “No, it’s not an RF” is only enough information to determine that the hand is one of 52! – 4 possible sets of cards. And *in either case*, the information content is determined *not solely by the statement that the cards are, or are not, a royal flush*. The information content of those statements (per Kolmogorov-Chaitin, which is the kind of IT applicable to analyzing statements of this sort) is based on the statement *plus* the rules of poker that define the meaning of a royal flush.
2. We are told exactly what cards are in the hand. In that case, we know whether or not it’s a RF because we know what cards are there. In that case, whether you’ve got an RF or not, *you have a precise description of exactly one poker hand*. No matter what variant of IT you’re using; no matter whether you’ve got a flush or not; the amount of information that you have *is exactly the same*: it’s the identity of the 5 cards in your hand.
### Misrepresentation Three: Explaining Entropy as a Measure
On more example of the misleadingness: the beginning of section two tries to explain why information theorists use entropy. He presents an explanation of information associated with an event; and then he explains entropy *in terms of* that presentation of information as events; and then presents the IT notion of entropy as something *mysterious*:
>Nonetheless, this this notion is almost entirely passed over in favor of a
>different notion, called entropy. Entropy, rather than being associated with a
>particular event, is associated with a partition of events for a given
>reference class of possibilities Ω….
Followed by an rather obfuscatory presentation of the equation for the Shannon IT entropy of an event sequence. So far, this part *could* be taken as just typical of Dembski’s dreadfully obscure writing style. But the next step he takes is where he’s deliberately being misleading. He asks why entropy rather than event probability is the preferred measure for information content?
But he shifts the goalposts. Up until now, he’s been talking about mathematicians and IT theorists. Now suddenly, the question isn’t about why entropy is the preferred measure among *information theorists*; it’s why it’s the preferred measure among *communication theorists* (which are a different group of people); and the *answer* that he quotes is about *communication engineers*.
So: again, he’s deliberately being misleading. He’s trying to convince you that you should think of information content in terms of probability. But instead of making that argument, he presents the definitions in a very strange way, and uses subtle unjustified changes to allow him to present the *weakest possible explanation* for why his preferred measure is not widely accepted.
### Conclusion
Dembski’s a damned lousy mathematician. Even when he isn’t trying to mislead people about evolution, he’s sloppy, obfuscatory, and prone to arguments based on strawmen. If this is an example of his best work as a mathematician, then someone really screwed up in giving him a PhD.
[imv]: http://www.iscid.org/pcid/2005/4/2/dembski_information_variation.php
[intro-it]: http://scienceblogs.com/goodmath/2006/06/an_introduction_to_information.php
[self-info]: http://en.wikipedia.org/wiki/Self-information

Why model evolution as search? (updated)

In the comments on [my post mocking Granville Sewell’s dreadful article][sewell], one of the commenters asked me to write something about why evolution is frequently modeled as a search process: since there is no goal or objective in evolution, search seems like a strange approach. It’s a very good question, and so I’m going to do my best to give you a very good answer.
As I’ve said before, math is all about *abstraction*: taking something, stripping it to its bare-bones essentials, and seeing what it means: for example category theory, which I’ve been writing about for the last month, is about taking the basic idea of a function as a thing that maps things to other things, and seeing what that means when it’s stripped of everything but the most basic properties of that notion of mapping.
When we build mathematical models of real-world phenomena, we’re doing the same kind of thing: we’re looking at some particular aspect of the phenomena, and trying to see what it means when the complexities of reality are abstracted away to let us see what’s going on in a fundamental sense. When we build bridges, we do static-mechanical analysis of the bridge structures; the mast basic static mechanical analysis is remarkably silly in some ways, but it does manage to capture the basic properties of tension and force distribution through the structure. It lets us understand the most basic properties of the structure without having to worry about wind, about gravitational bowing of metal rods, vibration, etc. (Obviously, when a bridge gets built, all of those things need to get factored in; but you start with the basic analysis.)
So when a mathematician looks at modeling evolution, the way that we approach it is to ask: What are the basic fundamental things that define the process of evolution?
The answer, roughly, is two (or three, depending how you count them 🙂 ) things:
1. Reproduction with mutation;
2. Selection
Evolution is a continual process in which those steps are repeated: you reproduce a pool of individual, either mutating while you copy, or copying then mutating. Then you select some subset of them as “winners” and keep them to copy for the next generation, and you discard the rest.
What’s the selection criteria? From the starting point of a mathematical model, *it doesn’t matter*. What matters is that it’s a cyclic process of reproduction with mutation followed by selection.
So we have a basic model. Now we look at it and ask: does this resemble any problem that I know how to solve? Let’s try drawing a picture of that: we’ll draw labelled dots for individuals; with arrows from a parent to a child. The children that were selected and reproduced, we’ll paint green; the children that were not selected we’ll mark red; and the current generation, which we haven’t selected yet, we’ll leave white.
graph.jpg
So, does this look familiar to a mathematician?
Yes: it’s a mathematical structure called a graph. We know a lot about the mathematics of graphs. And evolution looks a lot like a process that’s building a graph. Each node in the graph is an individual; each edge in the graph is a reproduce with mutation step. The leaf nodes of the graph are either successful individuals (in which case the graph is going to continue to grow from that leaf), or it’s an unsuccessful one, in which case that branch is dead.
When you describe it that way – it sounds like you’ve described a classic graph search problem. To complete the model, you need to define the mutation and selection processes; the criteria by which successful children are selected and future generations are created are, of course, a crucial piece of the model. And to be complete, I’ll note that the mutation process is a *random* process, and so the mutation paths generated by running a simulation of this model will be different in each run; and further, selection is undirected: the selection process does *not* have a specific goal. (Note: this paragraph was edited from the original to clarify some points and to expand on the properties of mutation and selection.)
There are other possibilities for models of evolution; but the most obvious mathematical model for evolution is search: either a graph search or a terrain search, depending on exactly how you want to model the process of mutation.
[sewell]: http://scienceblogs.com/goodmath/2006/07/debunking_a_mathematicians_vie.php

Zeros in Category Theory

Things are a bit busy at work on my real job lately, and I don’t have time to put together as detailed a post for today as I’d like. Frankly, looking at it, my cat theory post yesterday was half-baked at best; I should have held off until I could polish it a bit and make it more comprehensible.
So I’m going to avoid that error today. Since we’ve had an interesting discussion focusing on the number zero, I thought it would be fun to show what zero means in category theory.
There are two zeros it category theory: the zero object, and the zero arrow.
Zero Objects
————–
The zero object is easier. Suppose we’ve got a category, C. Suppose that C has a *terminal object* – that is, an object t for which other object x in C has *exactly* *one* arrow f : x → t. And suppose that C also has an *initial object*: that is, an object i for which every object x in C has *exactly one* arrow, f : i → x. If C has both an initial object i, and a terminal object t, *and* i = t, then it’s a *zero object* for the category. A category can actually have *many* zero objects. For example, in the category of groups, any *trivial* group (that is, a group which contains only one element) is a zero object in the category of groups. For an intuition of why this is called “zero” think of the number zero. It’s a strange little number that sits dead in the middle of everything. It’s the central point on the complex plane, it’s the center of the number line. A zero *object* sits in the middle of a category: everything has exactly one arrow *to* it, and one arrow *from* it.
Zero Arrows
————-
The idea of a zero *arrow* comes roughly from the basic concept of a zero *function*. A zero function is a function f where for all x, f(x) = 0. One of the properties of a zero function is that composing a zero function with any other function results in a zero function. (That is, if f is a zero function, f(g(x))) is also a zero function.)
A zero morphism in a category C is part of a set of arrows, called the *zero family* of morphisms of C, where composing *any* morphism with a member of the zero family results in a morphism in the zero family.
To be precise: suppose we have a category C, and for any pair b of objects a and b ∈ Obj(C), there is a morphism 0a,b : a → .
The morphism 0d,e must satisfy one important property:
For any two morphisms m : d → e, and n : f → g, the following diagram must commute:
zero.jpg
To see why this means that any composition with a 0 arrow will give you a zero arrow, look at what happens when we start with an ID arrow. Let’s make n : f → g an id arrow, by making f=g. We get the following diagram:
zero-id.jpg
So – any zero arrow composed with an id arrow is a zero arrow. Now use induction to step up from ID arrows by looking at one arrow composed with an ID arrow and a zero arrow. You’ve still got a zero. Keep going – you’ll keep getting zeros.
Zero arrows are actually very important buggers: the algebraic and topographical notions of a kernel can be defined in category theory using the zero morphisms.
One last little note: in general, it looks like the zero objects and the zero morphisms are unrelated. They’re not. In fact, if you have a category with a zero object *and* a family of zero morphisms, then you can find the zero morphisms by using the zero object. For any objects a and b, 0a,b = (a → 0) º (0 → b).

The Category Structure for Linear Logic

So, we’re still working towards showing the relationship between linear logic and category theory. As I’ve already hinted, linear logic has something to do with certain monoidal categories. So today, we’ll get one step closer, by talking about just what kind of monoidal category. As I keep complaining, the problem with category theory is that anytime you want to do something interesting, you have to wade through a bunch of definitions to set up your structures. I’ll do my best to keep it short and sweet; as short as it is, it’ll probably take a bit of time to sink in.
First, we need a symmetry property. A monoidal category C (with terminal object t, tensor ⊗, and natural isomorphisms λ, ρ) is *symmetric* if/f for all objects a and b in C, there is a natural isomorphism γa,b : (a ⊗ b) → (b ⊗ a); and the categories monoidal natural isomorphism ρ = λb º γb,t : (b ⊗ t) → b.
In other words, if you reverse the arrows, but leave everything else the same, you still wind up with a monoidal category.
In a similar vein: regular functors are also sometimes called *covariant* functors. If you take a functor and *reverse* the directions of all the arrows, you get what’s known an a *contravariant* functor, or *cofunctor*.
We also need a *closure* property. Suppose C is a symmetric monoidal category. C is *monoidally closed* if/f there is a functor ⇒ : C × C → C, such that for every object b in C, there is an isomorphism Λ : (a⊗b → c) → (a → (b ⇒ c)), and Λ is a natural transformation for a and c. (Remember that a and c are categories here; this is a natural transformation from category to category.)
Basically, this just says that the tensor stays cleanly in the monoid, and there’s always a natural transformation corresponding to any use of tensor.
So what these two really do is pull up some of the basic category properties (closure with respect to composition, symmetry) and apply them to categories and natural transformations themselves.
Finally, we need to be able to talk about what it means for a *functor* to be closed. So suppose we have a monoidally closed category, with the usual components, up to and including the Λ natural transformation. A functor F : C → C is *closed* if, for all a,b in C, there is an arrow fa,b : ba → F(b)F(a) such that for all g : a → b, fa,b º Λ(gº λa) = Λ(F(g) º λF(a)).
In simple english, what a closed functor is is a functor that can be *represented by* an arrow in the category itself. That’s what all of that gobbledygook really means.
So… as a quick preview, to give you an idea of why we just waded through all of this gunk: if you take a monoidally closed category of the appropriate type, then collections of resources are going to be categories; ⊗ is the tensor to join collections of resources; and “-o” implications are the closed functors “⇒”.

Yet Another Crappy Bayesian Argument

A reader sent me a link to yet another purported Bayesian argument for the existence of god, this time by a physicist named Stephen Unwin. It’s actually very similar to Swinburne’s argument, which I discussed back at the old home of this blog. The difference is the degree of *dishonesty* demonstrated by the author.

As usual, you can only see the entire argument if you buy his book. But from a number of reviews of the book, and a self-interview posted on his personal website, we can get the gist. Scientific American’s review has the best concise description of his argument that I could find: (the equation in it is retyped by me.)

Unwin rejects most scientific attempts to prove the divine–such as the anthropic principle and intelligent design–concluding that this “is not the sort of evidence that points in either direction, for or against.” Instead he employs Bayesian probabilities, a statistical method devised by 18th-century Presbyterian minister and mathematician Reverend Thomas Bayes. Unwin begins with a 50 percent probability that God exists (because 50-50 represents “maximum ignorance”), then applies a modified Bayesian theorem:

Pafter = Pbefore×D/(Pbefore×D + 100% -Pbefore)

The probability of God’s existence after the evidence is considered is afunction of the probability before times D (“Divine Indicator Scale”): 10 indicates the evidence is 10 times as likely to be produced if God exists, 2 is two times as likely if God exists, 1 is neutral, 0.5 is moderately more likely if God does not exist, and 0.1 is much more likely if God does not exist. Unwin offers the following figures for six lines of evidence: recognition of goodness (D = 10), existence of moral evil (D = 0.5), existence of natural evil (D = 0.1), intranatural miracles (prayers) (D = 2), extranatural miracles (resurrection) (D = 1), and religious experiences (D = 2).

Plugging these figures into the above formula (in sequence, where the Pafter figure for the first computation is used for the Pbefore figure in the second computation, and so on for all six Ds), Unwin concludes: “The probability that God exists is 67%.” Remarkably, he then confesses: “This number has a subjective element since it reflects my assessment of the evidence. It isn’t as if we have calculated the value of pi for the first time.”

It’s pretty clear looking at this that the argument is nothing more than “I assert God exists, therefore God exists”. The “probability” result is generated by pulling numbers at random for his D-value. Even he admits that the numbers are “subjective”, but I would go much further than that: the numbers are fundamentally built on the assumption of the existence of god. How can you pretend that you haven’t already accepted the assumption that god exists, and then use stories about the occurrence of divine interventions as facts?

But this doesn’t touch on the reason that I call him dishonest. So far, it’s just sloppiness; typical of the sloppy reasoning of religious people trying to make arguments for the existence of god. But then, on his website, there’s a little self-interview:

Q: So does He exist?

SDU: God?

Q: Yes.

SDU: I don’t know. Although my book does expand on this response.

It goes on like that. He claims to not know; to not have a belief about whether or not there is a god; that his book is an honest enquiry by someone uncertain, trying to use evidence to reason about whether or not god exists.

He’s lying. Plain and simple. Everything about his argument is completely predicated on his acceptance of the existence of god. And there’s no way that he’s dumb enough to not know that. But the argument seems so much more convincing to a layman if the author isn’t sure, but is just carefully working through the probabilities. And that final figure: exactly 2/3s… It’s nicely convenient. After all, he’s not saying he’s sure; but he’s saying that an objective review of the evidence gives a number that makes it look good, while not certain – it preserves that illusion of objectivity.

This guy is using his scientific background to give him authority as someone who understands how this kind of math works; and then he’s lying about his intentions in order to increase the credibility of his argument.

Debunking "A Mathematicians View of Evolution"

This weekend, I came across Granville Sewell’s article “[A Mathematicians View of Evolution][sewell]”. My goodness, but what a wretched piece of dreck! I thought I’d take a moment to point out just how bad it is. This article, as described by the [Discovery Institute][diref], purportedly shows:
>… that Michael Behe’s arguments against neo-Darwinism from irreducible
>complexity are supported by mathematics and the quantitative sciences,
>especially when applied to the problem of the origin of new genetic
>information.
I have, in the past, commented that the *worst* math is no math. This article contains *no math*. It’s supposedly arguing that mathematics supports the idea of irreducible complexity. Only there’s no math – none!
The article claims that there are *two* arguments from mathematics that disprove evolution. Both are cheap rehashes of old creationist canards, so I won’t go into much depth. But it’s particularly appalling to see someone using trash like this with the claim that it’s a valid *mathematical* argument.
The First Argument: You can’t make big changes by adding up small ones.
————————————————————————-
Sewell:
>The cornerstone of Darwinism is the idea that major (complex) improvements can
>be built up through many minor improvements; that the new organs and new
>systems of organs which gave rise to new orders, classes and phyla developed
>gradually, through many very minor improvements.
This is only the first sentence of the argument, but it’s a good summary of what follows. There are, of course, several problems with this, but the biggest one coming from a mathematician is that this asserts that it’s impossible to move a large finite distance by taking small finite steps. This is allegedly a mathematician making this argument – but that’s what he’s claiming: that it’s impossible for any large change to occur as a result of a large number of small changes.
It also incorrectly assumes a *directionality* to evolution. This is one of the requirements of Behe’s idea: that evolution can only *add*. So if we see a complex system, the only way it could have been produced by an evolutionary process is by *adding* parts to an earlier system. That’s obviously not true – and it’s not even consistent with the other creationist arguments that he uses. And again, as a mathematician, he *should* be able to see the problem with that quite easily. In mathematical terms, this is the assertion that evolution is monotonically increasing in complexity over time. But neither he nor Behe makes any argument for *why* evolution would be monotonically increasing with respect to complexity.
So there’s the first basic claim, and my summary of what’s wrong with it. How does he support this claim?
Quite badly:
>Behe’s book is primarily a challenge to this cornerstone of Darwinism at the
>microscopic level. Although we may not be familiar with the complex biochemical
>systems discussed in this book, I believe mathematicians are well qualified to
>appreciate the general ideas involved. And although an analogy is only an
>analogy, perhaps the best way to understand Behe’s argument is by comparing the
>development of the genetic code of life with the development of a computer
>program. Suppose an engineer attempts to design a structural analysis computer
>program, writing it in a machine language that is totally unknown to him. He
>simply types out random characters at his keyboard, and periodically runs tests
>on the program to recognize and select out chance improvements when they occur.
>The improvements are permanently incorporated into the program while the other
>changes are discarded. If our engineer continues this process of random changes
>and testing for a long enough time, could he eventually develop a sophisticated
>structural analysis program? (Of course, when intelligent humans decide what
>constitutes an “improvement”, this is really artificial selection, so the
>analogy is far too generous.)
Same old nonsense. This is a *bad* analogy. A *very* bad analogy.
First of all, in evolution, *we start with a self-reproducing system*. We don’t start with completely non-functional noise. Second of all, evolution *does not have a specific goal*. The only “goal” is continued reproduction.
But most importantly for an argument coming from a supposed mathematician: he deliberately discards what is arguably *the* most important property of evolution. In computer science terms (since he’s using a programming argument, it seems reasonable to use a programming-based response): parallelism.
In evolution, you don’t try *one* change, test it to see if it’s good and keep it if it is, then go on and try another change. In evolution, you have millions of individuals *all reproducing at the same time*. You’re trying *millions* of paths at the same time.
In real evolutionary algorithms, we start with some kind of working program. We then copy it, *Many* times; as many as we can given the computational resources available to us. While copying, we randomly “mutate” each of the copies. Then we run them, and see what does best. The best ones, we keep for the next generation.
What kind of impact does parallelism have?
As an experiment, I grabbed a rather nifty piece of software for my mac called [Breve Creatures][breve]. Breve is an evolutionary algorithms toolkit; BC uses it to build moving machines. The way it works is that it produces a set of random assemblies of blocks, interconnected by hinges, based on an internal “genetic code”. For each one, it flexes the hinges. Each generation, it picks the assemblies that managed to move the farthest, and mutates it 20 times. Then it tries each of those. And so on. So Breve gives us just 20 paths per generation.
Often, in the first generation, you see virtually no motion. The assemblies are just random noise; one or two just happen to wiggle in a way that makes them fall over, which gives that a tiny bit of distance.
Typically within 20 generations, you get something that moves well; within 50, you get something that looks amazingly close to the way that some living creature moves. Just playing with this a little bit, I’ve watched it evolve things that move like inchworms, like snakes, like tripeds (two legs in front, one pusher leg in back), and quadrapeds (moving like a running dog).
In 20 generations of Breve, we’ve basically picked a path to successful motion from a tree of 2020 possible paths. Each generation, we’ve pruned off the ones that weren’t likely to lead us to faster motion, and focused on the subtrees that showed potential in the tests.
Breve isn’t a perfect analogy for biological evolution either; but it’s better than Sewell’s. There’s two important things to take from this Breve example:
1. Evolution doesn’t have a specific goal. In the case of Breve Creations, we didn’t say “I want to evolve something that walks like a dog.” The selection criteria was nothing more than “the ones that moved the furthest”. Different runs of BC create very different results; similarly, if you were to take a given species, and put isolated two populations of it in similar conditions, you’d likely see them evolve in *different* ways.
2. Evolution is a process that is massively parallel. If you want to model it as a search, it’s a massively parallel search that prunes the search space as it goes. Each selection step doesn’t just select one “outcome”; it prunes off huge areas of the search space.
So comparing the process to *one* guy randomly typing, trying *each* change to see how it works – it’s a totally ridiculous analogy. It deliberately omits the property of the process that allows it to work.
The Second Argument: Thermodynamics
————————————-
>The other point is very simple, but also seems to be appreciated only by more
>mathematically-oriented people. It is that to attribute the development of life
>on Earth to natural selection is to assign to it–and to it alone, of all known
>natural “forces”–the ability to violate the second law of thermodynamics and
>to cause order to arise from disorder.
Yes, it’s the old argument from thermodynamics.
I want to focus on one aspect of this which I think has been very under-discussed in refutations of the thermodynamic argument. Mostly, we tend to focus on the closed-system aspect: that is, the second law of thermodynamics says that in a *closed system*, entropy increases monotonically. Since the earth is manifestly *not* a closed system, there’s nothing about seeing a local decrease in entropy that would be a problem from a thermodynamic point of view.
But there’s another very important point. Entropy is *not* chaos. An system that seems ordered is *not* necessarily lower entropy than a system that seems chaotic. With respect to thermodynamics, the real question about biology is: do the chemical processes of life result in a net increase in entropy? The answer? *I don’t know*. But neither does Sewell or the other creationists who make this argument. Certainly, watching the action of life: the quantity of energy we consume, and the quantity of waste we produce, it doesn’t seem *at all* obvious that overall, life represents a net decrease in entropy. Sewell and folks like him make the argument from thermodynamics *never even try* to actually *do the math* and figure out if if the overall effect of any biological system represents a net increase or decrease in entropy.
For someone purportedly writing a *mathematicians* critique of evolution, to argue about thermodynamic entropy *without bothering to do the math necessary to make the argument* is a disgrace.
[sewell]: http://www.math.utep.edu/Faculty/sewell/articles/mathint.html
[diref]: http://www.discovery.org/scripts/viewDB/index.php?command=view&id=3640&program=CSC%20-%20Scientific%20Research%20and%20Scholarship%20-%20Science
[breve]: http://www.spiderland.org/breve/

Zero

Back during the DonorsChoose fundraiser, I promised a donor that I’d write an article about the math of zero. I haven’t done it yet, because zero is actually a suprisingly deep subject, and I haven’t really had time to do the research to do it justice. But in light of the comment thread that got started around [this post][fspog] yesterday, I think it’s a good time to do it with whatever I’ve accumulated now.
History
———
We’ll start with a bit of history. Yes, there’s an actual history to zero!
In general, most early number systems didn’t have any concept of “zero”. Numbers, in early mathematical systems, were measurements of quantity. They were used to ask questions like “How much grain do we have stored away? If we eat this much now, will we have enough to plant crops next season?” A measurement of zero doesn’t really mean much; even when math is applied to measurements in modern math, leading zeros in a number – even if they’re *measured* – don’t count as significant digits in the measurement. (So if I’m measuring some rocks, and one weighs 99 grams, then that measurement has only two significant digits. If I use the same scale to weigh a very slightly larger rock, and it weighs 101 grams, then my measurement of the second rock has *three* significant digits. The leading zeros don’t count!) *(In the original version of this post, I managed to stupidly blow my explanation of significant digits, which several alert commenters pointed out. As usual, my thanks for the correction.)*
Aristotle is pretty typical of the reasoning behind why zero wasn’t part of most early number systems: he believed that zero was like infinity: an *idea* related to numbers, but not an actual number itself. After all, you can’t *have* 0 of anything; zero of something isn’t *any* something: you *don’t have* anything. And you can’t really *get* to zero as he understood it. Take any whole number, and divide into parts, you’ll eventually get a part of size “1”. You can get to any number by dividing something bigger. But not zero: zero, you can never get to by dividing things. You can spend eternity cutting numbers in half, and you’ll still never get to zero.
The first number system that we know of to have any notion of zero is the babylonians; but they still didn’t really quite treat it as a genuine number. They had a base-60 number system, and for digit-places that didn’t have a number, they left a space: the space was the zero. (They later adopted a placeholder that looked something like “//”.) It was never used *by itself*; it just kept the space open to show that there was nothing there. And if the last digit was zero, there was no indication. So, for example, 2 and 120 looked exactly the same – you needed to look at the context to see which it was.
The first real zero came from an Indian mathematician named Brahmagupta in the 7th century. He was quite a fascinating guy: he didn’t just invent zero, but arguably he also invented the idea of negative numbers and algebra! He was the first to use zero as a real number, and work out a set of algebraic rules about how zero, positive, and negative numbers worked. The formulation he worked out is very interesting; he allowed zero as a numerator or a denominator in a fraction.
From Brahmagupta, zero spread both east (to the Arabs) and west (to the Chinese and Vietnamese.) Europeans were just about the last to get it; they were so attached to their wonderful roman numerals that it took quite a while to penetrate: zero didn’t make the grade in Europe until about the 13th century, when Fibonacci (he of the series) translated the works of a Persian mathematican named al-Khwarizmi (from whose name sprung the word “algorithm” for a mathematical procedure). As a result, Europeans called the new number system “arabic”, and credited it to the arabs; but as I said above, the arabs didn’t create it; it originally came from India. (But the Arabic scholars, including the famous poet Omar Khayyam, are the ones who adopted Brahmagupta’s notions *and extended them* to include complex numbers.)
Why is zero strange?
———————-
Even now, when we recognize zero as a number, it’s an annoyingly difficult one. It’s neither positive nor negative; it’s neither prime nor compound. If you include it in the set of real numbers, then they’re not a group – even though the concept of group is built on multiplication! It’s not a unit; and it breaks the closure of real numbers in algebra. It’s a real obnoxious bugger in a lot of ways. One thing Aristotle was right about: zero is a kind of counterpart to infinity: a concept, not a quantity. But infinity, we can generally ignore in our daily lives. Zero, we’re stuck with.
Still, it’s there, and it’s a real, inescapable part of our entire concept of numbers. It’s just an oddball – the dividing line that breaks a lot of rules. But without it, a lot of rules fall apart. Addition isn’t a group without 0. Addition and subtraction aren’t closed without zero.
Our notation for numbers is also totally dependent on zero; and it’s hugely important to making a polynomial number system work. Try looking at the [algorithm for multiplying roman numerals][roman-mult] sometime!
Because of the strangeness of zero, people make a lot of mistakes involving it.
For example, based on that idea of zero and infinities as relatives, a lot of people believe that 1/0=infinity. It doesn’t. 1/0 doesn’t equal *anything*; it’s meaningless. You *can’t* divide by 0. The intuition behind this fact comes from the Aristotelean idea about zero: concept, not quantity. Division is a concept based on quantity: Asking “What is x divided by y” is asking “What quantity of stuff is the right size so that if I take Y of it, I’ll get X?”
So: what quantity of apples can I take 0 of to get 1 apple? The question makes no sense; and that’s exactly right: it *shouldn’t* make sense, because dividing by zero *is meaningless*.
There’s a cute little algebraic pun that can show that 1 = 2, which is based on hiding a division by zero.
1. Start with “x = y”
2. Multiply both sides by x: “x2 = xy”
3. Subtract “y2” from both sides: “”x2 – y2 = xy – y2
4. Factor: “(x+y)(x-y) = y(x-y)”
5. Divide both sides by the common factor “x-y”: “x + y = y”
6. Since x=y, we can substitute y for x: “y + y = y”
7. Simplify: “2y=y”
8. Divide both sides by y: “2 = 1”
The problem, of course, is step 5: x-y = 0, so step five is dividing by zero. Since that’s a meaningless thing to do, everything based on getting a meaningful result from that step is wrong – and so we get to “prove” false facts.
Anyway, if you’re interested in reading more, the best source of information that I’ve found is an online article called [“The Zero Saga”][saga]. It covers not just a bit of history and random chit-chat like this article, but a detailed presentation of everything you could ever want to know, from the linguistics of words meaning zero or nothing to cultural impacts of the concept, to detailed mathematical explanation of how zero fits into algebras and topologies.
[fspog]: http://scienceblogs.com/goodmath/2006/07/restudying_math_in_light_of_th.php
[saga]: http://home.ubalt.edu/ntsbarsh/zero/ZERO.HTM
[roman-mult]: http://www.phy6.org/outreach/edu/roman.htm

Friday Pathological Programming: Befunge, the 2-dimensional language

Today, we’re going to take a look at a brilliant language called Befunge. Befunge is the work of an evil genius named Chris Pressey.

Normal programming languages are based on a basically one-dimensional syntax; the program is a string, a sequence of characters, and it’s processed by reading that string in a straight-ahead fashion. But that’s not Befunge! Befunge is something like a two-dimensional turing machine: it says that the program and data are written on a two dimensionaltorus. Each instruction in Befunge is a single character, and where it’s located on the torus is crucial. (In case you’re not familiar with a torus, it’s what you get if you take a very flexible sheet of paper, and roll it so that you connect the top edge to the bottom, and then roll that tube so that you connect the left edge to the right. You get a donut shape where moving up from what used to be the top of the page puts you on the bottom of the page; moving left from the left edge of the page puts you on the right.) This torus is called the playfield.

The basics of computation in Befunge are pretty straightforward. It’s a stack based language. Operations take their parameters from the stack, and leave their results on the stack. Nothing too complicated there. There are arithmetic operators, IO operators, control flow operators, all operating on the values on the stack.

The arithmetic operators are the usual kinds of things: There are operators for addition (+), subtraction (-), division (/), multiplication (*), modulo (%), and logical negation (!). Digit characters are treated as operators that push the numeric value of the digit onto the stack. (So “99” will create a stack with two nines.) Comparisons are done using “`”, which pops the top two values from the stack and compares them. So if the top of the stack was “x”, and the value beneath it was “y”, then “`” would leave a “0” on the stack if x≤y, and 1 if x > y.

For IO, there are four operators. “&” reads an integer from standard input and pushes it onto the stack. “~” reads a single character from standard input, and leaves it on the stack. “.” pops a value off the stack and writes it to standard out as an integer. “,” pops a value off the stack and writes it to standard out as a character.

Of course, we can’t have a stack-based language without some stack operators: “:” makes a duplicate of the top value on the stack; “$” discards the top value on the stack; “” swaps the top two values of the stack.

So far, nothing has looked particularly pathological – in fact, nothing even looks particularly strange, other that the fact that it’s pretty illegible because of the single-character operators. But now, we get to control flow, and that is where the insanity/brilliance of Befunge reveals itself.

In Befunge, there’s a read-head that moves over the program. Each step, it executes the instruction under the head. But instead of just moving left or right, it can move left, right, up, or down. “>” is an instruction that tells the head to start moving to the right; “<" tells the head to start moving left; "^" means start moving up, and "v" means to start moving down. So, for example:

>v
^<

Is a program that runs an infinite loop: the head will just cycle over those four characters. An even more interesting infinite loop (taken from the befunge documentation) is:

>v>v
>^
^  <

Conditionals work by picking the direction that the head will move: “_” pops the stack, and if the value is zero, then it makes the head move right (“>”); if it’s non-zero, it makes the head move left (“<"). Similarly, "|" pops a value, and makes the head move up if the value was non-zero, or down if it was zero. To make things confusing, "#" means "skip the next instruction." (Actually, it's important for when a vertical and horizontal control flow cross.) And finally,
"@" is the exit command; it makes the head stop, and the program halt.

There’s also a little hack for strings. A double-quote character (“) starts a string; when one is encountered, the head keeps moving in the same direction, but instead of executing the characters as instuctions, it just pushes the character values onto the stack. When the second quote is found, it goes back to executing instructions.

Finally, just in case the basic two dimensional flow control isn’t pathological enough, there are two instructions for modifying cells on the playfield! “g” pops an X and Y value off the stack, and pushes the character at (X,Y) onto the stack; “p” pops an X, Y, and a character off the stack, and writes the character onto location (X,Y) of the playfield. (The coordinates are relative to the cell where the program started.)

So, let’s look at a couple of befunge programs. As usual, we start with the good old “hello world”.

v
>v"Hello world!"0<
,:
^_25*,@

We start at the top left, head moving right. It moves until it hits the “v” character, which makes it go down; then “<" makes it go left. 0 pushes a zero onto the stack. Then we've got a quote – so it starts pushing characters onto the stack until the next quote. When we get to the next quote, if we wrote the stack so that the top comes first, it would look like : ('H' 'e' 'l' 'l' 'o' ' ' 'w' 'o' 'r' 'l' 'd' '!' 0 ).

Then we hit a “v” which makes the head go down. This is the beginning of a loop; the leftmost two characters of rows 2, 3, and 4 are a while loop! The head goes down to “:” which duplicates the top of the stack; then it hits “_”, which turns left if the value on top of the stack is not zero; then the head turns up, outputs a character, turns right, and we’re back at the start of the loop. So the loop will output each character until we get the the “0” we pushed before the string; then at the “_”, we turn right. 2 and 5 are pushed on the stack and multiplied, leaving a 10, which is the linefeed character (basically “n” for you C weenies). It outputs the linefeed, and then exits.

How about a truly pathological example? Here’s a self-reproducing program in 12 bytes.

:0g,:93+`#@_1+

Stepping through that:

  1. Dup the value on top of the stack. That’ll produce a “0” if the stack is empty. (So first pass, stack=[0])
  2. “0” Push a zero on the stack. (First pass, stack=[0,0])
  3. “g”: Fetch the value at (x,y) on the stack; that’s (0,0) initially. (First pass, stack = [‘:’])
  4. “,”: Output it. So we printed the character at (0,0) (First pass, stack = [])
  5. “:” dup the stack top again. (First pass, stack = [0])
  6. “93+”. Get the number 12 onto the stack. (First pass, stack = [12,0])
  7. Compare what was on top of the stack to twelve. Leave a 0 there if it was, or a 1 if it wasn’t. (First pass, stack = [0]).
  8. “#” skip over the next character.
  9. “_” go right if the top of stack is zero; left if it’s one. So if the value copied by the second “:” was greater than 12, then go left (in which case you hit “@” and halt); otherwise, keep going right.
  10. “1+”: add one to the top of the stack (First pass, stack = ([1])). Then keep going right until you hit the right edge, and the you jump back to the left edge, so you’re at the first “:” again.
  11. Now the whole thing repeats, except that there’s a one on the stack. So the “g” will fetch (1,0); the “12” will be compared to 1, and the “1+” on the end will leave “2” on the stack. So now it fetches and outputs (2,0). And so on, until it reaches the “(_)” after outputting (12,0), when it halts.

One more, which I’ll let you figure out for yourself. Here’s a program that prompts you for a number, and computes its factorial:

v
>v"Please enter a number (1-16) : "0$*99g1-:99p#v_.25*,@
^_&:1-99p>:1-:!|10          <
^     <

Restudying Math in light of The First Scientific Proof of God?

A reader sent me a link to [this amusing blog][blog]. It’s by a guy named George Shollenberger, who claims to have devised The First scientific Proof of God (and yes, he always capitalizes it like that).
George suffers from some rather serious delusions of grandeur. Here’s a quote from his “About Me” bio on his blog:
>I retired in 1994 and applyied my hard and soft research experience to today’s
>world social problems. After retirement, my dual research career led to my
>discovery of the first scientific proof of God. This proof unifies the fields
>of science and theology. As a result of my book, major changes can be expected
>throughout the world.
>…
>I expect these blogs and the related blogs of other people to be detected by
>Jesus Christ and those higher intelligent humans who already live on other
>planets.
So far, he has articles on his blog about how his wonderful proof should cause us to start over again in the fields of science, mathematics, theology, education, medical care, economics, and religion.
Alas, the actual First Scientific Proof of God is [only available in his book][buymybook]. But we can at least look at why he thinks we need to [restudy the field of mathematics][restudy].
>The field of mathematics is divided into pure and applied mathematics. Pure
>mathematicians use mathematics to express their own thoughts and thus express
>the maximum degree of freedom found in the field of mathematics. On the other
>hand, applied mathematicians lose a degree of their freedom because they use
>mathematics to express the thoughts of people in the fields they serve. Most
>mathematicians are applied mathematicians and serve either counters (e.g.,
>accountants, pollsters, etc.) or sciences (e.g., physicists, sociologists,
>etc.).
That’s a pretty insulting characterization of mathematicians, but since George is an engineer by training, it’s not too surprising – that’s a fairly common attitude about mathematicians among engineers.
>The field of physics is served by applied mathematicians who are called
>mathematical physicists. These physicists are the cause of the separation of
>theologians and scientists in the 17th century, after Aristotle’s science was
>being challenged and the scientific method was beginning to be applied to all
>sciences. But, these mathematical physicists did not challenge Aristotle’s
>meaning of infinity. Instead, they accepted Aristotle’s infinity, which is
>indeterminate and expressed by infinite series such as the series of integers (
>1, 2, 3, ….etc.). Thus, to the mathematical physicist, a determinate infinity
>does not exist. This is why many of today’s physicists reject the idea of an
>infinite God who creates the universe. I argue that this is a major error in
>the field of mathematics and explain this error in the first chapter of The
>First Scientific Proof of God.
So, quick aside? What was Aristotle’s infinity? The best article I could find quickly is [here][aristotle-infinity]. The short version? Aristotle believed that infinity doesn’t really *exist*. After all, there’s no number you can point to and say “That’s infinity”. You can never assemble a quantity of apples where you can say “There’s infinity apples in there”. Aristotle’s idea about infinity was that it’s a term that describes a *potential*, but not an *actual* number. He also went on the describe two different kinds of infinity – infinity by division (which describes zero, which he wasn’t sure should really be considered a *number*); and infinity by addition (which corresponds to what we normally think of as infinity).
So. George’s argument comes down to: mathematics, and in particular, mathematical physics, needs to be rebooted, because it uses the idea of infinity as potential – that is, there is no specific *number* that we can call infinity. So since our math says that there isn’t, well, that means we should throw it all away. Because, you see, according to George, there *is* a number infinity. It’s spelled G O D.
Except, of course, George is wrong. George needs to be introduced to John Conway, who devised the surreal numbers, which *do* contain infinity as a number. Oh, well.
Even if you were to accept his proposition, what difference would it make?
Well – there’s two ways it could go.
We could go the [surreal][onag] [numbers][surreal] route. In the surreal numbers (or several similar alternatives), infinity *does* exist as a number; but despite that, it has the properties that we expect of infinity; e.g., dividing it by two doesn’t change it. If we did that, it would have no real effect on science: surreal numbers are the same as normal reals in most ways; they differ when you hit infinitesimals and infinities.
If we didn’t go the surreal-ish route, then we’re screwed. If infinity is a *real* real number, then the entire number system collapses. What’s 1/0? If infinity is *real*, then 1/0 = infinity. What about 2/0? Is that 2*infinity? If it is, it makes no sense; if it isn’t, it makes no sense.
>I believe that the field of mathematics must restudy their work by giving ample
>consideration to the nature of man’s symbolic languages, the nature of the
>human mind, Plato’s negative, and the nature of dialectical thinking.
Plato’s negative is, pretty much, the negative of intuitionistic logic. Plato claimed that there’s a difference between X, not-X, and the opposite of X. His notion of the opposite of X is the intuitionistic logic notion of not-X; his notion of not-X is the intuitionistic notion of “I don’t have a proof of X”.
In other words, George is hopelessly ignorant of real mathematics; and his reasoning about what needs to be changed about math makes no sense at all.
[aristotle-infinity]: http://plato.stanford.edu/entries/aristotle-mathematics/supplement3.html
[blog]: http://georgeshollenberger.blogspot.com
[restudy]: http://georgeshollenberger.blogspot.com/2006/07/restudying-field-of-mathematics.html
[buymybook]: http://rockstarramblings.blogspot.com/2006/06/doggerel-19-read-my-book.html
[surreal]: http://www.tondering.dk/claus/surreal.html
[onag]: http://www.akpeters.com/product.asp?ProdCode=1276

Towards a Model for Linear Logic: Monoidal Categories

Time to come back to category theory from out side-trip. Category theory provides a good framework for defining linear logic – and for building a Curry-Howard style type system for describing computations with *state* that evolves over time. Linear logic provides a way of defining a valid *model* for part of linear logic (the multiplicative operators) that aren’t tractable using other modeling techniques.
I’m going to show you the relationship between models for linear logic and category theory. It’s going to take a couple of days to go through the whole shebang, but even the parts that we need to go through to get there are fun.
The first step is to define a *monoidal category*, also known as a *tensor* category. We’ve already done most of that when we built [monads][monads] earlier this week; a monad is a kind of monoidal category.
A monoidal category is a category C with one object t, and a *binary functor* ⊗ : C × C → C. This binary functor is called *tensor*. Tensor has three required properties defined using *natural isomorphisms*, called α, λ, and ρ.
α says that tensor must be associative: α(A,B,C) : (A ⊗ B) ⊗ C → A ⊗ (B ⊗ C).
λ says that tensor has a *left identity*: λA : (I ⊗ λ) → A.
ρ says that tensor has a *right identity*, which is the same as the left identity: ρA : (ρ ⊗ 1) → A;.
And finally, the natural transformations need to make the following diagrams commute for all values of A, B, and C. These are known as the *coherence conditions* for the monoidal natural transformations.
monoid.jpg
A monoidal category is said to be *strict* if α, λ, and ρ are all identities. It turns out that for every monoidal category, there is an *equivalent* (in the sense of natural isomorphism) to a struct monoidal category.
And now, here comes an example of just why category theory is useful. In some of the [detailed models of quantum physics][quantum-condense], they try to describe the structure of different kinds of matter using what they call *topological orders*. The standard theory for describing the topological orders to different states of matter is called *Landau* theory. It turns out that Landau theory doesn’t describe the topological order of high temperature semiconductors or very-low-temperature condensate states. Category theory – in particular, the theory surrounding strict monoidal categories does a better job of describing the topological order of the different states of matter than any other mathematical approach that’s been tried so far.
[monads]: http://scienceblogs.com/goodmath/2006/07/monads_and_programming_languag_1.php
[quantum-condense]: http://dao.mit.edu/~wen/pub/qorder.html