Someone sent me some questions about information theory; or actually,
about some questions raised by a bozo creationist arguing about information
theory. But the issues raised are interesting. So while I’m
nominally snarking at a clueless creationist, I’m really using it
as an excuse to talk about one of my favorite mathematical subjects.
The creationist is confused by the information theory idea of information and complexity. That is somewhat confusing.
Intuitively, we think of information in terms of semantics and communication. We think that information is related to semantic content. And our understanding of semantic content comes from language. So we expect
things that are rich in information to be highly structured, like language.
But our initial intuition is wrong. wrong. As I explained when I was writing about information theory, a totally unstructured random string of bits has a lot more information than an equal-length string of english text.
What we see as structure is really redundancy – multiple expressions of the same bits of information.
One of the things that I find particularly interesting about information theory is that it’s an area where many of the basic ideas can be
expressed in intuitive ways. There is a good intuition for why randomness has high information content – and once you hear it, it makes perfect sense, and you can easily see what’s wrong with the argument that structure implies high information content. And the intuition is actually very nearly a direct rendering of the theoretical (Kolmogorov-Chaitin) definition of information!
Imagine you have a string of symbols. (In information theory, we describe everything as strings of symbols.) Now suppose you want
to tell someone else how to write down the same string of symbols. The
amount of information in the string is the length of the shortest description you can give them that allows them to write down the same string.
Think of a simple string like “1111111111111111”. To tell someone how to write that string, you just say “Write 16 ones”. For a slightly more interesting string, like “1010101010101010” – you could say “write ’10’ 8 times.”. For “ABCDEFGHIJKLMNOP”, you could say “write the first sixteen letters of the alphabet”. For each of those, there’s a nice simple description, because they’ve got a lot of structure. But that structure means that it doesn’t take much to describe them; and if there’s a short
description, then that means that there’s not a lot of information.
Now think of a random string: “XEGAPC]GI(BAUL+”? How could you tell
someone to write down that string of symbols? There’s some amount of pattern there – the letters are all uppercase; there are only 13 distinct symbols out of the sixteen. But you’re not going to get away with a simple description. It’s harder to give someone instructions on how to reproduce that exact string than it is to describe a nicely structured string – which is why we say that the random string has more information in it.
Think of a simple image: a triangle, drawn onto a 1000×1000 bitmap. To allow someone to produce an identical image, you’d say “A 1000×1000 bitmap, with a triangle with vertices (x1,y1), (x2,y2), (x3,y3)”. The bit map has
a million bits. The description above would be about 90 characters long with real vertex positions, which is 720 bits – using an extremely inefficient representation of the text. Because it has structure, it takes very little description to allow someone to reproduce it exactly.
Now, think of a random image. For every pixel in the image, you use
a truly random process to determine whether it’s black or white. So you’ve got something that looks like a screen capture of static on an old TV. How can you describe that image to someone else to allow them to produce an exact duplicate? You might be able to do it with less that a million
bits – but you’re not going to be able to do it with less that 1,000 bits. There’s just too much information in the image!
The hardest thing to describe – the string that needs the longest
description for its length – is something with absolutely no pattern.
Something totally random, where there’s absolutely no pattern, no logic, no
way of predicting even a single bit without being told its value explicitly.
A string with maximum information context is a string where it’s so random
that the complete string itself is its own shortest description. And the
string with the most information is, by definition, the most complex. The
reason why should be absolutely clear by now: it takes longer to describe
something complex than something simple, and random stuff is the hardest to
describe. So random stuff has the most information.
One last point, and I’ll get to a couple of quotes. The word “random” means something very specific in the context of information theory. A perfectly random string is a string where knowledge of what came before in the string gives you absolutely no knowledge of what comes later. It’s closely related to the probabilistic notion of random. The main difference is that from the viewpoint of probability, a string is “random” if it was generated by a random process – and a random process can (and in fact by necessity will) periodically generate a string that we wouldn’t call random. For example, a truly random letter generator, given enough time, will eventually create a string consisting of the letters of the alphabet in order. But from the viewpoint of information theory, if we looked at that string, we’d say it’s low information content, meaning non-random. On the other hand, if we take a highly compressed version of
the bitmap of a triangle, it looks random – it’s got very high
information content for its size per information theory, but according
to probability theory, it isn’t random, because it’s generated by a deterministic process. The big difference here is that when information
theory talks about randomness, it’s talking about unpredictability for someone viewing a string – it’s not talking about meaning or semantics. You can take any meaningful statement, and render it down to a high information content string via compression. High information content versus low information content isn’t a property of the statement or entity being encoded; it’s a property of the representation. An image of a triangle
is a representation of some information in a very inefficient form; a string of random-looking bits is a representation of the same information in a more compact form. The image is low information content because it’s got very little information for its size; the compressed bits are high information content because they’ve got a lot of information for their size. But it’s the same information.
Now, let’s look at what the creationist in question actually had to say, so that we can see where he went wrong. The context was one of the theism/atheism arguments. Theists argue that the universe/life/whatever is too complex to be the product of a random process, therefore there must be a God. The atheists respond that it’s just a pushback: if the universe/life is too complex to come into existence itself, then how is it possible for an omniscient being to have come into being by himself?
In the course of this discussion, the argument came around to
the creationist pulling out one of the old canards about information theory,
to which the atheist explained that information theory really says
that “structure” isn’t information – that randomness is really information.
I’m going to stay away from the theism/atheism aspect of this,
because it’s a tired old discussion, and I’ve got absolutely nothing to add. I’m a religious Jew, but I find the religious argument from complexity
to be unbearably stupid.
I’m also going to take things a bit out of order, because
there’s a stupid thing that I want to get out of the way before
I get to the interesting part.
FWIW, I disagree with T-Stone’s version of information and complexity. And despite what his post would lead you to believe, the idea that “maximal randomness = maximal complexity” is not true for all information theories. And in fact, if I were to use T-Stone’s definition of complexity then I would ask him to explain not why there is so much complexity in the universe, but rather why there is so little complexity. If complexity = randomness, then it doesn’t take a rocket scientist to realize that there’s a lot of the universe that is not random, and therefore there is a lot of this universe that is not complex. Under his information theory, randomness is the default. We do not need to explain random data. We do need to explain structured and ordered data. Therefore, we do not need to explain complexity; we need to explain non-complexity.
This isn’t something where there’s room for disagreement. This is sort
of like arguing that “I don’t agree that 2+2=4.” Informational complexity is
well-defined by information, and it’s got a precise meaning. The precise
definitions vary between algorithmic information theory (Kolmogorov-Chaitin)
and communication information theory (Shannon), but the basic concept
underlying both is the same, and they agree that complexity is related to
information content, and maximum information content (and thus maximum complexity) is perfect randomness.
There is no information theory that says randomness doesn’t
maximize information content and complexity. None! This
is something that you see frequently from the clueless about information theory: they really don’t like the idea that randomness contains
maximum information, and they assert that not all information
theory agrees with that – like the statement above” maximal randomness = maximal complexity is not true for all information theories”; but they
never actually cite any kind of information theory at all – because there is none that does what they want. They’re sure that there must be,
because K/C and Shannon seem wrong. But there is no such theory, no
matter how much you may want one to exist.
Now, we’ll jump backwards to the interesting part.
Unfortunately for T-Stone, if he paid attention to what he has written here he’d see that he’s soundly refuted Dawkins. After all, if maximal randomness is equivalent to maximal complexity, then it is easy for me to write a program that will generate completely random output. In other words, it is easy for me–a person who is not maximally complex–to produce a program with output that is maximally complex. Thus, if we want to play T-Stone’s game and use complexity in this sense, then Dawkin’s argument must be surrendered.
If I can make a program that is more complex than I am, then God can create a universe that is more complex than He is.
Our creationist friend is making several mistakes here. One of them is his claim that he can easily write a program that generates
a random string. Generating random data – true random data, not just data that appears random – is really difficult. What he’s thinking about, I’m almost sure, is running a program that calls a function like the
C “srandom” function. That’s not random; it’s completely deterministic. Start with a particular seed, and you’ll always get the
same sequence of “random” numbers. In fact, there’s only one
sequence of values that you’ll get: the function is a long-cycle
periodic function. Every sequence generated using that function is a part of the same, long, fixed cycle. And you can perfectly reproduce any string
that you generated in it with a simple set of instructions. The
original pseudo-random function used by “srandom”, in the Berkeley Unix C library is one line long; it’s basically the next “random” number is “
(lastrandom*1103515245L + 12345L) & 0x7fffffff“. So given that line of code, and the initial seed, you can generate the “random” sequence. So our friend’s “random” sequence is anything but random. It’s got a nicely random distribution, but it’s not random – it’s reproducible with a very short sequence of instructions. (Even if you use a better pseudo-random function, it goes up to around 64 lines of code for a really nice one.) Computers are deterministic: they suck at randomness. In fact true randomness must come from a data source outside of the computer. Real randomness comes from random external events, like radioactive decay. Building a true random generator is a very non-trivial task.
That’s thoroughly clueless. But it’s not the worst of his errors. He’s confused about two really important things about how you do things in information theory. First, he’s confused about what we mean in information theory by having a program generate a string; and second, he’s confused about how to compare the complexity of two different strings.
In information theory, when we talk about a program or a set of
instructions for generating a string, we’re talking about a program that
repeatably and deterministically generates that string.
What makes a high complexity string is the difficulty of
reproducing it, not of producing it. If I’ve got a true
randomness source, I can generate a random string from it with very little
difficulty. But that’s not the important thing, from the viewpoint of
information theory. What information theory cares about is how can I
describe the string in order to allow it to be reproduced. If he’s
written a short program to generate a string that has high information
content, that’s not very impressive, and it says nothing about information
theory. The short program that generates the random string doesn’t mean that
the random string is easy to reproduce, or even that the string is really
produceable by a short program. Because when information theory says a
string is produceable by a short program, what it means is that it’s
produceable by a short deterministic program. Run a random program
twice, it will generate two different strings! So from the viewpoint of
information theory, the string isn’t a product of the program; it’s the
product of randomness which was the input to the program, and that says
nothing about its information content. What matters is how long a program
that will always generate the random string, and only the
Completely random strings are incredibly rare in the real world. Almost any finite string that you can imagine is at least somewhat compressible – which means that it’s got some amount of redundant structure. If you think of it by metaphor with numbers, theoretically, the vast majority of numbers are irrationals. But in the real physical world, encountering irrational
numbers isn’t particularly common. Most things that we really encounter
have finite precision. We don’t see perfect circles that really have a perfect ratio of π between their measured circumference and diameter. We don’t see triangles so perfect that their edges have a 1/1/sqrt(2) ratio. We see finite approximations. Similarly, we don’t often see true randomness.
Our friend doesn’t get that. And that’s a huge problem for his
The second mistake in in his idea of comparison. Even if I could write a
program that generates terabytes of randomness, it won’t be as complex as
a human being. Because for all of the structure that there is in us, for
all of the redundancy, for all the duplication, we are enormously
complex. To create a string of information that provided you
with enough information to be able to accurately reproduce the precise
state of my body at a moment in time is really mind-boggling. It’s
an amount of information that’s difficult to conceive. The error that
our friend is making is forgetting that complexity isn’t a percentage. A
string that is 80% random (meaning 20% compressible) can be less complex
than a string that is 10% random – provided the second one is
longer. If the 80% random string is 1,000 characters long, and
the 10% random string is 1,000,000 characters long, then the 10% random
string is more complex!
Of course, that’s not the end of his cluelessness. Following
directly on that error, he finishes up with:
T-Stone is just giving a sleight of hand here. It would be like a mathematician saying “a > b” and having T-Stone say, “The greater than sign is inverted with the less than sign, therefore ‘a > b’ means ‘a is less than b’.”
But as soon as he engages in his sleight of hand, we respond: “If the greater than sign is inverted with the less than sign, then ‘a > b’ is no longer true, rather ‘a < b’ is.”
Inverting the operator without inverting the operands does not refute the original expression.
What he’s going on about here is based on his misunderstanding of what
complexity means in terms of information. He wants structured strings to have more information than random. He’s insisting that that’s the way things really work. And so, by his reasoning, an argument that
randomness has more information than structure is reversing the correct comparison – more structure should be more complexity, so if you’re making an argument that something needs to have more complexity, it necessarily
must be an argument that it has more structure. If you insist that
randomness is complex, then you’re reversing the argument: every statement that you make arguing for complexity needs to be switched to an argument
for simplicity, because you’ve swapped things around.
Unfortunately for him, the fact that he’s completely clueless changes
nothing about the nature of reality. Randomness is more
complex than structure. And when we talk about how complex the universe
is, we’re using the same definition of complexity. It doesn’t work the way he thinks it does – he’s just plain wrong.