Someone sent me some questions about information theory; or actually,
about some questions raised by a bozo creationist arguing about information
theory. But the issues raised are interesting. So while I’m
nominally snarking at a clueless creationist, I’m really using it
as an excuse to talk about one of my favorite mathematical subjects.
The creationist is confused by the information theory idea of information and complexity. That is somewhat confusing.
Intuitively, we think of information in terms of semantics and communication. We think that information is related to semantic content. And our understanding of semantic content comes from language. So we expect
things that are rich in information to be highly structured, like language.
But our initial intuition is wrong. wrong. As I explained when I was writing about information theory, a totally unstructured random string of bits has a lot more information than an equal-length string of english text.
What we see as structure is really redundancy – multiple expressions of the same bits of information.
One of the things that I find particularly interesting about information theory is that it’s an area where many of the basic ideas can be
expressed in intuitive ways. There is a good intuition for why randomness has high information content – and once you hear it, it makes perfect sense, and you can easily see what’s wrong with the argument that structure implies high information content. And the intuition is actually very nearly a direct rendering of the theoretical (Kolmogorov-Chaitin) definition of information!
Imagine you have a string of symbols. (In information theory, we describe everything as strings of symbols.) Now suppose you want
to tell someone else how to write down the same string of symbols. The
amount of information in the string is the length of the shortest description you can give them that allows them to write down the same string.
Think of a simple string like “1111111111111111”. To tell someone how to write that string, you just say “Write 16 ones”. For a slightly more interesting string, like “1010101010101010” – you could say “write ’10’ 8 times.”. For “ABCDEFGHIJKLMNOP”, you could say “write the first sixteen letters of the alphabet”. For each of those, there’s a nice simple description, because they’ve got a lot of structure. But that structure means that it doesn’t take much to describe them; and if there’s a short
description, then that means that there’s not a lot of information.
Now think of a random string: “XEGAPC]GI(BAUL+”? How could you tell
someone to write down that string of symbols? There’s some amount of pattern there – the letters are all uppercase; there are only 13 distinct symbols out of the sixteen. But you’re not going to get away with a simple description. It’s harder to give someone instructions on how to reproduce that exact string than it is to describe a nicely structured string – which is why we say that the random string has more information in it.
Think of a simple image: a triangle, drawn onto a 1000×1000 bitmap. To allow someone to produce an identical image, you’d say “A 1000×1000 bitmap, with a triangle with vertices (x1,y1), (x2,y2), (x3,y3)”. The bit map has
a million bits. The description above would be about 90 characters long with real vertex positions, which is 720 bits – using an extremely inefficient representation of the text. Because it has structure, it takes very little description to allow someone to reproduce it exactly.
Now, think of a random image. For every pixel in the image, you use
a truly random process to determine whether it’s black or white. So you’ve got something that looks like a screen capture of static on an old TV. How can you describe that image to someone else to allow them to produce an exact duplicate? You might be able to do it with less that a million
bits – but you’re not going to be able to do it with less that 1,000 bits. There’s just too much information in the image!
The hardest thing to describe – the string that needs the longest
description for its length – is something with absolutely no pattern.
Something totally random, where there’s absolutely no pattern, no logic, no
way of predicting even a single bit without being told its value explicitly.
A string with maximum information context is a string where it’s so random
that the complete string itself is its own shortest description. And the
string with the most information is, by definition, the most complex. The
reason why should be absolutely clear by now: it takes longer to describe
something complex than something simple, and random stuff is the hardest to
describe. So random stuff has the most information.
One last point, and I’ll get to a couple of quotes. The word “random” means something very specific in the context of information theory. A perfectly random string is a string where knowledge of what came before in the string gives you absolutely no knowledge of what comes later. It’s closely related to the probabilistic notion of random. The main difference is that from the viewpoint of probability, a string is “random” if it was generated by a random process – and a random process can (and in fact by necessity will) periodically generate a string that we wouldn’t call random. For example, a truly random letter generator, given enough time, will eventually create a string consisting of the letters of the alphabet in order. But from the viewpoint of information theory, if we looked at that string, we’d say it’s low information content, meaning non-random. On the other hand, if we take a highly compressed version of
the bitmap of a triangle, it looks random – it’s got very high
information content for its size per information theory, but according
to probability theory, it isn’t random, because it’s generated by a deterministic process. The big difference here is that when information
theory talks about randomness, it’s talking about unpredictability for someone viewing a string – it’s not talking about meaning or semantics. You can take any meaningful statement, and render it down to a high information content string via compression. High information content versus low information content isn’t a property of the statement or entity being encoded; it’s a property of the representation. An image of a triangle
is a representation of some information in a very inefficient form; a string of random-looking bits is a representation of the same information in a more compact form. The image is low information content because it’s got very little information for its size; the compressed bits are high information content because they’ve got a lot of information for their size. But it’s the same information.
Now, let’s look at what the creationist in question actually had to say, so that we can see where he went wrong. The context was one of the theism/atheism arguments. Theists argue that the universe/life/whatever is too complex to be the product of a random process, therefore there must be a God. The atheists respond that it’s just a pushback: if the universe/life is too complex to come into existence itself, then how is it possible for an omniscient being to have come into being by himself?
In the course of this discussion, the argument came around to
the creationist pulling out one of the old canards about information theory,
to which the atheist explained that information theory really says
that “structure” isn’t information – that randomness is really information.
I’m going to stay away from the theism/atheism aspect of this,
because it’s a tired old discussion, and I’ve got absolutely nothing to add. I’m a religious Jew, but I find the religious argument from complexity
to be unbearably stupid.
I’m also going to take things a bit out of order, because
there’s a stupid thing that I want to get out of the way before
I get to the interesting part.
FWIW, I disagree with T-Stone’s version of information and complexity. And despite what his post would lead you to believe, the idea that “maximal randomness = maximal complexity” is not true for all information theories. And in fact, if I were to use T-Stone’s definition of complexity then I would ask him to explain not why there is so much complexity in the universe, but rather why there is so little complexity. If complexity = randomness, then it doesn’t take a rocket scientist to realize that there’s a lot of the universe that is not random, and therefore there is a lot of this universe that is not complex. Under his information theory, randomness is the default. We do not need to explain random data. We do need to explain structured and ordered data. Therefore, we do not need to explain complexity; we need to explain non-complexity.
This isn’t something where there’s room for disagreement. This is sort
of like arguing that “I don’t agree that 2+2=4.” Informational complexity is
well-defined by information, and it’s got a precise meaning. The precise
definitions vary between algorithmic information theory (Kolmogorov-Chaitin)
and communication information theory (Shannon), but the basic concept
underlying both is the same, and they agree that complexity is related to
information content, and maximum information content (and thus maximum complexity) is perfect randomness.
There is no information theory that says randomness doesn’t
maximize information content and complexity. None! This
is something that you see frequently from the clueless about information theory: they really don’t like the idea that randomness contains
maximum information, and they assert that not all information
theory agrees with that – like the statement above” maximal randomness = maximal complexity is not true for all information theories”; but they
never actually cite any kind of information theory at all – because there is none that does what they want. They’re sure that there must be,
because K/C and Shannon seem wrong. But there is no such theory, no
matter how much you may want one to exist.
Now, we’ll jump backwards to the interesting part.
Unfortunately for T-Stone, if he paid attention to what he has written here he’d see that he’s soundly refuted Dawkins. After all, if maximal randomness is equivalent to maximal complexity, then it is easy for me to write a program that will generate completely random output. In other words, it is easy for me–a person who is not maximally complex–to produce a program with output that is maximally complex. Thus, if we want to play T-Stone’s game and use complexity in this sense, then Dawkin’s argument must be surrendered.
If I can make a program that is more complex than I am, then God can create a universe that is more complex than He is.
Our creationist friend is making several mistakes here. One of them is his claim that he can easily write a program that generates
a random string. Generating random data – true random data, not just data that appears random – is really difficult. What he’s thinking about, I’m almost sure, is running a program that calls a function like the
C “srandom” function. That’s not random; it’s completely deterministic. Start with a particular seed, and you’ll always get the
same sequence of “random” numbers. In fact, there’s only one
sequence of values that you’ll get: the function is a long-cycle
periodic function. Every sequence generated using that function is a part of the same, long, fixed cycle. And you can perfectly reproduce any string
that you generated in it with a simple set of instructions. The
original pseudo-random function used by “srandom”, in the Berkeley Unix C library is one line long; it’s basically the next “random” number is “
(lastrandom*1103515245L + 12345L) & 0x7fffffff“. So given that line of code, and the initial seed, you can generate the “random” sequence. So our friend’s “random” sequence is anything but random. It’s got a nicely random distribution, but it’s not random – it’s reproducible with a very short sequence of instructions. (Even if you use a better pseudo-random function, it goes up to around 64 lines of code for a really nice one.) Computers are deterministic: they suck at randomness. In fact true randomness must come from a data source outside of the computer. Real randomness comes from random external events, like radioactive decay. Building a true random generator is a very non-trivial task.
That’s thoroughly clueless. But it’s not the worst of his errors. He’s confused about two really important things about how you do things in information theory. First, he’s confused about what we mean in information theory by having a program generate a string; and second, he’s confused about how to compare the complexity of two different strings.
In information theory, when we talk about a program or a set of
instructions for generating a string, we’re talking about a program that
repeatably and deterministically generates that string.
What makes a high complexity string is the difficulty of
reproducing it, not of producing it. If I’ve got a true
randomness source, I can generate a random string from it with very little
difficulty. But that’s not the important thing, from the viewpoint of
information theory. What information theory cares about is how can I
describe the string in order to allow it to be reproduced. If he’s
written a short program to generate a string that has high information
content, that’s not very impressive, and it says nothing about information
theory. The short program that generates the random string doesn’t mean that
the random string is easy to reproduce, or even that the string is really
produceable by a short program. Because when information theory says a
string is produceable by a short program, what it means is that it’s
produceable by a short deterministic program. Run a random program
twice, it will generate two different strings! So from the viewpoint of
information theory, the string isn’t a product of the program; it’s the
product of randomness which was the input to the program, and that says
nothing about its information content. What matters is how long a program
that will always generate the random string, and only the
Completely random strings are incredibly rare in the real world. Almost any finite string that you can imagine is at least somewhat compressible – which means that it’s got some amount of redundant structure. If you think of it by metaphor with numbers, theoretically, the vast majority of numbers are irrationals. But in the real physical world, encountering irrational
numbers isn’t particularly common. Most things that we really encounter
have finite precision. We don’t see perfect circles that really have a perfect ratio of π between their measured circumference and diameter. We don’t see triangles so perfect that their edges have a 1/1/sqrt(2) ratio. We see finite approximations. Similarly, we don’t often see true randomness.
Our friend doesn’t get that. And that’s a huge problem for his
The second mistake in in his idea of comparison. Even if I could write a
program that generates terabytes of randomness, it won’t be as complex as
a human being. Because for all of the structure that there is in us, for
all of the redundancy, for all the duplication, we are enormously
complex. To create a string of information that provided you
with enough information to be able to accurately reproduce the precise
state of my body at a moment in time is really mind-boggling. It’s
an amount of information that’s difficult to conceive. The error that
our friend is making is forgetting that complexity isn’t a percentage. A
string that is 80% random (meaning 20% compressible) can be less complex
than a string that is 10% random – provided the second one is
longer. If the 80% random string is 1,000 characters long, and
the 10% random string is 1,000,000 characters long, then the 10% random
string is more complex!
Of course, that’s not the end of his cluelessness. Following
directly on that error, he finishes up with:
T-Stone is just giving a sleight of hand here. It would be like a mathematician saying “a > b” and having T-Stone say, “The greater than sign is inverted with the less than sign, therefore ‘a > b’ means ‘a is less than b’.”
But as soon as he engages in his sleight of hand, we respond: “If the greater than sign is inverted with the less than sign, then ‘a > b’ is no longer true, rather ‘a < b’ is.”
Inverting the operator without inverting the operands does not refute the original expression.
What he’s going on about here is based on his misunderstanding of what
complexity means in terms of information. He wants structured strings to have more information than random. He’s insisting that that’s the way things really work. And so, by his reasoning, an argument that
randomness has more information than structure is reversing the correct comparison – more structure should be more complexity, so if you’re making an argument that something needs to have more complexity, it necessarily
must be an argument that it has more structure. If you insist that
randomness is complex, then you’re reversing the argument: every statement that you make arguing for complexity needs to be switched to an argument
for simplicity, because you’ve swapped things around.
Unfortunately for him, the fact that he’s completely clueless changes
nothing about the nature of reality. Randomness is more
complex than structure. And when we talk about how complex the universe
is, we’re using the same definition of complexity. It doesn’t work the way he thinks it does – he’s just plain wrong.
Opt default for zero gods. Postulate infinite gods given a creation mechanism. Any other number is a rigged demo, Semites’ singular Yahweh to Hindus’ 36 crores.
Dentistry is the defintive argument against God. In the whole of human history across the entire planet not one deity has volunteered Novocain. It is a telling omission. Oh yeah… test of faith!
Thanks for the very clear and convincing explanation!
Informative post. One nit: I think you missed a character, which made one sentence very confusing:
“Theists argue that the universe/life/whatever is too complex to be the product of a random process, therefore there must be a God. The *theists* respond that it’s just a pushback: if the universe/life is too complex to come into existence itself, then how is it possible for an omniscient being to have come into being by himself?”
I think the starred word is meant to be “atheists”.
I think there’s a missing point: it’s purely a matter of a -possibly unfortunate- choice of words, that max information = max complexity = max randomness.
It’s not that an entirely random string (in the sense that every next bit it’s unpredictable given the past) contains “a lot of information”, it’s the fact that to communicate the information the string has -given any information measure- it’s not compressible, with respect to the lenght of the string itself.
I heard Chaitin said it himself many times: Comprehension it’s compression
I think that the (false) intuition that a random string contains less information than a structured string comes from exactly the same idea as Kolmogorov complexity: how hard is it to describe the string in words. But when humans describe something, they always gloss over “differences that make no difference”. And whether something “makes a difference” is subjective.
Let me give an example. Suppose you have a huge pile of bricks. If the bricks are piled up randomly, then for a human, the important information is just: How many bricks are there? That can be answered with one number. Now, suppose instead that someone takes those bricks and builds a building with them. Now, to describe that building requires many, many words. You have to say how tall it is, the number and locations of the windows, what the shape is, etc. For what a human is interested in, a building requires a lot more information to describe than a pile of bricks.
Information theory says just the opposite, because it really takes a lot more than one number to describe a pile of bricks. You have to give, for each brick, 3 real numbers describing the location of its center of mass, and then another 3 real numbers giving its orientation. That’s a lot of numbers. But the vast majority of that information is of no interest to a human. Who cares what the precise locations and orientations of the bricks are?
So what happens when humongous amounts of hydrogen gas turns into planets and people after 15 billion years is that information that nobody cares about (the precise locations and velocities of an ungodly number of hydrogen atoms) turns into information that humans do care about: Facts about the solar system, the geography, biology and chemistry of the Earth. The original state can be described in a few words: “A huge amount of hydrogen”. The final state requires many, many words. But it’s the subjective judgement of humans that most of the information about the hydrogen atoms is irrelevant.
I think Daryl has the right idea: more formally, the difference between information theory and “common sense” is that information theory describes all of the signal, whereas “common sense” is only interested in the important part of the signal.
Statistics is all about separating these two out: the noise is then easy to describe, because it is a random variable, controlled by a relatively small number of parameters (e.g. a covariance matrix). The interest s usually in the part of the signal that contains a pattern, i.e. which is predictable. As this becomes more complex, so the overall model becomes more complex.
I guess this is where the creationists’ intuition is coming from, but I haven’t seen them formalize it. I think some progress might be possible, using the ideas that stem from Akaike’s work. Complexity could then be represented by the degrees of freedom/dimensionality of a model.
Could you comment on definitions of complexity that attempt to measure structure? In this view, both completely disordered and completely ordered systems have zero complexity. To me this seems like a more interesting and physically relevant notion of complexity. One paper on this is by Feldman and Crutchfield, Physics Letters A, v. 238, pp244-252, 1998.
“There is no information theory that says randomness doesn’t maximize information content and complexity. None!”
I think that some commentators confuse “information theory” with “information science,” the study of how humans gather, communicate and store “information.” I think that some people assume that information science and information theory are closely related. They are not.
I have a degree in Library and Information Science, but never took a single hour of mathematics in the program. We did discuss Shannon’s information theory, but only to emphasize that his notion of “information” and our notion of “information” were two very different notions.
(This was my first serious case of math envy: Shannon knew what he was talking about when he said “information.” I didn’t and still don’t, though I can give several very nice handwavey descriptions.)
While the content of your rebuttal is correct, I think you do miss an important issue with what the creationist is saying. In particular, the “correct” creationist argument is that living things are *highly structured* (e.g. their Kolmogorov complexity is much lower than that of a “random” collection of matter), and thus could not arise randomly. The creationist should really have been talking about structure rather than information content.
And this argument does indeed require a rebuttal, which is what the formal (e.g. mathematical) theory of evolution does.
[There is no information theory that says randomness doesn’t maximize information content and complexity. None!]
I certainly don’t have much knowledge in the field… but what about generalized information theory?
Daryl hit the nail I was aiming at. It’s one thing to say that the complexity of a string is high due to the amount of information required to reproduce it. Check. But what of the amount of semantic information that the string conveys to a human? A splot of randomness conveys little to me, but a quick shuffle into a highly compressible form could convey a world of semantic information through the resulting structure.
I’m not arguing information theory, BTW. I’m more wondering what field this second viewport falls under and how it relates.
“T-Stone” here. Great write up. I put up a post a couple days ago responding to Pike on my own (I’m banned at their site, of course):
This little spat got started because similarly dumb stuff like this offered by someone called “Vox Day” who supposed he’d refuted Dawkins in The God Delusion because fractals and other recursive functions were “infinitely complex” (see this post, for example: http://debunkingchristianity.blogspot.com/2008/08/vox-day-fractal-intelligence-delusion.html).
While I think fractals are a great example to work with, I like the triangle example you used here, and will steal it if I may (with attribution). As you’re aware, one of the big problems here is pedagogy. You’ve got more skill in patience in that regard than I have.
A question from the peanut gallery… You said that, “The main difference is that from the viewpoint of probability, a string is “random” if it was generated by a random process – and a random process can (and in fact by necessity will) periodically generate a string that we wouldn’t call random.”
If random processes will do this at some point, then wouldn’t a vast quantity of random data in a string at some point imply that the creation of that random data is not itself random? I know that’s an awkward question, but I hope you understand it.
Perhaps a different way of phrasing it would be this, what are the odds that a random string of any significant length actually does not have any patterns that mimic non-random data? I know we could create a string in that manner by intentionally weeding out the random fluctuations that mimic non-random strings when they occur, and if memory serves me that’s what they do when they use radioactive decay to generate ciphers, but if we are doing it it is no longer “pure” randomness at work.
So I suppose the final phrasing of the question might be, wouldn’t it be more likely that a string that has _no_ areas that mimic non-random strings would be generated by someone weeding out the “bad” and only keeping the “good” instead of a random source?
I don’t mean this to imply ID or anything like that. I just wonder if such a random sequence could exist at all, other than if we created it.
To me, by far the biggest mistake of the creationist was his trust in his ability to easily create random sequences. This really means he can prove the randomness of . As Mark already said it, one can only suspect it, fake it, etc.
If we could prove randomness, I think this we could do something like this.
“Ok, here is a truly random sequence which I created with my god-like-programming awesome power. You, rest of the world, go ahead and compress it. And because my programming is so great I say you won’t be able to compress my sequence even one bit. And because I am god-like, I have the ability to create things and set lower bounds on their algorithmic complexity, just like I did with this sequence.”
But read this:
and you will see that putting lower bounds on complexity is not really an option.
Another thing I remembered while reading this post was Chaitin’s “MATHEMATICAL DEFINITION OF LIFE” http://www.cs.auckland.ac.nz/~chaitin/sicact.pdf) (read from page 7) where he mentions the “enormous interdependence [between the parts that make a living being] must be a monstrously rare [random] occurrence in a universe, unless it has evolved gradually” (read: as in a process of evolution). Now compare this with a supposedly random arrangement of the parts that make a living being and which, is, in theory much more complex informationally.
I have never fully been able to wrap my head around the concept of randomness,If a result is acquired it is because of a specific set of influence. If I break the pool balls they will react according to the variables and parameters that are known.If a drunk walks by and falls on the table skewing the output, it is still within the possible outcomes. So what I am saying is that if all contributing data is known, how can anything be considered random?
There is no amount of contributing data that will permit you to predict radioactive decay
“Real randomness comes from random external events, like radioactive decay.”
As a physics student, I get irritated when people point to quantum processes as examples of “true randomness.” As far as I know, there’s no way to distinguish a “truly random” string from a “pseudorandom” one, even in principle! If this is true, the “true randomness” of quantum events can never be tested, and should not be regarded as a scientific fact!
Of course, if there is a way to distinguish a truly random string from a pseudorandom one, I would be super-interested in hearing about it!
“There is no amount of contributing data that will permit you to predict radioactive decay.” (trrll)
Ahhh… but that’s the question, isn’t it? 😉
Good post. But I am unhappy with one of your examples. You present the strings
"ABCDEFGHIJKLMNOP"as strings of low randomness and the string
"XEGAPC]GI(BAUL+"as a string of high randomness because, as you say, you can give constructions of the form “write ’10’ 8 times.”, “Write 16 ones” and “write the first sixteen letters of the alphabet”.
But I would say that the first two these constructions are entirely different from the third. The first two are genuine constructions in terms of the symbols, the string is made up of, while the third is looking up a reference source.
If the string
"ABCDEFGHIJKLMNOP"really was less random than the string
"XEGAPC]GI(BAUL+"because it was just “the first sixteen letters of the alphabet”, wouldn’t I be able to “lower” it’s randomness by calling it “the random string from that post on gm/bm”?
All the best from the nitpicking neighbourhood
jvk: an alternative description of “ABCDEFGHIJKLMNOP” would be:
“start with A, and repeat 15 times: add next character = previous character+1”. This description does not reference any external source of information at all. Of course, it is still using the ordering of the alphabet you are using to represent your information in the first place, but I think it’s not unreasonable to use your information representation system as a given.
By the way, referencing an external information source does not necessary use less information. First, the information source must actually exist, so you’ll still have to include its information content. Second, even if you don’t count the information content of the source, “the random string from that post on gm/bm” only means something if you’ve actually read “that post”, in which case you’d already have the information. Somebody who didn’t read it (or, inconceivably, doesn’t know what “gm/bm” means), you’ll need to provide with a URL at least, and preferably a paragraph number too. Try doing that in 16 characters.
Very cool blog, Mark! Thank you.
It used to be, before experiments testing Bell’s inequality eliminated local “hidden variable” models.
Daryl and Billy C say it best. There once was something called communication studies which tried to understand how we convey meaning to each other. Data (information) and structure (semantics) were the tools used to achieve this. This was hijacked by information theory that threw away meaning. What use is that?
For example, if you have a string of n random binary digits, you will have the maximum amount of information that n binary digits can have. It’s still compressible by replacing it with the phrase ‘n random binary digits’, or ‘part of the binary representation of pi, between sciblogs and the complete works of Pratchett’. You can’t complain that a different string of n random binary digits has more or less info. The only difference is in meaning and information theory threw the meaning away.
A waste of time and effort.
Also, hear hear jvk. The particular order of the alphabet represents a vast amount of structure that can’t be swept under the rug.
Eddie, I think you missed the point completely: “information science” as BillyC described it has little to nothing to do with information theory, they’re completely different fields that study completely different things. You don’t fault biology for not addressing gravity either, do you?
And a waste of effort? I think anyone dealing with cryptography or compression algorithms would disagree with you. Loudly.
First: A “random” string isn’t replacable by the phrase “N random digits”; the “random digits” can be the most compact representation of something with significant meaning. It’s *not* compressible or replaceable without losing that meaning.
Second: information theory didn’t throw out or replace anything. It’s a discrete field of study. There is still library science and non-mathematical communication theory which deal with *knowledge* – which is something quite different.
Information theory is an extremely valuable and important field of mathematical study. It’s used in communications – the ways in which the networks that we’re using to exchange these messages were built relied on information theoretic study of communication channels. It’s used in physics. It’s used in computer science (and not just in theoretical ways).
For the kinds of arguments that creationists like to try to put together about complexity, the complexity that they’re talking about is exactly the same as the information theoretical complexity of Kolmogorov/Chaitin information theory. They just get grumpy about it because they don’t like the results.
You’re absolutely right, and my intention was that it’s clear that the way you enumerate the letters of the alphabet is based on the fact that you’ve got a discrete set of symbols used in your string (the alphabet), and there is a complete ordering relationship over those letters.
In the full-out formal definition from information theory, the amount of information in a string is the shortest program
that can generate it, starting from a minimal effective computing system. So the full ordering relationship of the alphabet is required in the “program” to generate the alphabet.
But in terms of the programs that generate the strings, I’m assuming that knowledge of the basic symbol set and its orderings are part of either program.
It’s always dangerous to try to talk about formal subjects in informal terms. This example fell victim to that.
I saw a paper, I think last year, which demonstrated by experiment that there aren’t any local hidden variables affecting things like decay events. My understanding of that is that it means that for all intents and purposes, within
our universe, we cannot observe a cause for those events – there is no local phenomena interacting with the primitive particles and forces that determines when a decay will occur. From our viewpoint, they are truly random events.
I did not mean to claim that information theory misses any point. I claim that information theory’s “information” and information science’s “information” are two different things.
Some people who really should know better assume that information science and information theory are closely related. I know this from discussions with colleagues. I shouldn’t be surprised if the same confusion occurs among, say, religious evangelists who want you to think that “the information content of DNA” implies that somebody has used our cells to write an essay.
At the same time, I’d like to amend Mark’s claim that “library science and non-mathematical communication theory deal with *knowledge*.” Information science distinguishes between information and knowledge, and information scientists study both. Admittedly, the distinction between information and knowledge is not very clear. Welcome to the social sciences.
But saying that information scientists like librarians and communication theorists don’t study information is like saying that group therapists don’t work with groups. We do; we just happen to call something else by the same name.
You say: “In the full-out formal definition from information theory, the amount of information in a string is the shortest program that can generate it, starting from a minimal effective computing system. So the full ordering relationship of the alphabet is required in the “program” to generate the alphabet.”
Yes, I do see the idea with the ordering. But I think this just shifts the problem. Introducing an arbitrary ordering on your symbols (which is quite useful for programming, though) should not make a difference on the randomness of a string. Otherwise there would be a “minimal effective computing system” for any given string, minimising its randomness.
With basically no knowledge on information theory I’d venture that this minimal computing system was a limit of computing systems and that this limit was unique (and equal for all programmes).
In short, I guess randomness would be a natural (mathematically speaking) concept, so it could not depend on an arbitrary choice of ordering.
Or from another perspective, since the series on cryptography just started – If the random looking string was what you got from the first 16 letters by a simple substitution cipher they should have the same randomness, shouldn’t they?
I hypothesize that Intelligent Design propoganda is more effective aginst people suffering from Dyscalculia. Those math-averse math-disabled people are immune to counterarguments which require one to know enough Math to be shown the difference between Good Math and Bad Math.
I took courses in Information Theory at Caltech, back in the 1960s and 1970s. Some were in the Math Department and Applied Math, some were in Electrical Engineering. In grad school, my M.S. was in a department named (at the time) “Computer & Information Science.” That department is now named “Computer Science.” Also, in grad school, my Graph Theory course was in the Engineering Department. Which department has which course in which department is generally a historical accident. For instance, “Introduction to Logic” m,ight be taught by Philosophy, or by Math, or by Computer Science. Yes, there has been a historical difference between “Information Science” and “Information Theory.” In other posts of other threads of this blog I’ve described what assumptions Claude Shannon did and did not make, and had conversed with him on those.
As to Dyscalculia, which is extremely important to me as having been both a professor, and a high school and middle school math teacher for urban teenagers flunking math, here’s an interesting new study.
Aboriginal Kids Can Count Without Numbers
ScienceDaily (Aug. 19, 2008) — Knowing the words for numbers is not necessary to be able to count, according to a new study of aboriginal children by UCL (University College London) and the University of Melbourne. The study of the aboriginal children – from two communities which do not have words or gestures for numbers – found that they were able to copy and perform number-related tasks.
The findings, published in the journal Proceedings of the National Academy of Sciences, suggest that we possess an innate mechanism for counting, which may develop differently in children with dyscalculia.
Actually, Kolmogorov’s definition of randomness for a finite string has the same problem, that it depends on some arbitrary choices. If you want to say that the complexity of a string is the size of the smallest program that computes that string, then the answer will depend on what programming language you are using. If you take the limit as the size of the string goes to infinity, then this arbitrariness goes away (because the differences between programming languages become irrelevant in this limit). (It’s a little bit tricky to do the limit and come up with a good definition of what it means for an infinite string to be “random”, but Chaitin gives the details somewhere.)
So, I don’t think that there is a completely objective, absolute definition of “randomness” for a finite string, it’s only random with respect to a particular programming language.
Thank you for the clarification.
Do you know whether this is a problem of Kolmogorov’s definition only or if it is inherent to finite strings?
I think it is inherently a problem with saying what finite strings are complex. Everything is always complex with respect to some set of basic strings and basic operations on strings. If you change which operations are considered basic, then you will make slight changes to the complexity of short strings. For very long strings, it doesn’t make much difference in practice.
So complexity of long but finite strings is one of those judgments that are just on the border between being objective and being subjective. It’s almost objective, but not quite.
Thanks for helping my understanding on this.
The point I was making is that information science and information theory ought not to be so separated. Inasmuch as info theory is not about communication, it’s not as useful to fields such as computing, physics, etc. as it could be.
Also, if biology used terms like force, mass, acceleration to describe interactions between cells and molecules, you’d have the right to expect some consistency of meaning with those terms used in physics.
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
[T. S. Eliot, The Rock, pageant play, 1934]
eddie, regarding #35: “if biology used terms like force, mass, acceleration to describe interactions between cells and molecules, you’d have the right to expect some consistency of meaning with those terms used in physics.”
Speaking as someone who is extensively published in both Physics and in Biology, I assure you that “force, mass, acceleration” mean exactly the same things in Physics and in Biology.
As to the differences between Information Science and Information Theory, the nature of the Alphabet, where “meaning” resides, and the social reality of human communications, I politely suggest that you read definitions (both in Linguistics and in Computer Science for the first 2) of:
When do we talk about how complex the universe is, and how do we define it? We don’t do that in physics; and according to Sean Carroll:
For an MWI cosmologist the wavefunction of the universe remains the same physical object, and has the same microstates. What differs is how the macrostates look.
And if we are looking at macrostates as structures, there isn’t one specific complexity measure that can capture all structures. I think it was Gell-Mann from Santa Fe Institute that claimed that.
See for example comment #7; complexity measures that I have seen discussed among for example biologists are mutual information (among say clusters of neurons) and functional complexity (among say functional sequences of proteins), and they seem to typically max out for structures that AFAIU resembles small-world networks, i.e. between completely disordered and completely ordered structures.
Various types of functional complexity makes IMHO intuitive sense since evolution makes functional traits to increase fitness during adaptation, so that should be the type of structures we expect.
Another type of complexity that is expected is interlocking complexity that was predicted during the 1930’s by biologists, that biological functional systems gets interdependent by contingency during evolution. (This corresponds to what creationists later high-jacked as “irreducible complexity”.)
@ Aaron F., #18:
I agree that it is irritating, as randomness is defined as Mark describes, as the inability to predict the next state, and isn’t sourced by quantum processes alone. And because the alternative to randomness isn’t “false randomness” but determinism such as pseudorandomness.
OTOH it is as trrll notes (#23) evident that Bell test experiments reveals that QM randomness results in what I prefer to call “genuine randomness”.
If it makes you feel better decoherence shows that this randomness is only genuine as far as our ability to measure goes. Blake Stacey who sometimes visits here IIRC uses to point to some pages that shows how the information is still present in the environment according to decoherence (but, alas, I can’t find them now). It is just lost for us in a process AFAIU analogous to thermodynamical losses described by entropy.
Something that perhaps was evidenced by the recent press release that made the rounds about how weak measurements allows for reversing decoherence. Have you read it? (Don’t have the stamina right now to find that either, sorry.)
“high-jacked” – hijacked, highjacked. [grumbles] Coffee! [/grumbles]
@ trrll, #23:
“It used to be, before experiments testing Bell’s inequality eliminated local ‘hidden variable’ models.”
Oh, good point—I forgot about that! 😛 But is there a Bell-type inequality that applies to radioactive decay? From a few minutes of Googling, I get the impression that there’s no general procedure for constructing these inequalities.
@ Torbjörn Larsson, OM, #38:
“If it makes you feel better decoherence shows that this randomness is only genuine as far as our ability to measure goes.”
Can you explain in more detail what that statement means?
What I really want to know is whether it’s possible to amass evidence for or against the proposition that a given finite string (say, a string of waiting times between radioactive decays) is a substring of a Kolmogorov-random string. In other words, can we test the assertion that quantum events are Kolmogorov-random?
Of course, there’s also the problem that in Einsteinian relativity, there’s no total order on events, so I don’t know if it’s even possible to view a set of events as a string.
“Gell-Mann from Santa Fe Institute” parallels “Einstein from Princeton.” It’s not where either one made their reputation, but it is a comfortable place with fascinating people and little pressure. In fact, the case has been made that The Institute for Advanced Study has too LITTLE pressure, and good theorists arrive there and become nonproductive.
Still, I’m glad that Torbjörn Larsson brought this up. The Santa Fe Institute has radically redefined how many people see Complexity. I’ve been there several times (such as at the first few Artificial Life conferences), would love to have an appointment to research there, and think that Murray Gell-Mann went there in part because of his wife’s appointment to an English Lit department.
If you like counterculture, neat architecture, avant garde art, and spicy food, Santa Fe is a great place to hang out.
As to new approaches to Complexity see:
ALIFE XI: The conference attracted lots of media coverage: see online for links to related articles and videos.
The next European Conference of Artificial Life will be held September 2009 in Budapest, Hungary. The next International Conference of Artificial Life will be held in 2010, location to be decided.
Artificial life investigates the fundamental properties of living systems through simulating and synthesizing biological entities and processes in artificial media. Summer 2008 will see the international Alife conference hosted by the University of Southampton, UK, bringing the meeting to Europe for the first time in its 21-year history.
Over the last two decades, some of the highly speculative ideas that were discussed at the field’s inception have matured to the extent that new conferences and journals devoted to them are being established: synthesising artificial cells, simulating massive biological networks, exploiting biological substrates for computation and control, and deploying bio-inspired engineering are now cutting-edge practice. The Artificial Life XI conference provides an opportunity for those working across these topics to get together and exchange ideas and results. To this end, the conference will present a selection of the best current work in the field, highlight new directions for investigation, and present high-profile keynote speakers.
Artificial Life XI took place in Winchester, a beautiful historic city in southern England, in August 2008. It was hosted by the Science and Engineering of Natural Systems group in the School of Electronics and Computer Science at the University of Southampton.
The Organizing Committee of Artificial Life XI:
Seth Bullock (Chair), Jason Noble, Richard Watson and Mark Bedau
It used to be, before experiments testing Bell’s inequality eliminated local “hidden variable” models.
Posted by: trrll | August 19, 2008 8:56 AM
Can you point to a paper that eliminates loopholes,inconsistencies, or assumptions?
Blake Stacey who sometimes visits here IIRC uses to point to some pages that shows how the information is still present in the environment according to decoherence.
(Don’t have the stamina right now to find that either, sorry.)
Posted by: Torbjörn Larsson, OM | August 19, 2008 6:02 PM
I would be very interested in this information when you are able, If you would be so kind,I would be very much obliged.
Sorry, Didn’t mean to be (Anonymous)
However, arguing with, or even simply explaining anything to, creationists is futile, because they are constantly changing what words mean.
To a creationist the meaning of ‘information’ freely shifts back-and-forth between Shannon information, K-C information, and Dembski’s deliberately vague usage; even if you specify up front exactly which definition you are using.
The same goes for ‘random’ (probability theory, information theory, or some muddled nonsense someone just made up), and pretty much every other word in the English language.
I do not see the alternative to randomness as deterministic(given the magnitude of possible outcomes) I see it as reality. You can have a magnitude of possible outcomes and not be random.
We are yet in our infancy concerning S.R.&Q.M. These are the tools we have yet there are inconsistencies.
Science is only accurate for the present.
Science(Knowledge) is a dynamic pursuit, I can not resign to accept the concept of random when it has not been proven conclusively. I do not believe in magic. I can not accept the postulates with faith.Their is more information to be gained. Random to me would equate to God, and faith is not science.
God plays dice…Science does not!
I think I understand well why it is that people’s intuitive sense of what does and what doesn’t have high information content seems so far wrong. As happens so often it turns out that the cause is not because people are so ignorant, it is because they are so SMART… and they’re making assumptions that may well be correct, but which are not used by the mathematics.
Consider the following three sequences of digits:
(A) The first four thousand digits produced by concatenating base-10 representations of the odd square numbers.
(B) Four thousand random digits. (It doesn’t really matter which ones)
(C) Four thousand random digits. (A PARTICULAR set of random digits… it MATTERS which ones)
The math says that C has more information content than A. And it does. The uninformed commentator says that A has more information content than B. And it does — all you really need to specify is the length, and that needs to be specified for A as well. The problem is that both B and C are usually expressed without the clause in parentheses, so it is difficult to tell them apart. The careful mathematician will use definitions that clarify which is intended; the amateur will use his or her common sense to realize that the details don’t matter (but sometimes they DO).
This viewpoint has an impact on the creationist arguments also. For instance, the probability of intelligent life should not be based on the number of bits worth of DNA in a human (that would be like C), but on the number of bits worth of DNA needed for ANY intelligent life form, whether it has 2 legs or 5 may be irrelevant but most of the DNA is coding for that. This is a detail most creationist arguments get wrong.
I doubt I’ll be adding any profound insights so I appreciate your patience. It is easy for most people to confuse (equate) “random” with “noise”. That is, noise is a completely random signal, but it is not something you care whether it gets reproduced exactly when it is transmitted. The same with a random string of characters, most people are going to look at “qwiewr;sadf;lknvoijer” and say there is no information because it does not matter if any of the letters are transmitted wrong, since it is just random. It takes a lot of education to understand the distinction between “random” and “noise” and between “information” and “meaning”. And it is this confusion that the creationist try to exploit to their own ends.
@ Aaron F., #40:
The result applies to all of QM by consistency.
Blake Stacey refers to physicist and author Greg Egan on decoherence:
Incompressible noise as a time series is white noise of some form, say gaussian. If you can transform your decoherence measurement results to something related to a Wiener process, I guess you are close. The variations in CMBR is assumed to be caused by inflated quantum variations, and it is AFAIU nicely Gaussian over all spatial frequencies.
Not a problem if you are studying the result of local processes, see the CMBR for an example of an integrated result.
@ JVP, #41:
True – that’s why I didn’t want to bring up his Nobel prize.
I traveled through there, and I can attest to the architecture, art and food. Also, they have some beautiful people – I can still picture some of the girls. [Tsk, tsk!]
@ Sphere Coupler, #43-44:
My pleasure, and done, see previous comments. This time I bookmarked it.
As regards the source, at least I’ve seen Egan do his physics and math on sundry blogs like Cosmic Variance and the n-category café blog, and he knows his stuff. Decoherence (as well as MWI, according to cosmologist Sean Carroll’s latest Bloggingheads video) has AFAIU gained considerably ground among cosmologists and perhaps particle physicists.
@ Torbjörn Larsson, OM, #49
Thanks for the Greg Egan link! That clears a lot of things up—at least, I think it does. 😛
There’s a nice statement of what goes on in evolution, through simulation experiments, with specific types of randomness (of a network, of a landscape, of mutations).
Title: Evolution of a population of random Boolean networks
Authors: Tamara Mihaljev, Barbara Drossel
Comments: 10 pages, 17 pictures
Subjects: Populations and Evolution (q-bio.PE); Molecular Networks (q-bio.MN)
The Neodarwinian view of biological evolution considers random mutations and natural selection as the main shaping forces of organisms. Mutations act on the genotype, while selection acts on the phenotype. For this reason, the relation between genotype and fitness is very complicated and far from fully understood. Mathematical models of biological evolution often contain a direct mapping of the genotype on the fitness. The “fitness landscape” may be smooth and single-peaked or random and rugged, or the fitness is taken as the additive contribution of the alleles at several loci. However, the “real” fitness landscape might have completely different properties. For this reason, it is important to investigate models that do not make a direct mapping of the genotype to the fitness, but that determine the fitness from some “trait” that is related in a nontrivial way to the genotype.
The most famous example of such models are based on RNA. The genotype is the RNA sequence, while the phenotype is the two-dimensional fold. When fitness is based on some desired fold, it is found that the fitness landscape contains a huge plateau of high fitness that spans the genotype space. The same feature is displayed by the fitness landscape that is based on the three-dimensional fold of proteins, with the genotype being the nucleotide sequence of the corresponding gene.
However, most traits of an organism result from the interaction of many genes….
@ Torbjörn Larsson, OM #50
Thank you greatly for the info,and I will make use of it as time allows. As it is I am terribly rushing from point to point that I don’t always know exactly where I’ll be next.
However,they say time waits for no man and I am no exception.Alas, the young and eager are returning to University and my lovely slumber d summer break is over.
What he’s thinking about, I’m almost sure, is running a program that calls a function like the C “srandom” function.
You, and your readers, should know that “srandom” isn’t really part of C. It’s part of the POSIX standard widely implemented on Unix-like systems such as Linux and much imitated elsewhere, but it’s quite possible to find a C compiler on something such as Windoze that doesn’t have the POSIX goodies. The only pseudorandom number function guaranteed to be part of your C library is “rand”.
He’s confusing “information” with “useful information”. The latter is not what information theory concerns itself with (at least in this context), and it’s highly subjective.
#55: I agree. That’s why I quoted (in #36):
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
[T. S. Eliot, The Rock, pageant play, 1934]
Data, information, knowledge, wisdom. Maybe a hierarchy. No two synonymous. Differing degree of engineering versus subjectivity. Different domains of human experience.
All this made worse if the human believes that all wisdom flows from God.
“Information” means what it means, according to strict mathematical definition, in Information Theory. Commenters already established that this is utterly different from “Information Science.” And in even more vague vernacular usage.
Intelligent Design leaders know that, and exploit it, as a matter of intentional fraud. They prey on the ignorance of the masses, and use linguistic ambiguity as a weapon. I am disgusted. I applaud Mark CC for fighting this horror.
It bothers me that ignorant Creationists think that “random” is a simple concept. They will never, ever, read and comprehend a paper such as:
Every Computably Enumerable Random Real Is Provably Computably Enumerable Random
Authors: Cristian S. Calude, Nicholas J. Hay
(Submitted on 15 Aug 2008 (v1), last revised 23 Aug 2008 (this version, v2))
Abstract: We prove that every computably enumerable (c.e.) random real is provable in Peano Arithmetic (PA) to be c.e. random, a statement communicated to one author by B. Solovay. A major step in the proof is to show that the theorem stating that “a real is c.e. and random iff it is the halting probability of a universal prefix-free Turing machine” can be proven in PA. Our proof, which is simpler than the standard one, can also be used for the original theorem.
The result that every c.e. random real is provably c.e. random is in contrast with the case of computable functions, where not every computable function is provably computable. It is even more interesting to compare this result with the fact that–in general–random finite strings are not provably random.
The paper also includes a sharper form of the Kraft-Chaitin Theorem, as well as an automatic proof of this theorem written with the interactive theorem prover Isabelle.
“What makes a high complexity string is the difficulty of reproducing it, not of producing it.”
So doesn’t that make something like the existence of DNA (which is complex and reproduces itself) an improbable occurrence as far as random evolution is concerned ? And doesn’t it strengthen the case of the creationists in this regard ?
It might, provided that: (A) evolution is truly “random”, and (B) DNA is truly high-information-content in the information theory sense.
In fact, evolution is absolutely *not* random. Individual
mutations are random, but evolution is a process of continually selecting the most successful competitors. This comes back to one of the constant bullshit problems with creationists: they insist on conflating evolution with all sorts of things that aren’t evolution. The theory of evolution doesn’t describe the origin of life, much less the origin of the solar system or the universe. All it does is describe how, once life exists, it develops and changes. And that’s not a random process. The fact that we have DNA isn’t random; there was some early self-replicator that was quite a lot simpler than (but in some way related to) DNA. But over time, DNA developed, and because it was so vastly successful, it wiped out all of its competition, and so we see it today.
And that leads to the second point. DNA sure seems complex to us. But it’s really not, in terms of information theory. It’s a simple encoding system. From the simplest point of view, it’s a system that encode a string of base-4 digits.
But for each of those digits, there’s a massive chemical backbone which is identical for each digit; the digits themselves are paired across the helix, so that every digit is essentially represented twice. It’s a vastly redundant system. In terms of information theory, it’s massively compressible. And that’s completely ignoring any of the more high-level structure – which makes it even more compressible.
There’s a good reason why it’s so sparse from an IT point of view: the chemical processes of life are astoundingly erratic. An awful lot depends on effectively random and sporadic things like brownian motion! The massive redundancy,
the complex and repetitive backbone, it’s all valuable for having a stable, well-preserved genetic code. That’s why DNA is the main surviving fundamental replicator and genetic coding mechanism in basically all living things on earth: because it’s such a strong system.
“Completely random strings are incredibly rare in the real world. Almost any finite string that you can imagine is at least somewhat compressible”
Actually, that’s completely (and provably) wrong. The fact that most random strings are not compressible is a fairly trivial result given in many textbooks.
Here’s a quick proof by reductio ad absurdum:
Take the set of all binary strings of length n. There are 2n of them. Assume that more than 2-k of them are compressible by k bits or more. Then we have that more than 2n-k different numbers can be described by no more than 2n-k compressed strings. But since each compressed string can only yield one uncompressed string after decompression, this is clearly impossible.
Thus we have that at least 1/2 of all random bitstrings are not compressible by even 1 bit, at least 3/4 are not compressible by 2 bits or more, at least 7/8 are not compressible by 3 bits or more, etc.
Howard, Mark is talking about concrete strings from the real world. Yes, the vast majority of strings (of some bounded length, say) is completely random in the sense of not being compressible; but it is very hard to actually point out one of any considerable length. Just think about it: to produce an incompressible string, a deterministic algorithm needs to be of at least the same size; to produce a set of incompressible strings deterministically, the algorithm needs to be larger than the total size of the produced strings. So practically everything produced by a computer without an actual RNG (as in having a GM counter built in) is actually trivially compressible, and I think that the amount of hand-crafted random strings can safely be ignored next to the amounts of data generated by computers.
I really like your blog, especially the good math part. It’s nice to read your informative texts.
I couldn’t help noticing that you are actively defending us all against creationism, like a computer scientist in shining armor. I do hope you guys win “over there”. I’m a computer scientist who sits on the European continent and we do not have any trouble with creationist and scientism like you Americans have it.
Don’t get me wrong, we do have religion, but they tend not to interfere with science.
I hope yo never get tired of defending mankind against the dark pits of religion.
Best regards, Martin Kaarup