If you regularly follow comments on this blog, you’ll know that I’ve been
having a back-and-forth with a guy who doesn’t know much about information
theory, and who’s been using his ignorance to try to assemble arguments against the
Kolmogorov-Chaitin information-theoretic measure of information in a string.
In the course of doing that, I came up with what I think are some interesting ways
of explaining bits of it, so I thought I’d promote it to the top-level to share
with more readers.
To be absolutely clear up front: I’m far from an expert on K-C theory. It’s something that I find incredibly fascinating, and I’ve read a lot of Chaitin’s work. I’ve been fortunate enough to meet Greg Chaitin a bunch of times when we both worked at
IBM, and I’ve attended a bunch of his lectures. But this isn’t my main area of expertise,
not by a long-shot.
If you don’t know any K-C information theory, then you can look at my
introduction here. The rest is beneath the fold.
Information vs. Meaning
Kolmogorov-Chaitin information theory is focused on
quantification – that is, on being able to measure the quantity of
information in a string of characters. It doesn’t care what the string
means. Information theory cares about how much information
is in a string; it doesn’t care what the string means. In fact, you can say
something much stronger about information theory: if you consider meaning
at all, then you’re not doing information theory. A truly
abstract string could mean many different things: information theory’s
measurement of its information content must include everything that
could be used relative to any possible system of meaning to determine
the information content of a string.
I’ll borrow a tiny bit of formality from denotational semantics. In
denotational semantics, we take a domain of objects and functions that range
over those objects, and we map statements – sentences that have some
syntactic structure – onto the objects and functions in the domain.
To assign meaning, what you’re basically doing is creating a system where
you can map from syntax – that is, from strings of symbols – to
objects and functions. The string of characters “Mark Chu-Carroll” has
absolutely no intrinsic meaning. But by interpreting in a domain
of meaning, we can say that the domain maps that string of symbols
in a way that makes it a representation of a reference to me. But there’s
nothing about that string that makes it necessarily be a reference
to me. It could mean many different things, depending on domain of meaning
which we use to interpret it.
To be a bit more specific, consider the string
“0000000000000000000000000000000000000000”. That’s a string of 40 “0”s. If you
interpret that as a number – that is, you understand the string of digits as
having a meaning as a number, it’s the number 0. But a string of 40
zeros is not equivalent to the number 0 in terms of information
theory. That string could also mean the number 40 understood as a unary
number. Or it could mean a sequence of 5 bytes containing nothing but zeros in
a computer memory. Or it could mean sequence of 41 bytes, the first forty
containing the ASCII code for the character “0”, and one containing a
terminator-mark. The string contains enough information to allow you to
interpret it as any of those things. But if you choose any of those
as the meaning of it, and work out a representation of that meaning,
you’ll wind up with a different answer to the quantity of information
contained in that string than you would for the string considered as an
abstract. Choosing a framework in which to assign it meaning can make a string
appear to have either more or less information than the
You can make it appear to have more information by choosing a system of
meaning in which the system itself contains extra information that it assigns
to the string; for example, when I say “the complete table of printable
unicode code-points in order”, if I interpret that string in the right
framework of meaning, it means a sequence of over 1100 characters.
But those 1100 characters aren’t containing in the string; they’re part of the
framework of meaning that’s used to interpret the string.
You can also choose a framework in which to assign meaning which makes a
string appear to have less information content than the
string-as-abstract. For example, the string of 0s above. If we interpret a
string of 40 “0”s as meaning zero, then in terms of meaning, that
string has one character of information.
Intuitively, it’s hard for us to separation information from meaning. Informally,
we consider them to be the same thing. But in information theory, they’re very
different. As I said above, meaning is what happens when you interpret
a piece of information in some context. But absent any context, how can
you quantify its information content?
To be strict, you can’t. Without any context of any kind, you’ve got no
way to quantify anything. But you can create a sort of minimal context, which
gives you a really simple way to quantify the amount of information in the
The basic, fundamental idea is description. The amount of
information contained in a string is the length of the shortest
description that uniquely identifies that string.
And that’s the essence of information theory: description. The
amount of information in something is the shortest description that
can uniquely reproduce that thing. It doesn’t matter what the thing means. You
don’t need to understand it. You don’t need to interpret it. All you need is a
description that allows you to reproduce it. To produce the string
10101101, you don’t need to know that it means the number ten million, one hundred and
one thousand, one hundred and one interpreted as a number written
in decimal notation. You don’t need to know that it means the number
173, interpreted as a binary number. All you need to know is that
it’s a sequence of symbols.
That definition might seem like a trick – it just replaces “quantity of
information” with “length of the shortest description”. How can we
describe something in a way that doesn’t involve meaning?
The answer is: by computation. We can define a simple,
minimal universal computing device. A description is a program that
generates a string. The shortest description of a string is the shortest
input-free program that can generate that string.
Of course, there’s another copout there. What’s minimal? And
won’t the information content of a string be different using different computing
Minimality is a tricky topic to tackle formally. You could easily write
a PhD dissertation (and someone probably already has) on a precise
definition of a minimal computing device. But informally, the idea that
we’re getting at is very simple: no magic instructions. That is,
each instruction performs a single action – so you can’t output
more than one symbol with a single instruction.
For the second question: yes, that’s true – to a limited extent. But
given any pair of computing machines, the difference between the
length of the shortest programs to generate a specific string will vary
by no more than a single fixed constant.
That last point is easy (and fun) to prove. First, we need to
formalize the statement. Take two computing
machines, P and Q. Let the information content of a string S relative
to P be written as I(S, P), and relative to Q be I(S,Q).
For all strings S, the absolute value of I(S,P) – I(S,Q) will
be less than or equal to a constant C.
Here’s the proof:
- if P and Q are both universal computing devices, then
there is a program for P which allows it to emulate Q;
and there is a program for Q which allows it to emulate P.
- Let the program for P to emulate Q have size Pq,
and let the program for Q to emulate P be Qp.
- Consider a string S. Let the program for P to generate S
be PS, and the program for Q to generate S be
QS. If PS is shorter than QS,
then the shortest program for Q to generate S cannot possibly
be longer than QS + QP.
- Reverse the above statement for P and Q.
- Now, take the longer of PQ and QP. For
any string S, it should be obvious that information content of
S relative to P or Q cannot possibly differ by more that
No magic there. It’s very straightforward.