(This is a revised version of an old post; you can see the original here.)
Please read the addendum at the bottom – I got carried away with computability, and ended up screwing up quite badly.
In the comments on my last post, a bit of discussion came up about some of the strange properties of real numbers, and that made me want to go back and get this old post about numbers that can’t even be described, and update it.
In those comments, we were talking about whether or not π contains all sorts of interesting subsequences. The fact is, when it comes to π, we just don’t know.
But… We do know that there are many numbers that do. There's a property called normality. Informally, a number is normal if, in its infinite expansion, every digit string of a given length occurs with the same limiting frequency – every string of length k shows up with frequency 1/10^k. In an infinitely long normal number, that means that any finite subsequence will occur, given enough time. My name, your name, the complete video of the original version of Star Wars, a photograph of the dinner I just ate and never took a photo of – it's all there, in any normal number.
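To make that concrete, here's a small Python sketch using Champernowne's constant (0.123456789101112…), one of the few explicit numbers actually *proven* normal in base 10. The function names are my own; the point is just that any digit string you pick really does show up if you scan far enough.

```python
from itertools import count, islice

def champernowne_digits():
    """Yield the decimal digits of Champernowne's constant,
    0.123456789101112..., which is provably normal in base 10."""
    for n in count(1):
        for ch in str(n):
            yield int(ch)

def first_occurrence(target: str, limit: int = 100_000) -> int:
    """0-based index where `target` first appears among the first
    `limit` digits, or -1 if it doesn't show up that early."""
    digits = "".join(str(d) for d in islice(champernowne_digits(), limit))
    return digits.find(target)
```

For example, `first_occurrence("2024")` succeeds well inside the first hundred thousand digits, since the number 2024 itself appears in the concatenation. Normality promises that *every* finite string eventually appears, though for most strings "eventually" is astronomically far out.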
If you think about that, at first, it might seem profound: there are numbers that contain absolutely everything that ever did exist, and that ever could exist. That seems like it should be amazing. If the numbers were at all special, it might be. But they aren't. Normality isn't a rare property that's only possessed by a special meaningful number like π. It's a property that's possessed by almost every number. If you could create a randomly selected list of a billion real numbers (you can't in reality, but you can mathematically, using a finite version of the axiom of choice), then with probability 1, every one of them would be normal – the non-normal numbers form a set of measure zero.
But here’s where it gets really strange. Those numbers, those normal numbers that contain everything? Most of them can’t even be described.
It gets stranger than that. We know that we can’t really write down an irrational number. We can write down successively more and more precise approximations, but any finite representation won’t work. So we can’t actually write down the overwhelming majority of real numbers. But in fact, not only can’t we write them down, we can’t describe the vast majority of numbers.
Numbers are something that we deal with every day, and we think we understand them. Over the ages, people have invented a variety of notations that allow us to write those numbers down: the familiar Arabic notation, Roman numerals, fractions, decimals, continued fractions, algebraic series, etc. I could easily spend months on this blog just writing about different notations that we use to write numbers, and the benefits and weaknesses of each notation.
But the fact is, when it comes to the vast, overwhelming majority of numbers, all of those notations are utterly useless! That statement seems bizarre at best. As strange as it seems, though, it's true. In order to understand it, we have to start at the very beginning: What does it mean for a number to be describable?
The basics are really easy to explain: A describable number is a number for which there is some finite representation. An indescribable number is a number for which there is no finite notation.
- Integers are all describable.
- Finite length decimals are all describable.
- Fractions are all describable.
- Infinite repeating decimals are describable. They're all rational, so they can be written as fractions. But even without using fractions, you can use a notation to express the repetition.
- Many irrational numbers – like π and e – are describable: we can't write down the sequence of digits, because they go on forever without repeating; we can't write them as fractions; but we can write an algorithm that will generate the digits.
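The repeating-decimal case above is completely mechanical: multiply by powers of 10 and subtract, so the infinite repeating tail cancels. A quick Python sketch (the function name and the split into non-repeating/repeating parts are my own framing):

```python
from fractions import Fraction

def repeating_to_fraction(non_repeating: str, repeating: str) -> Fraction:
    """Convert 0.<non_repeating><repeating><repeating>... to an exact fraction.

    Example: 0.1666... is repeating_to_fraction("1", "6").
    The classic trick: 10^(n+r) * x - 10^n * x has no infinite tail,
    so x is an integer difference over (10^r - 1) * 10^n.
    """
    combined = int((non_repeating + repeating) or "0")
    prefix = int(non_repeating or "0")
    n, r = len(non_repeating), len(repeating)
    return Fraction(combined - prefix, (10**r - 1) * 10**n)
```

So `repeating_to_fraction("", "142857")` gives `Fraction(1, 7)`, and `repeating_to_fraction("1", "6")` gives `Fraction(1, 6)` – the finite pair of strings is a complete description of the infinite decimal.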
All of those are describable. What isn't? I can't describe them to you – because, by definition, that's impossible. They're numbers whose digits aren't predictable – not by any possible definition of predictable: there's no algorithm, no process, no pattern – there is simply no way, in a finite amount of space, to describe how to write them down.
To show that most numbers are undescribable, I'll come at it from a computational perspective. When you say that a number is describable if there's a finite process for describing how to write it down, what you're really saying is that there's a program that will generate it: if you can write a finite program that will generate a representation of the number, it's describable. It doesn't matter whether that program ever finishes or not – so if it takes it an infinite amount of time to compute the number, that's fine – so long as the program is finite. According to this definition, π is describable: its decimal representation is infinite, but the program to generate that representation is finite.
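Here's a concrete instance of that finite program: Gibbons' unbounded spigot algorithm, a short Python generator that streams the decimal digits of π one at a time, forever, from a few lines of code.

```python
def pi_digits():
    """Stream the decimal digits of pi forever (Gibbons' unbounded
    spigot algorithm): a finite program producing an infinite
    representation, one digit at a time."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is pinned down; emit it and rescale.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet; consume another term of the series.
            q, r, t, n, k, l = (q * k, (2 * q + r) * l, t * l,
                                (q * (7 * k + 2) + r * l) // (t * l),
                                k + 1, l + 2)
```

The generator never terminates – the decimal expansion is infinite – but the program itself is a perfectly finite description of π.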
An indescribable number is, therefore, a number for which there is no possible program which will ever generate the number in any form.
How can we show that the vast majority of numbers are not describable? It’s easy. In a branch of theoretical computer science called recursive function theory, we talk about program numberings. Basically, for any computing device, we can assign numbers to its programs. Every program is just a number.
This isn't just an abstraction – it's real. Take a program – any program, written in any language. It's a series of characters that can be written into a file. Each character is represented by a group of 8 bits. Take that set of numbers, and line their bits up, one after another. The result is a huge positive binary number. Taken this way, every sequence of characters is represented by exactly one unique natural number – and every number represents exactly one sequence of characters. Thus, every program that can be represented in a file – which is every possible computer program, valid or invalid – is represented by exactly one natural number.
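That bit-concatenation is easy to demonstrate. A minimal Python sketch – it round-trips any source text that doesn't begin with a NUL byte, which no real program does:

```python
def program_to_number(source: str) -> int:
    """Treat a program's bytes, lined up end to end, as one big binary number."""
    return int.from_bytes(source.encode("utf-8"), "big")

def number_to_program(n: int) -> str:
    """Invert the mapping: recover the source text from its number."""
    length = (n.bit_length() + 7) // 8  # round up to whole bytes
    return n.to_bytes(length, "big").decode("utf-8")
```

For example, `program_to_number("print('hello')")` is a 111-bit natural number, and `number_to_program` turns it back into the exact source text.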
Using this system, the set of all possible computer programs is a subset of the set of natural numbers. That means that the set of all possible computer programs is, at most, countably infinite.
The set of all real numbers is not countably infinite. It's much, much larger than that. That means that most real numbers cannot be described by any computable algorithm. They're indescribable. There's no process you can use to uniquely specify them: if there were, there'd be a computer program describing that process – but there can't be any program for most numbers.
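The "much, much larger" claim is Cantor's diagonal argument: given any countable list of digit-generating procedures, you can build a number that differs from every one of them. Here's an illustrative Python sketch – the digit-flipping rule (always answering 5 or 6) is one common way to dodge the 0.999… = 1.000… double-representation problem, not the only choice.

```python
def diagonal_digit(digit_funcs, k):
    """Digit k of a number guaranteed to differ from the k-th listed
    number at decimal place k. Answering only 5 or 6 avoids the
    0.999... = 1.000... ambiguity."""
    return 5 if digit_funcs[k](k) != 5 else 6

# A toy "enumeration" of three reals in [0, 1), each given as a
# function from digit position to digit value:
listed = [
    lambda k: 0,             # 0.000...
    lambda k: 3,             # 0.333...
    lambda k: (k + 1) % 10,  # 0.1234567890...
]
diag = [diagonal_digit(listed, k) for k in range(len(listed))]
```

However long you make the list – even countably infinite – the diagonal number disagrees with entry k at position k, so it can't be anywhere on the list.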
If we could write infinitely long computer programs, then the indescribable numbers would be describable – but their descriptions would, obviously, be infinitely long. Most numbers cannot be described in a finite amount of space. We can't compute with them, we can't describe them, we can't identify them. We know that they're there; we can prove that they're there. All sorts of things that we count on as properties of real numbers wouldn't work if the indescribable numbers weren't there. But they're totally inaccessible.
ADDENDUM: I screwed up.
I’m a computer science guy, and I love the theory of computability. I let that take over, without thinking enough about it, and managed to make a huge mistake here!
What I described above as describable numbers isn’t right. It’s the set of computable numbers. The set of computable numbers is a strict subset of the set of describable numbers. As several commenters pointed out, there are many numbers that are describable but not computable.
The describable numbers are all numbers for which there is any possible finite description that uniquely identifies the number. The countability argument still works: you can still enumerate all possible finite-length strings that could be descriptions, and define a one-to-one correspondence between strings that could be descriptions and natural numbers. The set of describable numbers is, thus, still countable, and the set of undescribables is not, which implies that the set of undescribables is far, far larger than the describables.
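That enumeration of finite strings can be written out explicitly. A sketch using bijective base-k numbering over a fixed alphabet – the particular alphabet is arbitrary; any finite character set works the same way:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # any finite alphabet works
B = len(ALPHABET)

def string_to_number(s: str) -> int:
    """Bijective base-B: '' -> 0, 'a' -> 1, 'b' -> 2, ..., 'aa' -> B+1, ...
    Every finite string gets exactly one natural number."""
    n = 0
    for ch in s:
        n = n * B + ALPHABET.index(ch) + 1
    return n

def number_to_string(n: int) -> str:
    """The inverse: every natural number names exactly one finite string."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, B)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))
```

Since this is a genuine one-to-one correspondence between finite strings and natural numbers, the candidate descriptions – and hence the describable numbers – can be no more numerous than the naturals.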
But my original explanation was stupidly wrong. I apologize for the error. I’m not editing the text of the original post, except to put a pointer to this addendum: in a case like this, I think editing it would be dishonest, a way of covering up my serious error.