Category Archives: Numbers

Does well-ordering contradict Cantor?

The other day, I received an email that actually excited me! It’s a question related to Cantor’s diagonalization, but there’s absolutely nothing cranky about it! It’s something interesting and subtle. So without further ado:

Cantor’s diagonalization says that you can’t put the reals into 1 to 1 correspondence with the integers. The well-ordering theorem seems to suggest that you can pick a least number from every set including the reals, so why can’t you just keep picking least elements to put them into 1 to 1 correspondence with the reals. I understand why Cantor says you can’t. I just don’t see what is wrong with the other arguments (other than it must be wrong somehow). Apologies for not being able to state the argument in formal maths, I’m around 20 years out of practice for formal maths.

As we’ve seen in too many discussions here, Cantor’s diagonalization is a proof that shows that it is impossible to create a one-to-one correspondence between the natural numbers and the real numbers.

The well-ordering theorem says something that seems innocuous at first, but which, looked at in depth, really does appear to contradict Cantor’s diagonalization.

A set S is well-ordered if there exists a total ordering <= on the set, with the additional property that every non-empty subset T \subseteq S has a smallest element under that ordering.
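
In symbols, just restating that definition, with the ordering written \le:

\forall T \subseteq S: T \neq \emptyset \implies \exists m \in T \text{ such that } \forall t \in T: m \le t

The crucial part is that every non-empty subset must have a least element – not just S itself.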

The well-ordering theorem says that every non-empty set can be well-ordered. Since the set of real numbers is a set, that means that there exists a well-ordering relation over the real numbers.

The problem with that is that it appears to give you a way of producing an enumeration of the reals! It says that the set of all real numbers has a least element: bingo, there’s the first element of the enumeration! Now take the set of real numbers excluding that one; it also has a least element under the well-ordering relation: there’s the second element. And so on. Under the well-ordering theorem, then, every non-empty subset of the reals has a least element, and every element has a unique successor! Isn’t that defining an enumeration of the reals?

The solution to this isn’t particularly satisfying on an intuitive level.

The well-ordering theorem is, mathematically, equivalent to the axiom of choice. And like the axiom of choice, it produces some very ugly results. It can be used to create “existence” proofs of things that, in a practical sense, don’t exist in a usable form. It proves that something exists, but it doesn’t prove that you can ever produce it or even identify it if it’s handed to you.

So, under the well-ordering theorem, there is an ordering of the real numbers in which every non-empty subset has a least element. But the less-than relation used to define that well-ordering is not the standard real-number less-than operation. (It obviously can’t be: under a well-ordering, every non-empty subset has a least element, and the set of all reals has no least element under standard real-number less-than.) In fact, for any ordering relation \le_x that you can define, describe, or compute, \le_x is not a well-ordering relation for the reals.

Under the well-ordering theorem, the real numbers have a well-ordering relation, only you can’t ever know what it is. You can’t define any element of it; even if someone handed it to you, you couldn’t tell that you had it.

It’s very much like the Banach-Tarski paradox: we can say that there’s a way of doing it, only we can’t actually do it in practice. In the B-T paradox, we can say that there is a way of cutting a sphere into these strange pieces – but we can’t describe anything about the cut, other than saying that it exists. The well-ordering of the reals is the same kind of construct.

How does this get around Cantor? It weasels its way out of Cantor by the fact that while the well-ordering exists, it doesn’t exist in a form that can be used to produce an enumeration. You can’t get any kind of handle on the well-ordering relation. You can’t produce an enumeration from something that you can’t create or identify – just like you can’t ever produce any of the pieces of the Banach-Tarski cut of a sphere. It exists, but you can’t use it to actually produce an enumeration. So the set of real numbers remains non-enumerable even though it’s well-ordered.

If that feels like a cheat, well… That’s why a lot of people don’t like the axiom of choice. It produces cheatish existence proofs. Connecting back to something I’ve been trying to write about, that’s a big part of the reason why intuitionistic type theory exists: it’s a way of constructing math without stuff like this. In an intuitionistic type theory (like the Martin-Löf theory that I’ve been writing about), it doesn’t exist if you can’t construct it.

Significant Figures and the Age of the Universe

(Note: This post originally contained a remarkably stupid error in an example. For some idiotic reason, I calculated as if a liter was a cubic meter. Which, duh, it isn’t. So I was off by a factor of 1000. Pathetic, I know. Thanks to the multiple readers who pointed it out!)

The other day, I got a question via email that involves significant figures. Sigfigs are really important in things that apply math to real-world measurements. But they’re poorly understood at best by most people. I’ve written about them before, but not in a while, and this question does have a somewhat different spin on it.

Here’s the email that I got:

Do you have strong credentials in math and/or science? I am looking for someone to give an expert opinion on what seems like a simple question that requires only a short answer.

Could the matter of significant figures be relevant to an estimate changing from 20 to less than 15? What if it were 20 billion and 13.7 billion?

If the context matters, in the 80s the age of the universe was given as probably 20 billion years, maybe more. After a number of changes it is now considered to be 13.7 billion years. I believe the change was due to distinct new discoveries, but I’ve been told it was simply a matter of increasing accuracy and I need to learn about significant figures. From what I know (or think I know?) of significant figures, they don’t really come into play in this case.

The subject of significant digits is near and dear to my heart. My father was a physicist who worked as an electrical engineer producing power circuitry for military and satellite applications. I’ve talked about him before: most of the math and science that I learned before college, I learned from him. One of his pet peeves was people screwing around with numbers in ways that made no sense. One of the most common involved significant digits. He used to get really angry at people who did things with calculators, and just read off all of the digits.

He used to get really upset when people did things like, say, measure a plate with a 6 inch diameter, and say that it had an area of 28.27433375 square inches. That’s ridiculous! If you measured a plate’s diameter to within 1/16th of an inch, you can’t use that measurement to compute its area down to less than one billionth of a square inch!

Before we really look at how to answer the question that set this off, let’s start with a quick review of what significant figures are and why they matter.

When we’re doing science, a lot of what we’re doing involves working with measurements. Whether it’s cosmologists trying to measure the age of the universe, chemists trying to measure the energy produced by a reaction, or engineers trying to measure the strength of a metal rod, science involves measurements.

Measurements are limited by the accuracy of the way we take the measurement. In the real world, there’s no such thing as a perfect measurement: all measurements are approximations. Whatever method we choose for taking a measurement of something, the measurement is accurate only to within some margin.

If I measure a plate with a ruler, I’m limited by factors like how well I can align the ruler with the edge of the plate, by what units are marked on the ruler, and by how precisely the units are marked on the ruler.

Once I’ve taken a measurement and I want to use it for a calculation, the accuracy of anything I calculate is limited by the accuracy of the measurements: the accuracy of our measurements necessarily limits the accuracy of anything we can compute from those measurements.

For a trivial example: if I want to know the total mass of the water in a tank, I can start by saying that the mass of a liter of water is one kilogram. To figure out the mass of the total volume of water in the tank, I need to know its volume. Assuming that the tank edges are all perfect right angles, and that it’s uniform depth, I can measure the depth of the water, and the length and breadth of the tank, and use those to compute the volume.

Let’s say that the tank is 512 centimeters long, and 203 centimeters wide. I measure the depth – but that’s difficult, because the water moves. I come up with it being roughly 1 meter deep – so 100 centimeters.

The volume of the tank can be computed from those figures: 5.12 meters times 2.03 meters times 1.00 meters is 10.3936 cubic meters, or 10,393.6 liters.

Can I really conclude that the volume of the tank is 10,393.6 liters? No. Because my measurement of the depth wasn’t accurate enough. It could easily have been anything from, say, 95 centimeters to 105 centimeters, so the actual volume could range between around 9900 liters and 11000 liters. From the accuracy of my measurements, claiming that I know the volume down to a tenth of a liter is ridiculous, when my measurement of the depth was only accurate within a range of +/- 5 centimeters!

Ideally, I might want to know a strong estimate on the bounds of the accuracy of a computation based on measurements. I can compute that if I know the measurement error bounds on each measurement, and I can track them through the computation and come up with a good estimate of the bounds – that’s basically what I did up above, to conclude that the volume of the tank was between 9,900 and 11,000 liters. The problem with that is that we often don’t really know the precise error bounds – so even our estimate of error is an imprecise figure! And even if we did know precise error bounds, the computation becomes much more difficult when you want to track error bounds through it. (And that’s not even considering the fact that our error bounds are only another measured estimate with its own error bounds!)
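
That bound-tracking computation can be sketched in a few lines of Python (my own illustration, using the tank dimensions from the example):

```python
# Propagate measurement error bounds through the tank-volume computation.
# Length and width are treated as exact here; the depth measurement is
# only known to +/- 5 centimeters. Dimensions in meters, volume in liters.
length_m = 5.12
width_m = 2.03
depth_bounds_m = (0.95, 1.05)

low_liters = length_m * width_m * depth_bounds_m[0] * 1000
high_liters = length_m * width_m * depth_bounds_m[1] * 1000

print(round(low_liters), round(high_liters))  # 9874 10913
```

That’s where the “between around 9,900 and 11,000 liters” range comes from.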

Significant figures are a simple statistical tool that we can use to determine a reasonable way of estimating how much accuracy we have in our measurements, and how much accuracy we can have at the end of a computation. It’s not perfect, but most of the time, it’s good enough, and it’s really easy.

The basic concept of significant figures is simple. You count how many digits of accuracy each measurement has. The result of the computation over the measurements is accurate to the smallest number of digits of any of the measurements used in the computation.

In the water tank example, we had three significant figures of accuracy on the length and width of the tank. But we only had one significant figure on the accuracy of the depth. So we can only have one significant figure in the accuracy of the volume. So we conclude that we can say it was around 10,000 liters, and we can’t really say anything more precise than that. The exact value likely falls somewhere within a bell curve centered around 10,000 liters.
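
Significant-figure rounding is easy to sketch in Python (the helper `round_sigfigs` is my own name, not a standard library function):

```python
import math

def round_sigfigs(x, sigfigs):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    factor = 10.0 ** (exponent - sigfigs + 1)
    return round(x / factor) * factor

# The tank volume, limited to the single sigfig of the depth measurement:
print(round_sigfigs(10393.6, 1))  # 10000.0
```

The same helper reduces 16.4 to 20 at one sigfig, and 13.94 to 14 at two.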

Returning to the original question: can significant figures change an estimate of the age of the universe from 20 to 13.7?

Intuitively, it might seem like it shouldn’t: sigfigs are really an extension of the idea of rounding, and 13.7 rounded to one sigfig should round down to 10, not up to 20.

I can’t say anything about the specifics of the computations that produced the estimates of 20 and 13.7 billion years. I don’t know the specific measurements or computations that were involved in that estimate.

What I can do is just work through a simple exercise in computations with significant figures to see whether it’s possible that changing the number of significant digits in a measurement could produce a change from 20 to 13.7.

So, we’re looking at two different computations that are estimating the same quantity. The first, 20, has just one significant figure. The second, 13.7, has three significant figures. What that means is that for the original computation, one of the quantities was known only to one significant figure. We can’t say whether all of the elements of the computation were limited to one sigfig, but we know at least one of them was.

So if the change from 20 to 13.7 was caused by significant digits, it means that by increasing the precision of just one element of the computation, we could produce a large change in the computed value. Let’s make it simpler, and see if we can see what’s going on by just adding one significant digit to one measurement.

Again, to keep things simple, let’s imagine that we’re doing a really simple calculation. We’ll use just two measurements x and y, and the value that we want to compute is just their product, x \times y.

Initially, we’ll say that we measured the value of x to be 8.2 – that’s a measurement with two significant figures. We measure y to be 2 – just one significant figure. The product x\times y = 8.2 \times 2 = 16.4. Then we need to reduce that product to just one significant figure, which gives us 20.

After a few years pass, and our ability to measure y gets much better: now we can measure it to two significant figures, with a new value of 1.7. Our new measurement is completely compatible with the old one – 1.7 reduced to 1 significant figure is 2.

Now we’ve got equal precision on both of the measurements – they’re now both 2 significant figures. So we can compute a new, better estimate by multiplying them together, and reducing the solution to 2 significant figures.

We multiply 8.2 by 1.7, giving us around 13.94. Reduced to 2 significant figures, that’s 14.

Adding one significant digit to just one of our measurements changed our estimate of the figure from 20 to 14.

Returning to the intuition: It seems like 14 vs 20 is a very big difference: it’s a 30 percent change from 20 to 14! Our intuition is that it’s too big a difference to be explained just by a tiny one-digit change in the precision of our measurements!

There are two phenomena going on here that make it look so strange.

The first is that significant figures are an absolute error measurement. If I’m measuring something in inches, the difference between 15 and 20 inches is the same size error as the difference between 90 and 95 inches. If a measurement error changed a value from 90 to 84, we wouldn’t give it a second thought; but because it reduced 20 to 14, that seems worse, even though the absolute magnitude of the difference considered in the units that we’re measuring is exactly the same.

The second (and far more important one) is that a measurement of just one significant digit is a very imprecise measurement, and so any estimate that you produce from it is a very imprecise estimate. It seems like a big difference, and it is – but that’s to be expected when you try to compute a value from a very rough measurement. Off by one digit in the least significant position is usually not a big deal. But if there’s only one significant digit, then you’ve got very little precision: it’s saying that you can barely measure it. So of course adding precision is going to have a significant impact: you’re adding a lot of extra information in your increase in precision!

Lychrel Numbers and the 196 Algorithm

The first donor to the Chesapeake Math program asked me to write about the 196 algorithm, a problem also known as the mystery of the Lychrel numbers. To be completely honest, that’s something that people have asked for in the past, but I’ve never written about it, because I didn’t see what I could add. But I made a promise, so… here goes!

Take any number with at least two digits, N. Reverse the digits of N, giving you M, and then add N+M. That gives you a new number. If you keep repeating this process, most of the time, you’ll get a palindromic number really fast.

For example, let’s start with 16:

  1. 16 reversed is 61.
  2. 16+61=77. 77 is a palindromic number.
  3. So one reverse+add, and we have a palindromic number.

Or 317:

  1. 317 reversed is 713.
  2. 317+713=1030.
  3. 1030 reversed is 301.
  4. 1030 + 301 = 1331, so we have a palindromic number after two steps.

You can play the same game in different number bases. For example, in base 8, we can start with 13 (11 in base-10): in one reverse+add, it becomes 44 in base 8 (36 in base-10).

For most numbers, you get a palindrome in just a few steps. But for some numbers, in some number bases, you don’t. If you can’t ever get to a palindrome by doing reverse+add starting from a number, then that number is called a Lychrel number.
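
The reverse+add step is easy to sketch in Python for an arbitrary base (the helper names here are my own):

```python
def to_digits(n, base):
    """Digits of n in the given base, least significant first."""
    digits = []
    while n:
        digits.append(n % base)
        n //= base
    return digits

def reverse_add(n, base=10):
    # Reading the least-significant-first digit list as most-significant-first
    # yields the digit-reversed number.
    reversed_n = 0
    for d in to_digits(n, base):
        reversed_n = reversed_n * base + d
    return n + reversed_n

def is_palindrome(n, base=10):
    digits = to_digits(n, base)
    return digits == digits[::-1]

print(reverse_add(16))            # 77
print(reverse_add(11, base=8))    # 36, which is written 44 in octal
print(is_palindrome(36, base=8))  # True
```

This reproduces the examples above: 16 becomes a palindrome in one step, and 317 takes two (317 → 1030 → 1331).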

The process of exploring Lychrel numbers has picked up a lot of devotees, who’ve developed a whole language for talking about it:

chain
A chain is the sequence of numbers produced by reverse+add starting with some number, and possibly converging on a palindromic number.
seed
The seed of a chain is the smallest number that can start that chain. For example, in the example above, we looked at the chain [317, 1030, 1331]. 317 is the seed – even though you could start a chain at 1030, 1030 would never be considered a seed, because it can be produced by a reverse+add on a smaller number.
kin
Two numbers are kin numbers if they’re part of a chain that started at the same seed. If a number is a Lychrel number, then (obviously) so are all of its kin.

The question that haunts Lychrel enthusiasts is, will you always eventually get a palindrome? That is, do Lychrel numbers actually exist?

In base-2, we know the answer to that: not all numbers will produce a palindrome; there are base-2 Lychrel numbers. The smallest base-2 Lychrel number is 22 – or 10110 in binary. We can look at its reverse+add sequence, and see intuitively why it will never produce a palindrome:

  1. 10110
  2. 100011
  3. 1010100
  4. 1101001
  5. 10110100 (10, 11, 01, 00)
  6. 11100001
  7. 101101000
  8. 110010101
  9. 1011101000 (10, 111, 01, 000)
  10. 1101000101
  11. 10111010000
  12. 11000101101
  13. 101111010000 (10, 1111, 01, 0000)

Starting at step 5, we start seeing a pattern in the sequence: recurring values of the form 10, followed by m 1s, followed by 01, followed by m 0s, with m growing as the sequence goes on. We’ve got a sequence that’s building larger and larger numbers, in a way that will never converge into a palindrome.
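
A quick Python check (my own sketch) reproduces the thread and confirms the pattern at steps 5, 9, and 13:

```python
def reverse_add_base2(n):
    """One reverse+add step in base 2."""
    reversed_n = int(bin(n)[2:][::-1], 2)
    return n + reversed_n

n = 22  # 10110 in binary
thread = [bin(n)[2:]]
for _ in range(12):
    n = reverse_add_base2(n)
    thread.append(bin(n)[2:])

print(thread[4], thread[8], thread[12])
# 10110100 1011101000 101111010000
```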

We can find similar sequences in any power-of-two base – base 4, 8, 16, etc. So in power-of-two bases, there are Lychrel numbers. But: are there Lychrel numbers in our familiar base-10?

We think so, but we’re not sure. No one has been able to prove it either way. But we’ve got some numbers, which we call Lychrel candidates, that we think are probably Lychrel numbers. The smallest one is 196 – which is why this whole discussion is sometimes called the 196 problem, or the 196 algorithm.

People have written programs that follow the Lychrel thread from 196, trying to see if it reaches a palindrome. So far, the record for exploring the 196 Lychrel thread carries it through more than a billion iterations, producing a non-palindromic number with more than 6 million digits.

That’s pretty impressive, given that the longest Lychrel thread for any number smaller than 196 is a thread of 24 steps, starting with 89 (which produces the palindromic number 8,813,200,023,188).

From my perspective, one thing that interests me about this is its nature as a computational problem. As a problem, it’s really easy to implement. For example, here’s a complete implementation in Racket, a Scheme-based programming language. (I used Racket because it features infinite-precision integers, which makes it easier to write.)

(define (reverse-number n)
  (string->number (list->string (reverse (string->list (number->string n))))))

(define (palindromic? n)
  (equal? n (reverse-number n)))

(define (reverse-add n)
  (+ n (reverse-number n)))

(define (find-palindrome seed)
  (define (noisy-find-palindrome n count)
    (if (palindromic? n)
        n
        (begin
          (printf "At iteration ~v, candidate=~v~n" count n)
          (noisy-find-palindrome (reverse-add n) (+ count 1)))))
  (noisy-find-palindrome seed 0))

I literally threw that together in less than five minutes. In that sense, this is a really, really easy problem to solve. But in another sense, it’s a very hard problem: there’s no way to really speed it up.

In modern computing, when we look at a problem that takes a lot of computation to solve, the way that we generally try to approach it is to throw more CPUs at it, and do it in parallel. For most problems that we come across, we can find some reasonable way to divide it into parts that can be run at the same time; then by having a bunch of computers work on those different parts, we can get a solution pretty quickly.

For example, back in the early days of this blog, I did some writing about the Mandelbrot set, and one variant of it that I find particularly interesting, called the Buddhabrot. The Buddhabrot is interesting because it’s a fractal visualization which isn’t naively zoomable. In a typical Mandelbrot set visualization, you can pick a small region of it that you want to see in more detail, and focus your computation on just that part, to get a zoomed-in view on that. In the Buddhabrot, due to the chaotic nature of the computation, you can’t. So you just compute the Buddhabrot at a massive size, and then you compress it. When you want to see a region in more detail, you un-compress. To make that work, Buddhabrots are frequently computed at resolutions like 1 million by 1 million pixels. That translates to enough complex floating point computations to compute several trillion values. That’s a big task. But in modern environments, that’s routine enough that a friend of mine at Google wrote a program, just for kicks, to compute a big Buddhabrot image, and ran it on an experimental cluster.

If that kind of computational capability can be exploited just for kicks, why is it that the best effort at exploring the Lychrel thread for 196 only covers 6 million digits?

The answer is that there’s a way of computing the Buddhabrot in parallel. You can throw 10,000 CPUs at it for a couple of days, and get an amazing Buddhabrot image. But for the Lychrel thread, there’s no good way to do it in parallel.

For each additional number in the thread, you need to rearrange and add a couple of million digits. That’s a lot of work, but it’s not crazy. On any halfway decent computer, it won’t take long. To get a sense, I just whipped up a Python program that generated 1,000,000 random pairs of digits, and added them up. It took under 8 seconds – and that’s half-assed code written using a not-particularly fast interpreted language on a not-particularly-speedy laptop. A single iteration of the Lychrel thread computation even for a million-digit candidate doesn’t take that long – it’s on the order of seconds.
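
Here’s a hedged reconstruction of that quick experiment (not the original code, just the same idea):

```python
import random
import time

# Generate a million random pairs of digits, then time adding them all up.
pairs = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(1_000_000)]

start = time.perf_counter()
sums = [a + b for a, b in pairs]
elapsed = time.perf_counter() - start

print(len(sums), "pairs added in", round(elapsed, 3), "seconds")
```

On any recent machine this finishes in well under a second; the point is just that a single million-digit addition is cheap.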

The catch is that the process of searching a Lychrel thread is intrinsically serial: you can’t have different CPUs computing the next thread element for different values: you don’t know the next value until you’ve finished the previous one. So even if it took just 1 second to do the reverse+add for million digit numbers, it would take a long time to actually explore the space. If you want to explore the next million candidates, at 2 seconds per iteration, that will take you around 3 weeks!

Even if you don’t waste time by using a relatively slow interpreter like Python – even if you use carefully hand-optimized code using an efficient algorithm, it would take months at the very least to explore a space of billions of elements of the 196 thread! And to get beyond what anyone has done, you’ll probably need to end up doing years of computation, even with a very fast modern computer – because you can’t parallelize it.

If you’re interested in this kind of thing, I recommend looking at Wade Van Landingham’s p196 site. Wade is the guy who named them Lychrel numbers (based on a rough reversal of his girlfriend (now wife)’s name, Cheryl). He’s got all sorts of data, including tracking some of the longest running efforts to continue the 196 thread.

A couple of edits were made to this, based on errors pointed out by commenters.

The ABC conjecture – aka the soap opera of the math world.

Sorry for the silence of this blog for the last few months. This spring, my mother died, and I was very depressed about it. Depression is a difficult thing, and it left me without the energy or drive to do the difficult work of writing this kind of material. I’m trying to get back into the cycle of writing. I’m trying to make some progress in writing about type theory, but I’m starting with a couple of easier posts.

In the time when I was silent, I had a couple of people write to me to ask me to explain something called the ABC conjecture.

The ABC conjecture is a mathematical question about number theory that was proposed in the 1980s – so it’s relatively new as number theory problems go. It’s gotten a lot of attention recently, due to an almost soap-operatic series of events.

It’s a very hard problem, and no one had made any significant progress on it until about five years ago, when a well-respected Japanese mathematician named Shinichi Mochizuki published a series of papers containing a proof of the conjecture.

Normally, when a proof of a hard problem gets published, mathematicians go nuts! Everyone starts poring over it, trying to figure it out, and see if it’s valid. That’s what happened the previous time someone thought they’d proved it. But this time, no one has been able to make sense out of the proof!

The problem is that in order to build his proof, Professor Mochizuki created a whole new mathematical theory, called inter-universal Teichmüller theory. The entire ABC conjecture proof is built in this new theory, and no one other than Professor Mochizuki himself understands inter-universal Teichmüller theory. Before anyone else can actually follow the proof, they need to understand the theory. Professor Mochizuki is a bit of a recluse – he has declined to travel anywhere to teach his new mathematical system. So in the five years since he first published it, no one has been able to understand it well enough to determine whether or not the proof is correct. One error in it was found, but corrected, and the whole proof remains in question.

Exactly why the proof remains unchecked after five years is a point of contention. Lots of mathematicians are angry at Professor Mochizuki for not being willing to explain or teach his theory. A common statement among critics is that if you create a new mathematical theory, you need to be willing to actually explain it to people: work with a group of mathematicians to teach it to them, so that they’ll be able to use it to verify the proof. But Professor Mochizuki’s response has been that he has explained it: he’s published a series of papers describing the theory. He doesn’t want to travel and take time away from his work for people who haven’t been willing to take the time to read what he’s written. He’s angry that after five years, no one has bothered to actually figure out his proof.

I’m obviously not going to attempt to weigh in on whether or not Professor Mochizuki’s proof is correct or not. That’s so far beyond the ability of my puny little brain that I’d need to be a hundred times smarter before it would even be laughable! Nor am I going to take sides about whether or not the Professor should be travelling to teach other mathematicians his theory. But what I can do is explain a little bit about what the ABC conjecture is, and why people care so much about it.

It’s a conjecture in number theory. Number theorists tend to be obsessed with prime numbers, because the structure of the prime numbers is a huge and fundamental part of the structure and behavior of numbers as a whole. The ABC conjecture tries to describe one property of the structure of the set of prime numbers within the system of the natural numbers. Mathematicians would love to have a proof for it, because of what it would tell them about the prime numbers.

Before I can explain the problem, there’s a bit of background that we need to go through.

  1. Any natural number N greater than 1 is the product of some set of prime numbers. Those numbers are called the prime factors of N. For example, 8 is 2×2×2 – so the set of prime factors of 8 is {2}. 28 is 2×2×7, so the prime factors of 28 are {2, 7}. 360 = 8 × 45 = 2×2×2×(9×5) = 2×2×2×3×3×5, so the prime factors of 360 are {2, 3, 5}.
  2. For any number N, the radical of N is the product of its set of prime factors. So the radical of 8 (written rad(8)) is 2; rad(14)=14; rad(28)=14; rad(36)=6; rad(360)=30; etc.
  3. Given two positive integers N and M, N and M are coprime if they have no common prime factors. A tiny bit more formally, if pf(N) is the set of prime factors of N, then M and N are coprime if and only if pf(N) ∩ pf(M) = ∅. (Also, if M and N are coprime, then rad(M×N) = rad(M)×rad(N).)
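
Computing the radical is straightforward for small numbers. Here’s a quick Python sketch (trial division; the helper names are my own):

```python
def prime_factors(n):
    """Set of distinct prime factors of n, by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def rad(n):
    """The radical of n: the product of its distinct prime factors."""
    result = 1
    for p in prime_factors(n):
        result *= p
    return result

print(rad(8), rad(28), rad(36), rad(360))  # 2 14 6 30
```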

The simplest way of saying the ABC conjecture is that for the vast majority of positive integers A, B, and C, where A + B = C and A and B are coprime, C is smaller than rad(A×B×C).

Of course, that’s hopelessly imprecise for mathematicians! What does “the vast majority” mean?

The usual method at times like these is to find some way of characterizing the relative sizes of the set where the statement is true and the set where the statement is false. For most mathematicians, the sizes of sets that are interesting are basically 0, 1, finite, countably infinite, and uncountably infinite. For the statement of the ABC conjecture, the claim is that the set of values for which the statement is true is infinite, but the set of values for which it is false is finite. Specifically, they want to be able to show that the set of triples for which C is bigger than rad(A×B×C) by any fixed margin is finite.

To do that, they pull out a standard trick. Sadly, I don’t recall the proper formal term, but I’ll call it epsilon bounding. The idea is that you’ve got a statement S about a number (or region of numbers) N. You can’t prove your statement about N specifically – so you prove it about regions around N.

As usual, it’s clearest with an example. We want to say that C < rad(A×B×C) for most values of A and B. The way we can show that is by saying that for any value ε > 0, the set of triples (A, B, C) where A and B are coprime, A + B = C, and C > rad(A×B×C)^{1+ε} is finite.
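
Triples where C exceeds the radical really do exist. A brute-force scan (my own sketch, using rad(A×B×C), the radical of the full product) turns up the small ones:

```python
from math import gcd

def rad(n):
    """Product of the distinct prime factors of n."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        result *= n
    return result

# Coprime triples a + b = c with c < 200 where c exceeds rad(a*b*c).
exceptional = [(a, c - a, c)
               for c in range(2, 200)
               for a in range(1, c // 2 + 1)
               if gcd(a, c - a) == 1 and c > rad(a * (c - a) * c)]

print(exceptional)
```

The first hit is (1, 8, 9): rad(1×8×9) = rad(72) = 6, which is smaller than 9. The conjecture says that once you add the 1+ε exponent, only finitely many such exceptions survive.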

What this formulation does is give us a formal idea of how rare the exceptions are. It’s possible that there are some values of A and B where C is bigger than, say, rad(A×B×C)^{1.0001}. But the number of places where that’s true is finite. Since the full system of numbers is infinite, that means that in the overwhelming majority of cases, C < rad(A×B×C). The size of the set of numbers where that’s not true is so small that it might as well be 0 in comparison to the size of the set of numbers where it is true.

Ultimately, it seems almost trivial once you understand what the conjecture is. It’s nothing more than the hypothesis that if A + B = C, then most of the time, rad(A×B×C) > C. Once you’ve got that down, the question is: what’s the big deal? Professor Mochizuki developed five hundred pages of theory for this? People have spent more than five years trying to work through his proof just to see if it’s correct for a statement like this? Why does anybody care so much?

One answer is: mathematicians are crazy people!

The better answer is that simple statements like this end up telling us very profound things about the deep structure of numbers. The statements reduce to something remarkably simple, but the meaning underneath it is far more complex than it appears.

Just to give you one example of what this means: if the conjecture is true, then there's a short, simple proof of Fermat's last theorem for all sufficiently large exponents. (The current proof of Fermat's last theorem, by Andrew Wiles, is over a hundred pages of dense mathematics.) There are quite a number of things that number theoreticians care about that would fall out of a successful proof.

You can’t even describe most numbers!

(This is a revised version of an old post; you can see the original here.)

Please read the addendum at the bottom – I got carried away with computability, and ended up screwing up quite badly.

In the comments on my last post, a bit of discussion came up about some of the strange properties of real numbers, and that made me want to go back and get this old post about numbers that can’t even be described, and update it.

In those comments, we were talking about whether or not π contains all sorts of interesting subsequences. The fact is, when it comes to π, we just don’t know.

But… We do know that there are many numbers that do. There's a property called normality. Informally, a number is normal if, in its infinite expansion, every sequence of digits of a given length occurs with the same frequency as every other sequence of that length. In an infinitely long normal number, that means that any finite subsequence will occur somewhere, if you go far enough. My name, your name, the complete video of the original version of Star Wars, a photograph of the dinner I just ate and never took a photo of – it's all there, in any normal number.

If you think about that, at first, it might seem profound: there are numbers that contain absolutely everything that ever did exist, and that ever could exist. That seems like it should be amazing. If the numbers were at all special, it might be. But they aren’t. Normality isn’t a rare property that’s only possessed by a special meaningful number like π. It’s a property that’s possessed by almost every number. If you could create a randomly selected list of a billion real numbers (you can’t in reality, but you can mathematically, using a finite version of the axiom of choice), odds are, all of them would be normal – all of them would have this property.

But here’s where it gets really strange. Those numbers, those normal numbers that contain everything? Most of them can’t even be described.

It gets stranger than that. We know that we can’t really write down an irrational number. We can write down successively more and more precise approximations, but any finite representation won’t work. So we can’t actually write down the overwhelming majority of real numbers. But in fact, not only can’t we write them down, we can’t describe the vast majority of numbers.

Numbers are something that we deal with every day, and we think we understand them. Over the ages, people have invented a variety of notations that allow us to write those numbers down: the familiar arabic notation, roman numerals, fractions, decimals, continued fractions, algebraic series, etc. I could easily spend months on this blog just writing about different notations that we use to write numbers, and the benefits and weaknesses of each notation.

But the fact is, when it comes to the vast, overwhelming majority of numbers, all of those notations are utterly useless! That statement seems bizarre at best. As strange as it seems, though, it's true. In order to understand it, we have to start at the very beginning: what does it mean for a number to be describable?


Infinite and Non-Repeating Does Not Mean Unstructured

So, I got in to work this morning, and saw a tweet with the following image:

[image: a graphic containing the following text]

Pi is an infinite, non-repeating decimal – meaning that every possible number combination exists somewhere in pi. Converted into ASCII text, somewhere in that string of digits is the name of every person you will ever love, the date, time, and manner of your death, and the answers to all the great questions of the universe. Converted into a bitmap, somewhere in that infinite string of digits is a pixel-perfect representation of the first thing you saw on this earth, the last thing you will see before your life leaves you, and all the moments, momentous and mundane, that will occur between those points.

All information that has ever existed or will ever exist, the DNA of every being in the universe.

EVERYTHING: all contained in the ratio of a circumference and a diameter.

Things like this, that abuse misunderstandings of math in the service of pseudo-mystical rubbish, annoy the crap out of me.

Before I go into detail, let me start with one simple fact: No one knows whether or not π contains every possible finite-length sequence of digits. There is no proof that it does, and there is no proof that it does not. We don't know. At the moment, no one does. If someone tells you for certain that it does, they're bullshitting. If someone tells you for certain that it doesn't, they're also bullshitting.

But that’s not really important. What bothers me about this is that it abuses a common misunderstanding of infinity. π is an irrational number. So is e. So are the square roots of most integers. In fact, so are most integral roots of most integers – cube roots, fourth roots, fifth roots, etc. All of these numbers are irrational.

What it means to be irrational is simple, and it can be stated in two different ways:

  1. An irrational number is a number that cannot be written as a ratio (fraction) of two integers.
  2. An irrational number is a number whose precise representation in decimal notation is an infinitely long non-repeating sequence of digits.

There are many infinitely long sequences of digits. Some will eventually include every finite sequence of digits; some will not.

For a simple example of a sequence that will, eventually, contain every possible sequence of digits: 0.010203040506070809010011012013014015016…. That is, take the sequence of natural numbers, and write them down after the decimal point with a 0 between them. This will, eventually, contain every possible natural number as a substring – and every finite string of digits is the representation of a natural number.

For a simple example of a sequence that will not contain every possible sequence of digits, consider 0.01011011101111011111… That is, the sequence of natural numbers written in unary form, separated by 0s. This will never include the number combination “2”. It will never contain the number sequence “4”. It will never even contain the digit sequence for four written in binary, because it will never contain a “1” followed by two “0”s. But it never repeats itself. It goes on and on forever, but it never starts repeating – it keeps adding new combinations that never existed before, in the form of longer and longer sequences of “1”s.
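Both of these sequences are easy to generate and check mechanically. Here's a small Python sketch (purely illustrative, not from the original post) that builds finite prefixes of each and tests them:

```python
def natural_numbers_with_zeros(limit=300):
    """Prefix of 0.0102030405...: the naturals, separated by 0s."""
    s, n = "0", 1
    while len(s) < limit:
        s += str(n) + "0"
        n += 1
    return s

def unary_naturals(terms=20):
    """Prefix of 0.010110111...: n 1s for each n, separated by 0s."""
    return "0" + "0".join("1" * n for n in range(1, terms + 1))

a = natural_numbers_with_zeros()
b = unary_naturals()
print("42" in a)   # True: 42 shows up once the count reaches it
print("2" in b)    # False: the unary sequence only uses 0s and 1s
print("100" in b)  # False: never a 1 followed by two 0s
```

The first sequence keeps accumulating new substrings forever; the second also never repeats, but whole families of digit strings can never appear in it.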

Infinite and non-repeating doesn’t mean without pattern, nor does it mean without structure. All that it means is non-repeating. Both of the infinite sequences I described above are infinitely long and non-repeating, but both are also highly structured and predictable. One of those has the property that the original quote talked about; one doesn’t.

That’s the worst thing about the original quotation: it’s taking a common misunderstanding of infinity, and turning it into an implication that’s incorrect. The fact that something is infinitely long and non-repeating isn’t special: most numbers are infinitely long and non-repeating. It doesn’t imply that the number contains all information, because that’s not true.

Hell, it isn’t even close to true. Here’s a simple piece of information that almost certainly isn’t contained anywhere in π: the decimal representation of e. e is, like π, represented in decimal form as an infinitely long sequence of non-repeating digits. e and π are, in fact, deeply related, via Euler’s equation: e^{i\pi} + 1 = 0. But the digits of e almost certainly never occur in π: if the full expansion of e appeared starting at some position k of π’s expansion, then 10^k·π − e would be an integer – an astonishingly special relationship between two numbers that, as far as anyone knows, share no such simple connection. (Strictly speaking, even that hasn’t been proven – but no one believes it holds.)

Numbers like π and e are important, and absolutely fascinating. If you take the time to actually study them and understand them, they’re amazing. I’ve written about both of them: π here and e here. People have spent their entire lives studying them and their properties, and they’re both interesting and important enough to deserve that degree of attention. We don’t need to make up unproven nonsense to make them interesting. We especially don’t need to make up nonsense that teaches people incorrect “facts” about how infinity works.

Oy Veh! Power Series, Analytic Continuations, and Riemann Zeta

After the whole Plait fiasco with the sum of the infinite series of natural numbers, I decided it would be interesting to dig into the real math behind that mess. That means digging in to the Riemann zeta function, and the concept of analytic continuation.

A couple of caveats before I start:

  1. This is the area of math where I’m at my worst. I am not good at analysis. I’m struggling to understand this stuff well enough to explain it. If I screw up, please let me know in the comments, and I’ll do my best to update the main post promptly.
  2. This is way more complicated than most of the stuff I write on this blog. Please be patient, and try not to get bogged down. I’m doing my best to take something that requires a whole lot of specialized knowledge, and explain it as simply as I can.

What I’m trying to do here is to get rid of some of the mystery surrounding this kind of thing. When people think about math, they frequently get scared. They say things like “Math is hard, I can’t hope to understand it.”, or “Math produces weird results that make no sense, and there’s no point in my trying to figure out what it means, because if I do, my brain will explode. Only a super-genius geek can hope to understand it!”

That’s all rubbish.

Math is complicated, because it covers a whole lot of subjects. To understand the details of a particular branch of math takes a lot of work, because it takes a lot of special domain knowledge. But it’s not fundamentally different from many other things.

I’m a professional software engineer. I did my PhD in computer science, specializing in programming languages and compiler design. Designing and building a compiler is hard. To be able to do it well and understand everything that it does takes years of study and work. But anyone should be able to understand the basic concepts of what it does, and what the problems are.

I’ve got friends who are obsessed with baseball. They talk about ERAs, DIERAs, DRSs, EQAs, PECOTAs, Pythagorean expectations, secondary averages, UZRs… To me, it’s a huge pile of gobbledygook. It’s complicated, and to understand what any of it means takes some kind of specialized knowledge. For example, I looked up one of the terms I saw in an article by a baseball fan: “Peripheral ERA is the expected earned run average taking into account park-adjusted hits, walks, strikeouts, and home runs allowed. Unlike Voros McCracken’s DIPS, hits allowed are included.” I have no idea what that means. But it seems like everyone who loves baseball – including people who think that they can’t do their own income tax return because they don’t understand how to compute percentages – understands that stuff. They care about it, and since it means something in a field that they care about, they learn it. It’s not beyond their ability to understand – it just takes some background to be able to make sense of it. Without that background, someone like me feels lost and clueless.

That’s the way that math is. When you go to look at a result from complex analysis without knowing what complex analysis is, it looks like terrifyingly complicated nonsensical garbage, like “A meromorphic function is a function on an open subset of the complex number plane which is holomorphic on its domain except at a set of isolated points where it must have a Laurent series”.

And it’s definitely not easy. But understanding, in a very rough sense, what’s going on and what it means is not impossible, even if you’re not a mathematician.


Anyway, what the heck is the Riemann zeta function?

It’s not easy to give even the simplest answer to that in a meaningful way.

Basically, Riemann Zeta is a function which describes fundamental properties of the prime numbers, and therefore of our entire number system. You can use the Riemann Zeta to prove that there’s no largest prime number; you can use it to talk about the expected frequency of prime numbers. It occurs in various forms all over the place, because it’s fundamentally tied to the structure of the realm of numbers.

The starting point for defining it is a power series defined over the complex numbers (note that the parameter we use is s instead of a more conventional x: this is a way of highlighting the fact that this is a function over the complex numbers, not over the reals).

\zeta(s) = \sum_{n=1}^{\infty} n^{-s}

This function \zeta, as defined by that series, is not the Riemann zeta function!

The Riemann zeta function is the analytic continuation of \zeta. We’ll get to what that means in a moment. Before doing that: why the heck should we care? I said it talks about the structure of numbers and primes, but how?

The zeta function actually has a lot of meaning. It tells us something fundamental about the properties of our number system – in particular, about the properties of prime numbers. Euler proved that zeta is deeply connected to the prime numbers, via what’s now called the Euler product formula (not to be confused with the more famous Euler’s identity, e^{i\pi} + 1 = 0). The Euler product formula says that, for all values of s where the sum converges:

\sum_{n=1}^{\infty} n^{-s} = \prod_{p \in \textbf{Primes}} \frac{1}{1-p^{-s}}

Which is a way of saying that the zeta function captures deep information about the distribution of the prime numbers.
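You can actually watch the Euler product at work numerically. Here's a small Python sketch (illustrative only; the cutoffs are arbitrary) comparing a truncated version of the sum with a truncated version of the product at s = 2, where both sides converge to π²/6 ≈ 1.6449:

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

s = 2.0

# Left side: the Dirichlet sum over the naturals, truncated.
zeta_sum = sum(n ** -s for n in range(1, 100_000))

# Right side: the Euler product over the primes, truncated.
euler_product = 1.0
for p in primes_up_to(1_000):
    euler_product *= 1 / (1 - p ** -s)

print(zeta_sum, euler_product)  # both approach pi^2/6 = 1.6449...
```

The two truncations agree to several decimal places, even though one ranges over all naturals and the other only over primes – that agreement is exactly the connection between zeta and the primes.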


To really understand the Riemann Zeta, you need to know how to do analytic continuation. And to understand that, you need to learn a lot of number theory and a lot of math from the specialized field called complex analysis. But we can describe the basic concept without getting that far into the specialized stuff.

What is an analytic continuation? This is where things get really sticky. Basically, there are places where there’s one way of solving a problem which produces a diverging infinite series. When that happens, you say there’s no solution – that the point where you’re trying to solve it isn’t in the domain of the problem. But if you solve it in a different way, you can find a way of getting a solution that works. You’re using an analytic process to extend the domain of the problem, and get a solution at a point where the traditional way of solving it wouldn’t work.


A nice way to explain what I mean by that requires taking a diversion, and looking at a metaphor. What we’re talking about here isn’t analytic continuation; it’s a different way of extending the domain of a function, this time in the realm of the real numbers. But as an example, it illustrates the concept of finding a way to get the value of a function in a place where it doesn’t seem to be defined.

In math, we like to play with limits. One example of that is in differential calculus. What we do in differential calculus is look at continuous curves, and ask: at one specific location on the curve, what’s the slope?

If you’ve got a line, the slope is easy to determine. Take any two points on the line: (x_1, y_1), (x_2, y_2), where x_1 < x_2. Then the slope is \frac{y_2 - y_1}{x_2 - x_1}. It’s easy, because for a line, the slope never changes.

If you’re looking at a curve more complex than a line, then slopes get harder, because they’re constantly changing. If you’re looking at y=x^2, and you zoom in and look at it very close to x=0, it looks like the slope is very close to 0. If you look at it close to x=1, it looks like it’s around 2. If you look at it at x=10, it looks like it’s around 20. But there are no two points on the curve where the slope is exactly the same!

So how can you talk about the slope at a particular point x=k? By using a limit. You pick a point really close to x=k, and call it x=k+epsilon. Then an approximate value of the slope at k is:

\frac{(k+\epsilon)^2 - k^2}{(k+\epsilon) - k}

The smaller epsilon gets, the closer your approximation gets. But you can’t actually get to \epsilon=0, because if you did, that slope equation would have 0 in the denominator, and it wouldn’t be defined! But it is defined for all non-zero values of \epsilon. No matter how small, no matter how close to zero, the slope is defined. But at zero, it’s no good: it’s undefined.

So we take a limit. As \epsilon gets smaller and smaller, the slope gets closer and closer to some value. So we say that the slope at the point – at the exact place where the denominator of that fraction becomes zero – is defined as:

 \lim_{\epsilon \rightarrow 0}  \frac{(k+\epsilon)^2 - k^2}{k+\epsilon - k} =

 \lim_{\epsilon \rightarrow 0}  \frac{  k^2 + 2k\epsilon + \epsilon^2 - k^2}{\epsilon} =

(Note: the original version of the previous line had a missing “-“. Thanks to commenter Thinkeye for catching it.)

 \lim_{\epsilon \rightarrow 0}  \frac{ 2k\epsilon + \epsilon^2}{\epsilon} =

Since \epsilon is getting closer and closer to zero, the \epsilon^2 term shrinks much faster than everything else – \epsilon^2/\epsilon is just \epsilon, which goes to zero – so we can drop it:

 \lim_{\epsilon \rightarrow 0}  \frac{ 2k\epsilon}{\epsilon} = 2k

So at any point x=k, the slope of y=x^2 is 2k. Even though computing that involves dividing by zero, we’ve used an analytical method to come up with a meaningful and useful value at \epsilon=0. This doesn’t mean that you can divide by zero. You cannot conclude that \frac{2*0}{0} = 2. But for this particular analytical setting, you can come up with a meaningful solution to a problem that involves, in some sense, dividing by zero.
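You can see this limit happening numerically. Here's a quick Python sketch (the names are mine, just for illustration) computing the approximate slope of y = x² at k = 3 for shrinking values of ε:

```python
def approx_slope(f, k, eps):
    """Slope of the secant line between x = k and x = k + eps."""
    return (f(k + eps) - f(k)) / eps

square = lambda x: x ** 2
for eps in (0.1, 0.01, 0.001, 0.0001):
    print(eps, approx_slope(square, 3.0, eps))
# The approximations approach 2k = 6 as eps shrinks, even though
# plugging in eps = 0 directly would divide by zero.
```

No single value of eps ever gives exactly 6 here; the limit is what assigns a value to the point where the formula itself breaks down.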


The limit trick in differential calculus is not analytic continuation. But it’s got a tiny bit of the flavor.

Moving on: the idea of analytic continuation comes from the field of complex analysis. Complex analysis studies a particular class of functions in the complex number plane. It’s not one of the easier branches of mathematics, but it’s extremely useful. Complex analytic functions show up all over the place in physics and engineering.

In complex analysis, people focus on a particular group of functions that are called analytic, holomorphic, and meromorphic. (Those three are closely related, but not synonymous.)

A holomorphic function is a function over complex variables which has one important property. The property is almost like a kind of abstract smoothness. In the simplest case, suppose that we have a complex function in a single variable, and the domain of this function is D. Then it’s holomorphic if, and only if, for every point d \in D, the function is complex differentiable in some neighborhood of points around d.

(Differentiable means, roughly, that using a trick like the one we did above, we can take the slope (the derivative) around d. In the complex number system, “differentiable” is a much stronger condition than it would be in the reals. In the complex realm, if something is differentiable, then it is infinitely differentiable. In other words, given a complex equation, if it’s differentiable, that means that I can create a curve describing its slope. That curve, in turn, will also be differentiable, meaning that you can derive an equation for its slope. And that curve will be differentiable. Over and over, forever: the derivative of a differentiable curve in the complex number plane will always be differentiable.)

If you have a differentiable curve in the complex number plane, it’s got one really interesting property: it’s representable as a power series. (This property is what it means for a function to be called analytic; all holomorphic functions are analytic.) That is, a function f is holomorphic for a set S if, for all points s in S, you can represent the value of the function as a power series for a disk of values around s:

 f(z) = \sum_{n=0}^{\infty} a_n(z-c)^n

In the simplest case, the constant c is 0, and it’s just:

 f(z) = \sum_{n=0}^{\infty} a_nz^n

(Note: In the original version of this post, I miswrote the basic pattern of a power series, and put both z and s in the base. Thanks to John Armstrong for catching it.)

The function that we wrote, above, for the base of the zeta function is exactly this kind of power series. Zeta is an analytic function for a particular set of values. Not all values in the complex number plane; just for a specific subset.
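As a concrete instance of the general pattern, here's a function that genuinely is given by a power series everywhere: the exponential, with c = 0 and a_n = 1/n!. A small Python sketch (purely illustrative) comparing a truncation of that series against the library implementation:

```python
from math import factorial, exp

def exp_series(z, terms=30):
    """exp(z) via its power series: the sum of z^n / n! for n = 0, 1, 2, ..."""
    return sum(z ** n / factorial(n) for n in range(terms))

print(exp_series(1.0), exp(1.0))  # both 2.718281828...
```

For the exponential, the power series converges for every input, so the series and the function are interchangeable; the whole subtlety in what follows is about functions where the series only works on part of the domain.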

If a function f is holomorphic, then its strong differentiability leads to another property: there’s a unique extension of it that expands its domain. The expansion always produces the same value for all points that are within the domain of f. It also has exactly the same differentiability properties. But it’s also defined on a larger domain than f was. It’s essentially what f would be if its domain weren’t so limited. If D is the domain of f, then for any given larger (connected) domain D', where D \subset D', there’s at most one analytic function with domain D' that agrees with f on D: the analytic continuation of f.

Computing analytic continuations is not easy. This is heavy enough already, without getting into the details. But the important thing to understand is that if we’ve got a function f with an interesting set of properties, we’ve got a method that might be able to give us a new function g that:

  1. Everywhere that f(s) was defined, f(s) = g(s).
  2. Everywhere that f(s) was differentiable, g(s) is also differentiable.
  3. Everywhere that f(s) could be computed as a sum of an infinite power series, g(s) can also be computed as a sum of an infinite power series.
  4. g(s) is defined in places where f(s) and the power series for f(s) are not.
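The classic toy example of this (it's not in the original post, but it's the standard textbook illustration) is the geometric series. The power series Σ zⁿ only converges for |z| < 1, but the function 1/(1−z) agrees with it everywhere the series works, and is defined for every z except z = 1 – it plays the role of g:

```python
def geometric_series(z, terms=200):
    """The power series 1 + z + z^2 + ... : only meaningful for |z| < 1."""
    return sum(z ** n for n in range(terms))

def continuation(z):
    """1/(1-z): agrees with the series where it converges,
    but is defined for every z except z = 1."""
    return 1 / (1 - z)

print(geometric_series(0.5), continuation(0.5))  # both 2.0
print(continuation(3.0))  # -0.5: a value where the series diverges
```

At z = 3, the partial sums of the series blow up; the continuation calmly returns −0.5. That −0.5 is not "the sum of the series" – it's the value of the extended function.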

So, getting back to the Riemann Zeta function: we don’t have a proper closed form equation for zeta. What we have is the power series of the function that zeta is the analytic continuation of:

\zeta(s) = \sum_{n=1}^{\infty} n^{-s}

If s=-1, then the series for that function expands to:

\sum_{n=1}^{\infty} n^1 = 1 + 2 + 3 + 4 + 5 + ...

The power series is undefined at this point; the base function that we’re using, that zeta is the analytic continuation of, is undefined at s=-1. The power series is an approximation of the zeta function, which works over some specific range of values. But it’s a flawed approximation. It’s wrong about what happens at s=-1. The approximation says that the value at s=-1 should be a non-converging infinite sum. It’s wrong about that. The Riemann zeta function is defined at that point, even though the power series is not. If we use a different method for computing the value of the zeta function at s=-1 – a method that doesn’t produce an incorrect result! – the zeta function has the value -\frac{1}{12} at s=-1.

Note that this is a very different statement from saying that the sum of that power series is -\frac{1}{12} at s=-1. We’re talking about fundamentally different functions! The Riemann zeta function at s=-1 does not expand to the power series that we used to approximate it.
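One concrete "different method" (this particular series isn't mentioned in the post; it's one standard way of computing the continuation) is Hasse's globally convergent series, which agrees with the sum of n⁻ˢ everywhere that sum converges, but is defined for every s ≠ 1. A short Python sketch:

```python
from math import comb

def riemann_zeta(s, terms=60):
    """Riemann zeta via Hasse's globally convergent series (valid for
    any s != 1), which extends the sum of n^(-s) beyond the region
    where that sum converges."""
    total = 0.0
    for n in range(terms):
        inner = sum((-1) ** k * comb(n, k) * (k + 1) ** (-s)
                    for k in range(n + 1))
        total += inner / 2 ** (n + 1)
    return total / (1 - 2 ** (1 - s))

print(riemann_zeta(2))   # 1.6449...: matches the sum, which converges here
print(riemann_zeta(-1))  # -0.0833... = -1/12, where the sum diverges
```

Notice what this computation does not do: it never adds up 1 + 2 + 3 + 4 + …. It evaluates a different, well-behaved expression that happens to agree with the original sum wherever that sum makes sense.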

In physics, if you’re working with some kind of system that’s described by a power series, you can come across the power series that produces the sequence that looks like the sum of the natural numbers. If you do, and if you’re working in the complex number plane, and you’re working in a domain where that power series occurs, what you’re actually using isn’t really the power series – you’re playing with the analytic zeta function, and that power series is a flawed approximation. It works most of the time, but if you use it in the wrong place, where that approximation doesn’t work, you’ll see the sum of the natural numbers. In that case, you get rid of that sum, and replace it with the correct value of the actual analytic function, not with the incorrect value of applying the power series where it won’t work.

Ok, so that warning at the top of the post? Entirely justified. I screwed up a fair bit at the end. The series that defines the value of the zeta function for some values – the series for which the Riemann zeta is the analytic continuation – is not a power series. It’s a series all right (a Dirichlet series, to be precise), but not a power series, and not the particular kind of series that defines a holomorphic or analytic function.

The underlying point, though, is still the same. That series (not power series, but series) is a partial definition of the Riemann zeta function. It’s got a limited domain, where the Riemann zeta’s domain doesn’t have the same limits. The series definition still doesn’t work at s=-1. The series is still undefined at s=-1. At s=-1, the series expands to 1 + 2 + 3 + 4 + 5 + 6 + ..., which doesn’t converge, and which doesn’t add up to any finite value, -1/12 or otherwise. That series does not have a value at s=-1. No matter what you do, that equation – the definition of that series – does not work at s=-1. But the Riemann Zeta function is defined in places where that equation isn’t. Riemann Zeta at s=-1 is defined, and its value is -1/12.

Despite my mistake, the important point is still that last sentence. The value of the Riemann zeta function at s=-1 is not the sum of the set of natural numbers. The equation that produces the sequence doesn’t work at s=-1. The definition of the Riemann zeta function doesn’t say that it should, or that the sum of the natural numbers is -1/12. It just says that the first approximation of the Riemann zeta function for some, but not all values, is given by a particular infinite sum. In the places where that sum works, it gives the value of zeta; in places where that sum doesn’t work, it doesn’t.