Category Archives: goodmath

A Stunning Demonstration of Why Good Science Needs Good Math

Everyone is scientific circles is abuzz with the big news: there’s proof that dark matter exists! The paper from the scientists who made the discovered is [here][dark-matter-paper]; and a Sean Carroll (no relation) has [a very good explanation on his blog, Cosmic Variance][cv]. This discovery happens to work as a great example of just why good science needs good math.
As I always say, one of the ways to recognize a crackpot theory in physics is by the lack of math. For an example, you can look at the [electric universe][electric] folks. They have a theory, and they make predictions: but because there’s *no* math, the predictions are vague, and there’s no good way of *really* testing them, because there’s no quantifiable way of making a precise prediction – because there’s no math. So they can make predictions like “the stardust experiment will get bigger particles than they expect”; but they can’t tell you *how* big.
The dark matter result is a beautiful example of how to use good math in science.
Here’s the basic idea. The theory says that there are two kinds of matter: “dark” matter, and “light” matter. Dark matter only interacts with light matter via gravity; it does not interact with light matter via other forces. But dark matter and light matter generally clump in the same places – because gravity pulls them together. So it’s very difficult to really prove that dark matter exists – because you can’t see it directly, and it normally only appears with light matter, so you can’t really prove that the dark matter is there: any observed effect *might* be caused by the light matter behaving in a way different than our current theories claim it should.
But what if you could somehow sweep the light matter away?
What the scientists who did this work found is a collision of two galactic clusters. When these clusters collided, the light matter, mostly in the form of gas, interacted very strongly with one another, creating a shock wave pattern. But the *dark* matter passed through without interacting – the “collision” between the gas clouds didn’t affect the dark matter. So the gas clouds were swept back, while the dark matter continued moving. There’s a great animation illustrating this; in the animation, the blue is the dark matter; the red is the light matter. As the two clusters pass through each other, the light matter is swept away by the electromagnetic interactions between the gas clouds; the dark matter passes right through:

Here’s where the math comes in.
They used a combination of optical and X-ray telescope to produce maps of the gravitational fields of the clusters. This was done by computing the gravitational lensing effect distorting the images of other, more distant galaxies visible *behind* the collided clusters. By carefully computing the distortion caused by gravitational lensing, they were able to determine the distribution of *mass* in the collided clusters. And what they found was the bulk of the mass was *not* in the light matter. It was in the places that the center of gravities of the clusters would have been *without* the shock-wave effects of the collision. So the bulk of the mass of these two clusters do not appear on our telescope images; but it behaves exactly as the math predicts it would if it were dark matter.
The prediction and the result are both based on very careful computations based on the *mathematical* predictions of gravity and relativity. They were able to predict precisely what they would expect from the interaction using a mathematical model of the how the gas clouds would interact to be swept away; and how the dark matter would interact gravitationally to predict where the dark matter masses should be. Then they were able, via a *separate* computation to determine how much mass was in what location based on gravitational lensing. And finally, they were able to compare the two *separately computed* results to see if the reality matched the prediction.
Now *that* is both good math and good science! And the science could *not* have been done without the math.

Arithmetic with Surreal Numbers

Last thursday, I introduced the construction of John Conway’s beautiful surreal numbers. Today I’m going to show you how to do arithmetic using surreals. It’s really quite amazing how it all works! If you haven’t read the original post introducing surreals, you’ll definitely want to [go back and read that][surreals] before looking at this post!

Continue reading

A Brilliant φ link

In the comments onmy post about φ, Polymath, (whose [blog][polymath] is well worth checking out) provided a really spectacular [link involving φ][desert]. It’s an excerpt from a book called “[Mathematical Gems 2][gems]”, showing a problem that came from John Conway, called the “Sending Scouts into the Desert” problem.
The problem is:
Suppose you’re given a checkerboard with all of the squares on the bottom filled. You’re allowed to do standard checks jumps (jump over a man and remove it), but you can’t jump diagonally, only up, left, or right. How far *up* can you get a man? How many men do you need to move to get that far?
To get to the first row above your men, you need to jump one man. To get two rows, you need to jump four men; three rows, you’ll need 8 men; four rows, you’ll need 20 men. How about five rows?
You *can’t* get five rows doing up and sideways jumps. Even if you had an infinitely wide checkerboard, with as many full rows of mens as you want.
The proof, along with a nice Java applet to let you try the solvable ones, is at the link. And φ is an inextricable part of the proof!

From Lambda Calculus to Cartesian Closed Categories

This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)
Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.
First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.
So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:
1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’
So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.
The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)
We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.
So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:
* ∀ A ∈ basetypes(**L**), [A] = AC ∈ C(**L**)
* [unit] = 1C
Next, we need to define the typing rules for complex types:
* [ A × B ] = [A] × [B]
* [ A → B ] = [B][A]
Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:
* [∅] = 1C
* [Γ, x : A] = [Γ] × [A]
We also need to describe what to do with the values of the primitive types:
* For each value v : A, there is an arrow v : 1 → AC.
And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).
* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*
* [Γ :- a : AC] = a º ! : [Γ] → [AC] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*
* [Γ x : A :- x : A] = π2 : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*
* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π1 where π1 : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*
* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B][A] *(A function typing rule: the function maps to an arrow from the type context to an exponential [B][A], which is a function from A to B.)*
* [Γ :- (M M’) : B] = evalC,B º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*
There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
The intuition behind this is:
* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.
* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)
Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)

Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.

Quaternions: upping the dimensions of complex numbers

Last week, after I wrote about complex numbers, a bunch of folks wrote and said “Do quaternions next!” My basic reaction was “Huh?” I somehow managed to get by without ever being exposed to quaternions before. They’re quite interesting things.
The basic idea behind quaterions is: we see some amazing things happen when we expand the dimensionality of numbers from 1 (the real numbers) to 2 (the complex numbers). What if we add *more* dimensions?
It doesn’t work for three dimensions But you *can* create a system of numbers in *four* dimensions. As with complex numbers, you need a distinct unit for each dimension. In complex, those were 1 and i. For quaternions, you need for units: we’ll call them “1”, “i”, “j”, and “k”. In quaternion math, the units have the following properties:
1. i2 = j2 = k2 = ijk = -1
2. ij = k, jk = i, and ki=j
3. ji = -k, kj = -i, and ik=-j
No, that last one is *not* an error. Quaternion multiplication is *not* commutative: ab &neq; ba. It *is* associative over multiplication, meaning (a*b)*c = a*(b*c). And fortunately, it’s both commutative and associative over addition.
Anyway, we write a quaternion as “a + bi + cj + dk”; as in “2 + 4i -3j + 7k”.
Aside from the commutativity thing, quaternions work well as numbers; they’re a non-abelian group over multiplication (assuming you omit 0); they’re an abelian group over addition; they form a division algebra *over* the reals (meaning that you can define an algebra with operations up to multiplication and division over the quaternions, and operations in this algebra will generate the same results as normal real algebra if the i,j, and k components are 0.)
Quaternions are a relatively recent creation, so we know the history quite precisely. Quaternions were invented by a gentleman named William Hamilton on October 16th, 1843, while walking over the Broom bridge with his wife. He had been working on trying to work out something like the complex numbers with more dimensions for a while, and the crucial identity “ijk=-1” struck him while he was walking over the bridge. He immediately *carved it into the bridge* before he forgot it. Fortunately, this act of vandalism wasn’t taken seriously: the Broom bridge now has a plaque commemorating it!
Quaternions were controversial from the start. The fundamental step that Hamilton had taken in making quaternions work was abandoning commutativity, and many people believed that this was simply *wrong*. For example, Lord Kelvin said “Quaternions came from Hamilton after his really good work had been done; and, though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.”
Why are they non-commutative?
Quaternions are decidedly strange things. As I mentioned above, they’re non-commutative. Doing math with things where ab &neq; ba was a *huge* step, and it can be strange. But it is necessary for quaternions. Here’s why.
Let’s take the fundamental identity of quaternions: ijk = -1
Now, let’s multiply both sides by k: ij(k2) = -1k
So, ij(-1)=-k, and ij=k.
Now, multiply by *i* on the left on both sides: i2j = ik
And so, -1j = ik, and ik=-j.
Keep playing with multiplication in the identities, and you find that the identities simple *do not work* unless it’s non-commutative.
What are they good for?
For the most part, quaternions aren’t used much anymore. Initially, they were used in many places where we now use complex vectors, but that’s mostly faded away. What they *are* used for is all sorts of math that in any way involves *rotation*. Quaternions capture the essential properties of rotation.
To describe a rotation well, you actually need to have *extra* dimensions. To describe an *n*-dimensional rotation, you need *n+2* dimensions. Since in our three dimensional world, we rotate things around a *line* which is two dimensional, the description of the rotation needs *four* dimensions. And further, rotation itself is non-commutative.
Take a book and put it flat on the table, with the binding to the left. Flip it so that the binding stays on the same side, but now you’ve got the back cover on top, upside-down. Then rotate it clockwise 90 degrees on the table. You now have the book with its back cover up, and the binding facing away from you.
Now, start over: book right side up, binding facing left. Rotate it clockwise 90 degrees. Now flip it the same way you did before (over the x axis if you think of left as -x, right as +x, away from you as +y, up as +z, etc.)
What’s the result? You get the book with its back cover up, and the binding factor *towards* you. It’s 180 degrees different depending on which order you do it in. And 180 degrees is a reverse equivalent to the sign difference that you get in quaternion multiplication! Quaternions have exactly the right properties for describing rotations – and in particular, the *composition* of rotations.
The use of quaternions for rotations is used in computer graphics (when something rotates on the screen of your PS2, it’s doing quaternion math!); in satellite navigation; and in relativity equations (for symmetry wrt rotation).

Ω: my favorite strange number

Ω is my own personal favorite transcendental number. Ω isn’t really a specific number, but rather a family of related numbers with bizzare properties. It’s the one real transcendental number that I know of that comes from the theory of computation, that is important, and that expresses meaningful fundamental mathematical properties. It’s also deeply non-computable; meaning that not only is it non-computable, but even computing meta-information about it is non-computable. And yet, it’s *almost* computable. It’s just all around awfully cool.
So. What is it Ω?
It’s sometimes called the *halting probability*. The idea of it is that it encodes the *probability* that a given infinitely long random bit string contains a prefix that represents a halting program.
It’s a strange notion, and I’m going to take a few paragraphs to try to elaborate on what that means, before I go into detail about how the number is generated, and what sorts of bizzare properties it has.
Remember that in the theory of computation, one of the most fundamental results is the non-computability of *the halting problem*. The halting problem is the question “Given a program P and input I, if I run P on I, will it ever stop?” You cannot write a program that reads P and I and gives you the answer to the halting problem. It’s impossible. And what’s more, the statement that the halting problem is not computable is actually *equivalent* to the fundamental statement of Gödel’s incompleteness theorem.
Ω is something related to the halting problem, but stranger. The fundamental question of Ω is: if you hand me a string of 0s and 1s, and I’m allowed to look at it one bit at a time, what’s the probability that *eventually* the part that I’ve seen will be a program that eventually stops?
When you look at this definition, your reaction *should* be “Huh? Who cares?”
The catch is that this number – this probability – is a number which is easy to define; it’s not computable; it’s completely *uncompressible*; it’s *normal*.
Let’s take a moment and look at those properties:
1. Non-computable: no program can compute Ω. In fact, beyond a certain value N (which is non-computable!), you cannot compute the value of *any bit* of Ω.
2. Uncompressible: there is no way to represent Ω in a non-infinite way; in fact, there is no way to represent *any substring* of Ω in less bits than there are in that substring.
3. Normal: the digits of Ω are completely random and unpatterned; the value of and digit in Ω is equally likely to be a zero or a one; any selected *pair* of digits is equally likely to be any of the 4 possibilities 00, 01, 10, 100; and so on.
So, now we know a little bit about why Ω is so strange, but we still haven’t really defined it precisely. What is Ω? How does it get these bizzare properties?
Well, as I said at the beginning, Ω isn’t a single number; it’s a family of numbers. The value of *an* Ω is based on two things: an effective (that is, turing equivalent) computing device; and a prefix-free encoding of programs for that computing device as strings of bits.
(The prefix-free bit is important, and it’s also probably not familiar to most people, so let’s take a moment to define it. A prefix-free encoding is an encoding where for any given string which is valid in the encoding, *no prefix* of that string is a valid string in the encoding. If you’re familiar with data compression, Huffman codes are a common example of a prefix-free encoding.)
So let’s assume we have a computing device, which we’ll call φ. We’ll write the result of running φ on a program encoding as the binary number p as &phi(p). And let’s assume that we’ve set up φ so that it only accepts programs in a prefix-free encoding, which we’ll call ε; and the set of programs coded in ε, we’ll call Ε; and we’ll write φ(p)↓ to mean φ(p) halts. Then we can define Ω as:

Ωφ,ε = Σp ∈ Ε|p↓ 2-len(p)

So: for each program in our prefix-free encoding, if it halts, we *add* 2-length(p) to Ω.
Ω is an incredibly odd number. As I said before, it’s random, uncompressible, and has a fully normal distribution of digits.
But where it gets fascinatingly bizzare is when you start considering its computability properties.
Ω is *definable*. We can (and have) provided a specific, precise definition of it. We’ve even described a *procedure* by which you can conceptually generate it. But despite that, it’s *deeply* uncomputable. There are procedures where we can compute a finite prefix of it. But there’s a limit: there’s a point beyond which we cannot compute *any* digits of it. *And* there is no way to compute that point. So, for example, there’s a very interesting paper where the authors [computed the value of Ω for a Minsky machine][omega]; they were able to compute 84 bits of it; but the last 20 are *unreliable*, because it’s uncertain whether or not they’re actually beyond the limit, and so they may be wrong. But we can’t tell!
What does Ω mean? It’s actually something quite meaningful. It’s a number that encodes some of the very deepest information about *what* it’s possible to compute. It gives a way to measure the probability of computability. In a very real sense, it represents the overall nature and structure of computability in terms of a discrete probability. It’s an amazingly dense container of information – as a n infinitely long number and so thoroughly non-compressible, it contains an unmeasurable quantity of information. And we can’t get near most of it!

Irrational and Transcendental Numbers

If you look at the history of math, there’ve been a lot of disappointments for mathematicians. They always start off with an idea of math as a clean, beautiful, elegant thing. And they seem to often wind up disappointed.
Which leads us into todays strange numbers: irrational and transcendental numbers. Both of them were huge disappointments to the mathematicians who discovered them.
So what are they? We’ll start with the irrationals. They’re numbers that aren’t integers, and that aren’t a ratio of any two integers. So you can’t write them as a normal fraction. If you write them as a continued fraction, then they go on forever. If you write them in decimal form, they go on forever without repeating. π, √2, *e* – they’re all irrational. (Incidentally, the reason that they’re called “irrational” isn’t because they don’t make sense, or because their decimal representation is crazy, but because they can’t be written *as ratios*. My high school algebra teacher who first talked about irrational numbers said that they were irrational because numbers that never repeated in decimal form weren’t sensible.)
The transcendentals are even worse. Transcendental numbers are irrational; but not only can transcendental numbers not be written as a ratio of integers; not only do their decimal forms go on forever without repeating; transcendental numbers are numbers that *can’t* be described by algebraic operations: there’s no finite sequence of multiplications, divisions, additions, subtractions, exponents, and roots that will give you the value of a transcendental number. √2 is not transcendental; *e* is.
The first disappointment involving the irrational numbers happened in Greece, around 500 bce. A rather brilliant man by the name of Hippasus, who was part of the school of Pythagoras, was studying roots. He worked out a geometric proof of the fact that √2 could not be written as a ratio of integers. He showed it to his teacher, Pythagoras. Pythagoras was convinced that numbers were clean and perfect, and could not accept the idea of irrational numbers. After analyzing Hippasus’s proof, and being unable to find any error in it, he became so enraged that he *drowned* poor Hippasus.
A few hundred years later, Eudoxus worked out the basic theory of irrationals; and it was published as a part of Euclid’s mathematical texts.
From that point, the study of irrationals pretty much disappeared for nearly 2000 years. It wasn’t until the 17th century that people started really looking at them again. And once again, it led to disappointment; but at least no one got killed this time.
Mathematicians had come up with yet another idea of what the perfection of math meant – this time using algebra. They decided that it made sense that algebra could describe all numbers; so you could write an equation to define any number using a polynomial with rational coefficients; that is, an equation using addition, subtraction, multiple, division, exponents, and roots.
Leibniz was studying algebra and numbers, and he’s the one who made the unfortunate discovery: lots of irrational numbers are algebraic; but lots of them *aren’t*. He discovered it indirectly, by way of the sin function. You see, sin(x) *can’t* be computed from x using algebra. There’s no algebraic function that can compute it. Leibniz called sin a transcendental function, since it went beyond algebra.
Building on the work of Leibniz, Liouville worked out that you could easily construct numbers that couldn’t be computed using algebra. For example, the constant named after Liouville consists of a string of 0s and 1s where 10-i is 1 if/f there is some integer n such that n!=i.
Not too long after, it was discovered that *e* was transcendental. The main reason for this being interesting is because up until that point, every transcendental number known was *constructed*; *e* is a natural, unavoidable constant. Once *e* was shown irrational, others followed; in one neat side-note, π was shown to be transcendental *using e*. One of the properties that they discovered after the transcendence of *e* was that any transcendental number raised to a non-transcendental power was transcendental. Since e is *not* transcendental – it’s 1 – then π must be transcendental.
The final disappointment in this area came soon after; Cantor, studying the irrationals, came up with the infamous “Cantor’s diagonalization” argument, which shows that there are *more* transcendental numbers than there are algebraic ones. *Most* numbers are not only irrational; they’re transcendental.
What does it mean, and why does it matter?
Irrational and transcendental numbers are everywhere. Most numbers aren’t rational. Most numbers aren’t even algebraic. That’s a very strange notion: *we can’t write most numbers down*.
Even stranger, even though we know, per Cantor, that most numbers are transcendental, it’s *incredibly* difficult to prove that any particular number is transcendental. Most of them are; but we can’t even figure out *which ones*.
What does that mean? That our math-fu isn’t nearly as strong as we like to believe. Most numbers are beyond us.
Some interesting numbers that we know are either irrational or transcendental:
1. *e*: transcendental.
2. π: transcendental.
3. √2 : irrational, but algebraic,
4. √x, for all x that are not perfect squares are irrational.
5. log23 is irrational.
6. Ω, Chaitin’s constant, is transcendental.
What’s interesting is that we really don’t know very much about how transcendentals interact; and given the difficulty of proving that something is transcendental, even for the most well-known transcendentals, we don’t know much of what happens when you put them together. π+*e*; π×*e*; ππ; *e**e*, πe are all numbers that we *don’t know* if they;’re transcendental. In fact, for π+*e* (and π-*e*) we don’t even know if it’s *irrational*.
That’s the thing about these number. We have *such* a weak grasp of them that even things that seem like they should be easy and fundamental, *we don’t know how to do*.

Something Nifty: A Taste of Simple Continued Fractions

One of the annoying things about how we write numbers is the fact that we generally write things one of two ways: as fractions, or as decimals.
You might want to ask, “Why is that annoying?” (And in fact, that’s what I want you to ask, or else there’s no point in my writing the rest of this!)
It’s annoying because both fractions and decimals can both only describe
*rational* numbers – that is, numbers that are a perfect ratio of two integers. And *most* numbers aren’t rational.
But it’s even more annoying than that: if you use decimals, then there are lots of rational numbers that you can’t represent exactly (i.e., 1/3); and if you use fractions, then it’s hard to express the idea that the fraction isn’t exact. (How do you write π as a fraction? 22/7 is a standard fractional approximation, but how do you say π, which is *almost* 22/7?)
So what do we do?
One of the answers is something called *continued fractions*. A continued fraction is a very neat thing. Here’s the idea: take a number where you don’t know it’s fractional form. Pick the nearest simple fraction 1/n that’s just a *little bit too large*. If you were looking at, say, 0.4, you’d take 1/2 – it’s a bit bigger. Now – you’ve got an approximation, but it’s too large. So that means that the demoninator is *too small*. So you add a correction to the denominator to make it a little bit bigger. And you just keep doing that – you approximate the correction to the denominator by adding a fraction to the denominator that’s just a little too big, and then you add a correction to *that* correction.
Let’s look at an example: 2.3456
1. It’s close to 2. So we start with 2 + (0.3456)
2. Now, we start approximating the fraction. The way we do that is we take the *reciprocal* of 0.3456 and take the integer part of it: 1/0.3456 rounded down is 2 . So we make it 2 + 1/2; and we know that the denominator is off by .3088.
3. We take the reciprocal again, and get 3, and it’s off by .736
4. We take the reciprocal again, and get 1, and it’s off by 0.264
5. Next we get 3, but it’s off by 208/1000
6. Then 4, off by 0.168
7. Then 5, off by .16
8. Then 6, off by .25
9. Then 4, off by 0; so now we have an exact result.
So as a continued fraction, 2.3456 looks like:
As a shorthand, continued fractions are normally written using a list notation inside of square brackets: the integer part, following by a semicolon, followed by a comma-separated list of the denominators of each of the fractions. So our continued fraction for 2.3456 would be written [2; 2, 3, 1, 3, 4, 5, 6, 4].
There’s a very cool visual way of understanding that algorithm. I’m not going to show it for 2.3456, because it’s a bit too much… But let’s look at something simpler: let’s try to write 9/16ths as a continued fraction. Basically, we make a grid consisting of 16 squares across by 9 squares up and down. We draw the *largest* square we can on that grid. The number of squares of that size that we can draw is the first digit of the continued fraction. Now there’s a rectangle left over: we draw the largest squares we can, there. And so on:


So the continued fraction for 9/16ths is [0; 1, 1, 3, 2].
Using continued fractions, we can represent *any* rational number in a finite-length continued fraction.
One incredibly nifty thing about this way of writing numbers is: what’s the reciprocal of 2.3456, aka [2; 2, 3, 1, 3, 4, 5, 6, 4]? It’s [0; 2, 2, 3, 1, 3, 4, 5, 6, 4]. We just add a zero to the front as the integer part, and push everything else one place to the right. If it was a zero in front, then we would have removed the zero and pulled everything else one place to the left.
Irrational numbers are represented as *infinite* continued fractions. So there’s an infinite series of correction fractions. You can understand it as a series of every-improving approximations of the value of the number. And you can define it using a recurrence relation (that is, a recursive formula) for how to get to the next digit.
For example, π = [3; 7, 15, 1, 292, 1, …]. If we work that out, the first six places of the continued fraction for pi work out in decimal form to 3.14159265392. That’s correct to the first *11* places in decimal. Not bad, eh?
A very cool property of the continued fractions is: square roots written as continued fractions *always repeat*. Even cooler? What’s the square root of two as a continued fraction? [1; 2, 2, 2, 2, …. ].

i : the Imaginary Number

After the amazing response to my post about zero, I thought I’d do one about something that’s fascinated me for a long time: the number *i*, the square root of -1. Where’d this strange thing come from? Is it real (not in the sense of real numbers, but in the sense of representing something *real* and meaningful)?
The number *i* has its earliest roots in some of the work of early arabic mathematicians; the same people who really first understood the number 0. But they weren’t quite as good with *i* as they were with 0: they didn’t really get it. They had some concept of roots of a cubic equation, where sometimes the tricks for finding the roots of the equation *just didn’t work*. They knew there was something going on, some way that the equation needed to have roots, but just what that really mean, they didn’t get.
Things stayed that way for quite a while. Various others, like the Greeks, encountered them in various ways when things didn’t work, but no one *really* grasped the idea that algebra required numbers that were more than just points on a one-dimensional number-line.
The next step was in Italy, over 1000 years later. During the 16th century, people were searching for solutions to the cubic equations – the same thing that the arabic scholars were looking at. But getting some of the solutions – even solutions to equations with real roots – required playing with the square root of -1 along the way. It was first really described by Rafael Bombelli in the context of the solutions to the cubic; but Bombello didn’t really think that they were *real*, *meaningful* numbers: it was viewed as a useful artifact of the process of solving the equations, but it wasn’t accepted.
It got its name as the *imaginary number* as a result of a diatribe by Rene Descartes, who believed it was a phony artifact of sloppy algebra. He did not accept that it had any meaning at all: thus it was an “imaginary” number.
They finally came into wide acceptance as a result of the work of Euler in the 18th century. Euler was probably the first to really, fully comprehend the complex number system created by the existence of *i*. And working with that, he discovered one of the most fascinating and bizzare mathematical discoveries ever, known as *Euler’s equation*. I have no idea how many years it’s been since I was first exposed to this, and I *still* have a hard time wrapping my head around *why* it’s true.

e = cos θ + i sin θ

And what *that* really means is:

e = -1

That’s just astonishing. The fact that there is *such* a close relationship between i, π, and e is just shocking to me.
What *i* does
Once the reality of *i* as a number was accepted, mathematics was changed irrevocably. Instead of the numbers described by algebraic equations being points on a line, suddenly they become points *on a plane*. Numbers are really *two dimensional*; and just like the integer “1” is the unit distance on the axis of the “real” numbers, “i” is the unit distance on the axis of the “imaginary” numbers. As a result numbers *in general* become what we call *complex*: they have two components, defining their position relative to those two axes. We generally write them as “a + bi” where “a” is the real component, and “b” is the imaginary component.


The addition of *i* and the resulting addition of complex numbers is a wonderful thing mathematically. It means that *every* polynomial equation has roots; in particular, a polynomial equation in “x” with maximum exponent “n” will always have exactly “n” complex roots.
But that’s just an effect of what’s really going on. The real numbers are *not* closed algebraically under multiplication and addition. With the addition of *i*, multiplicative algebra becomes closed: every operation, every expression in algebra becomes meaningful: nothing escapes the system of the complex numbers.
Of course, it’s not all wonderful joy and happiness once we go from real to complex. Complex numbers aren’t ordered. There is no < comparison for complex numbers. The ability to do meaningful inequalities evaporates when complex numbers enter the system in a real way.
What *i* means
But what do complex numbers *mean* in the real world? Do they really represent real phenomena? Or are they just a mathematical abstraction?
They’re very real. There’s one standard example that everyone uses: and the reason that we all use it is because it’s such a perfect example. Take the electrical outlet that’s powering your computer. It’s providing alternating current. What does that mean?
Well, the *voltage* – which (to oversimplify) can be viewed as the amount of force pushing the current – is complex. In fact, if you’ve got a voltage of 110 volts AC at 60 hz (the standard in the US), what that means is that the voltage is a number of magnitude “110”. If you were to plot the “real” voltage on a graph with time on the X axis and voltage of the Y, you’d see a sine wave:


But that’s not really accurate. If you grabbed the wire when the voltage is supposedly zero on that graph, *you’d still get a shock*! Take the moment marked “t1” on the graph above. The voltage at time t1 on the complex plane is a point at “110” on the real axis. At time t2, the voltage on the “real” axis is zero – but on the imagine axis it’s 110. In fact, the *magnitude* of the voltage is *constant*: it’s always 110 volts. But the vector representing that voltage *is rotating* through the complex plane.


You also see it in the Fourier transform: when we analyze sound using a computer, one of the tricks we use is decomposing a complex waveform (like a human voice speaking) into a collection of basic sine waves, where the sine waves added up equal the wave at a given point in time. The process by which we
do that decomposition is intimately tied with complex numbers: the fourier transform, and all of the analyses and transformations built on it are dependent on the reality of complex numbers (and in particular on the magnificent Euler’s equation up above).