Monthly Archives: November 2012

Let's Get Rid of Zero!

One of my tweeps sent me a link to a delightful pile of rubbish: a self-published “paper” by a gentleman named Robbert van Dalen that purports to solve the “problem” of zero. It’s an amusing pseudo-technical paper that defines a new kind of number which doesn’t work without the old numbers, and which gets rid of zero.

Before we start, why does Mr. van Dalen want to get rid of zero?

So what is the real problem with zeros? Zeros destroy information.

That is why they don’t have a multiplicative inverse: because it is impossible to rebuilt something you have just destroyed.

Hopefully this short paper will make the reader consider the author’s firm believe that: One should never destroy anything, if one can help it.

We practically abolished zeros. Should we also abolish simplifications? Not if we want to stay practical.

There’s nothing I can say to that.

So what does he do? He defines a new version of both integers and rational numbers. The new integers are called accounts, and the new rationals are called super-rationals. According to him, these new numbers get rid of that naughty information-destroying zero. (He doesn’t bother to define real numbers in his system; I assume that either he doesn’t know or doesn’t care about them.)

Before we can get to his definition of accounts, he starts with something more basic, which he calls “accounting naturals”.

He doesn’t bother to actually define them – he handwaves his way through, and sort-of defines addition and multiplication, with:

a + b == a concat b
a * b = a concat a concat a … (with b repetitions of a)

So… a sloppy definition of positive integer addition, and a handwave for multiplication.

What can we take from this introduction? Well, our author can’t be bothered to define basic arithmetic properly. What he really wants to say is, roughly, Peano arithmetic, with 0 removed. But my guess is that he has no idea what Peano arithmetic actually is, so he handwaves. The real question is, why did he bother to include this at all? My guess is that he wanted to pretend that he was writing a serious math paper, and he thinks that real math papers define things like this, so he threw it in, even though it’s pointless drivel.

With that rubbish out of the way, he defines an “Account” as his new magical integer, as a pair of “accounting naturals”. The first member of the pair is called the credit, and the second part is the debit. If the credit is a and the debit is b, then the account is written (a%b). (He used backslash instead of percent, but that caused trouble for my WordPress config, so I switched to percent-sign.)

a%b ++ c%d = (a+c)%(b+d)
a%b ** c%d = ((a*c)+(b*d))%((a*d)+(b*c))
– a%b = b%a

So… for example, consider 5*6. We need an “account” for each: We’ll use (7%2) for 5, and (9%3) for 6, just to keep things interesting. That gives us: 5*6 = (7%2)*(9%3) = (63+6)%(21+18) = 69%39, or 30 in regular numbers.

Yippee, we’ve just redefined multiplication in a way that makes us use good old natural number multiplication, only now we need to do it four times, plus 2 additions, to multiply two numbers! Wow, progress! (Of a sort. I suppose that if you’re a cloud computing provider, where you’re renting CPUs, then this would be progress.)

Oh, but that’s not all. See, each of these “accounts” isn’t really a number. The numbers are equivalence classes of accounts. So once you get the result, you “simplify” it, to make it easier to work with.

So make that 4 multiplications, 2 additions, and one subtraction. Yeah, this is looking nice, huh?
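To see just how much machinery this takes, here’s a minimal Python sketch of account arithmetic, assuming that accounting naturals can be modeled as positive Python ints. Python has no ++ or ** operators in this sense, so the sketch maps ++ onto + and ** onto *; the `Account` class and its `value` method are my own hypothetical names, not anything from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """A (credit, debit) pair of positive naturals, written credit%debit in the post."""
    credit: int
    debit: int

    def __post_init__(self):
        # No zero allowed: both members are "accounting naturals" (positive ints here).
        if self.credit < 1 or self.debit < 1:
            raise ValueError("credit and debit must be positive")

    def __add__(self, other):
        # a%b ++ c%d = (a+c)%(b+d)
        return Account(self.credit + other.credit, self.debit + other.debit)

    def __mul__(self, other):
        # a%b ** c%d = ((a*c)+(b*d))%((a*d)+(b*c)): four multiplications, two additions
        a, b, c, d = self.credit, self.debit, other.credit, other.debit
        return Account(a * c + b * d, a * d + b * c)

    def __neg__(self):
        # -(a%b) = b%a
        return Account(self.debit, self.credit)

    def value(self):
        # "Simplifying" to an ordinary integer needs the very subtraction the scheme hides.
        return self.credit - self.debit


five = Account(7, 2)
six = Account(9, 3)
product = five * six
print(product, product.value())
```

Running it prints `Account(credit=69, debit=39) 30`, matching the 5*6 example above. Note that `value` is where the subtraction sneaks back in, and that `Account(1, 1).value()` is plain old 0.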

So… what does it give us?

As far as I can tell, absolutely nothing. The author promises that we’re getting rid of zero, but it sure looks like this has zeros: 1%1 is zero, isn’t it? (And even if we pretend that there is no zero, Mr. van Dalen specifically doesn’t define division on accounts, so we don’t even get anything nice like closure.)

But here’s where it gets really rich. See, this is great, cuz there’s no zero. But as I just said, it looks like 1%1 is 0, right? Well it isn’t. Why not? Because he says so, that’s why! Really. Here’s a verbatim quote:

An Account is balanced when Debit and Credit are equal. Such a balanced Account can be interpreted as (being in the equivalence class of) a zero but we won’t.


But, according to him, we don’t actually get to see these glorious benefits of no zero until we add rationals. But not just any rationals, dum-ta-da-dum-ta-da! super-rationals. Why super-rationals, instead of account rationals? I don’t know. (I’m imagining a fraction with blue tights and a red cape, flying over a building. That would be a lot more fun than this little “paper”.)

So let’s look at the glory that is super-rationals. Suppose we have two accounts, e = a%b, and f = c%d. Then a “super-rational” is a ratio like e/f.

So… we can now define arithmetic on the super-rationals:

Addition: e/f +++ g/h = ((e**h)++(g**f))/(f**h); or in other words, pretty much exactly what we normally do to add two fractions, only now those multiplications are much more laborious.
Multiplication: e/f *** g/h = (e**g)/(f**h); again, standard rational mechanics.
Multiplication Inverse (aka Reciprocal): `e/f = f/e. (He introduces this hideous notation for no apparent reason: backquote is reciprocal. Why? I guess for the same reason that he did ++ and +++, aka no particularly good reason.)

So, how does this actually help anything?

It doesn’t.

See, zero is now not really properly defined anymore, and that’s what he wants to accomplish. We’ve got the simplified integer 0 (aka “balance”), defined as 1%1. We’ve got a whole universe of rational pseudo-zeros – 0/1, 0/2, 0/3, 0/4, all of which are distinct. In this system, (1%1)/(4%2) (aka 0/2) is not the same thing as (1%1)/(5%2) (aka 0/3)!
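Here’s the same kind of hypothetical Python sketch for super-rationals, again mapping +++ and *** onto Python’s + and *, and using made-up class and method names. It treats a super-rational as an unreduced pair of accounts, which shows the pseudo-zero problem directly: 0/2 and 0/3 come out as different pairs, but the ordinary cross-multiplication test for rational equality says they’re the same number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    credit: int  # positive natural
    debit: int   # positive natural

    def __add__(self, o):  # ++
        return Account(self.credit + o.credit, self.debit + o.debit)

    def __mul__(self, o):  # **
        a, b, c, d = self.credit, self.debit, o.credit, o.debit
        return Account(a * c + b * d, a * d + b * c)

    def value(self):
        return self.credit - self.debit


@dataclass(frozen=True)
class SuperRational:
    num: Account
    den: Account

    def __add__(self, o):  # +++: e/f +++ g/h = ((e**h)++(g**f))/(f**h)
        return SuperRational(self.num * o.den + o.num * self.den, self.den * o.den)

    def __mul__(self, o):  # ***: e/f *** g/h = (e**g)/(f**h)
        return SuperRational(self.num * o.num, self.den * o.den)

    def reciprocal(self):  # `e/f = f/e
        return SuperRational(self.den, self.num)

    def as_pair(self):
        # Simplify each account to an ordinary integer, but keep the fraction unreduced.
        return (self.num.value(), self.den.value())

    def equals_as_rational(self, o):
        # Ordinary rational equality: e/f == g/h iff e*h == g*f.
        return self.num.value() * o.den.value() == o.num.value() * self.den.value()


zero_over_two = SuperRational(Account(1, 1), Account(4, 2))
zero_over_three = SuperRational(Account(1, 1), Account(5, 2))
print(zero_over_two.as_pair(), zero_over_three.as_pair())  # (0, 2) (0, 3)
print(zero_over_two.equals_as_rational(zero_over_three))   # True
```

Keeping the pairs distinct is exactly what buys the “no zero” property; the moment you apply the usual equality test, the pseudo-zeros collapse back into one ordinary 0.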

The “advantage” of this is that if you work through this stupid arithmetic, you essentially get something sort-of close to 0/0 = 0. Kind-of. (There’s no rule for converting a super-rational back to an account, but if we assume that a denominator of 1 can be eliminated, we get 1/0 = 0.)

I’m guessing that he intends identities to apply, so: (4%1)/(1%1) = ((4%1)/(2%1)) *** `((2%1)/(1%1)) = ((4%1)/(2%1)) *** ((1%1)/(2%1)) = (1%1)/(2%1). So 1/0 = 0/1 = 0… If you do the same process with 2/0, you end up getting the result being 0/2. And so on. So we’ve gotten closure over division and reciprocal by getting rid of zero, and replacing it with an infinite number of non-equal pseudo-zeros.

What’s his answer to that? Of course, more hand-waving!

Note that we also can decide to simplify a Super-Rational as we would a Rational by calculating the Greatest Common Divisor (GCD) between Numerator and Denominator (and then divide them by their GCD). There is a catch, but we leave that for further research.

The catch that he just waved away? Exactly what I just pointed out – an infinite number of pseudo-0s, unless, of course, you admit that there is a zero, in which case they all collapse down to be zero… in which case this is all pointless.

Essentially, this is all a stupidly overcomplicated way of saying something simple, but dumb: “I don’t like the fact that you can’t divide by zero, and so I want to define x/0=0.”

Why is that stupid? Because dividing by zero is undefined for a reason: it doesn’t mean anything! The nonsense of it becomes obvious when you really think about identities. If 4/2 = 2, then 2*2=4; if x/y=z, then x=z*y. But mix zero into that: if 4/0 = 0, then 0*0=4. That’s nonsense.
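The identity argument is easy to check mechanically. Below is a small Python sketch of a hypothetical `total_div` that adopts the x/0 = 0 rule, tested against the identity “if x/y = z, then x = z*y”; the function names are mine, for illustration only.

```python
from fractions import Fraction


def total_div(x, y):
    """Hypothetical 'total' division that simply defines x/0 = 0."""
    if y == 0:
        return Fraction(0)
    return Fraction(x, y)


def inverse_identity_holds(x, y):
    # If x/y = z, then x = z*y has to hold for z to deserve the name "quotient".
    return total_div(x, y) * y == x


print(inverse_identity_holds(4, 2))  # True:  4/2 = 2 and 2*2 = 4
print(inverse_identity_holds(4, 0))  # False: 4/0 = 0, but 0*0 = 0, not 4
```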

You can also see it by rephrasing division in English. Asking “what is four divided by two” is asking “If I have 4 apples, and I want to distribute them into 2 equal piles, how many apples will be in each pile?”. If I say that with zero, “I want to distribute 4 apples into 0 piles, how many apples will there be in each pile?”: you’re not distributing the apples into piles. You can’t, because there are no piles to distribute them to. That’s exactly the point: you can’t divide by zero.

If you do as Mr. van Dalen did, and basically define x/0 = 0, you end up with a mess. You can handwave your way around it in a variety of ways – but they all end up breaking things. In the case of this account nonsense, you end up replacing zero with an infinite number of pseudo-zeros which aren’t equal to each other. (Or, if you define the pseudo-zeros as all being equal, then you end up with a different mess, where (2/0)/(4/0) = 2/4, or other weirdness, depending on exactly how you define things.)

The other main approach is another pile of nonsense I wrote about a while ago, called nullity. Zero is an inevitable necessity to make numbers work. You can hate the fact that division by zero is undefined all you want, but the fact is, it’s both necessary and right. Division by zero doesn’t mean anything, so mathematically, division by zero is undefined.

For every natural number N, there's a Cantor Crank C(N)

More crankery? Of course! What kind? What else? Cantor crankery!

It’s amazing that so many people are so obsessed with Cantor. Cantor just gets under people’s skin, because it feels wrong. How can there be more than one infinity? How can it possibly make sense?

As usual in math, it all comes down to the axioms. In most math, we’re working from a form of set theory – and the results of the axioms of set theory are quite clear: the way that we define numbers, the way that we define sizes, this is the way it is.

Today’s crackpot doesn’t understand this. But interestingly, the focus of his problem with Cantor isn’t the diagonalization. He thinks Cantor went wrong way before that: Cantor showed that the set of even natural numbers and the set of all natural numbers are the same size!

Unfortunately, his original piece is written in Portuguese, and I don’t speak Portuguese, so I’m going from a translation, here.

The Brazilian philosopher Olavo de Carvalho has written a philosophical “refutation” of Cantor’s theorem in his book “O Jardim das Aflições” (“The Garden of Afflictions”). Since the book has only been published in Portuguese, I’m translating the main points here. The enunciation of his thesis is:

Georg Cantor believed to have been able to refute Euclid’s fifth common notion (that the whole is greater than its parts). To achieve this, he uses the argument that the set of even numbers can be arranged in biunivocal correspondence with the set of integers, so that both sets would have the same number of elements and, thus, the part would be equal to the whole.

And his main arguments are:

It is true that if we represent the integers each by a different sign (or figure), we will have a (infinite) set of signs; and if, in that set, we wish to highlight with special signs, the numbers that represent evens, then we will have a “second” set that will be part of the first; and, being infinite, both sets will have the same number of elements, confirming Cantor’s argument. But he is confusing numbers with their mere signs, making an unjustifiable abstraction of mathematical properties that define and differentiate the numbers from each other.

The series of even numbers is composed of evens only because it is counted in twos, i.e., skipping one unit every two numbers; if that series were not counted this way, the numbers would not be considered even. It is hopeless here to appeal to the artifice of saying that Cantor is just referring to the “set” and not to the “ordered series”; for the set of even numbers would not be comprised of evens if its elements could not be ordered in twos in an increasing series that progresses by increments of 2, never of 1; and no number would be considered even if it could be freely swapped in the series of integeres.

He makes two arguments, but they both ultimately come down to: “Cantor contradicts Euclid, and his argument just can’t possibly make sense, so it must be wrong”.

The problem here is: Euclid, in “The Elements”, wrote several different collections of axioms. One of them was the following set of five rules:

  1. Things which are equal to the same thing are also equal to one another.
  2. If equals be added to equals, the wholes are equal.
  3. If equals be subtracted from equals, the remainders are equal.
  4. Things which coincide with one another are equal to one another.
  5. The whole is greater than the part.

The problem that our subject has is that Euclid’s axiom isn’t an axiom of mathematics. Euclid proposed it, but it doesn’t work in number theory as we formulate it. When we do math, the axioms that we start with do not include this axiom of Euclid.

In fact, Euclid’s axioms aren’t what modern math considers axioms at all. These aren’t really primitive ground statements. Most of them are statements that are provable from the actual axioms of math. For example, the second and third axioms are provable using the axioms of Peano arithmetic. The fourth one doesn’t appear to be a statement about numbers at all; it’s a statement about geometry. And in modern terms, the fifth one is either a statement about geometry, or a statement about measure theory.

The first argument is based on some strange notion of signs distinct from numbers. I can’t help but wonder if this is an error in translation, because the argument is so ridiculously shallow. Basically, it concedes that Cantor is right if we’re considering the representations of numbers, but then goes on to draw a distinction between representations (“signs”) and the numbers themselves, and argues that for the numbers, the argument doesn’t work. That’s the beginning of an interesting argument: numbers and the representations of numbers are different things. It’s definitely possible to make profound mistakes by confusing the two. You can prove things about representations of numbers that aren’t true about the numbers themselves. Only he doesn’t actually bother to make an argument beyond simply asserting that Cantor’s proof only works for the representations.

That’s particularly silly because Cantor’s proof that the even naturals and the naturals have the same cardinality doesn’t talk about representation at all. It shows that there’s a 1 to 1 mapping between the even naturals and the naturals. Period. No “signs”, no representations.

The second argument is, if anything, even worse. It’s almost the rhetorical equivalent of sticking his fingers in his ears and shouting “la la la la la”. Basically – he says that when you’re producing the set of even naturals, you’re skipping things. And if you’re skipping things, those things can’t possibly be in the set that doesn’t include the skipped things. And if there are things that got skipped and left out, well that means that it’s ridiculous to say that the set that included the left out stuff is the same size as the set that omitted the left out stuff, because, well, stuff got left out!!!

Here’s the point. Math isn’t about intuition. The properties of infinitely large sets don’t make intuitive sense. That doesn’t mean that they’re wrong. Things in math are about formal reasoning: starting with a valid inference system and a set of axioms, and then using the inference to reason. If we look at set theory, we use the axioms of ZFC. And using the axioms of ZFC, we define the size (or, technically, the cardinality) of sets. Using that definition, two sets have the same cardinality if and only if there is a one-to-one mapping between the elements of the two sets. If there is, then they’re the same size. Period. End of discussion. That’s what the math says.

Cantor showed, quite simply, that there is such a mapping:

$$\{\, (i \rightarrow i \times 2) \mid i \in \mathbb{N} \,\}$$

There it is. It exists. It’s simple. It works, by the axioms of Peano arithmetic and the axiom of comprehension from ZFC. It doesn’t matter whether it fits your notion of “the whole is greater than the part”. The entire proof is that set comprehension. It exists. Therefore the two sets have the same size.
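No finite computation can prove anything about an infinite set (the proof is the comprehension itself), but a few lines of Python make the two directions of the mapping concrete. This sketch assumes the naturals start at 0 and checks only the first N of them; N = 1000 and the helper names are arbitrary choices of mine.

```python
def to_even(i):
    # The mapping i -> 2i, from the naturals to the even naturals.
    return 2 * i


def from_even(e):
    # Its inverse, e -> e/2, defined on the even naturals.
    return e // 2


N = 1000
naturals = range(N)
evens = [to_even(i) for i in naturals]

# Distinct naturals map to distinct evens (one-to-one) ...
assert len(set(evens)) == N
# ... every even number below 2N is hit (onto, within this prefix) ...
assert sorted(evens) == list(range(0, 2 * N, 2))
# ... and from_even undoes to_even, so nothing is "left over" on either side.
assert all(from_even(to_even(i)) == i for i in naturals)
print("pairs checked:", N)
```

Nothing here cares about “signs” or representations: the pairing is between the numbers themselves, which is the whole content of Cantor’s argument.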

Debunking Two Nate Silver Myths

I followed our election pretty closely. My favorite source of information was Nate Silver. He’s a smart guy, and I love the analysis that he does. He’s using solid math in a good way to produce excellent results. But in the aftermath of the election, I’ve seen a lot of bad information going around about him, his methods, and his result.

First: I keep seeing proclamations that “Nate Silver proves that big data works”.


There is nothing big data about Nate’s methods. He’s using straightforward Bayesian methods to combine data, and the number of data points is remarkably small.

Big data is one of the popular jargon keywords that people use to appear smart. But it does actually mean something. Big data is using massive quantities of information to find patterns: using a million data points isn’t really big data. Big data means terabytes of information, and billions of datapoints.

When I was at Google, I did log analysis. We ran thousands of machines every day on billions of log records (I can’t say the exact number, but it was in excess of 10 billion records per day) to extract information. It took a data center with 10,000 CPUs running full-blast for 12 hours a day to process a single day’s data. Using that data, we could extract some obvious things – like how many queries per day for each of the languages that Google supports. We could also extract some very non-obvious things that weren’t explicitly in the data, but that were inferrable from the data – like probable network topologies of the global internet, based on communication latencies. That’s big data.

For another example, look at this image produced by some of my coworkers. At foursquare, we get about five million points of checkin data every day, and we’ve got a total of more than 2 1/2 billion data points. By looking at average checkin densities, and then comparing that to checkin densities after the hurricane, we can map out precisely where in the city there was electricity, and where there wasn’t. We couldn’t do that by watching one person, or a hundred people. But by looking at the patterns in millions and millions of records, we can. That is big data.

This doesn’t take away from Nate’s accomplishment in any way. He used data in an impressive and elegant way. The fact is, he didn’t need big data to do this. Elections are determined by aggregate behavior, and you just don’t need big data to predict them. The data that Nate used was small enough that a person could do the analysis of it with paper and pencil. It would be a huge amount of work to do by hand, but it’s just nowhere close to the scale of what we call big data. And trying to do big data would have made it vastly more complicated without improving the result.

Second: there are a bunch of things like this.

The point that many people seem to be missing is that Silver was not simply predicting who would win in each state. He was publishing the odds that one or the other candidate would win in each statewide race. That’s an important difference. It’s precisely this data, which Silver presented so clearly and blogged about so eloquently, that makes it easy to check on how well he actually did. Unfortunately, these very numbers also suggest that his model most likely blew it by paradoxically underestimating the odds of President Obama’s reelection while at the same time correctly predicting the outcomes of 82 of 83 contests (50 state presidential tallies and 32 of 33 Senate races).

Look at it this way, if a meteorologist says there a 90% chance of rain where you live and it doesn’t rain, the forecast wasn’t necessarily wrong, because 10% of the time it shouldn’t rain – otherwise the odds would be something other than a 90% chance of rain. One way a meteorologist could be wrong, however, is by using a predictive model that consistently gives the incorrect probabilities of rain. Only by looking a the odds the meteorologist gave and comparing them to actual data could you tell in hindsight if there was something fishy with the prediction.

Bzzt. Sorry, wrong.

There are two main ways of interpreting probability data: frequentist, and Bayesian.

In a frequentist interpretation, when you say that an outcome of an event has an X% probability of occurring, you’re saying that if you were to run an infinite series of repetitions of the event, then on average, the outcome would occur in X out of every 100 events.
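The frequentist reading can be simulated directly: repeat an event many times and watch the observed frequency settle toward the stated probability. A minimal Python sketch, where p = 0.3 and the trial counts are arbitrary, and the seed is fixed only so that the run is reproducible:

```python
import random


def observed_frequency(p, trials, seed=0):
    """Fraction of simulated repetitions in which an event with probability p occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


for n in (100, 10_000, 1_000_000):
    print(n, observed_frequency(0.3, n))
```

The more repetitions, the closer the printed frequency tends to get to 0.3; the “infinite series” in the definition is the limit of that process.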

The Bayesian interpretation doesn’t talk about repetition or observation. What it says is: for any specific event, it will have one outcome. There is no repetition. But given the current state of information available to me, I can have a certain amount of certainty about whether or not the event will occur. Saying that I assign probability P% to an event doesn’t mean that I expect my prediction to fail (100-P)% of the time. It just means that given the current state of my knowledge, I expect a particular outcome, and the information I know gives me that degree of certainty.

Bayesian statistics and probability is all about state of knowledge. The fundamental, defining theorem of Bayesian statistics is Bayes’ theorem, which tells you, given your current state of knowledge and a new piece of information, how to update your knowledge based on what the new information tells you. Getting more information doesn’t change anything about whether or not the event will occur: it will occur, and it will have either one outcome or the other. But new information can allow you to improve your prediction and your certainty of that prediction’s correctness.
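In symbols, with H standing for the hypothesis (say, “it will rain today”) and E for the new piece of evidence, Bayes’ theorem gives the updated certainty P(H | E) in terms of the prior certainty P(H) and how likely the evidence is with and without H:

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E \mid H)\, P(H) + P(E \mid \neg H)\, P(\neg H)}
```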

The author that I quoted above is being a frequentist. In another section of his article, he’s more specific:

…The result is P= 0.199, which means there’s a 19.9% chance that it rained every day that week. In other words, there’s an 80.1% chance it didn’t rain on at least one day of the week. If it did in fact rain everyday, you could say it was the result of a little bit of luck. After all, 19.9% isn’t that small a chance of something happening.

That’s a frequentist interpretation of the probability – which makes sense, since as a physicist, the author is mainly working with repeated experiments, which is a great place for frequentist interpretation. But looking at the same data, a Bayesian would say: “I have a 19.9% certainty that it will rain today”. Then they’d go look outside, see the clouds, and say “Ok, so it looks like rain – that means that I need to update my prediction. Now I’m 32% certain that it will rain”. Note that nothing about the weather has changed: it’s not true that before looking at the clouds, 80.1 percent of the time it wouldn’t rain, and after looking, that changed. The actual fact of whether or not it will rain on that specific day didn’t change.
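Here’s what that update looks like as arithmetic, in a minimal Python sketch of Bayes’ theorem. The two likelihoods (how often you’d see those clouds on a rainy day versus a dry day) are made-up numbers for illustration, so the result won’t match the 32% above; only the 19.9% prior comes from the quoted example.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods, via Bayes' theorem."""
    joint_true = p_evidence_if_true * prior          # P(E | H) * P(H)
    joint_false = p_evidence_if_false * (1 - prior)  # P(E | not H) * P(not H)
    return joint_true / (joint_true + joint_false)


# Hypothetical likelihoods: clouds like these are assumed to show up on 90% of
# rainy days and on 40% of dry days.
posterior = bayes_update(0.199, 0.9, 0.4)
print(round(posterior, 3))
```

With these invented likelihoods, seeing the clouds moves the certainty from 19.9% to roughly 36%. Nothing about the day’s weather changed; only the state of knowledge did.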

Another way of looking at this is to say that a frequentist believes that a given outcome has an intrinsic probability of occurring, and that our attempts to analyze it just bring us closer to the true probability; whereas a Bayesian says that there is no such thing as an intrinsic probability, because every event is different. All that changes is our ability to make predictions with confidence.

One last metaphor, and I’ll stop. Think about playing craps, where you’re rolling two six-sided dice. For a particular die, a frequentist would say “A fair die has a 1 in 6 chance of coming up with a 1”. A Bayesian would say “If I don’t know anything else, then my best guess is that I can be about 16.7% certain that a 1 will result from a roll.” The result is the same – but the reasoning is different. And because of the difference in reasoning, you can produce different predictions.

Nate Silver’s predictions of the election are a beautiful example of Bayesian reasoning. He watched daily polls, and each time a new poll came out, he took the information from that poll, weighted it according to the historical reliability of that poll in that situation, and then used that to update his certainty. So based on his data, Nate was 90% certain that his prediction was correct.
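The general idea of reliability-weighted updating can be sketched in a few lines. To be clear, Silver’s actual model is far more elaborate and isn’t spelled out here; this toy Python version just assumes each poll is a normally distributed, noisy reading of a candidate’s true vote share with a known variance (bigger variance for a historically less reliable poll), and all the numbers are invented.

```python
from math import erf, sqrt


def update(mean, variance, poll_value, poll_variance):
    """Fold one poll into a normal estimate, weighting each by precision (1/variance)."""
    w_prior = 1.0 / variance
    w_poll = 1.0 / poll_variance
    new_variance = 1.0 / (w_prior + w_poll)
    new_mean = new_variance * (w_prior * mean + w_poll * poll_value)
    return new_mean, new_variance


# Hypothetical numbers: a vague starting estimate of a vote share (in %), then
# three polls given as (reported share, variance reflecting past reliability).
mean, variance = 50.0, 25.0
polls = [(52.0, 4.0), (49.0, 9.0), (51.5, 4.0)]
for poll_value, poll_variance in polls:
    mean, variance = update(mean, variance, poll_value, poll_variance)
    print(f"estimate {mean:.2f}%, std dev {sqrt(variance):.2f}")

# Certainty that the true share is above 50%, under the final normal estimate.
p_above_50 = 0.5 * (1 + erf((mean - 50.0) / sqrt(2 * variance)))
print(f"certainty of a majority: {p_above_50:.0%}")
```

Because each step is a precision-weighted average, folding the polls in one at a time gives the same answer as weighting them all at once; each new poll just sharpens the estimate and the certainty attached to it.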

Did Global Warming Cause Hurricane Sandy?

I’ve been trapped in post-storm hell (no power, no heat for 10 days. Now power is back, but still no internet at home, which is frustrating, but no big deal), and so I haven’t been able to post this until now.

I’ve been getting a bunch of questions from people in response to an earlier post of mine about global warming, where I said that we can’t blame specific weather events on global warming. The questions come down to: “Can we say that hurricane Sandy and yesterday’s NorEaster were caused by global warming?”

I try to be really careful about things like this. Increasing the amount of energy in the environment definitely has an effect on weather patterns. But for the most part, that effect is statistical. That is, we can’t generally say that a specific extreme weather event wouldn’t have happened without global warming. We can just say that we expect extreme weather events to become much more common.

But what about hurricane Sandy?

Yes, it was caused by global warming.

How can I say that so definitively?

There were a lot of observations made around this particular hurricane. What made it such a severe event is a combination of three primary factors.

  • The ocean water over which it developed is warmer than historically normal. Warm water is, simply, fuel for hurricanes. We know this from years of observation. And we know that the water was warmer, by a couple of degrees, than it would normally be in this season. This is a direct cause for the power of the storm, for the fact that as it moved north, it continued to become stronger rather than weakening. Those warm waters are, by definition, global warming: they’re one of the things we measure when we’re measuring global temperature trends.
  • Hurricane Sandy took a pretty dramatic left turn as it came north, which is what swept it into the east coast of the US. That is a very unusual trajectory. Why did it do that? Because of an unusual weather pattern in the Northeast Atlantic, called a negative North Atlantic oscillation (-NAO). And where did the -NAO come from? Our best models strongly suggest that it resulted, at least in part, from icemelt from Greenland. This is less certain than the first factor, but still likely enough that we can be pretty confident.
  • Hurricane Sandy merged with another weather front as it came inland, which intensified it as it came ashore. This one doesn’t have any direct relation to global warming: the front that it merged with is typical autumn weather on the east coast.

So of the three factors that caused the severe hurricane, one of them is absolutely, undeniably global warming. The second is very probably linked to global warming. And the third isn’t.

This is important to understand. We shouldn’t make broad statements about causation when we can’t prove them. But we also shouldn’t refrain from making definite statements about causation when we can.

The NorEaster that we’re now recovering from falls into that first class. We simply don’t know if it would have happened without the hurricane. The best models that I’ve seen suggest that it probably wouldn’t have happened without the effects of the earlier hurricane, but it’s just not certain enough to draw a definitive conclusion.

But the Hurricane? There is absolutely no way that anyone can honestly look at the data, and conclude that it was not caused by warming. Anyone who says otherwise is, quite simply, a liar.