A White Boy's Observations of Sexism and the Adria Richards Fiasco

I’ve been watching the whole Adria Richards fiasco with a sense of horror and disgust. I’m finally going to say something, but for the most part, it’s going to be indirect.

See, I’m a white guy, born as a member of an upper-middle-class white family. That means that I’m awfully lucky. I’m part of the group that is, effectively, treated as the normal, default person in most settings. I’m also a guy who’s married to a Chinese woman, and who’s learned a bit about how utterly clueless I am.

Here’s the fundamental issue that underlies all of this, and many similar stories: our society is deeply sexist and racist. We are all raised in an environment in which men’s voices are more important than women’s. It’s so deeply ingrained in us that we don’t even notice it.

What this means is that we are all, to some degree, sexist and racist. When I point this out, people get angry. We also have learned that sexism is a bad thing. So when I say to someone that you are sexist, it’s really easy to interpret that as me saying that you’re a bad person: sexism is bad, so if I’m sexist, then I’m bad.

But we really can’t get away from this reality. We are sexists. For many of us, we’re not deliberately sexist, we’re not consciously sexist. But we are sexist.

Here’s a really interesting experiment to try, if you have the opportunity. Visit an elementary school classroom. First, just watch the teacher interact with the students while they’re teaching. Don’t try to count interactions. Just watch. See if you think that any group of kids is getting more attention than any other. Most of the time, you probably will get a feeling that they’re paying roughly equal attention to the boys and the girls, or to the white students and the black students. Then, come back on a different day, and count the number of times that they call on boys versus calling on girls. I’ve done this, after having the idea suggested by a friend. The result was amazing. I really, honestly believed that the teacher was treating her students (the teacher I did this with was a woman) equally. But when I counted? She was calling on boys twice as often as girls.

This isn’t an unusual outcome. Do some looking online for studies of classroom gender dynamics, and you’ll find lots of structured observations that come to the same conclusion.

My own awakening about these kinds of things came from my time working at IBM. I’ve told this first story before, but it’s really worth repeating.

One year, I managed the summer internship programs for my department. The previous summer, IBM Research had wound up with an internship class consisting of 99% men. (That’s not an estimate: that’s a real number. That year, IBM Research hired 198 summer interns, of whom 2 were women.) For a company like IBM, numbers like that are scary. Ignoring all of the social issues of excluding potentially great candidates, numbers like that can open the company up to gender discrimination lawsuits!

So my year, they decided to encourage the hiring of more diverse candidates. The way that they did that was by allocating each department a budget for summer interns. They could only hire up to their budgeted number of interns, with one exception: women and minority candidates didn’t count against the budget.

When the summer program hiring opened, my department was allocated a budget of six students. All six slots were gone within the first day. Every single one of them went to a white, American, male student.

The second day, the guy across the hall from me came with a resume for a student he wanted to hire. This was a guy who I really liked, and really respected greatly. He was not, by any reasonable measure, a bad guy – he was a really good person. Anyway, he had this resume, for yet another guy. I told him the budget was gone, but if he could find a good candidate who was either a woman or minority, we could hire them. He exploded, ranting about how we were being sexist, discriminating against men. He just wanted to hire the best candidate for the job! We were demanding that he couldn’t hire the best candidate, he had to hire someone less qualified, in order to satisfy some politically correct bureaucrat! There was nothing I could do, so eventually he stormed out.

Three days later, he came back to my office with another resume. He was practically bouncing off the walls he was so happy. “I found another student to hire. She’s even better than the guy I originally came to you with! She’s absolutely perfect for the job!” We hired her.

I asked him why he didn’t find her before. He had no answer – he didn’t know why he didn’t find her resume in his first search.

This was a pattern that I observed multiple times that year. Looking through a stack of resumes, without deliberately excluding women, somehow, all of the candidates with female names wound up back in the slushpile. I don’t think that anyone was deliberately saying “Hmm, Jane, that’s a woman’s name, I don’t want to hire a woman”. But I do think that in the process of looking through a file containing 5000 resumes, trying to select which ones to look at, on an unconscious level, they were more likely to look carefully at a candidate with a male name, because we all learn, from a young age, that men are smarter than women, men are more serious than women, men are better workers than women, men are more likely to be technically skilled than women. Those attitudes may not be part of our conscious thought, but they are part of the cultural background that gets drummed into us by school, by books, by movies, by television, by commercials.

As I said, that was a real awakening for me.

I was talking about this with my next-door office neighbor, who happened to be one of the only two women in my department (about 60 people) at the time. She was shocked that I hadn’t noticed this before. So she pointed out to me that in meetings, she could say things, and everyone would ignore it, but if a guy said the same thing, they’d get listened to. We’d been in many meetings together, and I’d never noticed this!

So I started paying attention, and she was absolutely right.

What happened next is my second big awakening.

I started watching this in meetings, and when people brushed over something she’d said, I’d raise my voice and say “X just suggested blah, which I think is a really good idea. What about it?” I wanted to help get her voice listened to.

She was furious at me. This just blew my mind. I was really upset at her at first. Dammit, I was trying to help, and this asshole was yelling at me for it! She’d complained about how people didn’t listen to her, and now when I was trying to help get her listened to, she was complaining again!

What I realized after I calmed down and listened to her was that I was wrong. I hadn’t spoken to her about doing it. I didn’t understand what it meant. But the problem was, people didn’t take her seriously because she was a woman. People might listen to me, because I’m also a white guy. But when I spoke for her, I wasn’t helping. When a man speaks on behalf of a woman, we’re reinforcing the idea that a woman’s voice isn’t supposed to be heard. I was substituting my man’s voice for her woman’s, and by doing that, I was not just not helping her, but I was actively hurting, because the social interpretation of my action was that “X can’t speak for herself”. And more, I learned that by taking offense at her, for pointing out that I had screwed up, I was definitely in the wrong – that I had an instinct for reacting wrong.

What I learned, gradually, from watching things like this, from becoming more sensitive and aware, and by listening to what women said, was that I was completely clueless about this kind of thing.

The fact is, I constantly benefit from a very strong social preference. I don’t notice that. Unless I’m really trying hard to pay attention, I’m not aware of all of the benefits that I get from that. I don’t notice all of the times when I’m getting a benefit. Worse, I don’t notice all of the times when my behavior is asserting that social preference as my right.

It’s very easy for a member of an empowered majority to just take things for granted. We see the way that we are treated as a default, and assume that everyone is treated the same way. We don’t perceive that we are being treated preferentially. We don’t notice that the things that offend us are absolutely off limits to everyone, but that things that we do to offend others are accepted as part of normal behavior. Most importantly, we don’t notice when our behavior is harmful to people who aren’t part of our empowered group. And when we do offend someone who isn’t part of the empowered majority, we take offense at the fact that they’re offended. Because they’re saying that we did something bad, and we know that we aren’t bad people!

The way that this comes back to the whole Adria Richards fiasco is very simple. Many people have looked at what happened at PyCon, and said something like “She shouldn’t have tweeted their picture”, or “She shouldn’t have been offended, they didn’t do anything wrong”, or “She should have just politely spoken to them”.

I don’t know whether what she did was right or not. I wasn’t there. I didn’t hear the joke that the guys in question allegedly told. What I do know is that for a member of the minority out-group, there is frequently no action that will be accepted as “right” if it includes the assertion that the majority did something offensive.

I’ve seen this phenomenon very directly myself, not in the context of sexism, but in terms of antisemitism. There’s an expression that I’ve heard multiple times in the northeast US, to talk about haggling over the price of a car: “jewing the salesman down”. I absolutely find that extremely offensive. And I’ve called people out on it. There is no response that’s actually acceptable.

If I politely say “You know, that’s relying on a stereotype of me and my ancestors that’s really hurtful”, the response is: “Oh, come on, it’s just harmless. I’m not talking about you, it’s just a word. You’re being oversensitive”. If I get angry, the response is “You Jews are so strident”. If I go to an authority figure in the setting, “You Jews are so passive aggressive, why couldn’t you just talk to me?”. No matter what I do, I’m wrong. Women deal with this every day, only they’re in a situation where the power dynamic is even less in their favor.

That’s the situation that women – particularly women in tech – find themselves in every day. We are sexist. We do mistreat women in tech every day, without even knowing that we’re doing it. And we’re very likely to take offense if they mention that we did something wrong. Because we know that we’re good people, and since we aren’t deliberately doing something bad, they must be wrong.

For someone in Adria Richards’ situation at PyCon, there is no course of action that can’t be taken wrong. As a woman hearing the joke in question, she certainly knew whether or not it was offensive to her. But once she’d heard something offensive, there was nothing she could do that someone couldn’t turn into a controversy.

Was the joke offensive? We don’t know what, specifically, he said. The only facts that we’re certain of are that in her judgement, it was offensive, and that the authorities at PyCon agreed, and asked the gentleman in question to apologize.

Did the guy who made the joke deserve to be fired? I don’t know. If this stupid joke were the first time he’d ever done something wrong, then he didn’t deserve to be fired. But we don’t know what his history is like. I know how hard it is to hire skilled engineers, so I’m very skeptical that any company would fire someone over one minor offense. It’s possible that his company has a crazy hair-trigger HR department. But it’s also possible that there’s background that we don’t know about. That he’s done stuff before, and been warned. If that’s the case, then his company could have decided that this was the last straw.

Did Adria Richards deserve to be fired? Almost certainly not. We know more about her case than we do about the guy who told the joke. We know that her company fired her over this specific incident, because in their announcement of her firing, they told us the reason. They didn’t cite any past behavior – they just specifically cited this incident and its aftermath as the reason for firing her. It’s possible that there’s a history here that we don’t know about, that she’d soured relations with customers of her company in incidents other than this, and that this was a last straw. But it doesn’t seem likely, based on the facts that we’re aware of.

Did either of them deserve to be threatened? Absolutely not.

Genius Continuum Crackpottery

This post was revised on June 25, 2014. Mr. Wince has been threatening to sue me for libel. I don’t think that that’s right, but one thing that he’s complained about is correct. I called him a high school dropout. In his article, Wince refers to “when he dropped out of high school”, but in the same sentence, he goes on to say that he dropped out to attend community college. Calling him a dropout is a cheap shot, which I shouldn’t have included, and for that, I apologize. I’ve removed the line from the post. I still think that his math is laughably wrong, but I shouldn’t have called him a dropout.

There’s a lot of mathematical crackpottery out there. Most of it is just pointless and dull. People making the same stupid mistakes over and over again, like the endless repetitions of the same-old supposed refutations of Cantor’s diagonalization.

After you eliminate that, you get reams of insanity – stuff which is simply so incoherent that it doesn’t make any sense. This kind of thing is usually word salad – words strung together in ways that don’t make sense.

After you eliminate that, sometimes, if you’re really lucky, you’ll come across something truly special. Crackpottery as utter genius. Not genius in a good way, like they’re an outsider genius who discovered something amazing, but genius in the worst possible way, where someone has created something so bizarre, so overwrought, so utterly ridiculous that it’s a masterpiece of insane, delusional foolishness.

Today, we have an example of that: Existics!. This is a body of work by a guy named Gavin Wince with truly immense delusions of grandeur. Pomposity on a truly epic scale!

I’ll walk you through just a tiny sample of Mr. Wince’s genius. You can go look at his site to get more, and develop a true appreciation for this. He doesn’t limit himself to mere mathematics: math, physics, biology, cosmology – you name it, Mr. Wince has mastered it and written about it!

The best of his mathematical crackpottery is something called C3: the Canonized Cardinal Continuum. Mr. Wince has created an algebraic solution to the continuum hypothesis, and along the way, has revolutionized number theory, algebra, calculus, real analysis, and god only knows what else!

Since Mr. Wince believes that he has solved the continuum hypothesis, let me remind you of what that is:

  1. If you use Cantor’s set theory to explore numbers, you get to the uncomfortable result that there are different sizes of infinity.
  2. The smallest infinite cardinal number is called ℵ₀, and it’s the size of the set of natural numbers.
  3. There are cardinal numbers larger than ℵ₀. The first one larger than ℵ₀ is ℵ₁.
  4. We know that the set of real numbers is the size of the powerset of the natural numbers, 2^ℵ₀, which is larger than the set of the naturals.
  5. The question that the continuum hypothesis tries to answer is: is the size of the set of real numbers equal to ℵ₁? That is, is there a cardinal number between ℵ₀ and 2^ℵ₀?
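
In standard notation, the setup and the hypothesis can be written roughly like this. This is only a restatement of the list above, assuming ZFC so that ℵ₁ really is the next cardinal after ℵ₀; the name S is just a placeholder for an arbitrary set.

```latex
\aleph_0 = |\mathbb{N}| \;<\; 2^{\aleph_0} = |\mathcal{P}(\mathbb{N})| = |\mathbb{R}|

\text{CH:}\quad 2^{\aleph_0} = \aleph_1
\quad\Longleftrightarrow\quad
\text{there is no set } S \text{ with } \aleph_0 < |S| < 2^{\aleph_0}
```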

The continuum hypothesis was “solved” in 1963. In 1940, Gödel showed that you couldn’t disprove the continuum hypothesis using ZFC. In 1963, another mathematician, Paul Cohen, showed that it couldn’t be proven using ZFC. So a hypothesis which is about set theory can be neither proven nor disproven using set theory. It’s independent of the axioms of set theory. You can choose to take the continuum hypothesis as an axiom, or you can choose to take the negation of the continuum hypothesis as an axiom: either choice is consistent and valid!

It’s not a happy solution. But it’s solved in the sense that we’ve got a solid proof that you can’t prove it’s true, and another solid proof that you can’t prove it’s false. That means that given ZFC set theory as a basis, there is no proof either way that doesn’t set it as an axiom.

But… Mr. Wince knows better.

The set of errors that Wince makes is really astonishing. This is really seriously epic crackpottery.

He makes it through one page without saying anything egregious. But then he makes up for it on page 2, by making multiple errors.

First, he pulls an Escultura:

x_1 = 1/2^1 = 1/2 = 0.5
x_2 = 1/2^1 + 1/2^2 = 1/2 + 1/4 = 0.75
x_3 = 1/2^1 + 1/2^2 + 1/2^3 = 1/2 + 1/4 + 1/8 = 0.875

At the end or limit of the infinite sequence, the final term of the sequence is 1.0

In this example we can see that as the number of finite sums of the sequence approaches the limit infinity, the last term of the sequence equals one.
x_n = 1.0
If we are going to assume that the last term of the sequence equals one, it can be deduced that, prior to the last term in the sequence, some finite sum in the series occurs where:
x_{n-1} = 0.999…
x_{n-1} = 1/2^1 + 1/2^2 + 1/2^3 + 1/2^4 + … + 1/2^{n-1} = 0.999…
Therefore, at the limit, the last term of the series of the last term of the sequence would be the term, which, when added to the sum 0.999… equals 1.0.

There is no such thing as the last term of an infinite sequence. Even if there were, the number 0.999…. is exactly the same as 1. It’s a notational artifact, not a distinct number.
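
If you want to see this for yourself, here’s a minimal sketch that computes the partial sums exactly, using Python’s fractions module. The function names are mine, not anything from Wince’s paper. Every finite partial sum falls short of 1 by exactly 1/2^n, so there’s no n at which a “last term” equal to 1 shows up. The limit is 1, but the limit isn’t a term of the sequence.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of 1/2 + 1/4 + ... + 1/2**n."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

def gap_to_one(n):
    """How far the n-th partial sum falls short of 1."""
    return 1 - partial_sum(n)

# Print n, the exact partial sum, and the remaining gap.
for n in (1, 2, 3, 10, 50):
    print(n, partial_sum(n), gap_to_one(n))
```

The gap shrinks as n grows, but it’s never zero for any finite n.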

But this is the least of his errors. For example, the first paragraph on the next page:

The set of all countable numbers, or natural numbers, is a subset of the continuum. Since the set of all natural numbers is a subset of the continuum, it is reasonable to assume that the set of all natural numbers is less in degree of infinity than the set containing the continuum.

We didn’t need to go through the difficulty of Cantor’s diagonalization! We could have just blindly asserted that it’s obvious!

Or actually… The fact that there are multiple degrees of infinity is anything but obvious. I don’t know anyone who wasn’t surprised the first time they saw Cantor’s proof. It’s a really strange idea that there’s something bigger than infinity.
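
For a feel of what the diagonalization actually does, here’s a finite toy version. It’s only a sketch: the real proof runs the same trick against an infinite list of infinite sequences, and the name diagonal_escape is my own. Given n binary sequences of length n, flipping the diagonal produces a sequence that can’t be any of the rows.

```python
def diagonal_escape(rows):
    """Given n binary sequences of length n, build one that differs
    from row i in position i, so it cannot equal any row."""
    return [1 - rows[i][i] for i in range(len(rows))]

rows = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
escaped = diagonal_escape(rows)
print(escaped)          # [1, 0, 1, 1]
print(escaped in rows)  # False
```

No matter which rows you start with, the escaped sequence disagrees with row i at position i. That’s the step that takes real work, and it’s exactly the step that the quoted paragraph skips.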

Moving on… the real heart of his stuff is built around some extremely strange notions about infinite and infinitesimal values.

Before we even look at what he says, there’s an important error here which is worth mentioning. What Mr. Wince is trying to do is talk about the continuum hypothesis. The continuum hypothesis is a question about the cardinality of the set of real numbers and the set of natural numbers. Neither infinites nor infinitesimals are part of either set.

Infinite values come into play in Cantor’s work: the cardinality of the natural numbers and the cardinality of the reals are clearly infinite cardinal numbers. But ℵ₀, the smallest infinite cardinal, is not a member of either set.

Infinitesimals are fascinating. You can reconstruct differential and integral calculus without using limits by building in terms of infinitesimals. There’s some great stuff in surreal numbers playing with infinitesimals. But infinitesimals are not real numbers. You can’t reason about them as if they were members of the set of real numbers, because they aren’t.

Many of his mistakes come from ignoring this: he reasons about infinite and infinitesimal values as if they were real numbers.

For example, he’s got a very strange idea that infinites and infinitesimals don’t have fixed values, but that their values cover a range. The way that he gets to that idea is by asserting the existence of infinity as a specific, numeric value, and then using it in algebraic manipulations, like taking the “infinityth root” of a real number.

For example, on his way to “proving” that infinitesimals have this range property that he calls “perambulation”, he defines a value that he calls κ:

 $\sqrt[\infty]{\infty} = 1 + \kappa$

In terms of the theory of numbers, this is nonsense. There is no such thing as an infinityth root. You can define an Nth root, where N is a real number, just like you can define an Nth power – exponents and roots are mirror images of the same concept. But roots and exponents aren’t defined for infinity, because infinity isn’t a number. There is no infinityth root.

You could, if you really wanted to, come up with a definition of exponents that allowed you to define an infinityth root. But it wouldn’t be very interesting. If you followed the usual pattern for these things, it would be a limit: $\sqrt[\infty]{x} = \lim_{n \rightarrow \infty} \sqrt[n]{x}$. For any positive real x, that’s clearly 1. Not 1 plus something: just exactly 1.
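
You can watch that limit happen numerically. This is just a quick floating-point sketch (nth_root is my own helper name, and the sample values are arbitrary), so it only shows the trend rather than proving anything:

```python
def nth_root(x, n):
    """The n-th root of a positive real x, computed as x ** (1/n)."""
    return x ** (1.0 / n)

# As n grows, the n-th root of any fixed positive x heads toward 1.
for n in (1, 10, 1000, 10**6, 10**9):
    print(n, nth_root(2.0, n), nth_root(1e12, n))
```

Whether you start from 2 or from a trillion, the roots squeeze down toward 1 as n grows, with nothing left over to call κ.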

But Mr. Wince doesn’t let himself be limited by silly notions of consistency. No, he defines things his own way, and runs with it. As a result, he gets a notion that he calls perambulation. How?

Take the definition of κ:

 $\sqrt[\infty]{\infty} = 1 + \kappa$

Now, you can, obviously, raise both sides to the power of infinity:

$\infty = (1 + \kappa)^{\infty}$

Now, you can substitute ℵ₀ for ∞. (Why? Don’t ask why. You just can.) Then you can factor it. His factoring makes no rational sense, so I won’t even try to explain it. But he concludes that:

  • Factored and simplified one way, you end up with (κ+1) = 1 + x, where x is some infinitesimal number larger than κ. (Why? Why the heck not?)
  • Factored and simplified another way, you end up with (κ+1) = ℵ
  • If you take the mean of all of the possible factorings and reductions, you get a third result, that (κ+1) = 2.

He goes on, and on, and on like this. From perambulation to perambulating reciprocals, to subambulation, to ambulation. Then un-ordinals, un-sets… this is really an absolute masterwork of utter insane crackpottery.

Do download it and take a look. It’s a masterpiece.

Pi-day randomness

One of my Twitter friends was complaining about something that’s apparently making the rounds of Facebook for π-day. It annoyed me sufficiently to be worth ranting about a little bit.

Why isn’t π rational if π=circumference/diameter, and both measurements are plainly finite?

There’s a couple of different ways of interpreting this question.

The stupidest way of interpreting it is that the author didn’t have any clue of what an irrational number is. An irrational number is a number which cannot be written as a ratio of two integers. Another way of saying essentially the same thing is that there’s no way to create a finite representation of an irrational number. I’ve seen people get this wrong before, where they confuse not having a finite representation with not being finite.

π doesn’t have a finite representation. But it’s very clearly finite – it’s less than 3 1/4, which is obviously not infinite. Anyone who can look at π, and be confused about whether or not it’s finite is… well… there’s no nice way to say this. If you think that π isn’t finite, you’re an idiot.

The other way of interpreting this statement is less stupid: it’s a question of measurement. If you have a circular object in real life, then you can measure the circumference and the diameter, and do the division on the measurements. The measurements have finite precision. So how can the ratio of two measurements with finite precision be irrational?

The answer is, they can’t. But perfect circles don’t exist in the real world. Many mathematical concepts don’t exist in the real world. In the real world, there’s no such thing as a mathematical point, no such thing as a perfect line, no such thing as perfectly parallel lines.

π isn’t a measured quantity. It’s a theoretical quantity, which can be computed analytically from the abstract properties of an ideal, perfect circle.

No “circle” in the real world has a perfect ratio of π between its circumference and its diameter. But the theoretical circle does.
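
Here’s a tiny illustration of the measurement point. The readings are made up (a hypothetical tape measure, read to the nearest millimeter), but any finite-precision readings behave the same way: each one is a terminating decimal, so their ratio is a ratio of two integers.

```python
from fractions import Fraction

# Hypothetical readings in centimeters, to the nearest millimeter.
circumference = Fraction("31.4")
diameter = Fraction("10.0")

ratio = circumference / diameter
print(ratio)  # 157/50
```

That’s a perfectly good rational number, and it’s close to π, but it isn’t π.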

The Facebook comments on this get much worse than the original question. One in particular really depressed me.

Just because the measurements are finite doesn’t mean they’re rational.
Pi is possibly rational, we just haven’t figured out where it ends.

Gah, no!

We know an awful lot about π. And we know, with absolute, 100% perfect certainty that π never ends.

We can define π precisely as a series, and that series makes it abundantly clear that it never ends.

$\pi = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \cdots$

That series goes on forever. π can’t ever end, because that series never ends.

Just for fun, here’s a little snippet of Python code that you can play with. You can see how, up to the limits of your computer’s floating point representation, a series computation of π keeps on going, changing with each additional iteration.

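def pi(numiter):
  # Nilakantha series: pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
  # It converges much faster than the 4/1 - 4/3 + 4/5 - ... series.
  val = 3.0
  sign = 1
  for i in range(numiter):
    k = (i + 1) * 2                # 2, 4, 6, ...
    term = k * (k + 1) * (k + 2)   # product of three consecutive integers
    val += sign * 4.0 / term
    sign = -sign                   # alternate between adding and subtracting
  return val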
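
The function above starts from 3 and sums a different, faster-converging series than the 4/1 − 4/3 + 4/5 − … one written out earlier. If you’d rather play with that series directly, here’s a minimal sketch; the name leibniz_pi is mine, and math.pi is only used as a reference value to compare against.

```python
import math

def leibniz_pi(num_terms):
    """Sum the first num_terms terms of 4/1 - 4/3 + 4/5 - 4/7 + ..."""
    total = 0.0
    for k in range(num_terms):
        total += (-1) ** k * 4.0 / (2 * k + 1)
    return total

# Print the number of terms, the approximation, and its distance from math.pi.
for n in (1, 10, 100, 10_000):
    approx = leibniz_pi(n)
    print(n, approx, abs(math.pi - approx))
```

This one converges painfully slowly, so it’s a good way to watch the digits keep shifting as you add terms.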

Finally: Gödel’s Proof of Incompleteness!

Finally, we’re at the end of our walkthrough of Gödel’s great incompleteness proof. As a refresher, the basic proof sketch is:

  1. Take a simple logic. We’ve been using a variant of the Principia Mathematica’s logic, because that’s what Gödel used.
  2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering.
  3. Show that you can express meta-mathematical properties of logical statements in terms of arithmetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
  4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.
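
As a reminder of what step 2 looks like in practice, here’s a toy sketch of the prime-power trick for packing a sequence into a single number. The symbol codes in the example are made up for illustration; they are not the codes from the earlier posts, and this version assumes every code is a positive integer.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

def encode(codes):
    """Pack positive symbol codes into one number: the i-th prime
    raised to the i-th code."""
    number = 1
    for p, code in zip(primes(), codes):
        number *= p ** code
    return number

def decode(number):
    """Recover the sequence by reading off the prime exponents."""
    codes = []
    for p in primes():
        if number == 1:
            return codes
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        codes.append(exponent)

print(encode([3, 1, 2]))  # 2**3 * 3**1 * 5**2 = 600
print(decode(600))        # [3, 1, 2]
```

Because prime factorizations are unique, the encoding can always be undone, which is what lets arithmetic on the number stand in for manipulation of the statement.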

What came before:

  1. Gödel numbering: The logic of the Principia, and how to encode it as numbers. This covered steps 1 and 2 in the sketch.
  2. Arithmetic Properties: what it means to say that a property can be expressed arithmetically. This set the groundwork for step 3 in the proof sketch.
  3. Encoding meta-math arithmetically: how to take meta-mathematical properties of logical statements, and define them as arithmetic properties of the Gödel numberings of the statements. This was step 3 proper.

So now we can move on to step four, where we actually see why mathematical logic is necessarily incomplete.

When we left off with Gödel, we’d gone through a very laborious process showing how we could express meta-mathematical properties of logical statements as primitive recursive functions and relations. We built up to being able to express one non-primitive recursive property, which describes the property that a given statement is provable:

pred provable(x) = {
  some y {
    proofFor(y, x)
  }
}

The reason for going through all of that was that we really needed to show how we could capture all of the necessary properties of logical statements in terms of arithmetic properties of their Gödel numbers.

Now we can get to the target of Gödel’s effort. What Gödel was trying to do was show how to defeat the careful stratification of the Principia’s logic. In the Principia, Russell and Whitehead had tried to avoid problems with self-reference by creating a very strict stratification, where each variable or predicate had a numeric level, and could only reason about objects from lower levels. So if natural numbers were the primitive objects in the domain being reasoned about, then level-1 objects would be things like specific natural numbers, and level-1 predicates could reason about specific natural numbers, but not about sets of natural numbers or predicates over the natural numbers. Level-2 objects would be sets of natural numbers, and level-2 predicates could reason about natural numbers and sets of natural numbers, but not about predicates over sets of natural numbers, or sets of sets of natural numbers. Level-3 objects would be sets of sets of natural numbers… and so on.

The point of this stratification was to make self-reference impossible. You couldn’t make a statement of the form “This predicate is true”: the predicate would be a level-N predicate, and only a level N+1 predicate could reason about a level-N predicate.

What Gödel did in the laborious process we went through in the last post is embed a model of logical statements in the natural numbers. That’s the real trick: the logic is designed to work with a set of objects that are a model of the natural numbers. By embedding a model of logical statements in the natural numbers, he made it possible for a level-1 predicate (a predicate about a specific natural number) to reason about any logical statement or object. A level-1 predicate can now reason about a level-7 object! A level-1 predicate can reason about the set defined by a level-1 predicate: a level-1 predicate can reason about itself!

Now, we can finally start getting to the point of all of this: incompleteness! We’re going to use our newfound ability to nest logical statements into numbers to construct an unprovable true statement.

In the last post, one of the meta-mathematical properties that we defined for the Gödel-numbered logic was immConseq, which defines when some statement x is an immediate consequence of a set of statements S. As a reminder, that means that x can be inferred from statements in S in one inference step.

We can use that property to define what it means to be a consequence of a set of statements: it’s the closure of immediate consequence. We can define it in pseudo-code as:

def conseq(κ) = {
  K = κ + axioms
  added_to_k = false
  do {
    added_to_k = false
    for all c in immConseq(K) {
      if c not in K {
        add c to K
        added_to_k = true
      }
    }
  } while added_to_k
  return K
}

In other words, Conseq(κ) is the complete set of everything that can possibly be inferred from the statements in κ and the axioms of the system. We can say that there’s a proof for a statement x in κ if and only if x ∈ Conseq(κ).
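
To make the closure idea concrete, here’s a runnable toy version. It’s a sketch under heavy assumptions: statements are plain strings, an implication is a tuple ("->", premise, conclusion), and the only inference rule is modus ponens. None of that is the Principia’s logic; imm_conseq is just a stand-in so that the loop has something to run on. In a real logic the closure is infinite, so this loop only terminates because the toy system is finite.

```python
def imm_conseq(known):
    """Toy immediate consequence: one step of modus ponens.
    An implication is a tuple ("->", premise, conclusion)."""
    return {
        s[2]
        for s in known
        if isinstance(s, tuple) and s[0] == "->" and s[1] in known
    }

def conseq(kappa, axioms=frozenset()):
    """Close kappa plus the axioms under imm_conseq."""
    known = set(kappa) | set(axioms)
    while True:
        new = imm_conseq(known) - known
        if not new:
            return known
        known |= new

kappa = {"p", ("->", "p", "q"), ("->", "q", "r"), ("->", "s", "t")}
closure = conseq(kappa)
print("r" in closure)  # True: p gives q, and q gives r
print("t" in closure)  # False: s was never established
```

The loop has the same shape as the pseudo-code: keep adding immediate consequences until a pass adds nothing new.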

We can use the idea of Conseq to define a strong version of what it means for a logical system with a set of facts to be consistent. A system is ω-consistent if and only if there is not a statement a such that: a ∈ Conseq(κ) ∧ not(forall(v, a)) ∈ Conseq(κ).

In other words, the system is ω-consistent as long as it’s never true that both a universal statement and its negation are provable. That’s a stronger condition than ordinary consistency, but for our purposes, we can treat it as being pretty much the same thing. (Yes, that’s a bit hand-wavy, but I’m not trying to write an entire book about Gödel here!)

(Gödel’s version of the definition of ω-consistency is harder to read than this, because he’s very explicit about the fact that Conseq is a property of the numbers. I’m willing to fuzz that, because we’ve shown that the statements and the numbers are interchangeable.)

Using the definition of ω-consistency, we can finally get to the actual statement of the incompleteness theorem!

Gödel’s First Incompleteness Theorem: For every ω-consistent primitive recursive set κ of formulae, there is a primitive-recursive predicate r(x) such that neither forall(v, r) nor not(forall(v, r)) is provable.

To prove that, we’ll construct the predicate r.

First, we need to define a version of our earlier isProofFigure that’s specific to the set of statements κ:

pred isProofFigureWithKappa(x, kappa) = {
  all n in 1 to length(x) {
    isAxiom(item(n, x)) or
    item(n, x) in kappa or
    some p in 0 to n {
      some q in 0 to n {
        immConseq(item(n, x), item(p, x), item(q, x))
      }
    }
  } and length(x) > 0
}

This is the same as the earlier definition, just specialized so that every statement in the proof figure must be an axiom, a member of κ, or an immediate consequence of earlier statements in the figure.

We can do the same thing to specialize the predicates proofFor and provable:

pred proofForStatementWithKappa(x, y, kappa) = {
  isProofFigureWithKappa(x, kappa) and
  item(length(x), x) = y
}

pred provableWithKappa(x, kappa) = {
  some y {
    proofForStatementWithKappa(y, x, kappa)
  }
}

If κ is the set of basic truths that we can work with, then provable in κ is equivalent to provable.

Now, we can define a predicate NotAProofWithKappa:

pred NotAProofWithKappa(x, y, kappa) = {
  not (proofForStatementWithKappa(x, subst(y, 19, number(y)), kappa))
}

Based on everything that we’ve done so far, NotAProofWithKappa is primitive recursive.

This is tricky, but it’s really important. We’re getting very close to the goal, and it’s subtle, so let’s take the time to understand this.

  • Remember that in a Gödel numbering, each prime number is a variable. So 19 here is just the name of a free variable in y.
  • Using the Principia’s logic, the fact that variable 19 is free means that the statement is parametric in variable 19. For the moment, it’s an incomplete statement, because it’s got an unbound parameter.
  • What we’re doing in NotAProofWithKappa is substituting the numeric coding of y for the value of y’s parameter. When that’s done, y is no longer incomplete: its unbound variable has been replaced by a binding.
  • With that substitution, NotAProofWithKappa(x, y, kappa) is true when x does not prove that y(y) is true.

What NotAProofWithKappa does is give us a way to check whether a specific sequence of statements x is not a proof of y.

We want to expand NotAProofWithKappa to something universal. Instead of just saying that a specific sequence of statements x isn’t a proof for y, we want to be able to say that no possible sequence of statements is a proof for y. That’s easy to do in logic: you just wrap the statement in a “∀ a ( )”. In Gödel numbering, we defined a function that does exactly that. So the universal form of unprovability is: ∀ a (NotAProofWithKappa(a, y, kappa)).

In terms of the Gödel numbering, if we assume that the Gödel number for the variable a is 17, and the variable y is numbered as 19, we’re talking about the statement p = forall(17, NotAProofWithKappa(17, 19, kappa)).

p is the statement that for some logical statement (the value of variable 19, or y in our definition), there is no possible value for variable 17 (a) where a proves y in κ.

All we need to do now is show that we can make p become self-referential. No problem: we can just put number(p) in as the value of y in NotAProofWithKappa. If we let q be the numeric value of the statement NotAProofWithKappa(a, y, kappa), then:

r = subst(q, 19, p)

i = subst(p, 19, r)

i says that there is no possible value x that proves p(p). In other words, p(p) is unprovable: there exists no possible proof that there is no possible proof of p!

This is what we’ve been trying to get at all this time: self-reference! We’ve got a predicate y which is able to express a property of itself. Worse, it’s able to express a negative property of itself!

Now we’re faced with two possible choices. Either i is provable – in which case, κ is inconsistent! Or else i is unprovable – in which case κ is incomplete, because we’ve identified a true statement that can’t be proven!

That’s it: we’ve shown that in the Principia’s logic, using nothing but arithmetic, we can create a true statement that cannot be proven. If, somehow, it were to be proven, the entire logic would be inconsistent. So the Principia’s logic is incomplete: there are true statements that cannot be proven true.

We can go a bit further: the process that we used to produce this result about the Principia’s logic is actually applicable to other logics. There’s no magic here: if your logic is powerful enough to do Peano arithmetic, you can use the same trick that we demonstrated here, and show that the logic must be either incomplete or inconsistent. (Gödel proved this formally, but we’ll just handwave it.)

Looking at this with modern eyes, it doesn’t seem quite as profound as it did back in Gödel’s day.

When we look at it through the lens of today, what we see is that in the Principia’s logic, proof is a mechanical process: a computation. If every true statement were provable, then you could take any statement S, and write a program to search for a proof of either S or ¬ S, and eventually, that program would find one or the other, and stop.
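To make that concrete, here is a minimal Python sketch of such a search. It assumes a hypothetical proof checker is_proof_of(candidate, statement) and a finite alphabet of symbols; neither of those names comes from Gödel or the Principia. The toy checker at the bottom is only a stand-in so that the sketch can run.

```python
from itertools import count, product

def decide(statement, negation, is_proof_of, alphabet):
    """Search every finite string over `alphabet`, shortest first.

    Returns True if a proof of `statement` turns up, and False if a proof
    of `negation` turns up. If neither is provable, this loops forever.
    """
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            candidate = "".join(symbols)
            if is_proof_of(candidate, statement):
                return True
            if is_proof_of(candidate, negation):
                return False

# Toy stand-in for a real proof checker (an assumption, for illustration
# only): a "proof" of a statement is just the statement spelled backwards.
def toy_is_proof_of(candidate, statement):
    return candidate == statement[::-1]

print(decide("ab", "ba!", toy_is_proof_of, "ab"))
```

The search only stops if one of the two statements has a proof. In a complete system it would always stop; incompleteness says that for some statements it never does.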

In short, you’d be able to solve the halting problem. The proof of the halting problem is really an amazingly profound thing: on a very deep level, it’s the same thing as incompleteness, only it’s easier to understand.

But at the time that Gödel was working, Turing hadn’t written his paper about the halting problem. Incompleteness was published in 1931; Turing’s halting paper was published in 1936. This was a totally unprecedented idea when it was published. Gödel produced one of the most profound and surprising results in the entire history of mathematics, showing that the efforts of the best mathematicians in the world to produce the perfection of mathematics were completely futile.

Passwords, Hashing, and Salt

Over on Twitter, some folks were chatting about the latest big security botch. A major service, called Evernote, had a security breach where a password file was stolen. Evernote has handled the situation quite well, being open about what happened, and explaining the risks.

In their description of the breach, they said that the stolen passwords were “both hashed and salted”. Apparently this sounds funny to people outside of software. (Amazing how jargon becomes so ingrained that I didn’t even notice the fact that it could be interpreted in a funny way until it was pointed out to me!)

Anyway, since discussion of this is going around, I thought I’d explain just what password hashing and salting means.

Let’s start at the beginning. You’re some kind of system that wants to have password security. Obviously, you need to save the passwords somewhere, right?

As we’ll see, that’s only partially correct – but let’s go with it for now. You need to be able to check whether a user supplied the right password, so you need to store something.

The most naive approach is to create a file (or database, or whatever) that contains the usernames and passwords of all of your users. Something like:

alice:abc
mark:pass
joe:123
jen:hello

Suppose you were a thief, and you wanted to crack this system. What would you do? You’d try to steal that file! If you can get hold of that password file, then you’d have all of the passwords for all of the users of the system.

That means that this is a terrible way of storing the passwords. One step, and a thief has completely breached your system. We don’t want that. So what should we do?

First, we could encrypt the file.

This might seem like a good idea at first. If a thief were to steal the file, they wouldn’t be able to find your users’ passwords without figuring out the encryption key! It’s going to take a lot of work to figure out that key!

The first problem with this is that any encryption can be broken given enough time and computing power. If there’s only one encryption key for the entire file, then it’s worth investing a lot of time and power into breaking it – and once it’s broken, then everything is revealed.

The second problem is: how does your system check a user’s password? It needs to decrypt the file! That means that the encryption key must be available to your system! So all that a thief needs to do is figure out where your system is getting the key from. You’ve got your entire security system for all of your users set up with a single point of protection, and somewhere in your system, everything that you need to break that protection is available!

What can we do to improve this? The answer is something called cryptographic hashing.

Cryptographic hashing is a combination of two concepts: hashing, and one-way functions.

Hashing means taking an arbitrary piece of data, like a string, and converting it to a number. A really simple example of a not-very-good hash function of a string would be something like: convert all of the characters in the string to their numeric values, and exclusive-or the binary representations of those values together. With that hash function, you could take a string like “ABCD”, convert it to the numeric values of the characters ([65, 66, 67, 68]), and then x-or them together (1000001 xor 1000010 xor 1000011 xor 1000100 = 0000100) for a result of 4. Real practical hash functions are more complicated.
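As a quick sketch in Python (the name xor_hash is just for illustration), that toy hash looks like this:

```python
from functools import reduce

def xor_hash(s):
    """Toy hash: XOR together the numeric values of all the characters."""
    return reduce(lambda acc, ch: acc ^ ord(ch), s, 0)

print(xor_hash("ABCD"))  # 65 ^ 66 ^ 67 ^ 68 = 4
print(xor_hash("DCBA"))  # same characters in another order: also 4
```

The second call shows one reason this is not a very good hash: any rearrangement of the same characters produces the same value.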

For example, at least some versions of Java use the following as the default hash for a string of characters:

$$\text{hash}(s) = \left(\sum_{i=0}^{\text{length}(s)-1} 31^{\text{length}(s) - i - 1} \cdot s[i]\right) \bmod 2^{32}$$
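Here is a minimal Python sketch of the formula above. The function name is made up for illustration, and it makes two simplifying assumptions: it returns the unsigned value mod 2^32 (Java itself reports the result as a signed 32-bit integer), and it uses Python’s character codes, which match Java’s for ordinary ASCII text.

```python
def java_style_hash(s):
    """Polynomial hash with base 31, reduced mod 2**32.

    Computed with Horner's rule, which gives the same result as summing
    31**(len(s) - i - 1) * s[i] over every position i.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) % 2**32
    return h

print(java_style_hash("abc"))  # 97*31**2 + 98*31 + 99 = 96354
```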

There’s a class of mathematical functions called one-way functions. A one-way function is a function f, where given x, it’s easy to compute f(x), but given f(x) it’s extremely difficult (or even impossible) to compute x.

If we combine those two, we have what’s called a cryptographic hash function: a function that takes an arbitrary input string, and converts it to a number, in a way where it’s very difficult to figure out what the input string that produced the number was. That’s great for storing passwords! We don’t store the password at all. We just store the hash-value produced from the password. Then when a user comes and logs in, we take their password, hash it, and compare the result to the stored hash.

Instead of the file with explicit passwords, we get something like:

alice:7a2d28fc
mark:dfd4e1c6
joe:ed849ee1
jen:bb76e739

This is much better than storing the encrypted password. There is no encryption key that a thief can use to decrypt the password. Even if a thief knows the hash values of your users’ passwords, they can’t get into the system! And your system actually never stores the actual values of the users’ passwords – just their hashcodes!
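A minimal sketch of that scheme in Python might look like the following. Using SHA-256 from the standard hashlib module is an assumption made for illustration (its digests are much longer than the short values shown above), and the names password_file and check_login are made up.

```python
import hashlib
import hmac

def hash_password(password):
    """Return the hex SHA-256 digest of a password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The stored "file" maps usernames to hashes, never to passwords.
password_file = {
    "alice": hash_password("abc"),
    "mark": hash_password("pass"),
}

def check_login(username, attempt):
    """Hash the attempted password and compare it to the stored hash."""
    stored = password_file.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(attempt))

print(check_login("alice", "abc"))    # True
print(check_login("alice", "hello"))  # False
```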

So again, let’s look at this from the perspective of a thief. How can a thief break into a system with hashed passwords?

If they don’t know what hash function you’re using, then they’re completely stuck. Sadly, they can probably figure it out. Designing new cryptographic hash functions is hard. Implementing cryptographic hash functions correctly is hard. As a result, most people just use a hash function from a library. That means that for a thief, it’s usually pretty easy to figure out what hash function is being used by a system.

Once they know what hash function you used, their only choice to break your system is to try to guess the passwords. That is, they can guess passwords, compute their hash codes, and search through your password file to see if any of the users’ password hashes match. If they find one, they’re golden!

In fact, there’s a common strategy based on this idea called a rainbow table. A rainbow table is a list of common passwords, and the numeric value that they hash to with a common cryptographic hash function. Something like:

Password String    Hash value
pass               1b93eb12
password           a4532c47
abc                7a2d28fc

If you can somehow steal the passwords file, then with a rainbow table, you can find users with common passwords. For example, in the table above, you can see that the hashcode “7a2d28fc” occurs in the passwords file for the username “alice”, and it’s also in the rainbow table for the password “abc”. So a thief could determine that alice’s password was “abc”. Even with the best cryptographic hash, all it takes is one idiot user who uses “password” as their password, and your system’s security is breached.
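The lookup described here can be sketched in a few lines of Python. This is the simple precomputed-table version of the idea, with SHA-256 assumed as the hash function; the password list, the stolen file, and all of the names are invented for illustration.

```python
import hashlib

def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Built once by the attacker: hash value -> common password.
common_passwords = ["pass", "password", "abc", "123456"]
lookup_table = {sha256_hex(p): p for p in common_passwords}

# A stolen, unsalted password file: username -> hash value.
stolen_file = {
    "alice": sha256_hex("abc"),
    "mark": sha256_hex("pass"),
    "jen": sha256_hex("a much less common passphrase"),
}

def crack(stolen, table):
    """Return every user whose hash appears in the precomputed table."""
    return {user: table[h] for user, h in stolen.items() if h in table}

print(crack(stolen_file, lookup_table))  # {'alice': 'abc', 'mark': 'pass'}
```

The table is computed once and then reused against every stolen file that uses the same hash function, which is what makes the attack cheap.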

Salting passwords addresses that problem. In a salting strategy, you don’t hash a user’s password by itself: you combine it with some additional data, and then hash that combination. The additional information is called the salt.

You can use lots of different things for the salt. There’s a complex set of tradeoffs in the exact salting strategy, which are beyond the scope of this post, but a few examples include:

  1. Always use a fixed salt string. This is weak, but better than nothing. It’s got a similar weakness to the encrypted password system: you only need one salt to give you a handle on breaking all of the passwords, and that one salt needs to be in the system.
  2. Add a random piece of data for each password. The catch here is that you need to store the salt data for each password. This is what Unix passwords used to use. They added 12 random bits to each password. In the passwords file, they stored the salt and the hashed password. The weakness of this is that the salt is right there with the password entry. But because each user has a different salt, that means that any attempt to breach the system needs to look at each user separately.
  3. Salt on metadata: that is, take information about the user that isn’t part of their username, and use that as the salt. For example, you could use a person’s birthday as the salt for their account.

If each user has a different salt, then even if you’ve got terrible passwords, a thief needs to do a lot of work to try to break your system. Even with a rainbow-table-like strategy, they can’t compute the hashcode for a given common password once, and then search the password hash list for that code – they need to recompute it for each possible salt value!
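Here is a minimal Python sketch of per-user random salting. The choices in it are assumptions made for illustration: SHA-256 as the hash, a 16-byte salt, and simply putting the salt in front of the password before hashing. Real systems generally use a purpose-built password hashing function instead.

```python
import hashlib
import hmac
import os

def hash_with_salt(password, salt):
    """Hash the salt bytes followed by the UTF-8 encoded password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def make_entry(password):
    """Create the (salt, hash) pair that gets stored for a user."""
    salt = os.urandom(16)  # 16 random bytes; the size is an arbitrary choice
    return salt, hash_with_salt(password, salt)

def check_password(entry, attempt):
    """Re-hash the attempt with the stored salt and compare."""
    salt, stored_hash = entry
    return hmac.compare_digest(stored_hash, hash_with_salt(attempt, salt))

alice = make_entry("abc")
bob = make_entry("abc")
print(check_password(alice, "abc"))  # True
print(alice[1] == bob[1])            # same password, but the salts differ
```

Because alice and bob almost certainly get different salts, their stored hashes differ even though their passwords are identical, so one precomputed hash of “abc” no longer matches either entry.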

What salting does is, effectively, increase the amount of effort needed to break the passwords. If you add 12 bits of salt, then a rainbow table needs 4096 times more entries to find common passwords! If your salt is long enough, then it can make it effectively impossible to create a rainbow table at all. If they try to attack you without a rainbow table, a 12 bit salt means that your attacker needs to attack the passwords of each of your users separately! Even if they know the value of the salt, you’ve made it much harder for them to breach your security.
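The factor of 4096 is just 2^12, one table entry per possible salt value. As a small sketch (again assuming SHA-256, and a made-up way of encoding the 12-bit salt as two bytes), you can count how many different hashes a single common password turns into:

```python
import hashlib

SALT_BITS = 12

def hash_with_small_salt(password, salt):
    """Hash a 12-bit integer salt (packed into 2 bytes) plus the password."""
    data = salt.to_bytes(2, "big") + password.encode("utf-8")
    return hashlib.sha256(data).hexdigest()

# Every possible salt value gives the attacker another hash to precompute.
hashes_for_abc = {hash_with_small_salt("abc", s) for s in range(2 ** SALT_BITS)}
print(len(hashes_for_abc))  # 4096
```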