Bad Comparisons with Statistics

When a friend asks me to write about something, I try to do it. Yesterday, a friend of mine from my Google days, Daniel Martin, sent me a link, and asked me to write about it. Daniel isn’t just a former coworker of mine; he’s also a math geek with the same sort of warped sense of humor as me. He knew my blog before we worked at Google, and on my first Halloween at Google, he came to introduce himself to me. He was wearing a purple shirt with his train ticket on a cord around his neck. For those who know any abstract algebra, get ready to groan: he was purple, and he commuted. He was dressed as an Abelian grape.

Anyway, Daniel sent me a link to this article, and asked me to write about the error in it.

The real subject of the article involves a recent twitter-storm around a professor at Boston University. This professor tweeted some things about racism and history, and she did it in very blunt, not-entirely-professional terms. The details of what she did aren’t something I want to discuss here. (Briefly, I think it wasn’t a smart thing to tweet like that, but plenty of white people get away with worse every day; the only reason that she’s getting as much grief as she is, is that she dared to be a black woman saying bad things about white people, and the assholes at Breitbart used that to fuel the insatiable anger and hatred of their followers.)

But I don’t want to go into the details of that here. Lots of people have written interesting things about it, from all sides. Just by posting about this, I’m probably opening myself up to yet another wave of abuse, but I’d prefer to avoid as much of that as I can. Instead, I’m just going to rip out the introduction to this article, because it makes a kind of incredibly stupid mathematical argument that requires correction. Here are the first and second paragraphs:

There aren’t too many African Americans in higher education.

In fact, black folks only make up about 4 percent of all full time tenured college faculty in America. To put that in context, only 14 out of the 321—that’s about 4 percent—of U.S. astronauts have been African American. So in America, if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor.

Statistics and probability can be a difficult field of study. But… a lot of its everyday uses are really quite easy. If you’re going to open your mouth and make public statements involving probabilities, you probably should make sure that you at least understand the first chapter of “probability for dummies”.

This author doesn’t appear to have done that.

The most basic fact of understanding how to compare pretty much anything numeric in the real world is that you can only compare quantities that have the same units. You can’t compare 4 kilograms to 5 pounds, and conclude that 5 pounds is bigger than 4 kilograms because 5 is bigger than 4.
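As a small illustration of the kilograms-versus-pounds point, here is a minimal Python sketch. The variable names are mine, chosen for illustration; the conversion factor is the standard definition of the pound.

```python
# Exact definition of the international avoirdupois pound in kilograms.
KG_PER_LB = 0.45359237

weight_a_kg = 4.0               # 4 kilograms
weight_b_kg = 5.0 * KG_PER_LB   # 5 pounds, converted to kilograms

# Comparing the bare numbers ignores the units entirely.
print(5 > 4)                      # naive comparison of raw numbers
# Comparing in a common unit is the only comparison that means anything.
print(weight_b_kg > weight_a_kg)  # comparison in kilograms
```

Once both weights are in kilograms, 5 pounds comes out to roughly 2.27 kg, which is less than 4 kg.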

That principle applies to probabilities and statistics: you need to make sure that you’re comparing apples to apples. If you compare an apple to a grapefruit, you’re not going to get a meaningful result.

The proportion of astronauts who are black is 14/321, or a bit over 4%. That means that out of every 100 astronauts, you’d expect to find four black ones.

The proportion of college professors who are black is also a bit over 4%. That means that out of every 100 randomly selected college professors, you’d expect 4 to be black.

So far, so good.

But from there, our intrepid author takes a leap, and says “if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor”.

Nothing in the quoted statistic in any way tells us anything about anyone’s chances to become an astronaut. Nothing at all.

This is a classic statistical error which is very easy to avoid. It’s a unit error: he’s comparing two things with different units. The short version of the problem is: he’s comparing black/astronaut with astronaut/black.

You can’t derive anything about the probability of a black person becoming an astronaut from the ratio of black astronauts to astronauts.
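One way to make the unit mismatch precise is Bayes' rule. This is a sketch in standard conditional-probability notation, which the quoted article does not use:

```latex
\[
P(\text{astronaut} \mid \text{black})
  = \frac{P(\text{black} \mid \text{astronaut}) \, P(\text{astronaut})}{P(\text{black})}
\]
```

The quoted 4% figure is P(black | astronaut). The claim about a black person's chances is about P(astronaut | black), and getting from one to the other requires P(astronaut) and P(black), neither of which appears anywhere in the quoted statistics.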

Let’s pull out some numbers to demonstrate the problem. These are completely made up, to make the calculations easy – I’m not using real data here.

Suppose that:

  • the US population is 300,000,000;
  • black people are 40% of the population, which means that there are 120,000,000 black people.
  • there are 1000 universities in America, and there are 50 faculty per university, so there are 50,000 university professors.
  • there are 50 astronauts in the US.
  • If 4% of astronauts and 4% of college professors are black, that means that there are 2,000 black college professors, and 2 black astronauts.

In this scenario, as in reality, the percentage of black college professors and the percentage of black astronauts are equal. What about the probability of a given black person being a professor or an astronaut?

The probability of a black person being a professor is 2,000/120,000,000, or 1 in 60,000. The probability of a black person becoming an astronaut is just 2/120,000,000, or 1 in 60 million. Even though the probability of a random astronaut being black is the same as the probability of a random college professor being black, the probability of a given black person becoming a college professor is 1,000 times higher than the probability of a given black person becoming an astronaut.
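The arithmetic for this made-up scenario can be checked with a short Python sketch. The numbers are the invented ones from the list above, not real data, and the variable names are mine. It uses exact fractions so there is no rounding.

```python
from fractions import Fraction

# Made-up numbers from the scenario above, not real data.
population = 300_000_000
black_population = population * 40 // 100    # 120,000,000
professors = 1000 * 50                       # 50,000
astronauts = 50
black_share = Fraction(4, 100)               # same share in both jobs

black_professors = black_share * professors  # 2,000
black_astronauts = black_share * astronauts  # 2

# P(black | job): the share of each profession that is black.
p_black_given_prof = black_professors / professors
p_black_given_astro = black_astronauts / astronauts

# P(job | black): the share of black people who hold each job.
p_prof_given_black = black_professors / black_population
p_astro_given_black = black_astronauts / black_population

print(p_black_given_prof, p_black_given_astro)   # equal shares
print(p_prof_given_black, p_astro_given_black)   # very different chances
print(p_prof_given_black / p_astro_given_black)  # ratio of the two chances
```

The first line of output shows the two equal 4% shares (1/25 each). The second shows 1/60000 against 1/60000000, and the ratio on the last line is 1000, which is simply the ratio of professors to astronauts (50,000 to 50).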

This kind of thing isn’t rocket science. My 11-year-old son has done enough statistics in school to understand this problem! It’s simple: you need to compare like to like. If you can’t understand that, if you can’t understand your statistics enough to understand their units, you should probably try to avoid making public statements about statistics. Otherwise, you’ll wind up doing something stupid, and make yourself look like an idiot.

(In the interests of disclosure: an earlier version of this post used the comparison of apples to watermelons. But given the racial issues discussed in the post, that had unfortunate unintended connotations. When someone pointed that out to me, I changed it. To anyone who was offended: I am sorry. I did not intend to say anything associated with the racist slurs; I simply never thought of it. I should have, and I shouldn’t have needed someone to point it out to me. I’ll try to be more careful in the future.)

9 thoughts on “Bad Comparisons with Statistics”

  1. Joker_vD

    Conditional probabilities strike again!

    And I would disagree that the mistake “is very easy to avoid”. What exactly can one derive from knowledge that P(Black|Astronaut) = P(Black|CollegeProfessor) = 4/100? That P(Astronaut|Black) / P(CollegeProfessor|Black) = P(Astronaut) / P(CollegeProfessor) — or in words, “A black person is more likely to become a college professor than an astronaut in exactly the same proportion as a person is more likely to become a college professor than an astronaut”? But that was already kinda obvious, that there is no “competitive advantage” in either of those professions for a black person, since we were given two equal probabilities…

  2. Daniel Martin

    The costume thing works better if you understand that the joke “Q: What’s purple and commutes? A: An Abelian grape” is insanely popular in undergraduate math departments in the US.

    I don’t know why that’s so, but it’s so ubiquitous that it can be used as a shibboleth to see if someone 1) was an undergrad math major, and 2) did their undergraduate study in the US. My then-fiancee-now-wife and I wore a variant of that costume to a Halloween party when I was in grad. school. Every other math grad student from the US got our costume the instant we pointed out that the bus schedules we were carrying meant that we were commuting. Most of the grad. students not from the US had to have it explained.

    For reasons I don’t understand, the equally groan-worthy “Q: What’s yellow and maximal? A: Zorn’s Lemon” isn’t nearly as widespread.

    1. markcc Post author

      Excuses, excuses. Let’s face it: it just means that we’re hopeless geeks :-).

    2. David Starner

      Commutative groups was a required subject for juniors when I went to college; Zorn’s Lemma is something I only learned about in personal study of the axiom of choice, and I’d have to look up to cite it correctly. The only thing I really know about it is the quip “The Axiom of Choice is obviously true, the Well-Ordering Principle obviously false, and who knows about Zorn’s Lemma.”

  3. Spencer

    Wait, how are apples and watermelons racist? And does not knowing that make me an insensitive bigot? I’m worried now.

    Let’s hear some more bad math puns!

    1. markcc Post author

      In the US, there are a bunch of negative stereotypes of black people. There are a bunch of variations that involve watermelon.

      One of them is about how ex-slaves in the south sat around eating watermelon once they didn’t have masters forcing them to work, because they were too primitive and stupid to understand that they needed to work.

      There are also some associations for apples: apples were brought to North America by Europeans, and spread throughout the continent by white folks, particularly a guy named John Chapman (aka Johnny Appleseed).

      Because of things like that, there are also a bunch of dog-whistles that involve watermelon. So using a comparison, particularly of watermelons to apples, reads as a dogwhistle comparing blacks to whites.

