The Math of Vaccinations, Infection Rates, and Herd Immunity

Here in the US, we are, horribly, in the middle of a measles outbreak. And, as usual, anti-vaccine people are arguing that:

• Measles isn’t really that serious;
• Unvaccinated children have nothing to do with the outbreak; and
• More vaccinated people are being infected than unvaccinated, which shows that vaccines don’t help.

A few years back, I wrote a post about the math of vaccines; it seems like this is a good time to update it.

When it comes to vaccines, there are two things that a lot of people don’t understand. One is herd immunity; the other is probability of infection.

Herd immunity is the fundamental concept behind vaccines.

In an ideal world, a person who’s been vaccinated against a disease would have no chance of catching it. But the real world isn’t ideal, and vaccines aren’t perfect. What a vaccine does is prime the recipient’s immune system in a way that reduces the probability that they’ll be infected.

But even if a vaccine for an illness were perfect, and everyone was vaccinated, that wouldn’t mean that it was impossible for anyone to catch the illness. There are many people whose immune systems are compromised – people with diseases like AIDS, or people with cancer receiving chemotherapy. (Or people who’ve had the measles within the previous two years!) And that’s not considering the fact that there are people who, for legitimate medical reasons, cannot be vaccinated!

So individual immunity, provided by vaccines, isn’t enough to completely eliminate the spread of a contagious illness. To prevent outbreaks, we rely on an emergent property of a vaccinated population. If enough people are immune to the disease, then even if one person gets infected with it, the disease won’t be able to spread enough to produce a significant outbreak.

We can demonstrate this with some relatively simple math.

Let’s imagine a case of an infectious disease. For illustration purposes, we’ll simplify things in a way that makes the outbreak more likely to spread than it would in reality. (So this makes herd immunity harder to attain than it is in reality.)

• There’s a vaccine that’s 95% effective: out of every 100 people vaccinated against the disease, 95 are perfectly immune; the remaining 5 have no immunity at all.
• The disease is highly contagious: out of every 100 non-immune people who are exposed to the disease, 95 will be infected.

If everyone is immunized, but one person becomes ill with the disease, how many people do they need to expose to the disease for the disease to spread?

Keeping things simple: an outbreak, by definition, is a situation where the number of infected people is steadily increasing. That can only happen if every sick person, on average, infects more than 1 other person with the illness. If that happens, then the rate of infection can grow exponentially, turning into an outbreak.
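To see why "more than 1" is the magic number, here’s a minimal Python sketch of the expected number of new cases in each generation of the illness. It assumes, as a simplification, that every case infects exactly the average number of people. The function name and the example averages of 0.9 and 1.2 are just for illustration.

```python
def expected_cases_by_generation(r, generations, initial_cases=1.0):
    """Expected new cases per generation when each case infects r others on average.

    Generation n is expected to have initial_cases * r**n new cases.
    """
    return [initial_cases * r ** n for n in range(generations + 1)]


# An average below 1 fizzles out; an average above 1 grows exponentially.
for r in (0.9, 1.2):
    cases = expected_cases_by_generation(r, generations=10)
    print(f"r={r}: " + ", ".join(f"{c:.2f}" for c in cases))
```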

In our scheme here, only one out of 20 people is infectable. So if our infected person has enough contact with 20 people to pass on an infection, then on average one of those 20 is infectable, and there’s a 95% chance that they’d pass the infection on to that one person. (19 of 20 are immune; the one remaining person has a 95% chance of getting infected.) To get to an outbreak level, that is, a level where they’re probably going to infect more than one other person, they’d need to expose something around 25 people (which would mean that each infected person, on average, could infect roughly 1.2 people). If they expose 20 other people on average, then each infected person will infect about 0.95 other people, so the number of infected people will decrease without turning into a significant outbreak.
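The arithmetic above fits in a few lines of Python. This is a sketch under the post’s simplified assumptions: an all-or-nothing vaccine and a fixed number of contacts per sick person. `secondary_infections` is a hypothetical helper name, not anything standard.

```python
def secondary_infections(contacts, susceptible_fraction, infectivity):
    """Average number of people one sick person infects.

    contacts: people each sick person exposes during their illness
    susceptible_fraction: share of those contacts with no immunity
    infectivity: chance that an exposed, non-immune person gets sick
    """
    return contacts * susceptible_fraction * infectivity


SUSCEPTIBLE = 1 / 20   # everyone vaccinated; 1 in 20 has no immunity
INFECTIVITY = 0.95     # 95 of 100 exposed non-immune people get sick

for contacts in (20, 25):
    r = secondary_infections(contacts, SUSCEPTIBLE, INFECTIVITY)
    outcome = "can grow into an outbreak" if r > 1 else "dies out"
    print(f"{contacts} contacts: {r:.2f} new infections per case, {outcome}")
```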

But what will happen if just 5% of the population doesn’t get vaccinated? Then we’ve got 95% of the population getting vaccinated, with a 95% immunity rate – so roughly 90% of the population has vaccine immunity. Our pool of non-immune people has doubled. In our example scenario, if each person is exposed to 20 other people during their illness, then they will, on average, cause 1.8 people to get sick. And so we have a major outbreak on our hands!
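Here’s the same calculation sketched in Python, with the non-immune pool computed from vaccination coverage and vaccine effectiveness. As before, the helper names are hypothetical, and the all-or-nothing vaccine is the post’s simplifying assumption.

```python
def susceptible_fraction(coverage, efficacy):
    """Share of the whole population with no vaccine immunity.

    Assumes an all-or-nothing vaccine: `efficacy` of vaccinated people are
    fully immune, and the rest have no immunity at all.
    """
    return 1.0 - coverage * efficacy


def secondary_infections(contacts, coverage, efficacy, infectivity):
    """Average number of people one sick person infects."""
    return contacts * susceptible_fraction(coverage, efficacy) * infectivity


for coverage in (1.00, 0.95):
    s = susceptible_fraction(coverage, efficacy=0.95)
    r = secondary_infections(20, coverage, efficacy=0.95, infectivity=0.95)
    print(f"{coverage:.0%} vaccinated: {s:.2%} non-immune, {r:.2f} infections per case")
```

Dropping coverage from 100% to 95% nearly doubles the non-immune pool (5% to 9.75%), and that pushes the average number of infections per case from below 1 to above 1.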

This illustrates the basic idea behind herd immunity. If you can successfully make a large enough portion of the population non-infectable by a disease, then the disease can’t spread through the population, even though the population contains a large number of infectable people. When the population’s immunity rate (either through vaccine, or through prior infection) gets to be high enough that an infection can no longer spread, the population is said to have herd immunity: even individuals who can’t be immunized no longer need to worry about catching it, because the population doesn’t have the capacity to spread it around in a major outbreak.

(In reality, the effectiveness of the measles vaccine really is in the 95 percent range – actually slightly higher than that; various sources estimate it somewhere between 95 and 97 percent effective! And the success rate of the vaccine isn’t binary: 95% of people will be fully immune; the remaining 5% will have a varying degree of immunity. And the infectivity of most diseases is lower than the example above. Measles (which is a highly, highly contagious disease, far more contagious than most!) is estimated to infect between 80 and 90 percent of exposed non-immune people. So if enough people are immunized, herd immunity will take hold even if more than 20 people are exposed by every sick person.)
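Flipping the question around, we can ask how many people each sick person could expose before the average number of new infections goes above 1. This is a rough sketch that assumes everyone is vaccinated and, pessimistically, treats the vaccine as all-or-nothing. It ignores the partial immunity mentioned above, which would only help. The effectiveness and infectivity ranges are the ones quoted in the paragraph above.

```python
def max_contacts_without_outbreak(efficacy, infectivity, coverage=1.0):
    """Average contacts per case at which infections per case reach exactly 1."""
    susceptible = 1.0 - coverage * efficacy  # all-or-nothing vaccine assumption
    return 1.0 / (susceptible * infectivity)


for efficacy in (0.95, 0.97):
    for infectivity in (0.80, 0.90):
        limit = max_contacts_without_outbreak(efficacy, infectivity)
        print(f"efficacy {efficacy:.0%}, infectivity {infectivity:.0%}: "
              f"up to about {limit:.1f} contacts per case")
```

Every combination in those ranges tolerates more than 20 contacts per sick person.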

Moving past herd immunity to my second point: there’s a paradox that some antivaccine people (including, recently, Sheryl Atkinson) use in their arguments. If you look at an outbreak of an illness that we vaccinate for, you’ll frequently find that more vaccinated people become ill than unvaccinated. And that, the antivaccine people say, shows that the vaccines don’t work, and the outbreak can’t be the fault of the unvaccinated folks.

Let’s look at the math to see the problem with that.

Let’s use the same numbers as above: 95% vaccine effectiveness, 95% contagion. In addition, let’s say that 2% of people choose to go unvaccinated.

That means that 98% of the population has been immunized, and 95% of them are immune. So now about 93% of the population has immunity (98% × 95% = 93.1%).

If each infected person has contact with 20 other people, then we can expect about 7% of those 20 to be infectable, or about 1.4; and of those, 95% will become ill, or about 1.3. So on average, each sick person will infect about 1.3 other people. That’s enough to cause a significant outbreak. Without the non-immunized people, the infection rate is less than 1, which is not enough to cause an outbreak.

The non-immunized population reduced the herd immunity enough to cause an outbreak.

Within the population, how many immunized versus non-immunized people will get sick?

Out of every 100 people, there are about 5 (5% of the 98 who were vaccinated) who got vaccinated but aren’t immune. Out of that same 100 people, there are 2 (2% of 100) who didn’t get vaccinated. If every non-immune person is equally likely to become ill, then we’d expect that out of every 100 cases of the disease, about 70 would be vaccinated people and about 30 would be unvaccinated.
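The case split above is just a ratio of the two non-immune pools. Here’s a small sketch that assumes, as the text does, that every non-immune person is equally likely to get sick. The function name is only for illustration. Note that 5% of the 98 vaccinated people is really 4.9, so the split comes out at about 71/29 rather than exactly 70/30.

```python
def case_shares(vaccinated_fraction, efficacy):
    """Expected share of cases among (vaccinated, unvaccinated) people.

    Assumes an all-or-nothing vaccine and that cases split in proportion
    to the size of each non-immune pool.
    """
    vaccinated_not_immune = vaccinated_fraction * (1 - efficacy)
    unvaccinated = 1 - vaccinated_fraction
    total = vaccinated_not_immune + unvaccinated
    return vaccinated_not_immune / total, unvaccinated / total


vax_share, unvax_share = case_shares(0.98, 0.95)
print(f"vaccinated: {vax_share:.0%} of cases, unvaccinated: {unvax_share:.0%} of cases")
```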

The vaccinated population is much, much larger – 50 times larger! – than the unvaccinated.
Since that population is so much larger, we’d expect more vaccinated people to become ill, even though it’s the smaller unvaccinated group that broke the herd immunity!

The easiest way to see that is to take those numbers, and normalize them into probabilities – that is, figure out, within the pool of all vaccinated people, what their likelihood of getting ill after exposure is, and compare that to the likelihood of a non-vaccinated person becoming ill after exposure.

So, let’s start with the vaccinated people. Let’s say that we’re looking at a population of 10,000 people total. 98% were vaccinated; 2% were not.

• The total pool of vaccinated people is 9800, and the total pool of unvaccinated is 200.
• Of the 9800 who were vaccinated, 95% of them are immune, leaving 5% who are not – so 490 infectable people.
• Of the 200 people who weren’t vaccinated, all of them are infectable.
• If everyone is exposed to the illness, then we would expect about 466 of the vaccinated, and 190 of the unvaccinated to become ill.

So more than twice as many vaccinated people as unvaccinated people became ill. But:

• The odds of a vaccinated person becoming ill are 466/9800, or about 1 out of every 21 people.
• The odds of an unvaccinated person becoming ill are 190/200 or 19 out of every 20 people! (Note: there was originally a typo in this line, which was corrected after it was pointed out in the comments.)
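Here’s the 10,000-person example worked through in Python. This sketch uses the same assumptions as the text: everyone is exposed once, infectivity is 95%, and the vaccine is all-or-nothing. The final line, the ratio of the two risks, is an extra step that isn’t in the text.

```python
POPULATION = 10_000
COVERAGE = 0.98      # 98% vaccinated
EFFICACY = 0.95      # all-or-nothing: 95% of vaccinated people are fully immune
INFECTIVITY = 0.95   # chance an exposed, non-immune person gets sick

vaccinated = POPULATION * COVERAGE                   # about 9,800 people
unvaccinated = POPULATION - vaccinated               # about 200 people
infectable_vaccinated = vaccinated * (1 - EFFICACY)  # about 490 people

# Assume everyone is exposed to the illness once.
sick_vaccinated = infectable_vaccinated * INFECTIVITY  # about 466
sick_unvaccinated = unvaccinated * INFECTIVITY         # about 190

risk_vaccinated = sick_vaccinated / vaccinated
risk_unvaccinated = sick_unvaccinated / unvaccinated

print(f"sick: {sick_vaccinated:.0f} vaccinated, {sick_unvaccinated:.0f} unvaccinated")
print(f"risk if vaccinated:   1 in {1 / risk_vaccinated:.0f}")
print(f"risk if unvaccinated: {risk_unvaccinated:.0%}")
print(f"an unvaccinated person's risk is {risk_unvaccinated / risk_vaccinated:.0f}x higher")
```

The raw counts favor the unvaccinated, but the per-person risks point the other way, by a factor of about 20.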

The numbers can, if you look at them without considering the context, be deceiving. The population of vaccinated people is so much larger than the population of unvaccinated people that the total number of infected can give the wrong impression. But the facts are very clear: vaccination drastically reduces an individual’s chance of getting ill; and vaccinating the entire population dramatically reduces the chances of an outbreak.

The reality of vaccines is pretty simple.

• Vaccines are highly effective.
• The diseases that vaccines prevent are not benign.
• Vaccines are really, really safe. None of the horror stories told by anti-vaccine people have any basis in fact. Vaccines don’t damage your immune system, they don’t cause autism, and they don’t cause cancer.
• Not vaccinating your children (or yourself!) doesn’t just put you at risk for illness; it dramatically increases the chances of other people becoming ill. Even when more vaccinated people than unvaccinated become ill, that’s largely caused by the unvaccinated population.

In short: everyone who is healthy enough to be vaccinated should get vaccinated. If you don’t, you’re a despicable free-riding asshole who’s deliberately choosing to put not just yourself but other people at risk.

Polls and Sampling Errors in the Presidential Debate Results

My biggest pet peeve is press coverage of statistics. As someone who is mathematically literate, I’m constantly infuriated by it. Basic statistics isn’t that hard, but people can’t be bothered to actually learn a tiny bit in order to understand the meaning of the things they’re covering.

My twitter feed has been exploding with a particularly egregious example of this. After Monday night’s presidential debate, there’s been a ton of polling about who “won” the debate. One conservative radio host named Bill Mitchell has been on a rampage about those polls. Here’s a sample of his tweets:

Statistical analysis has a very simple point. We’re interested in understanding the properties of a large population of things. For whatever reason, we can’t measure the properties of every object in that population.

The exact reason can vary. In political polling, we can’t ask every single person in the country who they’re going to vote for. (Even if we could, we simply don’t know who’s actually going to show up and vote!) For a very different example, my first exposure to statistics was through my father, who worked in semiconductor manufacturing. They’d produce a run of 10,000 chips for use in satellites. They needed to know when, on average, a chip would fail from exposure to radiation. If they measured that in every chip, they’d end up with nothing to sell.

Anyway: you can’t measure every element of the population, but you still want to take measurements. So what you do is randomly select a collection of representative elements from the population, and you measure those. Then you can say that with a certain probability, the result of analyzing that representative subset will match the result that you’d get if you measured the entire population.

How close can you get? If you’ve really selected a random sample of the population, then the answer depends on the size of the sample. We measure that using something called the “margin of error”. “Margin of error” is actually a terrible name for it, and that’s the root cause of one of the most common problems in reporting about statistics. The margin of error is a probability measurement that says “there is an $N$% probability that the value for the full population lies within the margin of error of the measured value of the sample.”

Right away, there’s a huge problem with that. What is that variable doing in there? The margin of error measures the probability that the full population value is within a confidence interval around the measured sample value. If you don’t say what the confidence level (the $N$) is, the margin of error is worthless. Most of the time – but not all of the time – we’re talking about a 95% confidence interval.
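To make the point about the missing $N$ concrete, here’s a sketch of the textbook normal-approximation margin of error for a proportion estimated from a simple random sample. It assumes the worst-case proportion of 50%, and the function name is just for illustration. The same 500-person sample gives a different margin of error for each confidence level.

```python
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.95, proportion=0.5):
    """Normal-approximation margin of error for a proportion.

    Only meaningful for a simple random sample. proportion=0.5 gives the
    widest (most conservative) margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * (proportion * (1 - proportion) / sample_size) ** 0.5


for confidence in (0.90, 0.95, 0.99):
    print(f"n=500 at {confidence:.0%} confidence: +/- {margin_of_error(500, confidence):.1%}")
```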

But there are several subtler issues with the margin of error, all of them made worse by the name.

1. The “true” value for the full population is not guaranteed to be within the margin of error of the sampled value. It’s just a probability. There is no hard bound on the size of the error: just a high probability of it being within the margin.
2. The margin of error only includes errors due to sample size. It does not incorporate any other factor – and there are many! – that may have affected the result.
3. The margin of error is deeply dependent on the way that the underlying sample was taken. It’s only meaningful for a random sample. That randomness is critically important: all of sampled statistics is built around the idea that you’ve got a randomly selected subset of your target population.
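Point 1 is easy to see with a quick simulation. This sketch draws thousands of hypothetical random-sample polls from a population whose true value is known (52% is an arbitrary choice). It computes a 95% interval for each poll and counts how often the interval actually contains the true value. Roughly 1 poll in 20 misses. The function name and parameters are illustrative assumptions.

```python
import random
from statistics import NormalDist


def interval_hit_rate(true_p, sample_size, trials, confidence=0.95, seed=1):
    """Simulate random-sample polls; return the fraction whose interval contains true_p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        # Each respondent independently says "yes" with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(sample_size))
        p_hat = yes / sample_size
        margin = z * (p_hat * (1 - p_hat) / sample_size) ** 0.5
        if abs(p_hat - true_p) <= margin:
            hits += 1
    return hits / trials


hit_rate = interval_hit_rate(true_p=0.52, sample_size=500, trials=2000)
print(f"{hit_rate:.1%} of intervals contained the true value; {1 - hit_rate:.1%} missed")
```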

Let’s get back to our friend the radio host, and his first tweet, because he’s doing a great job of illustrating some of these errors.

The quality of a sampled statistic is entirely dependent on how well the sample matches the population. The sample is critical. It doesn’t matter how big the sample size is if it’s not random. A non-random sample cannot be treated as a representative sample.

So: an internet poll, where a group of people has to deliberately choose to exert the effort to participate, cannot be a valid sample for statistical purposes. It’s not random.

It’s true that the set of people who show up to vote isn’t a random sample. But that’s fine: the purpose of an election isn’t to try to divine what the full population thinks. It’s to count what the people who chose to vote think. It’s deliberately measuring a full population: the population of people who chose to vote.

But if you’re trying to statistically measure something about the population of people who will go and vote, you need to take a randomly selected sample of people who will go to vote. The set of voters is the full population; you need to select a representative sample of that population.

Internet polls do not do that. At best, they measure a different population of people. (At worst, with ballot stuffing, they measure absolutely nothing, but we’ll give them this much benefit of the doubt.) So you can’t take much of anything about the sample population and use it to reason about the full population.

And you can’t say anything about the margin of error, either. Because the margin of error is only meaningful for a representative sample. You cannot compute a meaningful margin of error for a non-representative sample, because there is no way of knowing how that sampled population compares to the true full target population.

And that brings us to the second tweet. A properly drawn random sample of 500 people can produce a high quality result with a margin of error of roughly 4.4% at a 95% confidence level. (I’m doing a back-of-the-envelope calculation here, so that’s not precise.) That means that if the population were randomly sampled, we could say that in 19 out of 20 polls of that size, the full population value would be within about +/- 4.4% of the value measured by the poll. For a non-randomly selected sample of 10 million people, the margin of error cannot be measured, because it’s meaningless. The random sample of 500 people tells us a reasonable estimate based on data; the non-random sample of 10 million people tells us nothing.
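Here’s a toy simulation of the difference between a random sample and a self-selected one. Everything in it is hypothetical. The population is split exactly 50/50, and by assumption, supporters of one candidate are twice as likely to volunteer for an online poll. The self-selected sample ends up about 30 times bigger than the random one, and much further from the truth.

```python
import random


def compare_polls(population_size=200_000, true_support=0.5,
                  volunteer_if_support=0.10, volunteer_if_oppose=0.05,
                  random_sample_size=500, seed=7):
    """Compare a small random poll with a large self-selected one (toy model)."""
    rng = random.Random(seed)
    # True means "supports candidate A"; exactly true_support of people do.
    supporters = int(population_size * true_support)
    population = [i < supporters for i in range(population_size)]

    # Random poll: every person is equally likely to be asked.
    random_poll = rng.sample(population, random_sample_size)
    random_estimate = sum(random_poll) / len(random_poll)

    # Self-selected poll: the chance of answering depends on the opinion itself.
    volunteers = [
        supports for supports in population
        if rng.random() < (volunteer_if_support if supports else volunteer_if_oppose)
    ]
    volunteer_estimate = sum(volunteers) / len(volunteers)
    return random_estimate, len(volunteers), volunteer_estimate


random_estimate, n_volunteers, volunteer_estimate = compare_polls()
print(f"random poll (n=500): {random_estimate:.1%} support")
print(f"self-selected poll (n={n_volunteers:,}): {volunteer_estimate:.1%} support")
print("true support: 50.0%")
```

Making the self-selected sample bigger doesn’t fix it. Its estimate converges on the volunteers’ opinion (about two-thirds in this toy setup), not the population’s.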

And with that, on to the third tweet!

In a poll like this, the margin of error only tells us one thing: what’s the probability that the sampled population will respond to the poll in the same way that the full population would?

There are many, many things that can affect a poll beyond the sample size. Even with a truly random and representative sample, there are many things that can affect the outcome. For a couple of examples:

How, exactly, is the question phrased? For example, if you ask people “Should police shoot first and ask questions later?”, you’ll get a very different answer from “Should police shoot dangerous criminal suspects if they feel threatened?”, even though both of those questions are trying to measure very similar things. The phrasing of the questions dramatically affects the outcome.

What context is the question asked in? Is this the only question asked? Or is it asked after some other set of questions? The preceding questions can bias the answers. If you ask a bunch of questions about how each candidate did with respect to particular issues before you ask who won, those preceding questions will bias the answers.

When you’re looking at a collection of polls that asked different questions in different ways, you expect a significant variation between them. That doesn’t mean that there’s anything wrong with any of them. They can all be correct even though their results vary by much more than their margins of error, because the margin of error has nothing to do with how you compare their results: they used different samples, and measured different things.

The problem with the reporting comes down to the same things I mentioned above. The press treats the margin of error as an absolute bound on the error in the computed sample statistics (which it isn’t); and the press pretends that all of the polls are measuring exactly the same thing, when they’re actually measuring different (but similar) things. They don’t tell us what the polls are really measuring; they don’t tell us what the sampling methodology was; and they don’t tell us the confidence level.

Which leads to exactly the kind of errors that Mr. Mitchell made.

And one bonus. Mr. Mitchell repeatedly rants about how many polls show a “bias” by “over-sampling” Democratic Party supporters. This is a classic mistake by people who don’t understand statistics. As I keep repeating, for a sample to be meaningful, it must be random. You can report on all sorts of measurements of the sample, but you cannot change it.

If you’re randomly selecting phone numbers and polling the respondents, you cannot screen the responders based on their self-reported party affiliation. If you do, you are biasing your sample. Mr. Mitchell may not like the results, but that doesn’t make them invalid. People report what they report.

In the last presidential election, we saw exactly this notion in the idea of “unskewing” polls, where a group of conservative folks decided that the polls were all biased in favor of the democrats for exactly the reasons cited by Mr. Mitchell. They recomputed the poll results based on shifting the samples to represent what they believed to be the “correct” breakdown of party affiliation in the voting population. The results? The actual election results closely tracked the supposedly “skewed” polls, and the unskewers came off looking like idiots.

We also saw exactly this phenomenon going on in the Republican primaries this year. Randomly sampled polls consistently showed Donald Trump crushing his opponents. But the political press could not believe that Donald Trump would actually win, and so they kept finding ways to claim that the poll samples were off. For example, they claimed that the polls used land lines, which oversampled older people, and that if you corrected for that sampling error, Trump wasn’t actually winning. Nope: the randomly sampled polls were correct, and Donald Trump is the Republican nominee.

If you want to use statistics, you must work with random samples. If you don’t, you’re going to screw up the results, and make yourself look stupid.

When a friend asks me to write about something, I try to do it. Yesterday, a friend of mine from my Google days, Daniel Martin, sent me a link, and asked me to write about it. Daniel isn’t just a former coworker of mine; he’s also a math geek with the same sort of warped sense of humor as me. He knew my blog before we worked at Google, and on my first Halloween at Google, he came to introduce himself to me. He was wearing a purple shirt with his train ticket on a cord around his neck. For those who know any abstract algebra, get ready to groan: he was purple, and he commuted. He was dressed as an Abelian grape.

The real subject of the article involves a recent twitter-storm around a professor at Boston University. This professor tweeted some things about racism and history, and she did it in very blunt, not-entirely-professional terms. The details of what she did aren’t something I want to discuss here. (Briefly, I think it wasn’t a smart thing to tweet like that, but plenty of white people get away with worse every day; the only reason that she’s getting as much grief as she is is because she dared to be a black woman saying bad things about white people, and the assholes at Breitbart used that to fuel the insatiable anger and hatred of their followers.)

But I don’t want to go into the details of that here. Lots of people have written interesting things about it, from all sides. Just by posting about this, I’m probably opening myself up to yet another wave of abuse, but I’d prefer to avoid as much of that as I can. Instead, I’m just going to rip out the introduction to this article, because it makes a kind of incredibly stupid mathematical argument that requires correction. Here are the first and second paragraphs:

There aren’t too many African Americans in higher education.

In fact, black folks only make up about 4 percent of all full time tenured college faculty in America. To put that in context, only 14 out of the 321—that’s about 4 percent—of U.S. astronauts have been African American. So in America, if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor.

Statistics and probability can be a difficult field of study. But… a lot of its everyday uses are really quite easy. If you’re going to open your mouth and make public statements involving probabilities, you probably should make sure that you at least understand the first chapter of “probability for dummies”.

This author doesn’t appear to have done that.

The most basic fact of understanding how to compare pretty much anything numeric in the real world is that you can only compare quantities that have the same units. You can’t compare 4 kilograms to 5 pounds, and conclude that 5 pounds is bigger than 4 kilograms because 5 is bigger than four.

That principle applies to probabilities and statistics: you need to make sure that you’re comparing apples to apples. If you compare an apple to a grapefruit, you’re not going to get a meaningful result.

The proportion of astronauts who are black is 14/321, or a bit over 4%. That means that out of every 100 astronauts, you’d expect to find four black ones.

The proportion of college professors who are black is also a bit over 4%. That means that out of every 100 randomly selected college professors, you’d expect 4 to be black.

So far, so good.

But from there, our intrepid author takes a leap, and says “if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor”.

Nothing in the quoted statistic in any way tells us anything about anyone’s chances to become an astronaut. Nothing at all.

This is a classic statistical error which is very easy to avoid. It’s a unit error: he’s comparing two things with different units. The short version of the problem is: he’s comparing black/astronaut (the fraction of astronauts who are black) with astronaut/black (the fraction of black people who are astronauts).

You can’t derive anything about the probability of a black person becoming an astronaut from the ratio of black astronauts to astronauts.

Let’s pull out some numbers to demonstrate the problem. These are completely made up, to make the calculations easy – I’m not using real data here.

Suppose that:

• the US population is 300,000,000;
• black people are 40% of the population, which means that there are 120,000,000 black people.
• there are 1000 universities in America, and there are 50 faculty per university, so there are 50,000 university professors.
• there are 50 astronauts in the US.
• If 4% of astronauts and 4% of college professors are black, that means that there are 2,000 black college professors, and 2 black astronauts.

In this scenario, as in reality, the percentage of black college professors and the percentage of black astronauts are equal. What about the probability of a given black person being a professor or an astronaut?

The probability of a black person being a professor is 2,000/120,000,000, or 1 in 60,000. The probability of a black person becoming an astronaut is just 2/120,000,000, or 1 in 60 million. Even though the probability of a random astronaut being black is the same as the probability of a random college professor being black, the probability of a given black person becoming a college professor is 1,000 times higher than the probability of a given black person becoming an astronaut.
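The made-up numbers above can be checked with a few lines of integer arithmetic. This sketch just restates the scenario’s invented figures (again, not real data) and computes both directions of the comparison.

```python
# Invented numbers from the scenario above (not real data).
population = 300_000_000
black_population = population * 40 // 100   # 120,000,000
professors = 1_000 * 50                      # 1000 universities x 50 faculty
astronauts = 50

# 4% of each group is black.
black_professors = professors * 4 // 100     # 2,000
black_astronauts = astronauts * 4 // 100     # 2

# "Black given professor" equals "black given astronaut": both 4% by construction.
# "Professor given black" vs. "astronaut given black", written as "1 in N":
one_in_n_professor = black_population // black_professors   # 60,000
one_in_n_astronaut = black_population // black_astronauts   # 60,000,000

print(f"1 in {one_in_n_professor:,} black people is a professor")
print(f"1 in {one_in_n_astronaut:,} black people is an astronaut")
print(f"ratio: {one_in_n_astronaut // one_in_n_professor:,}x")
```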

This kind of thing isn’t rocket science. My 11 year old son has done enough statistics in school to understand this problem! It’s simple: you need to compare like to like. If you can’t understand that, if you can’t understand your statistics enough to understand their units, you should probably try to avoid making public statements about statistics. Otherwise, you’ll wind up doing something stupid, and make yourself look like an idiot.

(In the interests of disclosure: an earlier version of this post used the comparison of apples to watermelons. But given the racial issues discussed in the post, that had unfortunate unintended connotations. When someone pointed that out to me, I changed it. To anyone who was offended: I am sorry. I did not intend to say anything associated with the racist slurs; I simply never thought of it. I should have, and I shouldn’t have needed someone to point it out to me. I’ll try to be more careful in the future.)

Back to an old topic: Bad Vaccine Math

The very first Good Math/Bad Math post ever was about an idiotic bit of antivaccine rubbish. I haven’t dealt with antivaccine stuff much since then, because the bulk of the antivaccine idiocy has nothing to do with math. But the other day, a reader sent me a really interesting link from what my friend Orac calls a “wretched hive of scum and quackery”, naturalnews.com, in which they try to argue that the whooping cough vaccine is an epic failure:

(NaturalNews) The utter failure of the whooping cough (pertussis) vaccine to provide any real protection against disease is once again on display for the world to see, as yet another major outbreak of the condition has spread primarily throughout the vaccinated community. As it turns out, 90 percent of those affected by an ongoing whooping cough epidemic that was officially declared in the state of Vermont on December 13, 2012, were vaccinated against the condition — and some of these were vaccinated two or more times in accordance with official government recommendations.

As reported by the Burlington Free Press, at least 522 cases of whooping cough were confirmed by Vermont authorities last month, which was about 10 times the normal amount from previous years. Since that time, nearly 100 more cases have been confirmed, bringing the official total as of January 15, 2013, to 612 cases. The majority of those affected, according to Vermont state epidemiologist Patsy Kelso, are in the 10-14-year-old age group, and 90 percent of those confirmed have already been vaccinated one or more times for pertussis.

Even so, Kelso and others are still urging both adults and children to get a free pertussis shot at one of the free clinics set up throughout the state, insisting that both the vaccine and the Tdap booster for adults “are 80 to 90 percent effective.” Clearly this is not the case, as evidenced by the fact that those most affected in the outbreak have already been vaccinated, but officials are apparently hoping that the public is too naive or disengaged to notice this glaring disparity between what is being said and what is actually occurring.

It continues in that vein. The gist of the argument is:

1. We say everyone needs to be vaccinated, which will protect them from getting the whooping cough.
2. The whooping cough vaccine is, allegedly, 80 to 90% effective.
3. 90% of the people who caught whooping cough were properly vaccinated.
4. Therefore the vaccine can’t possibly work.

What they want you to do is look at that 80 to 90 percent effective rate, see that only 10-20% of vaccinated people should be susceptible to whooping cough, and compare that 10-20% to the 90% of actual infected people who were vaccinated. 20% (the upper bound of the susceptible portion of vaccinated people according to the quoted statistic) is clearly much smaller than 90%, so it’s obvious that the vaccine doesn’t work.

Of course, this is rubbish. It’s a classic apple to orange-grove comparison. You’re comparing percentages, when those percentages are measuring different groups: groups with wildly different sizes.

Take a pool of 1000 people, and suppose that 95% are properly vaccinated (the current DTAP vaccination rate in the US is around 95%). That gives you 950 vaccinated people and 50 unvaccinated people.

In the vaccinated pool, let’s assume that the vaccine was fully effective on 90% of them (that’s the highest estimate of effectiveness, which will result in the lowest number of susceptible vaccinated people, aka the best possible scenario for the anti-vaxers). That gives us 95 vaccinated people who are susceptible to whooping cough.

There’s the root of the problem. Using numbers that are ridiculously friendly to the anti-vaxers, we’ve still got a population of nearly twice as many susceptible vaccinated people (95) as unvaccinated people (50). So we’d expect, right out of the box, that nearly two-thirds of the cases of whooping cough (95 out of 145, or about 65.5%) would be among the vaccinated people.
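Here’s the same comparison as a quick Python sketch, using the 1000-person pool and the anti-vaxer-friendly 90% effectiveness figure. The helper name is hypothetical, and the sketch treats the vaccine as all-or-nothing and every unvaccinated person as susceptible.

```python
def susceptible_counts(pool, coverage, efficacy):
    """Return (susceptible vaccinated, susceptible unvaccinated) for a pool of people.

    Assumes an all-or-nothing vaccine and that every unvaccinated person is susceptible.
    """
    vaccinated = round(pool * coverage)
    unvaccinated = pool - vaccinated
    return vaccinated * (1 - efficacy), unvaccinated


susceptible_vax, susceptible_unvax = susceptible_counts(1000, coverage=0.95, efficacy=0.90)
share = susceptible_vax / (susceptible_vax + susceptible_unvax)
print(f"{susceptible_vax:.0f} vaccinated vs {susceptible_unvax} unvaccinated susceptible people")
print(f"expected share of cases among the vaccinated: {share:.1%}")
```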

In reality, the numbers are much worse for the antivax case. The percentage of people who were ever vaccinated is around 95%, because you need the vaccination to go to school. But that’s just the childhood dose. DTAP is a vaccination that needs to be periodically boosted or the immunity wanes. And the percentage of people who’ve had boosters is extremely low. Among adolescents, according to the CDC, only a bit more than half have had DTAP boosters; among adults, less than 10% have had a booster within the last 5 years.

What’s your susceptibility if you’ve gone more than 5 years without a booster? Somewhere around 40% of people who didn’t have boosters in the last five years are susceptible.

So let’s just play with those numbers a bit. Assume, for simplicity, that 50% of the vaccinated people are adults and 50% are children, and assume that all of the children are fully up-to-date on the vaccine. Then the susceptible population among the vaccinated is 10% of the children (10% of 475), 10% of the adults who are up-to-date (10% of 10% of 475), and 40% of the adults who aren’t up-to-date (40% of 90% of 475). That works out to about 223 susceptible people among the vaccinated, compared to 50 among the unvaccinated. So about 82% of the susceptible people were vaccinated, and you’d expect roughly 82% of the actual cases of whooping cough to be among people who’d been vaccinated. Suddenly, the antivaxers’ case doesn’t look so good, does it?
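Here’s that adult/child breakdown as a Python sketch. It uses the simplifying assumptions from the paragraph above: a 50/50 split of the 950 vaccinated people, all children up to date, 10% of adults boosted, 10% susceptibility when up to date, and 40% when not. The constant names are illustrative.

```python
VACCINATED = 950
UNVACCINATED = 50  # all assumed susceptible

children = VACCINATED / 2  # 475, all assumed up to date
adults = VACCINATED / 2    # 475

BOOSTED_ADULT_SHARE = 0.10     # adults boosted within the last 5 years
SUSCEPTIBLE_IF_CURRENT = 0.10  # up to date on the vaccine
SUSCEPTIBLE_IF_LAPSED = 0.40   # no booster in the last 5 years

susceptible_children = SUSCEPTIBLE_IF_CURRENT * children
susceptible_boosted_adults = SUSCEPTIBLE_IF_CURRENT * BOOSTED_ADULT_SHARE * adults
susceptible_lapsed_adults = SUSCEPTIBLE_IF_LAPSED * (1 - BOOSTED_ADULT_SHARE) * adults

susceptible_vaccinated = (susceptible_children + susceptible_boosted_adults
                          + susceptible_lapsed_adults)
share = susceptible_vaccinated / (susceptible_vaccinated + UNVACCINATED)
print(f"{susceptible_vaccinated:.0f} susceptible vaccinated people, "
      f"{share:.0%} of everyone susceptible")
```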

Consider, for a moment, what you’d expect among a non-vaccinated population. Pertussis is highly contagious. If someone in your household has pertussis, and you’re susceptible, you’ve got a better than 90% chance of catching it. It’s that contagious. Routine exposure (not sharing a household, but going to work, to the store, etc., with people who are infected) still gives you about a 50% chance of infection if you’re susceptible.

In the state of Vermont, where NaturalNews is claiming that the evidence shows the vaccine doesn’t work, how many cases of pertussis have they seen? Around 600, out of a state population of 600,000: an infection rate of one tenth of one percent. 0.1 percent, from a virulently contagious disease.

That’s the highest level of pertussis that we’ve seen in the US in a long time. But at the same time, it’s really a very low number for something so contagious. To compare: there’s been a huge outbreak of norovirus in the UK this year. Overall, more than one million people have caught it so far this winter, out of a total population of 62 million, for a rate of about 1.6%, or sixteen times the rate of pertussis infection in Vermont.
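Here is that comparison as a small Python sketch, using the rounded figures quoted above (600 cases among 600,000 Vermonters, and one million norovirus cases among 62 million people in the UK). The helper name `infection_rate` is just for illustration.

```python
def infection_rate(cases: int, population: int) -> float:
    """Fraction of a population that has been infected."""
    return cases / population


vermont_pertussis = infection_rate(600, 600_000)
uk_norovirus = infection_rate(1_000_000, 62_000_000)

print(f"Vermont pertussis: {vermont_pertussis:.2%}")            # 0.10%
print(f"UK norovirus:      {uk_norovirus:.2%}")                 # 1.61%
print(f"ratio:             {uk_norovirus / vermont_pertussis:.1f}")  # 16.1
```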

Why is the rate of infection with this virulently contagious disease so different from the rate of infection with that other virulently contagious disease? Vaccines are a big part of it.

Big Number Bogosity from a Christian College Kid

I know that I just posted a link to a stupid religious argument, but I was sent a link to another one, which I can’t resist mocking.

As I’ve written about quite often, we humans really stink at understanding big numbers, and how things scale. This is an example of that. We’ve got a jerk who’s about to graduate from a dinky Christian college, who believes that there must be something special about the moral atmosphere at his college, because in his four years at the school, there hasn’t been a single murder.

Yeah, seriously. He really believes that his school is special, because it’s gone four whole years without a murder:

Considering that the USA Today calculated 857 college student deaths from 2000 to 2005, how does one school manage to escape unscathed? It’s certainly not chance or luck. For Patrick Henry College, it’s in our Christian culture.

Critics mock us for our strict rules – like no dancing or drinking on campus, no members of the opposite sex permitted in your dorm room, nightly curfew hours – and the lack of a social atmosphere it creates. We have been the subject of books (God’s Harvard), television shows, op-eds, and countless blogs who rant against our brand of overbearing right-wing Christianity that poisons society’s freedom.

Yet, what is the cost of students being able to “express” themselves? Is that freedom worth the cost of drunk driving deaths, drug related violence, and love affairs turned fatal?

There were 857 college student deaths in the five-year period from 2000 to 2005! Therefore, any college where there weren’t any murders in that period must be something really special. That Christian culture must be making a really big difference, right?

Well, no.

According to Google Answers, the US Census Bureau reports that there are 2,363 four-year colleges in the US. So, even assuming the widest possible distribution of student deaths (no college with more than one), at least 1,506 colleges had no student deaths in that five-year period. Or, put another way, more than 60% of colleges in the US went that five-year period without any violent student deaths.
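The counting here is a pigeonhole argument: 857 deaths can touch at most 857 distinct colleges. A minimal sketch with the figures quoted above, treating (as the paragraph does) every death as belonging to one of the 2,363 four-year colleges:

```python
colleges = 2363
student_deaths = 857

# Spread the deaths as widely as possible: at most one per college,
# so each death "uses up" a distinct college.
max_colleges_with_a_death = min(student_deaths, colleges)
min_colleges_without_a_death = colleges - max_colleges_with_a_death

print(min_colleges_without_a_death)                       # 1506
print(f"{min_colleges_without_a_death / colleges:.1%}")   # 63.7%
```

Any less even distribution (two deaths at one school, say) only increases the number of colleges with none.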

Or, let’s try looking at it another way. According to the census, there are 15.9 million people currently enrolled in college. The school that, according to the author, is so remarkable for going without any murders in the last four years? It has 325 students. Not 325 per class – 325 total.

In other words, among a group making up about two thousandths of one percent of the college population, there were no murders. Assuming that the distribution of violent deaths is perfectly uniform (which it obviously isn’t; but let’s just keep things simple), and given that there were 857 violent deaths in the student population as a whole, how many violent deaths would you expect among the student body at his dinky Christian college?
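Under the same uniform-distribution simplification, the expected count is just the school's share of all students times 857. The last step below adds an assumption the original does not make: if deaths strike students independently, the count at one small school is roughly Poisson-distributed, which gives the chance of seeing zero.

```python
import math

enrolled = 15_900_000      # US college enrollment, census figure quoted above
school_size = 325          # total enrollment of the school in question
deaths_five_years = 857    # USA Today figure, 2000 to 2005

share = school_size / enrolled
expected = deaths_five_years * share
print(f"share of all students: {share:.5%}")              # 0.00204%
print(f"expected deaths at the school: {expected:.3f}")   # 0.018

# Extra assumption (not in the original): independent deaths, so the
# count at one school is approximately Poisson(expected).
p_none = math.exp(-expected)
print(f"chance of zero deaths: {p_none:.1%}")             # 98.3%
```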

That would be a big, fat zero.

The fact that there were no violent deaths at his school isn’t remarkable, not at all. But to a twit who’s incapable of actually understanding what numbers mean, that’s not the conclusion to be drawn. Nor is the conclusion that violent death among college students is actually remarkably rare, or that most college students will go through college without any violent deaths on campus. No: according to this twit, with 857 violent campus deaths over five years, the only reasonable conclusion is that there must be something special about the ridiculous religious rules at his college that prevented the great rampaging plague of violence from touching the students at his school.

I actually spent five years as an undergraduate at Rutgers University in NJ. During that time, there were no violent student deaths. (There was one death by alcohol poisoning, and one drunk-driving accident that killed four students.) But zero violent deaths. Gosh, Rutgers must have been an absolutely amazingly moral university! And gosh, we had all of those horrible sinful things, like dancing, and co-ed dorms! How did we manage to go all that time with no violence?

It must have been the prayers of the very nice Rabbi at the Chabad house on campus. Yeah, that must be it! Couldn’t just be random chance, right?

OK, now let me stop being quite so pettily snide for a moment.

What’s going on here is really simple. We hear a whole lot about violence on campus. And when you hear about eight-hundred and some-odd violent deaths on campus, it sounds like a lot. So, intuitively, it sure seems like there must be a whole lot of violence on campus, and it must be really common. So if you can go through your whole time in college without having any violence occur on campus, it seems like it must be unusual.

That’s because, as usual, we really suck at understanding big numbers and scale. 800 sounds like a lot. The idea that there are nearly sixteen million college students is just not something that we understand on an intuitive level. The idea that nearly a thousand deaths could be a tiny drop in the bucket – that it really amounts to just one death per 100,000 students per year – it just doesn’t make sense to us. A number like 800 is, just barely, intuitively meaningful to us. One million isn’t. Fifteen million isn’t. And a ratio with a number that we can’t really grasp intuitively on the bottom? That’s not going to be meaningful either.
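Turning the raw count into a rate is a one-line calculation, and it is the rate, not the count, that carries the meaning. A sketch with the figures used above:

```python
deaths = 857
years = 5
students = 15_900_000

# Deaths per year, divided by the student population, scaled to 100,000.
per_100k_per_year = deaths / years / students * 100_000
print(f"{per_100k_per_year:.2f} deaths per 100,000 students per year")  # 1.08
```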

Bozo-boy is making an extremely common mistake. He’s just simply failing to comprehend how numbers scale; he’s not understanding what big numbers really mean.

Big Numbers and Air Travel

As you’ve surely heard by now, on Christmas Day, some idiot attempted to
blow up an airplane by stuffing his underwear full of explosives and then
lighting his crotch on fire. There’s been a ton of coverage of this – most of
which takes the form of people running around wetting their pants in terror.

One thing which I’ve noticed, though, is that one aspect of this whole mess
ties in to one of my personal obsessions: scale. We humans are really,
really lousy at dealing with big numbers. We just absolutely
have a piss-poor ability to really comprehend numbers, or to take what we
know, and put it together in a quantitative way.

I am the antichrist. No, really!

I normally try to ignore things like this, but this is just too funny.

In general, I find arguments like this to be extremely silly. This is, basically, like
playing with gematria, only instead of doing real gematria (which is silly enough on its own),
it’s like our friend “Gotcha” – mixing systems and screwing things up until you get the results
you want.

Lots of Christians of the particularly crazy strain really, desperately want to believe
that Barack Obama is the antichrist. They want an explanation for how this black man with
a Muslim name could possibly have actually been elected; they don’t believe it could possibly
have happened honestly. And their doctrine requires the antichrist to come soon. Combine
those two, and you’ve got what, for them, is a sort of perfect storm.

Which gives us things like this.

Moronic Probability and Stupid Physics

Via the Bad Astronomer comes one of the most pathetic abuses of
probability that I’ve ever seen. I’m simply amazed that this idiot was willing
to go on television and say this.

[Embedded video: The Daily Show with Jon Stewart, “Large Hadron Collider” segment]

The crank in question is Walter Wagner, the moron who tried to use a lawsuit
to stop the LHC from being activated. (Just that much, already, is amazingly silly;
he sued in Hawaii, but the LHC is near Geneva, straddling the French-Swiss border. How does a Hawaiian court
have any jurisdiction?)

Anyway… Wagner claims that the LHC could destroy the earth. See, there’s a tiny theoretical chance that the right collision in the LHC could create a microscopic black hole. According to Wagner, if that happens, the black hole will swallow the entire earth.

That claim is, itself, based on some pretty bad math. The only theory that predicts
that it’s possible to create a microscopic black hole also predicts that such a black
hole would evaporate – that is, would completely disappear in a burst of energy – immediately. The
exact same math that predicts that you could create a black hole in a high-energy collision also
predicts that the hole would be destroyed before it had time to do any damage. If you tweak it so that the black hole lasts longer, the energy requirements change so that it’s no longer possible to create it in the LHC. Making the black hole last even a microsecond would take energy absolutely beyond the reach of
any collider that we could ever build on the earth.

But let’s skip that – demonstrating that is pretty complicated. To get an idea of
the level of understanding of the guy who claims that there’s a real danger, let’s just
take a look at what he says.

When asked what the probability of the LHC destroying the earth is, he says 50%. Why?
Because either it could happen, or it couldn’t – therefore, there’s a 50% chance of it happening.

You could argue that that’s naive Bayesian reasoning, but if you did, you’d be an idiot. Classic Bayesian arguments about stuff like this would say that you use 50/50 as an initial prior in the absence of any other information; then you adjust that based on whatever
other information you have available. Mr. Wagner’s question is about the predictions of
a complex physical theory, and that theory provides lots of information
which you can use to update your probability estimate.
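To make the distinction concrete, here is Bayes’ rule in its odds form as a small Python function. The likelihood ratios are invented purely for illustration; nothing here models the actual physics. The point is only that 50/50 is where you start when you know nothing, and any informative evidence moves you away from it.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    likelihood_ratio = P(evidence | H) / P(evidence | not H)
    posterior odds   = prior odds * likelihood ratio
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


p = 0.5  # the "know-nothing" starting point

# Evidence with a likelihood ratio of 1 carries no information:
print(bayes_update(p, 1.0))   # 0.5

# Hypothetical evidence strongly favoring "harmless" (ratio made up
# for illustration) pulls the estimate far below 50%:
print(bayes_update(p, 1e-6))
```

Staying at 50% after looking at the theory is only possible if every piece of evidence has a likelihood ratio of exactly 1.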

Mr. Wagner’s 50/50 claim is based on the fact that he’s absolutely clueless about how any of
this stuff works. He clearly doesn’t understand probability, and he clearly doesn’t understand
physics.

But he’s awfully funny.

Lottery Probabilities and Clueless Reporters

A simple, silly, but entertaining example of mathematical illiteracy by way of the Associated Press:

OMAHA, Neb. (AP) — The odds are against something this odd. But a Nebraska Lottery official says there was no mistake: The same three numbers in Nebraska’s Pick 3 lottery were drawn two nights in a row this week.

Lottery spokesman Brian Rockey said one of two lottery computers that randomly generate numbers produced the numbers 1, 9 and 6 — in that order — for Monday night’s Pick 3 drawing. Rockey says the next night, the lottery’s other computer produced the same three numbers in the same sequence.

The odds of such an occurrence? One in a million.

Close… Only off by three orders of magnitude…
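The three orders of magnitude come from asking the wrong question. One in a million is the chance that one particular, pre-chosen sequence comes up on two given nights; the event that actually happened, that a night’s draw repeats the previous night’s, has probability one in a thousand. A short enumeration in Python makes the difference explicit. The one-draw-per-night, 365-night schedule at the end is an assumption for illustration, not a claim about the Nebraska Lottery’s actual calendar.

```python
from fractions import Fraction
from itertools import product

# Every possible Pick 3 result: three digits, order matters.
outcomes = list(product(range(10), repeat=3))
n = len(outcomes)  # 1000

# A specific, pre-chosen sequence (say 1-9-6) on two given nights.
p_specific_twice = Fraction(1, n) ** 2

# The second night matches the first, whatever the first was:
# count matching (night1, night2) pairs out of all n * n pairs.
matches = sum(1 for first, second in product(outcomes, repeat=2) if first == second)
p_any_repeat = Fraction(matches, n * n)

print(p_specific_twice)  # 1/1000000
print(p_any_repeat)      # 1/1000

# Assumed schedule: one draw per night for 365 nights, i.e. 364
# consecutive pairs; each "repeat" event is independent of the others.
nights = 365
p_year = 1 - (1 - float(p_any_repeat)) ** (nights - 1)
print(f"chance of at least one repeat in {nights} nights: {p_year:.0%}")
```

Under that assumed schedule, a back-to-back repeat is something you would expect to see on average about once every three years.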

If you measure the wrong thing, you get the wrong answer: Down's syndrome in Britain

One of the blogs I read regularly is Ben Goldacre’s “Bad Science”. I recommend
it highly. (Which reminds me that I really need to find some time to update my blogroll!) In Saturday’s entry, he discussed a BBC Radio documentary that described how Britain is becoming a much more welcoming place for Down’s syndrome babies.

Ben did a good job of shredding it. But I also wanted to take a stab, focusing on
the mathematical problem that underlies it, because it’s a great example of two very
common errors: first, the familiar confusion of correlation with causation, and
second, the use of incorrect metrics.