Category Archives: Bad Statistics

Polls and Sampling Errors in the Presidential Debate Results

My biggest pet peeve is press coverage of statistics. As someone who is mathematically literate, I’m constantly infuriated by it. Basic statistics isn’t that hard, but people can’t be bothered to actually learn a tiny bit in order to understand the meaning of the things they’re covering.

My twitter feed has been exploding with a particularly egregious example of this. After Monday night’s presidential debate, there’s been a ton of polling about who “won” the debate. A conservative radio host named Bill Mitchell has been on a rampage about those polls. Here’s a sample of his tweets:

Let’s start with a quick refresher about statistics, why we use them, and how they work.

Statistical analysis has a very simple point. We’re interested in understanding the properties of a large population of things. For whatever reason, we can’t measure the properties of every object in that population.

The exact reason can vary. In political polling, we can’t ask every single person in the country who they’re going to vote for. (Even if we could, we simply don’t know who’s actually going to show up and vote!) For a very different example, my first exposure to statistics was through my father, who worked in semiconductor manufacturing. They’d produce a run of 10,000 chips for use in satellites, and they needed to know when, on average, a chip would fail from exposure to radiation. If they measured that in every chip, they’d end up with nothing to sell.

Anyway: you can’t measure every element of the population, but you still want to take measurements. So what you do is randomly select a collection of representative elements from the population, and you measure those. Then you can say that with a certain probability, the result of analyzing that representative subset will match the result that you’d get if you measured the entire population.

How close can you get? If you’ve really selected a random sample of the population, then the answer depends on the size of the sample. We measure that using something called the “margin of error”. “Margin of error” is actually a terrible name for it, and that’s the root cause of one of the most common problems in reporting about statistics. The margin of error is a probability measurement that says: “there is an N% probability that the value for the full population lies within the margin of error of the measured value of the sample.”

Right away, there’s a huge problem with that. What is that N% variable doing in there? The margin of error measures the probability that the full population value is within a confidence interval around the measured sample value. If you don’t say what the confidence interval is, the margin of error is worthless. Most of the time – but not all of the time – we’re talking about a 95% confidence interval.
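
To make that concrete: for a proportion measured from a simple random sample of size n, the 95% confidence margin of error is conventionally computed as follows. (This is the standard textbook formula; it isn’t anything specific to the polls in question.)

$$\text{MOE}_{95} \approx 1.96\sqrt{\frac{p(1-p)}{n}} \le \frac{0.98}{\sqrt{n}}$$

Here p is the measured proportion. Since p(1-p) can never exceed 1/4, the margin of error is at most roughly 1/√n.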

But there are several subtler issues with the margin of error, the first two of which are consequences of that misleading name.

  1. The “true” value for the full population is not guaranteed to be within the margin of error of the sampled value. It’s just a probability. There is no hard bound on the size of the error: just a high probability of it being within the margin.
  2. The margin of error only includes errors due to sample size. It does not incorporate any other factor – and there are many! – that may have affected the result.
  3. The margin of error is deeply dependent on the way that the underlying sample was taken. It’s only meaningful for a random sample. That randomness is critically important: all of sampled statistics is built around the idea that you’ve got a randomly selected subset of your target population.

Let’s get back to our friend the radio host, and his first tweet, because he’s doing a great job of illustrating some of these errors.

The quality of a sampled statistic is entirely dependent on how well the sample matches the population. The sample is critical. It doesn’t matter how big the sample size is if it’s not random. A non-random sample cannot be treated as a representative sample.

So: an internet poll, where a group of people has to deliberately choose to exert the effort to participate, cannot be a valid sample for statistical purposes. It’s not random.

It’s true that the set of people who show up to vote isn’t a random sample. But that’s fine: the purpose of an election isn’t to try to divine what the full population thinks. It’s to count what the people who chose to vote think. It’s deliberately measuring a full population: the population of people who chose to vote.

But if you’re trying to statistically measure something about the population of people who will go and vote, you need to take a randomly selected sample of people who will go to vote. The set of voters is the full population; you need to select a representative sample of that population.

Internet polls do not do that. At best, they measure a different population of people. (At worst, with ballot stuffing, they measure absolutely nothing, but we’ll give them this much benefit of the doubt.) So you can’t take much of anything about the sample population and use it to reason about the full population.

And you can’t say anything about the margin of error, either. Because the margin of error is only meaningful for a representative sample. You cannot compute a meaningful margin of error for a non-representative sample, because there is no way of knowing how that sampled population compares to the true full target population.

And that brings us to the second tweet. A properly sampled random population of 500 people can produce a high quality result with a margin of error of roughly 4.5% at a 95% confidence interval. (I’m doing a back-of-the-envelope calculation here, so that’s not precise.) That means that if the population were randomly sampled, then in 19 out of 20 polls of that size, the full population value would be within +/- 4.5% of the value measured by the poll. For a non-randomly selected sample of 10 million people, the margin of error cannot be measured, because it’s meaningless. The random sample of 500 people gives us a reasonable estimate based on data; the non-random sample of 10 million people tells us nothing.
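
As a sanity check on that back-of-the-envelope number, here’s a minimal Python sketch of the computation (the 1.96 z-value and the worst-case p = 0.5 are standard textbook assumptions, not anything taken from the polls being discussed):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion measured
    from a simple random sample of size n; p=0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 500:        +/- {margin_of_error(500):.1%}")         # ~4.4%
print(f"n = 10,000,000: +/- {margin_of_error(10_000_000):.3%}")  # ~0.031%
```

Note that the formula happily spits out a microscopic margin of error for a sample of 10 million people – it has no way of knowing whether the sample was random. That’s exactly why quoting a margin of error for a self-selected internet poll is meaningless.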

And with that, on to the third tweet!

In a poll like this, the margin of error only tells us one thing: what’s the probability that the sampled population will respond to the poll in the same way that the full population would?

There are many, many things that can affect a poll beyond the sample size. Even with a truly random and representative sample, there are many things that can affect the outcome. For a couple of examples:

How, exactly, is the question phrased? For example, if you ask people “Should police shoot first and ask questions later?”, you’ll get a very different answer than if you ask “Should police shoot dangerous criminal suspects if they feel threatened?” – but both of those questions are trying to measure very similar things, and the phrasing of the questions dramatically affects the outcome.

What context is the question asked in? Is this the only question asked? Or is it asked after some other set of questions? The preceding questions can bias the answers. If you ask a bunch of questions about how each candidate did with respect to particular issues before you ask who won, those preceding questions will bias the answers.

When you’re looking at a collection of polls that asked different questions in different ways, you expect a significant variation between them. That doesn’t mean that there’s anything wrong with any of them. They can all be correct even though their results vary by much more than their margins of error, because the margin of error has nothing to do with how you compare their results: they used different samples, and measured different things.

The problems with the reporting are the same ones I mentioned up above. The press treats the margin of error as an absolute bound on the error in the computed sample statistics (which it isn’t); and the press pretends that all of the polls are measuring exactly the same thing, when they’re actually measuring different (but similar) things. They don’t tell us what the polls are really measuring; they don’t tell us what the sampling methodology was; and they don’t tell us the confidence interval.

Which leads to exactly the kind of errors that Mr. Mitchell made.

And one bonus. Mr. Mitchell repeatedly rants about how many polls show a “bias” by “over-sampling” democratic party supporters. This is a classic mistake by people who don’t understand statistics. As I keep repeating, for a sample to be meaningful, it must be random. You can report on all sorts of measurements of the sample, but you cannot change it.

If you’re randomly selecting phone numbers and polling the respondents, you cannot screen the respondents based on their self-reported party affiliation. If you do, you are biasing your sample. Mr. Mitchell may not like the results, but that doesn’t make them invalid. People report what they report.

In the last presidential election, we saw exactly this notion in the idea of “unskewing” polls, where a group of conservative folks decided that the polls were all biased in favor of the democrats for exactly the reasons cited by Mr. Mitchell. They recomputed the poll results based on shifting the samples to represent what they believed to be the “correct” breakdown of party affiliation in the voting population. The results? The actual election results closely tracked the supposedly “skewed” polls, and the unskewers came off looking like idiots.

We also saw exactly this phenomenon going on in the Republican primaries this year. Randomly sampled polls consistently showed Donald Trump crushing his opponents. But the political press could not believe that Donald Trump would actually win – and so they kept finding ways to claim that the poll samples were off: for example, that the polls relied on land-lines, which oversampled older people, and that if you corrected for that sampling error, Trump wasn’t actually winning. Nope: the randomly sampled polls were correct, and Donald Trump is the Republican nominee.

If you want to use statistics, you must work with random samples. If you don’t, you’re going to screw up the results, and make yourself look stupid.

Back to an old topic: Bad Vaccine Math

The very first Good Math/Bad Math post ever was about an idiotic bit of antivaccine rubbish. I haven’t dealt with antivaccine stuff much since then, because the bulk of the antivaccine idiocy has nothing to do with math. But the other day, a reader sent me a really interesting link from what my friend Orac calls a “wretched hive of scum and quackery”, naturalnews.com, in which they try to argue that the whooping cough vaccine is an epic failure:

(NaturalNews) The utter failure of the whooping cough (pertussis) vaccine to provide any real protection against disease is once again on display for the world to see, as yet another major outbreak of the condition has spread primarily throughout the vaccinated community. As it turns out, 90 percent of those affected by an ongoing whooping cough epidemic that was officially declared in the state of Vermont on December 13, 2012, were vaccinated against the condition — and some of these were vaccinated two or more times in accordance with official government recommendations.

As reported by the Burlington Free Press, at least 522 cases of whooping cough were confirmed by Vermont authorities last month, which was about 10 times the normal amount from previous years. Since that time, nearly 100 more cases have been confirmed, bringing the official total as of January 15, 2013, to 612 cases. The majority of those affected, according to Vermont state epidemiologist Patsy Kelso, are in the 10-14-year-old age group, and 90 percent of those confirmed have already been vaccinated one or more times for pertussis.

Even so, Kelso and others are still urging both adults and children to get a free pertussis shot at one of the free clinics set up throughout the state, insisting that both the vaccine and the Tdap booster for adults “are 80 to 90 percent effective.” Clearly this is not the case, as evidenced by the fact that those most affected in the outbreak have already been vaccinated, but officials are apparently hoping that the public is too naive or disengaged to notice this glaring disparity between what is being said and what is actually occurring.

It continues in that vein. The gist of the argument is:

  1. We say everyone needs to be vaccinated, which will protect them from getting the whooping cough.
  2. The whooping cough vaccine is, allegedly, 80 to 90% effective.
  3. 90% of the people who caught whooping cough were properly vaccinated.
  4. Therefore the vaccine can’t possibly work.

What they want you to do is look at that 80 to 90 percent effectiveness rate, and see that only 10-20% of vaccinated people should be susceptible to the whooping cough, and compare that 10-20% to the 90% of actual infected people who were vaccinated. 20% (the upper bound of the susceptible portion of vaccinated people according to the quoted statistic) is clearly much smaller than 90% – therefore it’s obvious that the vaccine doesn’t work.

Of course, this is rubbish. It’s a classic apples to orange-grove comparison. You’re comparing percentages, when those percentages are measuring different groups – groups with wildly different sizes.

Take a pool of 1000 people, and suppose that 95% are properly vaccinated (the current DTaP vaccination rate in the US is around 95%). That gives you 950 vaccinated people and 50 people who are unvaccinated.

In the vaccinated pool, let’s assume that the vaccine was fully effective on 90% of them (that’s the highest estimate of effectiveness, which will result in the lowest number of susceptible vaccinated people – aka the best possible scenario for the anti-vaxers). That gives us 95 vaccinated people who are susceptible to the whooping cough.

There’s the root of the problem. Using numbers that are ridiculously friendly to the anti-vaxers, we’ve still got nearly twice as many susceptible vaccinated people as unvaccinated people. So we’d expect, right out of the box, that about two thirds of the cases of whooping cough would be among the vaccinated people.

In reality, the numbers are much worse for the antivax case. The percentage of people who were ever vaccinated is around 95%, because you need the vaccination to go to school. But that’s just the childhood dose. DTaP is a vaccination that needs to be periodically boosted, or the immunity wanes. And the percentage of people who’ve had boosters is extremely low. Among adolescents, according to the CDC, only a bit more than half have had DTaP boosters; among adults, less than 10% have had a booster within the last 5 years.

What’s your susceptibility if you’ve gone more than 5 years without vaccination? Somewhere around 40% of people who haven’t had boosters in the last five years are susceptible.

So let’s just play with those numbers a bit. Assume, for simplicity, that 50% of the people are adults, and 50% children, and assume that all of the children are fully up-to-date on the vaccine. Then the susceptible population is 10% of the children (10% of 475), plus 10% of the adults who are up-to-date (10% of 10% of 475), plus 40% of the adults who aren’t up-to-date (40% of 90% of 475). That works out to about 223 susceptible people among the vaccinated, against 50 unvaccinated: roughly 82% of the susceptible population has been vaccinated, so you’d expect about 82% of the actual cases of whooping cough to be among people who’d been vaccinated. Suddenly, the antivaxers’ case doesn’t look so good, does it?
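
Here’s a quick Python sketch of that arithmetic, with the simplifying assumptions from the paragraphs above spelled out as comments (the rates are illustrative, not measured data):

```python
# Toy model: 1000 people, 95% ever vaccinated, and the vaccinated
# pool split evenly between children and adults.
pop = 1000
vaccinated = int(0.95 * pop)       # 950
unvaccinated = pop - vaccinated    # 50, all assumed susceptible

children = vaccinated // 2         # 475, assumed fully up-to-date
adults = vaccinated - children     # 475

susceptible_vaccinated = (
    0.10 * children                # vaccine assumed 90% effective
    + 0.10 * (0.10 * adults)       # the ~10% of adults with recent boosters
    + 0.40 * (0.90 * adults)       # waned immunity in everyone else
)                                  # ~223

total_susceptible = susceptible_vaccinated + unvaccinated
print(f"{susceptible_vaccinated:.0f} of {total_susceptible:.0f} susceptible "
      f"people are vaccinated: {susceptible_vaccinated / total_susceptible:.0%}")
# -> 223 of 273 susceptible people are vaccinated: 82%
```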

Consider, for a moment, what you’d expect among a non-vaccinated population. Pertussis is highly contagious. If someone in your household has pertussis, and you’re susceptible, you’ve got a better than 90% chance of catching it. It’s that contagious. Routine exposure – not sharing a household, but going to work, to the store, etc., with people who are infected – still gives you about a 50% chance of infection if you’re susceptible.

In the state of Vermont, where NaturalNews is claiming that the evidence shows that the vaccine doesn’t work, how many cases of Pertussis have they seen? Around 600, out of a state population of 600,000 – an infection rate of one tenth of one percent. 0.1 percent, from a virulently contagious disease.

That’s the highest level of Pertussis that we’ve seen in the US in a long time. But at the same time, it’s really a very low number for something so contagious. To compare for a moment: there’s been a huge outbreak of Norovirus in the UK this year. Overall, more than one million people have caught it so far this winter, out of a total population of 62 million, for a rate of about 1.6% – sixteen times the rate of infection of pertussis.

Why is the rate of infection with this virulently contagious disease so different from the rate of infection with that other virulently contagious disease? Vaccines are a big part of it.

Willful Ignorance about Statistics in Government

Quick but important one here.

I’ve repeatedly ranted here about ignorant twits. Ignorance is a plague on society, and it’s at its worst when it’s willful ignorance – that is, when you have a person who knows nothing about a subject, and who can’t be bothered with anything as trivial as learning about it before they open their stupid mouths.

We’ve got an amazing, truly amazing, example of this in the US congress right now.
There’s a “debate” going on about something called the American Community Survey, or the
ACS for short. The ACS is a regular survey performed by the Census Bureau, which
measures a wide range of statistics related to economics.

A group of Republicans are trying to eliminate the ACS. Why? Well, let’s put that question aside. And let’s also leave aside, for the moment, whether the survey is important or not. You can, honestly, put together an argument that the ACS isn’t worth doing: that it doesn’t measure the right things, that the value of the information gathered doesn’t measure up to the cost, that it’s intrusive, that it violates the privacy of the survey targets. But let’s not even bother with any of that.

Members of congress are arguing that the survey should be eliminated, and they’re claiming that the reason why is because the survey is unscientific. According to Daniel Webster, a representative from the state of Florida:

We’re spending $70 per person to fill this out. That’s just not cost effective, especially since in the end this is not a scientific survey. It’s a random survey.

Note well that last sentence. That’s the important bit.

The survey isn’t cost effective, and the data gathered isn’t genuinely useful, according to Representative Webster, because it’s not a scientific survey. Why isn’t it a scientific survey? Because it’s random.

This is what I mean by willful ignorance. Mr. Webster doesn’t understand what a survey is, or how a survey works, or what it takes to make a valid survey. He’s talking out his ass, trying to kill a statistical analysis for his own political reasons without making any attempt to actually understand what it is or how it works.

Surveys are, fundamentally, about statistical sampling. Given a large population, you can create estimates about the properties of the population by looking at a representative sample of the population. For example, if you’re looking at the entire population of America, you’re talking about hundreds of millions of people. You can’t measure, say, the employment rate of the entire population every year – there are just too many people. It’s too much information – it’s pretty much impossible to gather it.

But: if you can select a group of, say, 10,000 people, whose distribution matches the distribution of the wider population, then the data you gather about them will closely resemble the data about the wider population.

That’s the point of a survey: find a representative sample, and take measurements of that sample. Then, with a certain probability of correctness, you can infer the properties of the entire population from the properties of the sample.

Of course, there’s a catch. The key to a survey is the sample. The sample must be representative – meaning that the sample must have the same properties as the wider population of which it’s a part. But the point of the survey is to discover those properties! If you choose your sample to match what you believe the distribution to be, then you’ll bias your data towards matching that distribution. Your sample will only be representative if your beliefs about the data are correct. But that defeats the whole purpose of doing the survey.

So the scientific method of doing a survey is to be random. You don’t start with any preconceived idea of what the population is like. You just randomly select people in a way that makes sure that every member of the population is equally likely to be selected. If your selection is truly random, then there’s a high probability (a measurably high probability, based on the size of the sample and the size of the sampled population) that the sample will be representative.
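
To see why the randomness matters so much, here’s a minimal Python simulation (the population size and the 40% “true” rate are invented purely for illustration):

```python
import random

random.seed(42)

# An invented population of 1,000,000 people, 40% of whom have some property.
population = [1] * 400_000 + [0] * 600_000

# A uniform random sample of 10,000 estimates the true rate quite closely.
sample = random.sample(population, 10_000)
print(f"random sample:     {sum(sample) / len(sample):.1%}")  # ~40%

# A non-random selection -- here, just the first 10,000 entries -- measures
# whatever the ordering of the list happens to be, not the population.
skewed = population[:10_000]
print(f"non-random sample: {sum(skewed) / len(skewed):.1%}")  # 100.0%
```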

Scientific sampling is always random.

So Mr. Webster’s statement could be rephrased more correctly as the following contradiction: “This is not a scientific survey, because this is a scientific survey”. But Mr. Webster doesn’t know that what he said is a stupid contradiction. Because he doesn’t care.

Stupid Politician Tricks; aka Averages Unfairly Biased against Moronic Conclusions

In the news lately, there’ve been a few particularly egregious examples of bad math. One that really ticked me off came from Alan Simpson. Simpson is one of the two co-chairs of a presidential commission that was asked to come up with a proposal for how to handle the federal budget deficit.

The proposal that his commission produced claimed that social security was one of the big problems in the budget. It really isn’t – it requires extremely creative accounting combined with several blatant lies to make it into part of the budget problem. (At the moment, social security is operating in surplus: it receives more money in taxes each year than it pays out.)

Simpson has claimed that social security must be cut if we’re going to fix the budget deficit. As part of his attempt to defend his proposed cuts, he said the following about social security:

It was never intended as a retirement program. It was set up in ‘37 and ‘38 to take care of people who were in distress — ditch diggers, wage earners — it was to give them 43 percent of the replacement rate of their wages. The life expectancy was 63. That’s why they set retirement age at 65

When I first heard that he’d said that, my immediate reaction was “that miserable fucking liar”. Because there are only two possible interpretations of that statement. Either the guy is a malicious liar, or he’s cosmically stupid and ill-informed. I was willing to accept that he’s a moron, but given that he spent a couple of years on the deficit commission, I couldn’t believe that he didn’t understand anything about how social security works.

I was wrong.

In an interview after that astonishing quote, a reporter pointed out that while the overall life expectancy was 63, people who lived to be 65 actually had a life expectancy of 79 years. You see, the life expectancy figures are pushed down by people who die young. Social security started at a time when the people collecting it had grown up without antibiotics, so there were a whole lot of people who died very young – which biased the average downwards. Simpson’s response to this?

If you’re telling me that a guy who got to be 65 in 1940 — that all of them lived to be 77 — that is just not correct. Just because a guy gets to be 65, he’s gonna live to be 77? Hell, that’s my genre. That’s not true.

So yeah. He’s really stupid. Usually, when it comes to politicians, my bias is to assume malice before ignorance. They spend so much of their time repeating lies – lying is pretty much their entire job. But Simpson is an extremely proud, arrogant man. If he had any clue of how unbelievably stupid he sounded, he wouldn’t have said that. He’d have made up some other lie that made him look less stupid. He’s got too much ego to deliberately look like a credulous drooling cretin.

So my conclusion is: he really doesn’t understand that if the overall average life expectancy for a set of people is 63, then the life expectancy of the subset of people who live to be 63 is going to be significantly higher than 63.

Just to hammer in how stupid it is, let’s look at a trivial example: a group of five people, with an average life expectancy of 62 years.

One died when he was 12. What does the average age at death of the other four have to be to make the overall average life expectancy 62 years?

$$\frac{4x + 12}{5} = 62 \quad\Longrightarrow\quad x = 74.5$$

So in this particular group of people with a life expectancy of 62 years, the pool of people who live to be 20 has a life expectancy of 74.5 years.
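
To see the same effect with a slightly bigger toy model, here’s a short Python simulation (the age distribution is completely invented for illustration – it’s not actuarial data from the 1930s):

```python
import random

random.seed(1)

# Invented cohort: 20% die young (uniformly between ages 0 and 20);
# the rest die in old age (normally distributed around 75).
ages = [random.uniform(0, 20) if random.random() < 0.20
        else random.gauss(75, 8)
        for _ in range(100_000)]

overall = sum(ages) / len(ages)
reached_65 = [a for a in ages if a >= 65]
conditional = sum(reached_65) / len(reached_65)

print(f"overall life expectancy:             {overall:.1f}")     # ~62
print(f"life expectancy of those who hit 65: {conditional:.1f}") # ~77
```

The early deaths drag the overall average way down, but they tell you nothing about how long the survivors are going to live.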

It doesn’t take much math at all to see how much of a moron Simpson is. It should be completely obvious: some people die young, and the fact that they die young affects the average.

Another way of saying it, which makes it pretty obvious how stupid Simpson is: if you live to be 65, you can be pretty sure that you’ll live to be at least 65, and you’ve got a darn good chance of living to be 66.

It’s incredibly depressing to realize that the report co-signed by this ignorant, moronic jackass is widely accepted by politicians and influential journalists as a credible, honest, informed analysis of the deficit problem and how to solve it. The people who wrote the report are incapable of comprehending the kind of simple arithmetic that’s needed to see how stupid Simpson’s statement was.

Electoral Rubbish

And now, for your entertainment, a bad math quickie.

I live in New York. ’round here, we’ve got a somewhat peculiar feature of how we run our elections. A single candidate can run for office on behalf of multiple parties. If they do, they appear on the ballot in multiple places – one ballot line for each party that they represent. When votes are tallied, if the candidate names for two different ballot lines match exactly, then the votes for those two lines are combined.

The theory behind this is that it allows people to say a bit more with their votes. If you want to vote for the democratic candidate, but you also want to express your preference for policies more liberal than those of the democratic party platform, you can vote for the democrat, but do it on the liberal party line instead of the democratic party line.

In practice, what this means is that we’ve got lots of patronage parties – that is, lots of small parties which were set up by a small group of people as a way of making money by, essentially, selling their ballot line.

One thing we hear, election after election, is how terribly important these phony parties are. This year, we keep on hearing, over and over, how no Republican has won a statewide election since 1975 without the backing of the Conservative party! Therefore, winning the backing of the Conservative party is so very, very important!

This is, alas, a classic example of the old problem: correlation does not imply causation. The Republicans don’t lose elections because they don’t have the backing of the Conservative party: the Conservative party always backs the republican candidate unless it’s completely clear that they’re going to lose.


Iterative Hockey Stick Analysis? Gimme a break!

This past weekend, my friend Orac sent me a link to an interesting piece
of bad math. One of Orac’s big interests is vaccination and
anti-vaccinationists. The piece is a newsletter by a group calling itself the “Sound Choice
Pharmaceutical Institute” (SCPI), which purports to show a link
between vaccinations and autism. But instead of the usual anti-vax rubbish about
thimerosal, they claim that “residual human DNA contaminants from aborted human fetal cells”
cause autism.

Orac has already covered the nonsense of that claim from a biological/medical
perspective. What he didn’t cover – and the reason he forwarded this newsletter
to me – is the math: the basis of their argument is that they claim to have
discovered key change points in the autism rate that correlate perfectly with
the introduction of various vaccines.

In fact, they claim to have discovered three different inflection points:

  1. 1979, the year that the MMR 2 vaccine was approved in the US;
  2. 1988, the year that a 2nd dose of the MMR 2 was added to the recommended vaccination
    schedule; and
  3. 1995, the year that the chickenpox vaccine was approved in the US.

They claim to have discovered these inflection points using “iterative hockey stick analysis”.


Shameful Innumeracy in the New York Times

I’ve been writing this blog for a long time – nearly four years. You’d think that
after all of the bad math I’ve written about, I must have reached the point where
I wouldn’t be surprised at the sheer innumeracy of most people – even most supposedly
educated people. But alas for me, I’m a hopeless idealist. I just never quite
manage to absorb how clueless the average person is.

Today in the New York Times, there’s an editorial which talks about
the difficulties faced by the children of immigrants. In the course of
their argument, they describe what they claim is the difference between
the academic performance of native-born versus immigrant children:

Whereas native-born children’s language skills follow a bell
curve, immigrants’ children were crowded in the lower ranks: More than
three-quarters of the sample scored below the 85th percentile in English
proficiency.

Scoring in the 85th percentile on a test means that you did better on that
test than 85 percent of the people who took it. So for the population as a
whole, 85% of the people who took it scored below the 85th percentile –
by definition. So even if the immigrant population were perfectly matched
with the population as a whole, you’d expect more than three-quarters of
them to score below the 85th percentile.

As they reported it, the most reasonable conclusion would be that, on the
whole, immigrant children do better than native-born children! The
population of test takers consists of native-born children and immigrant
children. (There’s no third option – if you’re going to school here, either
you were born here, or you weren’t.) If only three-quarters of immigrant
children are scoring below the 85th percentile, then more than 85% of
the non-immigrant children must be scoring below the 85th percentile.
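
To spell out the arithmetic: let w be the fraction of test takers who are children of immigrants, and let f_i and f_n be the fractions of immigrant and native-born children scoring below the 85th percentile. (The symbols are mine, introduced just for this argument.) By the definition of a percentile, taken over the whole test-taking population:

$$w f_i + (1 - w) f_n = 0.85$$

So if f_i < 0.85 – and “more than three-quarters” suggests something much closer to 0.75 – then f_n > 0.85, no matter what w is.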

I have no idea where they’re getting their data. Nor do I have any idea of
what they thought they were saying. But what they actually said is a
mind-bogglingly stupid thing, and I can’t imagine how anyone who had the most
cursory understanding of what it actually meant would miss the fact that
the statistic doesn’t in any way, shape, or form support the statement it’s
attached to.

The people who write the editorials for the New York Times don’t even
know what percentiles mean. It’s appalling. It’s worse than appalling – it’s
an absolute disgrace.