Willful Ignorance about Statistics in Government

Quick but important one here.

I’ve repeatedly ranted here about ignorant twits. Ignorance is a plague on society, and it’s at its worst when it’s willful ignorance – that is, when you have a person who knows nothing about a subject, and who refuses to be bothered with something as trivial and useless as learning about it before they open their stupid mouths.

We’ve got an amazing, truly amazing, example of this in the US Congress right now. There’s a “debate” going on about something called the American Community Survey, or the ACS for short. The ACS is a regular survey performed by the Census Bureau, which measures a wide range of demographic, social, and economic statistics.

A group of Republicans is trying to eliminate the ACS. Why? Well, let’s put that question aside. And let’s also leave aside, for the moment, whether the survey is important or not. You can, honestly, put together an argument that the ACS isn’t worth doing, that it doesn’t measure the right things, that the value of the information gathered doesn’t measure up to the cost, that it’s intrusive, that it violates the privacy of the survey targets. But let’s not even bother with any of that.

Members of Congress are arguing that the survey should be eliminated, and the reason they’re giving is that the survey is unscientific. According to Daniel Webster, a representative from the state of Florida:

We’re spending $70 per person to fill this out. That’s just not cost effective, especially since in the end this is not a scientific survey. It’s a random survey.

Note well the last part of that quote: “this is not a scientific survey. It’s a random survey.” That’s the important bit.

The survey isn’t cost effective, the data gathered isn’t genuinely useful according to Representative Webster, because it’s not a scientific survey. Why isn’t it a scientific survey? Because it’s random.

This is what I mean by willful ignorance. Mr. Webster doesn’t understand what a survey is, or how a survey works, or what it takes to make a valid survey. He’s talking out his ass, trying to kill a statistical analysis for his own political reasons without making any attempt to actually understand what it is or how it works.

Surveys are, fundamentally, about statistical sampling. Given a large population, you can create estimates about the properties of the population by looking at a representative sample of the population. For example, if you’re looking at the entire population of America, you’re talking about hundreds of millions of people. You can’t measure, say, the employment rate of the entire population every year – there are just too many people. It’s too much information – it’s pretty much impossible to gather it.

But: if you can select a group of, say, 10,000 people, whose distribution matches the distribution of the wider population, then the data you gather about them will closely resemble the data about the wider population.

That’s the point of a survey: find a representative sample, and take measurements of that sample. Then, with a certain probability of correctness, you can infer the properties of the entire population from the properties of the sample.
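To make that concrete, here’s a minimal Python sketch. Everything in it is made up for illustration: the population of 1,000,000 people, the 60% employment rate, and the function name `estimate_proportion`. The margin of error uses the standard normal approximation for a proportion, which is an assumption on my part, not anything the ACS publishes.

```python
import math
import random

def estimate_proportion(population, sample_size, rng):
    """Estimate the share of True values from a simple random sample."""
    # rng.sample draws without replacement; every member is equally likely.
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size
    # Approximate 95% margin of error (normal approximation for a proportion).
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin

rng = random.Random(42)
# Invented population: 1,000,000 people, 60% of them employed.
population = [True] * 600_000 + [False] * 400_000
p_hat, margin = estimate_proportion(population, 10_000, rng)
print(f"estimate: {p_hat:.3f} +/- {margin:.3f}")
```

With 10,000 people sampled out of a million, the estimate should land close to the true 60%, with a margin of error of roughly one percentage point.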

Of course, there’s a catch. The key to a survey is the sample. The sample must be representative – meaning that the sample must have the same properties as the wider population of which it’s a part. But the point of a survey is to discover those properties! If you choose your sample to match what you believe the distribution to be, then you’ll bias your data towards matching that distribution. Your sample will only be representative if your beliefs about the data are correct. But that defeats the whole purpose of doing the survey.
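Here’s a rough sketch of how that goes wrong. The numbers are invented: a population that’s 80% urban and 20% rural, with different employment rates in each group, and a surveyor who wrongly believes the split is 50/50 and builds the sample to match that belief.

```python
import random

rng = random.Random(7)

def make_group(name, size, employed_rate):
    """Build an invented group of (group_name, employed) records."""
    return [(name, rng.random() < employed_rate) for _ in range(size)]

def rate(sample):
    """Share of records in the sample that are employed."""
    return sum(employed for _, employed in sample) / len(sample)

population = make_group("urban", 80_000, 0.65) + make_group("rural", 20_000, 0.40)
true_rate = rate(population)

# Sample chosen to match a mistaken belief that the groups are 50/50.
urban = [p for p in population if p[0] == "urban"]
rural = [p for p in population if p[0] == "rural"]
quota_sample = rng.sample(urban, 1_000) + rng.sample(rural, 1_000)

# Simple random sample of the same size: no beliefs about the groups needed.
random_sample = rng.sample(population, 2_000)

print(f"true {true_rate:.3f}, "
      f"belief-based {rate(quota_sample):.3f}, "
      f"random {rate(random_sample):.3f}")
```

The belief-based sample over-represents the rural group, so its estimate gets pulled toward the rural employment rate. The random sample has no such built-in assumption to get wrong.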

So the scientific method of doing a survey is to be random. You don’t start with any preconceived idea of what the population is like. You just randomly select people in a way that makes sure that every member of the population is equally likely to be selected. If your selection is truly random, then there’s a high probability (a measurably high probability, based on the size of the sample and the size of the sampled population) that the sample will be representative.
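You can check the “measurably high probability” part with a small simulation. This is only a sketch under assumptions I’ve picked for illustration: a true rate of 60%, a tolerance of two percentage points, and independent draws standing in for random selection from a very large population. The function name `coverage` is mine.

```python
import random

def coverage(true_rate, sample_size, trials, tolerance, rng):
    """Fraction of random samples whose estimate is within `tolerance` of the truth."""
    hits = 0
    for _ in range(trials):
        # Independent draws with equal probability: a stand-in for random
        # selection from a very large population.
        successes = sum(rng.random() < true_rate for _ in range(sample_size))
        if abs(successes / sample_size - true_rate) <= tolerance:
            hits += 1
    return hits / trials

rng = random.Random(0)
for n in (100, 1_000, 10_000):
    share = coverage(true_rate=0.6, sample_size=n, trials=200, tolerance=0.02, rng=rng)
    print(f"sample size {n}: {share:.0%} of samples within 2 points of the truth")
```

As the sample size grows, a larger and larger share of the random samples lands close to the true value. That’s the sense in which a random sample is representative with a probability you can actually compute.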

Scientific sampling is always random.

So Mr. Webster’s statement could be rephrased more correctly as the following contradiction: “This is not a scientific survey, because this is a scientific survey”. But Mr. Webster doesn’t know that what he said is a stupid contradiction. Because he doesn’t care.