Willful Ignorance about Statistics in Government

Quick but important one here.

I’ve repeatedly ranted here about ignorant twits. Ignorance is a plague on society, and it’s at its worst when it’s willful ignorance – that is, when you have a person who knows nothing about a subject, and who refuses to be bothered with something as trivial and useless as learning about it before they open their stupid mouths.

We’ve got an amazing, truly amazing, example of this in the US Congress right now. There’s a “debate” going on about something called the American Community Survey, or the ACS for short. The ACS is a regular survey performed by the Census Bureau, which measures a wide range of social, economic, and housing statistics.

A group of Republicans are trying to eliminate the ACS. Why? Well, let’s put that question aside. And let’s also leave aside, for the moment, whether the survey is important or not. You can, honestly, put together an argument that the ACS isn’t worth doing, that it doesn’t measure the right things, that the value of the information gathered doesn’t measure up to the cost, that it’s intrusive, that it violates the privacy of the survey targets. But let’s not even bother with any of that.

Members of Congress are arguing that the survey should be eliminated, and they’re claiming that the reason is that the survey is unscientific. According to Daniel Webster, a representative from the state of Florida:

We’re spending $70 per person to fill this out. That’s just not cost effective, especially since in the end this is not a scientific survey. It’s a random survey.

Note well the last sentence there: “It’s a random survey.” That’s the important bit.

According to Representative Webster, the survey isn’t cost effective, and the data it gathers isn’t genuinely useful, because it’s not a scientific survey. Why isn’t it a scientific survey? Because it’s random.

This is what I mean by willful ignorance. Mr. Webster doesn’t understand what a survey is, or how a survey works, or what it takes to make a valid survey. He’s talking out his ass, trying to kill a statistical analysis for his own political reasons without making any attempt to actually understand what it is or how it works.

Surveys are, fundamentally, about statistical sampling. Given a large population, you can create estimates about the properties of the population by looking at a representative sample of the population. For example, if you’re looking at the entire population of America, you’re talking about hundreds of millions of people. You can’t measure, say, the employment rate of the entire population every year – there are just too many people. It’s too much information – it’s pretty much impossible to gather it.

But: if you can select a group of, say, 10,000 people, whose distribution matches the distribution of the wider population, then the data you gather about them will closely resemble the data about the wider population.
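Here is a minimal simulation of that idea, under loudly hypothetical assumptions: a made-up population of 1,000,000 people in which each person independently has some trait (call it “unemployed”) with probability 6%. Neither the population size nor the 6% figure comes from the ACS; they are just illustrative numbers. The sketch draws a simple random sample of 10,000 people with Python’s standard `random.sample` and compares the sample’s rate to the true population rate.

```python
import random

def make_population(size: int, rate: float, rng: random.Random) -> list[bool]:
    """Build a HYPOTHETICAL population: each person has the trait with probability `rate`."""
    return [rng.random() < rate for _ in range(size)]

def sample_proportion(population: list[bool], n: int, rng: random.Random) -> float:
    """Draw a simple random sample of n distinct people and return the share with the trait."""
    sample = rng.sample(population, n)
    return sum(sample) / n

rng = random.Random(42)  # fixed seed so the run is reproducible
population = make_population(1_000_000, 0.06, rng)  # 6% is an illustrative assumption
true_rate = sum(population) / len(population)
estimate = sample_proportion(population, 10_000, rng)

print(f"population rate: {true_rate:.4f}")
print(f"sample estimate: {estimate:.4f}")
```

The two printed numbers should typically agree to within a fraction of a percentage point, even though the sample looks at only 1% of the people.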

That’s the point of a survey: find a representative sample, and take measurements of that sample. Then, with a certain probability of correctness, you can infer the properties of the entire population from the properties of the sample.
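To put a number on “a certain probability of correctness”, here is a worked example using the textbook normal-approximation margin of error for a proportion. This assumes simple random sampling and ignores the finite-population correction (reasonable when 10,000 people are drawn from hundreds of millions); the ACS’s actual design and error estimates are more involved. Taking the worst case $\hat{p} = 0.5$ and $z = 1.96$ for 95% confidence:

```latex
\text{MOE} = z \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}

n = 10{,}000,\; \hat{p} = 0.5,\; z = 1.96:\quad
\text{MOE} = 1.96 \sqrt{\frac{0.25}{10{,}000}} = 1.96 \times 0.005 = 0.0098

z = 2.576 \;(99\%):\quad \text{MOE} = 2.576 \times 0.005 \approx 0.0129
```

So a sample of 10,000 pins a proportion down to roughly plus or minus one percentage point at 95% confidence, and about 1.3 points at 99%.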

Of course, there’s a catch. The key to a survey is the sample. The sample must be representative – meaning that the sample must have the same properties as the wider population of which it’s a part. But the point of a survey is to discover those properties! If you choose your sample to match what you believe the distribution to be, then you’ll bias your data towards matching that distribution. Your sample will only be representative if your beliefs about the data are correct. But that defeats the whole purpose of doing the survey.
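A small sketch of that trap, again with entirely hypothetical numbers: suppose the population has two groups, A (70% of people, trait rate 4%) and B (30% of people, trait rate 12%). A surveyor who wrongly *believes* group B is only 10% of the population, and fills the sample to match that belief, gets a biased answer. A plain random sample from everyone does not need any such belief. The group sizes, rates, and the helper names `quota_sample_rate` and `random_sample_rate` are illustrative inventions, not anything from the ACS.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

# HYPOTHETICAL population: group B is really 30% of people and has a higher rate.
group_a = [rng.random() < 0.04 for _ in range(700_000)]
group_b = [rng.random() < 0.12 for _ in range(300_000)]
true_rate = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))

def quota_sample_rate(believed_share_b: float, n: int) -> float:
    """Fill the sample according to what we *believe* group B's share is."""
    n_b = round(n * believed_share_b)
    sample = rng.sample(group_a, n - n_b) + rng.sample(group_b, n_b)
    return sum(sample) / n

def random_sample_rate(n: int) -> float:
    """Sample from the whole population, with every person equally likely."""
    everyone = group_a + group_b
    return sum(rng.sample(everyone, n)) / n

wrong_belief = quota_sample_rate(believed_share_b=0.10, n=10_000)
random_est = random_sample_rate(10_000)

print(f"true rate:            {true_rate:.4f}")
print(f"sample built on belief: {wrong_belief:.4f}")
print(f"random sample:        {random_est:.4f}")
```

With these assumed numbers the true rate is about 6.4%, the belief-driven sample lands near 4.8%, and the random sample lands near the truth. The belief-driven sample is only representative if the belief happens to be right.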

So the scientific way to do a survey is to sample randomly. You don’t start with any preconceived idea of what the population is like. You just select people at random, in a way that makes sure that every member of the population is equally likely to be selected. If your selection is truly random, then there’s a high probability (a measurably high probability, based on the size of the sample and the size of the sampled population) that the sample will be representative.
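That “measurably high probability” can be checked empirically. The sketch below, a minimal illustration under hypothetical assumptions (a population of 100,000 with a 30% trait rate, samples of 1,000, 1,000 repeated surveys), draws many independent simple random samples, builds a normal-approximation 95% confidence interval from each, and counts how often the interval actually contains the true population rate. The function names are made up for this example.

```python
import math
import random

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion (95% when z = 1.96)."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def coverage(population: list[bool], n: int, trials: int, rng: random.Random) -> float:
    """Fraction of independent random samples whose interval contains the true rate."""
    true_rate = sum(population) / len(population)
    hits = 0
    for _ in range(trials):
        # random.sample gives every member an equal chance of being picked (no repeats).
        successes = sum(rng.sample(population, n))
        low, high = wald_interval(successes, n)
        if low <= true_rate <= high:
            hits += 1
    return hits / trials

rng = random.Random(2012)  # fixed seed for reproducibility
population = [rng.random() < 0.30 for _ in range(100_000)]  # HYPOTHETICAL 30% rate
rate = coverage(population, n=1_000, trials=1_000, rng=rng)
print(f"share of surveys whose interval caught the truth: {rate:.3f}")
```

The printed share should come out close to 0.95: that is exactly the sense in which a random sample is representative “with a certain probability”.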

Scientific sampling is always random.

So Mr. Webster’s statement could be rephrased more correctly as the following contradiction: “This is not a scientific survey, because this is a scientific survey”. But Mr. Webster doesn’t know that what he said is a stupid contradiction. Because he doesn’t care.

15 thoughts on “Willful Ignorance about Statistics in Government”

  1. Deen

    The stupid, it burns.

    Small quibble:

    You can’t measure, say, the employment rate of the entire population every year – there are just too many people.

    Actually, you can measure the employment rate of the entire population every year, just not with a survey. You could register applications for unemployment benefits and welfare in a nation-wide database, for example. Most of this data is probably already recorded somewhere for other purposes anyway. You might even be able to track it in near real time, if you really wanted.

    1. MarkCC Post author

      Sorry, but no. That will tell you the rate of people taking unemployment benefits. That’s a different number from the actual number of unemployed people. We’re talking about different numbers.

      If you want to know the actual number of people working, the unemployment figures don’t tell you that. There are people who never worked, and aren’t eligible; there are people who collected until the benefits ran out. There are retired people who aren’t working. Those aren’t captured in unemployment rates.

      1. Elipson

        How about 313,582,000 quadrocopters, each one following a person? Not only would we get employment rates and weekly work hours, but our production of porno would increase massively! 😉

      2. Deen

        Sorry, but no. That will tell you the rate of people taking unemployment benefits. That’s a different number from the actual number of unemployed people.

        True, but that was just meant as an example of something you could do, not a full prescription of how to get to the total employment numbers. In reality you’d also have to cross-reference that with IRS income records, social security records, birth records, death records, and so on. I’m sure it’s highly non-trivial to mine all this data, dealing with issues like incomplete records or re-use of social security numbers, but I doubt it is impossible.

        I’m sure there will still be people who have fallen through the cracks of the system (or purposely went off the grid) and wouldn’t be counted, but you’d still get a good idea of the overall numbers – and besides, you likely wouldn’t reach those people with a survey either.

        As said, this is not impossible. The biggest obstacle to doing this is likely issues of privacy, not the availability or the volume of data.

  2. Robert

    I’m curious about the context here — where exactly is it recorded that Daniel Webster said this? I’m just asking because when I first read the quote, I read it as “the survey was a bunch of random questions”.

  3. Rob Britton

    Another good point to make about random sampling is that we can have a fairly good idea about how our sample is likely to vary from the population simply based on the size of the sample and some assumptions about the nature of the population.

    We don’t really have the same luxury when introducing any sort of bias to the sampling procedure, because it greatly complicates the sampling distribution! So from a mathematical perspective it just makes things easier, and that’s generally a good thing.

  4. Pingback: Misuse of Statistics in Government | Standard Deviations From the Beaten Path

  5. Hal Swyers

    I don’t want to defend the specific remark, since it is obviously flawed; however, I do think that there is something to be said about the low level of significance that is allowed in most social science surveys. I don’t know the specifics of the ACS, but if it is typical of most surveys, they are satisfied with a 95% confidence level, which is an extremely low standard for claiming scientific discovery.

    IOW, the point I think the congressman could have made is that a lot of surveys in government are effectively useless for making policy decisions because there is a high likelihood that another variation of the same survey could come to different conclusions. Since the cost and level of intrusion to get to the assumed 95% confidence level is so high, the cost to get to a more significant result would be exorbitant. So the question is why are we conducting the survey?

    A better question is whether there is a cheaper way to get accurate information that is of use to lawmakers.

    1. JK

      That would be somewhat understandable if the ACS were primarily concerned with experimental design and testing, but it’s not. The ACS is a survey, and it’s mainly concerned with obtaining estimates of population statistics rather than with null hypothesis significance testing. The 95% confidence level determines the width of the margin of error, and is a reasonable standard for any social science data. Given the sample size, the margin of error is going to be low. Even with a change to a 99% confidence level, the margin of error would only be slightly larger, and would give minimally more information at the cost of interpretability.

      Basically, what I’m saying is that you are mixing two important, intertwined, but separate branches of mathematical statistics: hypothesis testing and confidence interval estimation. But if we say that the politicians were thinking the same thing, then maybe a statistics class is in order?

  6. Uncle Al

    Dewey Beats Truman: official sampling was by phone, but voting included people without phones. “Although most politicians are born liars and stupid, they aren’t elected to be stupid.” This has changed.

    Lyndon Johnson’s 1965 “Great Society” renounced the harsh rule of reason for nurturant social engineering. The crippled are enabled, the able are crippled. Constitutional restrictions are replaced by criminal adventurism clothed by the Ministry of Truth. Feckless crapweasels suffering reality deficit disorder eructate deformed decisions leading to economic cloudy days. Beltway lobotomites’ administrative despotism empowers the irresistible buoyancy of excrement. How dare you try to end this beauty!

  7. J.

    Not to rain on the parade, but the context of Webster’s statement is one of experimental design, i.e., there are no controls on the variables.

    The surveys are randomly taken by random people with random degrees of competency without isolation of an experimental setting.

  8. Pingback: The ignorance of Congress simply mirrors our own ignorance

  9. Pingback: The ignorance of Congress simply mirrors our own ignorance « A Man With A Ph.D.
