{"id":697,"date":"2008-10-28T21:26:58","date_gmt":"2008-10-28T21:26:58","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2008\/10\/28\/margin-of-error-and-election-polls\/"},"modified":"2008-10-28T21:26:58","modified_gmt":"2008-10-28T21:26:58","slug":"margin-of-error-and-election-polls","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2008\/10\/28\/margin-of-error-and-election-polls\/","title":{"rendered":"Margin of Error and Election Polls"},"content":{"rendered":"<p> Before I get to the meat of the post, I want to remind you that our<br \/>\nDonorsChoose drive is ending in just a couple of days! A small number of readers have made extremely generous contributions, which<br \/>\nis very gratifying. (One person has even taken me up on my offer<br \/>\nof letting donors choose topics.) But the number of contributions has been very small. Please, follow the link in my sidebar, go to DonorsChoose, and make a donation. Even a few dollars can make a<br \/>\nbig difference. And remember &#8211; if you donate one hundred dollars or more, email me a math topic that you&#8217;d like me to write about, and I&#8217;ll<br \/>\nwrite you a blog article on that topic.<\/p>\n<p> This post repeats a bunch of stuff that I mentioned in one of my <a href=\"http:\/\/scienceblogs.com\/goodmath\/goodmath\/basics\/\">basics<\/a> posts last year on the <a href=\"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/01\/basics-margin-of-error\">margin of error<\/a>. But given some of the awful rubbish I&#8217;ve heard in coverage of the coming election, I thought it was worth discussing a bit.<\/p>\n<p> As the election nears, it seems like every other minute, we<br \/>\nhear predictions of the outcome of the election, based on polling. The<br \/>\nthing is, pretty much <em>every<\/em> one of those reports is<br \/>\nutter rubbish.<\/p>\n<p><!--more--><\/p>\n<p> What happens is that they look at polls, and they talk about the results and what they mean. 
But they, like almost everyone, use the margin of error as if it means something very different from what it really means.<\/p>\n<p> What you <em>hear<\/em> is that, for example, Barack Obama is leading Florida by 4 points, but the margin of error is +\/- 4%, so it&#8217;s really not a significant lead. What the journalists seem to think it means<br \/>\nis that the margin of error is a total measure of the accuracy of the polls &#8211; that the poll result <em>is<\/em> within the margin of error of the &#8220;true&#8221; result that the poll measures. So, by that interpretation,<br \/>\nthe poll is predicting an outcome of 52\/48, and the margin of error means that actual voter preferences range between<br \/>\n48\/52 and 56\/44.<\/p>\n<p> The thing is, that&#8217;s not what the margin of error means. The margin<br \/>\nof error is a statistical measure of the probable size of the error caused by unintentional sampling errors.<\/p>\n<p> Polls &#8211; and much of statistics in general &#8211; are based on the idea of <em>sampling<\/em>. Given a large, relatively uniform population, you can<br \/>\nget an amazingly accurate measure of that population by looking at a small subset of it, called a <em>representative sample<\/em>. A sample<br \/>\nis a randomly selected group that is intended to be a microcosm of the<br \/>\nentire population. An ideal representative sample has<br \/>\nthe same distribution of differences as the population as a whole. <\/p>\n<p> There&#8217;s a big problem there: how can you be sure that your sample is representative? The answer is, you can&#8217;t! The only way to <em>know<\/em> for certain that a sample is representative is to measure the entire population, and compare the results of doing that to the sample. 
But once you&#8217;ve measured the entire population, what&#8217;s the point of looking at a sample?<\/p>\n<p> Fortunately, we <em>can<\/em> assess how likely it is that our sample is a good representation of the population. That&#8217;s what the margin of error does &#8211; it measures the likelihood of the sample being representative of the population. It&#8217;s computed by combining a<br \/>\nbunch of factors, the primary ones most commonly being the size of the population and the size of the sample. Given those, we can assess how<br \/>\ncertain we can be that our measure is pretty close to accurate. Typically, we describe that certainty by stating how large an interval<br \/>\nwe need to define on either side of the measured statistic to be<br \/>\n95% certain that the &#8220;actual&#8221; value is within that interval. The size of that interval is the margin of error.<\/p>\n<p> So when you hear a pollster talking about a &#8220;poll of likely voters showing that Obama is ahead by 8 points with a margin of error of +\/-4%&#8221;, the first thing you should do is ask what they&#8217;re measuring. In that case, the population isn&#8217;t &#8220;the set of people who are going to vote next Tuesday&#8221; (even though that&#8217;s what the journalists try to make you think); the population is &#8220;the set of people who the poll believes are likely to vote next Tuesday&#8221;. So the margin of error<br \/>\nis a measure of how well their poll matches the population of people<br \/>\nwho they believe are likely to vote &#8211; which is quite a different thing<br \/>\nfrom the population of people who actually <em>do<\/em> vote. In fact, it does even slightly less than that: it measures how much<br \/>\nsampling error is contained in their poll due to unintentionally<br \/>\nselecting a non-representative sample. 
That&#8217;s not really saying very much in an election poll.<\/p>\n<p> The population being sampled by polls is likely to be quite different from the actual population of voters for a number of reasons,<br \/>\nand this difference produces measurement errors that almost certainly significantly outweigh the unintentional sampling errors measured by the margin of error. For example:<\/p>\n<dl>\n<dt><b>Intentional Sample Bias<\/b><\/dt>\n<dd> Intentional sample bias covers a variety of techniques that<br \/>\npollsters use when they select people for the sample. For an extreme<br \/>\nexample, some polls (like, I think, Zogby) try to get an equal number of<br \/>\npeople who self-identify as Republicans and Democrats. But in most<br \/>\nstates, the numbers of party members in the two major parties are not<br \/>\nequal. They are, in fact, often pretty dramatically uneven. A less<br \/>\ndramatic but still significant example is that many polls do their polling<br \/>\nthrough phone calls, and only call land-lines. Many younger people no<br \/>\nlonger have land-lines; the exclusion of cell-phone numbers therefore<br \/>\nexcludes some portion of the population from the sample. These kinds<br \/>\nof sample bias produce a significant mismatch between the population<br \/>\nof real voters and the population being sampled.<\/dd>\n<dt><b>Unknown Population<\/b><\/dt>\n<dd> The biggest source of polling error leading up to an election is the fact<br \/>\nthat the real population is <em>unknown<\/em>. No one is sure who&#8217;s going<br \/>\nto vote &#8211; which means that no one is certain of what the correct<br \/>\npopulation to sample is. Pollsters try to identify a sample of people<br \/>\nwho are <em>likely<\/em> to vote. 
But since the population is unknown,<br \/>\nthey don&#8217;t know if they&#8217;re including people in the sample who aren&#8217;t in<br \/>\nthe actual population of voters, and they don&#8217;t know if they&#8217;re<br \/>\nexcluding people from their samples who <em>are<\/em> going to vote. In<br \/>\nthis election, this is likely to be a significant effect, because huge<br \/>\nnumbers of people registered to vote for the first time, but <em>no one<br \/>\nknows<\/em> how many of those newly registered voters are likely to show<br \/>\nup and vote. Once again, there&#8217;s a problem related to the fact that the<br \/>\npopulation that they&#8217;re sampling isn&#8217;t the same as the population that<br \/>\nthe poll is trying to measure &#8211; so that error factor is outside the<br \/>\nmargin of error.<\/dd>\n<dt><b> Phrasing Bias<\/b><\/dt>\n<dd> You can get significant differences in polls based on how the<br \/>\nquestion is phrased. &#8220;Who are you going to vote for?&#8221; will<br \/>\nlikely generate different results from &#8220;Are you going to vote for Obama or McCain?&#8221;, which will likely generate different results from<br \/>\n&#8220;Do you plan to vote for McCain or Obama?&#8221;, which will generate different results from &#8220;Do you plan to vote for a Democrat or a Republican in the presidential election?&#8221;. This is a well-known problem,<br \/>\nbut it still has a significant effect.<\/dd>\n<dt><b>Dishonest Answers<\/b><\/dt>\n<dd> People aren&#8217;t entirely trustworthy. They don&#8217;t necessarily answer<br \/>\nquestions honestly. A frequently discussed version of this is called<br \/>\nthe Bradley effect. The Bradley effect is a phenomenon where people<br \/>\nare reluctant to admit to being racist. So when a pollster asks them<br \/>\nif they&#8217;re going to vote for a black man, they&#8217;ll say &#8220;yes&#8221;, but when<br \/>\nit actually comes to voting, they&#8217;ll vote for the white guy. 
I&#8217;ve heard<br \/>\nsome people speculate on a reverse Bradley effect this year in some southern states, where people are reluctant to admit that they&#8217;re going to vote for a black man, so they lie and say they&#8217;re voting McCain. But the truth of the matter is, we don&#8217;t know if the people answering the<br \/>\npolls are answering honestly. If they&#8217;re not, that skews the poll results, and once again, it&#8217;s not covered by the margin of error.<\/dd>\n<\/dl>\n","protected":false},"excerpt":{"rendered":"<p>Before I get to the meat of the post, I want to remind you that our DonorsChoose drive is ending in just a couple of days! A small number of readers have made extremely generous contributions, which is very gratifying. (One person has even taken me up on my offer of letting donors choose topics.) [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[61],"tags":[],"class_list":["post-697","post","type-post","status-publish","format-standard","hentry","category-statistics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-bf","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/697","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.
goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=697"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/697\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=697"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=697"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=697"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}