
Basics: Standard Deviation

When we look at the data for a population, often the first thing we do
is look at the mean. But even if we know that the distribution
is perfectly normal, the mean alone isn’t enough to tell us what we need to know to understand what it is telling us about the population. We also need
to know something about how the data is spread out around the mean – that is, how wide the bell curve is around the mean.

There’s a basic measure that tells us that: it’s called the standard deviation. The standard deviation describes the spread of the data,
and is the basis for how we compute things like the degree of certainty,
the margin of error, etc.

Continue reading →

Basics: Normal Distributions


In general, when we gather data, we expect to see a particular pattern to
the data, called a normal distribution. A normal distribution is one
where the data is evenly distributed around the mean in a very regular way,
which when plotted as a histogram will result in a bell curve. There are a lot of ways of
defining “normal distribution” formally, but the simple intuitive idea of it
is that in a normal distribution, things tend towards the mean – the closer a
value is to the mean, the more you’ll see it; and the number of values on
either side of the mean at any particular distance is equal.

Continue reading →

Basics: Mean, Median, and Mode


Statistics is something that surrounds us every day – we’re constantly
bombarded with statistics, in the form of polls, tests, ratings, etc. Understanding those statistics can be an important thing, but unfortunately, most people have never been taught just what statistics really mean, how they’re computed, or how to distinguish the difference between
statistics used properly, and statistics misused to deceive.

The most basic concept in statistics is the idea of an average. An average is a single number which represents the idea of a typical value. There are three different numbers which can represent the idea of an average value, and it’s important to know which one is being used, and whether or not that is appropriate. The three values are the mean, the median, and the mode.

Continue reading →

Pathetic Statistics from HIV/AIDS Denialists


While I was on vacation, I got some email from Chris Noble pointing me towards a discussion with some thoroughly innumerate HIV-AIDS denialists. It’s really quite shocking what passes for a reasonable argument among true believers.
The initial stupid statement is from one of Duesberg’s papers, [AIDS Acquired by Drug Consumption and Other Noncontagious Risk Factors][duesberg], and it’s quite a whopper. During a discussion of the infection rates shown by HIV tests of military recruits, he says:
>(a) “AIDS tests” from applicants to the U.S. Army and the U.S. Job
>Corps indicate that between 0.03% (Burke et al.,1990) and 0.3% (St
>Louis et al.,1991) of the 17- to 19-year-old applicants are HIV-infected
>but healthy. Since there are about 90 million Americans under the age
>of 20, there must be between 27,000 and 270,000(0.03%-0.3% of 90
>million) HIV carriers. In Central Africa there are even more, since 1-2%
>of healthy children are HIV-positive (Quinn et al.,1986).
>
>Most, if not all, of these adolescents must have acquired HIV from
>perinatal infection for the following reasons: sexual transmission of
>HIV depends on an average of 1000 sexual contacts, and only 1 in 250
>Americans carries HIV (Table 1). Thus, all positive teenagers would
>have had to achieve an absurd 1000 contacts with a positive partner, or
>an even more absurd 250,000 sexual contacts with random Americans
>to acquire HIV by sexual transmission. It follows that probably all of
>the healthy adolescent HIV carriers were perinatally infected, as for
>example the 22-year-old Kimberly Bergalis (Section 3.5.16).
Now, I would think that *anyone* who reads an allegedly scientific paper like this would be capable of seeing the spectacular stupidity in this quotation. But for the sake of pedantry, I’ll explain it using small words.
If the odds of, say, winning the lottery are 1 in 1 million, that does *not* mean that if I won the lottery, I must have played it one million times. Nor does it mean that the average lottery winner played the lottery one million times. It means that out of every one million times *anyone* plays the lottery, we expect *one* win.
To jump that back to Duesberg, what he’s saying is: if the transmission rate of HIV/AIDS is 1 in 1000, then the average infected person would need to have had sex with an infected partner 1000 times.
Nope, that’s not how math works. Not even close.
Suppose we have 1000 people who are infected with HIV, and who are having unprotected sex. *If* we follow Duesberg’s lead, and assume that the transmission rate is a constant 0.1%, then what we would expect is that if each of those 1000 people had sex with one partner one time, we would see one new infected individual – and that individual would have had unprotected sex with the infected partner only one time.
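For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as Duesberg does, a constant and independent 0.1% chance of transmission per contact; that is his simplification, not a realistic model, and the function names are mine, purely for illustration.

```python
# Sketch only: assumes a constant, independent per-contact transmission
# probability (Duesberg's simplification, not a realistic epidemiological model).

P_TRANSMIT = 0.001  # 1 in 1000 per contact


def expected_new_infections(people: int, contacts_each: int, p: float = P_TRANSMIT) -> float:
    """Expected number of people infected when each has `contacts_each` contacts."""
    return people * (1 - (1 - p) ** contacts_each)


def prob_infected_within(contacts: int, p: float = P_TRANSMIT) -> float:
    """Chance that one person is infected at some point within `contacts` contacts."""
    return 1 - (1 - p) ** contacts


if __name__ == "__main__":
    # 1000 people, one contact each: about one new infection, after ONE contact.
    print(expected_new_infections(1000, 1))
    # For a single person, the chance of infection passes 50% long before contact 1000.
    print(prob_infected_within(693))
    print(prob_infected_within(1000))
```

Under these assumptions, a 1-in-1000 rate never means "1000 contacts required": some infections happen on the first contact, and half of them happen within the first 693 or so.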
This isn’t rocket science folks. This is damned simple, high-school level statistics.
Where things get even sadder is looking at the discussion that followed when Chris posted something similar to the above explanation. Some of the ridiculous contortions that people go through in order to avoid admitting that the great Peter Duesberg said something stupid are just astounding. For example, consider [this][truthseeker] from a poster calling himself “Truthseeker”:
>If Duesberg had said that, he would indeed be foolish. The foolishness,
>however, is yours, since you misintepret his reasoning. He said, as you note
>
>>Most, if not all, of these adolescents must have acquired HIV from perinatal
>>infection for the following reasons: sexual transmission of HIV depends on an
>>average of 1000 sexual contacts, and only 1 in 250 Americans carries HIV
>>(Table 1). Thus, all positive teenagers would have had to achieve an absurd
>>1000 contacts with a positive partner, or an even more absurd 250,000 sexual
>>contacts with random Americans to acquire HIV by sexual transmission.
>
>This states the average transmission requires 1000 contacts, not every
>transmission. With such a low transmission rate and with so few Americans
>positive – you have to engage with 250 partners on average to get an average
>certainty of 100% for transmission, if the transmission rate was 1. Since it is
>1 in 1000, the number you have to get through on average is 250,000. Some might
>do it immediately, some might fail entirely even at 250,000. But the average
>indicates that all positive teenagers would have had to get through on average
>250,000 partner-bouts.
Truthseeker is making exactly the same mistake as Duesberg. The difference is that he’s just had it explained to him using a simple metaphor, and he’s trying to spin a way around the fact that *Duesberg screwed up*.
But it gets even worse. A poster named Claus responded with [this][claus] indignant response to Chris’s use of a metaphor about plane crashes:
>CN,
>
>You would fare so much better if you could just stay with the science
>points and refrain from your ad Duesbergs for more than 2 sentences at
>a time. You know there’s a proverb where I come from that says ‘thief thinks
>every man steals’. I’ve never seen anybody persisting the way you do in
>calling other people ‘liars’, ‘dishonest’ and the likes in spite of the
>fact that the only one shown to be repeatedly and wilfully dishonest
>here is you.
>
>Unlike yourself Duesberg doesn’t deal with matters on a case-by-case only basis
>in order to illustrate his statistical points. precisely as TS says, this shows
>that you’re the one who’s not doing the statistics, only the misleading.
>
>In statistics, for an illustration to have any meaning, one must assume that
>it’s representative of an in the context significant statistical average no?
>Or perphaps in CN’s estimed opinion statistics is all about that once in a
>while when somebody does win in the lottery?
Gotta interject here… Yeah, statistics *is* about that once in a while when someone wins the lottery, or when someone catches HIV, or when someone dies in a plane crash. It’s about measuring things by looking at aggregate numbers for a population. *Any* unlikely event follows the same pattern, whether it’s catching HIV, winning the lottery, or dying in a plane crash, and that’s one of the things that statistics is specifically designed to talk about: that fundamental probabilistic pattern.
>But never mind we’ll let CN have the point; the case in question was that odd
>one out, and Duesberg was guilty of the gambler’s fallacy. ok? You scored one
>on Duesberg, happy now? Good. So here’s the real statistical point abstracted,
>if you will, from the whole that’s made up by all single cases, then applied to
>the single case in question:
>
>>Thus, all positive teenagers would have had to achieve an absurd 1000 contacts
>>with a positive partner, or an even more absurd 250,000 sexual contacts with
>>random Americans to acquire HIV by sexual transmission.
>
>This is the statistical truth, which is what everybody but CN is interested in.
Nope, this is *not* statistical truth. This is an elementary statistical error which even a moron should be able to recognize.
>Reminder: Whenever somebody shows a pattern of pedantically reverting to single
>cases and/or persons, insisting on interpreting them out of all context, it’s
>because they want to divert your attention from real issues and blind you to
>the overall picture.
Reminder: whenever someone shows a pattern of pedantically reverting to a single statistic, insisting on interpreting it in an entirely invalid context, it’s because they want to divert your attention from real issues and blind you to the overall picture.
The 250,000 average sexual contacts is a classic big-numbers thing: it’s so valuable to be able to come up with an absurd number that people will immediately reject, and assign it to your opponent’s argument. They *can’t* let this go, no matter how stupid it is, no matter how obviously wrong. Because it’s so important to them to be able to say “According to *their own statistics*, the HIV believers are saying that the average teenage army recruit has had sex 250,000 times!”. As long as they can keep up the *pretense* of a debate around the validity of that statistic, they can keep on using it. So no matter how stupid, they’ll keep defending the line.
[duesberg]: www.duesberg.com/papers/1992%20HIVAIDS.pdf
[truthseeker]: http://www.newaidsreview.org/posts/1155530746.shtml#1487
[claus]: http://www.newaidsreview.org/posts/1155530746.shtml#1496

Messing with big numbers: using probability badly


After yesterday’s post about the sloppy probability from Ann Coulter’s chat site, I thought it would be good to bring back one of the earliest posts on Good Math/Bad Math back when it was on Blogger. As usual with reposts, I’ve revised it somewhat, but the basic meat of it is still the same.
——————–
There are a lot of really bad arguments out there written by anti-evolutionists based on incompetent use of probability. A typical example is [this one][crapcrap]. This article is a great example of the mistakes that commonly get made with probability based arguments, because it makes so many of them. (In fact, it makes every single category of error that I list below!)
Tearing down probabilistic arguments takes a bit more time than tearing down the information theory arguments. 99% of the time, the IT arguments are built around the same fundamental mistake: they’ve built their argument on an invalid definition of information. But since they explicitly link it to mathematical information theory, all you really need to do is show why their definition is wrong, and then the whole thing falls apart.
The probabilistic arguments are different. There isn’t one mistake that runs through all the arguments. There are many possible mistakes, and each argument typically stacks up multiple errors.
For the sake of clarity, I’ve put together a taxonomy of the basic probabilistic errors that you typically see in creationist screeds.
Big Numbers
————-
This is the easiest one. This consists of using our difficulty in really comprehending how huge numbers work to say that beyond a certain probability, things become impossible. You can always identify these arguments by the phrase “the probability is effectively zero.”
You typically see people claiming things like “Anything with a probability of less than 1 in 10^60 is effectively impossible”. It’s often conflated with some other numbers, to try to push the idea of “too improbable to ever happen”. For example, they’ll often throw in something like “the number of particles in the entire universe is estimated to be 3×10^78, and the probability of blah happening is 1 in 10^100, so blah can’t happen”.
It’s easy to disprove. Take two distinguishable decks of cards. Shuffle them together. Look at the ordering of the cards – it’s a list of 104 elements. What’s the probability of *that particular ordering* of those 104 elements?
The likelihood of the resulting deck of shuffled cards having the particular ordering that you just produced is roughly 1 in 10¹⁶⁶. There are more possible unique shuffles of two decks of cards than there are particles in the entire universe.
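The figure is easy to check. Here is a small Python sketch that counts the orderings of 104 distinguishable cards and compares the count to the particle number quoted above; the variable names are just mine.

```python
import math

# Number of distinct orderings of 104 distinguishable cards (two marked decks).
orderings = math.factorial(104)

# Order of magnitude of that count, i.e. the k in "1 in 10^k".
magnitude = len(str(orderings)) - 1

# The particle-count figure quoted in the arguments above (3 x 10^78).
PARTICLES_QUOTED = 3 * 10**78

print(f"104! is about 10^{magnitude}")
print(f"orderings outnumber the quoted particle count: {orderings > PARTICLES_QUOTED}")
```

Python integers are arbitrary precision, so the factorial is exact rather than a floating-point estimate.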
If you look at it intuitively, it *seems* like something whose probability is nearly 90 orders of magnitude worse than the odds of picking out a specific particle in the entire observable universe *should* be impossible. Our intuition says that any probability with a number that big in its denominator just can’t happen. Our intuition is wrong – because we’re quite bad at really grasping the meanings of big numbers.
Perspective Errors
———————
A perspective error is a relative of the big numbers error. It’s part of an argument to try to say that the probability of something happening is just too small to be possible. The perspective error is taking the outcome of a random process – like the shuffling of cards that I mentioned above – and looking at the outcome *after* the fact, and calculating the likelihood of it happening.
Random processes typically have a huge number of possible outcomes. Anytime you run a random process, you have to wind up with *some* outcome. There may be a mind-boggling number of possibilities; the probability of getting any specific one of them may be infinitesimally small; but you *will* end up with one of them. The probability of getting an outcome is 100%. The probability of your being able to predict which outcome is terribly small.
The error here is taking the outcome of a random process which has already happened, and treating it as if you were predicting it in advance.
The way that this comes up in creationist screeds is that they do probabilistic analyses of evolution built on the assumption that *the observed result is the only possible result*. You can view something like evolution as a search of a huge space; at any point in that space, there are *many* possible paths. In the history of life on earth, there are enough paths to utterly dwarf numbers like the card-shuffling above.
By selecting the observed outcome *after the fact*, and then doing an *a priori* analysis of the probability of getting *that specific outcome*, you create a false impression that something impossible happened. Returning to the card shuffling example, shuffling a deck of cards is *not* a magical activity. Getting a result from shuffling a deck of cards is *not* improbable. But if you take the result of the shuffle *after the fact*, and try to compute the a priori probability of getting that result, you can make it look like something inexplicable happened.
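The difference between "some outcome happened" and "the outcome I named in advance happened" can be seen in a toy simulation. This is only a sketch: it uses a 5-card deck (120 possible orderings) so that matches actually show up, and the function name and seed are arbitrary choices of mine.

```python
import random
from typing import Tuple


def run_trials(trials: int, deck_size: int = 5, seed: int = 0) -> Tuple[int, int]:
    """Shuffle a small deck `trials` times.

    Returns (shuffles that produced some ordering, shuffles that matched
    an ordering fixed before any shuffling happened).
    """
    rng = random.Random(seed)            # seeded so the run is repeatable
    predicted = list(range(deck_size))   # the "prediction", made in advance
    produced_outcome = 0
    matched_prediction = 0
    for _ in range(trials):
        deck = list(range(deck_size))
        rng.shuffle(deck)
        produced_outcome += 1            # every shuffle ends in *some* ordering
        if deck == predicted:
            matched_prediction += 1
    return produced_outcome, matched_prediction


if __name__ == "__main__":
    produced, matched = run_trials(120_000)
    print(f"shuffles that produced an outcome: {produced}")
    print(f"shuffles that matched the advance prediction: {matched}")
```

Every single shuffle produces an ordering, while the advance prediction comes up only about once in every 120 tries. Scale the deck up to 52 or 104 cards and the second number drops to effectively never, but the first stays at 100%.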
Bad Combinations
——————–
Combining the probabilities of events can be very tricky, and easy to mess up. It’s often not what you would expect. You can make things seem a lot less likely than they really are by making an easy-to-miss mistake.
The classic example of this is one that almost every first-semester probability instructor tries in their class. In a class of 20 people, what’s the probability of two people having the same birthday? Most of the time, you’ll have someone say that the probability of any two people having the same birthday is 1/365²; so the probability of that happening in a group of 20 is the number of possible pairs over 365², or 400/365², or about 1/3 of 1 percent.
That’s the wrong way to derive it. There’s more than one error there (for a start, 20 people form 190 distinct pairs, not 400), but I’ve seen three introductory probability classes where that was the first guess. The correct answer is about 41%; with 23 people, it passes 50%.
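Here is a short Python sketch of the standard way to derive it: compute the chance that everyone's birthday is distinct, then subtract from 1. It assumes birthdays are independent and uniform over 365 days, which is the usual textbook simplification; `naive_guess` is my name for the mistaken classroom estimate, included only for comparison.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and uniform over `days` days
    (no leap years, no seasonal effects): a textbook simplification.
    """
    prob_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        prob_all_distinct *= (days - k) / days
    return 1 - prob_all_distinct


def naive_guess(n: int, days: int = 365) -> float:
    """The mistaken classroom estimate described above: n^2 / days^2."""
    return n * n / days**2


if __name__ == "__main__":
    print(f"naive guess for 20 people: {naive_guess(20):.4f}")
    print(f"actual for 20 people:      {prob_shared_birthday(20):.4f}")
    print(f"actual for 23 people:      {prob_shared_birthday(23):.4f}")
```

Working with the complement ("nobody matches") sidesteps the problem of counting overlapping pairs, which is where the naive estimate goes wrong.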
Fake Numbers
————–
To figure out the probability of some complex event or sequence of events, you need to know some correct numbers for the basic events that you’re using as building blocks. If you get those numbers wrong, then no matter how meticulous the rest of the probability calculation is, the result is garbage.
For example, suppose I’m analyzing the odds in a game of craps. (Craps is a casino dice game using six sided dice.) If I say that in rolling a fair die, the odds of rolling a 6 are 1/6th the odds of rolling a one, then any probabilistic prediction that I make is going to be wrong. It doesn’t matter that from that point on, I do all of the analysis exactly right. I’ll get the wrong results, because I started with the wrong numbers.
This one is incredibly common in evolution arguments: the initial probability numbers are just pulled out of thin air, with no justification.
Misshapen Search Space
————————-
When you model a random process, one way of doing it is by modeling it as a random walk over a search space. Just like the fake numbers error, if your model of the search space has a different shape than the thing you’re modeling, then you’re not going to get correct results. This is an astoundingly common error in anti-evolution arguments; in fact, this is the basis of Dembski’s NFL arguments.
Let’s look at an example to see why it’s wrong. We’ve got a search space which is a table. We’ve got a marble that we’re going to roll across the table. We want to know the probability of it winding up in a specific position.
That’s obviously dependent on the surface of the table. If the surface of the table is concave, then the marble is going to wind up in nearly the same spot every time we try it: the lowest point of the concavity. If the surface is bumpy, it’s probably going to wind up in a concavity between bumps. It’s *not* going to wind up balanced on the tip of one of the bumps.
If we want to model the probability of the marble stopping in a particular position, we need to take the shape of the surface of the table into account. If the table is actually a smooth concave surface, but we build our probabilistic model on the assumption that the table is a flat surface covered with a large number of uniformly distributed bumps, then our probabilistic model *can’t* generate valid results. The model of the search space does not reflect the properties of the actual search space.
Anti-evolution arguments that talk about search are almost always built on invalid models of the search space. Dembski’s NFL is based on a sum of the success rates of searches over *all possible* search spaces.
False Independence
———————
If you want to make something appear less likely than it really is, or you’re just not being careful, a common statistical mistake is to treat events as independent when they’re not. If two events with probabilities p₁ and p₂ are independent, then the probability of both events happening is p₁×p₂. But if they’re *not* independent, then multiplying them that way is going to give you the wrong answer.
For example, take all of the spades from a deck of cards. Shuffle them, and then lay them out. What are the odds that you laid them out in numeric order? It’s 1/13! = 1/6,227,020,800. That’s a pretty ugly number. But if you wanted to make it look even worse, you could “forget” the fact that the sequential draws are dependent, in which case the odds would be 1/13¹³ – roughly 1 in 3×10¹⁴ – about 50,000 times worse.
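The two counts in the spades example are quick to verify. A minimal Python sketch, with variable names of my own choosing:

```python
import math

CARDS = 13  # the spades from one deck

# Correct count: the draws are dependent (each card can appear only once),
# so there are 13! possible layouts.
dependent_outcomes = math.factorial(CARDS)

# "Forgetting" dependence: treating each of the 13 positions as an
# independent 1-in-13 draw gives 13^13 outcomes.
independent_outcomes = CARDS**CARDS

print(f"1 in {dependent_outcomes:,}")
print(f"1 in {independent_outcomes:,}")
print(f"inflation factor: about {independent_outcomes / dependent_outcomes:,.0f}")
```

The inflation factor grows quickly with the number of events, which is why false independence is such an effective way to manufacture an impressively tiny probability.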
[crapcrap]: http://www.parentcompany.com/creation_essays/essay44.htm

Big Numbers: Bad Anti-Evolution Crap from anncoulter.com


A reader sent me a copy of an article posted to “chat.anncoulter.com”. I can’t see the original article; anncoulter.com is a subscriber-only site, and I’ll be damned before I *register* with that site.
Fortunately, the reader sent me the entire article. It’s another one of those stupid attempts by creationists to assemble some *really big* numbers in order to “prove” that evolution is impossible.
>One More Calculation
>
>The following is a calculation, based entirely on numbers provided by
>Darwinists themselves, of the number of small selective steps evolution would
>have to make to evolve a new species from a previously existing one. The
>argument appears in physicist Lee Spetner’s book “Not By Chance.”
>
>At the end of this post — by “popular demand” — I will post a bibliography of
>suggested reading on evolution and ID.
>
>**********************************************
>
>Problem: Calculate the chances of a new species emerging from an earlier one.
>
>What We Need to Know:
>
>(1) the chance of getting a mutation;
>(2) the fraction of those mutations that provide a selective advantage (because
>many mutations are likely either to be injurious or irrelevant to the
>organism);
>(3) the number of replications in each step of the chain of cumulative selection;
>(4) the number of those steps needed to achieve a new species.
>
>If we get the values for the above parameters, we can calculate the chance of
>evolving a new species through Darwinian means.
Fairly typical so far. Not *good* mind you, but typical. Of course, it’s already going wrong. But since the interesting stuff is a bit later, I won’t waste my time on the intro 🙂
Right after this is where this version of this argument turns particularly sad. The author doesn’t just make the usual big-numbers argument; they recognize that the argument is weak, so they need to go through some rather elaborate setup in order to stack things to produce an even more unreasonably large phony number.
It’s not just a big-numbers argument; it’s a big-numbers *strawman* argument.
>Assumptions:
>
>(1) we will reckon the odds of evolving a new horse species from an earlier
>horse species.
>
>(2) we assume only random copying errors as the source of Darwinian variation.
>Any other source of variation — transposition, e.g., — is non-random and
>therefore NON-DARWINIAN.
This is a reasonable assumption, you see, because we’re not arguing against *evolution*; we’re arguing against the *strawman* “Darwinism”, which arbitrarily excludes real live observed sources of variation because, while it might be something that really happens, and it might be part of real evolution, it’s not part of what we’re going to call “Darwinism”.
Really, there are a lot of different sources of variation/mutation. At a minimum, there are point mutations, deletions (a section getting lost while copying), insertions (something getting inserted into a sequence during copying), transpositions (something getting moved), reversals (something getting flipped so it appears in the reverse order), fusions (things that were separate getting merged – e.g., chromosomes in humans vs. in chimps), and fissions (things that were a single unit getting split).
In fact, this restriction *a priori* makes horse evolution impossible; because the modern species of horses have *different numbers of chromosomes*. Since the only change he allows is point-mutation, there is no way that his strawman Darwinism can do the job. Which, of course, is the point: he *wants* to make it impossible.
>(3) the average mutation rate for animals is 1 error every 10^10 replications
>(Darnell, 1986, “Molecular Cell Biology”)
Nice number, shame he doesn’t understand what it *means*. That’s what happens when you don’t bother to actually look at the *units*.
So, let’s double-check the number, and discover the unit. Wikipedia reports the human mutation rate as roughly 1 mutation in 10⁸ *per nucleotide* per generation.
He’s going to build his argument on 1 mutation in every 10^10 reproductions *of an animal*, when the rate is *per nucleotide*, *per cell generation*.
So what does that tell us if we’re looking at horses? Well, according to a research proposal to sequence the domestic horse genome, it consists of 3×10⁹ nucleotides. So if we go by Wikipedia’s estimate of the mutation rate, we’d expect somewhere around 30 mutations per individual *in the fertilized egg cell*. Even using the numbers from the author of this wretched piece, we’d still expect an average of 0.3 new mutations per horse, which works out to roughly one horse in every four carrying at least one unique mutation.
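To see where those per-individual figures come from, here is a rough Python sketch. It multiplies a per-nucleotide rate by the genome size cited above. The chance of "at least one mutation" uses a Poisson approximation, which is an assumption of mine and not something from either source, and the function names are illustrative only.

```python
import math

GENOME_SIZE = 3e9       # nucleotides in the horse genome, as cited above
RATE_WIKIPEDIA = 1e-8   # mutations per nucleotide per generation
RATE_ARTICLE = 1e-10    # the article's figure, read as a per-nucleotide rate


def expected_mutations(rate: float, genome_size: float = GENOME_SIZE) -> float:
    """Expected new mutations per individual, assuming a uniform per-nucleotide rate."""
    return rate * genome_size


def prob_at_least_one(rate: float, genome_size: float = GENOME_SIZE) -> float:
    """Chance of one or more new mutations, under a Poisson approximation (an assumption)."""
    return 1 - math.exp(-expected_mutations(rate, genome_size))


if __name__ == "__main__":
    for label, rate in [("Wikipedia rate", RATE_WIKIPEDIA), ("article's rate", RATE_ARTICLE)]:
        print(f"{label}: expected {expected_mutations(rate):.1f} per individual, "
              f"P(at least one) = {prob_at_least_one(rate):.3f}")
```

Either way, the unit matters: a rate that sounds vanishingly small per nucleotide turns into a routine event once it is multiplied across billions of nucleotides.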
The fact is, pretty damned nearly every living thing on earth – each and every human being, every animal, every plant – contains some unique mutations, some unique variations in its genetic code. Even when you start with a really tiny rate – like one error in every 10¹⁰ copies – it adds up.
>(4) To be part of a typical evolutionary step, the mutation must: (a) have a
>positive selective value; (b) add a little information to the genome ((b) is a
>new insight from information theory. A new species would be distinguished from
>the old one by reason of new abilities or new characteristics. New
>characteristics come from novel organs or novel proteins that didn’t exist in
>the older organism; novel proteins come from additions to the original genetic
>code. Additions to the genetic code represent new information in the genome).
I’ve ripped apart enough bullshit IT arguments, so I won’t spend much time on that, other than to point out that *deletion* is as much of a mutation, with as much potential for advantage, as *addition*.
A mutation also does not need to have an immediate positive selective value. It just needs to *not* have negative value, and it can propagate through a subset of the population. *Eventually*, you’d usually (but not always! drift *is* an observed phenomenon) expect to see some selective value. But that doesn’t mean that *at the moment the mutation occurs*, it must represent an *immediate* advantage for the individual.
>(5) We will also assume that the minimum mutation — a point mutation — is
>sufficient to cause (a) and (b). We don’t know if this is n fact true. We don’t
>know if real mutations that presumably offer positive selective value and small
>information increases can always be of minimum size. But we shall assume so
>because it not only makes the calculation possible, but it also makes the
>calculation consistently Darwinian. Darwinians assume that change occurs over
>time through the accumulation of small mutations. That’s what we shall assume,
>as well.
Note the continued use of the strawman. We’re not talking about evolution here; we’re talking about *Darwinism* as defined by the author. Reality be damned; if it doesn’t fit his Darwinism strawman, then it’s not worth thinking about.
>Q: How many small, selective steps would we need to make a new species?
>
>A: Clearly, the smaller the steps, the more of them we would need. A very
>famous Darwinian, G. Ledyard Stebbins, estimated that to get to a new species
>from an older species would take about 500 steps (1966, “Processes of Organic
>Evolution”).
>
>So we will accept the opinion of G. Ledyard Stebbins: It will take about 500
>steps to get a new species.
Gotta love the up-to-date references, eh? Considering how much the study of genetics has advanced in the last *40 years*, it would be nice to cite a book younger than *me*.
But hey, no biggie. 500 selective steps between speciation events? Sounds reasonable. That’s 500 generations. Sure, we’ve seen speciation in less than 500 generations, but it seems like a reasonable guesstimate. (But do notice the continued strawman; he reiterates the “small steps” gibberish.)
>Q: How many births would there be in a typical small step of evolution?
>
>A: About 50 million births / evolutionary step. Here’s why:
>
>George Gaylord Simpson, a well known paleontologist and an authority on horse
>evolution estimated that the whole of horse evolution took about 65 million
>years. He also estimated there were about 1.5 trillion births in the horse
>line. How many of these 1.5 trillion births could we say represented 1 step in
>evolution? Experts claim the modern horse went through 10-15 genera. If we say
>the horse line went through about 5 species / genus, then the horse line went
>through about 60 species (that’s about 1 million years per species). That would
>make about 25 billion births / species. If we take 25 billion and divided it by
>the 500 steps per species transition, we get 50 million births / evolutionary
>step.
>
>So far we have:
>
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
Here we see some really stupid mathematical gibberish. This is really pure doubletalk – it’s an attempt to generate *another* large number to add into the mix. There’s no purpose in it: we’ve *already* worked out the mutation rate and the number of mutations per speciation. This gibberish is an alternate formulation of essentially the same thing; a way of gauging how long it will take to go through a sequence of changes leading to speciation. So we’re adding a redundant (and meaningless) factor in order to inflate the numbers.
>Q: What’s the chance that a mutation in a particular nucleotide will occur and
>take over the population in one evolutionary step?
>
>A: The chance of a mutation in a specific nucleotide in one birth is 10^-10.
>Since there are 50 million births / evolutionary step, the chance of getting at
>least one mutation in the whole step is 50 million x 10^-10, or 1-in-200
>(1/200). For the sake of argument we can assume that there is an equal chance
>that the base will change to any one of the other three (not exactly true in
>the real world, but we can assume to make the calculation easier – you’ll see
>that this assumption won’t influence things so much in the final calculation);
>so the chance of getting specific change in a specific nucleotide is 1/3rd of
>1/200 or 1-in-600 (1/600).
>
>So far we have:
>
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
>1/600 chance of a point mutation taking over the population in 1 evolutionary step (derived from numbers by Darnell in his standard reference book)
This is pure gibberish. It’s so far away from being a valid model of things that it’s laughable. But worse, again, it’s redundant. Because we’ve already introduced a factor based on the mutation rate; and then we’ve introduced a factor which was an alternative formulation of the mutation rate; and now, we’re introducing a *third* factor which is an even *worse* alternative formulation of the mutation rate.
>Q: What would the “selective value” have to be of each mutation?
>
>A: According to the population-genetics work of Sir Ronald Fisher, the chances
>of survival for a mutant is about 2 x (selective value).
>”Selective Value” is a number that is ASSIGNED by a researcher to a species in
>order to be able to quantify in some way its apparent fitness. Selective Value
>is the fraction by which its average number of surviving offspring exceeds that
>of the population norm. For example, a mutant whose average number of surviving
>offspring is 0.1% higher than the rest of the population would have a Selective
>Value = 0.1% (or 0.001). If the norm in the population were such that 1000
>offspring usually survived from the original non-mutated organism, 1001
>offspring would usually survive from the mutated one. Of course, in real life,
>we have no idea how many offspring will, IN FACT, survive any particular
>organism – which is the reason that Survival Value is not something that you go
>into the jungle and “measure.” It’s a special number that is ASSIGNED to a
>species; not MEASURED in it (like a species’ average height, weight, etc.,
>which are objective attributes that, indeed, can we can measure).
>
>Fisher’s statistical work showed that a mutant with a Selective Value of 1% has
>a 2% chance of survival in a large population. A chance of 2-in-100 is that
>same as a chance of 1-in-50. If the Selective Value were 1/10th of that, or
>0.1%, the chance would be 1/10th of 2%, or about 0.2%, or 1-in-500. If the
>Selective Value were 1/100th of 1%, the chance of survival would be 1/100th of
>2%, or 0.02%, or 1-in-5000.
>
>We need a Selection Value for our calculation because it tells us what the
>chances are that a mutated species will survive. What number should we use? In
>the opinion of George Gaylord Simpson, a frequent value is 0.1%. So we shall
>use that number for our calculation. Remember, that’s a 1-in-500 chance of
>survival.
>
>So far we have:
>
>500 evolutionary steps/new species (as per Stebbins)
>50 million births/evolutionary step (derived from numbers by G. G. Simpson)
>1/600 chance of a point mutation taking over the population in 1 evolutionary
>step (derived from numbers by Darnell in his standard reference book)
>1/500 chance that a mutant will survive (as per G. G. Simpson)
And, once again, *another* meaningless, and partially redundant factor added in.
Why meaningless? Because this isn’t how selection works. He’s using his Darwinist strawman again: everything must have *immediate* *measurable* survival advantage. He also implicitly assumes that mutation is *rare*; that is, a “mutant” has a 1-in-500 chance of seeing its mutated genes propagate and “take over” the population. That’s not at all how things work. *Every* individual is a mutant. In reality, *every* *single* *individual* possesses some number of unique mutations. If they reproduce, and the mutation doesn’t *reduce* the likelihood of its offspring’s survival, the mutation will propagate through the generations to some portion of the population. The odds of a mutation propagating to some reasonable portion of the population over a number of generations is not 1 in 500. It’s quite a lot better.
Why partially redundant? Because this, once again, factors in something which is based on the rate of mutation propagating through the population. We’ve already included that twice; this is a *third* variation on that.
>Already, however, the numbers don’t crunch all that well for evolution.
>
>Remember, probabilities multiply. So the probability, for example, that a point
>mutation will BOTH occur AND allow the mutant to survive is the product of the
>probabilities of each, or 1/600 x 1/500 = 1/300,000. Not an impossible number,
>to be sure, but it’s not encouraging either … and it’s going to get a LOT
>worse. Why? Because…
**Bzzt. Bad math alert!**
No, these numbers *do not multiply*. Probabilities multiply *when they are independent*. These are *not* independent factors.
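To make the independence point concrete, here is a minimal sketch. It has nothing to do with mutation rates: the fair die and the two events are my own illustrative choices, picked only because both events depend on the same roll. It enumerates the outcomes and compares the real joint probability against the naive product.

```python
from fractions import Fraction

# Illustrative assumption: one fair six-sided die, and two events
# that both depend on the same roll (so they are not independent).
outcomes = range(1, 7)

def prob(event):
    """Exact probability of an event (a predicate on a roll) for a fair die."""
    favourable = sum(1 for roll in outcomes if event(roll))
    return Fraction(favourable, len(outcomes))

def is_even(roll):
    return roll % 2 == 0          # {2, 4, 6}

def is_high(roll):
    return roll > 3               # {4, 5, 6}

p_even = prob(is_even)
p_high = prob(is_high)
p_both = prob(lambda roll: is_even(roll) and is_high(roll))   # {4, 6}
naive_product = p_even * p_high

print(p_even, p_high)    # 1/2 1/2
print(p_both)            # 1/3
print(naive_product)     # 1/4
```

The product rule gives 1/4, but the actual chance of both events is 1/3. Multiplying is only valid when knowing one event tells you nothing about the other.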
>V.
>
>Q. What are the chances that (a) a point mutation will occur, (b) it will add
>to the survival of the mutant, and (c) the last two steps will occur at EACH of
>the 500 steps required by Stebbins’ statement that the number of evolutionary
>steps between one species and another species is 500?
See, this is where he’s been going all along.
* He created the Darwinian strawman to allow him to create bizarre requirements.
* Then he added a ton of redundant factors.
* Then he combined probabilities as if they were independent when they weren’t.
* And *now* he adds a requirement for simultaneity which has no basis in reality.
>A: The chances are:
>
>The product of 1/600 x 1/500 multiplied by itself 500 times (because it has to
>happen at EACH evolutionary step). Or,
>
>Chances of Evolutionary Step 1: 1/300,000 x
>Chances of Evolutionary Step 2: 1/300,000 x
>Chances of Evolution Step 3: 1/300,000 x …
>. . . Chances of Evolution Step 500: 1/300,000
>
>Or,
>
>1/300,000^500
*Giggle*, *snort*. I seriously wonder if he actually believes this gibberish. But this is just silly. For the reasons mentioned above: this is taking the redundant factors that he already pushed into each step, inflating them by adding the simultaneity requirement, and then *exponentiating* them.
>This is approximately equal to:
>
>2.79 x 10^-2,739
>
>A number that is effectively zero.
As I’ve said before: no one who understands math *ever* uses the phrase *effectively zero* in a mathematical argument. There is no such thing as effectively zero.
On a closing note, this entire thing, in addition to being both an elaborate strawman *and* a sloppy big numbers argument, is also an example of another kind of mathematical error, which I call a *retrospective error*. A retrospective error is when you take the outcome of a randomized process *after* it’s done, treat it as the *only possible outcome*, and compute the probability of it happening.
A simple example of this is: shuffle a deck of cards. What are the odds of the particular ordering of cards that you got from the shuffle? 1/52! = 1/(8 * 10⁶⁷). If you then ask “What was the probability of a shuffling of cards resulting in *this order*?”, you get that answer: 1 in 8 * 10⁶⁷ – an incredibly unlikely event. But it *wasn’t* an unlikely event; viewed from the proper perspective, *some* ordering had to happen: any result of the shuffling process would have the same probability – but *one* of them had to happen. So the odds of getting a result whose *specific* probability is 1 in 8 * 10⁶⁷ was actually 1 in 1.
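The card numbers are easy to check. The sketch below is only an illustration, using Python's standard library: it computes 52!, confirms that the probabilities of all the equally likely orderings add up to exactly 1, and then shuffles a deck to show that you always end up holding one of those orderings.

```python
import math
import random
from fractions import Fraction

orderings = math.factorial(52)            # distinct orderings of a 52-card deck
p_specific = Fraction(1, orderings)       # chance of one ordering named in advance
p_some_ordering = p_specific * orderings  # chance that *some* ordering comes out

print(f"{orderings:.2e}")    # 8.07e+67
print(p_some_ordering)       # 1

# Shuffle a deck: whatever comes out is one of the 52! orderings.
deck = list(range(52))
random.shuffle(deck)
is_valid_ordering = sorted(deck) == list(range(52))
print(is_valid_ordering)     # True
```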
The entire argument that our idiot friend made is based on this kind of an error. It assumes a single unique path – a single chain of specific mutations happening in a specific order – and asks about the likelihood of that *single chain* leading to a *specific result*.
But nothing ever said that the primitive ancestors of the modern horse *had* to evolve into the modern horse. If they weren’t to just go extinct, they would have to evolve into *something*; but demanding that the particular observed outcome of the process be the *only possibility* is simply wrong.

Yet Another Crappy Bayesian Argument

A reader sent me a link to yet another purported Bayesian argument for the existence of god, this time by a physicist named Stephen Unwin. It’s actually very similar to Swinburne’s argument, which I discussed back at the old home of this blog. The difference is the degree of *dishonesty* demonstrated by the author.

As usual, you can only see the entire argument if you buy his book. But from a number of reviews of the book, and a self-interview posted on his personal website, we can get the gist. Scientific American’s review has the best concise description of his argument that I could find: (the equation in it is retyped by me.)

Unwin rejects most scientific attempts to prove the divine–such as the anthropic principle and intelligent design–concluding that this “is not the sort of evidence that points in either direction, for or against.” Instead he employs Bayesian probabilities, a statistical method devised by 18th-century Presbyterian minister and mathematician Reverend Thomas Bayes. Unwin begins with a 50 percent probability that God exists (because 50-50 represents “maximum ignorance”), then applies a modified Bayesian theorem:

P_after = P_before×D/(P_before×D + 100% -P_before)

The probability of God’s existence after the evidence is considered is a function of the probability before times D (“Divine Indicator Scale”): 10 indicates the evidence is 10 times as likely to be produced if God exists, 2 is two times as likely if God exists, 1 is neutral, 0.5 is moderately more likely if God does not exist, and 0.1 is much more likely if God does not exist. Unwin offers the following figures for six lines of evidence: recognition of goodness (D = 10), existence of moral evil (D = 0.5), existence of natural evil (D = 0.1), intranatural miracles (prayers) (D = 2), extranatural miracles (resurrection) (D = 1), and religious experiences (D = 2).

Plugging these figures into the above formula (in sequence, where the P_after figure for the first computation is used for the P_before figure in the second computation, and so on for all six Ds), Unwin concludes: “The probability that God exists is 67%.” Remarkably, he then confesses: “This number has a subjective element since it reflects my assessment of the evidence. It isn’t as if we have calculated the value of pi for the first time.”
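The arithmetic itself is easy to reproduce. Here is a minimal sketch of the quoted formula applied in sequence, with probabilities written as fractions of 1 rather than percentages. The function name `update` and the alternative D value in the second run are my own, purely for illustration; nothing here comes from Unwin's book beyond the formula and the six D values quoted in the review.

```python
def update(p_before, d):
    """One application of the quoted formula (100% written as 1)."""
    return p_before * d / (p_before * d + 1 - p_before)

def run(d_values, prior=0.5):
    """Feed each result back in as the next P_before, as the review describes."""
    p = prior
    for d in d_values:
        p = update(p, d)
    return p

# D values exactly as listed in the review quoted above.
unwin_d = [10, 0.5, 0.1, 2, 1, 2]
p = run(unwin_d)
print(round(p, 2))        # 0.67

# Hypothetical variation (my number, not Unwin's): score natural evil
# at D = 0.01 instead of 0.1 and leave everything else alone.
p_alt = run([10, 0.5, 0.01, 2, 1, 2])
print(round(p_alt, 2))    # 0.17
```

With a 50% prior, the result depends on nothing but the product of the D values. Change one subjective score by a factor of ten, and 67% becomes 17%.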

It’s pretty clear looking at this that the argument is nothing more than “I assert God exists, therefore God exists”. The “probability” result is generated by pulling numbers at random for his D-value. Even he admits that the numbers are “subjective”, but I would go much further than that: the numbers are fundamentally built on the assumption of the existence of god. How can you pretend that you haven’t already accepted the assumption that god exists, and then use stories about the occurrence of divine interventions as facts?

But this doesn’t touch on the reason that I call him dishonest. So far, it’s just sloppiness; typical of the sloppy reasoning of religious people trying to make arguments for the existence of god. But then, on his website, there’s a little self-interview:

Q: So does He exist?

SDU: God?

Q: Yes.

SDU: I don’t know. Although my book does expand on this response.

It goes on like that. He claims to not know; to not have a belief about whether or not there is a god; that his book is an honest enquiry by someone uncertain, trying to use evidence to reason about whether or not god exists.

He’s lying. Plain and simple. Everything about his argument is completely predicated on his acceptance of the existence of god. And there’s no way that he’s dumb enough to not know that. But the argument seems so much more convincing to a layman if the author isn’t sure, but is just carefully working through the probabilities. And that final figure: exactly 2/3s… It’s nicely convenient. After all, he’s not saying he’s sure; but he’s saying that an objective review of the evidence gives a number that makes it look good, while not certain – it preserves that illusion of objectivity.

This guy is using his scientific background to give him authority as someone who understands how this kind of math works; and then he’s lying about his intentions in order to increase the credibility of his argument.

Why I Hate Religious Bayesians

Last night, a reader sent me a link to yet another wretched attempt to argue for the existence of God using Bayesian probability. I really hate that. Over the years, I’ve learned to dread Bayesian arguments, because so many of them are things like this, where someone cobbles together a pile of nonsense, dressing it up with a gloss of mathematics by using Bayesian methods. Of course, it’s always based on nonsense data; but even in the face of a lack of data, you can cobble together a Bayesian argument by pretending to analyze things in order to come up with estimates.

You know, if you want to believe in God, go ahead. Religion is ultimately a matter of personal faith and spirituality. Arguments about the existence of God always ultimately come down to that. Why is there this obsessive need to justify your beliefs? Why must science and mathematics be continually misused in order to prop up your belief?

Anyway… Enough of my whining. Let’s get to the article. It’s by a guy named Robin Collins, and it’s called “God, Design, and Fine-Tuning“.

Let’s start right with the beginning.

Suppose we went on a mission to Mars, and found a domed structure in which everything was set up just right for life to exist. The temperature, for example, was set around 70° F and the humidity was at 50%; moreover, there was an oxygen recycling system, an energy gathering system, and a whole system for the production of food. Put simply, the domed structure appeared to be a fully functioning biosphere. What conclusion would we draw from finding this structure? Would we draw the conclusion that it just happened to form by chance? Certainly not. Instead, we would unanimously conclude that it was designed by some intelligent being. Why would we draw this conclusion? Because an intelligent designer appears to be the only plausible explanation for the existence of the structure. That is, the only alternative explanation we can think of–that the structure was formed by some natural process–seems extremely unlikely. Of course, it is possible that, for example, through some volcanic eruption various metals and other compounds could have formed, and then separated out in just the right way to produce the “biosphere,” but such a scenario strikes us as extraordinarily unlikely, thus making this alternative explanation unbelievable.

The universe is analogous to such a “biosphere,” according to recent findings in physics. Almost everything about the basic structure of the universe–for example, the fundamental laws and parameters of physics and the initial distribution of matter and energy–is balanced on a razor’s edge for life to occur. As eminent Princeton physicist Freeman Dyson notes, “There are many . . .lucky accidents in physics. Without such accidents, water could not exist as liquid, chains of carbon atoms could not form complex organic molecules, and hydrogen atoms could not form breakable bridges between molecules” (1979, p.251)–in short, life as we know it would be impossible.

Yes, it’s the good old ID argument about “It looks designed, so it must be”. That’s the basic argument all the way through; they just dress it up later. And as usual, it’s wrapped up in one incredibly important assumption, which they cannot and do not address: that we understand what it would mean to change the fundamental structure of the universe.

What would it mean to change, say, the ratio of the strengths of the electromagnetic force and gravity? What would matter look like if we did? Would stars be able to exist? Would matter be able to form itself into the kinds of complex structures necessary for life?

We don’t know. In fact, we don’t even really have a clue. And not knowing that, we cannot meaningfully make any argument about how likely it is for the universe to support life.

They do pretend to address this:

Various calculations show that the strength of each of the forces of nature must fall into a very small life-permitting region for intelligent life to exist. As our first example, consider gravity. If we increased the strength of gravity on earth a billionfold, for instance, the force of gravity would be so great that any land-based organism anywhere near the size of human beings would be crushed. (The strength of materials depends on the electromagnetic force via the fine-structure constant, which would not be affected by a change in gravity.) As astrophysicist Martin Rees notes, “In an imaginary strong gravity world, even insects would need thick legs to support them, and no animals could get much larger.” (Rees, 2000, p. 30). Now, the above argument assumes that the size of the planet on which life formed would be an earth-sized planet. Could life forms of comparable intelligence to ourselves develop on a much smaller planet in such a strong-gravity world? The answer is no. A planet with a gravitational pull of a thousand times that of earth — which would make the existence of organisms of our size very improbable– would have a diameter of about 40 feet or 12 meters, once again not large enough to sustain the sort of large-scale ecosystem necessary for organisms like us to evolve. Of course, a billion-fold increase in the strength of gravity is a lot, but compared to the total range of strengths of the forces in nature (which span a range of 10⁴⁰ as we saw above), this still amounts to a fine-tuning of one part in 10³¹. (Indeed, other calculations show that stars with life-times of more than a billion years, as compared to our sun’s life-time of ten billion years, could not exist if gravity were increased by more than a factor of 3000. This would have significant intelligent life-inhibiting consequences.) (3)

Does this really address the problem? No. How would matter be different if gravity were a billion times stronger, and EM didn’t change? We don’t know. For the sake of this argument, they pretend that mucking about with those ratios wouldn’t alter the nature of matter at all. That’s what they’re going to build their argument on: the universe must support life exactly like us: it’s got to be carbon-based life on a planetary surface that behaves exactly like matter does in our universe. In other words: if you assume that everything has to be exactly as it is in our universe, then only our universe is suitable.

They babble on about this for quite some time; let’s skip forwards a bit, to where they actually get to the Bayesian stuff. What they want to do is use the likelihood principle to argue for design. (Of course, they need to obfuscate, so they cite it under three different names, and finally use the term “the prime principle of confirmation” – after all, it sounds much more convincing than “the likelihood principle”!)

The likelihood principle is a variant of Bayes’ theorem, applied to experimental systems. The basic idea of it is to take the Bayesian principle of modifying an event probability based on a prior observation, and to apply it backwards to allow you to reason about the probability of two possible priors given a final observation. In other words, take the usual Bayesian approach of asking: “Given that Y has already occurred, what’s the probability of X occurring?”; turn it around, and say “X occurred. For it to have occurred, either Y or Z must have occurred as a prior. Given X, what are the relative probabilities for Y and Z as priors?”

There is some controversy over when the likelihood principle is applicable. But let’s ignore that for now.

To further develop the core version of the fine-tuning argument, we will summarize the argument by explicitly listing its two premises and its conclusion:

Premise 1. The existence of the fine-tuning is not improbable under theism.

Premise 2. The existence of the fine-tuning is very improbable under the atheistic single-universe hypothesis. (8)

Conclusion: From premises (1) and (2) and the prime principle of confirmation, it follows that the fine-tuning data provides strong evidence to favor of the design hypothesis over the atheistic single-universe hypothesis.

At this point, we should pause to note two features of this argument. First, the argument does not say that the fine-tuning evidence proves that the universe was designed, or even that it is likely that the universe was designed. Indeed, of itself it does not even show that we are epistemically warranted in believing in theism over the atheistic single-universe hypothesis. In order to justify these sorts of claims, we would have to look at the full range of evidence both for and against the design hypothesis, something we are not doing in this paper. Rather, the argument merely concludes that the fine-tuning strongly supports theism over the atheistic single-universe hypothesis.

That’s pretty much their entire argument. That’s as mathematical as it gets. Doesn’t stop them from arguing that they’ve mathematically demonstrated that theism is a better hypothesis than atheism, but that’s really their whole argument.

Here’s how they argue for their premises:

Support for Premise (1).

Premise (1) is easy to support and fairly uncontroversial. The argument in support of it can be simply stated as follows: since God is an all good being, and it is good for intelligent, conscious beings to exist, it not surprising or improbable that God would create a world that could support intelligent life. Thus, the fine-tuning is not improbable under theism, as premise (1) asserts.

Classic creationist gibberish: pretty much the same stunt that Swinburne pulled. They pretend that there are only two possibilities. Either (a) there’s exactly one God which has exactly the properties that Christianity attributes to it; or (b) there are no gods of any kind.

They’ve got to stick to that – because if they admitted more than two possibilities, they’d have to actually consider why their deity is more likely than any of the other possibilities. They can’t come up with an argument that Christianity is better than atheism if they acknowledge that there are thousands of possibilities as likely as theirs.

Support for Premise (2).

Upon looking at the data, many people find it very obvious that the fine-tuning is highly improbable under the atheistic single-universe hypothesis. And it is easy to see why when we think of the fine-tuning in terms of the analogies offered earlier. In the dart-board analogy, for example, the initial conditions of the universe and the fundamental constants of physics can be thought of as a dart- board that fills the whole galaxy, and the conditions necessary for life to exist as a small one-foot wide target. Accordingly, from this analogy it seems obvious that it would be highly improbable for the fine-tuning to occur under the atheistic single-universe hypothesis–that is, for the dart to hit the board by chance.

Yeah, that’s pretty much it. That’s the whole argument for why fine-tuning is less probable in a universe without a deity than in a universe with one: because “many people find it obvious”, and because they’ve got a clever dartboard analogy.

They make a sort of token effort to address the obvious problems with this, but they’re really all nothing but more empty hand-waving. I’ll just quote one of them as an example; you can follow the link to the article to see the others if you feel like giving yourself a headache.

Another objection people commonly raise against the fine-tuning argument is that as far as we know, other forms of life could exist even if the constants of physics were different. So, it is claimed, the fine-tuning argument ends up presupposing that all forms of intelligent life must be like us. One answer to this objection is that many cases of fine-tuning do not make this presupposition. Consider, for instance, the cosmological constant. If the cosmological constant were much larger than it is, matter would disperse so rapidly that no planets, and indeed no stars could exist. Without stars, however, there would exist no stable energy sources for complex material systems of any sort to evolve. So, all the fine-tuning argument presupposes in this case is that the evolution of life forms of comparable intelligence to ourselves requires some stable energy source. This is certainly a very reasonable assumption.

Of course, if the laws and constants of nature were changed enough, other forms of embodied intelligent life might be able to exist of which we cannot even conceive. But this is irrelevant to the fine-tuning argument since the judgement of improbability of fine-tuning under the atheistic single-universe hypothesis only requires that, given our current laws of nature, the life-permitting range for the values of the constants of physics (such as gravity) is small compared to the surrounding range of non-life-permitting values.

Like I said at the beginning: the argument comes down to a hand-wave that if the universe didn’t turn out exactly like ours, it must be no good. Why does a lack of hydrogen fusion stars like we have in our universe imply that there can be no other stable energy source? Why is it reasonable to constrain the life-permitting properties of the universe to be narrow based on the observed properties of the laws of nature as observed in our universe?

Their argument? Just because.

Dishonest Dembski: the Universal Probability Bound

One of the dishonest things that Dembski frequently does that really bugs me is take bogus arguments, and dress them up using mathematical terminology and verbosity to make them look more credible.
An example of this is Dembski’s *universal probability bound*. Dembski’s definition of the UPB from the [ISCID online encyclopedia][upb-icsid] is:
>A degree of improbability below which a specified event of that probability
>cannot reasonably be attributed to chance regardless of whatever
>probabilitistic resources from the known universe are factored in. Universal
>probability bounds have been estimated anywhere between 10^-50 (Emile Borel)
>and 10^-150 (William Dembski).
He’s quantified it in several different ways. I’ve found three different versions of the calculation of the UPB: two of them from Wikipedia; one is from a message thread at ISCID which the author claims is a quote from one of Dembski’s books.
Let’s look at Dembski’s own words first:
>Specifically, within the known physical universe there are estimated to be no
>more than 10⁸⁰ elementary particles. Moreover, the properties of matter are
>such that transitions from one state to another cannot occur at a rate faster
>that 10⁴⁵ times per second. Finally, the universe itself is about a billion
>times younger than 10²⁵ seconds (assuming the universe is around 10 to 20
>billion years old). ….these cosmological constraints imply that the total
>number of specified events throughout cosmic history cannot exceed
>10⁸⁰ * 10⁴⁵ x 10²⁵ = 10¹⁵⁰.
He goes on to assert that this is the “maximum number of trials” that could have occurred since the beginning of the universe, and that for anything less likely than that which is observed to occur, it is not reasonable to say it is caused by chance.
Wikipedia presents this definition, and a more recent one which lowers the UPB, but as they don’t provide all of the details of the equation, I’ll skip it for now. Wikipedia’s explanation of this original form of the UPB is:
>Dembski’s original value for the universal probability bound is 1 in 10¹⁵⁰,
>derived as the inverse of the product of the following approximate
>quantities:[11]
>
> * 10⁸⁰, the number of elementary particles in the observable
> universe.
> * 10⁴⁵, the maximum rate per second at which transitions in
> physical states can occur (i.e., the inverse of the Planck time).
> * 10²⁵, a billion times longer than the typical estimated age of
> the universe in seconds.
>
>Thus, 10¹⁵⁰ = 10⁸⁰ × 10⁴⁵ × 10²⁵.
>Hence, this value corresponds to an upper limit on the number of physical
>events that could possibly have occurred since the big bang.
Here’s the fundamental dishonesty: None of those numbers have *anything* to do with what he’s supposedly trying to prove. He’s trying to create a formal-sounding version of the big-number problem by throwing together a bunch of fancy-sounding numbers, multiplying them together, and claiming that they somehow suddenly have meaning.
But they don’t.
It’s actually remarkably easy to show what utter nonsense this is. I’ll do a fancy one first, and a trivial one second.
Let’s create an incredibly simplified model of a region of space. Let’s say we have a cube of space, 1 kilometer on a side. Further, let’s suppose that this space contains 1000 particles, and they are all electrons. And further, let’s suppose that each 1mm cube in this cubic kilometer can only have one electron in it.
This is a model which is so much simpler than reality that it’s downright silly. But everything about the real world would make it more complex, and it’s sufficient for our purposes.
Now: consider the probability of any *configuration* of the electrons in the region of space. A configuration is a selection of the set of 1mm cubes that contain electrons. A kilometer is 10⁶ millimeters, so a cubic kilometer contains (10⁶)³ = 10¹⁸ of those 1mm cubes. The number of different configurations of this region of space is (10¹⁸!)/((1000!)*(10¹⁸-1000)!). That works out to (10¹⁸*(10¹⁸-1)*(10¹⁸-2)*…*(10¹⁸-999))/(1000!).
1000! is roughly 4×10²⁵⁶⁷ according to my scheme interpreter. We’ll be generous, and use 1×10²⁵⁶⁸, to make things easier. To estimate the numerator, we can treat each of its 1000 factors as 10¹⁷, giving (10¹⁷)¹⁰⁰⁰, which will be much smaller. That’s 10¹⁷⁰⁰⁰. So there are more than 10¹⁴⁴³² configurations, and if we treat them all as equally likely, the probability of any particular configuration within that cube is less than 1 in 10¹⁴⁴³².
So any state of particles within that cube is an event with probability considerably smaller than 1 in 10¹⁴⁴³². So what Dembski is saying is that *every* possible configuration of matter in space in the entire universe is impossible without intelligent intervention.
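If you don't trust the hand estimate, the count can be checked by machine. This is a rough sketch rather than anything rigorous: it sums logarithms to get the base-10 exponent of the binomial coefficient, and it takes the number of cells as a parameter. The function name and the particular grid sizes tried are my own choices. The point is that the conclusion doesn't depend on how finely you slice the cube.

```python
import math

def log10_configurations(cells, particles):
    """log10 of C(cells, particles), summed factor by factor
    so that no astronomically large integer is ever built."""
    return sum(
        math.log10(cells - i) - math.log10(i + 1)
        for i in range(particles)
    )

UPB_EXPONENT = 150   # Dembski's bound is 1 in 10^150

# Illustrative grid sizes; 10**18 is a cubic kilometer cut into 1mm cubes.
exponents = {
    cells: log10_configurations(cells, 1000)
    for cells in (10**6, 10**9, 10**18)
}

for cells, exponent in exponents.items():
    # Prints the cell count, the order of magnitude of the number of
    # configurations, and whether that exceeds the UPB exponent.
    print(cells, math.floor(exponent), exponent > UPB_EXPONENT)
```

Every one of those grids gives a configuration count whose exponent is in the thousands, so under the equal-likelihood assumption any single configuration falls far below the supposed bound.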
And the trivial one? Grab two decks of distinguishable cards. Shuffle them together, and lay them out for a game of spider solitaire. What’s the probability of that particular lay of cards? 1 in 104!, and 104! is, very roughly, something larger than 1×10¹⁶⁶. Is god personally arranging my cards every time I play spider?
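Checking the card example against the bound takes three lines. A small sketch, using exact integer arithmetic from Python's standard library; the variable names are mine:

```python
import math

lay_orderings = math.factorial(104)      # orderings of 104 distinguishable cards
upb_trials = 10**80 * 10**45 * 10**25    # the product from the quoted definition

print(upb_trials == 10**150)             # True
print(lay_orderings > upb_trials)        # True
print(f"{lay_orderings:.2e}")            # 1.03e+166
```

One shuffle of two decks is already about sixteen orders of magnitude less likely than the event that supposedly can't happen by chance.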
Anyone who’s ever taken any class on probability *knows* this stuff. One college level intro, and you know that routine daily events can have incredibly small probabilities – far smaller than his alleged UPB. But Dembski calls himself a mathematician, and he writes about probability quite frequently. As much as I’ve come to believe that he’s an idiot, things like this just don’t fit: he *must* know that this is wrong, but he continues to publish it anyway.
[upb-icsid]: http://www.iscid.org/encyclopedia/Universal_Probability_Bound

Skewing Statistics for Politics

As I’ve frequently said, statistics is an area which is poorly understood by most people, and as a result, it’s an area which is commonly used to mislead people. The thing is, when you’re working with statistics, it’s easy to find a way of presenting some value computed from the data that will appear to support a predetermined conclusion – regardless of whether the data as a whole supports that conclusion. Politicians and political writers are some of the worst offenders at this.
Case in point: over at [powerline][powerline], they’re reporting:
>UPI reports that Al Gore’s movie, An Inconvenient Truth, hasn’t done so well
>after a promising start:
>
>> Former U.S. vice-President Al Gore’s documentary “An Inconvenient Truth”
>>has seen its ticket sales plummet after a promising start.
>>
>>After Gore’s global warming documentary garnered the highest average per play
>>ever for a film documentary during its limited Memorial Day weekend opening,
>>recent theater takes for the film have been less than stellar, Daily Variety
>>reports.
>>
>> The film dropped from its record $70,333 per play to $12,334 during its
>>third week and its numbers have continued to fall as the film opens in smaller
>>cities and suburbs across the country.
>
>It’s no shock, I suppose, that most people aren’t interested in seeing
>propaganda films about the weather. But the topic is an interesting and
>important one which we wrote about quite a few years ago, and will try to
>return to as time permits.
So: they’re quoting a particular figure: *dollars per screen-showing*, as a measure of how the movie is doing. The thing is, that’s a pretty darn weird statistic. Why would they use dollars/screen-showing, instead of total revenue?
Because it’s the one statistic that lets them support the conclusion that they wanted to draw. What are the real facts? Official box office statistics for gross per weekend (rounded to the nearest thousand):
* May 26: $281,000 (in 4 theaters)
* June 2: $1,356,000 (in 77 theaters)
* June 9: $1,505,000 (in 122 theaters)
* June 16: $1,912,000 (in 404 theaters)
* June 23: $2,016,000 (in 514 theaters)
Each weekend, it has made more money than the previous weekend. (Those are per weekend numbers, not cumulative. The cumulative gross for the movie is $9,630,000.)
But the per showing gross has gone down. Why? Because when it was first released, it was being shown in a small number of showings in a small number of theaters. When it was premiered in 4 theaters, they sold out standing room only – so the gross per showing was very high. Now, four weeks later, it’s showing in over 500 theaters, and the individual showings aren’t selling out anymore. But *more people* are seeing it – every weekend, the number of people seeing it has increased!
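You can see both trends at once by computing the two statistics side by side. This sketch uses only the weekend figures listed above. Since I only have theater counts, not showing counts, it divides by theaters, which is an approximation of the per-play number that was quoted.

```python
# (weekend, gross in dollars, theaters), from the list above.
weekends = [
    ("May 26", 281_000, 4),
    ("June 2", 1_356_000, 77),
    ("June 9", 1_505_000, 122),
    ("June 16", 1_912_000, 404),
    ("June 23", 2_016_000, 514),
]

grosses = [gross for _, gross, _ in weekends]
per_theater = [gross / theaters for _, gross, theaters in weekends]

# Is each weekend's figure higher (or lower) than the one before it?
gross_rising = all(a < b for a, b in zip(grosses, grosses[1:]))
per_theater_falling = all(a > b for a, b in zip(per_theater, per_theater[1:]))

for (label, gross, theaters), average in zip(weekends, per_theater):
    print(f"{label:8} total ${gross:>9,}  per theater ${average:>9,.0f}")

print(gross_rising)          # True
print(per_theater_falling)   # True
```

Same data, two statistics, opposite stories. Which one you report depends on which conclusion you want.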
The Powerline article (and the UPI article which it cites) are playing games with numbers to skew the results. They *want* to say that Al Gore’s movie is tanking in the theaters, so they pick a bizarre statistic to support that, even though it’s highly misleading. In fact, it’s one of the best performing documentaries *ever*. It’s currently the number seven grossing documentary of all time, and it’s about $600,000 off from becoming number 5.
What was the per-theater (note, not per showing, but per theater) gross for the last Star Wars movie four weeks into its showing? $4,500/theater (at 3,322 theaters), according to [Box Office Mojo][bom]. So, if we want to use reasoning a lot like powerline, we can argue that Al Gore’s movie is doing *as well as Star Wars* on a dollars/theater-weekend basis.
But that would be stupid, wouldn’t it.
[bom]: http://www.boxofficemojo.com/movies/?page=weekend&id=starwars3.htm
[powerline]: http://powerlineblog.com/archives/014530.php

Good Math/Bad Math

The beauty of math; the humor of stupidity.
