# Yet Another Crappy Bayesian Argument

A reader sent me a link to yet another purported Bayesian argument for the existence of god, this time by a physicist named Stephen Unwin. It’s actually very similar to Swinburne’s argument, which I discussed back at the old home of this blog. The difference is the degree of *dishonesty* demonstrated by the author.

As usual, you can only see the entire argument if you buy his book. But from a number of reviews of the book, and a self-interview posted on his personal website, we can get the gist. Scientific American’s review has the best concise description of his argument that I could find: (the equation in it is retyped by me.)

Unwin rejects most scientific attempts to prove the divine–such as the anthropic principle and intelligent design–concluding that this “is not the sort of evidence that points in either direction, for or against.” Instead he employs Bayesian probabilities, a statistical method devised by 18th-century Presbyterian minister and mathematician Reverend Thomas Bayes. Unwin begins with a 50 percent probability that God exists (because 50-50 represents “maximum ignorance”), then applies a modified Bayesian theorem:

Pafter = Pbefore×D/(Pbefore×D + 100% -Pbefore)

The probability of God’s existence after the evidence is considered is a function of the probability before times D (“Divine Indicator Scale”): 10 indicates the evidence is 10 times as likely to be produced if God exists, 2 is two times as likely if God exists, 1 is neutral, 0.5 is moderately more likely if God does not exist, and 0.1 is much more likely if God does not exist. Unwin offers the following figures for six lines of evidence: recognition of goodness (D = 10), existence of moral evil (D = 0.5), existence of natural evil (D = 0.1), intranatural miracles (prayers) (D = 2), extranatural miracles (resurrection) (D = 1), and religious experiences (D = 2).

Plugging these figures into the above formula (in sequence, where the Pafter figure for the first computation is used for the Pbefore figure in the second computation, and so on for all six Ds), Unwin concludes: “The probability that God exists is 67%.” Remarkably, he then confesses: “This number has a subjective element since it reflects my assessment of the evidence. It isn’t as if we have calculated the value of pi for the first time.”
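To see the arithmetic concretely, here is a minimal Python sketch of the chained update as the review describes it. The function and variable names are illustrative assumptions, not anything taken from Unwin's book, and the D values are simply the ones quoted above.

```python
def update(p_before: float, d: float) -> float:
    """One step of the quoted formula, with probabilities as fractions of 1."""
    return p_before * d / (p_before * d + 1.0 - p_before)


# D values as listed in the Scientific American review.
evidence = [
    ("recognition of goodness", 10),
    ("moral evil", 0.5),
    ("natural evil", 0.1),
    ("intranatural miracles", 2),
    ("extranatural miracles", 1),
    ("religious experiences", 2),
]


def run_chain(prior: float, ds: list) -> list:
    """Feed each P_after back in as the next P_before; return every intermediate value."""
    p = prior
    steps = []
    for d in ds:
        p = update(p, d)
        steps.append(p)
    return steps


steps = run_chain(0.5, [d for _, d in evidence])
for (label, d), p in zip(evidence, steps):
    print(f"{label:26s} D={d:<4} -> P={p:.4f}")
final = steps[-1]
```

Each step just multiplies the odds P/(1-P) by D, so the order of the six items doesn't matter: the prior odds of 1 get multiplied by 10 × 0.5 × 0.1 × 2 × 1 × 2 = 2, and odds of 2 to 1 are a probability of 2/3. Change any single D and the "answer" moves with it.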

It’s pretty clear looking at this that the argument is nothing more than “I assert God exists, therefore God exists”. The “probability” result is generated by pulling numbers at random for his D-value. Even he admits that the numbers are “subjective”, but I would go much further than that: the numbers are fundamentally built on the assumption of the existence of god. How can you pretend that you haven’t already accepted the assumption that god exists, and then use stories about the occurrence of divine interventions as facts?

But this doesn’t touch on the reason that I call him dishonest. So far, it’s just sloppiness; typical of the sloppy reasoning of religious people trying to make arguments for the existence of god. But then, on his website, there’s a little self-interview:

Q: So does He exist?

SDU: God?

Q: Yes.

SDU: I don’t know. Although my book does expand on this response.

It goes on like that. He claims to not know; to not have a belief about whether or not there is a god; that his book is an honest enquiry by someone uncertain, trying to use evidence to reason about whether or not god exists.

He’s lying. Plain and simple. Everything about his argument is completely predicated on his acceptance of the existence of god. And there’s no way that he’s dumb enough to not know that. But the argument seems so much more convincing to a layman if the author isn’t sure, but is just carefully working through the probabilities. And that final figure: exactly 2/3s… It’s nicely convenient. After all, he’s not saying he’s sure; but he’s saying that an objective review of the evidence gives a number that makes it look good, while not certain – it preserves that illusion of objectivity.

This guy is using his scientific background to give him authority as someone who understands how this kind of math works; and then he’s lying about his intentions in order to increase the credibility of his argument.

## 81 thoughts on “Yet Another Crappy Bayesian Argument”

1. Bronze Dog

That’s the second time you’ve referenced Doggerel #19 (that I know of). 🙂
Covers the situation quite well: These people aren’t here to present arguments or answer questions. They’re here to sell books and make money.

2. PaulC

Comedy is Douglas Adams saying the answer to life, the universe, and everything is 42. Tragedy is paying hard-earned cash for a book that “proves the ultimate truth” is actually 67%.
I think the best general attack on these kinds of arguments is just to substitute some other “truth” for “God exists” and see how the probabilities work out.
For instance, what’s the Bayesian probability that God exists and promises eternal damnation for anyone who tries to prove his existence mathematically? True, it might not be very high, but even a tiny non-zero value will give you infinite negative return on your effort. Theological Bayesians have got to ask themselves a question: “Do I feel lucky?” Well, do ya, punk?

3. Vasha

The humor page of the latest New Scientist has the following “Scientific Proof of Alatry” (alatry, on the model of idolatry, being defined as “the practice of not bothering to worship any deities, regardless of how many there may be”):
“The only thing we know about deities with any certainty is that the number of them is a whole number, the idea of a fractional deity being frankly absurd. So the number of deities in our universe is an integer, in the range from minus infinity to plus infinity. (We leave the theologians to interpret a negative number of deities: this is number theory, and its conclusion should save them the trouble.)
For it is commonly accepted that we should expect our universe to be typical of possible universes. So the expected number of deities is in the middle of the range of possibilities. That is, zero. Quod erat demonstrandum.”

4. Stephen

All men are either called Fred or not. Therefore, there is a 50% chance that any man is called Fred. QED.

5. Blake Stacey

For the informationally ravenous, here is the original Good Math, Bad Math post on Swinburne, entitled “Mind-Numbingly Stupid Math”. In the comment thread there, I wrote,

In my most humble and pacifist opinion, people who talk like this should be told that there is a fifty-fifty chance their toenails will be ripped off tomorrow. Either it will happen, or it won’t. And I thought I was such a nice guy. . . .

6. loren

PaulC: “For instance, what’s the Bayesian probability that God exists and promises eternal damnation for anyone who tries to prove his existence mathematically? True, it might not be very high, but even a tiny non-zero value will give you infinite negative return on your effort.”
Vaguely a propos …
http://crookedtimber.org/2003/11/01/gambling-with-the-devil
http://crookedtimber.org/2003/11/03/two-envelopes

7. Koray

Absurdly subjective choices aside, there is also no proof that Bayesian probability assessment methods are 100% correct, is there?

8. Mark C. Chu-Carroll

Koray:
That’s a surprisingly difficult question to answer… It depends on just how you define your terms. I got into a lot of trouble with commenters on the old blog last time I talked about this, but I’m a glutton for punishment, so I’ll go ahead and say what I want anyway 🙂
There are some folks who are big fans of Bayesian reasoning who say that a Bayesian assessment is *never* wrong; it is merely incomplete. Bayesian methods produce *estimates*, which should converge on the truth as information is added. So by that reasoning – a Bayesian estimate *is* always correct given the level of knowledge at which it was generated. It can, of course, be updated in light of additional knowledge, including the knowledge that some prior assumption is invalid.
On the other hand, there are people like me who say that Bayesian methods produce an estimate; for a given estimated probability, you can assign a degree of certainty based on the quality and completeness of your information. But an estimate is either correct (in that it has produced the actual real probability of something within the margin of error derived from the certainty), or it’s wrong. If you don’t have the correct information, or if you believe a piece of information that you fed into your calculation has a greater reliability/accuracy than it actually possesses, then the Bayesian estimate is wrong.

9. Canuckistani

Bayesian probability can be derived as an extension of classical logic, and as in classical logic, if you start with incorrect premises (i.e., prior information), your conclusions may also be incorrect.
Mark, what do you consider an “actual, real probability”?

10. Mark Wan

Let me qualify that I am no expert in this area. I am a mere math grad with only one degree.
I am thinking that part of the reason that there are so many bad math arguments around is that there is no official research in this area. Let’s face it, I don’t think real publishable papers on mathematical formulations for the existence of God would be accepted, nor would they generate research grants. Then again I don’t read reputable math journals…
Would it help if there were a prize like the one for Fermat’s Last Theorem? At least it would attract publicity, and therefore fame, for people to do proper research. Hey, if people cannot come up with a proof or a disproof then the prize money just rolls over for hundreds of years.

11. Paul Gowder

Oh my… wow. I just stumbled on this blog, and having never bothered to actually read most of the arguments of the unintelligent design crowd, I’m even more horrified than I expected to be. This guy just MADE UP probabilities for the existence of god given various facts? Can I write a book where I set the D for moral evil at .00000000000000000000000000001/(10^99999999)? Will this sell copies and make me rich and famous?

12. Torbjörn Larsson

Aha! Mark and Wikipedia made me understand why I had difficulties with bayesian inference. (By the fact that frequentist probability is fully compatible with physics models and hypothesis testing, while bayesian isn’t.)
As for induction, one use the evidence to improve an hypothesis. But justifying hypotheses in science is made by a test procedure (with frequentist probabilities) and a finite set of data.
So while bayesian inference certainly works for filters and evaluating parsimony et cetera, I now find myself in Mark’s camp. It is an estimate that must be tested as any other estimate. I might as well start calling them bayesian estimates from now on.
I have met those who propose global bayesian ‘inference’ over all theories – never throw away bad hypotheses, merely assign them low values, they could still improve the integrated ‘theory’ for some, perhaps pathological, cases. But I could never justify my unease. Now I can. Cool!

13. Torbjörn Larsson

“As for induction, one use the evidence to improve an hypothesis.”
More correct is that one use the evidence to create and improve an hypothesis.

14. Canuckistani

Hi Torbjörn,
As a Bayesian, I’m not sure why you say that Bayesian inference is incompatible with physics models or hypothesis testing. In terms of compatibility with physics, see this paper. It’s not quite correct to say that Bayesian inference is incompatible with hypothesis testing; more accurate would be the statement that classical and Bayesian hypothesis tests do not always agree. (They do agree for Gaussian distributions with known variance, due to the symmetry of the data and the mean parameter.) As a Bayesian, I’d assert that Bayesian hypothesis testing is to be preferred on both practical and theoretical grounds. I’d be happy to discuss particular cases by email if you’re interested. I can be reached through a hotmail account under the name “cyanonym”.

15. Canuckistani

Hi Mark Wan,
The prize for Fermat’s Last Theorem attracted much more mathematical gobbledygook than it did proper research — go here and scroll down to the last section. I think a prize for mathematical research into the existence of God would create lots of grist for the GM/BM mill, but not a whole lot in the “good math” category.

16. Torbjörn Larsson

Canuckistani,
I didn’t know bayesian estimates could be used for hypothesis testing. The 5 sigma limit used to justify the existence of a phenomenon in physics is based on frequentist probability, as is most of the math around classic hypothesis testing.
I note when googling fast that bayesian testing seems to conflate model development and verification as I feared. Data is used both for modelling the test and testing. That personal opinion and nonrigid experimental design is used is taken as an advantage. (There are frequentist tests that are datadriven and flexible too, but the design isn’t.) http://64.233.183.104/search?q=cache:pPQLBCfy2eMJ:www.stat.duke.edu/~berger/talks/wildlife.ppt+Bayesian+hypothesis+testing&hl=en&ct=clnk&cd=4
Perhaps you can adjust my first glance. I can’t stop using standard methods when standard, however.
The idea of incompatibility with physics is not mine. I took it from a string physicist who is an active blogger:
“It is often said that there are two basic interpretations of probability: frequency probability (the ratio of events in a repeated experiment) and Bayesian probability (the amount of belief that a statement is correct). I am, much like an overwhelming majority of physicists, statisticians, and probability theorists (see the Wikipage about the frequency probability to verify my statement) convinced that it is only the frequency probability that has a well-defined quantitative meaning that can be studied by conventional scientific methods.”
“The Bayesian probability cannot be even defined without vague words like “belief”, “plausibility”, and so forth. It’s just not a well-defined quantitative concept because it cannot be determined or measured with ever higher degree of accuracy. Such a kind of probability is not predicted by meaningful physical theories of physics either. The predictions of quantum mechanics are always about the frequentist probabilities.” ( http://motls.blogspot.com/2006/01/bayesian-probability-ii.html )
“Also, when we predict the death of the Universe or any other event that will only occur once, we are outside science as far as the experimental tests go. We won’t have a large enough dataset to make quantitative conclusions. The only requirement that the experiment puts on our theories is that the currently observed reality should not be extremely unlikely according to the theory.”
“While the text above makes it clear that I only consider the frequentist probabilities to be a subject of the scientific method including all of its sub-methods, it is equally clear that perfect enough theories may allow us to predict the probabilities whose values cannot be measured too accurately (or cannot be measured at all) by experiments. It is no contradiction. Such predictions are still “scientific predictions” but they cannot really be “scientifically verified”. Only some features of the scientific method apply in such cases.” ( http://motls.blogspot.com/2005/12/bayesian-probability.html )
It is clear that part of his dislike is because he dislikes the use of the anthropic principle to set parameters in physics. But I have no reason to distrust him or his reference. It is also what Mark and now I say in a less wordy fashion. Estimates are estimates and must be justified by tests.

17. Torbjörn Larsson

Canuckistani:
In whatever order my comments appear, I also want to note that I do see uses for bayesian estimates. They are clearly useful in machine learning/filtering and parsimony evaluation of competing models (see some papers on WMAP models).
And contrary to my referenced physicist I think constraining sparse events can be useful. For example, in the SETI Drake equation estimates constrain the expected number of communicative civilisations and types of likely systems, which guides search.

18. Andrew

Didn’t Robert Winston cover this kind of theory in his programme on the possibility of God? (It may have been shown on BBC America). He concluded that the answer you get out depends on the numbers you choose to put in.
In a forum I’m a member of, someone made a probability argument something like this:
Assume there is a god (not necessarily the Christian God).
The probability that He can create a universe with people is 1.
Now assume there is no god. The probability of any universe beginning is tiny; multiplied by the tiny possibility that a universe would be suitable to contain Earth, and by the tiny probability that Earth could give rise to humans, it is infinitesimal. Therefore, a god almost certainly exists.
The flaws were rather quickly pointed out.

19. Mark C. Chu-Carroll

Canuckistani:
A “real” probability is a probability figure that accurately reflects the odds of something occurring. If I flip a fair coin, the odds of it landing heads up are 1/2. Drawing a card from a shuffled deck gives me odds of 1/13 of drawing a card with a “3” on it. The odds of finding an electron in a particular position around the nucleus in an atom is given by a probability distribution function. Those are “real” probabilities.
Bayesian probabilities are always *estimates* of the probability of an event based on the available knowledge. If they were *real* probabilities, they wouldn’t need to be updated: the fact that I’ve learned some extra piece of information about event X doesn’t change the real probability of X occurring. But the Bayesian estimate gets updated by my new knowledge, bringing my estimate *closer* to the real probability.
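One way to picture an estimate being pulled toward a fixed underlying frequency is a small sketch like the one below. It assumes a uniform Beta(1, 1) prior over a coin's heads frequency, and the flip tallies are hypothetical; neither the model nor the numbers come from the discussion itself.

```python
from fractions import Fraction


def posterior_mean(heads: int, tails: int, a: int = 1, b: int = 1) -> Fraction:
    """Mean of the Beta(a + heads, b + tails) posterior for the heads frequency."""
    return Fraction(a + heads, a + b + heads + tails)


# Hypothetical tallies from a coin whose long-run heads frequency is 0.7.
for heads, tails in [(0, 0), (7, 3), (70, 30), (700, 300)]:
    estimate = posterior_mean(heads, tails)
    print(f"{heads + tails:5d} flips -> estimate {float(estimate):.4f}")
```

The coin's long-run frequency never changes in this sketch; only the estimate does, starting at 1/2 with no data and moving toward 0.7 as the tallies grow.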

20. Canuckistani

Hi Torbjörn,
I kinda didn’t want to hijack the thread, although I guess we won’t be disturbing anybody. But seriously, I can write about this stuff all day long…

I note when googling fast that bayesian testing seems to conflate model development and verification as I feared. Data is used both for modelling the test and testing.

Bayesian probability doesn’t help one develop a model — the model must be provided before the Bayesian algorithm can work. So I don’t understand how it could conflate model development and verification. I’m not sure what you mean here by “modelling the test”.
Personal opinion plays just as large a role in classical methods as it does in Bayesian probability. Because classical methods lack a unified framework, reasonable statisticians may disagree as to which ostensibly objective procedure should be used on any given problem. The advantage of the Bayesian approach is that it forces its users to clearly state their assumptions before turning the crank on Bayes’ Theorem. The assumptions are always up for argument; but once they are accepted, the statistical inference that follows has all the force of any theorem in formal logic. For example, all of the objections to Bayesian arguments for God attack the assumptions, not the formal mechanism of Bayes’ Theorem.
In terms of nonrigid experimental design, there is an extensive Bayesian literature on modelling data collection mechanisms, which is essential for observational studies. In this literature, standard statistical designs are justified in that they provide so-called “ignorable” models of data collection.

I can’t stop using standard methods when standard, however.

I certainly wouldn’t want you to! The standard methods are standard because they work fairly well. There is usually a strong connection between statistical methods that have become standard and the Bayesian approach.
Thank you for the link to that interesting physics blog! I note that a Bayesian has commented in response to the post in your first link, providing citations to three papers arguing against Motl’s point of view. Those papers will speak to the intersection of physics and Bayesian probability far more effectively than I can. Two of those papers specifically argue against your assertion that frequentist probability is fully compatible with physics models.

“The Bayesian probability cannot be even defined without vague words like “belief”, “plausibility”, and so forth. It’s just not a well-defined quantitative concept because it cannot be determined or measured with ever higher degree of accuracy… I only consider the frequentist probabilities to be a subject of the scientific method including all of its sub-methods…”

Here, Motl is just wrong when he asserts that Bayesian probabilities are not a well-defined quantitative concept. He’s probably confusing the concepts of (Bayesian) probability distributions and frequency distributions. Frequency distributions are observable features of reality. Probability distributions are not. They are associated with unknown variables, be they data or parameters, and encode the plausibility of the true value of the variable being located in a specific range. As probability distributions are conditional on a specific state of information, they are a description of that state of information, not a description of the observable universe.

21. Canuckistani

Hi MarkCC,

If I flip a fair coin, the odds of it landing heads up are 1/2. Drawing a card from a shuffled deck gives me odds of 1/13 of drawing a card with a “3” on it.

These are really definitions of the words “fair” and “shuffled”, not a definition of the word probability. (Also, technically, those are “chances”, not “odds”. Odds of 1 to 2 correspond to chances of 1 in 3, which isn’t what you want to say at all.)
Suppose I tell you that I manufacture both double-headed and double-tailed coins. I want to demonstrate my product to you, but I don’t tell you if I’m holding a double-headed or a double-tailed coin. If we do a coin toss, the first toss is “fair” for you but not for me, because your information is symmetric about the interchange of heads and tails, but mine is not.
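A minimal sketch of that asymmetry, under the added assumption (for illustration only) that the coin in hand happens to be the double-headed one; the function and variable names are hypothetical:

```python
def posterior_double_headed(prior_dh: float, saw_heads: bool) -> float:
    """P(coin is double-headed | one observed toss), by Bayes' theorem."""
    # A double-headed coin always shows heads; a double-tailed coin never does.
    like_dh = 1.0 if saw_heads else 0.0
    like_dt = 0.0 if saw_heads else 1.0
    numerator = prior_dh * like_dh
    return numerator / (numerator + (1.0 - prior_dh) * like_dt)


your_prior = 0.5  # you cannot tell which coin is being held
my_prior = 1.0    # assumption for this sketch: I know it is the double-headed one

# Probability each of us assigns to "heads" on the first toss.
p_heads_you = your_prior * 1.0 + (1.0 - your_prior) * 0.0
p_heads_me = my_prior * 1.0 + (1.0 - my_prior) * 0.0

# After a single toss comes up heads, your information matches mine.
your_posterior = posterior_double_headed(your_prior, saw_heads=True)
print(p_heads_you, p_heads_me, your_posterior)
```

The same physical coin gets 0.5 from you and 1.0 from me before the toss, and one observation is enough to bring your assignment into line with mine.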

Bayesian probabilities are always *estimates* of the probability of an event based on the available knowledge…

One of the original applications of epistemic probability was Laplace’s estimate of the mass of Saturn. The mass of Saturn is not a probability of an event. Bayesian estimates may be estimates of frequencies, or they may be estimates of fixed unknown quantities. In all cases, information about the variables in question is encoded into a probability distribution.
The frequentist position of probability identifies it strictly with the long-run frequency over a series of trials, but this is subtly logically incoherent. See this paper for a short and sweet description.

22. Torbjörn Larsson

Canuckistani,
I’m sorry, I prefer to continue here. I’m currently mostly procrastinating from some chores, and blogging is more flexible since the social obligation is less on a public forum.
“So I don’t understand how it could conflate model development and verification. I’m not sure what you mean here by “modelling the test”.”
The description I looked at seemed to use the same data to decide which test and the test parameters, and to do the test.
“Because classical methods lack a unified framework, reasonable statisticians may disagree as to which ostensibly objective procedure should be used on any given problem.”
Agreed. Different tests suit different problems, and sometimes the data is too bad to look for an effect. But tests can be repeated, using other methods.
“The standard methods are standard because they work fairly well.”
I was also referring to the string physicist’s claim. I don’t know enough to make the evaluation he did, so I’m basically stuck with doing frequentist probability in physics until the bayesian community turns physics around to their thinking. I think parsimony evaluation is an important application, so bayesians have already shown that bayesian methods matter.
I note that Motl deigned to answer the bayesian comment. I’m sure one can find papers proposing bayesian estimates, but I can’t see why Motl’s assertion, that the “overwhelming majority of physicists, statisticians, and probability theorists” is using frequentist probability, is wrong.
“Motl is just wrong when he asserts that Bayesian probabilities are not a well-defined quantitative concept”
Bayesian ‘probabilities’ are estimates of probabilities, not probability distributions derived from models. So Motl and Mark seem to agree.
And I don’t agree when you claim that probability distributions describes states or information. They are part of a model, describing probabilities for variables.
I see that in your comment to Mark you problematice the choice and verification of models. That is separate from the problem of estimates. You also seem to conflate frequentist probabilities from models and classic estimates from models. I don’t see any problem with his description.

23. BenE

One thing to note is that, by setting his prior probability to 50%, he implicitly assumes a binary variable and his hypothesis has to be something like: “is there a god, any god”. Maybe you can argue on Bayesian principles that there is a certain rational probability in the existence of a god, however, it gives an almost null probability to any specific god.
If you make the hypothesis more specific, for example “There is a god and he has one son named Jesus”, to get a prior P(“There is a god”, “he has 1 son”, “son’s name=jesus”), you add a bunch of variables, which makes the prior much, much smaller.
Any specific god tends to have a nearly null probability under Bayesian principles. That is of course until we find credible evidence supporting a particular god.
I argue something similar here. (scroll down to BenE’s post)

24. BenE

I wasn’t very thorough in the above. Let me elaborate further. Now I’m not yet an expert in Bayesian probabilities, but this is how I see it. To ask the question “is there a god” you must first define the word “god”. Otherwise the question makes no sense. The prior probability that “there is a blinkertubular” is nonsense.
We can for example define god as “the creator of the universe”. Now if you do that, you have to compare the god hypothesis against the other hypotheses for the creation of the universe, like the big bang, and since the big bang actually has evidence supporting it, it automatically wins and the god probability is 0.
You can go even further and define god as the creator of the big bang or creator of the medium in which big bangs arise. (as in here) Okay fine. In a way, if you assume there is something greater than the universe, that definition almost assures his existence because we define him to encompass everything on that level. But then if you add variables to this hypothesis (“God has a human form”, “has a son”, “doesn’t like homosexuals”, “thinks adultery is bad”), you have to compare each variable with all its alternatives, which makes for an exponentially small prior. Any specific you add multiplies the prior probability of this god by the prior of that specific.
The point is that simply starting with 50% makes very little sense.
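As a rough illustration of how quickly added specifics shrink a prior, here is a small sketch. The individual numbers are made up purely for illustration, and treating the specifics as independent is a simplifying assumption.

```python
# Purely illustrative priors for each added specific, treated as independent.
specifics = {
    "some god exists": 0.5,
    "has a human form": 0.1,
    "has exactly one son": 0.1,
    "son is named Jesus": 0.001,
}

running = 1.0
trail = []
for claim, prior in specifics.items():
    running *= prior  # each added specific multiplies the joint prior
    trail.append(running)
    print(f"+ {claim:22s} -> joint prior {running:.2e}")
```

Whatever numbers one prefers, the joint prior can only stay the same or shrink as claims are added, which is why the 50% starting point only makes sense for the vaguest version of the hypothesis.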

25. Canuckistani

Torbjörn,

The description I looked at seemed to use the same data to decide which test and the test parameters, and to do the test.

I’m still not clear on what you mean. In Bayesian hypothesis testing, there is no question of “which test”. The procedure is always to calculate the posterior probability of the hypothesis. I’m not sure what “test parameters” means in this context. Am I just dense? It seems we are speaking in different jargons.

I note that Motl deigned to answer the bayesian comment. I’m sure one can find papers proposing bayesian estimates, but I can’t see why Motl’s assertion, that the “overwhelming majority of physicists, statisticians, and probability theorists” is using frequentist probability, is wrong.

I meant the last comment, by “sreds”. The three papers he gives references to are quant-ph/0408058, quant-ph/0409144, and quant-ph/0501009. Motl’s assertion may be right for physicists, but Bayesian methods have reasonably good penetrance among statisticians.

Bayesian ‘probabilities’ are estimates of probabilities, not probability distributions derived from models. So Motl and Mark seem to agree.
And I don’t agree when you claim that probability distributions describes states or information. They are part of a model, describing probabilities for variables.

What is your operational definition of probability here? That is, if you say, “the probability of X is 0.5”, what kind of object is X permitted to be, and what statement are you making about objective reality and/or behavior you consider rational?

I see that in your comment to Mark you problematice the choice and verification of models. That is separate from the problem of estimates. You also seem to conflate frequentist probabilities from models and classic estimates from models. I don’t see any problem with his description.

Notice how I always make a sharp distinction between probability and frequency? This is because the concepts are not identical in the Bayesian view. What I’ve written will likely make more sense to you if you mentally replace the word “probability” with the word “plausibility” whenever I use it.
In the Bayesian approach, no strong distinction is made between choosing a model and estimating a parameter. Both problems are approached using the same procedure: calculate the posterior probability. (Actually, I’ve left decision theory out of this description, but the posterior probability is still always the first step.) I view this as a distinct advantage over the large number of possible approaches in classical statistics. Two-sided test or one-sided test? Confidence interval or p-value? There’s no classical theory to guide one’s choice.
I conflate frequencies and estimates because in the Bayesian approach, a probability distribution describes uncertainty about a variable. So if I am uncertain about the mass of Saturn, I assign a probability distribution to that variable to describe my uncertainty. Then, when I collect data, I update my probability distribution using Bayes’ Theorem. Likewise, if I am uncertain about the relative frequencies of outcomes in a long series of exchangeable trials, then I assign it a probability distribution describing my uncertainty, collect data, and update my distribution for the frequency.
You may wonder if it is legitimate to mathematically manipulate a degree of plausibility as if it were a probability. The short answer is yes, it is legitimate. The long answer is here.

26. BenE

First Mark,
“A “real” probability is a probability figure that accurately reflects the odds of something occurring. If I flip a fair coin, the odds of it landing heads up are 1/2. Drawing a card from a shuffled deck gives me odds of 1/13 of drawing a card with a “3” on it. The odds of finding an electron in a particular position around the nucleus in an atom is given by a probability distribution function. Those are “real” probabilities.”
There is no such thing as a “real” probability. Either there is a frequency or there is a statement about our knowledge.
“If I flip a fair coin, the odds of it landing heads up are 1/2.”
Why is that? It isn’t an inherent property of the coin; there isn’t a little random generator in the coin that determines heads or tails. Assigning 0.5 probabilities to a coin toss is a statement of your knowledge or rather your lack of knowledge for all the variables that will influence the result of the toss. You lack knowledge about the way you will make your toss and the physical encounters that will influence the coin’s flight and landing. If you knew everything, every little air perturbations, all the details about the surfaces the coin is going to encounter and the exact direction, orientation and force of the coin toss, you would know the result in advance. It’s the lack of information that is stated in the 0.5 probability.
Read Jaynes Chapter Physics of “Random Experiments” for more about the coin toss.
Torbjörnm,
“Bayesian ‘probabilities’ are estimates of probabilities, not probability distributions derived from models. So Motl and Mark seem to agree.”
Not true. Using Bayesian probabilities almost always implies using models and probability distributions. They just add the estimates of probabilities on top of the models. It’s just a more consistent way to do parameter estimation than using maximum likelihood.
“And I don’t agree when you claim that probability distributions describes states or information. They are part of a model, describing probabilities for variables.”
Well, see my response to Mark, but I will reiterate. What are “probabilities for variables” if they are not degrees of certainty? What is a probability distribution if it is not a way to carry information about what we know and what we don’t know for a variable? Do you really think that defining these in terms of some nonexistent infinite ideal makes sense?
One case that exemplifies the flaws of the frequentist methods is the use of t-tests with a null hypothesis. This is usually used to determine whether some effect is significant or not. However, this method has one big gaping hole in it. If you have enough data, the null hypothesis will almost always be rejected (unless the effect is exactly null)! You could see that the hypothesis has erroneously been rejected because of too much data by plotting the data and looking at the magnitude of the effect, which would be very small. However, this means we can never trust p values in t-tests without seeing a plot and judging subjectively by looking at the points! Otherwise, how do we know that there isn’t too much data? How much data is too much data? P values prove nothing!
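The "enough data always rejects" point can be illustrated numerically. The sketch below is a simplification under assumptions: it uses a one-sample z-test (known standard deviation) as a stand-in for the t-test, holds the observed effect fixed at a hypothetical 0.01 standard deviations, and only varies the sample size. The function name `two_sided_p` is made up for the example.

```python
import math

def two_sided_p(effect, sd, n):
    """Two-sided p-value of a one-sample z-test for an observed mean `effect`."""
    z = effect * math.sqrt(n) / sd
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny observed effect, increasingly large samples (hypothetical numbers).
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(effect=0.01, sd=1.0, n=n))
```

With these numbers the p-value goes from clearly non-significant at n = 100 to vanishingly small at n = 1,000,000, although the size of the effect never changes.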

27. Torbjörn Larsson

BenE:
I like that kind of reasoning. You can also start to throw in all possible gods, to make the prior for a specific one very small, and to increase the number of distinguishing properties that must be fulfilled. There once was a number of religions with gods very similar to the christian one…
In fact, since theologians insist the supernatural is unconstrained, anything is possible and conversely nothing can be known. (Last Thursdayism is possible, but we can never know.) So the best prior is a big, fat 0. Not even Unwin’s formula can improve that.
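The claim about a zero prior is easy to check against the formula quoted in the post, Pafter = Pbefore×D/(Pbefore×D + 100% - Pbefore). The sketch below writes 100% as 1.0 and uses the six D values from the Scientific American review; the function names `unwin_update` and `chain` are just labels for this example.

```python
def unwin_update(p_before, d):
    """One step of the formula quoted in the post: P*D / (P*D + 1 - P)."""
    return p_before * d / (p_before * d + 1.0 - p_before)

def chain(prior, ds):
    """Apply the update once per D value, feeding each result into the next step."""
    p = prior
    for d in ds:
        p = unwin_update(p, d)
    return p

# The six D values reported in the review quoted in the post.
DS = [10, 0.5, 0.1, 2, 1, 2]

print(round(chain(0.5, DS), 2))  # 0.67
print(chain(0.0, DS))            # 0.0
```

Starting from 50% reproduces the 67% figure; starting from 0 stays at 0 no matter what the D values are, because every numerator is multiplied by the prior.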

28. Torbjörn Larsson

“Motl’s assertion may be right for physicists, but Bayesian methods have reasonably good penetrance among statisticians.”
I can accept that. Now they have to prove themselves in textbooks and in physics.
“What is your operational definition of probability here?”
I’m not a statistician, so the answer to that is “the usual one”. 🙂 Kolmogorov’s axioms, for example. That will make probabilities measures on a sigma-algebra of events in a sample space, which seems reasonable and general enough. Events are measurable as sets of outcomes.
Your discussion of bayesian concepts is clarifying, but I don’t see that it adds anything to Mark’s description. I’m quite happy with my first comment. Not conflating probabilities and estimates makes a lot of problems disappear for me.
BenE:
“Assigning 0.5 probabilities to a coin toss is a statement of your knowledge or rather your lack of knowledge for all the variables that will influence the result of the toss.”
It is the result of a model of a fair coin. The model can be an infinitely thin disc, that will fall either way. Simplified models are a trick of the trade.
“They just add the estimates of probabilities on top of the models.”
Agreed, they are estimates.
For the discussion on probabilities, see my comment to Canuckistani above. On tests, plotting data is a good method to help verify the test. For example, in a t-test you must first test normality and equality of variances. If you only plot, for normality say, there is an element of subjectivity. Then again, all hypothesis testing involves subjectivity; different tests will make different comparisons or they wouldn’t be different methods. One must make sure the result isn’t affected by that.

29. Torbjörn Larsson

Sorry, the first part of my previous comment was cut. It should be:
Canuckistani:
“I’m not sure what “test parameters” means in this context.”
The test statistic may vary with the sample size, for example.

30. BenE

“It is the result of a model of a fair coin. The model can be an infinitely thin disc, that will fall either way. Simplified models are a trick of the trade.”
What do you mean? A coin is a real thing. No need to create simplified models to see the fairness in a game of heads or tails.
Think about your thin disc model for a second. It isn’t a very robust model. Consider that I drop your disc face side up from a distance of one inch above the floor. Now that you have this information, would you play tails against me? Alternatively, what if the coin is tossed from high above the floor, but by a robot that I designed to toss coins with exact precision in orientation and force so that the robot can control the outcome? I programmed the robot. I get to choose heads or tails. Would you play against me?
I could have done this with any model you had chosen. The point is that the probability isn’t inherent to the coin but to our knowledge about the situation.
Coins are used in this game because their shape and size makes it hard for human senses to get enough information to predict their landing even for the thrower who can actually touch the coin when he throws.
Of course, cognitively, everything we reason about is just a model in our head. If that is what you mean by “model” then you are accepting the Bayesian perspective that probabilities are about knowledge. They are about the probabilities we assign to our models and not about real inherent probabilities assigned to things. This is exactly the distinction Bayesians make.
“Agreed, they are estimates.”
Probabilities are always estimates. When we know everything, when we are omniscient about a situation, we don’t need probabilities at all. We just know. As with the coin toss, if we can assign a probability to an event, it’s because we are missing some information about it. Often it’s because we lack precision in our measuring instruments or senses.
“On tests, plotting data is a good method to help verify the test.”
If you are a frequentist you pretty much need to do this. But then you are simply making a subjective assessment. If you are a Bayesian you can just calculate a confidence interval based on an uninformative prior and you’re good to go.
Of course the reason why frequentists need to plot the data is that when you have it on paper you can instinctively assign an even prior to the paper area you are viewing and interpret the clusters of points along with their density relative to this uniform area. This subjective assessment is often better than the significance tests because humans instinctively use Bayesian reasoning. There is a catch, though: the choice of scale on the axes of your graph risks influencing your interpretation. Of course Bayesians have a solution to this. We have priors that are so uninformative that they exhibit scale invariance, so we need not worry about that.
“Then again, all hypothesis testing involves subjectivity,”
Not if you’re a Bayesian and you start with the most uninformative prior.

31. Torbjörn Larsson

BenE:
“What do you mean? A coin is a real thing. No need to create simplified models to see the fairness in a game of heads or tails.”
I did that since you discussed events such as air currents, height of the drop vs rate of revolution, and possibly a real coin landing on the edge. The model eliminates these spurious phenomena and events. This seems to be part of what you are discussing, the difference between a simple model, a more complex model and reality.
I’m not sure where you want to go with this since it doesn’t affect the definition of probability. Nor does the fact that coarsegraining classical systems means introducing randomness.
“Probabilities are always estimates.”
Not according to the Kolmogorov definition, see previous comment.
“If you are a frequentist you pretty much need to do this.”
Your discussion here contradicts the description I gave. It is your idea of what is done, not mine.
“Not if you’re a Bayesian and you start with the most uninformative prior.”
I can dispute your prior, but the main point is that a bayesian estimate needs justification, as Mark and I noted above. I don’t see how application to hypothesis testing changes that.

32. BenE

“Nor does the fact that coarsegraining classical systems means introducing randomness.”
Well, okay. You can hide the fact that probabilities are about information by using the word “coarsegraining”. However, remember that it is this “coarsegraining” that brings the need to use probabilities. If you finegrain enough, all the probabilities disappear from your model.
“I can dispute your prior, but the main point is that a bayesian estimate needs justification, as Mark and I noted above.”
What do you mean by “needs justification”? You just have to use the most uninformative, high entropy prior. That’s it.
You haven’t addressed the problem that null hypothesis testing can almost always be made to reject the null hypothesis by adding more data. I’ve seen people throw away part of their data in order to “make their data set more suitable to t-tests”. I’ve seen others pile up their data until they could reject their null hypothesis and report something significant. They just hide the fact that if you plot the data you see that the magnitude of the significant difference from 0 is so minuscule that a common sense conclusion actually confirms the pretty-much-null hypothesis, and the minuscule (but statistically significant) difference is due to some systematic error. IMO, that alone should be enough to dismiss the frequentist methods.
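For the "most uninformative, high entropy prior" mentioned above, a small check shows what "high entropy" means in the simplest discrete case. This is a sketch only: the four-outcome distributions are hypothetical, and it does not address how to pick priors for continuous parameters.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

uniform = [0.25] * 4            # no outcome favoured
skewed = [0.7, 0.1, 0.1, 0.1]   # encodes a preference for the first outcome

print(entropy(uniform))  # 2.0
print(entropy(skewed))
```

Among distributions over four outcomes with no other constraints, the uniform one attains the maximum of 2 bits; any distribution that favours an outcome scores lower, which is the sense in which it carries more information.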

33. Canuckistani

Ah, another Jaynes enthusiast joins the fray! Groovy.
Torbjörn,

“What is your operational definition of probability here?”
I’m not a statistician, so the answer to that is “the usual one”. 🙂 Kolmogorov’s axioms, for example. That will make probabilities measures on a sigma-algebra of events in a sample space, which seems reasonable and general enough. Events are measurable as sets of outcomes.

Kolmogorov’s axioms give a mathematical definition for probability, but they say nothing about what probability means. It’s just a number between 0 and 1. The axioms also make no reference to either frequencies or degrees of plausibility. So this doesn’t really answer the question of what the statement “the probability of X is 0.5” means to you, functionally, in the real world. I’m assuming that when you say, “events are measurable as sets of outcomes”, you’re not using “measurable” as in “mathematical measure theory” but rather as “observable”. So at least you’re saying the probabilities can only apply to things that are, in principle, observable. (There’s a whole sub-school of Bayesian thought which agrees with this premise.)
BenE,
Check out the references to physics articles that I gave to Torbjörn in my previous comment. I believe you will find them interesting reading.

34. BenE

Hey Canuck, I just realised you’re from Canada, aren’t you? I’m studying in Ottawa right now. Anyhoo, I’m not a physicist and quantum stuff is way over my head so that paper is no good for me. Although, I must say, from the little I know, the idea of representing quantum uncertainty as limits of knowledge makes sense to me.
Getting on a sidetrack here, I have a long-standing question that you guys may be able to answer. That whole “observation changes the outcome of the two slit experiment” business has always seemed like an odd interpretation to me. Wouldn’t it make more sense to say that when you detect the electrons you actually disturb the experiment, so that it isn’t the observation per se that is the problem but the fact that you have to disturb in order to detect? If that is actually the case, why is the bogus interpretation so widespread and why are physicists not doing anything to remedy the situation?

35. Canuckistani

BenE,
As it happens, I’m not a physicist either, but I’ve done some light reading on the subject. At my level, I can more-or-less follow the paper on quantum exchangeability, but not the one discussing the incompatibility of the frequency interpretation of probability with quantum physics. So I can’t answer your question on the two-slit experiment.
I’m in Ottawa too. What are you studying?

36. BenE

Computational Linguistics

37. Mark C. Chu-Carroll

BenE:
You’re in CL? Do you know my wife, Jennifer?

38. BenE

I don’t know your wife 🙁 but holy small world! What does she do? I’m a master’s student at ottawaU. I work under Nathalie Japkowicz in the CS department but officially I’m enrolled as an electrical engineering student.

39. BenE

hum, I think this coincidence is greater evidence for the existence of god than Unwin’s arguments!

40. Torbjörn Larsson

BenE:
I am grateful for our discussion here; it has helped me understand the similarity between the frequentist probability and other model properties, and the different discussion bayesians and perhaps other statisticians make about what the problems are. I am also starting to appreciate the connection between frequentist probabilities and models as such. Modelfree methods have their uses, but not if we have or want to develop formal theories.
“You can hide the fact that probabilities are about information”
I don’t think I know enough to follow you into this discussion, nor do I see the immediate interest from my viewpoint on probability. Mark has just posted about such use, and misuse from creationists, at http://scientopia.org/blogs/goodmath/2006/07/dembski-on-nonevolutionary-math . It seems to be more connected to Shannon’s IT than Kolmogorov’s work.
“by using the word “coarsegraining”. However, remember that it is this “coarsegraining” that brings the need to use probabilities. If you finegrain enough, all the probabilities disappear from your model.”
Agreed. Probabilities and spurious phenomena depend on the model, or are measured. As usual in modelling a better model gives better correspondence. In QM systems genuine randomness persists with finegraining.
“What do you mean by “needs justification”. You just have to use the most uninformative, high entropy prior.”
I mean that the a posteriori estimate needs justification, as far as I understand. The problem of priors is perhaps solved, but I seem to remember quite a lot of old web discussions on this.
But I’m not particularly interested in discussing hypothesis testing. I don’t understand your discussion on t-tests, which I have only used in one application. The test value is dependent on the degrees of freedom. BTW, to refresh myself I googled, and found a page that claims that ANOVA and regression tests give the same answer, so the t-test seems more foolproof than tests I have done in other instances. ( http://www.socialresearchmethods.net/kb/stat_t.htm )
In general modelling and testing are a search for facts, not truth. I’m not sure that you want to allow for the imperfections and uncertainty that separates models and reality?
Canuckistani:
I’m using an operational view since I’m looking at it from an observation and modelling point of view. Yes, I meant observable, as events giving frequencies.
I think you are asking for an epistemological view, since the axioms and the observations define what probabilities are. Isn’t that much like asking what a charge means on an electron? Science doesn’t answer such questions, except by embedding it in the properties of QED.
“So at least you’re saying the probabilities can only apply to things that are, in principle, observable.”
If a model allows for frequencies that we for some reason can’t observe yet, they should still exist as part of that formal theory. Otherwise there would be a problem with formal theories. But here we are approaching philosophy, similar to asking if Popperian falsifiability means feasible or theoretical falsifiability.

41. Torbjörn Larsson

BenE:
All QM interpretations are odd. 🙂 I think your interpretation is similar or equal to a decoherence picture that some seem to think may replace the wavefunction collapse of the Copenhagen interpretations. That picture may be verifiable. In any case, it is proposed in the consistent histories and the many-worlds interpretations. ( http://en.wikipedia.org/wiki/Interpretations_of_Quantum_Mechanics )
Currently interpretations are interchangeable: they are consistent with QM axioms, but there aren’t experiments that can distinguish between them. (Note: The Bohm, transactional and relational interpretations aren’t consistent on the referenced link as they conflict with relativity, and the consciousness one is a nonscientific dualistic idea. It’s a crappy reference.)
What you call a bogus interpretation is the main Copenhagen interpretation. In this, the observation makes the wavefunction collapse instantaneously. (It’s the naturalistic version of creationists’ ‘poof’, except that it gives observable consequences. See also decoherence above.) I don’t think they define observation more than necessary, since it gives all sorts of philosophical problems. They go for the operational definition (as I did above. ;-), so no talk of disturbance.

42. BenE

Well, I’m glad you gained from this discussion. I’m convinced that your new awareness of the importance of models and of separating them from reality is an indication that you are inching away from the frequentist perspective even if you don’t realise it. I will admit that there are still some debates about the definitions of some priors, but people are slowly converging on the right ones, and usually the debated alternatives make pretty much no difference in practice.
“In QM systems genuine randomness persists with finegraining.”
E.T. Jaynes who is the major proponent of Bayesianism was actually a quantum physicist (among other things). He responded to the quantum argument (see page 11 of this book chapter) basically agreeing to Einstein’s “god does not play dice”.

43. Torbjörn Larsson

BenE:
Actually, theories and models vs reality is based in my education as a scientist. The same education makes me a realist, in the sense that I believe objects such as atoms exist. It is the philosophical part that is confusing me. So no, I have been interested in and confused by bayesian reasoning before, but I believe this thread cleared up a lot of my unresolved problems.
Einstein’s comment was made because he hoped that quantum theory would lead to a fully deterministic model. It didn’t. Hidden variables conflict with locality, which is essential for Lorentz invariance and thus causality.
It was interesting to look at Jaynes. His maximum entropy school is another idea, using also trajectories besides states. I trust statistical mechanics which defines entropy measuring the degree to which the probability of the system is spread out over different possible states.
But his QM discussion doesn’t answer the hidden variable question any more than Einstein’s did. In fact he also introduces the idea that the usual interpretation means QM is totally random. This is like ID painting evolution as totally random. It is wrong since systems develop deterministically between wavefunction collapses. Deterministic development is part of what the multifaceted concept of causality connotes.
Looking back at what bayesians do, I note an amount of philosophy. I can’t say that this is wrong as such, since other areas also imply and sometimes discuss philosophical ideas. What I do think is wrong is if the philosophy is in conflict with the major ideas of science, such as separating theory and models from reality and looking for facts instead of truths. Some of the ideas I have looked at now seem to do so.

44. Torbjörn Larsson

To be totally clear, which I wasn’t above:
I think usual science does a good job of separating model and reality.
The bayesian thinking, by contrast, that something isn’t an inherent property of a model (and so of reality, if properly modelled) confuses the meaning of the model, and to state that a model can be perfect is to confuse model and reality. We may come as close as we wish in some cases. The model isn’t reality, however, and we don’t model truths (“everything”) but facts (observables).

45. Torbjörn Larsson

“can almost always be made to reject the null hypothesis by adding more data”
Now I believe I understand this part of your previous comment on t-tests. While the test value is dependent on the degrees of freedom, it is a failure of the test to add data during the test, dependent on outcome. One must reject the whole data set and start over if one sees a need for more data. Otherwise one is indeed doing something the test wasn’t constructed for.
Rejecting data should be made in the same way. Ie if one must reject some data (for example temperature in a reactor during filling) it should be with a clear protocol (here after a filled reactor stabilised its temperature).
Geez, and I didn’t want to discuss testing…

46. Mark C. Chu-Carroll

BenE:
Jennifer has done work in a few areas of computational linguistics. In grad-school, she did planning-based natural language dialog systems. After we graduated, she worked at Bell Labs, still in dialog systems, but from a more statistical/machine learning approach. When Bell Labs fell apart, she moved to IBM, where she does question-answering systems. She did her masters degree at Waterloo with Robin Cohen, and her PhD at University of Delaware with Sandee Carberry. She was the program co-chair for this year’s NAACL Human Language Technologies conference.

47. Canuckistani

Torbjörn,

I’m using an operational view since I’m looking at it from an observation and modelling point of view. Yes, I meant observable, as events giving frequencies.

The phrase “events giving frequencies” goes much further than “observable events” — there are many observable events which cannot be sensibly embedded into ensembles or sequences where the calculation of relative frequencies makes sense. (If I seem slow here, it’s because I don’t want to assume you hold a position you do not, so I need you to state your position explicitly before I address it. That was the purpose of my question.)

Isn’t that much like asking what a charge means on an electron? Science doesn’t answer such questions, except by embedding it in the properties of QED.

Yes, it is very much like that. The charge is just a number, and its functional content only comes out when it’s embedded in QED, which makes predictions about observable reality. Likewise, probability is only a number; the question I asked you was to embed it in an interpretation to give it some content. Since you said “events giving frequencies”, I’m going to assume that the meaning you ascribe is “probability == relative frequency of outcomes in some long/infinite series of trials”. There’s another meaning available: “probability == degree of plausibility, expressed as a real number, derived by extending classical, Aristotelian logic to situations where information is incomplete”. This is the so-called “logical Bayesian” interpretation.
Let’s look at the difference between these interpretations, in the context of Laplace’s estimate of the mass of Saturn. He stated “…it is a bet of 11,000 to 1 that the error in this result is not 1/100th of its value”. (The modern estimate differs from Laplace’s by 0.63%.) He explicitly means that the *probability* that his estimate is in error by more than 1% is 1/11001. This statement makes no sense in the frequency interpretation of probability, as Laplace’s estimate of Saturn’s mass is a one-time event, and cannot be embedded in a sequence or ensemble. It is a perfectly sensible statement in the Bayesian interpretation of probability.
My position is that there are subtle but serious logical inconsistencies in the assertion that “probability == relative frequency of outcomes in some long/infinite series of trials”; by construction, there are no inconsistencies in the logical Bayesian view. This post is already getting long, so going into further details will require another post, which I will happily write provided you are still interested.
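The step from Laplace’s "11,000 to 1" bet to the probability 1/11001 quoted above is a one-line conversion from odds to probability. A minimal sketch, with a made-up helper name:

```python
from fractions import Fraction

def prob_from_odds_against(odds_against, odds_for=1):
    """Probability of an event quoted at `odds_against` to `odds_for` against it."""
    return Fraction(odds_for, odds_against + odds_for)

# Laplace: 11,000 to 1 that the error is NOT more than 1/100th of the value.
p_error = prob_from_odds_against(11000)

print(p_error)         # 1/11001
print(1 - p_error)     # 11000/11001
```

The denominator is 11,001 rather than 11,000 because odds compare the two outcomes to each other, while a probability compares one outcome to the total.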

48. BenE

Torbjörn ,
We are probably going deeper into testing than is useful for this discussion; however, I just wanna point out that this:
“While the test value is dependent on the degree of freedom, it is a failure of the test to add data during test, dependent on outcome. One must reject the whole data set and start over if one sees a need for more data.”
is not what I meant. Some people choose to analyse a certain amount of data _before_ seeing their results, which is considered acceptable. This happens a lot in the social sciences, where there is almost always at least some minuscule significant effect between variables. So psychologists say: okay, I’m not gonna use a lot of data for my experiments because I _know_ in advance that if I use a lot, I will most likely find something significant. Alternatively, the less honest psychologist will just use as much data as he can so that he finds something significant he can publish. No one knows what is just-enough-data. They just go at it intuitively.
The problem with null hypothesis testing is that it doesn’t take into account the magnitude of the effect we are looking for. It just asks the question of whether a value is different from the null or not. It doesn’t matter how different it is.
Let’s take a hypothetical question: is there a link between, say, drinking alcohol and cancer? Well, there is probably some minuscule indirect link we are not interested in that could, say, reduce your chances of cancer by something like 0.001%. This could be explained by anything. Maybe someone with latent cancer doesn’t feel as healthy and is less social and thus drinks less. Now if your N is great enough, this will turn out significant in a t-test. Someone could proclaim: “drinking alcohol reduces your risk of cancer” when in reality the magnitude of this effect is too small to consider.
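One way to see the magnitude problem is to look at an interval for the effect instead of only asking whether it differs from zero. The sketch below is illustrative and rests on assumptions: a normal approximation with known standard deviation, a hypothetical observed effect of 0.01 standard deviations from a million observations, and an arbitrary "too small to care about" threshold of 0.05.

```python
import math
from statistics import NormalDist

def mean_interval(effect, sd, n, level=0.95):
    """Normal-approximation interval for a mean, given the observed effect, sd and sample size."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sd / math.sqrt(n)
    return effect - half_width, effect + half_width

# Hypothetical study: tiny observed effect, very large sample.
lo, hi = mean_interval(effect=0.01, sd=1.0, n=1_000_000)

significant = lo > 0 or hi < 0              # interval excludes the null value 0
negligible = max(abs(lo), abs(hi)) < 0.05   # assumed practical-relevance threshold

print(lo, hi)
print(significant, negligible)
```

With these numbers the interval excludes zero, so a significance test would reject the null, yet the whole interval sits well inside the band that was declared negligible in advance.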

49. Torbjörn Larsson

Canuckistani:
“The phrase “events giving frequencies” goes much further than “observable events” — there are many observable events which cannot be sensibly embedded into ensembles or sequences where the calculation of relative frequencies makes sense. (If I seem slow here, it’s because I don’t want to assume you hold a position you do not, so I need you to state your position explicitly before I address it. That was the purpose of my question.)”
I don’t see why and which definition that attempts to go further. Don’t you first say frequencies are wider, then observable events? But the point is that the events we are discussing here by definition must be embeddable in measurable sample spaces. Measurability here is a weak requirement, isn’t it, since events already are subsets, and a lattice of subsets should make a measure possible?
The stronger requirement is probably that the measure is finite, i.e. normalisable to 1. I guess there are problems in infinite Hilbert spaces. Then again, I guess there are problems with all observables in infinite Hilbert spaces.
I have no position on this but am trying to catch up on the Kolmogorov definition. (My oldest math reference book mentioned Cox’s axioms IIRC. I think Kolmogorov’s axioms were mentioned at the university. Things change.) Apparently the main statistician view is to assign meaning to “objective properties of a population, real or hypothetical” ( http://en.wikipedia.org/wiki/Frequentist ). That should work well in science too. Any such property may well be observable.
“He stated “…it is a bet of 11,000 to 1 that the error in this result is not 1/100th of its value”. (The modern estimate differs from Laplace’s by 0.63%.)”
Seems he made an estimate on the error of another estimate. But not much of an effort of modelling the situation and deriving precision. Without any details on his attempt of estimate I can’t see if it was modellable. The usual models works with repeatable measurements of errors. For one-off estimates, see my comments above.
“My position is that there are subtle but serious logical inconsistencies in the assertion that “probability == relative frequency of outcomes in some long/infinite series of trials”;”
I don’t want to misstate your position, but what I see in bayesian reasoning such as this is a complaint on the incontrovertible difference between model and reality, and a wish to observe truths (assumed properties of reality) instead of facts (observable properties of phenomena). I have no more problem with frequencies being imperfectly modelled than any other property.
BenE:
You are correct, I misunderstood.
It seems you are saying that though the test value is correcting for the degrees of freedom, it is a resolution problem here even though the test is designed with this in mind. If I understand correctly, you are saying that assumed cause-effect relationships may eventually become confused and unsubstantiated because they aren’t resolved and substantiated properly by mere correlation studies. Correlation isn’t causation. I believe that is an important observation.

50. Torbjörn Larsson

Canuckistani:
I should have looked further on Kolmogorov’s axiom and refreshed on measure theory. One doesn’t need a lattice but countable additivity. I don’t think that is a strong requirement since there exist a lot of measure spaces.

51. Canuckistani

Torbjörn,

I don’t see why and which definition that attempts to go further. Don’t you first say frequencies are wider, then observable events?

I’m having trouble deciphering what you’re trying to say here. I’m saying there are observable events which cannot sensibly be embedded into a frequentist sequence or ensemble. A frequentist cannot assign or calculate probabilities for such events. A Bayesian can.

But the point is that the events we are discussing here by definition must be embeddable in measurable sample spaces. Measurability here is a weak requirement, isn’t it, since events already are subsets, and a lattice of subsets should make a measure possible?

You seem to have switched back to measurability in the sense of mathematical measure, which is rather straying from the issue I want to discuss: the meaning of probability, not the mathematics of it. I grant all the mathematical apparatus: sigma algebras, measurable sets, etc.

Apparently the main statistician view is to assign meaning to “objective properties of a population, real or hypothetical”

That was the main view in 1949, when Kendall said it. Nowadays, less so. The problem is that there is no consistent way to “define probability in terms of the objective properties of a population, real or hypothetical…” It is impossible to define probability without reference to single cases.

I don’t want to misstate your position, but what I see in bayesian reasoning such as this is a complaint on the incontrovertible difference between model and reality, and a wish to observe truths (assumed properties of reality) instead of facts (observable properties of phenomena). I have no more problem with frequencies being imperfectly modelled than any other property.

I have no idea how anything you’ve written here relates to anything to which it is supposed to be a response. I haven’t even begun to explain Bayesian reasoning. I’ve simply made a claim that frequentist probability doesn’t make sense, and Bayesian probability does. This claim doesn’t seem to interest you, since nothing you’ve written is addressed to it, but it is my central thesis. I will write a sequel expounding this thesis.

52. Canuckistani

The frequentist definition of probability is that it is the relative frequency of the occurrence of some outcome in a long/infinite sequence of trials. We distinguish two possibilities here: (i) infinite sequences of trials and (ii) long but finite sequences of trials.
In infinite sequences of trials, the relative frequency is a “tail property”, meaning that it is unaffected by the outcomes in any finite initial subsequence. For example, imagine a sequence in which outcome A has a relative frequency of 0.25 in the first 10^1000000 trials and a relative frequency of 0.5 thereafter. The frequentist is forced to state that the probability of A is 0.5, even though the relative frequency of A in any realistically accessible part of the sequence is 0.25. This disposes of the case of probability defined as relative frequency in infinite sequences of trials.
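The tail property can be illustrated numerically. What follows is only a sketch, not part of the original argument: it swaps the astronomically long initial segment for a hypothetical `PREFIX_LEN` of 1000 trials so the numbers stay readable, and `relative_frequency` is a helper name made up for this illustration.

```python
from fractions import Fraction

def relative_frequency(n, prefix_len, prefix_freq, tail_freq):
    """Relative frequency of outcome A after n trials, for an idealised
    sequence in which A occurs with frequency prefix_freq during the first
    prefix_len trials and with frequency tail_freq afterwards."""
    in_prefix = min(n, prefix_len)
    in_tail = max(0, n - prefix_len)
    hits = prefix_freq * in_prefix + tail_freq * in_tail
    return hits / n

# Hypothetical stand-in for the huge initial segment in the example.
PREFIX_LEN = 1000
P_PREFIX = Fraction(1, 4)
P_TAIL = Fraction(1, 2)

for n in (10**2, 10**3, 10**4, 10**6, 10**9):
    f = relative_frequency(n, PREFIX_LEN, P_PREFIX, P_TAIL)
    print(f"n = {n:>10}: relative frequency = {float(f):.6f}")
```

Anywhere inside the initial segment the frequency is 0.25, yet the limit is 0.5 no matter how long that segment is made.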
For the case of finite sequences of trials, I cannot argue more eloquently than Appleby [quant-ph/0408058]:

The standard way of relating a probability to the frequency observed in a sequence of repeated trials is thus critically dependent on the assumptions that (a) the trials are independent and (b) the probability is constant. We are so accustomed to making these assumptions in theoretical calculations that they may appear trivial. But if one looks at them from the point of view of a warking [sic] statistician it will be seen that they are very far from trivial. The probability of a coin coming up heads depends as much on the tossing procedure as it does on properties of the coin. Suppose that, in an experiment to determine the probability, one used a number of visibly different tossing procedures, without keeping any record of which procedure was employed on which particular toss. We would mostly consider the results of this experiment to be meaningless, on the grounds that the probability of heads might be varying in an uncontrolled manner. It is clearly essential, in any serious experiment, to standardize the tossing procedure in such a way as to ensure that the probability of heads is constant. This raises the question: how can we be sure that we have standardized properly? And, more fundamentally: what does it mean to say that the probability is constant? Anyone who thinks these questions are easily answered should read chapter 10 of [Probability Theory: The Logic of Science by E. T. Jaynes]…
Frequentists are impressed by the fact that we infer probabilities from frequencies observed in finite ensembles. What they overlook is the fact that we do not infer probabilities from just any ensemble, but only from certain very carefully selected ensembles in which the probabilities are, we suppose, constant (or, at any rate, varying in a specified manner). This means that statistical reasoning makes an essential appeal to the concept of a single-case probability: for you cannot say that the probability is the same on every trial if you do not accept that the probability is defined on every trial.

53. Torbjörn Larsson

“I’m having trouble deciphering what you’re trying to say here. I’m saying there are observable events which cannot be sensibly be embedded into a frequentist sequence or ensemble.”
Okay, that was the part I got earlier.
“A frequentist cannot assign or calculate probabilities for such events. A Bayesian can.”
I’m not sure what these unembeddable events are, and how to get there from the definition.
But we have the examples Motl raises:
“when we predict the death of the Universe or any other event that will only occur once, we are outside science as far as the experimental tests go. We won’t have a large enough dataset to make quantitative conclusions.”
“it is equally clear that perfect enough theories may allow us to predict the probabilities whose values cannot be measured too accurately (or cannot be measured at all) by experiments. It is no contradiction. Such predictions are still “scientific predictions” but they cannot really be “scientifically verified”. Only some features of the scientific method apply in such cases.”
OTOH he also points out that:
“I am, much like an overwhelming majority of physicists, statisticians, and probability theorists (see the Wikipage about the frequency probability to verify my statement) convinced that it is only the frequency probability that has a well-defined quantitative meaning that can be studied by conventional scientific methods.”
“The predictions of quantum mechanics are always about the frequentist probabilities.”
It seems frequentists don’t allow bayesian estimates as probabilities. Not merely apriori by one useful definition but aposteriori by considering both their use and problems. And then we are back to the fact that bayesians need to convince textbooks authors and physicists before I’m convinced too.
“It is impossible to define probability without reference to single cases.”
I’m not sure what you are saying here. Are you again referring to “probability == relative frequency of outcomes in some long/infinite series of trials”?
My answer in that case is the same as before, model vs reality, and imperfect models.
“I’ve simply made a claim that frequentist probability doesn’t make sense, and Bayesian probability does. This claim doesn’t seem to interest you, since nothing you’ve written is addressed to it, but it is my central thesis.”
Frankly, that claim makes no sense to me considering how widespread frequentist methods are. They seem to make perfect sense there and are useful, especially in science, where they seem to be essential parts of formal theories of all sorts, including modelling experimentation.
If you note some disinterest it is probably because I have stated my own main position a number of times already. Besides usability in the prior paragraphs I have essentially stated:
“As for induction, one use the evidence to improve an hypothesis. But justifying hypotheses in science is made by a test procedure (with frequentist probabilities) and a finite set of data.
So while bayesian inference certainly works for filters and evaluating parsimony et cetera, I now find myself end up in Mark’s camp. It is an estimate that must be tested as any other estimate. ”
“I think usual science does a good job of separating model and reality. While bayesian thinking something isn’t an inherent property of a model and so the reality if properly modelled is to confuse the meaning of the model, and to state a model can be perfect is to confuse model and reality. We may come as close as we which in some cases. The model isn’t reality however, and we don’t model truths (“everything”) but facts (observables).”
“Looking back at what bayesians do, I note an amount of philosophy. I can’t say that this is wrong as such, since other areas also implies and sometimes discuss philosophical ideas. What I do think is wrong is if the philosophy is in conflict with the major ideas of science, such as separating theory and models from reality and looking for facts instead of thruths. Some of the ideas I have looked at now seems to do so.”
You may have an interest to try to answer these questions and claims with a careful explanation of what bayesian reasoning is. I started out with claiming a revelation about bayesian estimates.
Essentially our discussion now goes nowhere, no doubt due to me as evidenced in this lengthy comment, but also it seems to me because you don’t tackle the distinctions between model and reality, or probabilities and estimates, or science and philosophy, head on. So far what you say on behalf of bayesians becomes sorted in the later compartments by me, for good reasons it seems to me.
For whatever reason, you think your main claim “that frequentist probability doesn’t make sense, and Bayesian probability does” isn’t addressed, while I think I have been doing nothing else for a long while now: “The model isn’t reality however, and we don’t model truths (“everything”) but facts (observables).”
Since we are talking past each other we should probably stop.

54. Torbjörn Larsson

“This disposes of the case of probability defined as relative frequency in infinite sequences of trials.”
While I again see that “The model isn’t reality however, and we don’t model truths (“everything”) but facts (observables).” It seems like an attempt to confuse the failure of a model with the definition of frequencies. Clearly the system changed, so the model should change too?!

55. Torbjörn Larsson

Ah, now I see! You probably claim that this is one particular measurement. Sure it could happen, but by the law of large numbers it is *very* unlikely. If you repeat the measurement it will yield a result that is close to the expected. (Or the system is wrongly modelled as I noted in my previous comment.)
So this is, by me, still seen as an attempt to confuse model and reality.

56. Canuckistani

Torbjörn,
Yes, we do seem to be arguing past each other. Frustrating, isn’t it? I’m doing my best…

It seems like an attempt to confuse the failure of a model with the definition of frequencies. Clearly the system changed, so the model should change too?!

The frequentist position is that probabilities can only be defined, calculated and/or estimated once the sequence of trials is in hand. So, we don’t start with any model, any expectation, or any idea of how the outcomes are generated. We start with the sequence. If that sequence is infinite, then the relative frequency is a tail property, i.e., it has nothing to do with anything we might care about. If the sequence is finite, it only makes sense to use it to estimate the probability if we already have the prior notion that the probability is fixed from trial to trial. Hence probability cannot be defined solely in reference to a sequence of trials.
I haven’t addressed your point about model versus reality because as far as I know, nothing I have written suggests a model of any kind. What kind of model do you think I am talking about?

And then we are back to the fact that bayesians need to convince textbooks authors and physicists before I’m convinced too.

Bayesians are textbook authors and physicists. Seriously, google E.T. Jaynes and David MacKay for Bayesian textbooks and articles, written by physicists, freely available on the internet.

57. BenE

Torbjörn,
“It seems you are saying that though the test value is correcting for the degree of freedom, it is a resolution problem here even though the test is designed with this in mind. If I understand correctly, you are saying that assumed cause-effect relationship may eventually become confused and unsubstantiated because they aren’t resolved and substantiated properly by mere correlation studies. Correlation aren’t causation. I believe that is an important observation.”
That’s not quite it. I think the problem of causality is inherent in both types of probability. I have my ideas on that, but that’s another discussion. I was rather talking about magnitude. Because frequentist null hypothesis testing ignores magnitude, it fails in the case I have described above (re-read it while ignoring anything about causality). You can almost always reject a null hypothesis in a real-world situation if you have enough data, because there are almost always at least tiny effects between variables, regardless of what causal interpretation you give to your data.

58. BenE

To elaborate further, what I am saying is that asking the question “is some variable significantly different from a fixed “null” value?” is wrong, because it can almost always be answered yes: variables almost always differ at least slightly from the null (especially in social science problems). Bayesians say we should be asking instead “what is the probable value of the variable?”. Given the data, you can assign a confidence interval and say that there is a 95% probability that the value lies between such and such values. However, to calculate this, you need to assign a prior distribution to the variable.

59. Torbjörn Larsson

Canuckistani:
I believe I will address most of your latest comment best by expanding on my analysis of your earlier example on infinite and finite trials. These are also, IIRC, examples often used by bayesians, and perhaps I can address them with my newfound picture and meager knowledge of probability basics.
I’m also inspired by a blog elsewhere where the poster accurately concluded that science must mean one can explain one’s own ideas instead of criticising the other side – the latter is akin to creationism. But my analysis will lead to some criticism of bayesian reasoning.
As I understand it we can have three frequentist/science cases. Either we have an apriori formal model, which the measurements are compared to. Or we don’t, but we want to describe the measurements anyway, ie we make an aposteriori description (adhoc model) of them. Or we make an adhoc model of the system. (The two latter usages are essentially the same IMO, so I will not distinguish them later.) Models are never built from an infinite number of measurements, nor are models when tested justified by an infinite number of experiments. This is also true here.
On the infinite trial there is the modelling problem – a finite number of measurements are made. But there is also another fundamental problem that Motl raises. If we use measurements we should see to it that we don’t predict events that only occur once. This should be true for predictions on the one-off measurement series as well. And even if the universe is timelike infinite, expansion and entropy will eventually prohibit further measurements. I don’t think the inherent repeatability of probabilities helps – if we are making serial experiments on one system we will still make only one prediction.
We will still have the analogous problem if we confine ourselves to a reasonable number of finite trials. The tail of a set of measurements may have a different relative frequency.
Is this unexpected? No. Models, whether formal or not, will give frequencies with spread, that never has 0 value somewhere around the expected value, so we know that a measurement of frequencies may (will) always deviate from the expected. And even if the expectance value could go to 0 due to the use of measures there will always be allowed an infinite number of deviating measurement sets with measure 0.
I restate that following through the one-off measurement series and observing a deviance from an existing model will indicate that either the measurements are wrong (strong formal model) or the model is wrong due to changed system (weak formal model or adhoc model).
If we are using the measurements to model the observations adhoc, we will take care that the measurement set isn’t varying as you suggest, or if the variance is persistent introduce it into the model.
After this analysis it should come as no surprise that I criticise bayesians for discussing infinite and/or one-off measurements. Those aren’t used in frequentist models. And they are an attempt to discuss imagined ‘truths’ instead of observable facts. It is a two-pronged criticism instead of a constructive attempt.
The first paragraph of the citation from Appleby seems to concern the same or similar attempts to confuse model and reality. He raises the same concerns about spurious phenomena and events that a simple formal model may not contain, nor a corresponding adhoc model, that I have dealt with earlier. Deviations from a simple model compared to a more complex model, or deviations of a model compared to reality, is expected and nothing special.
The second paragraph is probably concerning estimates on systems where a formal measure on infinite ensembles can’t be made. I must confess I don’t understand what this is about, since the kolmogorov definition of probabilities I look on allows finite sample spaces.
Nor do I understand why the measurements used for aposteriori models are “supposed” to be constant or welldefined varying. Except if it is another attempt to confuse the model with its welldefined properties with reality.
The appeal to “one-off probability”, ie single measurement series, is beside the point. Repetitions are possible, even a must to justify the model, so it isn’t one-off measurements in Motl’s sense. The model will both have a certain probability spread and be allowed (expected) to deviate from reality, since a model isn’t even a perfect replica of reality, and even less reality itself.
In conclusion I think frequentist probability passes the test of doing what is asked for and delivering the models. And I find in the discussed comment, as in your latest, the same criticism I’m starting to see whenever bayesian estimates are discussed as a replacement for probabilities. It is destructive, not constructive.
Regarding textbooks (a constructive point, finally 😉) it was quite some time ago that I attended university for these courses. But I don’t think bayesian estimates are taught in regular probability courses, especially if they are basics for physics science. Sorry to seem to be changing goalposts, I should have been more specific earlier.
BenE:
Okay, I can see that too. While the t-test adjusts for the number of measurements and the relative magnitude vs spread, it doesn’t concern absolute magnitude in the difference between population means.
It only tests that difference exist.
This isn’t (shouldn’t be) a problem in science, but perhaps in technical applications. When justifying theories, area-dependent but firm confidence limits are (should be) used. In physics it seems the latest standards are that 5 sigma is needed for justifying existence of a new, untheorised phenomenon, while 3 sigma is used for justifying theories. (This is googleable, since that is how I found out.)
Incidentally, one of Motl’s (oh, no, him again – I’m starting to get mottled commentaries) pet peeves seems to be the low or nonexistent standards that are used in medicine. He doesn’t care much for 0.6 sigma or so differing mean of biological or sociological tested populations. So I think you, him, and me agree on the problems with low differences, with or without confused causation.
Both:
This has been educational, even though I with your guidance have mostly explored my new understanding from the epiphany Mark gave me. What I’m less interested in now, since it seems to me I have a feasible and consistent view, is to explore bayesian reasoning further at this point. Maybe later, when new experiences have problematised probabilities or estimates.
Feel free to criticise my analysis, but if you could somehow restrict further argumentation for now, it would be preferable. Also, we have misused this thread for some time now on my insistence, for a good purpose but nevertheless.
Thanks!

60. BenE

A few points.
-I don’t understand what “one-off” means. I googled it and nothing sensical came back.
-I think you misunderstood Canuck’s point that it is the _frequentist_ view of probability which rests its definition on limits to infinity and infinite sampling. Bayesians get rid of those erroneous assumptions that are based on situations that never happen. Canuck mentioned the infinite stuff as a flaw of frequentism.
-Jaynes’ book was only released in 2003; before that, Bayesianism was pretty much snuffed out. You have to give it a little time for it to take root in academia.
-I don’t think anyone competent uses a 0.6 sigma confidence level.
-But it doesn’t matter, even with 3 or 5 sigma, if you have enough data, as I demonstrated, you are likely to be able to reject the null hypothesis anyways based on consistent minuscule errors. To be objective, you _have to_ look at the magnitude of the effect. To do this mathematically, you _have to_ assign a prior to the range of different possible magnitudes.
-This apriori formal model you speak of does not exist for real things. _All_ models of reality we have in science are based on empirical evidence. Thus, all models are of the kind you describe as “ad hoc”. Sometimes as a mathematical exercise you can define a formal model; however, since it is not based on empirical evidence it is much farther from reality than the ad hoc models.
-ad hoc is a very bad term here as Bayesian theory provides a very consistent framework to make these models.
-It is rather those 3-sigma, 5-sigma values that should be called ad hoc since there is no good reason to pick these values and they can be defeated by enough data anyways.
regards,
Ben

61. BenE

Wow I just read the Motl posts. That guy is seriously misguided.

62. Torbjörn Larsson

BenE:
“I don’t understand what “one-off” means. I googled it and nothing sensical came back.”
Sorry, it was an expression I imagined I heard once or twice. I mean unique cases. Probabilities aren’t well defined for them, while some information such as estimates remain available.
“I think you misunderstood Canuck’s point that it is the _frequentist_ view of probability which rests its definition on limits to infinity and infinite sampling.”
I question that, since the definition of probabilities that I referenced allows finite sample spaces. So it seems to me to be the _bayesian_ view of probabilities.
But even if this would be the case, the use in a model with finite sampling means that it isn’t a real problem. The frequency will be a fair estimate whether it is derived from a formal model or modelled from measurements. Models are simplifications of reality, so this is no different than estimating any other parameter. And models are justified by a finite set of measurements. (A new set if the probabilities are modelled adhoc from measurements.)
One can now ask if an estimate of probability is better than bayesian estimates. The ability to derive them apriori from a formal model instead of aposteriori from measurements make them less adhoc and similar to all other derived parameters. They are stable by definition. And I expect them to be precise.
As I read these bayesian arguments, they are negative arguments about probabilities based on a contrived idea of theoretical epistemology instead of practical use. Methods are judged by the latter. Nor does a negative argument show that another theory is correct. These arguments aren’t impressive.
“Bayesians get rid of those erroneous assumptions that are based on situations that never happen.”
I’m not sure what that means. In my view we are modelling probabilities. There will always be differences between the model and reality, so there will be situations that never happen in the model but happen in reality. In a good model, the difference will be small.
“Jaynes’ book was only released in 2003; before that, Bayesianism was pretty much snuffed out. You have to give it a little time for it to take root in academia.”
Sure. The day bayesian methods replace frequentist probabilities in sciences, I will start to use them too. Meanwhile I can use them where they are accepted tools, such as in parsimony comparisons. (Though they aren’t the only such method, and not good in every such situation.)
“But it doesn’t matter, even with 3 or 5 sigma, if you have enough data, as I demonstrated, you are likely to be able to reject the null hypothesis anyways based on consistent minuscule errors.”
Based on firm confidence limits a physicist should not reject the null hypothesis if there were less than 5 sigma between the means of the two populations when studying whether a new phenomenon disconnected from earlier theory exists. (If the mean were the only measure looked at.)
As I understand it the t-test is usable for small populations, when it isn’t safe to look at certified means, but the variance becomes important too. It is exactly in those situations firm confidence limits are useful to preclude finding phenomena that aren’t there. The t-test wouldn’t be exactly compatible with the confidence limit which is based on the null population. But if the deviant population is similar, it would be a compatible procedure.
How do consistent minuscule errors add up to a 5 sigma difference in (nearly) normalised means?
“To be objective, you _have to_ look at the magnitude of the effect. To do this mathematically, you _have to_ assign a prior to the range of different possible magnitudes.”
Actually, I believe we are saying the same thing here. Every sample plan for measuring probabilities must be well defined, and every area must have firm limits. In technical areas there is a whole lot of arbitrariness going on. I hope it isn’t so in medicine or biology. I would be surprised to see 3 and 5 sigma population differences, but then firm limits become even more important.
“This apriori formal model you speak of does not exist for real things. _All_ models of reality we have in science are based on empirical evidence. Thus, all models are of the kind you describe as “ad hoc”. Sometimes as a mathematical exercise you can define a formal model; however, since it is not based on empirical evidence it is much farther from reality than the ad hoc models.”
Not so. Formal models are of great importance in science, and they aren’t mathematical. Mathematicians or perhaps even statisticians may find this unfamiliar.
As a simple example, take modelling the kinetic energy from a moving body. If you know classical mechanics, you derive a model from basic principles: E(kin) = mv^2/2. This isn’t a mathematical exercise, this is a use of a physical model. For example, you immediately know what unit the energy is measured in. A mere mathematical model would not contain such information. You also know that m, v, E are welldefined and a lot of their basic and derived properties and use in other theories.
If you don’t know classical mechanics, you must guess and verify which properties affects the kinetic energy. Perhaps you guess that mass and velocity is essential. So you make experiments which measure m, v, E in some manner and form an adhoc model with no connection to classical mechanics.
If you do it correctly you will end up with something close to the formally derived expression. But you will make some errors in coefficients if you don’t use parsimony to round off, and you will have to repeat this experiment a number of times in different situations because you don’t know if there is something not yet accounted for that affects the situation. The formal model is immediately trustworthy, after justification with experiments to see that there are no errors in the model, across the whole class of such situations.
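The ad hoc route described above can be sketched roughly as follows. This is an illustration only, not a procedure anyone in this thread proposed: vary one quantity at a time, read the exponents off log-ratios, then solve for the coefficient. The "measurements" are synthetic and noise-free, generated from E = mv^2/2 itself, and `measure_energy` and `fit_power_law` are hypothetical names.

```python
import math

def measure_energy(m, v):
    """Stand-in for an experiment: kinetic energy in joules for mass m (kg)
    and speed v (m/s). Synthetic and noise-free, for illustration only."""
    return 0.5 * m * v ** 2

def fit_power_law(measure, m0=2.0, v0=3.0, factor=2.0):
    """Fit E = c * m**a * v**b by varying one variable at a time."""
    e0 = measure(m0, v0)
    # Exponent of m: scale the mass only and compare energies.
    a = math.log(measure(m0 * factor, v0) / e0) / math.log(factor)
    # Exponent of v: scale the speed only and compare energies.
    b = math.log(measure(m0, v0 * factor) / e0) / math.log(factor)
    # Coefficient from the reference measurement.
    c = e0 / (m0 ** a * v0 ** b)
    return c, a, b

c, a, b = fit_power_law(measure_energy)
print(f"E = {c:.3f} * m^{a:.3f} * v^{b:.3f}")
```

With real, noisy measurements the recovered exponents and coefficient would only be approximate, and the fit would say nothing about situations that were not measured, which is the contrast with the formally derived expression.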
The use of formal models in theory cannot be overstated. They parsimoniously express knowledge, and may enable forceful and stable modelling. Even adhoc empirical models are simplified since as I noted above they aren’t certified for all situations but merely the modelled one during the time it was measured on. And, again, the difference between simplified models of various kinds and reality is wellknown.
“ad hoc is a very bad term here as Bayesian theory provides a very consistent framework to make these models.”
When I say adhoc it is because they don’t have the connection to the formal theory for the observed system as probabilities have. Probabilities are immediately derivable from theory. The meaning of adhoc for nonconnected models or parameters invented to explain measurements that can’t be derived formally is exactly this – something formed for a particular purpose, with no connection to the rest of the theory.
“It is rather those 3-sigma, 5-sigma values that should be called ad hoc since there is no good reason to pick these values”
I believe that is a correct and interesting observation. They are part of methods. The method and the values are defended by the observed use, as all other methods in science. They are all ad hoc, as all other methods in other areas. It is their nature.
“and they can be defeated by enough data anyways.”
If they can, people would probably not use them. That goes back to my question above, why do you believe small errors add up in these methods? Ordinarily errors average out and variances tighten by laws of large numbers. Why is the case of a null hypothesis different?

63. Torbjörn Larsson

BenE:
Motl isn’t a crank, but he does have polarised opinions, and isn’t afraid to show them. As far as I know he does peer-reviewed work in theoretical physics.
As I have read his blogs at times I remembered his discussion on probability and bayesian methods in science, which suited the situation. I’m not used to think about the use and meaning of probabilities and estimates so I took the opportunity to cite him, especially since he raised so many interesting points bearing on this discussion as I see it.
One must take such a reference for what it is, a particular scientist’s view on his area, in this case negative to bayesians. (And a lot of other stuff … 😉)

64. Torbjörn Larsson

BenE:
I said “They are all ad hoc, as all other methods in other areas. It is their nature.”
Thinking about it I now believe I overstated. But the point is still very valid: there isn’t much theory behind methods and especially what is called “the method of science”, AFAIK. There are a lot of conflicting philosophical ideas, though.
But the point is that we can live with it as methods are justified by success. For theoretical models we have a higher ambition, formal theories are more useful.
BTW, another useful thing I forgot above is that theories connect by formal models. For example, to connect back to the example model above, velocity as used in classical mechanics and as used in quantum mechanics.

65. BenE

“As I read these bayesian arguments, they are negative arguments about probabilities based on a contrived idea of theoretical epistemology instead of practical use. Methods are judged by the latter. Nor does a negative argument show that another theory is correct. These arguments aren’t impressive.”
That is why I provided the concrete null hypothesis example.
“How do consistent minuscule errors add up to a 5 sigma difference in (nearly) normalised means?”
That is easy, it happens all the time in the social sciences, because there are a lot of indirect effects between variables, but it can happen in physics too because of things like imperfect measuring tools or environmental disturbances.
Your null hypothesis is a prediction of a fixed value, often zero. When you take a sample of data, the greater the amount of data, the smaller the standard deviation of the mean under the null hypothesis (the sigma) should be. Let’s say you are measuring a magnetic field, but there is a computer speaker in the room next to yours with a magnet which skews your result to one side. Now the effect of this speaker is very very small, so that it changes your results by only 0.01% (a relative value of 0.0001). However, with a null hypothesis, the more data you have the smaller your sigma is. With an infinite number of samples the sigma would be 0. But you don’t need an infinite number of samples; you can get just enough samples so that the sigma is something like 0.00001, and there you have it: the 0.0001 error caused by the speaker is a 0.0001/0.00001 = 10 sigma phenomenon that makes you reject the null hypothesis. What you are doing here is rejecting the null hypothesis of 0 for an almost identical hypothesis of 0.0001 caused by a speaker in the room next to you. Now the frequentist approach doesn’t let you say that there is an alternate hypothesis of 0.0001. You can only say that the 0 hypothesis was rejected, and in general with enough data you can always reject the null hypothesis.
The Bayesian approach would tell you that the most likely hypothesis is parameter=0.0001. However, since this is so small and close to zero we can attribute this effect to disturbances and for practical purposes accept the 0 value. Thus practically we accept the null hypothesis. We arrive at two widely different conclusions. The frequentist concludes that he has found a significant effect of 10 sigma in his experiment. The Bayesian also finds that, but by looking at the magnitude, he notices that it is so small that in practice the effect should be considered null.
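The arithmetic in the speaker example can be checked with a few lines. This is a sketch under one added assumption: a per-measurement noise level `NOISE_SD` of 0.01, which the example does not specify. The offset of 0.0001 and the target sigma of 0.00001 come from the example; the function names are made up for illustration.

```python
import math

def standard_error(noise_sd, n):
    """Standard deviation of the sample mean of n independent measurements."""
    return noise_sd / math.sqrt(n)

def sigmas_from_null(offset, noise_sd, n, null_value=0.0):
    """How many standard errors the expected sample mean lies from the null."""
    return abs(offset - null_value) / standard_error(noise_sd, n)

OFFSET = 0.0001    # systematic shift caused by the speaker (from the example)
NOISE_SD = 0.01    # assumed per-measurement noise; not given in the example

for n in (10**2, 10**4, 10**6, 10**8):
    z = sigmas_from_null(OFFSET, NOISE_SD, n)
    print(f"n = {n:>9}: sigma of mean = {standard_error(NOISE_SD, n):.6f}, "
          f"offset = {z:.1f} sigma")
```

Under that assumption, a million measurements bring the sigma down to 0.00001 and the offset up to 10 sigma. Because the offset is fixed while the sigma shrinks like 1/sqrt(n), any significance threshold is eventually crossed.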
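One way to make "looking at the magnitude" concrete is sketched below. It is an illustration resting on added assumptions, not a prescribed Bayesian recipe: a flat prior with normal measurement noise (so the posterior for the parameter is approximately normal, centred on the measured mean, with the sigma as its spread), and a hypothetical threshold `NEGLIGIBLE` of 0.001 below which an effect is treated as practically zero.

```python
from statistics import NormalDist

MEASURED_MEAN = 0.0001   # from the speaker example
STD_ERROR = 0.00001      # from the speaker example
NEGLIGIBLE = 0.001       # hypothetical "practically zero" threshold

# Posterior under a flat prior and normal measurement noise (assumed).
posterior = NormalDist(mu=MEASURED_MEAN, sigma=STD_ERROR)

# 95% interval for the parameter: central region of the posterior.
low, high = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)

# Probability that the effect is nonzero in the positive direction.
p_positive = 1.0 - posterior.cdf(0.0)

# Probability that the effect is too small to matter in practice.
p_negligible = posterior.cdf(NEGLIGIBLE) - posterior.cdf(-NEGLIGIBLE)

print(f"95% interval: [{low:.7f}, {high:.7f}]")
print(f"P(parameter > 0) = {p_positive:.6f}")
print(f"P(|parameter| < {NEGLIGIBLE}) = {p_negligible:.6f}")
```

Both probabilities come out essentially equal to 1: the effect is almost certainly real and almost certainly negligible, which is the distinction a bare rejection of the null cannot express. The conclusion depends on the chosen threshold, which has to be justified by the application.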
“As a simple example, take modelling the kinetic energy from a moving body. If you know classical mechanics, you derive a model from basic principles: E(kin) = mv^2/2. This isn’t a mathematical exercise, this is a use of a physical model. For example, you immediately know what unit the energy is measured in. A mere mathematical model would not contain such information. You also know that m, v, E are well defined and a lot of their basic and derived properties and use in other theories.”
Kinetic energy is not actually a real thing you can measure in objects. It is a mathematical way of representing speed. And since movement is relative, it depends on your point of reference (I think they say inertial frame?). This is a purely mathematical transformation.
Now the units and constants, the ones relating speed and force etc, those are empirical, they were found by measuring them and thus are part of the non formal models. “you derive a model from basic principles” the basic principles are also from non formal models. They were measured approximately, and in the case of classical mechanics even proved later to have little precision at high velocities by Einstein. All the real data we have stems from informal predictions. The formal models are simply mathematical tools that allow us to manipulate the reality based informal models.
“When I say adhoc it is because they don’t have the connection to the formal theory for the observed system as probabilities have. Probabilities are immediately derivable from theory. The meaning of adhoc for nonconnected models or parameters invented to explain measurements that can’t be derived formally is exactly this – something formed for a particular purpose, with no connection to the rest of the theory.”
As I said, the formal theory was made from informal measurements at some point. You can’t get very far away from the informal models. They are the ones that science deals with. The formal models do not relate to reality when they don’t use information from an informal model.
“If they can, people would probably not use them.”
I don’t understand why they do. I guess it’s the best they had before bayesianism.

66. Canuckistani

Torbjörn,
You are right to say that I have only been attempting to tear down the frequentist view. I wanted to convince you that there were problems with its consistency before I moved onto the justification for Bayesian probability. In connection with that, I will note these two passages:

Nor do I understand why the measurements used for aposteriori models are “supposed” to be constant or welldefined varying. Except if it is another attempt to confuse the model with its welldefined properties with reality.

It rather surprised me that you had written this, because earlier you had written:

If we are using the measurements to model the observations adhoc, we will take care that the measurement set isn’t varying as you suggest, or if the variance is persistent introduce it into the model.

As far as I can tell, these are exactly the same ideas; what you advocate in the second quote is exactly what is question in the first quote.
I just want to examine this idea behind “we will take care that the measurement set isn’t varying” a little more closely, in the context of coin flips. If we perfectly replicate a coin flip, (or any “random” experiment in principle explainable by deterministic physics,) then the outcomes are all the same. The phrase “we will take care that the measurement set isn’t varying” isn’t a statement about a lack of actual physical variation in the experimental setup, but rather a statement about our information about the experimental setup. The fact that there is no discernable difference in the setup of the experiment from trial to trial is what permits us to model it as having a fixed, unknown probability. The technical name for this state of information is “exchangeable trials”.
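A small sketch of what exchangeability buys you in the coin-flip setting. It assumes a uniform Beta(1, 1) prior on the fixed, unknown probability of heads; that prior, the made-up flip sequences, and the helper names are illustration choices, not something fixed by the argument above.

```python
from fractions import Fraction

def update(alpha, beta, flips):
    """Update a Beta(alpha, beta) prior on P(heads), one flip at a time."""
    for flip in flips:
        if flip == "H":
            alpha += 1  # one more observed head
        else:
            beta += 1   # one more observed tail
    return alpha, beta

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact."""
    return Fraction(alpha, alpha + beta)

seq_a = "HHTHTTHH"
seq_b = "THHTHHTH"  # the same five heads and three tails, in a different order

post_a = update(1, 1, seq_a)
post_b = update(1, 1, seq_b)
print(post_a, post_b, posterior_mean(*post_a))
```

Both orderings lead to the same posterior because, with no discernible difference between trials, only the counts of heads and tails matter. That order-invariance is what "exchangeable" means here.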
On the question of models versus reality, I’m not really clear on exactly what it is you are trying to assert about the Bayesian viewpoint — but I am sure it’s an incorrect assertion. E. T. Jaynes, one of the main proponents of the logical Bayesian viewpoint, had this to say about the difference between models and reality:

It is very difficult to get this point [the difference between models and reality] across to those who think that in doing probability calculations their equations are describing the real world. But that is claiming something that one could never know to be true; we call it the Mind Projection Fallacy…

Here is another pertinent quote:

All models are wrong, some models are useful.

G.E.P. Box, “Robustness in the Strategy of Scientific Model Building” (1979)
I think you are perceiving a problem with the Bayesian viewpoint which is not actually there.
I’d also like to emphasize that all frequentist probability calculations are valid Bayesian probability calculations. (Not all Bayesian probability calculations are valid frequentist calculations.) The disagreement is about what bearing certain probability calculations (hypothesis tests, unbiased estimators, p-values) have on the question of scientific inference. These calculations are the basis of frequentist inference; Bayesians regard them as valid statements of probability theory, but irrelevant to the inference scientists would actually like to perform. Bayesians claim that the relevant quantity is always the posterior distribution.
Construction of Bayesian probability theory from a logical Bayesian point of view to follow shortly.

67. Canuckistani

BenE,
The publication of the Jaynes textbook didn’t spark the Bayesian revival. The history goes something like this:
In 1948, Abraham Wald proved his Complete Class theorem, demonstrating that “admissible” decisions are Bayes decisions, i.e., decisions that maximize the Bayesian posterior expected utility. Wald was a frequentist of the von Mises school, so this came as something of a shock to him. The Neo-Bayesian revolution resulted. (Wald died in a plane crash in 1950.)
In 1955, Stein demonstrated his famous shrinkage paradox. This led to the empirical Bayes method, which was eventually superseded by true hierarchical Bayesian estimation.
From about 1970 to 1990, Bayesian methods plateaued, due to the “curse of dimensionality”, which restricted practical Bayesian analysis to smallish models.
In 1990, Gelfand and Smith re-introduced the Metropolis-Hastings algorithm, a method for constructing a Markov chain with an arbitrary stationary distribution. This made previously infeasible Bayesian calculations possible. Another explosion of Bayesian analyses resulted.
If Jaynes is all you’ve read, for your next Bayesian textbook I recommend Bayesian Data Analysis, 2nd ed. by Gelman, Carlin, Stern and Rubin.

68. Canuckistani

Logical Bayesian construction of probability theory (short short version):
Suppose we wish to extend classical, Aristotelian logic (including the law of the excluded middle) to deal with uncertain propositions. We shall need to extend the range of possible truth values, and we more-or-less arbitrarily extend them to the real numbers. (Exactly one real number per proposition, so we’re not getting into Dempster-Shafer belief functions here.) We want to capture the concept of plausibility, and we choose to have a larger real number represent greater plausibility.
We want our new system to reduce to classical logic when all the truth values are known. So it must obey syllogisms like:
(i) if A then B
(ii) A
————–
therefore B
and
(i) if A then B
(ii) not-B
————–
therefore not-A
But we also want it to obey “fuzzy” syllogisms that express common-sense reasoning, like
(i) if A then B
(ii) B
—————
therefore A is more plausible
and even
(i) if A then B is more plausible
(ii) B
———————————
therefore A is more plausible
Consistency with these “fuzzy” syllogisms (and the true syllogisms of classical logic) requires that there exist two specific functional relationships. On prior information X, let (A|X) represent the plausibility of A given X, let (A and B|X) represent the plausibility of the conjunction of A and B, and let (B|A and X) represent the plausibility of B given both A and X. (Caveat: if A is known to be false, then we refuse to define plausibilities given A.) Then the following functional relationships must hold:
There exist functions F (with two arguments) and S (with one argument) such that:
(A and B|X) = F{(A|X),(B|A and X)}
(not-A|X) = S{(A|X)}
From the associativity of the logical conjunction, we derive the famous Associativity Equation for the function F:
F(x,F(y,z)) = F(F(x,y),z)
Solving these two functional equations (under certain technical assumptions about their properties such as continuity, differentiability, monotonicity, etc.), we find that there must exist a monotonic function (with one argument), call it p(), with the properties that
p(A and B|X) = p(A|X)p(B|A and X)
p(not-A|X) = 1 – p(A|X)
The function p() has the following properties:
p(True|X) = 1
p(False|X) = either zero or positive infinity
We arbitrarily choose zero to represent falsehood; this entails no loss of generality, as the alternate choice simply leads to values that are the reciprocal of our choice. Our choice makes p() a monotonic increasing function; the alternate choice makes p() a monotonic decreasing function.
So, to summarize:
(i) if we want to extend classical logic to uncertain propositions;
(ii) if we want to use one real number to represent the plausibility of the proposition, with larger real numbers corresponding to greater plausibility;
(iii) if we want plausibility to follow certain “fuzzy” syllogisms that encapsulate common-sense reasoning,
then there must exist a monotonic function which maps our plausibilities onto mathematical probabilities. Hence probability theory is an extension of classical logic to uncertain propositions.
Let’s briefly review one of the fuzzy syllogisms:
(i) if A then B is more plausible
(ii) B
———————————
therefore A is more plausible
Let us identify prior information X with premise (i) of this “syllogism”. Premise (i) says p(B|A and X) > p(B|X)
We have, from Bayes Theorem,
p(A|B and X) = p(A|X) [p(B|A and X) / p(B|X)]
Hence, p(A|B and X) > p(A|X) in agreement with our fuzzy syllogism.
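A quick numeric check of that last step, as a sketch only. The probabilities 0.3, 0.8 and 0.4 are arbitrary illustration values, and the helper `posterior` is hypothetical; p(B|X) is obtained from the product and negation rules above (the law of total probability).

```python
def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return (p(A|B and X), p(B|X)) using Bayes' theorem."""
    # p(B|X) = p(A|X) p(B|A and X) + p(not-A|X) p(B|not-A and X)
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    return p_a * p_b_given_a / p_b, p_b

p_a = 0.3  # prior plausibility of A (arbitrary)
p_a_given_b, p_b = posterior(p_a, p_b_given_a=0.8, p_b_given_not_a=0.4)

# Premise (i): A makes B more plausible, i.e. p(B|A and X) > p(B|X).
print("premise holds:", 0.8 > p_b)
# Conclusion: learning B makes A more plausible.
print("p(A|X) =", p_a, " p(A|B and X) =", round(p_a_given_b, 4))
```

Whenever the first comparison holds, the bracketed ratio in Bayes' theorem exceeds 1, so the posterior for A must exceed its prior, whatever particular numbers are chosen.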

69. BenE

Canuckistani,
I’ll read your book next. I’m just finishing reading Jeffreys’ book, which is older than everything you mentioned but still quite compelling.

70. Canuckistani

BenE,
You know, I bought the Jeffreys book, but it just didn’t do it for me. A fair portion of it was descriptions of obsolete computational techniques, which I just found dull. The primitive typesetting for mathematical notation was also an obstacle.

71. Torbjörn Larsson

BenE:
You discuss a relative error. That is an experimental error, that has to be checked and controlled. And strictly speaking, it doesn’t add up, it’s the variance of the populations that decreases.
I see nothing special about relative errors and other model or experimental defects. In this case with a very tight variance relative errors become extremely important and must be controlled. Otherwise the firm limit loses its meaning, as you suggest.
“Kinetic energy is not actually a real thing you can measure in objects. It is a mathematical way of representing speed.”
Perhaps I shouldn’t have proposed a physics example. Respectfully, this is all seriously wrong. I don’t really know where to start explaining how energy is the fundamental basis for the physical action principle in the modern formulations of classical and quantum mechanics and that it can be measured in a number of ways.
I will say this though:
“the basic principles are also from non formal models.”
Sure, basic principles are derived from experiments, but also from connections with other theories. But the point is that if they are used in the formal theory they are part of it. You can’t conflate everything down to ‘it is all experiments, and bayesian estimates don’t need formal theories’. Science is more than that.
Nor can you conflate formal theories with mathematical descriptions. They contain much more than that, and they have interrelations against experiments, with each other, and with the methods of science that isolated mathematical descriptions lack.
At this point we have started an I say-you say vicious circle, and different views of science separate us. I don’t see the point of continuing in this manner.

72. Torbjörn Larsson

Canuckistani:
“As far as I can tell, these are exactly the same ideas; what you advocate in the second quote is exactly what is question in the first quote.”
I don’t see your point of being a question. I advocate that we don’t suppose anything about the measurements but that we have characterised the experiments and adjusted the model accordingly. I’m glad I show some consistency. 🙂
“On the question of models versus reality, I’m not really clear on exactly what it is you are trying to assert about the Bayesian viewpoint”
I claim that frequentist probabilities are directly derivable from an expressive enough formal theory. I also claim that bayesian estimates are modelless or are used for adhoc models not directly derivable from formal theories. I also think that bayesian reasoning doesn’t use the model-reality distinction, but seems to argue epistemological issues that are either wrong or obscure the use of the different methods.
“All models are wrong, some models are useful.”
Yes, I saw that citation first time this week on another blog. It is a really good one, funny and to the point. I used Box-Behnken type designs to confuse me and others years ago in adhoc process modelling in semiconductor processing equipment. 😉
“all frequentist probability calculations are valid Bayesian probability calculations”
That is what Mark, Motl and I contest. I’m sure bayesian estimates can be used to form a model like the one derived directly from theory. But it doesn’t mean anything without measurement AFAIK, it is the meaning of the apriori-aposteriori conditional usage in Bayes theorem.
“Construction of Bayesian probability theory from a logical Bayesian point of view to follow shortly.”
IIRC one may condense down defining ‘probability’ to one axiom with bayesian estimates. I don’t know why that definition doesn’t go through except in the naive sense I discussed on conditionals in Bayes theorem. So I have to trust the experts that say that the ‘probability’ isn’t probability but remains an estimate.

73. Torbjörn Larsson

Canuckistani:
I see you went through with your definition.
The obvious problem I can see is that probabilities model facts from observations, as Kolmogorov’s axioms make possible in a rigorous manner, not “truths” or “plausibility” (fuzzy syllogisms), which are philosophical ideas about reality.
The problem is more precisely “there must exist a monotonic function which maps our plausibilities onto mathematical probabilities”. Plausibilities isn’t an observationally derivable concept such as events is, so we don’t know if such a map exists or what its use is. Bayes theorem remains unconnected to anything else and its output remains being conditional probabilities (estimates).
So perhaps I don’t have to trust the experts if it is that simple to see the problem. 🙂

74. BenE

“I also think that bayesian reasoning doesn’t use the model-reality distinction, but seems to argue epistemological issues that are either wrong or obscure the use of the different methods.”
Most certainly not. The whole Bayesian theory rests on the important distinction between model and reality. This is exactly why we interpret probabilities as “degree of belief”. We assign probabilities to models and NOT to objects directly. That’s what makes them “a belief”.
Belief=model of reality.
Whereas the frequentist method confuses the model with reality by saying the probabilities are inherent in things. Jaynes calls this, as Canuck said, the “Mind Projection Fallacy”. The Mind Projection Fallacy occurs when you attribute probabilities to objects instead of to their models. Let me reiterate. Frequentists will say that a coin has a 0.5 chance of falling on each side. Bayesians will say that our belief (our internal model) predicts it will fall heads with probability 0.5.
Using the word _belief_ means that our model predicts the 0.5 value. This is opposed to frequentists who would say that the coin actually carries this probability. Talking about belief is just a way to recognise that we are _not_ talking about reality but about _models_ of reality. Using the word “belief” is a terminological method to emphasize the distinction between model and reality.
I’m guessing the word _belief_ was chosen because it makes reference to how our brain is assumed to be a modelisation tool. Our brain is a machine that makes models of the world. Learning new things is akin to updating our internal model of the world.

75. Torbjörn Larsson

Sure. Do we agree? No.

76. Canuckistani

Torbjörn,

I don’t see your point of being a question. I advocate that we don’t suppose anything about the measurements but that we have characterised the experiments and adjusted the model accordingly.

Sorry, that should read, “…what you advocate in the second quote is exactly what you question in the first quote.” By which I mean, you question why Appleby requires that the experiment not change, or at least change in a modellable way, but then advocate the exact same requirement.

“all frequentist probability calculations are valid Bayesian probability calculations”
That is what Mark, Motl and I contest.

First, you’re just wrong. Bayesian probability calculations are a superset of frequentist probability calculations. Frequentists claim only those calculations limited to their subset are valid and have real meaning; they furthermore derive certain probability statements which they claim can be the basis of statistical inference. Bayesians claim that although those probability statements are mathematically correct, they are devoid of inferential content, and that statistical inference should be based on the posterior distribution, a probability calculation not found in the frequentist subset. If you want to contest this statement, you’ll have to produce a counter-example or some kind of argument.

I claim that frequentist probabilities are directly derivable from an expressive enough formal theory.

If you mean the Kolmogorov axioms, these say nothing about either the frequentist or Bayesian positions, and are consistent with both (or at least, with the logical Bayesian position). Both frequentism and Bayesianism are philosophical positions about how mathematical probability as defined by the axioms corresponds to the real world. Or rather, Bayesians claim that probabilities are not properties of the real world, but only of states of information about that world. So what exactly is the formal theory from which the frequentist position is derivable?

Plausibilities isn’t an observationally derivable concept such as events is, so we don’t know if such a map exists or what its use is.

Real numbers are not “observationally derivable”. Are you saying you do not do reasoning about plausibility at all? Or that you do it, but not in any consistent way? If you, personally, agree that the fuzzy syllogisms are a sensible way to do plausible reasoning, then you’ve given the concept of plausibility enough structure to build a mathematical theory about it. (And if you are the sort of person who carries an umbrella when the skies look cloudy, then you are the sort of person who reasons using the fuzzy syllogisms.)

Bayes theorem remains unconnected to anything else and its output remains being conditional probabilities (estimates).

All probabilities, frequentist or Bayesian, are conditional probabilities. Even frequentists agree with that statement. They make this statement because even they start their calculations with some sort of probabilistic model of whatever it is they’re studying. The conclusions they draw are conditional on the model. If this is an argument against Bayesian probability theory, then it is equally an argument against frequentist probability.

77. Canuckistani

Torbjörn,
You know what would be really convincing? Take a Bayesian analysis (a real one, textbook or peer-reviewed, not woo like arguments for God) and show how it is wrong. Just knock the shit out of it. Then, show how frequentist statistical inference gives a better answer.
In case you’re wondering, I have a couple of examples of the converse. You just have to ask.

78. Torbjörn Larsson

Canuckistani:
Okay, now I see your confusion. Yes, I’m questioning his idea that we suppose constancy or some varying. It must be measured and modelled.
I will bypass the discussion on my claims, since I’ve already explained at length why I make them and where we differ at this time, and go to the new stuff:
“Real numbers are not “observationally derivable”.”
This is a bit beside the point, since we aren’t discussing math but physical parameters such as mass, charge, frequencies of events. My view on math is that it is based on observations and idealisations on the real world. I think real numbers are an excellent model of a dimensional measure. I’m not a “modellist” in extremis, I just don’t see that Platonic cryptodualism is useful, nor compatible with naturalism.
“Are you saying you do not do reasoning about plausibility at all?”
I’m saying that it is based in philosophy and logic. But we can do estimates of probabilities, if that is what you want to call plausibility instead of fuzzy syllogisms. One such estimate, adhoc by nature, is the bayesian estimate.
“the concept of plausibility enough structure to build a mathematical theory about it”
Yes, but a formal theory must be about something observed to be meaningful.
“who carries an umbrella”
I hate umbrellas, people use them to try to poke my eyes out “by accident”. 🙂 🙂 🙂
“All probabilities, frequentist or Bayesian, are conditional probabilities. Even frequentists agree with that statement.”
Sorry, I meant that Bayes theorem gives conditional probabilities if applied on probabilities, and estimates if applied adhoc in the bayesian fashion.
“You know what would be really convincing?”
Now we come back to the purpose of this thread. With your and BenE’s help I’ve been able to expand on and challenge my newfound view on probabilities as much as I’m able at this time. I’m not trying to convince either of you, and looking at the other stuff I’m interested in at this time, this topic doesn’t make the cut to continue with. Part of the reason is also that I lack the tools to continue at this time, since I undoubtedly should read up on the foundations of probabilities, bayesian reasoning and especially decision theory, where bayesian estimates seem to be useful too.
So even if this was an excellent opportunity to go further, especially with your kind offer to assist, I will decline.

79. Canuckistani

Torbjörn,

So even if this was an excellent opportunity to go further, especially with your kind offer to assist, I will decline.

Fair enough. I’ve enjoyed our discussion very much!

80. Jonathan Vos Post

Bacteria use Bayes’ Law better than that. And, so far as I know, bacteria do not believe in God.
physorg.com/news96301683.html
How cells deal with uncertainty
Researchers at McGill University have found that cells respond to their ever-changing environment in a way that mimics the optimal mathematical approach to doing so, also known as Bayes’ rule; an application of probability theory. Their findings are published in the April 17 issue of PNAS, the Proceedings of the National Academy of Sciences.
“Biology is seeing a re-birth,” said Dr. Peter Swain, an assistant professor in the Department of Physiology and a Canada Research Chair in Systems Biology, as more researchers are “thinking about the cell using schemes that we know work from engineering and computer science.”
The study was carried out at McGill’s Centre for Nonlinear Dynamics in Physiology and Medicine (CND). Eric Libby, PhD candidate at the CND and lead author on the paper, Dr. Ted Perkins, assistant professor in the School of Computer Science, and Dr. Swain simulated data on a biochemical response mechanism in a strain of E. coli bacteria.
“The ideal mathematical model and the simulation meshed perfectly with Bayes’ rule,” remarked Swain. The bacteria’s collection of genes and proteins that responded to changing environmental conditions acted as a successful Bayesian ‘inference module’, which takes noisy, uncertain information and interprets what it means for the cell.
There are many known schemes for inference that exist in mathematics. This study suggests that cells may have evolved to incorporate the most efficient decision-making abilities into their biochemical pathways.
Quick, accurate cell responses to signals are necessary for survival. When we sense danger, our bodies can tell if the signal is real and trigger the production of adrenaline immediately. However, modeling the effects of a signal on one part of a cell, even in isolation from body tissues and organs, is complicated. “With many drugs, we don’t know how they work or exactly what they are targeting in a cell,” noted Swain. He explained that further study of inference modules could allow us to model more sophisticated cellular behavior, which could one day lead to computerized drug experiments and trials.
Source: McGill University