Sorry for the slowness of the blog lately. I finally got myself back onto a semi-regular schedule when I posted about the Adria Richards affair, and that really blew up. The amount of vicious, hateful bile that showed up, both in comments (which I moderated) and in my email was truly astonishing. I’ve written things which pissed people off before, and I’ve gotten at least my fair share of hatemail. But nothing I’ve written before came close to preparing me for the kind of unbounded hatred that came in response to that post.

I really needed some time away from the blog after that.

Anyway, I’m back, and it’s time to get on with some discrete probability theory!

I’ve already written a bit about *interpretations* of probability. But I haven’t said anything about what probability means formally. When I say that the probability of rolling a 3 with a pair of fair six-sided dice is 1/18, how do I know that? Where did that 1/6th figure come from?

The answer lies in something called a *probability space*. (I’m going to explain the probability space in frequentist terms, because I think that that’s easiest, but there is, of course, an equivalent Bayesian description.) Suppose I’m looking at a particular experiment. In classic mathematical form, a probability space consists of three components (Ω, E, P), where:

- Ω, called the *sample space*, is a set containing all possible outcomes of the experiment. For a pair of dice, Ω would be the set of all possible rolls: {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), …, (6,5), (6,6)}.
- *E* is an equivalence relation over Ω, which partitions Ω into a set of *events*. Each event is a set of outcomes that are equivalent. For rolling a pair of dice, an event is a total: each event is the set of outcomes that have the same total. For the event “3” (meaning a roll that totalled three), the set would be {(1, 2), (2, 1)}.
- *P* is a *probability assignment*. For each event *e* in *E*, *P(e)* is a value between 0 and 1, where:

  $$\sum_{e \in E} P(e) = 1$$

  (That is, the sum of the probabilities of all of the possible events in the space is exactly 1.)

The probability of an event *e* being the outcome of a trial is *P(e)*.
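To make the three components concrete, here is a minimal Python sketch of the pair-of-dice space described above. The names `omega`, `events`, and `P` are just illustrative, and the probability assignment assumes fair dice, so that every one of the 36 rolls is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered roll of two six-sided dice.
omega = list(product(range(1, 7), repeat=2))

# E: partition the sample space into events by grouping rolls with the same total.
events = {}
for roll in omega:
    events.setdefault(sum(roll), []).append(roll)

# P: assuming fair dice, each roll is equally likely, so an event's
# probability is the fraction of the sample space it covers.
P = {total: Fraction(len(rolls), len(omega)) for total, rolls in events.items()}

print(events[3])        # [(1, 2), (2, 1)]
print(P[3])             # 1/18
print(sum(P.values()))  # 1
```

The event “3” contains 2 of the 36 possible rolls, which is where the 1/18 comes from, and the probabilities of all eleven totals add up to exactly 1.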

So the probability of any particular event as the result of a trial is a number between 0 and 1. What’s it mean? If the probability of event *e* is *p*, then if we repeat the trial *N* times, we expect *N* × *p* of those trials to have *e* as their result. If the probability of *e* is 1/4, and we repeat the trial 100 times, we’d expect *e* to be the result 25 times.
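As a rough illustration of that expectation, the sketch below computes *N* × *p* and compares it with a simulated run. The helper names `expected_count` and `simulate_count` are made up for this example, and the simulation leans on Python's pseudo-random generator, so the observed count will usually land near the expected value without matching it exactly.

```python
import random
from fractions import Fraction

def expected_count(p, n):
    """Expected number of trials, out of n, that result in an event of probability p."""
    return p * n

def simulate_count(p, n, seed=0):
    """Count how often an event of probability p occurs in n simulated trials."""
    rng = random.Random(seed)  # seeded so repeated runs give the same count
    return sum(1 for _ in range(n) if rng.random() < p)

print(expected_count(Fraction(1, 4), 100))  # 25

observed = simulate_count(0.25, 100)
print(observed)  # typically near 25, not necessarily equal to it
```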

But in an important sense, that’s a cop-out. We’ve defined probability in terms of this abstract model, where the third component is the probability. Isn’t that circular?

Not really. For a given trial, we create the probability assignment by observation and/or analysis. The important point is that this is really just a bare minimum starting point. What we really care about in probability isn’t the chance associated with a single, simple, atomic event. What we want to do is take the probability associated with a group of single events, and use our understanding of that to allow us to explore a complex event.

If I give you a well-shuffled deck of cards, it’s easy to show that the probability of drawing the 3 of diamonds is 1/52. What we want to do with probability is things like ask: What are the odds of being dealt a flush in a poker hand?
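Here is one way to work that flush question out by counting, as a sketch rather than the last word on it. It assumes a standard 52-card deck, a five-card hand, and that every five-card hand is equally likely. It also counts straight flushes as flushes, which not every poker table of odds does.

```python
from fractions import Fraction
from math import comb

# Single draw: one favourable card out of 52 equally likely cards.
p_three_of_diamonds = Fraction(1, 52)

# Sample space for a deal: every 5-card subset of the 52-card deck.
total_hands = comb(52, 5)

# The event "flush": choose one of 4 suits, then any 5 of its 13 cards.
# Note that this count includes straight flushes.
flush_hands = 4 * comb(13, 5)

p_flush = Fraction(flush_hands, total_hands)

print(p_three_of_diamonds)  # 1/52
print(total_hands)          # 2598960
print(flush_hands)          # 5148
print(p_flush)              # 33/16660
print(float(p_flush))       # roughly 0.002
```

The single-card case and the flush case use the same recipe: count the outcomes in the event, then divide by the size of the sample space.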

The construction of a probability space gives us a well-defined platform to use for building probabilistic models of more interesting things. Given the probability spaces of two single dice, we can combine them together to create the probability space of the two dice rolled together. Given the probability space of a pair of dice, we can construct the probability space of a game of craps. And so on.
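The combining step can be sketched in a few lines of Python. This is only an illustration: `combine` and `event_probability` are hypothetical helpers, and the code assumes the two dice are fair and independent, so the probability of a pair of outcomes is the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Probability space of a single fair die: outcome -> probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def combine(space_a, space_b):
    """Build the space of two independent experiments run together."""
    return {
        (a, b): pa * pb
        for (a, pa), (b, pb) in product(space_a.items(), space_b.items())
    }

def event_probability(space, predicate):
    """Add up the probabilities of every outcome that belongs to the event."""
    return sum((p for outcome, p in space.items() if predicate(outcome)), Fraction(0))

pair = combine(die, die)

print(len(pair))                                                   # 36
print(event_probability(pair, lambda roll: sum(roll) == 3))        # 1/18
print(event_probability(pair, lambda roll: sum(roll) in (7, 11)))  # 2/9
```

The last line is a first small step toward craps: it is the probability that the opening roll totals 7 or 11.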

Paul C. Anagnostopoulos: Where did the figure 1/6th come from in the last sentence of paragraph 4?

~~ Paul

KrisJ: Good to see you back. I think you’ve done a great job ignoring the hate and getting on with some good old fashioned maths.

Kyle Szklenski: Agreed with KrisJ. And I’m really excited about this series of posts…for whatever reason, in school, no one really explained the details of statistics, just the very light, “Here’s how you do it” part. I’ve never been one to just pick up something without knowing why it works the way it does.

Osiris: Great to see you are back. I really look forward to your posts. This is one of the best blogs on the internet!

Osiris: Also, I forwarded your blog post about PyCon to everyone in management at my company. I received multiple thank-you responses. I think your view is honest and refreshing.

Catherine Asaro: I’m glad to see you back, too. After I read this post, I went back and read the other post you referred to, the one that led to the explosion of negative mail. And I can say this much: your post on the Adria Richards affair is brilliant.

That post is one of the best that I’ve seen on the subject. On the one hand, it is discouraging to hear that you have received such a virulent negative response; on the other hand, the presence of the post itself and how well you talk about the subject matter is a sign that times, as they say, are a’changing.

Please hang in there. Don’t let the bad comments get you down. You have support.

Will Douglas: Welcome back, MarkCC! I was worried you had gotten sick, or something. I read your A.R. post, agreed with it mostly, disagreed some, and kinda thought you might be bearding the lion. That is kind of what you do, in your debunking posts, though most of them are cubs. I am only one of many who applaud you for it. Too bad you had to take all that heat. ut tranquillitas et gererent, dog!

Will Douglas: Sorry, wow, my ancient East LA street latin sucks. That sounds like some kind of command or put down. Not meant that way at all. What I meant to say was “be calm, please keep doing what you are doing, you are very good at it, and I and many other people appreciate it very much.” Mea culpa, deprecarenti.

Will Douglas

jjsocrates: How many sides does a 6-sided die have?

Paul C. Anagnostopoulos: Oh, you think the 1/6th refers to the probability of rolling any given side of a die? Because Mark wrote “that 1/6th figure,” it sounded as if he was referring to the previous sentence, in which the figure is 1/18. I assumed he’d either made a simple typo or I was missing something important. My confusion was strengthened by the fact that he doesn’t answer the question of where 1/6 comes from, but does explain where 1/18 comes from. All clear now.

~~Paul

gauthma: «Isn’t that circular?

Not really. For a given trial, we create the probability assignment by observation and/or analysis.»

This is still a cop-out, you’re just using the frequentist interpretation of probability. For example, say you throw a fair coin one hundred times, and you *always* get heads. Creating the probability assignment based on this observation would mean that the probability of heads is one (and thus of tails is zero). The only way out of this is to say that such an outcome as described above is, wait for it… improbable.

Right? Or am I misinterpreting what you meant with “observation and/or analysis”?

MarkCC (post author): That’s not frequentist. In fact, I’d argue that it’s bayesian.

You start with the assumption that there are two equally likely outcomes. Then based on observation, you update your priors. If 100 flips produce 100 heads, then wouldn’t a bayesian interpretation agree that the priors should be updated to reflect it?

gauthma: You’re right in that I did assume the coin was fair. But suppose you repeat the experiment without that assumption, i.e. you are given a coin, about which you know nothing, and are tasked with determining the probabilities of each of the two events, and have no prior expectations. If you do it by flipping it a huge number of times, and recording your results, then, whether you interpret it (i.e. define “probability”) the frequentist or the bayesian way, you’re always relying on some pre-existing notion of what “probability” means.

As far as I know (and this was my original point), both definitions (or “interpretations” if you will) of probability suffer from the problem of making use of that which they purport to define — in other words, it’s *always* a cop-out. Which is why we define it axiomatically. To quote from Wikipedia: «The mathematics of probability can be developed on an entirely axiomatic basis that is independent of any interpretation» — https://en.wikipedia.org/wiki/Probability_interpretations#Axiomatic_probability

That’s why the particular way in which you phrased it gave me an itch. The way I understand it is that probability theory (along with most math) is developed from a set of axioms, and the reason it’s useful is that it allows you to build a useful model of real situations.