# Combining Non-Disjoint Probabilities

In my previous post on probability, I talked about how you need to be careful about covering cases. To understand what I mean by that, it’s good to see some examples.

And we can do that while also introducing an important concept which I haven’t discussed yet. I’ve frequently talked about independence, but equally important is the idea of disjointness.

Two events are independent when they have no ability to influence one another. So two coin flips are independent. Two events are disjoint when they can’t possibly occur together. Flipping a coin, the event “flipped a head” and the event “flipped a tail” are disjoint: if you flipped a head, you can’t have flipped a tail, and vice versa.

So let’s think about something abstract for a moment. Let’s suppose that we’ve got two events, A and B. We know that the probability of A is 1/3 and the probability of B is also 1/3. What’s the probability of A or B?

Naively, we could say that it’s P(A) + P(B). But that’s not necessarily true. It depends on whether or not the two events are disjoint.

Suppose that it turns out that the probability space we’re working in is rolling a six sided die. There are three basic scenarios that we could have:

1. Scenario 1: A is the event “rolled 1 or 2”, and B is “rolled 3 or 4”. That is, A and B are disjoint.
2. Scenario 2: A is the event “rolled 1 or 2”, and B is “rolled 2 or 3”. A and B are different, but they overlap.
3. Scenario 3: A is the event “rolled 1 or 2”, and B is the event “rolled 1 or 2”. A and B are really just different names for the same event.

In scenario one, we’ve got disjoint events. So P(A or B) is P(A) + P(B) = 2/3. One way of checking that this makes sense is to look at how the probabilities of the events work out. P(A) is 1/3. P(B) is 1/3. The probability of neither A nor B (that is, the probability of rolling either 5 or 6) is 1/3. The sum is 1, as it should be.

But suppose that we looked at scenario 2. If we made a mistake and added them as if they were disjoint, how would things add up? P(A) is 1/3. P(B) is 1/3. P(neither A nor B) = P(4 or 5 or 6) = 1/2. The total of these three probabilities is 1/3 + 1/3 + 1/2 = 7/6. So just from that addition, we can see that there’s a problem, and we did something wrong.

If we know that A and B overlap, then we need to do something a bit more complicated to combine probabilities. The general equation is:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Using that equation, we’d get the right result. P(A) = 1/3; P(B) = 1/3; P(A and B) = 1/6, since a roll of 2 is the only outcome that belongs to both events. So the probability of A or B is 1/3 + 1/3 - 1/6 = 1/2. And P(neither A nor B) = P(4 or 5 or 6) = 1/2. The total is 1, as it should be.
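To make the bookkeeping concrete, here is a minimal Python sketch that runs all three die scenarios through the general equation. The names `prob` and `prob_or` are my own, and the code assumes a fair die, so every face has probability 1/6.

```python
from fractions import Fraction

DIE = frozenset(range(1, 7))  # sample space: faces of a fair six-sided die


def prob(event):
    """Probability of an event (a set of faces), assuming a fair die."""
    return Fraction(len(event & DIE), len(DIE))


def prob_or(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)


scenarios = {
    "disjoint": ({1, 2}, {3, 4}),
    "overlapping": ({1, 2}, {2, 3}),
    "identical": ({1, 2}, {1, 2}),
}

for name, (a, b) in scenarios.items():
    union = prob_or(a, b)
    neither = prob(DIE - (a | b))
    # P(A or B) and P(neither A nor B) should always total 1.
    print(name, union, neither, union + neither)
```

In the disjoint case the subtracted term is zero, so the general equation collapses to plain addition. In the identical case it subtracts the whole of P(B), leaving just P(A).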

From here, we’ll finally start moving into some more interesting stuff. Next post, I’ll look at how to use our probability axioms to analyze the probability of winning a game of craps. That will take us through a bunch of applications of the basic rules, as well as an interesting example of working through a limit case.

And then it’s on to combinatorics, which is the main tool that we’ll use for figuring out how many cases there are, and what they are, which as we’ve seen is an essential skill for probability.

# Weekend Recipes: Chicken Wings with Thai Chile Sauce

In my house, chicken wings are kind of a big deal. My wife doesn’t know how to cook. Her cooking is really limited to two dishes: barbecued chicken wings, and grilled cheese. But her chicken wings are phenomenal. We’ve been married for 20 years, and I haven’t found a wing recipe that had the potential to rival hers.

Until now.

I decided to try making a homemade Thai sweet chili sauce, and use that on the wings. And the results were fantastic. Still not quite up there with her wings, but I think this recipe has the potential to match them. This batch of wings was the first experiment with this recipe, and there were a couple of things that I think should be changed. I wet-brined the wings, and they ended up not crisping up as well as I would have liked. So next time, I’ll dry-brine. I also crowded them a bit too much on the pan.

When you read the recipe, it might seem like the wings are being cooked for a long time. They are, but that’s a good thing. Wings have a lot of fat and a lot of gelatin – they stand up to the heat really well, and after a long cooking time they just get tender and their flavor concentrates. They don’t get tough or stringy or anything nasty like a chicken breast would cooked for this long.

## The Sauce

The sauce is a very traditional Thai sweet chili. It’s a simple sauce, but it’s very versatile. It’s loaded with wonderful flavors that go incredibly well with poultry or seafood. Seriously delicious stuff.

• 1 cup sugar.
• 1/2 cup rice vinegar.
• 1 1/2 cups water.
• 1 teaspoon salt.
• 2 tablespoons fish sauce.
• Finely diced fresh red chili pepper (quantity to taste)
• 5 large cloves garlic, finely minced.
• 1/2 teaspoon minced ginger.
• 1 tablespoon of cornstarch, mixed with water.
1. Put the sugar, salt, vinegar, water, and fish sauce into a pot, and bring to a boil.
2. Add the garlic, ginger, and chili pepper. Lower the heat, and let it simmer for a few minutes.
3. Leave the sauce sitting for about an hour, to let the flavors of the spices infuse into the sauce.
4. Taste it. If it’s not spicy enough, add more chili pepper, and simmer for another minute or two.
5. Bring back to a boil. Remove from heat, and mix in the cornstarch slurry. Then return to the heat, and simmer until the starch is cooked and the sauce thickens.

The sauce is done.

## The Wings

• About an hour before you want to start cooking, you need to dry-brine the wings. Spread the wings on a baking sheet. Make a 50-50 mixture of salt and sugar, and sprinkle over the wings. Coat both sides. Let the wings sit on the sheet for an hour. After they’ve sat in the salt for an hour, rinse them under cold water, and pat them dry.
• Lightly oil a baking sheet. Put the wings on the sheet. You don’t want them to be too close together – they’ll brown much better if they have a bit of space on the sides.
• Put the baking sheet full of wings into a 350 degree oven. After 30 minutes, turn them over, and bake for another 30 minutes.
• Now it’s time to start with the sauce! With a basting brush, cover the top side with the sweet chile sauce. Then turn the wings over, and coat the other side. Once they’re basted with the sauce, it’s back into the oven for another 30 minutes.
• Again, baste both sides, and then back into the oven for another 30 minutes with the second side up.
• Take the wings out, turn the oven up to 450. Baste the wings, and then put them back in until they turn nice and brown on top. Then turn them, baste them again, and brown the other side.
• Time to eat!

# Correction, Sigma Algebras, and Mass functions

So, I messed up a bit in the previous post. Let me get that out of the way before we move forward!

In measure theory, you aren’t just working with sets. You’re working with something called σ-algebras. It’s a very important distinction.

The problem is, our intuition of sets doesn’t always work. Sets, as defined formally, are really pretty subtle. We expect certain things to be true, because they make sense. But in fact, they are not implied by the definition of sets. A σ-algebra is, essentially, a well-behaved collection of sets: one whose behavior matches our usual expectations.

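To be formal, a sigma algebra over a set S is a non-empty collection Σ of subsets of S such that:

1. Σ is closed under set complement: if a set is in Σ, so is its complement in S.
2. Σ is closed under countable union: the union of any countable family of sets in Σ is also in Σ.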
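For a finite set, the two closure properties can be checked by brute force. The sketch below is only an illustration, and `is_sigma_algebra` is a name I made up. It relies on the fact that on a finite set, countable unions reduce to pairwise unions. It also rejects an empty collection, which the standard definition excludes.

```python
from itertools import combinations


def is_sigma_algebra(collection, s):
    """Check the closure properties for a collection of subsets of a finite set s."""
    s = frozenset(s)
    sets = {frozenset(x) for x in collection}
    # Must be non-empty, and every member must be a subset of s.
    if not sets or any(not x <= s for x in sets):
        return False
    closed_under_complement = all(s - x in sets for x in sets)
    # On a finite set, closure under pairwise union implies closure
    # under every countable union.
    closed_under_union = all(x | y in sets for x, y in combinations(sets, 2))
    return closed_under_complement and closed_under_union


die = {1, 2, 3, 4, 5, 6}
coarse = [set(), {1, 2}, {3, 4, 5, 6}, die]
broken = [set(), {1, 2}, die]  # the complement of {1, 2} is missing

print(is_sigma_algebra(coarse, die))  # True
print(is_sigma_algebra(broken, die))  # False
```

The interesting cases in measure theory are infinite, where no check like this is possible. The finite version is just a way to see what the closure rules demand.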

The reason why you need to make this restriction is, ultimately, because of the axiom of choice. Using the axiom of choice, you can create sets which are unmeasurable. They’re clearly subsets of a measurable set, and supersets of other measurable sets – and yet, they are, themselves, not measurable. This leads to things like the Banach-Tarski paradox: you can take a measurable set, divide it into non-measurable subsets, and then combine those non-measurable subsets back into measurable sets whose sizes seem to make no sense. You can take a sphere the size of a baseball, slice it into pieces, and then re-assemble those pieces into a sphere the size of the earth, without stretching them!

These non-measurable sets blow away our expectations about how things should behave. The restriction to σ-algebras is just a way of saying that we need to be working in a space where all sets are measurable. When we’re looking at measure theory (or probability theory, where we’re building on measures), we need to exclude non-measurable sets. If we don’t, we’re seriously up a creek without a paddle. If we allowed non-measurable sets, then the probability theory we’re building would be inconsistent, and that’s the kiss of death in mathematics.

Ok. So, with that out of the way, how do we actually use Kolmogorov’s axioms? It all comes down to the idea of a sample space. You need to start with an experiment that you’re going to observe. For that experiment, there are a set of possible outcomes. The set of all possible outcomes is the sample space.

Here’s where, sadly, even axiomatized probability theory gets a bit handwavy. Given the sample space, you can define the structure of the sample space with a function, called the probability mass function, f, which maps each possible event in the sample space to a probability. To be a valid mass function for a sample space S, it’s got to have the following properties:

1. For each event e in S, 0 ≤ f(e) ≤ 1.
2. The sum of the probabilities in the sample space must be 1: $\sum_{e \in S} f(e) = 1$
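Both properties are easy to check mechanically for a finite sample space. Here is a small sketch, where a mass function is represented as a Python dict from outcomes to probabilities. That representation and the name `is_valid_mass_function` are my own assumptions, not anything standard.

```python
from fractions import Fraction


def is_valid_mass_function(f):
    """f maps each element of a finite sample space to its probability."""
    in_range = all(0 <= p <= 1 for p in f.values())
    # Exact fractions avoid floating-point rounding in the sum.
    return in_range and sum(f.values()) == 1


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
bad_die = {face: Fraction(1, 5) for face in range(1, 7)}  # sums to 6/5

print(is_valid_mass_function(fair_die))  # True
print(is_valid_mass_function(bad_die))   # False
```

Using `Fraction` rather than floats matters here: six copies of the float `1/6` are not guaranteed to add up to exactly 1.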

So we wind up with a sort of circularity: in order to describe the probability of events, we need to start by knowing the probability of events. In fact, this isn’t really a problem: we’re talking about taking something that we observe in the real world, and mapping it into the abstract space of math. Whenever we do that, we need to take our observations of the real world and create an approximation as a mathematical model.

The point of probability theory isn’t to do that primitive mapping. In general, we already understand how rolling a single die works. We know how it should behave, and we know how and why its actual behavior can vary from our expectation. What we really want to know is how multiple events combine.

We don’t need any special theory to figure out what the probability of rolling a 3 on a six-sided die is: that’s easy, and it’s obvious: it’s 1 in 6. But what’s the probability of winning a game of craps?

If all days of the year 2001 are equally likely, then we don’t need anything fancy to work out the probability that someone born in 2001 has a birthday on July 21st. It’s easy: 1 in 365. But if I’ve got a group of 35 people, what’s the probability of two of them sharing the same birthday?

Both of those questions start with the assignment of a probability mass function, which is trivial. But answering them involves combining the probabilities given by those mass functions, using Kolmogorov’s axioms to figure out the probabilities of the complicated events.
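As a taste of what that combining looks like, here is a rough sketch of the birthday question. It assumes every person’s birthday is independent and uniform over 365 days, and it reads the question as “at least two people share a birthday”. The function name `prob_shared_birthday` is mine. The trick is to compute the probability that all the birthdays are distinct, and then take the complement.

```python
from fractions import Fraction


def prob_shared_birthday(people, days=365):
    """P(at least two of `people` share a birthday), all days equally likely."""
    p_all_distinct = Fraction(1)
    for k in range(people):
        # Person k must avoid the k birthdays already taken.
        p_all_distinct *= Fraction(days - k, days)
    return 1 - p_all_distinct


print(float(prob_shared_birthday(35)))
```

Notice that nothing fancier than the trivial 1-in-365 mass function goes in. Everything else comes from the rules for combining events.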