## Tuesday, September 17, 2013

### my favorite surprise

I’m excited about tomorrow. Tomorrow in my probability class, we’re going to start discussing Bayes’ Formula. This is the main thing I remember about my college probability class. While we have already seen some surprising results in particular cases where the rules of probability are applied, this is, to me, the first truly surprising general result. It changes everything I think about probability.

Here’s my motivating example: suppose we have before us an urn that contains five blue balls and five red balls. We draw two balls, in order, and record their colors. (To be clear, this is sampling without replacement.) What is the probability that the first ball is red? “Well,” you say, “evidently the likelihood of that is 1/2, because half of the balls are red.” “Very well,” I say, “then what is the probability that the first ball is red, assuming that the second ball is blue?” “What does that have to do with it?” you ask. “When the second ball is drawn, the first one has already been chosen, so how could the second ball turning up blue have anything to do with the probability that the first ball is red?”

Let’s throw some formulas in here. Suppose $E$ and $F$ are two events in the sample space $S$ of an experiment. (I discussed the definitions of these terms in my previous post.) The conditional probability of $E$ given $F$, written $P(E\mid F)$, is the quotient $\dfrac{P(E \cap F)}{P(F)}$, meaning, loosely speaking, that we consider all the ways both $E$ and $F$ can occur (weighted by their individual probabilities), and think of this as just a subset of the outcomes where $F$ occurs (rather than all of $S$). “Sensible enough,” (I hope) I hear you say. Now, you will hopefully also agree that we can split $F$ into two parts: the one that intersects $E$ and the one that does not, i.e., $F = (F \cap E) \cup (F \cap E^c)$. “Aren’t you overcomplicating things?” you demur. “Just wait,” I plead. The events $F \cap E$ and $F \cap E^c$ are mutually exclusive (i.e., disjoint), and so we have $P(F) = P(F \cap E) + P(F \cap E^c)$. Interesting, no? So we can write $P(E \mid F) = \frac{P(E \cap F)}{P(E \cap F) + P(E^c \cap F)}$ (using the fact that $E \cap F = F \cap E$). And now perhaps it seems like this manipulation isn’t so weird, because in our “motivating case”, each of the terms in that expression isn’t so hard to compute, and in fact one of them appears twice!

So what happens? Let’s return to our urn and say $E$ is the event “the first ball is red”, while $F$ is the event “the second ball is blue”. Then $P(E \cap F) = \big(\frac{5}{10}\big)\big(\frac{5}{9}\big)$ and $P(E^c \cap F) = \big(\frac{5}{10}\big)\big(\frac{4}{9}\big)$, so $P(E \mid F) = \frac{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)}{\big(\frac{5}{10}\big)\big(\frac{5}{9}\big)+\big(\frac{5}{10}\big)\big(\frac{4}{9}\big)} = \frac{25}{25+20} = \frac{5}{9}.$ Since 5/9 > 1/2, it is more likely that the first ball was red if we know that the second ball is blue! (Surprised? Think about what happens if there are only two balls to begin with, one blue and one red. Once that’s sunk in, try the above again starting with $m$ blue and $n$ red balls in the urn.)
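If you would rather count than trust the algebra, here is a minimal Python sketch that lists every ordered pair of draws and counts the ones we care about. The function name `prob_first_red_given_second_blue` and its parameters are my own invention for illustration, and the approach assumes every ordered pair of distinct balls is equally likely (which is what sampling without replacement gives us).

```python
from fractions import Fraction
from itertools import permutations


def prob_first_red_given_second_blue(num_blue, num_red):
    """Return P(first ball red | second ball blue) by brute-force counting."""
    balls = ["blue"] * num_blue + ["red"] * num_red
    # permutations() treats the balls as distinct by position, so each
    # ordered pair of two different balls appears exactly once.
    draws = list(permutations(balls, 2))
    # Restrict attention to the outcomes where F (second ball blue) occurs.
    second_blue = [d for d in draws if d[1] == "blue"]
    # Among those, count the outcomes where E (first ball red) also occurs.
    first_red_too = [d for d in second_blue if d[0] == "red"]
    return Fraction(len(first_red_too), len(second_blue))


print(prob_first_red_given_second_blue(5, 5))  # 5/9
print(prob_first_red_given_second_blue(1, 1))  # 1
```

The second call is the two-ball urn from the parenthetical: if the second ball is blue, the first one had no choice but to be red. The sketch needs at least one blue ball and at least two balls in total, since otherwise there is nothing to condition on.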

So far I’m cool with everything that’s happened. The realization that later events provide information about earlier ones is a bit of a jolt, but not so far-fetched after a little reflection. Bayes, however, endeavors to turn our minds further inside-out. We just need one new idea, just as simple as everything we’ve done to this point: the equation for conditional probability can be rewritten as $P(E \cap F) = P(F) \cdot P(E \mid F)$. And of course, because $E \cap F = F \cap E$, we could just as well write $P(E \cap F) = P(E) \cdot P(F \mid E)$. Now, as before, let’s split $F$ into $E \cap F$ and $E^c \cap F$. Using our most recent observation, we have $P(E \mid F) = \frac{P(E) \cdot P(F \mid E)}{P(E) \cdot P(F \mid E) + P(E^c) \cdot P(F \mid E^c)}.$ “Now why on Earth…?” you splutter, to which I reply, “Because sometimes the knowledge you have is more suited to computing the conditional probabilities on the right than finding the one on the left directly from the definition.”
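To see that this version of the formula agrees with what we already computed, here is a small sketch that feeds it the urn numbers. The helper name `bayes` and its argument names are hypothetical, chosen just for this illustration; the inputs are $P(E) = 1/2$, $P(F \mid E) = 5/9$, and $P(F \mid E^c) = 4/9$ from the urn example.

```python
from fractions import Fraction


def bayes(p_e, p_f_given_e, p_f_given_not_e):
    """Return P(E | F) from P(E), P(F | E), and P(F | E^c)."""
    # P(E) * P(F | E) is the same thing as P(E and F).
    numerator = p_e * p_f_given_e
    # P(F), split into the part inside E and the part outside E.
    denominator = numerator + (1 - p_e) * p_f_given_not_e
    return numerator / denominator


# Urn example: E = "first ball is red", F = "second ball is blue".
print(bayes(Fraction(1, 2), Fraction(5, 9), Fraction(4, 9)))  # 5/9
```

Using `Fraction` keeps the arithmetic exact, so the answer comes out as 5/9 rather than a rounded decimal.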

Here’s a classic example. Suppose there is an uncommon illness that occurs in the general population with probability 0.005 (half of one percent). Suppose further that there is a medical test for this affliction that is 99% accurate. That is, 99% of the time the test is used on a sick patient, the test returns positive, and 99% of the time it is used on a healthy patient, it returns negative. You are concerned that you might have this illness, and so you have the test. It comes back positive. What is the probability that you have the illness?

Do you see where this is going? You’re interested (well, we both are, really, because I care about your health) in the event $E$ “I have the illness.” The information we have, though, is that the event $F$ “the test came back positive” occurred. And what we know about the test is how its results depend on the patient being sick or well. That is, we know $P(F \mid E)$ and $P(F \mid E^c)$, and fortunately we also know $P(E)$ (ergo we also know $P(E^c) = 1 - P(E)$). We can compute the likelihood of your being ill as $P(E \mid F) = \frac{(0.005)(0.99)}{(0.005)(0.99) + (0.995)(0.01)} \approx 0.3322.$ Far from it being a certainty that you have this particular illness, your chances are better than 2 in 3 that you don’t! Even if the illness were twice as common and occurred in 1% of the population, your chances of being sick would be only 1 in 2 after the test comes back positive. (Notice that this probability, which is a conditional probability, depends not only on the efficacy of the test, but also on the prevalence of the illness.)
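Another way to make peace with this number is to count heads in an imaginary town. The sketch below assumes a made-up population of 200,000 people (chosen only so that every count comes out as a whole number) and a hypothetical helper, `positive_test_counts`, that tallies the true and false positives.

```python
def positive_test_counts(population, sick_per_thousand, accuracy_percent):
    """Return (true positives, false positives) if everyone takes the test."""
    sick = population * sick_per_thousand // 1000
    healthy = population - sick
    # Sick people who correctly test positive.
    true_positives = sick * accuracy_percent // 100
    # Healthy people who wrongly test positive.
    false_positives = healthy * (100 - accuracy_percent) // 100
    return true_positives, false_positives


# Prevalence 0.005 (5 per thousand), test 99% accurate.
tp, fp = positive_test_counts(200_000, 5, 99)
print(tp, fp)                   # 990 1990
print(round(tp / (tp + fp), 4)) # 0.3322

# Prevalence 1% (10 per thousand), same test.
tp, fp = positive_test_counts(200_000, 10, 99)
print(tp / (tp + fp))           # 0.5
```

In the first town there are 1,000 sick people and 199,000 healthy ones, so the 1% of healthy people who test positive outnumber the sick people who test positive by about two to one.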

And that’s my favorite surprise in probability.

(If you haven’t read it yet, you should go look at the article Steven Strogatz wrote for the New York Times about Bayes’ Theorem, in which he makes it seem—somewhat—less surprising.)