Reassuring Statistics
One of the cooler, and more counter-intuitive, bits of statistics I
know of concerns the question: “If your doctor performs a 95% reliable
test on you, and it says you have a disease, how worried should you
be?” (Spoiler alert: not as much as you think.)
To start out with, you either have the disease or you don’t; and the
test either returns the correct result or it doesn’t. So that makes
four possibilities, as outlined in this illustration:

Now, one of the first things you’ll notice is that that illustration
isn’t to scale: I assumed at the beginning that the test is 95%
reliable. So the right-hand column, the one marked “Test fails”,
should be a lot narrower. Also, if you’re talking about a
life-threatening disease like colon cancer or tuberculosis, far less
than half the population has that particular disease. So the top row,
the one marked “Sick” should be a lot skinnier as well. Let’s say that
one person in a hundred has this disease.
(Note that it’s also possible to have an asymmetric test, one that’s
95% reliable if you’re healthy, but only 50% reliable if you’re sick.
I’ll ignore this for simplicity.)
So that gives us this:

In other words, if you picked a random person off the street and
administered the test, that would be like throwing a dart at the
picture above: most would fall in the large green area (they’re not
sick, and the test says so), and only a few would fall in the yellow,
orange, or red areas.
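The dart-throwing picture can be tried out directly. The following is a minimal sketch, assuming the made-up numbers above (1% of people sick, a 95% reliable test); the variable names and the fixed random seed are illustrative choices, and the red area is taken, by elimination, to mean "sick, but the test fails".

```python
import random

prevalence = 0.01    # one person in a hundred is sick
reliability = 0.95   # the test gives the right answer 95% of the time

# Exact share of the picture taken up by each coloured area.
areas = {
    "green (healthy, test works)": (1 - prevalence) * reliability,
    "yellow (healthy, test fails)": (1 - prevalence) * (1 - reliability),
    "orange (sick, test works)": prevalence * reliability,
    "red (sick, test fails)": prevalence * (1 - reliability),
}

# Throw 100,000 darts; the fixed seed only makes the run repeatable.
rng = random.Random(42)
darts = 100_000
hits = rng.choices(list(areas), weights=list(areas.values()), k=darts)

for name, area in areas.items():
    print(f"{name}: {area:.2%} of the picture, {hits.count(name):,} darts")
```

The green area works out to 94.05% of the picture, so the simulated darts should land there roughly 94 times out of 100.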
But of course you don’t care about the health of the average person on
the street; you care about whether you have the disease, especially
since the test says that you do! So let’s look at that.
If the test comes back positive, that means one of two things: either
you’re sick and the test worked (your dart fell in the orange area),
or else you’re healthy, and the test gave the wrong result (your dart
fell in the yellow area).
What are the probabilities for these cases? Well, let’s assume that
the entire diagram represents one million people. 99% (1,000,000
× 0.99 = 990,000 people) of them are healthy, and of those, 5%
will get a positive result. 1,000,000 × 0.99 × 0.05 =
49,500 people in the yellow area.
1% of the original one million people are sick, and of those, 95% will
get a positive result on the test. 1,000,000 × 0.01 × 0.95
= 9,500 people.
That gives us a total of 49,500 + 9,500 = 59,000 people out of the
original million who got a positive result. 49,500 of them are
healthy, and 9,500 are sick. What’s the likelihood that you’re one of
the sick people? 9,500/59,000 = 0.16, or 16%, or roughly one chance
in six.
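The head count is easy to re-run with a few lines of Python. This is only a sketch of the arithmetic in the text; the variable names are illustrative, and the inputs (one million people, 1% sick, a 95% reliable test) are the made-up numbers used above.

```python
# Head count for one positive test, using the made-up numbers from the text.
population = 1_000_000
prevalence = 0.01    # one person in a hundred is sick
reliability = 0.95   # the test gives the right answer 95% of the time

sick = population * prevalence
healthy = population - sick

true_positives = sick * reliability            # sick, test works (orange)
false_positives = healthy * (1 - reliability)  # healthy, test fails (yellow)

p_sick_given_positive = true_positives / (true_positives + false_positives)

print(f"{true_positives:,.0f} sick and {false_positives:,.0f} healthy people test positive")
print(f"Chance of being sick after one positive test: {p_sick_given_positive:.0%}")
```

Changing `prevalence` or `reliability` at the top shows how quickly the answer moves when either assumption changes.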
In the next illustration, I’ve cut up the orange and yellow areas and
put them side by side (but not resized anything) to show the contrast.

This illustrates the fact that the probability of A given B
(p(A|B)) is not the same as the probability of B given A
(p(B|A)): the probability of getting a positive result if
you’re sick is 0.95, but the probability of being sick given that
you’ve gotten a positive result is only 0.16.
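The two quantities are tied together by Bayes’ theorem. Written out with the numbers used above (the labels S for “sick” and + for “positive result” are introduced here only for this formula):

```latex
\[
\begin{aligned}
p(S \mid +) &= \frac{p(+ \mid S)\,p(S)}{p(+ \mid S)\,p(S) + p(+ \mid \lnot S)\,p(\lnot S)} \\
            &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \\
            &= \frac{0.0095}{0.059} \approx 0.16
\end{aligned}
\]
```

The numerator is the orange area and the denominator is the orange and yellow areas together, which is the same ratio as 9,500 out of 59,000.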
This brings up the next question: you’ve gotten a positive result on
the first test, so your doctor recommends doing a second, different
test to make sure. If the second test comes back positive as well, how
much should you worry?
First of all, notice that the conditions have changed: in the general
population, only 1% of people have the disease, but we’re not looking
at the general population anymore; we’re looking only at those people
who tested positive on the first test, and 16% of those have the
disease.
So let’s say you tested positive on both the first and second tests.
How can this happen? Recall that we narrowed the original
population of one million down to 59,000 people, 9,500 of whom are
sick, and 49,500 of whom are healthy.
Let’s say that the second test is also 95% accurate (and completely
independent of the first test). 95% of the sick people will get a
second positive result: 9,500 × 0.95 = 9,025 people. 5% of
the healthy people will also get a positive result: 49,500 ×
0.05 = 2,475 people. So given that you’ve gotten two positive results,
the probability of actually being sick is 9,025 / (9,025 + 2,475) =
0.78, or 78%. At this point, it’s definitely time to worry.
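The same reasoning can be written as a repeated update: the answer after the first test becomes the starting point for the second. Here is a minimal sketch, assuming (as in the text) that both tests are 95% reliable and completely independent; the function name `update` is a hypothetical helper, not anything standard.

```python
def update(prior, reliability):
    """Probability of being sick after one more positive result (Bayes' rule)."""
    true_pos = prior * reliability                # sick, and the test says so
    false_pos = (1 - prior) * (1 - reliability)   # healthy, but the test is wrong
    return true_pos / (true_pos + false_pos)

prior = 0.01                               # 1% of the general population is sick
after_first = update(prior, 0.95)          # roughly 0.16
after_second = update(after_first, 0.95)   # roughly 0.78

print(f"After one positive test:  {after_first:.2f}")
print(f"After two positive tests: {after_second:.2f}")
```

If the two tests tend to fail on the same people, the independence assumption breaks down and the second positive result is worth less than this suggests.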
Of course, I just made up the numbers above for purposes of
illustration. If the test is less than 95% accurate, or if fewer than
1% of people have the disease, an even larger share of the positive
results will be false positives.
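To get a feel for how much the made-up numbers matter, here is a small sketch that recomputes the chance of being sick after one positive result for a few other combinations. The alternative values (99% and 90% reliability, one sick person in a thousand) are arbitrary picks for illustration, and the test is still assumed to be equally reliable for sick and healthy people.

```python
def p_sick_given_positive(prevalence, reliability):
    """Chance of being sick given one positive result from a symmetric test."""
    true_pos = prevalence * reliability
    false_pos = (1 - prevalence) * (1 - reliability)
    return true_pos / (true_pos + false_pos)

for reliability in (0.99, 0.95, 0.90):
    for prevalence in (0.01, 0.001):
        p = p_sick_given_positive(prevalence, reliability)
        print(f"reliability {reliability:.0%}, prevalence {prevalence:.1%}: "
              f"{p:.1%} of positives are really sick")
```

Even a 99% reliable test only gets to even odds when 1% of people are sick, because the 1% of healthy people who test positive are as numerous as the sick people the test catches.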
This applies in other areas as well, such as airport security
screening.
According to the FAA,
over 762 million people flew out of US airports in 2007, or roughly 2
million people per day.
Let’s say that Al Qaeda had tried to pull another 9/11, but with ten
planes instead of the original four, and five people per plane. That
means the TSA is looking for 50 people out of 2,000,000, or 0.0025% of
passengers on that day. So even if the screening procedure is 99.9%
effective, that still means about 2,000 false alarms (and 50 captured
terrorists), so roughly 98% of all people flagged as suspect are
innocent.
And that’s on that one day when there’s a massive terrorist attack.
Most days there are no hijackings, which skews the numbers even more.
So the vast majority of times when someone is flagged, it’s a false
alarm.
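The screening numbers can be worked through the same way. This sketch takes the scenario at face value: ten planes with five people each (50 people) among roughly 2,000,000 passengers, and a screening procedure that is 99.9% effective in both directions, meaning it flags 99.9% of hijackers and wrongly flags 0.1% of everyone else. That two-way reading of “effective” is an assumption, and the variable names are illustrative.

```python
passengers = 2_000_000       # roughly one day of US air travel
planes = 10
hijackers_per_plane = 5
effectiveness = 0.999        # assumed to apply to hijackers and innocents alike

hijackers = planes * hijackers_per_plane
innocent = passengers - hijackers

caught = hijackers * effectiveness             # hijackers correctly flagged
false_alarms = innocent * (1 - effectiveness)  # innocent passengers flagged

flagged = caught + false_alarms
share_innocent = false_alarms / flagged if flagged else 0.0

print(f"Hijackers caught:      {caught:,.0f}")
print(f"False alarms:          {false_alarms:,.0f}")
print(f"Flagged but innocent:  {share_innocent:.1%}")
```

Setting `planes = 0` models an ordinary day: there are still about 2,000 false alarms, and every flagged passenger is innocent.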
Another example where this is extremely important is polygraph examinations. Examiners may claim 95% accuracy (which, BTW, is a meaningless number, given that the accuracy of such a test is described by an ROC curve, and even a single operating point on that curve has two measures of accuracy: the true-positive rate and the false-positive rate), but there are a lot of situations where even a genuine 95% would tell you very little. A counterintelligence screening polygraph for a security clearance is a classic example: What percentage of applicants for a security clearance actually are spies? It works out to a system where wild accusations are commonplace. The telling part is that when they deny your application because you failed the counterintelligence poly, they don’t bother to follow up on any of your alleged spy activities. Why? While you may be infinitesimally more likely to be a security threat than somebody who didn’t fail the screening, you’re still pretty darned unlikely to be worthy of investigation.
“I’ve conclusively shown that you’re a foreign agent, a drug dealer, a habitual drug user, and you’ve been involved in at least one murder. While it may seem like it would be a good idea for us to investigate these things, we won’t. Good day.” Counterintuitive indeed.