Bayes' Theorem
Master Bayes' Theorem, the mathematical foundation of rational belief updating, priors vs posteriors, base rate fallacy, and Bayesian vs frequentist thinking.
The Most Powerful Formula in Statistics
Bayes' Theorem is simple to state but profoundly powerful. It's the mathematical foundation of rational belief updating, machine learning, spam filters, medical diagnosis, and even search engines.
At its heart, Bayes' Theorem answers: "I just learned something new. How should I update what I believe?"
This is how science works, how doctors diagnose, how detectives solve crimes, and how you should reason about uncertainty in everyday life.
The Formula
From the definition of conditional probability, we know:

P(A|B) = P(A ∩ B) / P(B)    and    P(B|A) = P(A ∩ B) / P(A)

Multiplying through, P(A|B) · P(B) and P(B|A) · P(A) both equal P(A ∩ B), so:

P(A|B) · P(B) = P(B|A) · P(A)

Rearranging gives Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

In the context of updating beliefs, we write it with more intuitive names:

P(Hypothesis|Evidence) = P(Evidence|Hypothesis) · P(Hypothesis) / P(Evidence)

Posterior = Likelihood × Prior / Evidence
Prior, P(Hypothesis): what you believe before seeing the evidence. Your baseline probability based on existing knowledge.

Likelihood, P(Evidence|Hypothesis): how likely the evidence would be if the hypothesis were true. The data-generating process under your theory.

Posterior, P(Hypothesis|Evidence): what you should believe after seeing the evidence. Your updated probability incorporating the new information.
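As a minimal sketch, these three ingredients can be wired into one Python helper for a binary hypothesis (the function and variable names are our own, not part of the lesson):

```python
def bayes_update(prior, likelihood, likelihood_if_false):
    """Posterior P(H|E) for a binary hypothesis via Bayes' Theorem.

    prior:               P(H), belief before seeing the evidence
    likelihood:          P(E|H), chance of the evidence if H is true
    likelihood_if_false: P(E|not H), chance of the evidence if H is false
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Sanity check: if the evidence is equally likely either way,
# it tells us nothing and the posterior equals the prior.
print(round(bayes_update(0.3, 0.5, 0.5), 3))  # → 0.3
```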
Medical Diagnosis Revisited
Let's return to the medical testing example from the previous lesson, but now we'll use Bayes' Theorem explicitly.
- Disease prevalence: P(D) = 0.01 (1% of population)
- Test sensitivity: P(+|D) = 0.95 (95% true positive rate)
- Test specificity: 90%, so false positive rate: P(+|Healthy) = 0.10
What is P(D|+), the probability of disease given a positive test?
Using Bayes' Theorem:
P(D|+) = P(+|D) · P(D) / P(+)
We need P(+), the total probability of testing positive:

P(+) = P(+|D) · P(D) + P(+|Healthy) · P(Healthy)
P(+) = 0.95(0.01) + 0.10(0.99) = 0.0095 + 0.099 = 0.1085

Now:

P(D|+) = (0.95)(0.01) / 0.1085 = 0.0095 / 0.1085 ≈ 0.088 = 8.8%
Same answer as before, but now we have the systematic framework.
The denominator P(Evidence) acts as a normalizing constant — it ensures probabilities sum to 1. We often compute it using the law of total probability.
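The arithmetic above can be checked in a few lines of Python (variable names are ours):

```python
# Numbers from the medical-testing example above.
p_d = 0.01                   # prior: disease prevalence
p_pos_given_d = 0.95         # sensitivity (true positive rate)
p_pos_given_healthy = 0.10   # false positive rate (1 - specificity)

# Law of total probability: sum over both ways to test positive.
p_pos = p_pos_given_d * p_d + p_pos_given_healthy * (1 - p_d)

# Bayes' Theorem.
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(+)   = {p_pos:.4f}")          # → P(+)   = 0.1085
print(f"P(D|+) = {p_d_given_pos:.3f}")  # → P(D|+) = 0.088
```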
The Base Rate Fallacy
The most common error in probabilistic reasoning is ignoring the prior (the base rate). People focus on P(Evidence|Hypothesis) and forget that P(Hypothesis) matters enormously.
A hit-and-run accident occurs at night. A witness says the taxi was blue. The city has:
- 85% green taxis, 15% blue taxis
- The witness correctly identifies taxi colors 80% of the time (tested under similar conditions)
What's the probability the taxi was actually blue?
Intuitive (wrong) answer: 80% — the witness is 80% reliable.
Bayes' answer:
P(Blue|"Blue") = P("Blue"|Blue) · P(Blue) / P("Blue")
P("Blue") = P("Blue"|Blue)·P(Blue) + P("Blue"|Green)·P(Green) = 0.80(0.15) + 0.20(0.85) = 0.12 + 0.17 = 0.29
P(Blue|"Blue") = 0.12 / 0.29 ≈ 0.41 = 41%
Despite the witness being 80% reliable, the actual probability is only 41% because green taxis are so much more common. The base rate dominates.
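A quick script confirms the taxi numbers (variable names are ours):

```python
# Taxi problem: witness says "blue"; how likely is the taxi actually blue?
p_blue = 0.15                  # prior: 15% of taxis are blue
p_say_blue_given_blue = 0.80   # witness accuracy on blue taxis
p_say_blue_given_green = 0.20  # witness misidentifies green as blue 20% of the time

# Total probability of the witness saying "blue".
p_say_blue = (p_say_blue_given_blue * p_blue
              + p_say_blue_given_green * (1 - p_blue))

p_blue_given_say_blue = p_say_blue_given_blue * p_blue / p_say_blue
print(round(p_say_blue, 2), round(p_blue_given_say_blue, 2))  # → 0.29 0.41
```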
In court cases, this fallacy has led to wrongful convictions. "The DNA match is 99.9% accurate" sounds damning, but if you test a million people, you'll get 1,000 false matches. Context matters.
Bayesian vs Frequentist Thinking
Bayes' Theorem represents a fundamentally different philosophy about probability:
| Aspect | Frequentist | Bayesian |
|---|---|---|
| What is probability? | Long-run frequency | Degree of belief |
| Parameters | Fixed but unknown | Random variables with distributions |
| Inference | Based on hypothetical repetitions | Updates beliefs with data |
| Priors | Not used | Explicitly incorporated |
| Interpretation | "If we repeated this infinitely..." | "Given what we know, we believe..." |
Frequentist example: "This is a 95% confidence interval for the mean." This means: if we repeated the study infinitely, 95% of such intervals would contain the true mean.
Bayesian example: "There's a 95% probability the mean is in this range." This is a direct statement about belief given the data.
The Bayesian approach is often more intuitive and aligns with how humans naturally reason, but it requires choosing priors — which can be controversial when evidence is weak.
Sequential Updates
One beautiful property of Bayes' Theorem: you can update beliefs repeatedly as new evidence arrives.
Today's posterior becomes tomorrow's prior:
- Start with Prior₀
- Observe Evidence₁ → Calculate Posterior₁
- Posterior₁ becomes Prior₁
- Observe Evidence₂ → Calculate Posterior₂
- Repeat...
This is how scientific knowledge accumulates and how machine learning algorithms learn from data.
Is a coin fair? You start with a prior of 50% that it's fair (heads 50% of the time).

- Flip 1: heads. Your posterior shifts slightly toward "biased toward heads."
- Flip 2: heads. Your belief in bias strengthens.
- Flips 3-10: all heads. You're now highly confident the coin is biased.
Each flip is weak evidence alone, but the accumulation becomes overwhelming. Bayes' Theorem lets you quantify exactly how much to update after each observation.
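The chain of updates can be sketched in Python. Since "biased" needs a concrete meaning for the likelihood, we assume (our choice, not the lesson's) that a biased coin lands heads 90% of the time:

```python
p_heads_if_fair = 0.5
p_heads_if_biased = 0.9  # assumed bias strength (illustrative)

prior_fair = 0.5  # Prior₀: 50% chance the coin is fair
for flip in range(1, 11):  # observe heads on every flip
    # Law of total probability: P(heads) under current beliefs.
    p_heads = (p_heads_if_fair * prior_fair
               + p_heads_if_biased * (1 - prior_fair))
    # Bayes' Theorem; today's posterior becomes tomorrow's prior.
    prior_fair = p_heads_if_fair * prior_fair / p_heads
    print(f"After flip {flip}: P(fair) = {prior_fair:.3f}")
```

Each flip multiplies the odds by the same likelihood ratio, so belief in fairness decays steadily: after ten straight heads, P(fair) has dropped below 1%.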
Real-World Applications
Bayes' Theorem isn't just theoretical. It powers:
1. Spam Filters. P(Spam|"Congratulations, you won!") depends on:
   - How often spam contains these words (likelihood)
   - Base rate of spam in your inbox (prior)
2. Medical Diagnosis. Doctors (should) combine:
   - Test accuracy (likelihood)
   - Disease prevalence (prior)
   - Patient risk factors (updated prior)
3. Machine Learning. Naive Bayes classifiers, Bayesian neural networks, and recommendation systems all update beliefs based on evidence.
4. Search Engines. "Given that this document contains these words, what's the probability it's relevant to your query?"
5. Criminal Justice. "Given this evidence, what's the probability the defendant is guilty?" (Though this should be done carefully to avoid the prosecutor's fallacy.)
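The spam-filter item reduces to the same one-line calculation. Here is a toy single-word version with invented numbers (a real filter combines evidence from many words):

```python
# Toy spam score for a single word; all rates below are made up for illustration.
p_spam = 0.4               # prior: base rate of spam in this inbox
p_word_given_spam = 0.30   # "congratulations" appears in 30% of spam
p_word_given_ham = 0.01    # ...and in 1% of legitimate mail

# Total probability of seeing the word at all.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # → 0.952
```

Note how the strong likelihood ratio (30% vs 1%) overwhelms a modest prior, which is exactly why a single spammy phrase can push a message over the threshold.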
The key insight: Rational belief updating requires combining what you know (prior) with what you observe (likelihood). Ignoring either leads to bad conclusions.
Common Mistakes
1. Confusing P(A|B) with P(B|A). "10% of terrorists have this profile" ≠ "10% of people with this profile are terrorists." The base rate of terrorism is minuscule, so the reverse probability is much lower.
2. Ignoring the prior entirely. Focusing only on evidence strength while ignoring how rare the hypothesis is leads to false positives.
3. Using biased priors inappropriately. Starting with "I'm 99% sure I'm right" and barely updating with evidence isn't rational updating — it's confirmation bias.
4. Forgetting the denominator. P(Evidence) in the denominator accounts for all the ways the evidence could occur; omitting it breaks the calculation.