Bayes' Theorem
Master Bayes' Theorem, the mathematical foundation of rational belief updating, priors vs posteriors, base rate fallacy, and Bayesian vs frequentist thinking.
The Most Powerful Formula in Statistics
Bayes' Theorem is simple to state but profoundly powerful. It's the mathematical foundation of rational belief updating, machine learning, spam filters, medical diagnosis, and even search engines.
At its heart, Bayes' Theorem answers: "I just learned something new. How should I update what I believe?"
This is how science works, how doctors diagnose, how detectives solve crimes, and how you should reason about uncertainty in everyday life.
The Formula
From the definition of conditional probability, we know:

P(A|B) = P(A ∩ B) / P(B)    and    P(B|A) = P(A ∩ B) / P(A)

Multiplying through, P(A|B) · P(B) and P(B|A) · P(A) both equal P(A ∩ B), so:

P(A|B) · P(B) = P(B|A) · P(A)

Rearranging gives Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

In the context of updating beliefs, we write it with more intuitive names:

P(Hypothesis|Evidence) = P(Evidence|Hypothesis) · P(Hypothesis) / P(Evidence)

Posterior = Likelihood × Prior / Evidence
Prior, P(Hypothesis): what you believe before seeing the evidence. Your baseline probability based on existing knowledge.

Likelihood, P(Evidence|Hypothesis): how likely the evidence would be if the hypothesis were true. The data-generating process under your theory.

Posterior, P(Hypothesis|Evidence): what you should believe after seeing the evidence. Your updated probability incorporating the new information.
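As a minimal sketch, these three ingredients can be wired into one Python helper for a binary hypothesis (the function and variable names are our own, not part of the lesson):

```python
def bayes_update(prior, likelihood, likelihood_if_false):
    """Posterior P(H|E) for a binary hypothesis via Bayes' Theorem.

    prior:               P(H), belief before seeing the evidence
    likelihood:          P(E|H), chance of the evidence if H is true
    likelihood_if_false: P(E|not H), chance of the evidence if H is false
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Sanity check: if the evidence is equally likely either way,
# it tells us nothing and the posterior equals the prior.
print(round(bayes_update(0.3, 0.5, 0.5), 3))  # → 0.3
```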
Medical Diagnosis Revisited
Let's return to the medical testing example from the previous lesson, but now we'll use Bayes' Theorem explicitly.
- Disease prevalence: P(D) = 0.01 (1% of population)
- Test sensitivity: P(+|D) = 0.95 (95% true positive rate)
- Test specificity: 90%, so false positive rate: P(+|Healthy) = 0.10
What is P(D|+), the probability of disease given a positive test?
Using Bayes' Theorem:
P(D|+) = P(+|D) · P(D) / P(+)
We need P(+), the total probability of testing positive:

P(+) = P(+|D) · P(D) + P(+|Healthy) · P(Healthy)
P(+) = 0.95(0.01) + 0.10(0.99) = 0.0095 + 0.099 = 0.1085

Now:

P(D|+) = (0.95)(0.01) / 0.1085 = 0.0095 / 0.1085 ≈ 0.088 = 8.8%
Same answer as before, but now we have the systematic framework.
The denominator P(Evidence) acts as a normalizing constant — it ensures probabilities sum to 1. We often compute it using the law of total probability.
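The arithmetic above can be checked in a few lines of Python (variable names are ours):

```python
# Numbers from the medical-testing example above.
p_d = 0.01                   # prior: disease prevalence
p_pos_given_d = 0.95         # sensitivity (true positive rate)
p_pos_given_healthy = 0.10   # false positive rate (1 - specificity)

# Law of total probability: sum over both ways to test positive.
p_pos = p_pos_given_d * p_d + p_pos_given_healthy * (1 - p_d)

# Bayes' Theorem.
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(+)   = {p_pos:.4f}")          # → P(+)   = 0.1085
print(f"P(D|+) = {p_d_given_pos:.3f}")  # → P(D|+) = 0.088
```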
The Base Rate Fallacy
The most common error in probabilistic reasoning is ignoring the prior (the base rate). People focus on P(Evidence|Hypothesis) and forget that P(Hypothesis) matters enormously.
A hit-and-run accident occurs at night. A witness says the taxi was blue. The city has:
- 85% green taxis, 15% blue taxis
- The witness correctly identifies taxi colors 80% of the time (tested under similar conditions)
What's the probability the taxi was actually blue?
Intuitive (wrong) answer: 80% — the witness is 80% reliable.
Bayes' answer:
P(Blue|"Blue") = P("Blue"|Blue) · P(Blue) / P("Blue")
P("Blue") = P("Blue"|Blue)·P(Blue) + P("Blue"|Green)·P(Green) = 0.80(0.15) + 0.20(0.85) = 0.12 + 0.17 = 0.29
P(Blue|"Blue") = 0.12 / 0.29 ≈ 0.41 = 41%
Despite the witness being 80% reliable, the actual probability is only 41% because green taxis are so much more common. The base rate dominates.
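A quick script confirms the taxi numbers (variable names are ours):

```python
# Taxi problem: witness says "blue"; how likely is the taxi actually blue?
p_blue = 0.15                  # prior: 15% of taxis are blue
p_say_blue_given_blue = 0.80   # witness accuracy on blue taxis
p_say_blue_given_green = 0.20  # witness misidentifies green as blue 20% of the time

# Total probability of the witness saying "blue".
p_say_blue = (p_say_blue_given_blue * p_blue
              + p_say_blue_given_green * (1 - p_blue))

p_blue_given_say_blue = p_say_blue_given_blue * p_blue / p_say_blue
print(round(p_say_blue, 2), round(p_blue_given_say_blue, 2))  # → 0.29 0.41
```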
In court cases, this fallacy has led to wrongful convictions. "The DNA match is 99.9% accurate" sounds damning, but if you test a million people, you'll get 1,000 false matches. Context matters.
Bayesian vs Frequentist Thinking
Bayes' Theorem represents a fundamentally different philosophy about probability:
| Aspect | Frequentist | Bayesian |
|---|---|---|
| What is probability? | Long-run frequency | Degree of belief |
| Parameters | Fixed but unknown | Random variables with distributions |
| Inference | Based on hypothetical repetitions | Updates beliefs with data |
| Priors | Not used | Explicitly incorporated |
| Interpretation | "If we repeated this infinitely..." | "Given what we know, we believe..." |
Frequentist example: "This is a 95% confidence interval for the mean." This means: if we repeated the study infinitely, 95% of such intervals would contain the true mean.
Bayesian example: "There's a 95% probability the mean is in this range." This is a direct statement about belief given the data.
The Bayesian approach is often more intuitive and aligns with how humans naturally reason, but it requires choosing priors — which can be controversial when evidence is weak.
Sequential Updates
One beautiful property of Bayes' Theorem: you can update beliefs repeatedly as new evidence arrives.
Today's posterior becomes tomorrow's prior:
- Start with Prior₀
- Observe Evidence₁ → Calculate Posterior₁
- Posterior₁ becomes Prior₁
- Observe Evidence₂ → Calculate Posterior₂
- Repeat...
This is how scientific knowledge accumulates and how machine learning algorithms learn from data.
Is a coin fair? You start with a prior of 50% that it's fair (heads 50% of the time).

- Flip 1: heads. Your posterior shifts slightly toward "biased toward heads."
- Flip 2: heads. Your belief in bias strengthens.
- Flips 3-10: all heads. You're now highly confident the coin is biased.
Each flip is weak evidence alone, but the accumulation becomes overwhelming. Bayes' Theorem lets you quantify exactly how much to update after each observation.
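The chain of updates can be sketched in Python. Since "biased" needs a concrete meaning for the likelihood, we assume (our choice, not the lesson's) that a biased coin lands heads 90% of the time:

```python
p_heads_if_fair = 0.5
p_heads_if_biased = 0.9  # assumed bias strength (illustrative)

prior_fair = 0.5  # Prior₀: 50% chance the coin is fair
for flip in range(1, 11):  # observe heads on every flip
    # Law of total probability: P(heads) under current beliefs.
    p_heads = (p_heads_if_fair * prior_fair
               + p_heads_if_biased * (1 - prior_fair))
    # Bayes' Theorem; today's posterior becomes tomorrow's prior.
    prior_fair = p_heads_if_fair * prior_fair / p_heads
    print(f"After flip {flip}: P(fair) = {prior_fair:.3f}")
```

Each flip multiplies the odds by the same likelihood ratio, so belief in fairness decays steadily: after ten straight heads, P(fair) has dropped below 1%.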
Real-World Applications
Bayes' Theorem isn't just theoretical. It powers:
1. Spam Filters. P(Spam|"Congratulations, you won!") depends on:
   - How often spam contains these words (likelihood)
   - Base rate of spam in your inbox (prior)
2. Medical Diagnosis. Doctors (should) combine:
   - Test accuracy (likelihood)
   - Disease prevalence (prior)
   - Patient risk factors (updated prior)
3. Machine Learning. Naive Bayes classifiers, Bayesian neural networks, and recommendation systems all update beliefs based on evidence.
4. Search Engines. "Given that this document contains these words, what's the probability it's relevant to your query?"
5. Criminal Justice. "Given this evidence, what's the probability the defendant is guilty?" (Though this should be done carefully to avoid the prosecutor's fallacy.)
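The spam-filter item reduces to the same one-line calculation. Here is a toy single-word version with invented numbers (a real filter combines evidence from many words):

```python
# Toy spam score for a single word; all rates below are made up for illustration.
p_spam = 0.4               # prior: base rate of spam in this inbox
p_word_given_spam = 0.30   # "congratulations" appears in 30% of spam
p_word_given_ham = 0.01    # ...and in 1% of legitimate mail

# Total probability of seeing the word at all.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # → 0.952
```

Note how the strong likelihood ratio (30% vs 1%) overwhelms a modest prior, which is exactly why a single spammy phrase can push a message over the threshold.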
The key insight: Rational belief updating requires combining what you know (prior) with what you observe (likelihood). Ignoring either leads to bad conclusions.
Common Mistakes
1. Confusing P(A|B) with P(B|A). "10% of terrorists have this profile" ≠ "10% of people with this profile are terrorists." The base rate of terrorism is minuscule, so the reverse probability is much lower.
2. Ignoring the prior entirely. Focusing only on evidence strength while ignoring how rare the hypothesis is leads to false positives.
3. Using biased priors inappropriately. Starting with "I'm 99% sure I'm right" and barely updating with evidence isn't rational updating — it's confirmation bias.
4. Forgetting the denominator. P(Evidence) in the denominator accounts for all the ways the evidence could occur; omitting it breaks the calculation.