Central Limit Theorem

Understand the CLT — the reason sample means are normal, why statistics works, standard error, and the magic of large samples.


The Most Important Theorem in Statistics

The Central Limit Theorem (CLT) is the reason statistics works. It's why we can make inferences about populations from samples. It's why the normal distribution shows up everywhere. And it's shockingly simple:

No matter what distribution your data comes from, the average of enough independent observations will be approximately normally distributed.

This is magic. Your original data could be anything — uniform, exponential, binomial, completely bizarre — but take sample means and you get a bell curve.

The Central Limit Theorem is the reason everyone thinks normal distributions are "normal."
Multiple statisticians

The Theorem

Let X₁, X₂, ..., Xₙ be independent random variables from ANY distribution with mean μ and finite variance σ².

Define the sample mean:

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i

Then as n → ∞:

\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \rightarrow N(0, 1)

Or equivalently:

\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}

In plain English:

  • Sample means are approximately normal
  • The mean of the sample means equals the population mean μ
  • The variance of the sample means is σ²/n (decreases with sample size!)
  • This works regardless of the shape of the original distribution
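The claims above can be checked with a quick simulation. This is a sketch using only the standard library; the Uniform(0, 1) population, the sample size of 30, and the 5,000 trials are arbitrary choices:

```python
import random
import statistics as stats

random.seed(42)

n = 30          # observations per sample mean
trials = 5000   # how many sample means to collect

# Population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12 (not normal at all).
means = [stats.fmean(random.random() for _ in range(n)) for _ in range(trials)]

mu_hat = stats.fmean(means)            # should be close to mu = 0.5
se_hat = stats.stdev(means)            # should be close to sigma / sqrt(n)
se_theory = (1 / 12) ** 0.5 / n ** 0.5

print(f"mean of sample means: {mu_hat:.4f}  (theory: 0.5000)")
print(f"sd of sample means:   {se_hat:.4f}  (theory: {se_theory:.4f})")
```

Both empirical numbers land very close to the CLT's predictions, even though the underlying data is uniform.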

Visualizing the Magic

Dice Rolls

Roll a die. The distribution is uniform (flat, each value 1-6 equally likely):

  • Mean: μ = 3.5
  • Not normal at all!

Now roll 2 dice and take the average. The distribution of averages starts looking more bell-shaped.

Roll 5 dice and average: even more normal.

Roll 30 dice and average: almost perfectly normal!

The original distribution was uniform. The distribution of sample means is normal. That's the CLT.
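The dice experiment is easy to run as a rough simulation (standard library only; the 10,000-trial count is an arbitrary choice). As n grows, the averages concentrate tightly around μ = 3.5:

```python
import random
import statistics as stats

random.seed(0)

def dice_mean(n):
    """Average of n fair six-sided dice."""
    return stats.fmean(random.randint(1, 6) for _ in range(n))

for n in (1, 2, 5, 30):
    means = [dice_mean(n) for _ in range(10_000)]
    # Fraction of averages within 0.5 of mu = 3.5 rises sharply with n.
    near = sum(1 for m in means if abs(m - 3.5) <= 0.5) / len(means)
    print(f"n={n:>2}: sd of averages = {stats.stdev(means):.3f}, "
          f"P(|mean - 3.5| <= 0.5) = {near:.2f}")
```

For a single die, only 2 of the 6 faces land within 0.5 of 3.5; for averages of 30 dice, the vast majority do.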

This is why casinos win: Individual bets are random, but average over thousands of bets and the casino's take becomes highly predictable (near-certain profit). The CLT smooths out randomness.

How Large is Large Enough?

"As n → ∞" sounds abstract. In practice, how big does n need to be?

Rule of thumb:

  • n ≥ 30: Usually sufficient for CLT to kick in (common magic number)
  • n ≥ 15: Often adequate if the original distribution isn't too skewed
  • n ≥ 5: Can work if the original distribution is already close to normal
  • n ≥ 100+: Needed for heavily skewed or heavy-tailed distributions

The key factor: How far from normal is your original distribution?

  • Symmetric distributions: CLT works quickly (small n)
  • Skewed distributions: Need larger n
  • Extreme outliers: Need even larger n

Exponential Distribution

Exponential is heavily right-skewed (waiting times). Original distribution looks nothing like normal.

  • n = 2: Sample mean distribution still skewed
  • n = 10: Starting to look normal
  • n = 30: Quite normal
  • n = 50: Very close to perfect normal

Even from this weird starting point, n=30 gets you close to normality.
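A sketch of the same progression in code (standard library only; the 20,000-trial count is arbitrary). Individual Exponential(1) observations have skewness 2, and the skewness of the sample-mean distribution should shrink roughly like 2/√n:

```python
import random
import statistics as stats

random.seed(1)

def skewness(xs):
    """Sample skewness: average standardized third moment."""
    m = stats.fmean(xs)
    s = stats.stdev(xs)
    return stats.fmean([((x - m) / s) ** 3 for x in xs])

# Exponential(rate=1): heavily right-skewed, theoretical skewness = 2.
for n in (1, 2, 10, 30):
    means = [stats.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(20_000)]
    print(f"n={n:>2}: skewness of sample means = {skewness(means):.2f}")
```

The skewness drops toward 0 (the skewness of a normal distribution) as n increases, matching the narrative above.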

Standard Error

The standard deviation of the sample mean distribution:

SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}

This is crucial: as sample size increases, the standard error shrinks by a factor of √n.

  • n = 4: SE = σ/2 (half the original σ)
  • n = 100: SE = σ/10 (one-tenth!)
  • n = 10,000: SE = σ/100 (one-hundredth!)

This is why large samples are better: Sample means concentrate tightly around μ. Precision increases with √n.

To cut error in half, you need 4× the sample size. Doubling precision is expensive! This is why very precise estimates require huge samples.
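The √n law is visible in a small simulation sketch (the sample sizes 25, 100, and 400 and the 4,000 trials are arbitrary choices; the population is standard normal, so σ = 1):

```python
import random
import statistics as stats

random.seed(7)

def se_of_mean(n, trials=4000):
    """Empirical sd of the mean of n standard normal draws."""
    return stats.stdev(
        stats.fmean(random.gauss(0, 1) for _ in range(n)) for _ in range(trials)
    )

# Quadrupling n should halve the SE each time (theory: 0.2, 0.1, 0.05).
for n in (25, 100, 400):
    print(f"n={n:>3}: SE = {se_of_mean(n):.3f}  (theory: {1 / n ** 0.5:.3f})")
```

Each 4× increase in sample size buys only a 2× reduction in standard error, which is exactly why precision is expensive.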

Why the CLT Matters

1. Inference about means: We can construct confidence intervals and hypothesis tests for means using normal distribution theory, even when the original data isn't normal.

2. Polling and surveys: A sample proportion is a sample mean (of 0s and 1s). CLT applies → normal distribution → margin of error calculations.

3. Quality control: In manufacturing, average defect rates and average dimensions are approximately normal due to the CLT.

4. Financial models: Individual returns aren't normal, but diversified portfolio returns (averages of many assets) are closer to normal.

5. A/B testing: Compare average outcomes between groups using normal-based tests, even if individual outcomes aren't normal.

Election Polling

Poll 1,000 voters. Each response is binary (candidate A or B).

Individual response: Bernoulli(p) — definitely not normal!

But sample proportion p̂ = "fraction choosing A" is a sample mean.

By CLT: p̂ ~ N(p, p(1-p)/1000) approximately

This is why pollsters can say "52% ± 3%" — the CLT makes the normal approximation valid.
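The margin-of-error arithmetic can be sketched directly. The counts below (520 of 1,000 respondents) are hypothetical poll numbers, and 1.96 is the usual 95% normal quantile:

```python
import math

# Hypothetical poll: 520 of 1,000 respondents choose candidate A.
n, successes = 1000, 520
p_hat = successes / n

# CLT: p_hat ~ N(p, p(1 - p) / n), so a 95% margin of error is 1.96 * SE.
se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = 1.96 * se

print(f"estimate: {p_hat:.0%} ± {moe:.1%}")
```

With n = 1,000 the margin of error works out to about ±3 percentage points, which is why "±3%" is so common in news reports.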

Sum vs Mean

The CLT applies to sums too (since sum = n × mean):

S_n = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2) \text{ approximately}

For sums: variance grows with n (more terms = more variability).
For means: variance shrinks with n (averaging reduces variability).

Both are approximately normal for large n.
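A quick check of both scalings (standard normal population with σ = 1; the 5,000-trial count is arbitrary): the variance of sums should grow like nσ², while the variance of means shrinks like σ²/n:

```python
import random
import statistics as stats

random.seed(3)

mu, sigma = 0.0, 1.0
for n in (10, 100):
    sums = [sum(random.gauss(mu, sigma) for _ in range(n)) for _ in range(5000)]
    means = [s / n for s in sums]  # sum = n * mean, so means come for free
    print(f"n={n:>3}: var(sum) = {stats.variance(sums):6.1f} "
          f"(theory {n * sigma ** 2:.0f}), "
          f"var(mean) = {stats.variance(means):.4f} "
          f"(theory {sigma ** 2 / n:.4f})")
```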

What the CLT Doesn't Say

Misconception 1: "My data is normal because I have a large sample." Reality: CLT says sample means are normal, not individual observations. If you measure one person's height, n=1000 measurements doesn't make that height normally distributed.

Misconception 2: "CLT works for any statistic." Reality: The CLT specifically applies to means (and sums). Other statistics have their own limiting distributions: sample medians and variances are also asymptotically normal (with different standard errors), but extremes such as the sample maximum converge to non-normal (extreme value) distributions.

Misconception 3: "n=30 always works." Reality: n=30 is a rough rule of thumb. For heavily skewed data, you might need n=50, n=100, or more.

Misconception 4: "CLT applies without independence." Reality: The standard CLT requires independence. Dependent observations (time series, clustered data) need modified versions of the theorem.

The Deep Reason It Works

Why does averaging produce normality?

Averaging smooths out extremes. Individual observations might be all over the place, but their average is stable. High and low values cancel out.

The normal distribution emerges because:

  1. Sums of independent random variables add variances
  2. Many small independent contributions create a distribution determined only by mean and variance
  3. Maximum entropy principle: Given only mean and variance, normal is the "least informative" (most spread out) distribution

The CLT refines the law of large numbers: the LLN says averages converge to the true mean, while the CLT describes the shape of the fluctuations around it, which are approximately normal at scale σ/√n.

The normal distribution is a consequence of the fact that errors and deviations are typically the result of many small, independent factors.
William Feller
