The Normal Distribution

Master the bell curve: normal distribution properties, Z-scores, the empirical rule, standard normal, and why normality appears everywhere in statistics.

The Most Important Distribution

The normal distribution, also known as the bell curve, is everywhere. Heights, test scores, measurement errors, and countless natural phenomena follow approximately normal distributions, and stock returns are often modeled that way as a first approximation.

But here's what makes it truly special: even when individual observations aren't normally distributed, averages of many independent observations are approximately normal (provided the variance is finite). This is the Central Limit Theorem, and it's why the normal distribution dominates statistics.

The Normal Distribution

Parameters:

  • μ (mu): mean — the center of the distribution
  • σ² (sigma squared): variance — spread around the mean
  • σ: standard deviation

PDF:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Don't memorize this formula. Instead, understand what it tells you:

  • The shape is determined entirely by μ and σ
  • It's symmetric around μ
  • It extends from -∞ to +∞ (though values far from μ are extremely rare)
  • About 68% of values fall within [μ-σ, μ+σ]
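
As a rough check on the points above, the sketch below writes the density formula out by hand and compares it with Python's standard-library `statistics.NormalDist`. The helper name `normal_pdf` and the values μ = 10, σ = 2 are illustrative assumptions, not part of the lesson.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written straight from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

mu, sigma = 10.0, 2.0  # arbitrary illustrative parameters
dist = NormalDist(mu, sigma)

# The hand-written formula should agree with the library implementation.
print(normal_pdf(11.0, mu, sigma), dist.pdf(11.0))

# Symmetry: points equally far from mu on either side have the same density.
print(normal_pdf(mu - 2.0, mu, sigma) == normal_pdf(mu + 2.0, mu, sigma))

# The density is highest at mu and becomes tiny far out in the tails.
print(normal_pdf(mu, mu, sigma), normal_pdf(mu + 6 * sigma, mu, sigma))
```

Changing `mu` slides the curve left or right; changing `sigma` stretches or squeezes it. Nothing else in the formula can vary.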

Human Heights

Adult male heights (in cm) approximately follow N(175, 7²):

  • Mean height: μ = 175 cm
  • Standard deviation: σ = 7 cm

What this means:

  • About 68% of men are between 168-182 cm (μ ± σ)
  • About 95% are between 161-189 cm (μ ± 2σ)
  • About 99.7% are between 154-196 cm (μ ± 3σ)

Someone who is 196 cm (6'5") is about 3 standard deviations above average — very tall but not impossible.
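
The percentages in this example can be reproduced from the cumulative distribution function. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper `prob_between` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=7)  # the lesson's N(175, 7²) model

def prob_between(dist, low, high):
    """P(low < X < high), computed as a difference of two CDF values."""
    return dist.cdf(high) - dist.cdf(low)

# Ranges of one, two, and three standard deviations around the mean.
for k in (1, 2, 3):
    low, high = 175 - 7 * k, 175 + 7 * k
    print(f"{low}-{high} cm: {prob_between(heights, low, high):.4f}")

# Share of men taller than 196 cm under this model.
taller_than_196 = 1 - heights.cdf(196)
print(f"P(X > 196) = {taller_than_196:.5f}")
```

Under this model only about one man in several hundred is taller than 196 cm, which matches the "very tall but not impossible" reading.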

The Empirical Rule (68-95-99.7)

For ANY normal distribution:

The Empirical Rule

| Range  | Probability | Interpretation    |
|--------|-------------|-------------------|
| μ ± σ  | 68%         | About 2/3 of data |
| μ ± 2σ | 95%         | Most data         |
| μ ± 3σ | 99.7%       | Nearly all data   |

This is why σ matters: It tells you the typical spread. One standard deviation captures the "normal range."

Outliers: Values more than 3σ away from μ occur less than 0.3% of the time. If you see them frequently, your data probably isn't normal.

Memorize this rule. It's one of the most practical facts in statistics. Knowing just μ and σ immediately tells you where ~95% of data falls.
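
To see why the rule holds for any normal distribution, note that the probability of landing within k standard deviations of the mean is erf(k/√2), which contains neither μ nor σ. The sketch below checks this; the function name `coverage` and the (μ, σ) pairs are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def coverage(k):
    """P(|X - mu| < k * sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4f}")

# The same two-sigma coverage comes out whatever mu and sigma are.
two_sigma = []
for mu, sigma in [(0, 1), (175, 7), (1000, 200)]:
    d = NormalDist(mu, sigma)
    within_2 = d.cdf(mu + 2 * sigma) - d.cdf(mu - 2 * sigma)
    two_sigma.append(within_2)
    print(mu, sigma, round(within_2, 4))
```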

The Standard Normal Distribution Z

The standard normal distribution has μ = 0 and σ = 1. We denote it Z ~ N(0,1).

This is the reference distribution for all normal distributions. Every normal distribution can be transformed into it.

If X ~ N(μ, σ²), then:

$$Z = \frac{X - \mu}{\sigma}$$

Z ~ N(0,1) — a standard normal variable.

What Z tells you: How many standard deviations X is away from the mean.

  • Z = 0: exactly at the mean
  • Z = 1: one standard deviation above
  • Z = -2: two standard deviations below

SAT Scores

SAT scores: μ = 1000, σ = 200.

You score 1300. What's your Z-score?

Z = (1300 - 1000) / 200 = 1.5

You scored 1.5 standard deviations above average.

From standard normal tables (or calculators): P(Z > 1.5) ≈ 0.067

You scored better than about 93% of test-takers.
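
The same calculation can be sketched in Python, with the standard library's `NormalDist` standing in for the Z-table. The helper `z_score` is a hypothetical name used only for this example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mu) / sigma

z = z_score(1300, mu=1000, sigma=200)

standard_normal = NormalDist()          # defaults to mu=0, sigma=1
percentile = standard_normal.cdf(z)     # P(Z < z): share of test-takers below you
upper_tail = 1 - percentile             # P(Z > z): share above you

print(f"z = {z}, P(Z > z) = {upper_tail:.3f}, percentile = {percentile:.1%}")

# Standardizing is optional in code: the original scale gives the same answer.
direct = NormalDist(1000, 200).cdf(1300)
print(f"direct P(X < 1300) = {direct:.3f}")
```

Standardizing first and working on the original scale give the same probability, which is the point of the Z transformation.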

Z-scores are universal. For normally distributed data, a Z-score of 2 means roughly the 97.7th percentile, whether you're measuring heights, test scores, or stock returns. Standardization makes different distributions comparable.

Using Normal Tables

Before computers, statisticians used Z-tables — lookup tables for P(Z < z).

How to use them:

  1. Standardize your variable to get Z
  2. Look up Z in the table to get P(Z < z)
  3. Use symmetry and complements for other probabilities

Key facts for standard normal:

  • P(Z < 0) = 0.5 (half below the mean)
  • P(Z < 1.96) ≈ 0.975 → P(-1.96 < Z < 1.96) ≈ 0.95
  • P(Z < 2.58) ≈ 0.995 → P(-2.58 < Z < 2.58) ≈ 0.99

These numbers show up constantly: 1.96 for 95% confidence, 2.58 for 99% confidence.
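
In code, a table lookup becomes a call to the CDF, and a reverse lookup becomes a call to its inverse. The following is a small sketch with the standard library; `central_prob` and `critical_value` are illustrative helper names, not standard functions.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu=0, sigma=1

def central_prob(z):
    """P(-z < Z < z). By symmetry this equals 2 * P(Z < z) - 1."""
    return 2 * Z.cdf(z) - 1

def critical_value(confidence):
    """The z for which P(-z < Z < z) equals the given confidence level."""
    return Z.inv_cdf(0.5 + confidence / 2)

print(Z.cdf(0))                 # half the distribution lies below the mean
print(central_prob(1.96))       # close to 0.95
print(central_prob(2.58))       # close to 0.99

# Complement and symmetry: P(Z > 1.5) equals P(Z < -1.5).
print(1 - Z.cdf(1.5), Z.cdf(-1.5))

# Reverse lookup recovers the familiar critical values.
print(critical_value(0.95), critical_value(0.99))
```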

Why Everything Seems Normal

The normal distribution shows up so often because:

1. Central Limit Theorem (next lesson): Averages of many independent random variables tend toward normal, even if the original variables aren't normal.

2. Sum of independent normals is normal: If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) are independent, then X + Y ~ N(μ₁+μ₂, σ₁²+σ₂²).

3. Many natural processes are additive: Height depends on many genes, each contributing a small amount. Test scores depend on many small skills. The sum of many small independent effects is approximately normal.

4. Maximum entropy: Among all distributions with a given mean and variance, the normal distribution has maximum entropy (it makes the fewest additional assumptions).
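
The first and third reasons can be illustrated with a short simulation. This sketch averages draws from an exponential distribution, which is strongly skewed, and checks that the averages behave like a normal variable. The sample size of 50, the 5000 repetitions, and the seed are arbitrary assumptions chosen to keep the run fast and repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 (skewed)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n, reps = 50, 5000
means = [sample_mean(n) for _ in range(reps)]

center = statistics.fmean(means)   # should sit near the true mean, 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(n)

# If the averages are roughly normal, about 68% fall within one spread of center.
within_one = sum(abs(m - center) < spread for m in means) / reps

print(f"center={center:.3f} spread={spread:.3f} within one sd={within_one:.3f}")
```

Each individual draw is far from bell-shaped, yet the averages cluster symmetrically around 1 with roughly the 68% coverage the empirical rule predicts.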

But not everything is normal! Income (right-skewed, with a power-law upper tail), stock returns (fat tails), count data (Poisson), and proportions (Beta) are common counterexamples. Assuming normality when it doesn't hold leads to bad inferences.

Checking Normality

How do you know if your data is approximately normal?

1. Histogram / Density Plot: Should look roughly bell-shaped and symmetric.

2. Q-Q Plot (Quantile-Quantile): Plot sample quantiles against theoretical normal quantiles. The points should fall roughly on a straight line.

3. Numerical Tests: Shapiro-Wilk test, Kolmogorov-Smirnov test (but be careful: with large samples these tests reject even minor deviations).

4. Check skewness and kurtosis: A normal distribution has skewness ≈ 0 (symmetric) and kurtosis ≈ 3 (equivalently, excess kurtosis ≈ 0).

5. The 68-95-99.7 rule: Check whether the fractions of your data within 1, 2, and 3 standard deviations of the mean are close to these percentages.
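
Checks 4 and 5 are easy to sketch with the standard library alone. The helpers below (`standardized_moment`, `empirical_rule_fractions`) are hypothetical names written for this example, and both samples are simulated, so this is a rough diagnostic rather than a formal test.

```python
import random
import statistics

def standardized_moment(data, power):
    """Average of ((x - mean) / sd) ** power: power=3 is skewness, 4 is kurtosis."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** power for x in data)

def empirical_rule_fractions(data):
    """Fractions of the data within 1, 2, and 3 standard deviations of the mean."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return [sum(abs(x - m) <= k * s for x in data) / len(data) for k in (1, 2, 3)]

random.seed(1)  # fixed seed so the run is repeatable
normal_sample = [random.gauss(0, 1) for _ in range(10_000)]
skewed_sample = [random.expovariate(1.0) for _ in range(10_000)]

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    skew = standardized_moment(sample, 3)
    kurt = standardized_moment(sample, 4)
    fractions = empirical_rule_fractions(sample)
    print(name, round(skew, 2), round(kurt, 2), [round(f, 3) for f in fractions])
```

The simulated normal sample should show skewness near 0 and kurtosis near 3, while the exponential sample should fail both checks clearly.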

When Normality Fails

Income: Heavily right-skewed. Most people earn moderate amounts, but a few earn millions. Mean > Median (sign of skew).

Lifespan: Bimodal in historical data (infant mortality + old age).

Binary outcomes: Can't be normal — only two values!

Small counts: Discrete, not continuous. Use Poisson or Binomial instead.

When normality fails, use transformations (log, square root) or non-parametric methods (covered in Module 7).
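
As an illustration of the log transformation, the sketch below simulates right-skewed, income-like data and measures skewness before and after taking logs. The data are made up: the log-normal parameters (10.5 and 0.8) are assumptions chosen only to produce a long right tail, and `skewness` is a helper written for this example.

```python
import math
import random
import statistics

def skewness(data):
    """Average cubed z-score: near 0 for symmetric data, positive for right skew."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

random.seed(2)  # fixed seed so the run is repeatable
# Simulated income-like values (hypothetical, not real data).
incomes = [random.lognormvariate(10.5, 0.8) for _ in range(10_000)]
logged = [math.log(x) for x in incomes]

# Right skew shows up as mean > median on the raw scale.
print(statistics.fmean(incomes) > statistics.median(incomes))

print(f"skewness before log: {skewness(incomes):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

The transformation works well here because the simulated data are log-normal by construction; real data may only be partly straightened out by a log or square root.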

Linear Combinations

Beautiful property: Linear combinations of independent normals are normal.

If X ~ N(μₓ, σₓ²) and Y ~ N(μᵧ, σᵧ²) are independent:

$$aX + bY \sim N(a\mu_x + b\mu_y,\; a^2\sigma_x^2 + b^2\sigma_y^2)$$

Portfolio Returns

Stock A returns: μₐ = 8%, σₐ = 15%

Stock B returns: μᵦ = 6%, σᵦ = 10%

Portfolio: 60% in A, 40% in B (assuming independent)

Expected return: 0.6(8%) + 0.4(6%) = 7.2%

Variance: (0.6)²(15)² + (0.4)²(10)² = 81 + 16 = 97 (in %²)

Std dev: √97 ≈ 9.8%

The portfolio has lower risk than either stock alone (about 9.8% versus 15% for A and 10% for B). Diversification works!
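
The portfolio arithmetic can be checked in a few lines. Python's `statistics.NormalDist` supports scaling by a constant and adding two instances, and it treats the added instances as independent, which matches the assumption in this example. Modeling returns as normal is the lesson's simplification, not a claim about real markets.

```python
import math
from statistics import NormalDist

# Returns in percent, as given in the example.
stock_a = NormalDist(mu=8, sigma=15)
stock_b = NormalDist(mu=6, sigma=10)

# 60% in A and 40% in B; "+" between NormalDist objects assumes independence.
portfolio = 0.6 * stock_a + 0.4 * stock_b

# The same numbers from the linear-combination formula.
manual_mean = 0.6 * 8 + 0.4 * 6
manual_var = 0.6 ** 2 * 15 ** 2 + 0.4 ** 2 * 10 ** 2

print(portfolio.mean, manual_mean)
print(portfolio.stdev, math.sqrt(manual_var))
```

If the two stocks were correlated, a covariance term would enter the variance and this shortcut would no longer apply.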

Test your knowledge

For a normal distribution, approximately what percentage of data falls within μ ± 2σ?