Kekkei - Advancing Financial Science For Everyone

From Point Estimates to Intervals

Point estimates are useful but incomplete. "The average height is 175 cm" tells you the center, but not how confident you should be.

Confidence intervals quantify uncertainty. They say: "I'm 95% confident the true value is between 173 and 177 cm."

This moves us from "what's our best guess?" to "what range of values is plausible given the data?"

What is a Confidence Interval?

A confidence interval is a range of values, computed from sample data by a procedure designed to capture the true population parameter at a stated rate (the confidence level).

General form: Estimate ± (Critical value) × (Standard error)

The 95% confidence interval for a mean (σ known):

\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}

The 99% confidence interval:

\bar{x} \pm 2.58 \cdot \frac{\sigma}{\sqrt{n}}

Where:

x̄ is the sample mean
σ is the population standard deviation
n is the sample size
1.96 is the 97.5th percentile of the standard normal distribution (2.5% in each tail); 2.58 is the 99.5th percentile (0.5% in each tail)
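The formula above can be checked numerically. Here is a minimal Python sketch, assuming σ is known. The function name `z_interval` and the inputs (σ = 7 cm, n = 30) are illustrative assumptions rather than values given in this section. It looks up the critical value with `statistics.NormalDist` from the standard library instead of hard-coding 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Illustrative inputs (assumed for this sketch, not taken from real data).
low, high = z_interval(x_bar=175, sigma=7, n=30)
print(f"95% CI: [{low:.1f}, {high:.1f}]")

low99, high99 = z_interval(x_bar=175, sigma=7, n=30, confidence=0.99)
print(f"99% CI: [{low99:.1f}, {high99:.1f}]")
```

Passing `confidence=0.99` reproduces the 2.58 multiplier, so one function covers both formulas.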

The Correct Interpretation

This is one of the most commonly misunderstood concepts in statistics. Here's what a 95% CI means and doesn't mean:

CORRECT: "If we repeated this study many times and constructed a 95% CI each time, about 95% of those intervals would contain the true population mean."

WRONG: "There's a 95% probability the true mean is in this interval."

Why wrong? The true mean μ is fixed (not random). It either is or isn't in your interval. The interval is what's random.

Understanding the Frequentist Interpretation

Imagine 100 researchers each sample 30 people and construct 95% CIs for average height.

Researcher 1: [173, 177]
Researcher 2: [174, 178]
Researcher 3: [171, 175]
...
Researcher 100: [175, 179]

About 95 of these 100 intervals will contain the true μ. About 5 will miss it (bad luck in sampling).

Your specific interval is just one realization. You don't know if you're in the lucky 95 or unlucky 5.

Practical translation: "I used a procedure that captures the truth 95% of the time. I'm reasonably confident my interval caught it."
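The "many researchers" thought experiment can be simulated. The sketch below is one possible setup, not a definitive one: it assumes heights are normally distributed with a true mean of 175 cm and a known σ of 7 cm (both values are assumptions made for the simulation), draws many samples of 30, and counts how often the 95% interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_mu=175.0, sigma=7.0, n=30, trials=10_000, seed=0):
    """Fraction of simulated 95% z-intervals that contain true_mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.975)
    margin = z * sigma / sqrt(n)  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= true_mu <= x_bar + margin:
            hits += 1
    return hits / trials


print(f"Share of intervals containing mu: {coverage():.3f}")
```

The printed share should land close to 0.95. Each simulated interval either contains μ or misses it; the 95% describes the procedure across all runs.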

Confidence Level vs Width

Tradeoff Between Confidence and Precision

Confidence Level	Critical Value (Z)	Width	Interpretation
90%	1.645	Narrower	Less confident, more precise
95%	1.96	Medium	Standard choice
99%	2.58	Wider	More confident, less precise
99.9%	3.29	Very wide	Very confident, imprecise

The tradeoff: Higher confidence → wider interval. You can't have both maximum confidence and maximum precision.

Example:

90% CI: [174, 176] — narrow but only 90% confident
99% CI: [172, 178] — wide but 99% confident

It's like casting a fishing net: wider net (higher confidence) catches more fish (captures μ more often), but tells you less about exactly where the fish are.
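The critical values in the table can be reproduced from the standard normal distribution. This is a small sketch using Python's standard library; the helper name `z_critical` is an assumption made for illustration. It also prints each interval's width relative to the 95% interval, which holds for any fixed σ and n.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95, 0.99, 0.999):
    z = z_critical(level)
    relative_width = z / z_critical(0.95)
    print(f"{level:.1%}: z = {z:.3f}, width vs 95% interval = {relative_width:.2f}x")
```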

The t-Distribution (σ Unknown)

In reality, we rarely know the population standard deviation σ. We estimate it with the sample standard deviation s.

This introduces additional uncertainty, so we use the t-distribution instead of the normal distribution.

The t-distribution is similar to the normal distribution but has heavier tails (more probability in the extremes).

Parameter: degrees of freedom (df) = n - 1

As n increases, t approaches the normal distribution.

\text{95\% CI with unknown } \sigma: \quad \bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}

Where t_{α/2, n-1} is the critical value from the t-distribution with n-1 degrees of freedom, and α = 0.05 for a 95% interval.

Key differences from Z:

Small samples (n < 30): t critical values are larger than Z → wider CIs (accounting for uncertainty in s)
Large samples (n ≥ 100): t ≈ Z (1.96 for 95%)
n = 10, 95% CI: t = 2.26 (vs Z = 1.96)
n = 5, 95% CI: t = 2.78 (even wider!)
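To see where these t values come from, the sketch below computes them with only the standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. This is an illustrative approach under those assumptions, and the function names are made up for this example. In practice a statistics library would normally be used instead (for instance, SciPy provides `scipy.stats.t.ppf`).

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma function for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, via Simpson's rule on [0, x] plus symmetry."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it holds the target
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (5, 10, 30, 100):
    print(f"n = {n:>3}: t = {t_critical(0.95, n - 1):.3f}")
```

The values shrink toward 1.96 as n grows, which is the convergence to the normal distribution described above.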

Small Sample CI

Measure 10 people's heights:

x̄ = 175 cm
s = 7 cm
n = 10, df = 9

95% CI using t-distribution (t₀.₀₂₅,₉ = 2.26):

175 ± 2.26 · (7/√10)
= 175 ± 2.26 · 2.21
= 175 ± 5.0
= [170, 180]

If we incorrectly used Z = 1.96: 175 ± 4.3 → [170.7, 179.3] — too narrow! Underestimates uncertainty.
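The same comparison in code, as a minimal sketch: the helper `interval` is a hypothetical name, and the critical values 2.26 and 1.96 are simply the ones quoted in this example, passed in by hand.

```python
from math import sqrt


def interval(x_bar, s, n, critical):
    """Interval of the form x_bar ± critical * s / sqrt(n)."""
    margin = critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Summary statistics from the height example: x_bar = 175, s = 7, n = 10.
t_low, t_high = interval(175, 7, 10, critical=2.26)  # t with df = 9
z_low, z_high = interval(175, 7, 10, critical=1.96)  # z, wrong when sigma is estimated

print(f"t interval: [{t_low:.1f}, {t_high:.1f}], width {t_high - t_low:.1f}")
print(f"z interval: [{z_low:.1f}, {z_high:.1f}], width {z_high - z_low:.1f}")
```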

Confidence Intervals for Proportions

For proportions (polls, success rates), we use a different formula:

\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Where p̂ = sample proportion = x/n (number of successes / sample size)

Election Poll

Poll 400 voters: 220 support Candidate A.

p̂ = 220/400 = 0.55 (55%)

95% CI:

0.55 ± 1.96√(0.55·0.45/400)
= 0.55 ± 1.96√(0.0006188)
= 0.55 ± 1.96(0.0249)
= 0.55 ± 0.049
= [0.501, 0.599], or [50.1%, 59.9%]

Margin of error: ±4.9 percentage points

Headline: "Candidate A leads with 55% support, ±5% margin of error"

Why polls say "±3%": With n ≈ 1000, the margin of error for 95% CI is roughly ±3%. This is where that ubiquitous number comes from!
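Both the poll calculation and the "±3%" rule of thumb can be reproduced with a few lines. This sketch implements the normal-approximation formula shown above (often called the Wald interval); the function name `proportion_interval` is an assumption for illustration. For the n = 1000 case it uses p̂ = 0.5, which gives the largest possible margin.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (p_hat, margin of error) using the normal approximation."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Election poll from the text: 220 of 400 voters support Candidate A.
p_hat, margin = proportion_interval(220, 400)
print(f"p_hat = {p_hat:.3f}, 95% CI = [{p_hat - margin:.3f}, {p_hat + margin:.3f}]")

# Worst case (p_hat = 0.5) for a poll of 1000 respondents.
_, margin_1000 = proportion_interval(500, 1000)
print(f"Margin of error at n = 1000: ±{margin_1000:.1%}")
```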

Factors Affecting CI Width

The width of a confidence interval depends on three factors:

1. Confidence level (chosen by the researcher): higher confidence → wider interval.

2. Sample size n: larger n → narrower interval. The width is proportional to 1/√n.

3. Population variability σ: more variable data → wider interval.

You control n and confidence level. σ is a property of the population.

Shrinking the Interval

Current CI: 175 ± 5 (width = 10)

Want to cut the width in half, to ±2.5? You need about 4× the sample size, because the margin of error scales with 1/√n.

Want 99% instead of 95% confidence? The critical value changes from 1.96 to 2.58, which makes the interval about 31% wider.

Can't change σ — it's inherent to what you're measuring.
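These two effects can be verified directly. The sketch below uses the σ-known formula for simplicity, with σ = 7 and n = 30 as assumed illustrative inputs. With a t critical value the 4× rule is only approximate, since the critical value itself changes slightly with n.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of the z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


base = margin_of_error(sigma=7, n=30)
quadrupled = margin_of_error(sigma=7, n=120)  # 4x the sample size
higher_conf = margin_of_error(sigma=7, n=30, confidence=0.99)

print(f"n = 30:  ±{base:.2f}")
print(f"n = 120: ±{quadrupled:.2f} (ratio {base / quadrupled:.2f})")
print(f"99% vs 95% at n = 30: {higher_conf / base - 1:.1%} wider")
```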

Common Mistakes

1. "95% chance μ is in the interval": Wrong. The interval is random, μ is fixed. Correct: "95% of intervals constructed this way contain μ."

2. Confusing confidence level with probability: A 95% CI of [170, 180] doesn't mean P(170 < μ < 180) = 0.95. That would be a Bayesian statement, which a frequentist interval does not provide.

3. Thinking wider intervals are worse: Wider intervals reflect reality (more uncertainty). Artificially narrow intervals are overconfident and misleading.

4. Ignoring assumptions: CIs assume random sampling, independence, and (for small n) normality. Violations make intervals unreliable.

5. Using CIs to test hypotheses incorrectly: "The 95% CI doesn't include 0, so the effect is significant at the 5% level" is correct for a two-sided test. But CIs and hypothesis tests address slightly different questions: a test asks whether one specific value is compatible with the data, while a CI reports the whole range of compatible values.
