Point Estimation

Learn about estimators, bias vs variance, consistency, the bias-variance tradeoff, and the Law of Large Numbers.

18 min read
Intermediate

From Sample to Population

We've established that we study samples to learn about populations. But how exactly do we go from a sample statistic (something we calculated) to a population parameter (something we want to know)?

This is the problem of estimation — using data to make our best guess about the truth.

Parameters vs Statistics

Parameter: A fixed (but usually unknown) value that describes a population. We use Greek letters: μ (population mean), σ (population standard deviation), p (population proportion).

Statistic: A value calculated from sample data, used to estimate a parameter. We use Latin letters: x̄ (sample mean), s (sample standard deviation), p̂ (sample proportion).

Think of it this way: the parameter is the target. The statistic is your arrow. Point estimation is the act of aiming and shooting a single arrow at the bullseye.

What Makes a Good Estimator?

Not all estimators are created equal. Here are the properties we want:

Unbiasedness. An estimator is unbiased if its expected value equals the parameter it's estimating:

E(\hat{\theta}) = \theta

The sample mean x̄ is an unbiased estimator of μ. On average, across many possible samples, x̄ hits the bullseye. Individual estimates may miss, but there's no systematic drift in either direction.

The sample variance computed with divisor n−1 (s²) is unbiased for σ². That's why we use n−1 instead of n — Bessel's correction removes the bias!
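Both claims are easy to check by simulation. A minimal sketch, assuming a uniform population on [0, 10] (so μ = 5 and σ² = 100/12 ≈ 8.33; the sample size, seed, and trial count are arbitrary choices for the demo), averages x̄ and both variance formulas across many samples:

```python
import random

random.seed(42)

# Hypothetical population: uniform on [0, 10], so mu = 5 and
# sigma^2 = 10^2 / 12 ≈ 8.33 (arbitrary choices for this demo).
mu, sigma2 = 5.0, 100.0 / 12.0
n, trials = 5, 200_000

sum_mean = 0.0    # running sum of sample means
sum_var_n = 0.0   # divisor n   (biased low)
sum_var_n1 = 0.0  # divisor n-1 (Bessel-corrected)

for _ in range(trials):
    sample = [random.uniform(0, 10) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    sum_mean += xbar
    sum_var_n += ss / n
    sum_var_n1 += ss / (n - 1)

print(round(sum_mean / trials, 1))    # close to mu = 5.0
print(round(sum_var_n / trials, 1))   # close to (n-1)/n * sigma^2 ≈ 6.67
print(round(sum_var_n1 / trials, 1))  # close to sigma^2 ≈ 8.33
```

The n-divisor version systematically underestimates σ² by the factor (n−1)/n, which is exactly what Bessel's correction cancels.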

Efficiency. Among all unbiased estimators, we prefer the one with the smallest variance — the one whose estimates cluster most tightly around the true value.

Consistency. As the sample size n → ∞, the estimator converges to the true parameter. More data → more accuracy. All good estimators should be consistent.
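Consistency is also visible in simulation. In the sketch below (a uniform population on [0, 10], so μ = 5; the sizes, seed, and trial count are assumptions for the demo), the average error of x̄ shrinks as n grows:

```python
import random

random.seed(0)
mu = 5.0  # mean of a uniform population on [0, 10] (demo assumption)

def mean_abs_error(n, trials=2000):
    """Average |x̄ - mu| over many simulated samples of size n."""
    total = 0.0
    for _ in range(trials):
        xbar = sum(random.uniform(0, 10) for _ in range(n)) / n
        total += abs(xbar - mu)
    return total / trials

errors = [mean_abs_error(n) for n in (10, 100, 1000)]
print([round(e, 3) for e in errors])  # each error smaller than the last
```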

The Bias-Variance Tradeoff

Here's a profound insight: low bias and low variance sometimes compete.

Imagine throwing darts at a bullseye:

  • Low bias, low variance: Darts cluster tightly around the center. The best case.
  • Low bias, high variance: Darts are centered on the bullseye but scattered widely.
  • High bias, low variance: Darts cluster tightly, but off-center. Precise but wrong.
  • High bias, high variance: Scattered and off-center. The worst case.

\text{Mean Squared Error} = \text{Bias}^2 + \text{Variance}

MSE captures the total error. Sometimes a slightly biased estimator with much lower variance has lower MSE than the unbiased one. This tradeoff becomes central in machine learning (regularization is intentionally adding bias to reduce variance).

In practice: for large samples, bias usually matters less (most common estimators are at least asymptotically unbiased). For small samples, the bias-variance tradeoff can be important, and a biased estimator may actually perform better.
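A quick simulation makes the tradeoff concrete. The sketch below (a normal population; μ, σ, n, the seed, and the trial count are all demo assumptions) compares the MSE of the unbiased n−1 variance estimator with the biased n-divisor version at a small sample size:

```python
import random

random.seed(1)

# Hypothetical normal population (all values are demo assumptions).
mu, sigma = 0.0, 2.0
true_var = sigma ** 2  # 4.0
n, trials = 5, 100_000

mse_unbiased = 0.0  # divisor n-1: zero bias, higher variance
mse_biased = 0.0    # divisor n:   biased low, lower variance

for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    mse_unbiased += (ss / (n - 1) - true_var) ** 2
    mse_biased += (ss / n - true_var) ** 2

# For a normal population, theory gives 2σ⁴/(n-1) = 8.0 for the
# unbiased estimator and (2n-1)σ⁴/n² = 5.76 for the biased one.
print(round(mse_unbiased / trials, 1))
print(round(mse_biased / trials, 1))
```

The biased estimator wins on total error here: its bias² penalty is smaller than the variance it saves.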

Common Point Estimators

| Parameter | Estimator | Unbiased? |
|---|---|---|
| Population mean μ | Sample mean x̄ = Σxᵢ/n | ✅ Yes |
| Population variance σ² | Sample variance s² = Σ(xᵢ-x̄)²/(n-1) | ✅ Yes |
| Population proportion p | Sample proportion p̂ = successes/n | ✅ Yes |
| Population std dev σ | Sample std dev s = √s² | ❌ Slightly biased (but consistent) |
| Population median | Sample median | ✅ Yes (for symmetric distributions) |
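The slight bias of s is worth seeing once. In this sketch (a normal population with σ = 3, plus an arbitrary n, seed, and trial count), the average of s lands below σ even though s² averages to σ²: the square root is concave, so E[√(s²)] < √(E[s²]) by Jensen's inequality.

```python
import math
import random

random.seed(2)

sigma = 3.0  # hypothetical population std dev (demo assumption)
n, trials = 5, 100_000

total_s = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # Bessel-corrected s: unbiased for sigma^2 before the square root,
    # slightly biased low for sigma after it.
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    total_s += s

# For normal samples with n = 5, E[s] ≈ 0.94 * sigma ≈ 2.82, below 3.0
print(round(total_s / trials, 2))
```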

Estimating a Proportion

A poll of 1,000 voters finds 540 support Candidate A.

Point estimate: p̂ = 540/1000 = 0.54

Our best single guess is that 54% of all voters support Candidate A.
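In code, the point estimate itself is a one-liner. The second half of this sketch is a sanity check that p̂ is unbiased: it simulates many polls from a hypothetical true p (set to 0.54 to match the estimate; the seed and trial count are also assumptions) and averages the resulting estimates.

```python
import random

# Point estimate from the poll in the text.
supporters, n = 540, 1000
p_hat = supporters / n
print(p_hat)  # 0.54

# Unbiasedness check: simulate many polls of size n from a
# hypothetical true p and average the resulting p-hats.
random.seed(3)
true_p, trials = 0.54, 5_000
avg = sum(
    sum(random.random() < true_p for _ in range(n)) / n
    for _ in range(trials)
) / trials
print(round(avg, 2))  # close to 0.54
```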

But how confident should we be? How close is 0.54 to the true p? That's the question confidence intervals (next lesson) will answer.

The Law of Large Numbers

As the sample size increases, the sample mean converges to the population mean:

\bar{X}_n \xrightarrow{P} \mu \quad \text{as } n \to \infty

This is the mathematical guarantee that more data = better estimates. It's why:

  • Casinos always win in the long run (even if gamblers sometimes win short-term)
  • Insurance companies can predict claims accurately across millions of policies
  • Batting averages stabilize over a season

Don't confuse LLN with the Gambler's Fallacy. LLN says averages converge over the long run. It does NOT say short-run deviations get "corrected." After 5 heads in a row, the coin doesn't "owe you" tails.

LLN says x̄ → μ, not that individual observations become more predictable. Each observation is still random. It's the average that stabilizes.
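The convergence can be watched directly. A minimal sketch, assuming a fair coin (heads = 1, so μ = 0.5; the seed and checkpoints are arbitrary), prints the running average as flips accumulate:

```python
import random

random.seed(4)

mu = 0.5  # fair coin: heads = 1, tails = 0, so the population mean is 0.5
flips = [random.randint(0, 1) for _ in range(100_000)]

running_total = 0
for i, flip in enumerate(flips, start=1):
    running_total += flip
    if i in (10, 1000, 100_000):
        # The running average drifts toward 0.5 as i grows,
        # even though each individual flip stays 50/50.
        print(i, round(running_total / i, 3))
```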
