Sampling Methods

Master random, stratified, cluster, and systematic sampling. Understand selection bias, non-response bias, and survivorship bias.


Why Sampling Matters

You almost never get to observe an entire population. Instead, you take a sample and try to generalize. But the way you sample determines whether your conclusions are valid or garbage.

A perfectly analyzed biased sample is worse than a rough analysis of a well-collected sample. Sampling is where statistics begins.

Simple Random Sampling

Every member of the population has an equal chance of being selected, and every possible sample of size n is equally likely.

This is the gold standard: the method that most textbook statistical formulas assume.

How to do it: Assign a number to each population member, then use a random number generator to select n numbers.

Pros: Unbiased, simple, all theory applies directly. Cons: Requires a complete list of the population (sampling frame), which may not exist. May miss small subgroups by chance.
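The "number everyone, then draw n numbers" recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the frame of 10,000 numbered members, the function name simple_random_sample, and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded only to make the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: members numbered 1..10,000
frame = list(range(1, 10_001))
sample = simple_random_sample(frame, n=100, seed=42)
print(len(sample), len(set(sample)))   # → 100 100 (no member is picked twice)
```

Note that the code needs the full list up front, which is exactly the "complete sampling frame" requirement listed under Cons.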

Stratified Sampling

Divide the population into strata (non-overlapping subgroups), then take a random sample from each stratum.

Example: Stratified Sampling

Surveying a university with 60% undergrads and 40% grad students:

  • Stratum 1: Undergraduates → randomly sample 600
  • Stratum 2: Graduate students → randomly sample 400

This guarantees both groups are represented in proportion to their share of the population. With SRS the split is left to chance, so a small sample can badly over-represent one group by bad luck.

Pros: Guarantees representation of important subgroups, often more precise than SRS. Cons: Must know the strata in advance, more complex to implement.
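The university example can be sketched as proportional allocation: each stratum contributes a share of the sample equal to its share of the population. The stratum sizes (12,000 and 8,000), the member IDs, and the function name stratified_sample are hypothetical, chosen only so the 60/40 split works out to 600 and 400.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: sample each stratum in proportion to its size.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding may make the overall total differ slightly from total_n
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)  # SRS within the stratum
    return sample

# Hypothetical university: 12,000 undergrads (60%) and 8,000 grad students (40%)
strata = {
    "undergrad": [f"U{i}" for i in range(12_000)],
    "grad": [f"G{i}" for i in range(8_000)],
}
picked = stratified_sample(strata, total_n=1_000, seed=0)
print({name: len(members) for name, members in picked.items()})
# → {'undergrad': 600, 'grad': 400}
```

Other allocation rules exist (for example, oversampling a small stratum and reweighting afterwards); proportional allocation is simply the one the example describes.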

Cluster Sampling

Divide the population into clusters (usually geographic or organizational), randomly select some clusters, then sample everyone (or a random subset) within chosen clusters.

Example: Cluster Sampling

Studying school performance across a country:

  • Clusters = individual schools
  • Randomly select 50 schools
  • Test all students in those 50 schools

Much cheaper than traveling to every school in the country!

Pros: Practical and cost-effective when populations are spread geographically. Cons: Usually less precise than SRS at the same sample size, because members within a cluster tend to be similar; matching the precision of SRS requires more total observations.
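A rough sketch of the school example as one-stage cluster sampling: whole clusters are drawn at random and everyone inside them is kept. The country of 1,000 schools with 100 students each, the ID strings, and the function name cluster_sample are assumptions for illustration only.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep all members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical country: 1,000 schools with 100 students each
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(100)]
    for i in range(1_000)
}
selected = cluster_sample(schools, n_clusters=50, seed=1)
print(len(selected), sum(len(students) for students in selected.values()))
# → 50 5000
```

Only the list of schools has to be known in advance, not a list of every student, which is where much of the cost saving comes from. Sampling a random subset within each chosen school would turn this into two-stage cluster sampling.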

Stratified vs Cluster: In stratified sampling, you sample FROM every group. In cluster sampling, you sample entire groups and skip others. Stratified → maximize precision. Cluster → minimize cost.

Systematic Sampling

Select every kth member from a list after a random starting point. For example, to draw 100 people from a list of 10,000, set k = 10,000 / 100 = 100, pick a random start between 1 and 100 (say 7), and then take every 100th person: 7, 107, 207, 307...

Pros: Easy to implement, ensures spread across the list. Cons: Can be biased if the list has a periodic pattern that aligns with k. For example, taking every 7th record from daily data always lands on the same day of the week.
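The "random start, then every kth member" rule can be sketched with list slicing. The function name systematic_sample and the list of 10,000 positions are assumptions for the example, and the sketch takes k = len(items) // n, which presumes the list is at least as long as the requested sample.

```python
import random

def systematic_sample(items, n, seed=None):
    """Take every k-th item after a random start, with k = len(items) // n."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    rng = random.Random(seed)
    k = len(items) // n           # sampling interval
    start = rng.randrange(k)      # random 0-based offset in [0, k)
    return items[start::k][:n]    # every k-th item from the start, capped at n

people = list(range(1, 10_001))   # hypothetical list, positions 1..10,000
sample = systematic_sample(people, n=100, seed=7)
print(len(sample), sample[1] - sample[0])   # → 100 100
```

Only the starting offset is random; once it is fixed, the rest of the sample is determined, which is why a periodic pattern in the list can bias the result.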

Sampling Bias: When Things Go Wrong

Bias means your sample systematically misrepresents the population. No amount of statistical analysis can fix a biased sample.

Selection bias: The sampling method systematically excludes certain population members. Example: Online surveys exclude people without internet access, who are often poorer, older, or rural.

Non-response bias: People who respond differ systematically from those who don't. Example: In customer satisfaction surveys, angry customers and very happy ones respond; the indifferent majority doesn't.

Survivorship bias: Only the "survivors" are observed, while those who dropped out or failed are ignored. This might be the most insidious bias of all.

WWII Survivorship Bias

During WWII, the military examined bullet holes on returning bombers to decide where to add armor. They found holes concentrated on the wings and fuselage.

Naive conclusion: Armor the wings and fuselage (where the holes are).

Statistician Abraham Wald's insight: You're only seeing planes that survived! The holes show where planes CAN take damage and still return. The missing holes are on engines and cockpits — planes hit there didn't come back.

Correct conclusion: Armor where there are NO holes on surviving planes.

This is survivorship bias: drawing conclusions from an incomplete picture because you can't see the failures.

More Real-World Bias Examples

  • Mutual fund advertising: "Our top 5 funds returned 30%!" (What about the 20 funds that were quietly shut down?)
  • Successful entrepreneurs: "I dropped out and made billions!" (What about the millions who dropped out and struggled?)
  • Medical studies: Patients who complete a drug trial may be healthier than those who dropped out due to side effects.
  • Yelp reviews: Extreme experiences (great or terrible) get reviewed; average experiences don't.

Whenever you see data, ask: "Who is MISSING from this dataset?" The absent data often tells a more important story than the present data.
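A small simulation can make the "who is missing?" question concrete, using the mutual fund case as a model. Everything here is an assumption for illustration: returns are drawn from a normal distribution with mean 5% and standard deviation 15%, and any fund returning less than -5% is treated as closed and dropped from the record.

```python
import random
from statistics import mean

def simulate_funds(n_funds=1_000, cutoff=-0.05, seed=0):
    """Compare the average return of all funds with that of the survivors only."""
    rng = random.Random(seed)
    # Assumed return model: normal, mean 5%, standard deviation 15%
    returns = [rng.gauss(0.05, 0.15) for _ in range(n_funds)]
    # Funds below the cutoff are shut down and vanish from the visible data
    survivors = [r for r in returns if r >= cutoff]
    return mean(returns), mean(survivors), len(survivors)

all_avg, survivor_avg, n_left = simulate_funds()
print(f"all funds: {all_avg:.1%}, survivors only: {survivor_avg:.1%} ({n_left} funds left)")
```

Because the dropped funds are exactly the worst performers, the survivors' average comes out higher than the true average across all funds, even though no individual number was misreported.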

Choosing a Sampling Method

Sampling Methods Compared

Method          Best When                                    Precision   Cost
Simple Random   Complete population list available           High        Moderate
Stratified      Important subgroups exist                    Highest     Higher
Cluster         Population is geographically spread          Lower       Lowest
Systematic      Ordered list available                       Moderate    Low
Convenience     Quick informal insight (NOT for inference!)  Lowest      Lowest
