Common Distributions

Learn the essential probability distributions: Bernoulli, Binomial, Poisson, Uniform, Exponential, and an introduction to the Normal distribution.

25 min read
Intermediate

Why Named Distributions Matter

In theory, every random variable has its own unique distribution. But in practice, a handful of distributions show up everywhere. These aren't just mathematical curiosities — they're the building blocks of statistical models.

Learning these distributions is like learning the periodic table in chemistry. Once you recognize them, you can instantly solve problems that would otherwise take hours.

Discrete Distributions

These distributions model countable outcomes: number of successes, number of events, binary choices.

Bernoulli Distribution

Models a single trial with two outcomes: success (1) or failure (0).

Parameter: p = probability of success

PMF: P(X = 1) = p, P(X = 0) = 1-p

Mean: E[X] = p
Variance: Var(X) = p(1-p)

Examples: One coin flip, one free throw, one product passes quality check

Bernoulli in Action

A basketball player has a 70% free throw success rate.

X ~ Bernoulli(p = 0.7)

P(makes it) = 0.7
P(misses) = 0.3
E[X] = 0.7 (expected "successes" per shot)
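The numbers above can be checked directly, and a quick simulation confirms them. A minimal sketch using only the Python standard library (the 0.7 success rate comes from the example; the seed and sample size are arbitrary choices):

```python
import random

p = 0.7  # free-throw success probability from the example

# Theoretical mean and variance of Bernoulli(p)
mean = p                 # E[X] = p = 0.7
variance = p * (1 - p)   # Var(X) = p(1-p) = 0.21

# Simulate many free throws: each shot is 1 (make) with probability p
random.seed(42)
shots = [1 if random.random() < p else 0 for _ in range(100_000)]
simulated_mean = sum(shots) / len(shots)  # should be close to 0.7
```

The simulated mean converges to p as the number of trials grows, which is exactly what E[X] = p promises.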

Binomial Distribution

Models the number of successes in n independent Bernoulli trials.

Parameters: n (number of trials), p (success probability)

PMF: P(X = k) = C(n,k) · p^k · (1-p)^(n-k)

Mean: E[X] = np
Variance: Var(X) = np(1-p)

Notation: X ~ Binomial(n, p)

Binomial Example

Flip a fair coin 10 times. X = number of heads.

X ~ Binomial(n=10, p=0.5)

E[X] = 10·0.5 = 5 heads expected

P(X = 7) = C(10,7)·(0.5)^7·(0.5)^3 = 120·(0.5)^10 ≈ 0.117

About 11.7% chance of getting exactly 7 heads.
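The PMF formula can be evaluated directly with the standard library; a minimal sketch using the example's n = 10 and p = 0.5:

```python
from math import comb

n, p = 10, 0.5  # 10 fair coin flips

def binom_pmf(k, n, p):
    """P(X = k) = C(n,k) * p^k * (1-p)^(n-k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p7 = binom_pmf(7, n, p)  # 120 / 1024 ≈ 0.117
mean = n * p             # E[X] = np = 5 heads expected

# Sanity check: the PMF over all possible outcomes sums to 1
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
```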

Poisson Distribution

Models the number of events occurring in a fixed interval when events happen at a constant average rate, independently.

Parameter: λ (lambda) = average rate

PMF: P(X = k) = (λ^k · e^(-λ)) / k!

Mean: E[X] = λ
Variance: Var(X) = λ (mean equals variance!)

Notation: X ~ Poisson(λ)

Poisson Example

A website gets an average of 3 visitors per minute.

X ~ Poisson(λ = 3)

P(X = 5 visitors in next minute) = (3^5 · e^(-3)) / 5! ≈ 0.10

P(X = 0) = e^(-3) ≈ 0.05 (5% chance of no visitors)
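Both probabilities follow from plugging into the PMF; a minimal sketch using the example's rate λ = 3:

```python
from math import exp, factorial

lam = 3.0  # average visitors per minute

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

p5 = poisson_pmf(5, lam)  # ≈ 0.10
p0 = poisson_pmf(0, lam)  # e^(-3) ≈ 0.05

# Check that the mean recovered from the PMF equals lam
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
```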

Common uses: Radioactive decay, phone calls to a call center, typos per page, rare diseases

Rule of thumb: Use Binomial when you have a fixed number of trials. Use Poisson when counting events over time/space with no clear "number of trials."

Continuous Distributions

These distributions model continuous outcomes measured on a scale: time, distance, temperature, weight.

Uniform Distribution

Every value in the interval [a, b] is equally likely.

Parameters: a (min), b (max)

PDF: f(x) = 1/(b-a) for a ≤ x ≤ b, 0 otherwise

Mean: E[X] = (a+b)/2
Variance: Var(X) = (b-a)²/12

Notation: X ~ Uniform(a, b)

Uniform Example

A bus arrives uniformly between 9:00 and 9:10 AM.

X ~ Uniform(0, 10) (minutes after 9:00)

Expected arrival time: (0+10)/2 = 5 minutes = 9:05 AM

P(arrives before 9:03) = P(X < 3) = 3/10 = 0.3
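For a uniform distribution, probabilities are just proportions of the interval, so the CDF is a one-line formula. A minimal sketch with the bus example's a = 0, b = 10:

```python
a, b = 0.0, 10.0  # minutes after 9:00

def uniform_cdf(x, a, b):
    """P(X <= x) for Uniform(a, b): the fraction of the interval below x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

mean = (a + b) / 2             # 5.0 minutes, i.e. 9:05 AM
variance = (b - a) ** 2 / 12   # 100/12 ≈ 8.33
p_before_903 = uniform_cdf(3, a, b)  # P(X < 3) = 0.3
```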

Exponential Distribution

Models waiting time until the next event in a Poisson process.

Parameter: λ (rate parameter — same as Poisson!)

PDF: f(x) = λe^(-λx) for x ≥ 0

Mean: E[X] = 1/λ
Variance: Var(X) = 1/λ²

Key property: Memoryless — P(X > s+t | X > s) = P(X > t)

Notation: X ~ Exponential(λ)

Exponential Example

Website visitors arrive at rate λ = 3 per minute (Poisson).
Time until next visitor ~ Exponential(λ = 3).

E[time until next visitor] = 1/3 minute = 20 seconds

P(wait more than 1 minute) = e^(-3·1) ≈ 0.05

Memoryless property: If you've already waited 30 seconds, the probability of waiting another 30 seconds is the same as it was initially. The past doesn't matter.
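The memoryless property can be verified numerically from the survival function P(X > x) = e^(-λx). A minimal sketch with the example's λ = 3 (the 0.5-minute wait used in the check is an arbitrary illustrative choice):

```python
from math import exp

lam = 3.0  # visitors per minute

def expon_sf(x, lam):
    """Survival function P(X > x) = e^(-lam * x)."""
    return exp(-lam * x)

mean_wait = 1 / lam                   # 1/3 minute = 20 seconds
p_wait_over_1min = expon_sf(1, lam)   # e^(-3) ≈ 0.05

# Memoryless: P(X > 0.5 + 0.5 | X > 0.5) should equal P(X > 0.5)
conditional = expon_sf(1.0, lam) / expon_sf(0.5, lam)
unconditional = expon_sf(0.5, lam)
```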

Beautiful connection: If the number of events per interval follows a Poisson(λ) distribution, then the waiting times between events follow an Exponential(λ) distribution. They're two sides of the same coin.
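This connection can be checked empirically: draw Exponential(λ) gaps between events, then count events per unit interval and confirm the counts average λ. A minimal sketch using the Python standard library (λ = 3, the seed, and the 10,000-minute horizon are all illustrative choices):

```python
import random

lam = 3.0       # event rate per minute
horizon = 10_000  # total minutes to simulate
random.seed(0)

# Place events on a timeline using Exponential(lam) inter-arrival gaps
t, arrivals = 0.0, []
while t < horizon:
    t += random.expovariate(lam)  # one exponential waiting time
    arrivals.append(t)

# Count events in each 1-minute window: these counts behave like Poisson(lam)
counts = [0] * horizon
for t in arrivals:
    if t < horizon:
        counts[int(t)] += 1

avg_count = sum(counts) / len(counts)  # should be close to lam = 3
```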

Normal Distribution

The bell curve. The most important distribution in all of statistics.

Parameters: μ (mean), σ² (variance)

PDF: f(x) = (1/(σ√(2π))) · exp(-(x-μ)²/(2σ²))

Mean: E[X] = μ
Variance: Var(X) = σ²

Notation: X ~ Normal(μ, σ²) or N(μ, σ²)

Standard Normal: μ=0, σ=1, denoted N(0,1) or Z

The normal distribution is so important it gets its own dedicated lesson next. For now, know:

  • It's symmetric around the mean
  • About 68% of data falls within 1 standard deviation of the mean
  • About 95% within 2 standard deviations
  • About 99.7% within 3 standard deviations

Examples everywhere: Heights, test scores, measurement errors, and (by the Central Limit Theorem) sample means from any distribution.
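The 68-95-99.7 rule can be verified from the normal CDF, which the standard library expresses via the error function. A minimal sketch for the standard normal N(0, 1):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# The 68-95-99.7 rule: probability mass within k standard deviations
within_1 = normal_cdf(1) - normal_cdf(-1)  # ≈ 0.6827
within_2 = normal_cdf(2) - normal_cdf(-2)  # ≈ 0.9545
within_3 = normal_cdf(3) - normal_cdf(-3)  # ≈ 0.9973
```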

How to Choose the Right Distribution

Distribution Selection Guide
Data Type            | Description                    | Use This Distribution
---------------------|--------------------------------|---------------------------
Binary outcome       | Single yes/no trial            | Bernoulli
Count successes      | Fixed number of yes/no trials  | Binomial
Count events         | Events over time/space         | Poisson
Waiting time         | Time until next event          | Exponential
Anything bounded     | Value in a known range         | Uniform (if no other info)
Natural measurements | Heights, errors, aggregates    | Normal

The key questions:

  1. Discrete or continuous?
  2. Bounded or unbounded?
  3. Is there a natural structure (trials, events, time)?
  4. What do you know about mean/variance/shape?

Why These Distributions?

These aren't arbitrary. Each emerges naturally from real-world processes:

Bernoulli/Binomial arise from independent trials
Poisson emerges when events are rare and independent
Exponential comes from memoryless waiting
Normal appears via the Central Limit Theorem (averaging many independent quantities gives you something approximately normal!)

Understanding why a distribution applies is more important than memorizing formulas. The formulas follow from the assumptions.

In practice: 90% of statistical modeling uses variations of Normal, Binomial, and Poisson. Master these three and you'll understand most of applied statistics.

Test your knowledge


You flip a coin 20 times and count heads. Which distribution models this?