Common Distributions
Learn the essential probability distributions: Bernoulli, Binomial, Poisson, Uniform, Exponential, and an introduction to the Normal distribution.
Why Named Distributions Matter
In theory, every random variable has its own unique distribution. But in practice, a handful of distributions show up everywhere. These aren't just mathematical curiosities — they're the building blocks of statistical models.
Learning these distributions is like learning the periodic table in chemistry. Once you recognize them, you can instantly solve problems that would otherwise take hours.
Discrete Distributions
These distributions model countable outcomes: number of successes, number of events, binary choices.
Bernoulli Distribution
Models a single trial with two outcomes: success (1) or failure (0).
Parameter: p = probability of success
PMF: P(X = 1) = p, P(X = 0) = 1-p
Mean: E[X] = p
Variance: Var(X) = p(1-p)
Examples: One coin flip, one free throw, one product passes quality check
A basketball player has a 70% free throw success rate.
X ~ Bernoulli(p = 0.7)
P(makes it) = 0.7
P(misses) = 0.3
E[X] = 0.7 (expected "successes" per shot)
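The free-throw example can be simulated in a few lines of Python (standard library only); this is a sketch using the example's p = 0.7, and the trial count is an arbitrary choice:

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 (success) with probability p, else 0."""
    return 1 if random.random() < p else 0

# Simulate 100,000 free throws at the example's p = 0.7
random.seed(42)
n = 100_000
makes = sum(bernoulli_trial(0.7) for _ in range(n))
print(f"empirical success rate: {makes / n:.3f}")  # close to E[X] = 0.7
```

By the law of large numbers, the empirical success rate converges to p as the number of trials grows.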
Binomial Distribution
Models the number of successes in n independent Bernoulli trials.
Parameters: n (number of trials), p (success probability)
PMF: P(X = k) = C(n,k) · p^k · (1-p)^(n-k)
Mean: E[X] = np
Variance: Var(X) = np(1-p)
Notation: X ~ Binomial(n, p)
Flip a fair coin 10 times. X = number of heads.
X ~ Binomial(n=10, p=0.5)
E[X] = 10·0.5 = 5 heads expected
P(X = 7) = C(10,7)·(0.5)^7·(0.5)^3 = 120·(0.5)^10 ≈ 0.117
About 11.7% chance of getting exactly 7 heads.
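The PMF formula translates directly to Python's standard library (`math.comb` for C(n,k)); a quick sketch that reproduces the coin-flip numbers above:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 10 fair coin flips: probability of exactly 7 heads
p7 = binomial_pmf(7, 10, 0.5)
print(f"P(X = 7) = {p7:.3f}")  # 120/1024 ≈ 0.117

# Mean check: sum of k · P(X = k) equals np = 5
mean = sum(k * binomial_pmf(k, 10, 0.5) for k in range(11))
print(f"E[X] = {mean:.1f}")  # 5.0
```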
Poisson Distribution
Models the number of events occurring in a fixed interval when events happen at a constant average rate, independently.
Parameter: λ (lambda) = average rate
PMF: P(X = k) = (λ^k · e^(-λ)) / k!
Mean: E[X] = λ
Variance: Var(X) = λ (mean equals variance!)
Notation: X ~ Poisson(λ)
A website gets an average of 3 visitors per minute.
X ~ Poisson(λ = 3)
P(X = 5 visitors in next minute) = (3^5 · e^(-3)) / 5! ≈ 0.10
P(X = 0) = e^(-3) ≈ 0.05 (5% chance of no visitors)
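The same two probabilities can be computed from the PMF with nothing beyond Python's `math` module; a minimal sketch using the example's λ = 3:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Website with an average of λ = 3 visitors per minute
print(f"P(X = 5) = {poisson_pmf(5, 3):.3f}")  # ≈ 0.101
print(f"P(X = 0) = {poisson_pmf(0, 3):.3f}")  # ≈ 0.050
```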
Common uses: Radioactive decay, phone calls to a call center, typos per page, rare diseases
Rule of thumb: Use Binomial when you have a fixed number of trials. Use Poisson when counting events over time/space with no clear "number of trials."
Continuous Distributions
These distributions model continuous outcomes measured on a scale: time, distance, temperature, weight.
Uniform Distribution
Every value in the interval [a, b] is equally likely.
Parameters: a (min), b (max)
PDF: f(x) = 1/(b-a) for a ≤ x ≤ b, 0 otherwise
Mean: E[X] = (a+b)/2
Variance: Var(X) = (b-a)²/12
Notation: X ~ Uniform(a, b)
A bus arrives uniformly between 9:00 and 9:10 AM.
X ~ Uniform(0, 10) (minutes after 9:00)
Expected arrival time: (0+10)/2 = 5 minutes = 9:05 AM
P(arrives before 9:03) = P(X < 3) = 3/10 = 0.3
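For a uniform distribution, the CDF is just the fraction of the interval covered; a short sketch that reproduces the bus example:

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Bus arriving uniformly in [0, 10] minutes after 9:00
print(uniform_cdf(3, 0, 10))  # P(arrives before 9:03) = 0.3
print((0 + 10) / 2)           # expected arrival: 5.0 minutes, i.e. 9:05
```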
Exponential Distribution
Models waiting time until the next event in a Poisson process.
Parameter: λ (rate parameter — same as Poisson!)
PDF: f(x) = λe^(-λx) for x ≥ 0
Mean: E[X] = 1/λ
Variance: Var(X) = 1/λ²
Key property: Memoryless — P(X > s+t | X > s) = P(X > t)
Notation: X ~ Exponential(λ)
Website visitors arrive at rate λ = 3 per minute (Poisson).
Time until next visitor ~ Exponential(λ = 3).
E[time until next visitor] = 1/3 minute = 20 seconds
P(wait more than 1 minute) = e^(-3·1) ≈ 0.05
Memoryless property: If you've already waited 30 seconds, the probability of waiting another 30 seconds is the same as it was initially. The past doesn't matter.
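The memoryless property can be verified directly from the survival function P(X > t) = e^(-λt); a sketch using the example's λ = 3 (the choice of s = t = 0.5 minutes is arbitrary):

```python
from math import exp

def exp_survival(t: float, lam: float) -> float:
    """P(X > t) for X ~ Exponential(lam)."""
    return exp(-lam * t)

lam = 3  # visitors per minute, matching the example
print(f"P(wait > 1 min) = {exp_survival(1, lam):.3f}")  # ≈ 0.050

# Memoryless: having already waited s, the chance of waiting t more
# is the same as the unconditional chance of waiting t.
s, t = 0.5, 0.5
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)  # P(X > s+t | X > s)
print(abs(conditional - exp_survival(t, lam)) < 1e-12)  # True
```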
Beautiful connection: If the number of events per unit time follows a Poisson(λ) distribution, then the waiting times between consecutive events follow an Exponential(λ) distribution. They're two sides of the same coin.
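That connection can be checked numerically: the sketch below (standard library only; the rate and horizon are assumed values) generates arrivals with Exponential(λ = 3) gaps via `random.expovariate`, then counts arrivals per minute. If the connection holds, the counts should have mean and variance both near 3, the Poisson signature.

```python
import random

# Arrivals with Exponential(λ = 3) inter-arrival times over 10,000 minutes
random.seed(0)
lam = 3.0
horizon = 10_000
counts = [0] * horizon  # arrivals observed in each 1-minute bucket
t = random.expovariate(lam)
while t < horizon:
    counts[int(t)] += 1
    t += random.expovariate(lam)

mean_c = sum(counts) / horizon
var_c = sum((c - mean_c) ** 2 for c in counts) / horizon
print(f"mean ≈ {mean_c:.2f}, variance ≈ {var_c:.2f}")  # both near 3
```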
Normal Distribution
The bell curve. The most important distribution in all of statistics.
Parameters: μ (mean), σ² (variance)
PDF: f(x) = (1/(σ√(2π))) · exp(-(x-μ)²/(2σ²))
Mean: E[X] = μ
Variance: Var(X) = σ²
Notation: X ~ Normal(μ, σ²) or N(μ, σ²)
Standard Normal: μ=0, σ=1, denoted N(0,1) or Z
The normal distribution is so important it gets its own dedicated lesson next. For now, know:
- It's symmetric around the mean
- About 68% of values fall within 1 standard deviation of the mean
- About 95% within 2 standard deviations
- About 99.7% within 3 standard deviations
Examples everywhere: Heights, test scores, measurement errors, and (by the Central Limit Theorem) sample means from any distribution.
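The 68-95-99.7 figures can be checked with the standard normal CDF, which follows from Python's `math.erf`; a minimal sketch:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Verify the 68-95-99.7 rule for the standard normal
for k in (1, 2, 3):
    within = normal_cdf(k) - normal_cdf(-k)
    print(f"within {k} sd: {within:.4f}")  # 0.6827, 0.9545, 0.9973
```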
How to Choose the Right Distribution
| Data Type | Description | Use This Distribution |
|---|---|---|
| Binary outcome | Single yes/no trial | Bernoulli |
| Count successes | Fixed number of yes/no trials | Binomial |
| Count events | Events over time/space | Poisson |
| Waiting time | Time until next event | Exponential |
| Anything bounded | Value in a known range | Uniform (if no other info) |
| Natural measurements | Heights, errors, aggregates | Normal |
The key questions:
- Discrete or continuous?
- Bounded or unbounded?
- Is there a natural structure (trials, events, time)?
- What do you know about mean/variance/shape?
Why These Distributions?
These aren't arbitrary. Each emerges naturally from real-world processes:
Bernoulli/Binomial arise from independent trials
Poisson emerges when events are rare and independent
Exponential comes from memoryless waiting
Normal appears via the Central Limit Theorem (averages of many independent values are approximately normal)
Understanding why a distribution applies is more important than memorizing formulas. The formulas follow from the assumptions.
In practice: the vast majority of statistical modeling builds on variations of the Normal, Binomial, and Poisson distributions. Master these three and you'll understand most of applied statistics.