Random Variables
Learn what random variables are, the difference between discrete and continuous, PMF vs PDF vs CDF, expected value, and variance.
From Outcomes to Numbers
Probability theory started with events: "rain," "heads," "winning." But to do mathematics with these outcomes, we need to assign them numbers. That's what random variables do.
A random variable is a function that maps outcomes of a random process to numerical values. It's the bridge between the abstract world of probability and the concrete world of data analysis.
The Formal Definition
A random variable is a function X: S → ℝ that assigns a real number to each outcome in the sample space S.
It's called "random" because the outcome is uncertain, and "variable" because it takes different values depending on the outcome.
Flip two coins. Sample space: S = {HH, HT, TH, TT}
Define X = "number of heads"
- X(HH) = 2
- X(HT) = 1
- X(TH) = 1
- X(TT) = 0
X is a random variable mapping outcomes to {0, 1, 2}.
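This mapping can be written out directly in Python (the names `sample_space` and `X` are our own, chosen for illustration):

```python
# Map each outcome of two coin flips to X = number of heads
sample_space = ["HH", "HT", "TH", "TT"]
X = {outcome: outcome.count("H") for outcome in sample_space}
print(X)  # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```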
Why this matters: Once we have random variables, we can ask quantitative questions:
- What's the average number of heads?
- What's the probability that X ≥ 1?
- How spread out are the values?
These questions form the foundation of data analysis.
Discrete vs Continuous Random Variables
| Type | Values | Example | Probability |
|---|---|---|---|
| Discrete | Countable (finite or infinite) | Die roll, number of customers, coin flips | P(X = x) |
| Continuous | Uncountable (intervals on the real line) | Height, temperature, time | P(a < X < b) |
Key difference: For discrete RVs, individual values have positive probability. For continuous RVs, P(X = x) = 0 for any specific x; we only talk about probabilities over intervals.
Why? There are infinitely many values in any interval, so each specific value has probability zero. (This is a deep measure-theoretic fact, but the intuition is: probability spreads over infinite possibilities.)
Probability Mass Function (PMF)
The probability mass function p(x) gives the probability that X equals x:
p(x) = P(X = x)
Properties:
- p(x) ≥ 0 for all x
- Σ p(x) = 1 (sum over all possible values)
X = result of rolling a fair six-sided die
PMF: p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
p(7) = 0 (impossible)
This fully describes the probability distribution of X.
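A quick sketch of this PMF in Python, using exact fractions to make the properties easy to verify (the helper `p` is our own naming):

```python
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def p(x):
    """P(X = x); returns 0 for impossible values."""
    return pmf.get(x, Fraction(0))

assert all(p(x) >= 0 for x in range(0, 8))  # non-negativity
assert sum(pmf.values()) == 1               # probabilities sum to 1
print(p(3), p(7))  # 1/6 0
```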
Probability Density Function (PDF)
The probability density function f(x) describes the relative likelihood of X taking values near x.
Important: f(x) is NOT P(X = x). Instead:
P(a < X < b) = ∫ₐᵇ f(x) dx
The probability is the area under the curve between a and b.
Properties:
- f(x) ≥ 0 for all x
- ∫₋∞^∞ f(x) dx = 1 (total area = 1)
- f(x) can be > 1 (it's a density, not a probability!)
X is equally likely to be anywhere between 0 and 1.
PDF: f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere
P(0.2 < X < 0.5) = ∫₀.₂^0.5 1 dx = 0.5 - 0.2 = 0.3
The probability is the width of the interval because the height is 1.
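The "area under the curve" idea can be checked numerically with a simple midpoint Riemann sum (a sketch in plain Python, not a library routine):

```python
# Uniform density on [0, 1]
def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0

# Approximate P(0.2 < X < 0.5) as the area under f between 0.2 and 0.5
a, b, n = 0.2, 0.5, 100_000
dx = (b - a) / n
prob = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
print(round(prob, 4))  # 0.3
```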
Common confusion: "The PDF is 2 at x = 3, so P(X = 3) = 2?" No! f(3) = 2 means the density is high near x = 3, making P(2.99 < X < 3.01) relatively large. But P(X = exactly 3) = 0.
Cumulative Distribution Function (CDF)
The cumulative distribution function F(x) gives the probability that X is at most x:
F(x) = P(X ≤ x)
For discrete: F(x) = Σ_{t≤x} p(t)
For continuous: F(x) = ∫₋∞ˣ f(t) dt
The CDF works for both discrete and continuous random variables.
Properties of the CDF:
- F(x) is non-decreasing (as x increases, F(x) doesn't decrease)
- lim_{x→-∞} F(x) = 0
- lim_{x→∞} F(x) = 1
- F(x) is right-continuous
Why it's useful:
- P(a < X ≤ b) = F(b) - F(a)
- For continuous X: f(x) = F'(x) (the PDF is the derivative of the CDF)
- The CDF uniquely determines the distribution
X = die roll (1 to 6)
- F(0.5) = P(X ≤ 0.5) = 0 (can't roll less than 1)
- F(1) = P(X ≤ 1) = 1/6
- F(2.7) = P(X ≤ 2.7) = P(X ≤ 2) = 2/6
- F(6) = 1
- F(10) = 1
The CDF is a step function for discrete random variables.
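The die's step-function CDF fits in a few lines (the function name `F` mirrors the notation above; this is our own sketch):

```python
from fractions import Fraction
import math

def F(x):
    """CDF of a fair six-sided die: P(X <= x)."""
    # Count how many faces (1..6) are <= x, then divide by 6
    return Fraction(min(max(math.floor(x), 0), 6), 6)

print(F(0.5), F(1), F(2.7), F(6), F(10))  # 0 1/6 1/3 1 1
```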
Expected Value (Mean)
The expected value E[X] (also called the mean ΞΌ) is the long-run average value of X:
Discrete: E[X] = Σₓ x · p(x)
Continuous: E[X] = ∫₋∞^∞ x · f(x) dx
It's the probability-weighted average of all possible values.
E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = (1+2+3+4+5+6)/6 = 21/6 = 3.5
You can never roll a 3.5, but it's the average outcome over many rolls.
Linearity of expectation: E[aX + b] = aE[X] + b, and E[X + Y] = E[X] + E[Y] even if X and Y are dependent! This makes expected value calculations very tractable.
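Both the long-run average and linearity can be checked empirically with a quick simulation (seed and sample size are arbitrary choices of ours):

```python
import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Sample mean should be close to the theoretical E[X] = 3.5
mean = sum(rolls) / len(rolls)
print(round(mean, 2))  # close to 3.5

# Linearity: E[2X + 1] = 2·E[X] + 1, which holds exactly for sample means too
mean_transformed = sum(2 * r + 1 for r in rolls) / len(rolls)
assert abs(mean_transformed - (2 * mean + 1)) < 1e-9
```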
Variance and Standard Deviation
The variance Var(X) = σ² measures the spread of X around its mean:
Var(X) = E[(X - ΞΌ)Β²]
Equivalently (easier to compute): Var(X) = E[XΒ²] - (E[X])Β²
Standard deviation: σ = √Var(X) (same units as X)
We know E[X] = 3.5
E[X²] = 1²·(1/6) + 2²·(1/6) + ... + 6²·(1/6) = (1+4+9+16+25+36)/6 = 91/6 ≈ 15.17
Var(X) = 15.17 - (3.5)² = 15.17 - 12.25 ≈ 2.92
σ = √2.92 ≈ 1.71
Interpretation: A die roll is typically about 1.71 away from the mean (3.5). This quantifies the spread we intuitively see in die rolls.
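The whole computation can be done exactly with fractions, confirming the rounded figures above:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face equally likely

EX = sum(x * p for x in faces)        # E[X]  = 7/2  = 3.5
EX2 = sum(x * x * p for x in faces)   # E[X²] = 91/6 ≈ 15.17
var = EX2 - EX**2                     # Var(X) = 35/12 ≈ 2.92
sigma = float(var) ** 0.5
print(EX, EX2, var)        # 7/2 91/6 35/12
print(round(sigma, 2))     # 1.71
```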
Why Random Variables Matter
Random variables are the foundation of all statistical modeling:
1. Data as realizations of random variables When you measure heights, incomes, test scores β you're observing values of random variables.
2. Models specify distributions "Heights are normally distributed" means heights follow a specific random variable distribution.
3. Statistical inference We estimate E[X] and Var(X) from samples to understand the population.
4. Predictions are random variables "What will tomorrow's temperature be?" defines a random variable with a distribution.
Without random variables, we'd be stuck describing individual outcomes. With them, we can reason about patterns, make predictions, and quantify uncertainty.
Interactive Playground
Experiment with these interactive tools to deepen your understanding.
Interactive: Distribution Visualizer
Law of Large Numbers: Sample more to see the histogram converge to the theoretical distribution. The sample mean x̄ converges to the true mean μ!
Interactive: Expected Value Calculator
A trade has 30% chance of making $100 and 70% chance of losing $50.
E[X] = 0.30×100 + 0.70×(-50) = 30 - 35 = -$5.00
Negative expected value: this trade destroys value over time!
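The same arithmetic in a couple of lines of Python:

```python
# (value, probability) pairs for the trade described above
outcomes = [(100, 0.30), (-50, 0.70)]
ev = sum(value * prob for value, prob in outcomes)
print(ev)  # ≈ -5.0
```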
Interactive: Central Limit Theorem Demo
The CLT states that sample means approach a normal distribution, regardless of the original distribution!
The Magic: Even though the uniform distribution is flat, the distribution of sample means becomes bell-shaped! Try increasing n to see the effect strengthen.
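A minimal stdlib simulation of this effect (seed, sample size n, and trial count are arbitrary choices of ours): the means of uniform samples cluster around 0.5 with spread σ/√n ≈ 0.289/√30 ≈ 0.053, as the CLT predicts.

```python
import random
import statistics

random.seed(1)
n, trials = 30, 5000

# Each entry is the mean of n draws from a flat Uniform(0, 1) distribution
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

print(round(statistics.fmean(means), 2))   # near 0.5 (the uniform's mean)
print(round(statistics.stdev(means), 3))   # near (1/sqrt(12))/sqrt(30) ≈ 0.053
```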