Random Variables

Learn what random variables are, the difference between discrete and continuous, PMF vs PDF vs CDF, expected value, and variance.

24 min read
Intermediate

From Outcomes to Numbers

Probability theory started with events: "rain," "heads," "winning." But to do mathematics with these outcomes, we need to assign them numbers. That's what random variables do.

A random variable is a function that maps outcomes of a random process to numerical values. It's the bridge between the abstract world of probability and the concrete world of data analysis.

The Formal Definition

A random variable is a function X: S β†’ ℝ that assigns a real number to each outcome in the sample space S.

It's called "random" because the outcome is uncertain, and "variable" because it takes different values depending on the outcome.

Coin Flips

Flip two coins. Sample space: S = {HH, HT, TH, TT}

Define X = "number of heads"

  • X(HH) = 2
  • X(HT) = 1
  • X(TH) = 1
  • X(TT) = 0

X is a random variable mapping outcomes to {0, 1, 2}.

Why this matters: Once we have random variables, we can ask quantitative questions:

  • What's the average number of heads?
  • What's the probability X β‰₯ 1?
  • How spread out are the values?

These questions form the foundation of data analysis.
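These quantitative questions can be answered directly once X is defined as a function on the sample space. A minimal sketch in Python, enumerating the two-coin sample space exactly:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips; each outcome has probability 1/4.
sample_space = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ...
p_outcome = Fraction(1, 4)

# The random variable X maps each outcome to a number: here, the head count.
def X(outcome):
    return outcome.count("H")

# P(X >= 1): add up the probabilities of outcomes where X is at least 1.
p_at_least_one = sum(p_outcome for o in sample_space if X(o) >= 1)

# E[X]: the probability-weighted average of X over all outcomes.
expected_heads = sum(p_outcome * X(o) for o in sample_space)

print(p_at_least_one)  # 3/4
print(expected_heads)  # 1
```

Using `Fraction` keeps the arithmetic exact, so the answers come out as the textbook fractions rather than floating-point approximations.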

Discrete vs Continuous Random Variables

Two Types of Random Variables

  • Discrete: countable values (finite or infinite). Examples: die roll, number of customers, coin flips. Probability statements: P(X = x)
  • Continuous: uncountable values (intervals on the real line). Examples: height, temperature, time. Probability statements: P(a < X < b)

Key difference: For discrete RVs, individual values have positive probability. For continuous RVs, P(X = exactly x) = 0 for any specific x β€” we only talk about probabilities over intervals.

Why? There are uncountably many values in any interval, so each specific value has probability zero. (This is a deep measure-theoretic fact, but the intuition is: probability spreads over infinitely many possibilities.)
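One way to see this intuition concretely: for X uniform on [0, 1], the probability of an interval is just its length, so shrinking an interval around any point drives the probability to zero. A small sketch (the `p_interval` helper is ours, not a library function):

```python
# For X ~ Uniform(0, 1), P(a < X < b) equals the length of the overlap
# of (a, b) with [0, 1]. Shrinking an interval around x = 0.5 shows the
# probability of hitting any exact value tending to 0.
def p_interval(a, b):
    lo, hi = max(a, 0.0), min(b, 1.0)  # clamp to the support [0, 1]
    return max(hi - lo, 0.0)

for eps in [0.1, 0.01, 0.001]:
    print(eps, p_interval(0.5 - eps, 0.5 + eps))  # probability 2*eps each time
```

As eps shrinks, P(0.5 - eps < X < 0.5 + eps) = 2·eps shrinks with it, and in the limit P(X = 0.5) = 0.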

Probability Mass Function (PMF)

The probability mass function p(x) gives the probability that X equals x:

p(x) = P(X = x)

Properties:

  • p(x) β‰₯ 0 for all x
  • Ξ£ p(x) = 1 (sum over all possible values)
Die Roll

X = result of rolling a fair six-sided die

PMF: p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6

p(7) = 0 (impossible)

This fully describes the probability distribution of X.
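A PMF is naturally represented as a dictionary from values to probabilities; a sketch for the fair die:

```python
from fractions import Fraction

# PMF of a fair six-sided die: p(x) = 1/6 for x in 1..6, else 0.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def p(x):
    # Values outside the support get probability 0 (e.g. p(7) = 0).
    return pmf.get(x, Fraction(0))

assert sum(pmf.values()) == 1  # PMF property: probabilities sum to 1
print(p(3))  # 1/6
print(p(7))  # 0
```

The `dict.get` default handles impossible values, matching p(7) = 0 from the example above.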

Probability Density Function (PDF)

The probability density function f(x) describes the relative likelihood of X taking values near x.

Important: f(x) is NOT P(X = x). Instead:

P(a < X < b) = βˆ«β‚α΅‡ f(x)dx

The probability is the area under the curve between a and b.

Properties:

  • f(x) β‰₯ 0 for all x
  • βˆ«β‚‹βˆž^∞ f(x)dx = 1 (total area = 1)
  • f(x) can be > 1 (it's a density, not a probability!)

Uniform Distribution on [0,1]

X is equally likely to be anywhere between 0 and 1.

PDF: f(x) = 1 for 0 ≀ x ≀ 1, and f(x) = 0 elsewhere

P(0.2 < X < 0.5) = βˆ«β‚€.β‚‚^β‚€.β‚… 1 dx = 0.5 - 0.2 = 0.3

The probability is the width of the interval because the height is 1.

Common confusion: "The PDF is 2 at x = 3, so P(X = 3) = 2?" No! f(3) = 2 means the density is high near x = 3, making P(2.99 < X < 3.01) relatively large. But P(X = exactly 3) = 0.
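The "probability = area under the density" idea can be checked numerically. A sketch using a midpoint Riemann sum to approximate the integral (no libraries, just the uniform density defined by hand):

```python
# PDF of Uniform(0, 1): density 1 on [0, 1], 0 elsewhere.
def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Approximate P(a < X < b) = integral of f from a to b
# with a midpoint Riemann sum over n slices.
def prob(a, b, n=10_000):
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(prob(0.2, 0.5))   # ~0.3, the width of the interval
print(prob(-1.0, 2.0))  # ~1.0, the total area under the density
```

Integrating over a range wider than the support still gives total probability 1, since the density is 0 outside [0, 1].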

Cumulative Distribution Function (CDF)

The cumulative distribution function F(x) gives the probability that X is at most x:

F(x) = P(X ≀ x)

For discrete: F(x) = Σβ‚œβ‰€β‚“ p(t)
For continuous: F(x) = βˆ«β‚‹βˆžΛ£ f(t)dt

The CDF works for both discrete and continuous random variables.

Properties of the CDF:

  • F(x) is non-decreasing (as x increases, F(x) doesn't decrease)
  • lim_{xβ†’-∞} F(x) = 0
  • lim_{xβ†’βˆž} F(x) = 1
  • F(x) is right-continuous

Why it's useful:

  • P(a < X ≀ b) = F(b) - F(a)
  • For continuous X: f(x) = F'(x) (the PDF is the derivative of the CDF)
  • The CDF uniquely determines the distribution

CDF of a Die Roll

X = die roll (1 to 6)

F(0.5) = P(X ≀ 0.5) = 0 (can't roll less than 1)
F(1) = P(X ≀ 1) = 1/6
F(2.7) = P(X ≀ 2.7) = P(X ≀ 2) = 2/6
F(6) = 1
F(10) = 1

The CDF is a step function for discrete random variables.
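The step-function behavior is easy to reproduce: the die's CDF just counts how many faces are at most x. A sketch, including the interval-probability identity F(b) - F(a):

```python
from fractions import Fraction

# CDF of a fair die: F(x) = P(X <= x) = (number of faces <= x) / 6.
def F(x):
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)

print(F(0.5))  # 0   -- below the smallest face
print(F(2.7))  # 1/3 -- same as F(2): F is flat between faces
print(F(10))   # 1   -- all faces are <= 10

# Interval probability from the CDF: P(2 < X <= 5) = F(5) - F(2)
print(F(5) - F(2))  # 1/2
```

Evaluating F between faces (like F(2.7)) returns the same value as at the face below it, which is exactly the flat part of each step.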

Expected Value (Mean)

The expected value E[X] (also called the mean ΞΌ) is the long-run average value of X:

Discrete: E[X] = Ξ£β‚“ x Β· p(x) Continuous: E[X] = βˆ«β‚‹βˆž^∞ x Β· f(x)dx

It's the probability-weighted average of all possible values.

Expected Value of a Die Roll

E[X] = 1Β·(1/6) + 2Β·(1/6) + 3Β·(1/6) + 4Β·(1/6) + 5Β·(1/6) + 6Β·(1/6) = (1+2+3+4+5+6)/6 = 21/6 = 3.5

You can never roll a 3.5, but it's the average outcome over many rolls.

Linearity of expectation: E[aX + b] = aE[X] + b, and E[X + Y] = E[X] + E[Y] even if X and Y are dependent! This makes expected value calculations very tractable.
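Both the die computation and linearity can be verified in a few lines. The transformed payoff Y = 2X + 1 below is an illustrative choice, not from the text:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face has probability 1/6

# E[X] = sum of x * p(x) over all faces
E_X = sum(x * p for x in faces)
print(E_X)  # 7/2, i.e. 3.5

# Linearity check for a transformed value, e.g. Y = 2X + 1:
E_Y = sum((2 * x + 1) * p for x in faces)
assert E_Y == 2 * E_X + 1  # E[aX + b] = a*E[X] + b
```

Linearity means E_Y never had to be computed from scratch: doubling and shifting the die's faces doubles and shifts the mean.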

Variance and Standard Deviation

The variance Var(X) = σ² measures the spread of X around its mean:

Var(X) = E[(X - ΞΌ)Β²]

Equivalently (easier to compute): Var(X) = E[XΒ²] - (E[X])Β²

Standard deviation: Οƒ = √Var(X) (same units as X)

Variance of a Die Roll

We know E[X] = 3.5

E[XΒ²] = 1Β²Β·(1/6) + 2Β²Β·(1/6) + ... + 6Β²Β·(1/6) = (1+4+9+16+25+36)/6 = 91/6 β‰ˆ 15.17

Var(X) = 91/6 - (3.5)Β² = 91/6 - 12.25 = 35/12 β‰ˆ 2.92

Οƒ = √2.92 β‰ˆ 1.71

Interpretation: A die roll is typically about 1.71 away from the mean (3.5). This quantifies the spread we intuitively see in die rolls.
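The variance computation can be checked both ways: the shortcut E[XΒ²] - (E[X])Β² and the definition E[(X - ΞΌ)Β²] must agree. A sketch with exact fractions:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)

E_X  = sum(x * p for x in faces)      # 7/2
E_X2 = sum(x * x * p for x in faces)  # 91/6

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
var = E_X2 - E_X ** 2
print(var)  # 35/12, about 2.92

# Definition: Var(X) = E[(X - mu)^2] -- must give the same answer
var_def = sum((x - E_X) ** 2 * p for x in faces)
assert var == var_def

# Standard deviation, back in the units of X
sigma = float(var) ** 0.5
print(round(sigma, 2))  # 1.71
```

Exact arithmetic shows the variance is 35/12, which the article's rounded 2.92 approximates.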

Why Random Variables Matter

Random variables are the foundation of all statistical modeling:

1. Data as realizations of random variables When you measure heights, incomes, test scores β€” you're observing values of random variables.

2. Models specify distributions "Heights are normally distributed" means heights follow a specific random variable distribution.

3. Statistical inference We estimate E[X] and Var(X) from samples to understand the population.

4. Predictions are random variables "What will tomorrow's temperature be?" defines a random variable with a distribution.

Without random variables, we'd be stuck describing individual outcomes. With them, we can reason about patterns, make predictions, and quantify uncertainty.

Interactive Playground

Experiment with these interactive tools to deepen your understanding.

πŸ“ˆ Interactive: Distribution Visualizer

πŸ’‘ Law of Large Numbers: Sample more to see the histogram converge to the theoretical distribution. The sample mean xΜ„ converges to the true mean ΞΌ!

🎯 Interactive: Expected Value Calculator

πŸ’Ή Trading Example

A trade has a 30% chance of making $100 and a 70% chance of losing $50.

E[X] = 0.30Γ—100 + 0.70Γ—(-50) = 30 - 35 = -$5.00

❌ Negative expected value: this trade destroys value over time!
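The trade's expected value, variance, and standard deviation can be reproduced in a few lines (the payoffs are the example's hypothetical numbers, not market data):

```python
# Hypothetical trade: +$100 with probability 0.30, -$50 with probability 0.70.
outcomes = [(100.0, 0.30), (-50.0, 0.70)]

E_X  = sum(x * p for x, p in outcomes)      # expected value
E_X2 = sum(x * x * p for x, p in outcomes)  # second moment
var  = E_X2 - E_X ** 2                      # shortcut variance formula
std  = var ** 0.5                           # standard deviation, in dollars

print(E_X)            # -5.0
print(var)            # 4725.0
print(round(std, 2))  # 68.74
```

The standard deviation of about $69 dwarfs the $5 expected loss per trade, which is why individual trades feel unpredictable even when the long-run edge is negative.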

πŸ”” Interactive: Central Limit Theorem Demo

The CLT states that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the original distribution!


πŸ’‘ The Magic: Even though the uniform distribution is flat, the distribution of sample means becomes bell-shaped! The effect strengthens as the sample size n increases.
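A minimal simulation of this demo in plain Python, assuming Uniform(0, 1) draws and a sample size of n = 30 (both arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Draw many sample means of n Uniform(0, 1) values. By the CLT they
# cluster around mu = 0.5 with spread sigma / sqrt(n), sigma = sqrt(1/12).
n, trials = 30, 5000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

print(round(statistics.fmean(means), 3))  # close to mu = 0.5
print(round(statistics.stdev(means), 3))  # close to sqrt(1/12)/sqrt(30), about 0.053
```

A histogram of `means` would show the bell shape; increasing n tightens the spread by the 1/√n factor, matching the Οƒ/√n readout in the demo.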