Random Variables
Learn what random variables are, the difference between discrete and continuous, PMF vs PDF vs CDF, expected value, and variance.
From Outcomes to Numbers
Probability theory started with events: "rain," "heads," "winning." But to do mathematics with these outcomes, we need to assign them numbers. That's what random variables do.
A random variable is a function that maps outcomes of a random process to numerical values. It's the bridge between the abstract world of probability and the concrete world of data analysis.
The Formal Definition
A random variable is a function X: S → ℝ that assigns a real number to each outcome in the sample space S.
It's called "random" because the outcome is uncertain, and "variable" because it takes different values depending on the outcome.
Flip two coins. Sample space: S = {HH, HT, TH, TT}
Define X = "number of heads"
- X(HH) = 2
- X(HT) = 1
- X(TH) = 1
- X(TT) = 0
X is a random variable mapping outcomes to {0, 1, 2}.
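This mapping can be written out directly in Python (the names `sample_space` and `X` are our own, chosen for illustration):

```python
# Map each outcome of two coin flips to X = number of heads
sample_space = ["HH", "HT", "TH", "TT"]
X = {outcome: outcome.count("H") for outcome in sample_space}
print(X)  # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```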
Why this matters: Once we have random variables, we can ask quantitative questions:
- What's the average number of heads?
- What's the probability that X ≥ 1?
- How spread out are the values?
These questions form the foundation of data analysis.
Discrete vs Continuous Random Variables
| Type | Values | Example | Probability |
|---|---|---|---|
| Discrete | Countable (finite or infinite) | Die roll, number of customers, coin flips | P(X = x) |
| Continuous | Uncountable (intervals on the real line) | Height, temperature, time | P(a < X < b) |
Key difference: For discrete RVs, individual values have positive probability. For continuous RVs, P(X = x) = 0 for any specific x; we only talk about probabilities over intervals.
Why? There are infinitely many values in any interval, so each specific value has probability zero. (This is a deep measure-theoretic fact, but the intuition is: probability spreads over infinite possibilities.)
Probability Mass Function (PMF)
The probability mass function p(x) gives the probability that X equals x:
p(x) = P(X = x)
Properties:
- p(x) ≥ 0 for all x
- Σ p(x) = 1 (sum over all possible values)
X = result of rolling a fair six-sided die
PMF: p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
p(7) = 0 (impossible)
This fully describes the probability distribution of X.
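A quick sketch of this PMF in Python, using exact fractions to make the properties easy to verify (the helper `p` is our own naming):

```python
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def p(x):
    """P(X = x); returns 0 for impossible values."""
    return pmf.get(x, Fraction(0))

assert all(p(x) >= 0 for x in range(0, 8))  # non-negativity
assert sum(pmf.values()) == 1               # probabilities sum to 1
print(p(3), p(7))  # 1/6 0
```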
Probability Density Function (PDF)
The probability density function f(x) describes the relative likelihood of X taking values near x.
Important: f(x) is NOT P(X = x). Instead:
P(a < X < b) = ∫ₐᵇ f(x) dx
The probability is the area under the curve between a and b.
Properties:
- f(x) ≥ 0 for all x
- ∫₋∞^∞ f(x) dx = 1 (total area = 1)
- f(x) can be > 1 (it's a density, not a probability!)
X is equally likely to be anywhere between 0 and 1.
PDF: f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere
P(0.2 < X < 0.5) = ∫₀.₂^0.5 1 dx = 0.5 - 0.2 = 0.3
The probability is the width of the interval because the height is 1.
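The "area under the curve" idea can be checked numerically with a simple midpoint Riemann sum (a sketch in plain Python, not a library routine):

```python
# Uniform density on [0, 1]
def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0

# Approximate P(0.2 < X < 0.5) as the area under f between 0.2 and 0.5
a, b, n = 0.2, 0.5, 100_000
dx = (b - a) / n
prob = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
print(round(prob, 4))  # 0.3
```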
Common confusion: "The PDF is 2 at x = 3, so P(X = 3) = 2?" No! f(3) = 2 means the density is high near x = 3, making P(2.99 < X < 3.01) relatively large. But P(X = exactly 3) = 0.
Cumulative Distribution Function (CDF)
The cumulative distribution function F(x) gives the probability that X is at most x:
F(x) = P(X ≤ x)
For discrete: F(x) = Σ_{t≤x} p(t)
For continuous: F(x) = ∫₋∞ˣ f(t) dt
The CDF works for both discrete and continuous random variables.
Properties of the CDF:
- F(x) is non-decreasing (as x increases, F(x) doesn't decrease)
- lim_{x→-∞} F(x) = 0
- lim_{x→∞} F(x) = 1
- F(x) is right-continuous
Why it's useful:
- P(a < X ≤ b) = F(b) - F(a)
- For continuous X: f(x) = F'(x) (the PDF is the derivative of the CDF)
- The CDF uniquely determines the distribution
X = die roll (1 to 6)
- F(0.5) = P(X ≤ 0.5) = 0 (can't roll less than 1)
- F(1) = P(X ≤ 1) = 1/6
- F(2.7) = P(X ≤ 2.7) = P(X ≤ 2) = 2/6
- F(6) = 1
- F(10) = 1
The CDF is a step function for discrete random variables.
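The die's step-function CDF fits in a few lines (the function name `F` mirrors the notation above; this is our own sketch):

```python
from fractions import Fraction
import math

def F(x):
    """CDF of a fair six-sided die: P(X <= x)."""
    # Count how many faces (1..6) are <= x, then divide by 6
    return Fraction(min(max(math.floor(x), 0), 6), 6)

print(F(0.5), F(1), F(2.7), F(6), F(10))  # 0 1/6 1/3 1 1
```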
Expected Value (Mean)
The expected value E[X] (also called the mean ΞΌ) is the long-run average value of X:
Discrete: E[X] = Σₓ x · p(x)
Continuous: E[X] = ∫₋∞^∞ x · f(x) dx
It's the probability-weighted average of all possible values.
E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = (1+2+3+4+5+6)/6 = 21/6 = 3.5
You can never roll a 3.5, but it's the average outcome over many rolls.
Linearity of expectation: E[aX + b] = aE[X] + b, and E[X + Y] = E[X] + E[Y] even if X and Y are dependent! This makes expected value calculations very tractable.
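Both the long-run average and linearity can be checked empirically with a quick simulation (seed and sample size are arbitrary choices of ours):

```python
import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Sample mean should be close to the theoretical E[X] = 3.5
mean = sum(rolls) / len(rolls)
print(round(mean, 2))  # close to 3.5

# Linearity: E[2X + 1] = 2·E[X] + 1, which holds exactly for sample means too
mean_transformed = sum(2 * r + 1 for r in rolls) / len(rolls)
assert abs(mean_transformed - (2 * mean + 1)) < 1e-9
```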
Variance and Standard Deviation
The variance Var(X) = σ² measures the spread of X around its mean:
Var(X) = E[(X - ΞΌ)Β²]
Equivalently (easier to compute): Var(X) = E[XΒ²] - (E[X])Β²
Standard deviation: σ = √Var(X) (same units as X)
We know E[X] = 3.5
E[X²] = 1²·(1/6) + 2²·(1/6) + ... + 6²·(1/6) = (1+4+9+16+25+36)/6 = 91/6 ≈ 15.17
Var(X) = 15.17 - (3.5)² = 15.17 - 12.25 ≈ 2.92
σ = √2.92 ≈ 1.71
Interpretation: A die roll is typically about 1.71 away from the mean (3.5). This quantifies the spread we intuitively see in die rolls.
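The whole computation can be done exactly with fractions, confirming the rounded figures above:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face equally likely

EX = sum(x * p for x in faces)        # E[X]  = 7/2  = 3.5
EX2 = sum(x * x * p for x in faces)   # E[X²] = 91/6 ≈ 15.17
var = EX2 - EX**2                     # Var(X) = 35/12 ≈ 2.92
sigma = float(var) ** 0.5
print(EX, EX2, var)        # 7/2 91/6 35/12
print(round(sigma, 2))     # 1.71
```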
Why Random Variables Matter
Random variables are the foundation of all statistical modeling:
1. Data as realizations of random variables When you measure heights, incomes, test scores β you're observing values of random variables.
2. Models specify distributions "Heights are normally distributed" means heights follow a specific random variable distribution.
3. Statistical inference We estimate E[X] and Var(X) from samples to understand the population.
4. Predictions are random variables "What will tomorrow's temperature be?" defines a random variable with a distribution.
Without random variables, we'd be stuck describing individual outcomes. With them, we can reason about patterns, make predictions, and quantify uncertainty.
Interactive Playground
Experiment with these interactive tools to deepen your understanding.
Interactive: Distribution Visualizer
Law of Large Numbers: Sample more to see the histogram converge to the theoretical distribution. The sample mean x̄ converges to the true mean μ!
Interactive: Expected Value Calculator
A trade has 30% chance of making $100 and 70% chance of losing $50.
E[X] = 0.30×100 + 0.70×(-50) = 30 - 35 = -$5.00
Negative expected value: this trade destroys value over time!
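The same arithmetic in a couple of lines of Python:

```python
# (value, probability) pairs for the trade described above
outcomes = [(100, 0.30), (-50, 0.70)]
ev = sum(value * prob for value, prob in outcomes)
print(ev)  # ≈ -5.0
```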
Interactive: Central Limit Theorem Demo
The CLT states that sample means approach a normal distribution, regardless of the original distribution!
The Magic: Even though the uniform distribution is flat, the distribution of sample means becomes bell-shaped! Try increasing n to see the effect strengthen.
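A minimal stdlib simulation of this effect (seed, sample size n, and trial count are arbitrary choices of ours): the means of uniform samples cluster around 0.5 with spread σ/√n ≈ 0.289/√30 ≈ 0.053, as the CLT predicts.

```python
import random
import statistics

random.seed(1)
n, trials = 30, 5000

# Each entry is the mean of n draws from a flat Uniform(0, 1) distribution
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

print(round(statistics.fmean(means), 2))   # near 0.5 (the uniform's mean)
print(round(statistics.stdev(means), 3))   # near (1/sqrt(12))/sqrt(30) ≈ 0.053
```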