Discrete Random Variables
Master discrete random variables, PMFs, CDFs, and their properties with Python implementations.
Introduction
Random variables transform outcomes into numbers, enabling us to apply mathematical analysis to probability. They're the bridge between abstract sample spaces and practical calculations.
Learning Objectives:
- Understand random variables as functions
- Work with probability mass functions (PMFs)
- Calculate probabilities using cumulative distribution functions (CDFs)
- Implement random variables in Python
Random Variables
A random variable is a function that maps each outcome in the sample space to a real number.
Notation: Uppercase letters ($X$, $Y$) for random variables, lowercase ($x$, $y$) for specific values.
Types:
- Discrete: Takes values in a countable set (a finite set or the integers, for example)
- Continuous: Takes values in an uncountable set, such as an interval of real numbers
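To make the "function on the sample space" idea concrete, here is a minimal sketch using two fair coin flips. The example and the names `sample_space`, `num_heads`, and `pmf_heads` are illustrative assumptions, not part of the lesson's own code. Note that several outcomes can map to the same number.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every ordered result of two fair coin flips.
sample_space = list(product("HT", repeat=2))  # ('H', 'H'), ('H', 'T'), ...

def num_heads(outcome):
    """Random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Outcomes are equally likely, so P(X = x) is the share of outcomes mapped to x.
counts = Counter(num_heads(w) for w in sample_space)
pmf_heads = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf_heads.items():
    print(f"P(X = {x}) = {p}")
```

Both ('H', 'T') and ('T', 'H') map to 1, which is why the value 1 carries twice the probability of 0 or 2.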
Example: Die Roll
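Sample space: $\Omega = \{1, 2, 3, 4, 5, 6\}$
Define $X$ = "number showing":
- $X(1) = 1$
- $X(2) = 2$
- ...
Now we can ask: $P(X = 3)$, $P(X \leq 4)$, $P(X \text{ is even})$, etc.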
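Questions like these reduce to counting outcomes once the random variable is written down. The sketch below assumes a fair die (all six faces equally likely); the helper `prob` and the three example events are hypothetical choices made for illustration.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]  # faces of a fair die

def X(outcome):
    """Random variable "number showing": here simply the face value."""
    return outcome

def prob(event):
    """P(event) for equally likely outcomes; `event` is a predicate on X's value."""
    favorable = sum(1 for w in sample_space if event(X(w)))
    return Fraction(favorable, len(sample_space))

p_equals_3 = prob(lambda x: x == 3)
p_at_most_4 = prob(lambda x: x <= 4)
p_even = prob(lambda x: x % 2 == 0)
print(p_equals_3, p_at_most_4, p_even)
```

Using `Fraction` keeps the results exact, so they can be compared against hand calculations without rounding.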
Probability Mass Function
For a discrete random variable $X$, the PMF is:
$$p_X(x) = P(X = x)$$
Properties:
- $p_X(x) \geq 0$ for all $x$
- $\sum_x p_X(x) = 1$ (sum over all possible values)
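The two properties above can be checked mechanically for any candidate PMF. This is a minimal sketch: the function name `is_valid_pmf`, the dictionary representation, and the tolerance are assumptions chosen for illustration.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if `pmf` (dict: value -> probability) satisfies both PMF properties."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

fair_die = {x: 1 / 6 for x in range(1, 7)}
too_much_mass = {0: 0.7, 1: 0.5}      # sums to 1.2
negative_entry = {0: 1.2, 1: -0.2}    # sums to 1 but has a negative probability

print(is_valid_pmf(fair_die), is_valid_pmf(too_much_mass), is_valid_pmf(negative_entry))
```

The sum is compared with a tolerance rather than `==` because floating-point values such as `1/6` do not add up to exactly 1.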
```python
import numpy as np

# Die roll: PMF
values = [1, 2, 3, 4, 5, 6]
pmf = [1/6] * 6

print("PMF for fair die:")
for x, p in zip(values, pmf):
    print(f"  P(X = {x}) = {p:.4f}")
print(f"\nSum of PMF: {sum(pmf):.4f} (must equal 1)")

# Verify with simulation: empirical frequencies should be close to 1/6
rolls = np.random.randint(1, 7, size=10000)  # upper bound 7 is exclusive
for x in values:
    empirical_prob = np.mean(rolls == x)
    theoretical_prob = 1/6
    print(f"X={x}: empirical={empirical_prob:.4f}, theoretical={theoretical_prob:.4f}")
```
Cumulative Distribution Function
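The CDF gives the probability that $X$ is at most $x$:
$$F_X(x) = P(X \leq x)$$
Properties:
- $F_X$ is non-decreasing
- $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$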
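For a discrete random variable, the CDF is the running total of the PMF: $F_X(x) = \sum_{t \leq x} p_X(t)$. The sketch below builds a CDF this way for a hypothetical loaded die (the probabilities are made up for illustration) and checks that the result never decreases. The names `support`, `pmf_loaded`, and `cdf` are assumptions.

```python
import numpy as np

support = np.array([1, 2, 3, 4, 5, 6])
pmf_loaded = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])  # hypothetical loaded die
cumulative = np.cumsum(pmf_loaded)                      # running total of the PMF

def cdf(x):
    """F_X(x): total probability of all support points t <= x."""
    k = np.searchsorted(support, x, side="right")  # number of support points <= x
    return 0.0 if k == 0 else float(cumulative[k - 1])

for x in [0, 1, 3, 3.7, 6, 10]:
    print(f"F_X({x}) = {cdf(x):.2f}")

# Non-decreasing: each step of the running total adds a non-negative amount.
print("Non-decreasing:", bool(np.all(np.diff(cumulative) >= 0)))
```

The CDF is a step function: it is flat between support points (so `cdf(3.7)` equals `cdf(3)`) and jumps by $p_X(x)$ at each value $x$.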
```python
import numpy as np

# Die roll CDF
def die_cdf(x):
    """CDF for fair 6-sided die"""
    if x < 1:
        return 0
    elif x >= 6:
        return 1
    else:
        # floor(x) faces are at most x, each with probability 1/6
        return int(np.floor(x)) / 6

# Compute CDF at various points
test_points = [0, 1.5, 3, 3.7, 6, 10]
print("CDF values:")
for x in test_points:
    cdf_val = die_cdf(x)
    print(f"  F_X({x}) = P(X ≤ {x}) = {cdf_val:.4f}")

# Calculate interval probability
# X is integer-valued, so P(a ≤ X ≤ b) = F_X(b) - F_X(a - 1)
a, b = 2, 5
prob_interval = die_cdf(b) - die_cdf(a-1)  # P(2 ≤ X ≤ 5)
print(f"\nP({a} ≤ X ≤ {b}) = F_X({b}) - F_X({a-1}) = {prob_interval:.4f}")
```
Key Takeaways
- Random variable: Function from outcomes to numbers
- PMF: $p_X(x) = P(X = x)$ for discrete RVs
- CDF: $F_X(x) = P(X \leq x)$, works for any RV
- PMF sums to 1: $\sum_x p_X(x) = 1$
- CDF properties: Non-decreasing, limits to 0 and 1
Next Lesson: Common discrete distributions (Bernoulli, Binomial, Poisson)!
| Concept | Discrete Formula | Example |
|---|---|---|
| PMF | $p_X(x) = P(X=x)$ | Die: $p_X(3) = 1/6$ |
| CDF | $F_X(x) = \sum_{t \leq x} p_X(t)$ | Die: $F_X(3) = 1/2$ |
| Interval | $P(a < X \leq b) = F_X(b) - F_X(a)$ | $P(2 < X \leq 5) = F_X(5) - F_X(2)$ |
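The interval row of the table can be checked numerically, and the same idea run in reverse recovers the PMF from the jumps of the CDF. This is a sketch for a fair die; `die_cdf_exact` is a hypothetical exact-fraction helper introduced here so the comparison involves no rounding.

```python
import math
from fractions import Fraction

def die_cdf_exact(x):
    """F_X(x) for a fair six-sided die, returned as an exact fraction."""
    k = min(max(math.floor(x), 0), 6)  # number of faces that are <= x
    return Fraction(k, 6)

a, b = 2, 5
via_cdf = die_cdf_exact(b) - die_cdf_exact(a)  # P(a < X <= b) = F_X(b) - F_X(a)
via_pmf = sum(Fraction(1, 6) for x in range(1, 7) if a < x <= b)
print(via_cdf, via_pmf)

# Each jump of the CDF at an integer x equals p_X(x).
pmf_from_cdf = {x: die_cdf_exact(x) - die_cdf_exact(x - 1) for x in range(1, 7)}
print(pmf_from_cdf[3])
```

Note the strict inequality on the left: $F_X(b) - F_X(a)$ excludes $X = a$, so including the left endpoint for an integer-valued variable requires subtracting $F_X(a - 1)$ instead.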