Measures of Dispersion
Understand range, variance, standard deviation, IQR, and coefficient of variation — how to measure the spread of data.
Why Central Tendency Isn't Enough
Consider two classes that both scored an average of 75 on an exam:
Class A: 74, 75, 75, 76, 75 — everyone scored about the same
Class B: 30, 60, 90, 95, 100 — wildly different scores
Same mean. Completely different stories. The mean alone hides a crucial dimension: how spread out the data is. Capturing that spread is exactly what measures of dispersion do.
Range: The Simplest Measure
The difference between the largest and smallest values.
Class A range: 76 - 74 = 2
Class B range: 100 - 30 = 70
Simple and intuitive, but terrible as a sole measure because:
- It only uses two data points (the extremes)
- A single outlier can massively inflate it
- It tells you nothing about how data is distributed between the extremes
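The range calculation is a one-liner. A minimal sketch using the two classes from the example above (`data_range` is a hypothetical helper name, not a standard library function):

```python
class_a = [74, 75, 75, 76, 75]
class_b = [30, 60, 90, 95, 100]

def data_range(values):
    """Spread between the largest and smallest value."""
    return max(values) - min(values)

print(data_range(class_a))  # 2
print(data_range(class_b))  # 70
```

Note that both calls touch only `max` and `min` — the other three values in each class contribute nothing, which is exactly the weakness listed above.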
Variance: The Foundation
Variance answers: "On average, how far are the data points from the mean?"
But there's a subtlety. If we just average the raw differences from the mean, positives and negatives would cancel out (points above and below the mean would sum to zero). So we square the differences first.
Population variance — used when you have data for the entire population:

σ² = Σ(xᵢ − μ)² / N

Sample variance — used when you have a sample from a larger population. Note the n − 1 in the denominator:

s² = Σ(xᵢ − x̄)² / (n − 1)
Data: 4, 8, 6, 5, 7 (sample)
Step 1: Find the mean: (4+8+6+5+7)/5 = 30/5 = 6
Step 2: Find each deviation from the mean:
- 4 - 6 = -2
- 8 - 6 = +2
- 6 - 6 = 0
- 5 - 6 = -1
- 7 - 6 = +1
Step 3: Square each deviation: 4, 4, 0, 1, 1
Step 4: Sum the squared deviations: 4+4+0+1+1 = 10
Step 5: Divide by n-1 (sample): 10/4 = 2.5
Sample variance s² = 2.5
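The five steps above translate directly into code. A sketch that mirrors the worked example and checks the result against the standard library's `statistics.variance` (which also uses n − 1):

```python
import statistics

data = [4, 8, 6, 5, 7]

# Step 1: the mean
mean = sum(data) / len(data)              # 6.0

# Steps 2-3: squared deviations from the mean
sq_dev = [(x - mean) ** 2 for x in data]  # [4.0, 4.0, 0.0, 1.0, 1.0]

# Steps 4-5: sum, then divide by n - 1 (sample variance)
s2 = sum(sq_dev) / (len(data) - 1)        # 2.5

print(s2)                                 # 2.5
print(statistics.variance(data))          # 2.5 -- stdlib agrees
```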
Why n-1 instead of n? This is called Bessel's correction. A sample tends to underestimate the population's spread (sample points cluster closer to their own mean than to the true population mean). Dividing by n-1 corrects this bias. It's one of those beautiful mathematical details where a small tweak makes a big difference.
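You can see Bessel's correction at work with a small simulation. This is a sketch under assumed conditions: a hypothetical standard-normal "population", repeated small samples, and a fixed seed for reproducibility. Averaged over many samples, dividing by n underestimates the true variance, while dividing by n − 1 lands much closer:

```python
import random

random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]
pop_mean = sum(population) / len(population)
true_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

n, trials = 5, 20_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / n          # divide by n: biased low
    unbiased_sum += ss / (n - 1)  # divide by n - 1: Bessel's correction

biased = biased_sum / trials
unbiased = unbiased_sum / trials
# The n-divided average sits noticeably below true_var (roughly 4/5 of it
# for n = 5); the (n-1)-divided average is close to true_var.
```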
Why Variance Squares the Differences
Students often ask: "Why not just use absolute differences instead of squaring?" Great question. There are several reasons:
- Mathematical convenience — squared functions are differentiable everywhere; absolute values create kinks that are harder to work with in calculus.
- Penalizes large deviations more — a point that's 10 units away contributes 100 to variance, not just 10. This makes variance more sensitive to outliers, which is sometimes desirable.
- Connects to geometry — variance relates to the Pythagorean theorem and Euclidean distance in higher dimensions.
- Decomposes nicely — total variance can be broken into components (this becomes crucial in ANOVA and regression).
That said, the alternative (mean absolute deviation) does exist and is sometimes used. It's just less common because variance plays nicer with the rest of statistics.
Standard Deviation: Variance in Human-Readable Units
Variance has a problem: its units are squared. If your data is in meters, variance is in meters². That's hard to interpret.
Standard deviation fixes this by taking the square root of variance:

s = √s²  (or σ = √σ² for a population)

From our example: s = √2.5 ≈ 1.58
This means, roughly speaking, data points are about 1.58 units away from the mean on average. Same units as the original data β much more interpretable!
Standard deviation is the most commonly reported measure of spread. When someone says "the average is 100 with a standard deviation of 15," you immediately get a sense of scale: for roughly bell-shaped data, about two-thirds of values fall between 85 and 115.
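Continuing the earlier worked example, `statistics.stdev` returns the square root of the sample variance directly:

```python
import math
import statistics

data = [4, 8, 6, 5, 7]
s = statistics.stdev(data)  # square root of the sample variance (2.5)

print(round(s, 2))  # 1.58 -- same units as the data
```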
Interquartile Range (IQR)
Just like the median is a robust alternative to the mean, the IQR is a robust alternative to standard deviation.
- Q1 (25th percentile): 25% of data falls below this value
- Q2 (50th percentile): The median
- Q3 (75th percentile): 75% of data falls below this value
- IQR = Q3 - Q1: The range of the middle 50% of the data
The IQR ignores the extremes entirely, making it resistant to outliers. It's the foundation of box plots and the standard method for detecting outliers:
Outlier rule: Any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is considered a potential outlier.
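The quartiles, IQR, and 1.5 × IQR fences can be computed with `statistics.quantiles`. A sketch on a small hypothetical dataset with one extreme value (note: different tools interpolate quartiles slightly differently, so exact Q1/Q3 values can vary between libraries):

```python
import statistics

data = [2, 3, 4, 5, 5, 6, 7, 50]  # hypothetical data; 50 is extreme

q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(outliers)  # [50]
```

Because the fences are built from Q1 and Q3 rather than the min and max, the extreme value 50 gets flagged without distorting the spread estimate itself.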
Coefficient of Variation: Comparing Apples to Oranges
How do you compare the spread of two datasets measured in different units? Enter the Coefficient of Variation (CV):

CV = (standard deviation / mean) × 100%
Heights of adults: Mean = 170 cm, SD = 10 cm → CV = 5.9%
Weights of adults: Mean = 70 kg, SD = 12 kg → CV = 17.1%
Even though you can't directly compare centimeters to kilograms, the CV tells you that weight is relatively more variable than height in this population.
CV only works for ratio-scale data (where zero means "none"). It's meaningless for temperature in Celsius (0°C doesn't mean "no temperature").
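The CV calculation itself is trivial. A sketch reproducing the height/weight comparison above (`cv` is a hypothetical helper, not a stdlib function):

```python
def cv(mean, sd):
    """Coefficient of variation as a percentage (ratio-scale data only)."""
    return sd / mean * 100

print(round(cv(170, 10), 1))  # 5.9  -- height
print(round(cv(70, 12), 1))   # 17.1 -- weight: relatively more variable
```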
Choosing the Right Measure
| Measure | Pros | Cons | Best For |
|---|---|---|---|
| Range | Dead simple | Ignores all but 2 values | Quick overview |
| Variance | Mathematical foundation | Squared units, hard to interpret | Calculations, formulas |
| Std Deviation | Same units as data | Sensitive to outliers | General-purpose reporting |
| IQR | Robust to outliers | Ignores 50% of data | Skewed data, box plots |
| CV | Unitless comparison | Only for ratio scales | Comparing across scales |