Covariance & Correlation
Understand covariance, Pearson correlation, Spearman rank correlation, and the many pitfalls of interpreting correlation.
Do Two Variables Move Together?
So far we've analyzed one variable at a time. But the most interesting questions involve relationships: Does more education lead to higher income? Does exercise lower blood pressure? Does advertising increase sales?
Covariance and correlation quantify how two variables move together.
Covariance
A measure of how two variables change together. Positive covariance means they tend to increase together; negative means one increases as the other decreases.
Intuition: For each data point, multiply how far X is from its mean by how far Y is from its mean. If both tend to be above (or both below) their means simultaneously, the product is positive → positive covariance. If one is above while the other is below, the product is negative → negative covariance.
Problem with covariance: Its magnitude depends on the units of X and Y. Cov(height in cm, weight in kg) gives a completely different number than Cov(height in inches, weight in pounds). You can't tell if the relationship is "strong" or "weak" from covariance alone.
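To make the unit-dependence concrete, here is a minimal sketch in Python. The height/weight numbers are invented for illustration:

```python
import statistics

def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical height/weight data (illustrative values, not a real dataset)
height_cm = [160, 165, 170, 175, 180]
weight_kg = [55, 62, 66, 74, 80]

# Same people, same relationship -- just expressed in inches and pounds
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.2046 for w in weight_kg]

print(covariance(height_cm, weight_kg))  # 77.5
print(covariance(height_in, weight_lb))  # ~67.3 -- a different number for the same data
```

Identical data, identical relationship, two different covariances — which is exactly why the raw number can't tell you how strong the relationship is.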
Pearson Correlation Coefficient
The solution: standardize the covariance by dividing it by the product of the two standard deviations: r = Cov(X, Y) / (σ_X · σ_Y).
- -1 ≤ r ≤ 1 always
- r = 1: Perfect positive linear relationship
- r = -1: Perfect negative linear relationship
- r = 0: No linear relationship (but there could be a non-linear one!)
- r is unitless — changing units of X or Y doesn't change r
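A from-scratch sketch of Pearson's r, on small invented data, just to show the mechanics and the unit-invariance property:

```python
import math

def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Rescaling or shifting either variable leaves r unchanged
# (up to floating-point rounding)
r2 = pearson_r([xi * 100 for xi in x], [yi + 7 for yi in y])
print(round(r, 3))  # 0.775
```

(Python 3.10+ also ships this as `statistics.correlation`; the hand-rolled version just makes the formula visible.)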
| r | Strength |
|---|---|
| 0.00 – 0.19 | Very weak |
| 0.20 – 0.39 | Weak |
| 0.40 – 0.59 | Moderate |
| 0.60 – 0.79 | Strong |
| 0.80 – 1.00 | Very strong |
These labels are rough guidelines, not rules. In some fields (physics), r = 0.7 might be disappointing. In others (psychology), r = 0.3 might be impressive.
Spearman Rank Correlation
Measures the strength of a monotonic relationship (not just linear). It works by ranking the data first, then computing Pearson's r on the ranks.
When to use Spearman instead of Pearson:
- Data contains outliers (ranks are robust to outliers)
- Relationship is monotonic but not linear (e.g., exponential)
- Data is ordinal (rankings, ratings)
- Distribution is heavily skewed
Example: Income and happiness might have a positive but diminishing relationship — going from $40k to $80k matters more than going from $400k to $800k. Spearman captures this monotonic-but-nonlinear pattern better than Pearson.
Correlation Pitfalls
1. Correlation ≠ Causation (again!) An r = 0.95 between ice cream sales and drownings doesn't mean ice cream causes drowning. Confounders (here, hot weather drives both) lurk everywhere.
2. r measures LINEAR relationships only. A perfect U-shaped relationship gives r ≈ 0. Always plot your data! If the scatter plot shows a curve, Pearson's r will miss it.
3. Outliers can dominate. A single extreme point can create an apparent correlation where none exists, or destroy a real one.
4. Restriction of range. If you only measure the relationship among a narrow group (e.g., SAT scores vs college GPA among Harvard students), the correlation will be artificially low because the range of SAT scores is restricted.
5. Ecological fallacy. Correlation at the group level doesn't imply correlation at the individual level. Countries with more chocolate consumption win more Nobel Prizes, but individual chocolate eaters aren't smarter.
Always, always, always make a scatter plot before computing a correlation. The number alone can be deeply misleading. Anscombe's Quartet demonstrates this: four datasets with identical summary statistics, including r ≈ 0.82, that look completely different when plotted.