Non-Parametric Tests
Learn rank-based tests that don't assume normality: Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, and Spearman correlation.
When Traditional Tests Break Down
Most classical statistical tests (t-tests, ANOVA, regression) assume your data is normally distributed. But what if it isn't? What if your data is:
- Heavily skewed (income, reaction times)
- Full of outliers (one billionaire in your sample)
- Ordinal but not truly numerical ("rate your pain 1-10")
- A small sample where you can't rely on the CLT
Non-parametric tests don't assume a specific distribution. They work with ranks instead of raw values, making them robust to outliers and skewness.
The Core Idea: Ranks vs Values
Instead of using the actual data values, non-parametric tests use their ranks (positions when sorted).
Data: [15, 23, 18, 50, 19]
Ranks: [1, 4, 2, 5, 3]
- 15 is smallest → rank 1
- 18 is 2nd smallest → rank 2
- 19 → rank 3
- 23 → rank 4
- 50 is largest → rank 5
Why this helps: The outlier 50 becomes just "rank 5" — its extreme value doesn't dominate the analysis.
Key advantage: Ranks are insensitive to the magnitude of outliers. Whether the top value is 50 or 5,000, it's still just "rank 5."
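A quick sketch of this idea using `scipy.stats.rankdata` (assuming SciPy is installed):

```python
from scipy.stats import rankdata

data = [15, 23, 18, 50, 19]
ranks = rankdata(data)  # positions when sorted; ties get the average rank
print(ranks)  # [1. 4. 2. 5. 3.]

# The outlier's magnitude doesn't matter: replacing 50 with 5000
# leaves every rank unchanged.
ranks_outlier = rankdata([15, 23, 18, 5000, 19])
print(ranks_outlier)  # [1. 4. 2. 5. 3.]
```

This is exactly why rank-based tests are robust: any monotonic distortion of the values, however extreme, produces the same ranks.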
Mann-Whitney U Test
Non-parametric alternative to the two-sample t-test. Tests whether two independent samples come from the same distribution.
Null hypothesis: The two populations have the same distribution (under a shift assumption, this amounts to equal medians).
How it works:
- Pool both samples together and rank all values
- Sum the ranks for each group
- Test if one group's ranks are systematically higher
Does a new painkiller reduce pain more than placebo?
Placebo group pain scores: [7, 5, 8, 6, 9]
Treatment group: [3, 4, 2, 5, 4]

All values ranked together: 2, 3, 4, 4, 5, 5, 6, 7, 8, 9
Ranks: 1, 2, 3.5, 3.5, 5.5, 5.5, 7, 8, 9, 10 (tied values share the average of their ranks)

Treatment ranks: 1, 2, 3.5, 3.5, 5.5 → Sum = 15.5
Placebo ranks: 5.5, 7, 8, 9, 10 → Sum = 39.5
Treatment ranks are much lower (lower pain) — likely a real effect!
Use this when: Data is skewed, has outliers, or sample size is too small for t-test assumptions.
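The painkiller example above, run with `scipy.stats.mannwhitneyu` (a sketch, assuming SciPy is installed):

```python
from scipy.stats import mannwhitneyu

placebo = [7, 5, 8, 6, 9]
treatment = [3, 4, 2, 5, 4]

# Two-sided test: do the two samples come from the same distribution?
stat, p = mannwhitneyu(treatment, placebo, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")  # U = 0.5 — treatment ranks are almost all below placebo
```

The U statistic here is the treatment rank sum (15.5) minus its minimum possible value n(n+1)/2 = 15, so U = 0.5, close to the most extreme separation possible for two groups of five.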
Wilcoxon Signed-Rank Test
Non-parametric alternative to the paired t-test. Tests whether paired observations differ systematically.
Null hypothesis: The median difference is zero.
How it works:
- Calculate differences for each pair
- Rank the absolute differences (ignoring sign)
- Assign signs back to ranks
- Test if positive and negative ranks balance out
Weight before and after diet (kg):
| Person | Before | After | Difference | \|Difference\| | Rank |
|--------|--------|-------|------------|----------------|------|
| A      | 85     | 82    | -3         | 3              | 3    |
| B      | 90     | 88    | -2         | 2              | 2    |
| C      | 78     | 74    | -4         | 4              | 4    |
| D      | 92     | 91    | -1         | 1              | 1    |

(Only differences of exactly zero are dropped; small non-zero differences keep their ranks.)

Signed ranks: -3, -2, -4, -1
All negative → consistent weight loss → likely effective!
Use this when: Paired data, but differences aren't normally distributed.
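The diet example, run with `scipy.stats.wilcoxon` (a sketch, assuming SciPy is installed):

```python
from scipy.stats import wilcoxon

before = [85, 90, 78, 92]
after = [82, 88, 74, 91]

# Tests whether the median of the paired differences is zero.
# W is the smaller of the positive and negative rank sums; here every
# difference has the same sign, so W = 0.
stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.3f}")  # W = 0.0, p = 0.125
```

Note the caveat this exposes: with only four pairs, the smallest achievable two-sided p-value is 2 × (1/2)⁴ = 0.125, so even perfectly consistent weight loss cannot reach p < 0.05. Very small samples limit what any test can conclude.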
Kruskal-Wallis Test
Non-parametric alternative to one-way ANOVA. Compares 3+ independent groups.
Null hypothesis: All groups have the same median.
How it works:
- Rank all observations across all groups
- Compare average ranks between groups
- Test if differences are beyond chance
Recovery time (days) for three treatments:
Treatment A: [12, 14, 13, 15]
Treatment B: [10, 9, 11, 10]
Treatment C: [14, 16, 15, 18]
Rank all 12 values, then compare average ranks for each treatment.
If Treatment B consistently has lower ranks (faster recovery), the test will detect it.
Use this when: Comparing 3+ groups with non-normal or skewed data.
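The recovery-time example, run with `scipy.stats.kruskal` (a sketch, assuming SciPy is installed):

```python
from scipy.stats import kruskal

treatment_a = [12, 14, 13, 15]
treatment_b = [10, 9, 11, 10]
treatment_c = [14, 16, 15, 18]

# H statistic compares each group's average rank against the overall
# average rank; large H means the groups' ranks separate cleanly.
stat, p = kruskal(treatment_a, treatment_b, treatment_c)
print(f"H = {stat:.2f}, p = {p:.4f}")
```

Treatment B occupies the lowest ranks and Treatment C the highest, so the test rejects the null of equal medians here. As with ANOVA, a significant result says *some* group differs; pairwise follow-up tests are needed to say which.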
Spearman Rank Correlation
Non-parametric alternative to Pearson correlation. Measures monotonic relationship (one variable consistently increases as the other increases).
How it works:
- Convert both variables to ranks
- Calculate Pearson correlation on the ranks
Income vs education years:
Education might increase linearly, but income increases exponentially (non-linear).
Pearson correlation: Measures linear relationship → might miss the non-linear pattern
Spearman correlation: Measures monotonic relationship → captures the consistent increase even if it's not linear
Spearman is robust to outliers and works for ordinal data ("rank your preference 1-5").
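A sketch of the income-vs-education contrast with `scipy.stats` (the numbers are hypothetical, chosen so income grows roughly exponentially with education):

```python
from scipy.stats import spearmanr, pearsonr

education = [8, 10, 12, 14, 16, 18, 20]          # years of schooling
income = [20, 25, 32, 45, 70, 120, 250]         # hypothetical, ~exponential

# Spearman = Pearson computed on the ranks of each variable.
rho, _ = spearmanr(education, income)
r, _ = pearsonr(education, income)
print(f"Spearman rho = {rho:.2f}")  # 1.00: relationship is perfectly monotonic
print(f"Pearson r  = {r:.2f}")      # < 1: curvature weakens the linear fit
```

Because income never decreases as education increases, Spearman's rho is exactly 1, while Pearson's r is pulled below 1 by the non-linearity.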
Advantages and Limitations
| Aspect | Non-Parametric | Parametric |
|---|---|---|
| Assumptions | Minimal (no distribution) | Normal distribution |
| Robustness | Handles outliers, skew | Sensitive to violations |
| Power | Lower (if assumptions met) | Higher (if assumptions met) |
| Interpretability | Median, ranks | Mean, specific parameters |
| Data types | Ordinal, non-normal | Interval/ratio, normal |
Use non-parametric tests when:
- Data is severely non-normal (skewed, outliers)
- Sample size is small (can't rely on CLT)
- Data is ordinal (ranks, ratings)
- Robustness is more important than power
Use parametric tests when:
- Data is approximately normal (or large sample)
- Maximum statistical power is needed
- You want to estimate specific parameters (means, regression coefficients)
Common misconception: "Non-parametric tests are always safer." Reality: They're less powerful when parametric assumptions hold. If your data is normal, use t-tests — you'll detect real effects more reliably.
Practical Guidelines
1. **Check assumptions first.** Use histograms, Q-Q plots, and normality tests to see if parametric tests are appropriate.
2. **Small samples → consider non-parametric.** With n < 15, violations of normality are hard to detect and can invalidate parametric tests.
3. **Outliers → non-parametric.** One extreme value can dominate a t-test; ranks neutralize this problem.
4. **Ordinal data → must use non-parametric.** Likert scales ("1 = strongly disagree to 5 = strongly agree") aren't true numbers; ranks preserve the ordering.
5. **If in doubt, report both.** Run both tests. If they agree, you're on solid ground; if they disagree, the non-parametric result is more trustworthy.
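A minimal sketch of guideline 1, using the Shapiro-Wilk normality test on simulated skewed data (assuming SciPy and NumPy are installed; the exponential sample stands in for something like income or reaction times):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=2.0, size=200)  # heavily right-skewed sample

# Shapiro-Wilk: null hypothesis is that the data is normally distributed.
stat, p = shapiro(skewed)
if p < 0.05:
    print("Normality rejected -> prefer a non-parametric test")
else:
    print("No evidence against normality -> a parametric test is reasonable")
```

Treat this as one input among several: with large samples, normality tests flag trivial deviations, and with tiny samples they miss serious ones, which is why pairing them with a histogram or Q-Q plot is the safer habit.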