Non-Parametric Tests

Learn rank-based tests that don't assume normality: Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, and Spearman correlation.

When Traditional Tests Break Down

Most classical statistical tests (t-tests, ANOVA, regression) assume your data is approximately normally distributed. But what if it isn't? What if your data:

  • Is heavily skewed (income, reaction times)
  • Has outliers (one billionaire in your sample)
  • Is ordinal rather than truly numerical ("rate your pain 1-10")
  • Comes from a small sample where you can't rely on the Central Limit Theorem (CLT)

Non-parametric tests don't assume a specific distribution. They work with ranks instead of raw values, making them robust to outliers and skewness.

The Core Idea: Ranks vs Values

Instead of using the actual data values, non-parametric tests use their ranks (positions when sorted).

Converting to Ranks

Data: [15, 23, 18, 50, 19]

Ranks: [1, 4, 2, 5, 3]

  • 15 is smallest → rank 1
  • 18 is 2nd smallest → rank 2
  • 19 → rank 3
  • 23 → rank 4
  • 50 is largest → rank 5

Why this helps: The outlier 50 becomes just "rank 5" — its extreme value doesn't dominate the analysis.

Key advantage: Ranks are insensitive to the magnitude of outliers. Whether the top value is 50 or 5,000, it's still just "rank 5."
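The conversion above can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation: `average_ranks` is a hypothetical helper name, and it assumes the usual convention that tied values share the average of the ranks they occupy.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # lowest rank held by v
        last = n - ordered[::-1].index(v)  # highest rank held by v
        ranks.append((first + last) / 2)
    return ranks


data = [15, 23, 18, 50, 19]
ranks = average_ranks(data)
print(ranks)  # [1.0, 4.0, 2.0, 5.0, 3.0]

# Making the largest value far more extreme leaves every rank unchanged.
extreme = [15, 23, 18, 5000, 19]
print(average_ranks(extreme) == ranks)  # True
```

In practice, `scipy.stats.rankdata` performs the same conversion (its default method also averages tied ranks).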

Mann-Whitney U Test

Non-parametric alternative to the two-sample t-test. Tests whether two independent samples come from the same distribution.

Null hypothesis: The two populations have the same distribution. If the two distributions also have the same shape, this amounts to testing for equal medians.

How it works:

  1. Pool both samples together and rank all values
  2. Sum the ranks for each group
  3. Test if one group's ranks are systematically higher

Comparing Treatments

Does a new painkiller reduce pain more than placebo?

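Placebo group pain scores: [7, 5, 8, 6, 9]
Treatment group pain scores: [3, 4, 2, 5, 4]

All values ranked together: 2, 3, 4, 4, 5, 5, 6, 7, 8, 9
Ranks: 1, 2, 3.5, 3.5, 5.5, 5.5, 7, 8, 9, 10

Tied values share the average of the ranks they occupy, so the two 4s each get (3 + 4) / 2 = 3.5 and the two 5s each get 5.5.

Treatment ranks: 1, 2, 3.5, 3.5, 5.5 → Sum = 15.5
Placebo ranks: 5.5, 7, 8, 9, 10 → Sum = 39.5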

Treatment ranks are much lower (lower pain) — likely a real effect!

Use this when: Data is skewed, has outliers, or sample size is too small for t-test assumptions.
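The painkiller example can be reproduced with a short standard-library sketch. The helper name `average_ranks` is hypothetical, and the exact p-value is computed by brute force, assuming that under the null hypothesis every way of assigning five of the ten ranks to the treatment group is equally likely. That is feasible here because there are only 252 such assignments.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
            for v in values]


placebo = [7, 5, 8, 6, 9]
treatment = [3, 4, 2, 5, 4]

# Steps 1 and 2: rank the pooled sample, then sum the ranks per group.
ranks = average_ranks(treatment + placebo)
n_t, n_p = len(treatment), len(placebo)
rank_sum_treatment = sum(ranks[:n_t])
rank_sum_placebo = sum(ranks[n_t:])
print(rank_sum_treatment, rank_sum_placebo)  # 15.5 39.5

# The U statistic is the rank sum minus its smallest possible value.
u_treatment = rank_sum_treatment - n_t * (n_t + 1) / 2
u_placebo = rank_sum_placebo - n_p * (n_p + 1) / 2
print(u_treatment, u_placebo)  # 0.5 24.5

# Step 3: exact one-sided p-value by enumerating every possible split.
splits = list(combinations(ranks, n_t))
as_low = sum(1 for s in splits if sum(s) <= rank_sum_treatment)
p_one_sided = as_low / len(splits)
print(round(p_one_sided, 4))  # 0.0079
```

For real analyses, `scipy.stats.mannwhitneyu` computes the U statistic and a p-value directly. The enumeration here is only meant to show where that p-value comes from.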

Wilcoxon Signed-Rank Test

Non-parametric alternative to the paired t-test. Tests whether paired observations differ systematically.

Null hypothesis: The median difference is zero.

How it works:

  1. Calculate differences for each pair
  2. Rank the absolute differences (ignoring sign)
  3. Assign signs back to ranks
  4. Test if positive and negative ranks balance out

Before-After Study

Weight before and after diet (kg):

| Person | Before | After | Difference | Abs. difference | Rank |
|--------|--------|-------|------------|-----------------|------|
| A | 85 | 82 | -3 | 3 | 3 |
| B | 90 | 88 | -2 | 2 | 2 |
| C | 78 | 74 | -4 | 4 | 4 |
| D | 92 | 91 | -1 | 1 | 1 |

Only differences of exactly zero are dropped, so person D's small difference still counts and receives rank 1.

Signed ranks: -3, -2, -4, -1

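All negative → consistent weight loss, which suggests the diet works. With only four pairs, though, even this perfectly one-sided pattern gives an exact two-sided p-value of 2/16 = 0.125, so a larger sample would be needed for a firm conclusion.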

Use this when: Paired data, but differences aren't normally distributed.
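The four steps can be traced on the diet data with a small standard-library sketch. `average_ranks` is a hypothetical helper, and the exact p-value assumes that under the null hypothesis each difference is equally likely to be positive or negative, so all 2^4 = 16 sign patterns are enumerated.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
            for v in values]


before = [85, 90, 78, 92]
after = [82, 88, 74, 91]

# Step 1: paired differences (zeros would be dropped; there are none here).
diffs = [a - b for b, a in zip(before, after)]
nonzero = [d for d in diffs if d != 0]

# Steps 2 and 3: rank the absolute differences, then restore the signs.
abs_ranks = average_ranks([abs(d) for d in nonzero])
signed_ranks = [r if d > 0 else -r for d, r in zip(nonzero, abs_ranks)]
print(signed_ranks)  # [-3.0, -2.0, -4.0, -1.0]

# Step 4: compare the positive and negative rank sums.
w_plus = sum(r for r in signed_ranks if r > 0)
w_minus = -sum(r for r in signed_ranks if r < 0)
print(w_plus, w_minus)  # 0 10.0

# Exact two-sided p-value: count sign patterns at least as lopsided.
statistic = min(w_plus, w_minus)
total = sum(abs_ranks)
patterns = list(product([1, -1], repeat=len(abs_ranks)))
as_extreme = 0
for signs in patterns:
    wp = sum(r for s, r in zip(signs, abs_ranks) if s > 0)
    if min(wp, total - wp) <= statistic:
        as_extreme += 1
p_two_sided = as_extreme / len(patterns)
print(p_two_sided)  # 0.125
```

`scipy.stats.wilcoxon` offers the same test for real use. The enumeration above is only practical for small samples.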

Kruskal-Wallis Test

Non-parametric alternative to one-way ANOVA. Compares 3+ independent groups.

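Null hypothesis: All groups come from the same distribution (often summarized as "all groups have the same median" when the distributions have similar shapes).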

How it works:

  1. Rank all observations across all groups
  2. Compare average ranks between groups
  3. Test if differences are beyond chance

Comparing Multiple Treatments

Recovery time (days) for three treatments:

Treatment A: [12, 14, 13, 15]
Treatment B: [10, 9, 11, 10]
Treatment C: [14, 16, 15, 18]

Rank all 12 values, then compare average ranks for each treatment.

If Treatment B consistently has lower ranks (faster recovery), the test will detect it.

Use this when: Comparing 3+ groups with non-normal or skewed data.
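As a rough sketch of the recovery-time example, the code below ranks all 12 values, averages the ranks per treatment, and computes the Kruskal-Wallis H statistic. `average_ranks` is a hypothetical helper, and H is computed here without the correction for ties that statistical packages normally apply, so it will differ slightly from library output.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
            for v in values]


groups = {
    "A": [12, 14, 13, 15],
    "B": [10, 9, 11, 10],
    "C": [14, 16, 15, 18],
}

# Step 1: rank every observation in the pooled sample.
pooled = [v for values in groups.values() for v in values]
ranks = average_ranks(pooled)

# Step 2: split the ranks back into groups and summarize them.
rank_sums = {}
start = 0
for name, values in groups.items():
    rank_sums[name] = sum(ranks[start:start + len(values)])
    start += len(values)
mean_ranks = {name: rank_sums[name] / len(groups[name]) for name in groups}
print(mean_ranks)  # {'A': 7.0, 'B': 2.5, 'C': 10.0}

# Step 3: H grows as the group rank sums move away from what chance predicts.
n_total = len(pooled)
h = (12 / (n_total * (n_total + 1))
     * sum(rank_sums[name] ** 2 / len(groups[name]) for name in groups)
     - 3 * (n_total + 1))
print(round(h, 2))  # 8.77
```

H is usually compared with a chi-square distribution with (number of groups - 1) degrees of freedom, whose 5% critical value for 2 degrees of freedom is about 5.99. With only four observations per group that approximation is rough, so treat the comparison as indicative. `scipy.stats.kruskal` handles the tie correction and p-value.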

Spearman Rank Correlation

Non-parametric alternative to Pearson correlation. Measures the strength of a monotonic relationship (one variable consistently increases, or consistently decreases, as the other increases).

How it works:

  1. Convert both variables to ranks
  2. Calculate Pearson correlation on the ranks

When Spearman > Pearson

Income vs education years:

Education might increase linearly, but income increases exponentially (non-linear).

Pearson correlation: Measures linear relationship → might miss non-linear pattern
Spearman correlation: Measures monotonic relationship → captures consistent increase even if not linear

Spearman is robust to outliers and works for ordinal data ("rank your preference 1-5").
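The two-step recipe (rank both variables, then apply Pearson) can be sketched as follows. The education and income figures are made up purely for illustration, and `average_ranks`, `pearson`, and `spearman` are hypothetical helper names.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
            for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: income (in thousands) rises faster and faster.
education_years = [10, 12, 14, 16, 18, 20]
income = [20, 25, 35, 60, 120, 300]

r = pearson(education_years, income)
rho = spearman(education_years, income)
print(round(r, 2))    # 0.85, pulled below 1 by the curvature
print(round(rho, 2))  # 1.0, because income never decreases as education rises
```

`scipy.stats.spearmanr` provides the same coefficient along with a p-value.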

Advantages and Limitations

Non-Parametric vs Parametric

| Aspect | Non-Parametric | Parametric |
|--------|----------------|------------|
| Assumptions | Minimal (no distribution) | Normal distribution |
| Robustness | Handles outliers, skew | Sensitive to violations |
| Power | Lower (if assumptions met) | Higher (if assumptions met) |
| Interpretability | Median, ranks | Mean, specific parameters |
| Data types | Ordinal, non-normal | Interval/ratio, normal |

Use non-parametric tests when:

  • Data is severely non-normal (skewed, outliers)
  • Sample size is small (can't rely on CLT)
  • Data is ordinal (ranks, ratings)
  • Robustness is more important than power

Use parametric tests when:

  • Data is approximately normal (or large sample)
  • Maximum statistical power is needed
  • You want to estimate specific parameters (means, regression coefficients)

Common misconception: "Non-parametric tests are always safer." Reality: They're less powerful when parametric assumptions hold. If your data is normal, use t-tests — you'll detect real effects more reliably.

Practical Guidelines

1. Check assumptions first: Use histograms, Q-Q plots, and normality tests to see if parametric tests are appropriate.

2. Small samples → consider non-parametric: With n < 15, violations of normality are hard to detect and can invalidate parametric tests.

3. Outliers → non-parametric: One extreme value can dominate a t-test. Ranks neutralize this problem.

4. Ordinal data → use non-parametric: Likert scales ("1 = strongly disagree" to "5 = strongly agree") have ordered categories, but the gaps between categories aren't guaranteed to be equal. Ranks rely only on the ordering.

5. If in doubt, report both: Run both tests. If they agree, you're on solid ground. If they disagree, check for outliers or skew; when those are present, the non-parametric result is usually the more trustworthy one.
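Guideline 3 can be illustrated with a toy comparison. The data below are hypothetical, and `welch_t` and `rank_sum` are illustrative helper names. The sketch computes a Welch-style t statistic and the rank sum of the second group, first on clean data and then after one value is replaced by an extreme outlier.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for the difference in means (b minus a)."""
    return (mean(b) - mean(a)) / sqrt(variance(a) / len(a)
                                      + variance(b) / len(b))


def rank_sum(a, b):
    """Sum of b's ranks in the pooled sample (this toy data has no ties)."""
    ordered = sorted(a + b)
    return sum(ordered.index(v) + 1 for v in b)


# Hypothetical data: every treated value exceeds every control value.
control = [1, 2, 3, 4, 5]
treated = [6, 7, 8, 9, 10]
treated_outlier = [6, 7, 8, 9, 1000]  # same ordering, one extreme value

t_clean = welch_t(control, treated)
t_outlier = welch_t(control, treated_outlier)
print(round(t_clean, 2), round(t_outlier, 2))  # 5.0 1.02

# The rank sum depends only on ordering, so the outlier does not change it.
print(rank_sum(control, treated), rank_sum(control, treated_outlier))  # 40 40
```

The outlier inflates the variance so much that the t statistic collapses even though the groups are still perfectly separated, while the rank-based summary is unaffected.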
