Hypothesis Testing Framework
Learn the logic of hypothesis testing: null and alternative hypotheses, one-tailed vs two-tailed tests, significance levels, and the testing procedure.
The Logic of Hypothesis Testing
Hypothesis testing is statistics' framework for making decisions from data. It follows the same logic as a criminal trial:
- Start by assuming innocence (nothing is happening)
- Look at the evidence (data)
- Decide if the evidence is strong enough to reject innocence (conclude something is happening)
We never "prove" anything. We either find sufficient evidence to reject the default assumption, or we don't.
Null and Alternative Hypotheses
Null hypothesis (H₀): The default assumption, typically "nothing is happening," "no difference," or "no effect." It is the claim we try to disprove.
Examples:
- The drug has no effect (μ_drug = μ_placebo)
- The coin is fair (p = 0.5)
- There's no difference between groups (μ₁ = μ₂)
Alternative hypothesis (H₁): What we're trying to find evidence for, typically "something is happening." It's the research hypothesis.
Examples:
- The drug works (μ_drug ≠ μ_placebo)
- The coin is biased (p ≠ 0.5)
- The groups differ (μ₁ ≠ μ₂)
The burden of proof is on H₁. We don't try to prove H₀ — we assume it's true and see if the data contradicts it strongly enough. This is like "innocent until proven guilty."
One-Tailed vs Two-Tailed Tests
The alternative hypothesis determines whether the test is one-tailed or two-tailed:
| Type | H₁ | When to Use |
|---|---|---|
| Two-tailed | μ ≠ μ₀ | You care about any difference (higher or lower) |
| Right-tailed | μ > μ₀ | You only care if the value is higher |
| Left-tailed | μ < μ₀ | You only care if the value is lower |
"Does the new drug affect blood pressure?" → Two-tailed (could raise or lower it)
"Does the new drug lower blood pressure?" → Left-tailed (we only care about a decrease)
"Do students score higher with the new teaching method?" → Right-tailed
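Important: You must choose one-tailed vs two-tailed BEFORE looking at the data. Choosing the direction after seeing which way the data went is a form of p-hacking: whichever tail the data happens to land in gets the full α, so the real false-positive rate is roughly double the one you report.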
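To make the difference between the three test types concrete, here is a minimal sketch of how the same test statistic turns into different p-values depending on the tail. It assumes a z statistic that is standard normal under H₀; the function name `p_value_from_z` and the example value z = 1.8 are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str = "two") -> float:
    """Return the p-value for a z statistic that is standard normal under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Values at least this extreme in either direction count as evidence
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        # Only large positive values count as evidence
        return 1 - std_normal.cdf(z)
    if tail == "left":
        # Only large negative values count as evidence
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")


z = 1.8  # hypothetical test statistic
for tail in ("two", "right", "left"):
    print(f"{tail}-tailed p-value: {p_value_from_z(z, tail):.4f}")
```

With this z, the right-tailed p-value is half the two-tailed one and falls below 0.05 while the two-tailed p-value does not, which is exactly why the choice has to be made before seeing the data.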
The Testing Procedure
Step 1: State H₀ and H₁
Step 2: Choose a significance level α (typically 0.05)
Step 3: Collect data and compute a test statistic
Step 4: Find the p-value or compare to the critical value
Step 5: Make a decision:
- If p-value ≤ α → Reject H₀ (evidence supports H₁)
- If p-value > α → Fail to reject H₀ (insufficient evidence)
We never "accept H₀." We only "fail to reject" it. Absence of evidence is not evidence of absence. Not finding evidence that a drug works doesn't mean it doesn't work — maybe our sample was too small or our test wasn't sensitive enough.
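The decision rule in Step 5 can be sketched as a small function. This is only an illustration of the rule as stated above; the name `decide` and the wording of the returned strings are assumptions made for the example.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the Step 5 decision rule: reject H0 when p-value <= alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    # Deliberately not "accept H0": the data were simply not strong enough
    return "fail to reject H0"


print(decide(0.03))        # p-value below the default alpha of 0.05
print(decide(0.20))        # p-value above alpha
print(decide(0.03, 0.01))  # same p-value, stricter alpha
```

Note that the function has only two outcomes and neither one is "accept H₀," mirroring the wording used above.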
Significance Level (α)
The probability threshold for rejecting H₀. It's the maximum P(reject H₀ | H₀ is true) you're willing to tolerate.
Common choices:
- α = 0.05 (5%) — Standard in most fields
- α = 0.01 (1%) — More conservative (medicine, physics)
- α = 0.10 (10%) — More liberal (exploratory research)
Setting α = 0.05 means you're willing to accept a 5% chance of falsely concluding something is happening when it isn't. In other words, when H₀ is actually true, about 1 test in 20 will cry wolf.
Why not set α = 0.001 to be super safe? Because, for a fixed sample size, the safer you make the test against false positives, the more likely you are to miss real effects (false negatives). It's a tradeoff.
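The tradeoff can be illustrated numerically. The sketch below assumes a two-tailed z-test on 100 coin flips with H₀: p = 0.5, and a hypothetical coin whose true heads probability is 0.6. Both the scenario and the function name `approx_power` are assumptions for illustration, and the calculation relies on the normal approximation to the binomial.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n: int, p_null: float, p_true: float, alpha: float) -> float:
    """Approximate probability of rejecting H0: p = p_null when the truth is p_true."""
    # Critical z value for a two-tailed test at level alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Rejection region for the head count, computed under H0
    null_sd = sqrt(n * p_null * (1 - p_null))
    lower = n * p_null - z_crit * null_sd
    upper = n * p_null + z_crit * null_sd
    # Approximate distribution of the head count under the true p
    true_dist = NormalDist(mu=n * p_true, sigma=sqrt(n * p_true * (1 - p_true)))
    return true_dist.cdf(lower) + (1 - true_dist.cdf(upper))


for alpha in (0.10, 0.05, 0.01, 0.001):
    power = approx_power(n=100, p_null=0.5, p_true=0.6, alpha=alpha)
    print(f"alpha = {alpha}: detects the bias {power:.1%} of the time, "
          f"misses it {1 - power:.1%} of the time")
```

Under these assumptions, each stricter α lowers the chance of detecting the same real bias, so the false-negative rate rises as the false-positive rate falls.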
A Complete Example
You flip a coin 100 times and get 60 heads. Is the coin fair?
Step 1: H₀: p = 0.5 (fair coin), H₁: p ≠ 0.5 (biased coin) — two-tailed
Step 2: α = 0.05
Step 3: Under H₀, the number of heads ~ Binomial(100, 0.5). Using the normal approximation to the binomial:
- Expected: 100 × 0.5 = 50 heads
- Standard deviation of the count: √(100 × 0.5 × 0.5) = 5
- Z = (60 - 50) / 5 = 2.0
Step 4: P(|Z| > 2.0) = 2 × P(Z > 2.0) = 2 × 0.0228 = 0.0456
Step 5: p-value (0.046) < α (0.05) → Reject H₀
At the 5% significance level, there's sufficient evidence to conclude the coin is biased.
Note: At α = 0.01, we would NOT reject (0.046 > 0.01). The conclusion depends on our chosen threshold!
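The five steps of the coin example can be reproduced in a few lines using only the Python standard library. This is a sketch of the calculation above, not a general-purpose test. The final exact binomial comparison is an addition that does not appear in the example itself, and doubling the upper tail is valid there only because the null distribution is symmetric at p = 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist

# Steps 1 and 2: H0: p = 0.5 vs H1: p != 0.5 (two-tailed), alpha = 0.05
n, heads, p_null, alpha = 100, 60, 0.5, 0.05

# Step 3: test statistic via the normal approximation to the binomial
expected = n * p_null
sd = sqrt(n * p_null * (1 - p_null))
z = (heads - expected) / sd

# Step 4: two-tailed p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")

# Extra comparison (not in the example above): exact binomial tail probability,
# doubled for a two-tailed test; valid here because Binomial(n, 0.5) is symmetric
upper_tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
p_exact = min(1.0, 2 * upper_tail)
print(f"exact two-tailed p-value = {p_exact:.4f}")
```

The normal-approximation p-value matches the hand calculation up to table rounding. The exact binomial p-value comes out a little above 0.05, a reminder that a borderline result like this one can depend on the method as well as on the chosen α.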
The Courtroom Analogy
| Court System | Hypothesis Testing |
|---|---|
| Innocent until proven guilty | H₀ assumed true until rejected |
| Evidence presented | Data collected |
| Beyond reasonable doubt | p-value ≤ α |
| Guilty verdict | Reject H₀ |
| Not guilty verdict | Fail to reject H₀ |
| Not guilty ≠ innocent | Fail to reject ≠ H₀ is true |
| Wrongful conviction | Type I error |
| Guilty person goes free | Type II error |
This analogy breaks down in one way: courts use subjective judgment, while hypothesis tests use a precise numerical threshold. But the logical structure is identical.