Advanced Conditional Probability Problems

Master Monty Hall, Simpson's paradox, and other counterintuitive conditional probability problems.

24 min read
Intermediate

Introduction

Real-world probability problems often involve intricate conditional relationships. From the Monty Hall problem to medical diagnostics, mastering advanced conditioning is essential.

Learning Objectives:

  • Solve the Monty Hall problem
  • Apply Bayes to multi-stage scenarios
  • Handle paradoxes in conditional probability
  • Recognize and avoid common fallacies

The Monty Hall Problem

Setup

You're on a game show with three doors. Behind one is a car, behind the others are goats. You pick door 1. The host (who knows what's behind each door) opens door 3, revealing a goat. Should you switch to door 2?

Intuition says: No difference, both have probability 1/2.

Math says: Switch! Switching gives you 2/3 probability of winning.

Analysis

Initial probabilities:

  • $P(\text{car at 1}) = 1/3$
  • $P(\text{car at 2}) = 1/3$
  • $P(\text{car at 3}) = 1/3$

Key insight: The host's action is not random. He must open a door that hides a goat and is not the door you chose.

  • If the car is at door 1: the host can open door 2 or door 3 (both goats)
  • If the car is at door 2: the host must open door 3 (the only goat door available)
  • If the car is at door 3: the host must open door 2 (the only goat door available)

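Since the host opened door 3, weigh each case by how likely it makes that action:

  • If the car was at 1 (prior 1/3): the host opens door 3 only half the time, assuming he picks at random between doors 2 and 3. Contribution: 1/3 × 1/2 = 1/6. Staying wins.
  • If the car was at 2 (prior 1/3): the host must open door 3. Contribution: 1/3 × 1 = 1/3. Switching wins.
  • If the car was at 3: impossible, because the host never reveals the car. Contribution: 0.

Adjusting with Bayes' rule: $P(\text{car at 2} \mid \text{host opens 3}) = \dfrac{1/3}{1/6 + 1/3} = 2/3$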
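The posterior can also be checked with exact arithmetic. The following is a minimal sketch, not part of the original lesson code. It assumes that when the car is behind door 1 the host picks door 2 or door 3 with equal probability (the setup does not state this), and the names `prior`, `likelihood`, `evidence`, and `posterior` are illustrative.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood of "host opens door 3" for each car location, with the player on door 1.
# Assumption: if the car is behind door 1, the host picks door 2 or 3 with equal probability.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Bayes' rule: the posterior is proportional to prior * likelihood.
evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

for door, p in posterior.items():
    print(f"P(car at {door} | host opens 3) = {p}")
```

Under these assumptions door 1 keeps its prior of 1/3, door 2 rises to 2/3, and door 3 drops to 0.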

Strategy: Always switch!

```python
import random

def monty_hall_simulation(n_trials=10000, strategy='switch'):
    """Estimate the win rate of the 'stay' or 'switch' strategy by simulation."""
    wins = 0

    for _ in range(n_trials):
        # Setup: place the car uniformly at random
        car_door = random.randint(1, 3)
        player_choice = 1  # Always pick door 1 initially

        # Host opens a goat door other than the player's pick.
        # This list is never empty: at least one of doors 2 and 3 hides a goat.
        available_doors = [d for d in [1, 2, 3]
                           if d != player_choice and d != car_door]
        host_opens = random.choice(available_doors)

        # Player's final choice
        if strategy == 'switch':
            # Exactly one door is neither the player's pick nor the opened door
            final_choice = next(d for d in [1, 2, 3]
                                if d != player_choice and d != host_opens)
        else:  # stay
            final_choice = player_choice

        if final_choice == car_door:
            wins += 1

    return wins / n_trials

# Test both strategies
stay_win_rate = monty_hall_simulation(10000, 'stay')
switch_win_rate = monty_hall_simulation(10000, 'switch')

print(f"Win rate (stay): {stay_win_rate:.4f} (theoretical: 0.3333)")
print(f"Win rate (switch): {switch_win_rate:.4f} (theoretical: 0.6667)")
print(f"\nSwitching is {switch_win_rate / stay_win_rate:.2f}x better!")
```

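Why our intuition fails: We intuitively think the host's action is "revealing information equally" about doors 1 and 2. But the host can never open the door you picked and never opens the door with the car, so his choice is constrained. That constraint creates the asymmetry: door 1 keeps its original 1/3, while door 2 absorbs the probability that door 3 lost.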
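One way to see that the constraint is what matters is to compare against a host who does not know where the car is. The sketch below is an illustration only: the uninformed host is a hypothetical variant that does not appear in the setup above, and the function name `switch_win_probability` and its `host_knows` flag are made up for this example. It enumerates every case exactly and conditions on a goat being revealed.

```python
from fractions import Fraction

def switch_win_probability(host_knows):
    """Exact P(switch wins | host reveals a goat), with the player on door 1."""
    goat_revealed = Fraction(0)  # total probability that the opened door shows a goat
    switch_wins = Fraction(0)    # probability that a goat is shown AND switching wins
    for car in (1, 2, 3):
        if host_knows:
            # Informed host: only goat doors other than the player's door are allowed.
            options = [d for d in (2, 3) if d != car]
        else:
            # Hypothetical uninformed host: opens door 2 or 3 at random.
            options = [2, 3]
        for opened in options:
            p = Fraction(1, 3) / len(options)
            if opened == car:
                continue  # the car was revealed; excluded by conditioning on a goat
            goat_revealed += p
            remaining = ({2, 3} - {opened}).pop()  # the door the player can switch to
            if remaining == car:
                switch_wins += p
    return switch_wins / goat_revealed

print("Informed host:  ", switch_win_probability(host_knows=True))
print("Uninformed host:", switch_win_probability(host_knows=False))
```

With the informed host the result is 2/3, matching the analysis. With the hypothetical uninformed host it is 1/2, which is the answer intuition expects. The difference comes entirely from how the host chooses his door.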

Simpson's Paradox

The Paradox

A trend that holds in every subgroup can reverse when the subgroups are combined!

Example: Two treatments for kidney stones

Small stones:

  • Treatment A: 81/87 = 93% success
  • Treatment B: 234/270 = 87% success
  • Winner: A

Large stones:

  • Treatment A: 192/263 = 73% success
  • Treatment B: 55/80 = 69% success
  • Winner: A

Combined (all stones):

  • Treatment A: 273/350 = 78% success
  • Treatment B: 289/350 = 83% success
  • Winner: B (!!)
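The reversal can be reproduced directly from the counts listed above. This is a minimal sketch; the `data` dictionary layout and the helper `rate` are illustrative choices, not part of the original lesson.

```python
# (successes, patients) for each treatment and stone size, from the example above.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction between 0 and 1."""
    return successes / total

# Compare the treatments within each stone size.
for size in ("small", "large"):
    a = rate(*data["A"][size])
    b = rate(*data["B"][size])
    print(f"{size:>8}: A = {a:.1%}, B = {b:.1%}, winner: {'A' if a > b else 'B'}")

# Pool the groups by summing successes and patients separately.
combined = {
    treatment: rate(sum(s for s, _ in groups.values()),
                    sum(n for _, n in groups.values()))
    for treatment, groups in data.items()
}
winner = "A" if combined["A"] > combined["B"] else "B"
print(f"combined: A = {combined['A']:.1%}, B = {combined['B']:.1%}, winner: {winner}")
```

Note that the combined rate comes from pooling the raw counts, not from averaging the two percentages.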

Explanation

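Treatment A was used far more often on the difficult cases: 263 of its 350 patients had large stones, compared with only 80 of 350 for Treatment B. Because large stones have lower success rates under either treatment, this case mix drags down A's overall success rate even though A is better in each category.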
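Each overall rate is a weighted average of the per-group rates, where the weights are the share of patients in each group. The sketch below makes those weights visible using the counts from the example; the variable names (`share`, `group_rate`, `overall`) are illustrative.

```python
# (successes, patients) per stone size, taken from the example above.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

overall = {}
for treatment, groups in data.items():
    total_patients = sum(n for _, n in groups.values())
    overall[treatment] = 0.0
    for size, (successes, n) in groups.items():
        share = n / total_patients   # fraction of this treatment's patients in the group
        group_rate = successes / n   # success rate within the group
        overall[treatment] += share * group_rate
        print(f"{treatment} {size}: share = {share:.1%}, rate = {group_rate:.1%}")
    print(f"{treatment} overall = {overall[treatment]:.1%}")
```

Roughly three quarters of Treatment A's weight falls on the harder large-stone group, while roughly three quarters of Treatment B's weight falls on the easier small-stone group.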

Lesson: Always check for confounding variables before aggregating data!

Key Takeaways

  1. Monty Hall: Always switch (2/3 vs 1/3)
  2. Conditioning matters: Host's constrained choice creates asymmetry
  3. Simpson's Paradox: A trend that holds in every subgroup can reverse in the aggregate
  4. Confounders: A hidden variable (such as stone size) can make conditional and overall rates disagree
  5. Simulation: When intuition fails, simulate to verify

Next Module: Random Variables and their distributions!