The Pigeonhole Principle

Master the pigeonhole principle for existence proofs, worst-case guarantees, and surprising mathematical results.

20 min read
Intermediate

Introduction

The Pigeonhole Principle is deceptively simple: if you have more pigeons than pigeonholes, at least one hole must contain multiple pigeons. Yet this principle leads to profound existence proofs in mathematics, computer science, and beyond.

Learning Objectives:

  • Master the basic and generalized Pigeonhole Principle
  • Apply it to existence proofs and impossibility results
  • Understand the Birthday Paradox through pigeonhole reasoning
  • Solve real-world problems using pigeonhole arguments

The Basic Pigeonhole Principle

If nn objects are placed into kk containers (where n>kn > k), then at least one container must contain more than one object.

Mathematical statement: If n>kn > k, then max⁑(objectsΒ inΒ aΒ container)β‰₯⌈n/kβŒ‰\max(\text{objects in a container}) \geq \lceil n/k \rceil.

Proof: By contradiction. If every container had at most 1 object, we'd have at most kk objects total, contradicting n>kn > k.

Example 1: Socks in a Drawer

You have 10 pairs of socks (20 socks total) in 5 different colors (4 socks of each color). In a dark room, how many socks must you grab to guarantee a matching pair?

Solution:

  • Pigeons: socks grabbed
  • Holes: colors (5)
  • To guarantee a match, we need at least one color to have 2 socks

If we grab 6 socks, by the Pigeonhole Principle, at least one color must have ⌈6/5βŒ‰=2\lceil 6/5 \rceil = 2 socks.

Answer: 6 socks guarantees a matching pair.

python
import random
from collections import Counter

def simulate_sock_draw(n_colors=5, socks_per_color=4, draws=1000):
    """Simulate drawing socks until a pair is found"""
    results = []

    for _ in range(draws):
        # Create drawer with socks
        drawer = []
        for color in range(n_colors):
            drawer.extend([color] * socks_per_color)

        # Draw socks one by one until a pair
        random.shuffle(drawer)
        drawn = []
        for i, sock in enumerate(drawer, 1):
            drawn.append(sock)
            if Counter(drawn).most_common(1)[0][1] >= 2:
                results.append(i)
                break

    return results

# Run simulation
results = simulate_sock_draw()

print(f"Simulating sock drawing ({len(results)} trials)")
print(f"\nSocks drawn until first pair:")
print(f"  Minimum: {min(results)}")
print(f"  Maximum: {max(results)}")
print(f"  Average: {sum(results)/len(results):.2f}")

# Count distribution
counter = Counter(results)
print(f"\nDistribution:")
for socks, count in sorted(counter.items()):
    pct = count / len(results) * 100
    print(f"  {socks} socks: {count:3d} times ({pct:5.1f}%)")

print(f"\nPigeonhole guarantee: 6 socks (5 colors + 1)")
print(f"Trials reaching 6: {sum(1 for r in results if r == 6):3d} ({sum(1 for r in results if r == 6)/len(results)*100:.1f}%)")

Key Insight: The Pigeonhole Principle gives a worst-case guarantee. You might get lucky (matching pair in 2 draws), but you're guaranteed a match in 6. This is crucial for algorithm analysis and worst-case bounds.

The Generalized Pigeonhole Principle

If nn objects are placed into kk containers, then at least one container contains at least ⌈n/kβŒ‰\lceil n/k \rceil objects.

Proof: Suppose each container has at most ⌈n/kβŒ‰βˆ’1\lceil n/k \rceil - 1 objects. Then total objects would be at most: k(⌈n/kβŒ‰βˆ’1)<kβ‹…nk=nk(\lceil n/k \rceil - 1) < k \cdot \frac{n}{k} = n

This contradicts having nn objects.

Example 2: Class Birthdays

In a class of 400 students, prove that at least two students share the same birthday (ignoring leap years).

Solution:

  • Pigeons: 400 students
  • Holes: 365 possible birthdays
  • By generalized pigeonhole: ⌈400/365βŒ‰=⌈1.096βŒ‰=2\lceil 400/365 \rceil = \lceil 1.096 \rceil = 2

At least one day has β‰₯2\geq 2 students born on it.

python
import math

def min_objects_for_guarantee(n_containers, min_per_container):
    """Calculate minimum objects to guarantee min_per_container in at least one container"""
    # We need n > (min_per_container - 1) * n_containers
    return (min_per_container - 1) * n_containers + 1

def max_objects_per_container(n_objects, n_containers):
    """Calculate guaranteed maximum objects in fullest container"""
    return math.ceil(n_objects / n_containers)

# Birthday problem
students = 400
days = 365

guarantee = max_objects_per_container(students, days)
print(f"With {students} students and {days} possible birthdays:")
print(f"  At least one day has β‰₯ {guarantee} students")
print(f"  Calculation: ⌈{students}/{days}βŒ‰ = ⌈{students/days:.3f}βŒ‰ = {guarantee}")

# Other examples
print(f"\nMore examples:")
examples = [
    (100, 12, "100 people, 12 months"),
    (53, 52, "53 cards, 52 types"),
    (27, 26, "27 students, 26 letters"),
    (1000, 365, "1000 people, 365 days")
]

for objects, containers, desc in examples:
    result = max_objects_per_container(objects, containers)
    print(f"  {desc}: at least {result} in one container")

Birthday Paradox Preview

The Birthday Paradox asks a different question: In a room of nn people, what's the probability that at least two share a birthday?

Surprising result: With just 23 people, the probability exceeds 50%!

This is not the same as the pigeonhole principle (which guarantees a match at 366 people), but pigeonhole reasoning helps us understand why collisions happen so quickly.

python
import math

def birthday_probability(n_people, n_days=365):
    """Calculate probability that at least 2 people share a birthday"""
    if n_people > n_days:
        return 1.0  # Pigeonhole guarantee

    # Probability all different = (365/365) * (364/365) * ... * ((365-n+1)/365)
    prob_all_different = 1.0
    for i in range(n_people):
        prob_all_different *= (n_days - i) / n_days

    # Probability at least one match = 1 - P(all different)
    return 1 - prob_all_different

# Find when probability exceeds 50%
print("Birthday Paradox: Probability of shared birthday\n")
print(f"{'People':>7} {'Probability':>12} {'Visual':>20}")
print("-" * 45)

for n in [5, 10, 20, 23, 30, 40, 50, 60, 100, 366]:
    prob = birthday_probability(n)
    bar = "β–ˆ" * int(prob * 20)
    print(f"{n:>7} {prob:>11.1%} {bar:<20}")

print(f"\nKey insight: Only 23 people needed for >50% chance!")
print(f"But pigeonhole guarantees match at 366 people (worst case)")

Common Confusion: The pigeonhole principle gives guarantees (worst case), while probability gives expected behavior (average case). With 23 people, you're likely to see a match, but it's not guaranteed. At 366, it's guaranteed.

Applications and Existence Proofs

Example 3: Substring Existence

In any string of 11 binary digits (0s and 1s), prove that some substring of length 4 must repeat.

Solution:

  • Total 4-digit substrings: positions 1-4, 2-5, 3-6, ..., 8-11 = 8 substrings
  • Possible 4-digit patterns: 24=162^4 = 16 patterns

Wait, we have fewer substrings (8) than patterns (16), so pigeonhole doesn't immediately apply!

Better approach: Consider a string of length 11. There are 211=20482^{11} = 2048 possible strings. But if all 8 substrings were distinct, we'd only use 8 of 16 possible patterns. However, the argument needs refinement.

Correct approach: Actually, in a string of length 2k+k2^k + k, some kk-substring must repeat. For k=4k=4: 24+4=202^4 + 4 = 20. So in a string of length 20, some 4-substring repeats.

Let me recalculate: In length 11, we have 8 substrings and 16 patterns, so no guarantee. Let's use a cleaner example.

Example 4: Sequence Monotonicity

In any sequence of n2+1n^2 + 1 distinct numbers, there exists either:

  • An increasing subsequence of length n+1n+1, OR
  • A decreasing subsequence of length n+1n+1

This is the ErdΕ‘s–Szekeres theorem, provable by pigeonhole reasoning!

python
def longest_increasing_subsequence(seq):
    """Find longest increasing subsequence length (dynamic programming)"""
    if not seq:
        return 0

    n = len(seq)
    dp = [1] * n

    for i in range(1, n):
        for j in range(i):
            if seq[j] < seq[i]:
                dp[i] = max(dp[i], dp[j] + 1)

    return max(dp)

def longest_decreasing_subsequence(seq):
    """Find longest decreasing subsequence length"""
    return longest_increasing_subsequence([-x for x in seq])

# Test ErdΕ‘s–Szekeres theorem
import random

n = 5  # We expect increasing or decreasing subsequence of length 6
seq_length = n**2 + 1  # 26 numbers

print(f"ErdΕ‘s–Szekeres theorem: n = {n}, sequence length = {seq_length}")
print(f"Guarantee: increasing OR decreasing subsequence of length β‰₯ {n+1}\n")

# Generate random sequence
random.seed(42)
seq = random.sample(range(1, 200), seq_length)

lis = longest_increasing_subsequence(seq)
lds = longest_decreasing_subsequence(seq)

print(f"Random sequence (first 15): {seq[:15]}...")
print(f"\nLongest increasing subsequence: {lis}")
print(f"Longest decreasing subsequence: {lds}")
print(f"\nMax of both: {max(lis, lds)} (guaranteed β‰₯ {n+1})")
print(f"Theorem satisfied: {max(lis, lds) >= n+1}")

Key Takeaways

  1. Basic Pigeonhole: nn objects in kk holes (n>kn > k) β†’ at least one hole has β‰₯2\geq 2 objects
  2. Generalized: At least one hole has β‰₯⌈n/kβŒ‰\geq \lceil n/k \rceil objects
  3. Existence proofs: Powerful for showing something must exist without finding it
  4. Worst-case reasoning: Provides guarantees, not probabilities
  5. Birthday paradox: Pigeonhole guarantees at 366, but probability high at 23

Next Lesson: Advanced counting techniques, symmetry arguments, and problem-solving strategies!

Problem
Pigeons
Holes
Guarantee
Socks6 socks5 colors2 matching
Birthdays400 people365 days2 same day
Cards53 cards52 ranks2 same rank
Hair count8 billion people200k hairs40k+ same count