Birthday Attack
Birthday Paradox
Problem Statement
What is the probability that at least two people in a group share the same birthday?
Assumptions
- 365 days in a year (ignore leap years)
- Birthdays are uniformly distributed
- Each person’s birthday is independent
Key Insight
It is easier to compute the complement:
Probability(at least one shared birthday) = 1 - Probability(no shared birthdays)
Probability Derivation
For n people:
- Person 1: 365/365
- Person 2: 364/365
- Person 3: 363/365
- …
- Person n: (365 - n + 1) / 365
So:
\[ P(no\ match) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-n+1}{365} \]
\[ P(match) = 1 - P(no\ match) \]
Key Results
| People | Probability of Match |
|---|---|
| 10 | 11.7% |
| 20 | 41.1% |
| 23 | 50.7% |
| 30 | 70.6% |
| 50 | 97.0% |
| 70 | 99.9% |
Why It’s Counterintuitive
People assume: “There are 365 days, so we need ~180 people.”
But actually:
- We compare pairs, not individuals vs a fixed date
- Number of pairs:
\[ \frac{n(n-1)}{2} \]
Example:
- n = 23 → 253 comparisons
Intuition
- Probability grows nonlinearly
- Around 23 people → ~50%
- Around 50 people → almost certain
Real-World Notes
- Birthdays are not perfectly uniform
- Seasonal effects exist
- Twins increase clustering slightly
Despite this, the approximation is very accurate.
Hash Collision Analysis (Birthday Paradox Applied)
Core Idea
A hash function maps a large input space into a finite output space.
This is mathematically identical to:
- People → inputs
- Birthdays → hash outputs
So:
- A “collision” = two inputs producing the same hash
Birthday Bound
For a hash with N possible outputs:
\[ N = 2^k \]
The number of random inputs needed for ~50% collision probability:
\[ n \approx 1.2 \cdot \sqrt{N} = 1.2 \cdot 2^{k/2} \]
Key Insight
- You do NOT need 2^k attempts to find a collision
- You only need ~2^(k/2)
This is the birthday attack
Examples
| Hash Size | Possible Outputs | ~50% Collision at |
|---|---|---|
| 32-bit | 2^32 | ~2^16 ≈ 65K |
| 64-bit | 2^64 | ~2^32 ≈ 4B |
| 128-bit | 2^128 | ~2^64 |
| 256-bit | 2^256 | ~2^128 |
Security Implications
Weak Hashes
- 32-bit or 64-bit hashes are trivially collision-prone
- Unsafe for identifiers or security use
Cryptographic Hashes
Designed to resist:
- Collision attacks
- Preimage attacks
Examples:
- MD5 → broken (collisions practical)
- SHA-1 → broken (collision attacks demonstrated)
- SHA-256 → currently secure
Attack Types
Birthday Attack
Goal:
- Find ANY two inputs with same hash
Complexity: \[ O(2^{k/2}) \]
Preimage Attack
Goal:
- Given hash H, find input that produces it
Complexity: \[ O(2^k) \]
Second Preimage Attack
Goal:
- Given input A, find different input B with same hash
Complexity: \[ O(2^k) \]
Practical Security Takeaways
- Doubling hash size squares attack difficulty
- 128-bit security requires 256-bit hashes (collision resistance)
- Never rely on:
- Short hashes
- Truncated hashes without analysis
Real-World Applications
Where Collisions Matter
- File integrity verification
- Digital signatures
- Certificate systems (PKI)
- Blockchain
Where Collisions Can Break Systems
- Deduplication systems
- Cache keys
- Session IDs
- Object storage identifiers
Example Intuition
If you generate random 64-bit IDs:
- After ~4 billion IDs
- You have a ~50% chance of at least one collision
This surprises many engineers and leads to:
- Silent data corruption risks
- ID conflicts at scale
Optional: Approximation Formula
\[ P(\text{collision}) \approx 1 - e^{-\frac{n(n-1)}{2N}} \]
Where:
- n = number of inputs
- N = total hash space