Drop a rubber ball from shoulder height. It bounces back, but not as high. Each bounce is lower than the last—vigorous at first, then gradually settling, until it barely leaves the ground before finally resting.
That’s exponential backoff: when something fails, retry quickly at first, then with increasing delays, eventually accepting defeat. Nature’s way of finding equilibrium.
The Problem
Infinite Energy Ball
Magic ball that bounces to the same height forever:
- Try connecting: Failed
- Retry immediately: Failed
- Retry immediately: Failed
- Server melts from requests
Perpetual motion destroys servers.
No Bounce Ball
Opposite extreme—ball that doesn’t bounce at all:
- Try once: Failed
- Give up forever
- Potentially temporary issue becomes permanent failure
Neither extreme serves us well.
Natural Bounce Pattern
Dropped from 6 feet:
- First bounce: 4 feet
- Second bounce: 2.5 feet
- Third bounce: 1.5 feet
- Fourth bounce: 0.9 feet
- Gradually settling
Each bounce loses energy. Eventually the ball rests. Retries borrow the same decay, with one twist: the waits between attempts grow longer instead of shorter.
Failed API call:
- First retry: After 1 second
- Second retry: After 2 seconds
- Third retry: After 4 seconds
- Fourth retry: After 8 seconds
- Give up: After 15 seconds of total waiting (1 + 2 + 4 + 8)
Natural decay prevents overwhelming failed systems.
Strategies
Exponential Backoff
Retry after: 1s, 2s, 4s, 8s, 16s
Most common approach. Steep enough to avoid hammering, gentle enough to catch recovery.
Linear Backoff
Retry after: 1s, 2s, 3s, 4s, 5s
Ball loses fixed height each bounce. Predictable but can still overwhelm.
Fibonacci Backoff
Retry after: 1s, 1s, 2s, 3s, 5s, 8s, 13s
Gentler than pure exponential.
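The three schedules above can be sketched as infinite generators. This is a minimal illustration, not a library API: the names `exponential`, `linear`, `fibonacci`, and `first` are made up for this example, and the defaults are the 1-second values used in the lists above.

```python
from itertools import islice
from typing import Iterator, List


def exponential(initial: float = 1.0, multiplier: float = 2.0) -> Iterator[float]:
    """Yield delays that grow by a constant factor: 1, 2, 4, 8, ..."""
    delay = initial
    while True:
        yield delay
        delay *= multiplier


def linear(initial: float = 1.0, step: float = 1.0) -> Iterator[float]:
    """Yield delays that grow by a constant amount: 1, 2, 3, 4, ..."""
    delay = initial
    while True:
        yield delay
        delay += step


def fibonacci(unit: float = 1.0) -> Iterator[float]:
    """Yield delays that follow the Fibonacci sequence: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    while True:
        yield a * unit
        a, b = b, a + b


def first(schedule: Iterator[float], n: int) -> List[float]:
    """Take the first n delays from an infinite schedule."""
    return list(islice(schedule, n))


if __name__ == "__main__":
    for name, schedule in [("exponential", exponential()),
                           ("linear", linear()),
                           ("fibonacci", fibonacci())]:
        print(name, first(schedule, 5))
```

Keeping the schedule separate from the retry loop makes it easy to swap strategies, or to wrap any of them with jitter or a cap.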
Jitter
Add randomness to any strategy:
- Base: 1s, 2s, 4s, 8s
- Actual: 0.5-1.5s, 1-3s, 2-6s, 4-12s
Prevents synchronized retries (thundering herds).
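A small sketch of the two jitter styles, assuming Python's standard `random` module. The function names `proportional_jitter` and `full_jitter` are illustrative only. The first reproduces the ±50% ranges listed above; the second picks any wait between zero and the base delay.

```python
import random


def proportional_jitter(delay: float, spread: float = 0.5) -> float:
    """Scale delay by a random factor in [1 - spread, 1 + spread]."""
    return delay * random.uniform(1 - spread, 1 + spread)


def full_jitter(delay: float) -> float:
    """Pick a random wait anywhere between 0 and delay."""
    return random.uniform(0, delay)


if __name__ == "__main__":
    for base in (1, 2, 4, 8):
        print(base, round(proportional_jitter(base), 2), round(full_jitter(base), 2))
```

Full jitter spreads clients out more widely, at the cost of some retries landing almost immediately.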
Common Mistakes
Synchronization
All clients retry at exactly the same times:
- All bounce together
- Hit ground together
- System can’t recover
Always add jitter.
No Maximum
Delays keep doubling forever: 1hr, 2hr, 4hr, 8hr waits. Effectively permanent failure.
Cap maximum delay.
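One way to express the cap, as a sketch: compute the raw exponential delay and clamp it. The function name `capped_delay` is hypothetical, and the 30-second ceiling is only an assumed default.

```python
def capped_delay(attempt: int,
                 initial: float = 1.0,
                 multiplier: float = 2.0,
                 max_delay: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based), never above max_delay."""
    try:
        raw = initial * multiplier ** attempt
    except OverflowError:
        # Very large attempt counts overflow a float; treat them as "at the cap".
        raw = max_delay
    return min(max_delay, raw)


if __name__ == "__main__":
    print([capped_delay(n) for n in range(7)])
```

With these values the delays run 1, 2, 4, 8, 16 and then stay at 30 seconds for every later retry.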
Too Aggressive
Ball barely loses height: 6 → 5.9 → 5.8 → 5.7 feet. Barely any backoff. Still overwhelming.
Need steeper decay.
Too Gentle
Ball loses too much height: 6 → 1 → 0.1 → 0.01 feet. Delays grow so fast that the client effectively gives up and misses the recovery window.
Need gentler decay: a smaller multiplier.
Implementation
Basic
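import time

class TransientError(Exception): ...  # failure worth retrying
class PermanentError(Exception): ...  # retries exhausted

def call_with_retries(request, initial_delay=1, multiplier=2, max_attempts=5):
    delay = initial_delay
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError:
            if attempt < max_attempts - 1:  # no point sleeping after the last attempt
                time.sleep(delay)
                delay *= multiplier  # typically 2
    raise PermanentError("all attempts failed")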
With Jitter
jittered = delay * (0.5 + random.random())  # ±50% of the current delay; sleep for this instead of delay
Or full jitter: random.uniform(0, delay).
Production Features
- Exponential increase
- Jitter
- Maximum delay cap
- Maximum attempts limit
- Delay resets after a success
- Failure callbacks
- Metrics collection
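Several of these features fit in one small function. The following is a sketch under stated assumptions, not a reference implementation: `retry_with_backoff`, `RetriesExhausted`, and the `on_failure` hook are hypothetical names, the retryable exception types are an arbitrary choice, and the defaults are illustrative. The `sleep` parameter exists so tests can run without real waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when every attempt failed (hypothetical name)."""


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[Exception], ...] = (ConnectionError, TimeoutError),
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 5,
    on_failure: Optional[Callable[[int, Exception, float], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay  # local state, so every new call starts from the initial delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == max_attempts:
                raise RetriesExhausted(f"gave up after {attempt} attempts") from exc
            # Full jitter, applied to the capped delay.
            wait = random.uniform(0, min(delay, max_delay))
            if on_failure is not None:
                on_failure(attempt, exc, wait)  # hook for logging or metrics
            sleep(wait)
            delay *= multiplier
    raise AssertionError("unreachable: the loop either returns or raises")
```

A caller might pass `on_failure=lambda attempt, exc, wait: print(attempt, exc, wait)` to log each failed attempt. Exceptions outside `retry_on` propagate immediately, so permanent errors are not retried.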
Decision Rules
Start with standard exponential:
- Initial: 1 second
- Multiplier: 2
- Max delay: 30 seconds
- Max attempts: 5
Always add jitter. Match backoff to failure type: network issues need quick retries, overload needs longer delays, maintenance needs very long delays.
Exponential backoff is patience encoded in mathematics. Each retry waits a bit longer, giving systems time to breathe and recover.