The Titanic’s designers believed its watertight bulkheads made the ship unsinkable. When the iceberg tore open several compartments, water spilled over the top of one bulkhead into the next, a cascade that sank the “unsinkable.”
Bulkheads must be truly isolated, tall enough, and numerous enough.
In distributed systems, the bulkhead pattern isolates failures so one breach doesn’t flood the entire ship.
Without Bulkheads
Single Hull
One large cargo hold:
- Hull breach anywhere
- Water floods entire ship
- No way to contain
- Ship sinks rapidly
Like a paper bag—one hole destroys everything.
Cascade Effect
Even with basic dividers:
- Water fills first section
- Overflows to second
- Weight shifts, ship lists
- More sections flood
- Progressive failure
One problem becomes a ship-wide catastrophe.
Software Bulkheads
Thread Pool Isolation
Without bulkheads:
- All requests share thread pool
- Bad endpoint exhausts threads
- Entire application blocked
With bulkheads:
- Payment API: 10 threads
- User API: 10 threads
- Analytics API: 5 threads
- Search API: 15 threads
One flooded service doesn’t drown the others.
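A minimal Python sketch of the per-service pools listed above, built on the standard library's `concurrent.futures.ThreadPoolExecutor`. The thread counts come from the list; the `BulkheadExecutors` class, the service keys, and the `probe_isolation` helper are hypothetical names chosen for illustration, not an established API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread budgets from the list above; the keys are illustrative service names.
THREAD_BUDGET = {"payment": 10, "user": 10, "analytics": 5, "search": 15}


class BulkheadExecutors:
    """One dedicated thread pool per service, so no pool can borrow from another."""

    def __init__(self, budget):
        self._pools = {
            name: ThreadPoolExecutor(max_workers=size, thread_name_prefix=name)
            for name, size in budget.items()
        }

    def submit(self, service, fn, *args, **kwargs):
        # Work for a service only ever runs on that service's own threads.
        return self._pools[service].submit(fn, *args, **kwargs)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)


def probe_isolation(bulkheads):
    """Flood the analytics pool with stuck tasks, then check payment still answers."""
    gate = threading.Event()
    stuck = [bulkheads.submit("analytics", gate.wait) for _ in range(20)]
    try:
        # Analytics is saturated, yet this task runs on a payment thread.
        return bulkheads.submit("payment", lambda: "payment ok").result(timeout=2)
    finally:
        gate.set()  # release the stuck tasks so the pools can shut down
        for future in stuck:
            future.result(timeout=2)
```

Calling `probe_isolation(BulkheadExecutors(THREAD_BUDGET))` floods only the analytics pool; with a single shared pool of the same total size, the payment task would sit in the queue behind the stuck work instead.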
Connection Pool Isolation
Shared pool:
- 100 connections total
- Reporting query uses all 100
- Customer transactions blocked
Bulkhead pools:
- Transactions: 50 connections
- Reporting: 20 connections
- Admin: 10 connections
- Background: 20 connections
Reports can’t sink transactions.
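The same split can be sketched for connections. This is an illustrative toy pool, assuming a caller-supplied `connect` factory in place of a real database driver; `ConnectionPool`, `PoolExhausted`, `build_pools`, and `reporting_flood_outcome` are hypothetical names, and the sizes are the ones listed above.

```python
import queue
from contextlib import ExitStack, contextmanager

# Connection budgets from the list above (they add up to the original 100).
CONNECTION_BUDGET = {"transactions": 50, "reporting": 20, "admin": 10, "background": 20}


class PoolExhausted(Exception):
    """Raised when a workload has used up its own connection budget."""


class ConnectionPool:
    def __init__(self, name, size, connect):
        self.name = name
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # `connect` is a caller-supplied factory

    @contextmanager
    def connection(self, timeout=0.05):
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted(f"{self.name} pool is exhausted") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


def build_pools(connect, budget=CONNECTION_BUDGET):
    return {name: ConnectionPool(name, size, connect) for name, size in budget.items()}


def reporting_flood_outcome(pools):
    """Hold every reporting connection, then see which pools still serve requests."""
    with ExitStack() as held:
        for _ in range(CONNECTION_BUDGET["reporting"]):
            held.enter_context(pools["reporting"].connection())
        try:
            with pools["reporting"].connection():
                reporting = "served"
        except PoolExhausted:
            reporting = "rejected"
        with pools["transactions"].connection():
            transactions = "served"
    return {"reporting": reporting, "transactions": transactions}
```

The short acquire timeout is a design choice in this sketch: a reporting query that cannot get a connection fails quickly inside its own compartment rather than waiting on capacity reserved for transactions.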
Common Problems
Titanic’s Fatal Flaw
Bulkheads not tall enough:
- Water spilled over tops
- Progressive flooding
- Ship lost
Software equivalent:
- Thread limit set higher than the host can sustain
- Memory limit too generous to contain a leak
- Isolation fails under load
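One way to catch an over-generous limit early is to check that the compartment limits actually fit inside what the shared resource can sustain. A minimal sketch, assuming a hypothetical `check_budget` helper and a capacity figure you have measured yourself; the 100 below reuses the connection total from the connection pool section.

```python
def check_budget(limits, capacity):
    """Return the unallocated headroom, or raise if the limits overcommit capacity."""
    for name, limit in limits.items():
        if limit <= 0:
            raise ValueError(f"{name}: limit must be positive, got {limit}")
    allocated = sum(limits.values())
    if allocated > capacity:
        raise ValueError(
            f"limits total {allocated} but capacity is {capacity}; "
            "one compartment can still starve another"
        )
    return capacity - allocated


# Fits exactly: no compartment can take connections another one was promised.
headroom = check_budget(
    {"transactions": 50, "reporting": 20, "admin": 10, "background": 20}, capacity=100
)
```

Running a check like this at startup turns a bulkhead that is too short into a configuration error instead of an outage.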
Shared Resources
Hidden connections:
- Compartments share ventilation
- Flood spreads through ducts
- Isolation compromised
Software parallel:
- Shared cache
- Common connection pool
- Central logging
Failure spreads.
Decision Rules
Right-size compartments: analyze load patterns, consider failure impact, plan for growth.
Test your bulkheads: flood one compartment, verify others dry, test cascade scenarios.
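The flood test can be sketched as an ordinary automated check. This example assumes a small semaphore-based `Bulkhead` class invented for illustration (it is not a library API) and reuses the reporting and transactions limits from the connection pool section.

```python
import threading


class Bulkhead:
    """Admits at most `limit` concurrent callers and rejects the rest immediately."""

    def __init__(self, name, limit):
        self.name = name
        self._permits = threading.BoundedSemaphore(limit)

    def try_enter(self):
        return self._permits.acquire(blocking=False)

    def leave(self):
        self._permits.release()


def flood(bulkhead):
    """Take permits until the compartment is full; return how many were taken."""
    taken = 0
    while bulkhead.try_enter():
        taken += 1
    return taken


def drain(bulkhead, count):
    for _ in range(count):
        bulkhead.leave()


def test_flooded_compartment_stays_contained():
    reporting = Bulkhead("reporting", limit=20)
    transactions = Bulkhead("transactions", limit=50)
    taken = flood(reporting)
    try:
        assert taken == 20
        assert not reporting.try_enter()  # the flooded compartment rejects new work
        assert transactions.try_enter()   # the neighbouring compartment stays dry
        transactions.leave()
    finally:
        drain(reporting, taken)
    assert reporting.try_enter()  # and the compartment recovers once drained
    reporting.leave()
```

A function named like this is picked up by test runners such as pytest; cascade scenarios follow the same shape, flooding two or more compartments before probing the rest.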
Bulkheads delay failures; they don’t prevent them. Have lifeboats (fallbacks), practice procedures, monitor constantly.
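A lifeboat can be as simple as a fallback that runs when the compartment is full or the call fails, with counters to watch. A minimal sketch: `call_with_lifeboat` and `outcomes` are hypothetical names, the limit of 15 is the search budget from the thread pool list, and catching every `Exception` is a simplification you would likely narrow in real code.

```python
import threading
from collections import Counter

outcomes = Counter()  # crude in-process metrics; export these to real monitoring
_outcomes_lock = threading.Lock()


def _record(outcome):
    with _outcomes_lock:
        outcomes[outcome] += 1


def call_with_lifeboat(permits, primary, fallback):
    """Run `primary` inside the bulkhead; use `fallback` if it is full or fails."""
    if not permits.acquire(blocking=False):
        _record("rejected")  # compartment is full: degrade instead of queueing
        return fallback()
    try:
        result = primary()
        _record("ok")
        return result
    except Exception:
        _record("failed")
        return fallback()
    finally:
        permits.release()  # runs only when the permit was actually acquired


search_permits = threading.BoundedSemaphore(15)  # search limit from the earlier list
results = call_with_lifeboat(search_permits, lambda: ["live result"], lambda: [])
```

A rising `rejected` count is the signal that a compartment is taking on water, well before users notice the degraded responses.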
The iceberg is out there. Your system will hit something. With proper bulkheads, you’ll stay afloat long enough to reach port.