Central Library started small: one room, one librarian, manageable. Now it holds millions of books. Patrons wait hours. The librarian hasn’t slept in weeks.
The solution: split the library. Fiction (A-M) to North building. Fiction (N-Z) to South. Non-fiction to East. Reference to West. Each building has its own staff and catalog.
That’s sharding. Break an overwhelming whole into manageable pieces, each handling its portion independently.
Growth Timeline
Day 1
- 1,000 books
- 1 librarian
- 50 visitors daily
Year 5
- 500,000 books
- 10 librarians
- 5,000 visitors daily
- 2-hour wait times
Year 10
- 5 million books
- 20 librarians (still not enough)
- 20,000 visitors daily
- Building at capacity
- System grinding to a halt
Adding more librarians to the same building stopped working.
The Shard Solution
Split into specialized buildings:
- North Library: Fiction (A-M) - 1.2 million books, 5 librarians
- South Library: Fiction (N-Z) - 1.1 million books, 5 librarians
- East Library: Non-fiction - 1.8 million books, 6 librarians
- West Library: Reference & Periodicals - 900,000 books, 4 librarians
Each building now handles roughly a quarter of the load: between 18% (West) and 36% (East) of the 5 million books.
Sharding Strategies
Range-Based
Split by alphabetical range:
- Authors A-F: Building 1
- Authors G-M: Building 2
- Authors N-S: Building 3
- Authors T-Z: Building 4
Simple, clear routing. But popular ranges get overloaded (lots of S names).
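A minimal sketch of range-based routing, assuming the A-F / G-M / N-S / T-Z split above and routing on the first letter of the surname. The names `RANGE_UPPER_BOUNDS`, `BUILDINGS`, and `route_by_range` are made up for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each alphabetical range, in sorted order,
# mirroring the A-F / G-M / N-S / T-Z split (illustrative names).
RANGE_UPPER_BOUNDS = ["F", "M", "S", "Z"]
BUILDINGS = ["Building 1", "Building 2", "Building 3", "Building 4"]


def route_by_range(author_surname: str) -> str:
    """Return the building responsible for an author's surname."""
    initial = author_surname.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        raise ValueError(f"cannot route surname: {author_surname!r}")
    # bisect_left finds the first range whose upper bound is >= the initial.
    return BUILDINGS[bisect_left(RANGE_UPPER_BOUNDS, initial)]
```

Because neighbouring surnames land in the same building, a query such as "all authors from N to P" touches a single shard. The cost is that nothing here balances the load: if S surnames dominate, Building 3 carries them all.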
Hash-Based
Take author name, apply hash function, modulo number of buildings:
“Stephen King” → Hash → Mod 4 → Building 3
Even distribution, so no overloaded alphabetical ranges, though a single very popular author can still overload one building. But range queries become difficult.
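Here is one way the hash-and-modulo step might look, as a sketch rather than a prescribed implementation. It assumes SHA-256 as the hash (any stable hash would do) and numbers buildings from 1; `route_by_hash` is a hypothetical name. The building it returns for a given author depends on the hash chosen, so it will not necessarily match the example above.

```python
import hashlib

NUM_BUILDINGS = 4


def route_by_hash(author: str, num_buildings: int = NUM_BUILDINGS) -> int:
    """Map an author name to a building number in 1..num_buildings."""
    # Use a stable digest. Python's built-in hash() for strings is salted
    # per process, so it would route the same author differently after a restart.
    digest = hashlib.sha256(author.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buildings + 1
```

The same name always maps to the same building, but alphabetical neighbours scatter across all of them, which is why "every author starting with S" now means asking every building.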
Geographic
Split by reader location:
- Downtown readers: Central Library
- Suburban readers: Branch Libraries
- University area: Academic Library
Serves local communities, reduces travel. But coordination is complex.
Cross-Shard Operations
Finding a Book
Patron wants “War and Peace” by Tolstoy:
- Check directory: Fiction, T, means South Library
- Travel to South Library
- Find in local catalog
Mostly transparent to patrons.
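The directory check can be sketched as a small lookup function. This assumes the four-building split from the start of the article (Fiction A-M in North, Fiction N-Z in South, Non-fiction in East, Reference in West) and routes fiction by the author's surname; `find_building` is an illustrative name.

```python
def find_building(category: str, author_surname: str = "") -> str:
    """Look up the building for a book, using the four-building split."""
    if category == "Non-fiction":
        return "East"
    if category == "Reference":
        return "West"
    if category == "Fiction":
        initial = author_surname.strip()[:1].upper()
        if not ("A" <= initial <= "Z"):
            raise ValueError(f"cannot route surname: {author_surname!r}")
        # Fiction is split alphabetically: A-M in North, N-Z in South.
        return "North" if initial <= "M" else "South"
    raise ValueError(f"unknown category: {category!r}")
```

For the example above, `find_building("Fiction", "Tolstoy")` returns `"South"`. The patron only ever sees the answer, not the rule behind it.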
Finding All Books by One Author
Problem: Stephen King’s books might span multiple libraries, for example novels in one building and non-fiction in another, or when a title is miscategorized.
Solution: Central directory tracks all locations, inter-library loan system.
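A rough sketch of the central-directory idea: the directory records which libraries hold anything by an author, so a lookup visits only those buildings instead of all four. `CentralDirectory`, `find_all_by_author`, and the catalog layout (library name mapped to author mapped to titles) are assumptions made for this example.

```python
from collections import defaultdict


class CentralDirectory:
    """Tracks which libraries hold books by each author."""

    def __init__(self):
        self._locations = defaultdict(set)  # author -> set of library names

    def record(self, author, library):
        self._locations[author].add(library)

    def libraries_for(self, author):
        return sorted(self._locations.get(author, set()))


def find_all_by_author(author, directory, catalogs):
    """Collect an author's titles from only the libraries the directory names."""
    titles = []
    for library in directory.libraries_for(author):
        titles.extend(catalogs[library].get(author, []))
    return sorted(titles)
```

The price is that every acquisition or transfer must also call `record`, otherwise the directory drifts out of date and books become invisible to author searches.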
The Join Problem
Research project needs books from multiple categories:
- Philosophy: East Library
- Literature: South Library
- History: East Library
- Poetry: North Library
Researcher travels between buildings or waits for transfers.
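In database terms this is a scatter-gather query: group the request by shard, visit each shard once (in parallel if possible), then merge the results. The sketch below assumes the subject-to-library mapping listed above; the catalog contents are placeholders and `fetch` and `gather` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

SUBJECT_TO_LIBRARY = {
    "Philosophy": "East",
    "Literature": "South",
    "History": "East",
    "Poetry": "North",
}

# Placeholder catalogs: library -> subject -> titles (made-up data).
CATALOGS = {
    "East": {"Philosophy": ["philosophy title"], "History": ["history title"]},
    "South": {"Literature": ["literature title"]},
    "North": {"Poetry": ["poetry title"]},
}


def fetch(library, subjects):
    """Simulate one trip to one library for every subject it holds."""
    shelf = CATALOGS[library]
    return {subject: shelf.get(subject, []) for subject in subjects}


def gather(subjects):
    """Return (titles by subject, number of libraries visited)."""
    # Group subjects by library so each building is visited only once.
    by_library = {}
    for subject in subjects:
        by_library.setdefault(SUBJECT_TO_LIBRARY[subject], []).append(subject)
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, lib, subs) for lib, subs in by_library.items()]
        for future in futures:
            results.update(future.result())
    return results, len(by_library)
```

Four subjects cost three trips here because Philosophy and History share East. The whole request is still only as fast as the slowest building, which is why frequent cross-shard joins are a warning sign.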
Challenges
Rebalancing
North Library approaches capacity while South has space:
- Option 1: Move the boundary (A-K vs L-Z, so North gives up two letters) - massive book moving
- Option 2: Add a sub-shard (North splits into North-1, North-2) - less disruption, more complexity
- Option 3: Virtual sharding - route new acquisitions differently, gradual rebalancing
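One common reading of Option 3 is to hash books into a fixed set of small virtual shards and keep a separate table saying which building currently hosts each one. Rebalancing then means editing the table and moving a few virtual shards, not redrawing a boundary. The sketch below works under that assumption; the count of 64, the initial even/odd assignment, and all function names are illustrative.

```python
import hashlib

NUM_VIRTUAL_SHARDS = 64  # fixed for the life of the system (assumed value)


def virtual_shard(key: str) -> int:
    """Map a key to a virtual shard; this mapping never changes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS


# Which building hosts each virtual shard. Only this table changes on rebalance.
assignment = {
    v: ("North" if v % 2 == 0 else "South") for v in range(NUM_VIRTUAL_SHARDS)
}


def locate(key: str) -> str:
    return assignment[virtual_shard(key)]


def move_virtual_shards(source: str, target: str, count: int) -> list:
    """Reassign up to `count` virtual shards from source to target."""
    moved = [v for v, building in assignment.items() if building == source][:count]
    for v in moved:
        assignment[v] = target
    return moved
```

Calling `move_virtual_shards("North", "South", 8)` relocates an eighth of the virtual shards; books in the other 56 stay exactly where they were.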
Hot Authors
New Harry Potter release: everyone wants it, all copies are in South Library (R for Rowling), and South is overwhelmed.
Solutions: replicate popular items, temporary redistribution, digital copies.
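Replication can be sketched as a list of buildings holding copies of the hot title, with requests rotated across them. This is a simplified illustration, not a full replication scheme: `ReplicatedTitle` is a hypothetical class, and it ignores how the copies are made or kept in sync.

```python
class ReplicatedTitle:
    """Round-robin reads across the libraries that hold a copy of a hot title."""

    def __init__(self, home):
        self.replicas = [home]  # the home shard always holds a copy
        self._next = 0

    def add_replica(self, library):
        if library not in self.replicas:
            self.replicas.append(library)

    def next_library(self):
        """Pick the library that should serve the next request."""
        library = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return library
```

With two extra replicas, the home building sees one request in three. Once the rush passes, the extra copies can be withdrawn and routing falls back to the home shard.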
Query Routing
A central system is needed to direct patrons:
- “Which building has this book?”
- “Where should this new book go?”
- “Which buildings have capacity?”
The router becomes critical infrastructure.
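A toy router that answers those three questions might look like the following. It assumes capacity is a simple book count per building and that a new book goes to its preferred building unless that one is full; the `Router` class and its method names are invented for this sketch.

```python
class Router:
    """Answers: where is this book, where does a new one go, who has room."""

    def __init__(self, capacities):
        self.capacity = dict(capacities)  # building -> maximum number of books
        self.holdings = {building: 0 for building in capacities}
        self.index = {}  # title -> building

    def has_room(self, building):
        return self.holdings[building] < self.capacity[building]

    def with_capacity(self):
        """Which buildings have capacity?"""
        return [b for b in self.capacity if self.has_room(b)]

    def locate(self, title):
        """Which building has this book? None if it is not in the index."""
        return self.index.get(title)

    def place(self, title, preferred):
        """Where should this new book go? Falls back to the emptiest building."""
        building = preferred if self.has_room(preferred) else self._most_free()
        self.index[title] = building
        self.holdings[building] += 1
        return building

    def _most_free(self):
        building = max(
            self.capacity, key=lambda b: self.capacity[b] - self.holdings[b]
        )
        if not self.has_room(building):
            raise RuntimeError("every building is full")
        return building
```

Every lookup and every acquisition passes through this one object, which is the point of the section: if the router is slow or unavailable, all buildings are effectively closed.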
Decision Rules
Shard when:
- Single database can’t handle your load
- You can identify natural data divisions
- Cross-shard operations are rare
Don’t shard when:
- Your data doesn’t have natural boundaries
- You need lots of cross-shard joins
- You can’t afford the operational complexity
Sharding is powerful medicine. Use when needed, not prophylactically.