Database Scaling Patterns

Stateless web servers are easy: clone them behind a load balancer and you’re done. The database is where scaling gets hard, and that’s no accident — data has identity. You cannot just spin up a tenth copy of your authoritative data and call it capacity, the way you can with app servers, because writes must converge on a single source of truth or the truth forks. This is why, in nearly every growing system, the database is the first wall you hit. This page lays out the standard ladder of moves, in the order you’ll actually climb it.

Why the database resists scaling

A web server is stateless; a database is the opposite — it is the state. That creates three frictions app servers don’t have:

Writes need a single authority. Two machines independently accepting writes to the same row will disagree, and reconciling that is a hard distributed-systems problem.
Consistency is expensive. Keeping copies in agreement costs coordination, latency, or both (the CAP tension).
Data is heavy. Moving or splitting terabytes is slow, risky, and usually can’t be done without careful migration.

So the strategy is always: squeeze the most out of the cheap, simple moves before touching the expensive, irreversible ones. Climb the ladder; don’t skip to the top.

The ladder

   1. Vertical (bigger box)            ── simple, capped
   2. Read replicas                    ── scales reads only
   3. Caching                          ── offloads repeated reads
   4. Functional partitioning          ── split BY FEATURE across DBs
   5. Sharding (horizontal partition)  ── split ONE dataset across DBs
                          (each rung is more complex than the last)

Rung 1 — Vertical: a bigger box

Give the database more CPU, RAM, and faster disks. RAM usually helps most, because more of the working set fits in memory and disk I/O, usually the bottleneck, drops. It’s the simplest move and buys a lot of headroom. The cost is the familiar one: a hard ceiling, a single point of failure (SPOF), and a super-linear price curve (see Vertical vs Horizontal). Stay here as long as you sanely can; every rung above is harder.

Rung 2 — Read replicas

If reads dominate (they usually do), copy the data to replicas and spread reads across them while the primary takes writes. Cheap, effective, and it adds read availability. The catches are replication lag and the fact that it does nothing for write load — covered in depth in Read Replicas & CQRS.
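The read/write split can be sketched as a small routing function. This is only an illustration under stated assumptions: the class name, the "SELECT" prefix test, and the read_your_writes flag are hypothetical, and a real driver or proxy would classify statements more carefully.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = itertools.cycle(replicas or [primary])

    def target_for(self, sql, read_your_writes=False):
        # Naive classification (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not read_your_writes:
            return next(self._readers)
        # Writes, and reads that must not see stale data, go to the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM users"))          # a replica
print(router.target_for("UPDATE users SET name = 'x'"))  # the primary
```

The read_your_writes flag is one simple way to sidestep replication lag: a request that has just written can pin its follow-up reads to the primary, at the cost of putting that read load back on it.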

Rung 3 — Caching

Put a cache (often Redis) in front of the database so repeated reads never reach it at all. This is frequently the highest-leverage rung: a good cache can absorb the bulk of read traffic for a fraction of the cost of more replicas. The price is the invalidation problem and stampede risk — see Caching Strategies. Notice rungs 2 and 3 both attack reads; that’s deliberate, because reads are the easy win and usually the first thing to saturate.
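A minimal cache-aside sketch shows the shape of this rung. It uses an in-process dict as a stand-in for Redis, and the class name, the TTL default, and the load_from_db callback are all assumptions made for illustration. It does not address stampedes.

```python
import time


class CacheAside:
    """Read-through on miss, time-based expiry, explicit invalidation."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # called only on a miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]            # hit: the database is not touched
        value = self._load(key)        # miss or expired: go to the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after writing the row so the next read reloads it.
        self._entries.pop(key, None)


db_reads = []

def load_user(user_id):
    db_reads.append(user_id)           # stands in for a real query
    return {"id": user_id}

cache = CacheAside(load_user)
cache.get(1)
cache.get(1)
print(len(db_reads))                   # the second read was served from the cache
```

The invalidate call is where the invalidation problem lives: every write path has to remember to call it, or readers see stale values until the TTL expires.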

Rung 4 — Functional partitioning (vertical partitioning)

Split the database by feature: put users in one database, orders in another, the product catalog in a third. Each service owns its own store. This is often the natural consequence of moving toward services, and it relieves both reads and writes because each database now carries only one slice of the workload.

   one DB:         [ users + orders + products + messages ]   (everything contends)
   functional:     [ users ] [ orders ] [ products ] [ messages ]   (independent)

What does this buy us, and what does it cost? It buys independent scaling per feature and smaller, faster individual databases. It costs you cross-feature joins and transactions — you can no longer JOIN users against orders in one query, and a transaction spanning two databases needs application-level coordination. You’ve traded SQL’s convenience for scalability.
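To make the lost JOIN concrete, here is a sketch of an application-level join across two separate stores. Two in-memory SQLite databases stand in for the users and orders databases; the table layouts and the function name are hypothetical.

```python
import sqlite3

# Two independent databases, as they would be after functional partitioning.
users_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 9.5)],
)


def orders_with_user_names(min_total):
    """Replace `orders JOIN users` with two queries and a merge in code."""
    orders = orders_db.execute(
        "SELECT id, user_id, total FROM orders WHERE total >= ? ORDER BY id",
        (min_total,),
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    placeholders = ",".join("?" * len(user_ids))
    names = dict(
        users_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
        ).fetchall()
    )
    # The "join" now happens in application memory.
    return [(order_id, names.get(uid), total) for order_id, uid, total in orders]


print(orders_with_user_names(9.0))
```

Note that the two queries are not one snapshot: a user renamed or deleted between them can produce a result no single-database JOIN would return.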

Rung 5 — Sharding (horizontal partitioning)

When a single dataset is too big or too write-heavy for one machine, even after all the above, you split that one table across many databases by a shard key (e.g. username): users A–M on shard 1, N–Z on shard 2. Each shard is a full, independent database holding a slice of the rows. This is the move that scales writes to a single dataset, and it is the only rung with no fixed ceiling, because you can keep adding shards. (The mechanics of choosing a shard key, rebalancing, and handling hot shards are covered in Partitioning & Sharding.)
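The A–M / N–Z split can be sketched as a range lookup. This assumes the routing key is the first letter of a username; the boundary list and shard names are hypothetical, and many systems hash the key instead to spread load more evenly.

```python
import bisect

# Each boundary is the first key that belongs to the NEXT shard.
BOUNDARIES = ["n"]                      # a-m -> shard-1, n-z -> shard-2
SHARDS = ["shard-1", "shard-2"]


def shard_for(username):
    """Pick the shard whose key range contains this username."""
    first = username[:1].lower()
    return SHARDS[bisect.bisect_right(BOUNDARIES, first)]


for name in ["alice", "Mallory", "nina", "zed"]:
    print(name, "->", shard_for(name))
```

Every read and write for a given user goes through this one function, which is why the choice of key is so consequential: if most usernames start with the same few letters, one shard takes most of the load.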

What does this buy us, and what does it cost? It buys effectively unlimited capacity for both reads and writes — the holy grail. It costs you the most of any rung:

Cross-shard queries are painful — anything not keyed by the shard key may have to fan out to every shard and merge results.
No cross-shard transactions (without heavy machinery).
Rebalancing and hot shards become operational realities — a bad shard key concentrates load.
It is hard to undo. Sharding is a one-way door; treat it as the last resort it is.
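The fan-out cost in the first item above can be sketched as a scatter-gather query. Each shard is modeled here as a plain list of row dicts, and the function name and the "total" field are hypothetical. A real system would issue one network query per shard.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def top_n_across_shards(shards, n):
    """Ask every shard for its local top n, then merge the partial results."""

    def local_top(rows):
        # Stands in for `ORDER BY total DESC LIMIT n` run on one shard.
        return heapq.nlargest(n, rows, key=lambda r: r["total"])

    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(local_top, shards))

    merged = [row for partial in partials for row in partial]
    return heapq.nlargest(n, merged, key=lambda r: r["total"])


shards = [
    [{"id": 1, "total": 5}, {"id": 2, "total": 50}],
    [{"id": 3, "total": 20}, {"id": 4, "total": 70}],
]
print(top_n_across_shards(shards, 2))
```

A query keyed by the shard key touches one shard. This one touches all of them, so its latency is set by the slowest shard and its cost grows with the shard count.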

The whole ladder, at a glance

Rung	Move	Scales	Main cost
1	Vertical	reads+writes	Hard ceiling, SPOF, price
2	Read replicas	reads	Replication lag
3	Caching	reads	Invalidation, stampedes
4	Functional partitioning	reads+writes	No cross-feature joins/txns
5	Sharding	reads+writes	Cross-shard pain, irreversible

The thread through all five: what does this buy us, and what does it cost? Each rung buys more capacity by surrendering more of the convenient single-database guarantees — strong consistency, easy joins, simple transactions. Scaling a database is the slow, deliberate trade of convenience for capacity, one rung at a time.

Check your understanding

Why can’t you scale a database the way you scale stateless web servers — by just adding identical copies?
Rungs 2 and 3 both target reads. Why is that the natural first place to attack?
Distinguish functional partitioning from sharding. What does each split, and which one scales writes of a single dataset?
Name two convenient database guarantees you give up when you shard, and why they break.
Why is “don’t shard early” such common advice, and what should you exhaust first?