Part 9 · Case Studies — Designing Real Systems

The previous eight parts built your vocabulary: caching, replication, sharding, queues, consensus, rate limiting. This part puts the vocabulary to work. Each page here is a case study — a full “design X” walkthrough — and they all follow the same skeleton. The skeleton matters more than any single answer, because a system design interview (or a real design doc) is not a trivia quiz. It is a test of whether you can take a vague prompt and drive it toward a defensible architecture under constraints.

The single biggest mistake is jumping straight to boxes-and-arrows. Resist it. Walk these six steps in order, out loud, every time.

1. CLARIFY        What are we actually building?   (functional + non-functional)
2. ESTIMATE       How big is it?                   (QPS, storage, bandwidth; back-of-envelope)
3. API            What are the endpoints?          (the contract clients depend on)
4. DATA MODEL     What do we store, and how is it shaped/keyed?
5. ARCHITECTURE   High-level boxes and arrows      (the happy path)
6. DEEP DIVE      Find the bottleneck, scale it, name the trade-offs
   └─ thread throughout: what does each choice BUY us, and what does it COST?

Split requirements into two buckets. Functional requirements are what the system does — “shorten a URL,” “deliver a message,” “return autocomplete suggestions.” Non-functional requirements are the qualities it must have while doing it — latency, availability, consistency, durability, scale. Non-functional requirements are where the design actually lives: “a chat app” tells you almost nothing, but “100M users, sub-200ms delivery, messages must never silently disappear” tells you almost everything.

Turn the scale into numbers you can design against. You need three:

  • QPS — daily active users × actions per user ÷ 86,400 seconds, then multiply by a peak factor (typically 2–3×). Distinguish read QPS from write QPS.
  • Storage — bytes per record × records per day × retention. Project to years.
  • Bandwidth — QPS × payload size.

You are not chasing precision; you are chasing the order of magnitude that tells you whether one box suffices or you need a sharded fleet. See Back-of-Envelope Estimation and the latency numbers every engineer should know.
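The three formulas above can be sketched in a few lines of Python. The workload numbers (100M daily active users, 2 writes and 20 reads per user per day, 1 KB records, a 3× peak factor, 5-year retention) are illustrative assumptions, not figures from any real system.

```python
SECONDS_PER_DAY = 86_400


def peak_qps(dau, actions_per_user_per_day, peak_factor=3.0):
    """Average requests per second, scaled up by a peak factor."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY * peak_factor


def storage_bytes(bytes_per_record, records_per_day, retention_days):
    """Total bytes kept over the retention window (no replication overhead)."""
    return bytes_per_record * records_per_day * retention_days


def bandwidth_bytes_per_sec(qps, payload_bytes):
    """Bytes per second moved at the given request rate."""
    return qps * payload_bytes


# Hypothetical workload: every number here is an assumption for the sketch.
DAU = 100_000_000
WRITES_PER_USER = 2
READS_PER_USER = 20
RECORD_BYTES = 1_000

write_qps = peak_qps(DAU, WRITES_PER_USER)
read_qps = peak_qps(DAU, READS_PER_USER)
storage = storage_bytes(RECORD_BYTES, DAU * WRITES_PER_USER, 5 * 365)
egress = bandwidth_bytes_per_sec(read_qps, RECORD_BYTES)

print(f"peak write QPS ~ {write_qps:,.0f}")
print(f"peak read QPS  ~ {read_qps:,.0f}")
print(f"5-year storage ~ {storage / 1e12:,.0f} TB")
print(f"read bandwidth ~ {egress / 1e6:,.1f} MB/s")
```

Under these assumptions the answer is roughly 7K writes/s, 70K reads/s, and a few hundred terabytes: enough to conclude that a single database box will not hold the data, which is all the estimate needs to tell you.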

Sketch a handful of endpoints, each with its method, path, key params, and return shape. Doing so forces you to name the core operations and exposes hidden requirements (pagination, auth, idempotency keys). Keep it small; three to five endpoints is plenty.
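As an illustration, the contract for a URL shortener might be written down like this. Every path, parameter name, and status code below is a hypothetical choice made for the sketch, not a specification from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str
    params: tuple  # key request parameters
    returns: str   # status code and response shape


# Hypothetical contract for a URL shortener (all names are assumptions).
URL_SHORTENER_API = [
    # The idempotency key lets a client retry a create without duplicating it.
    Endpoint("POST", "/v1/urls", ("long_url", "idempotency_key"), "201 {short_code}"),
    Endpoint("GET", "/v1/urls/{short_code}", (), "302 redirect to long_url"),
    # Cursor and limit make pagination part of the contract from day one.
    Endpoint("GET", "/v1/urls", ("owner_id", "cursor", "limit"), "200 {items, next_cursor}"),
    Endpoint("DELETE", "/v1/urls/{short_code}", (), "204 no body"),
]

for e in URL_SHORTENER_API:
    print(f"{e.method:6} {e.path:24} params={list(e.params)} -> {e.returns}")
```

Writing the list out is what surfaces the hidden requirements: the create call needs an idempotency key, and the listing call needs pagination.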

Pin down what entities exist, what fields they carry, and, crucially, what you key and index on. The access pattern dictates the model, not the other way around. Choose SQL vs NoSQL here, and anticipate your partition key before you need it.
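A minimal sketch of letting the access pattern pick the keys, assuming a chat-style workload whose dominant read is "latest messages in one conversation". The `Message` fields, the `MessageStore` class, and the partition count are all hypothetical; the in-memory dictionaries stand in for real database partitions.

```python
import hashlib
from dataclasses import dataclass

NUM_PARTITIONS = 8  # assumed fleet size for the sketch


@dataclass(frozen=True)
class Message:
    conversation_id: str  # partition key: one conversation lives on one partition
    seq: int              # sort key: order within the conversation
    sender_id: str
    body: str


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class MessageStore:
    """In-memory stand-in for a partitioned store keyed by (conversation_id, seq)."""

    def __init__(self):
        self._partitions = [{} for _ in range(NUM_PARTITIONS)]

    def put(self, msg):
        part = self._partitions[partition_for(msg.conversation_id)]
        rows = part.setdefault(msg.conversation_id, [])
        rows.append(msg)
        rows.sort(key=lambda m: m.seq)  # keep rows ordered by the sort key

    def recent(self, conversation_id, limit=50):
        """The dominant read touches exactly one partition."""
        part = self._partitions[partition_for(conversation_id)]
        return part.get(conversation_id, [])[-limit:]


store = MessageStore()
store.put(Message("conv-1", 2, "bob", "hi"))
store.put(Message("conv-1", 1, "alice", "hello"))
print([m.body for m in store.recent("conv-1")])
```

Keying on `conversation_id` makes the common read a single-partition lookup. The cost is that a query such as "all messages sent by one user" would have to scan every partition or be served by a second index.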

Now draw the boxes: clients → load balancer → stateless app tier → caches → databases → async workers. Show the happy path of one request first. Lean on the building blocks you already know: load balancers, caching, CDNs, replication.
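The happy path of a single read can be traced in code. This is a toy sketch, assuming a round-robin load balancer, a shared cache used in cache-aside style, and one database; the class names are hypothetical and a plain dictionary stands in for the cache.

```python
import itertools


class Database:
    """Stand-in for the primary store; counts reads so cache hits are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class AppServer:
    """Stateless: holds no per-request state, only handles to shared services."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def handle_get(self, key):
        value = self.cache.get(key)      # 1. try the cache
        if value is not None:
            return value
        value = self.db.get(key)         # 2. miss: fall through to the database
        if value is not None:
            self.cache[key] = value      # 3. populate the cache for next time
        return value


class RoundRobinBalancer:
    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def route(self, key):
        return next(self._servers).handle_get(key)


db = Database({"abc123": "https://example.com/a/long/path"})
shared_cache = {}
lb = RoundRobinBalancer([AppServer(shared_cache, db), AppServer(shared_cache, db)])

lb.route("abc123")  # served by server 1, misses the cache, reads the database
lb.route("abc123")  # served by server 2, hits the shared cache
print("database reads:", db.reads)
```

Because the app servers are stateless and the cache is shared, the second request lands on a different server and still avoids the database.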

Pick the part that breaks first under your estimated load and fix it: hot keys, the celebrity fan-out, connection limits, write amplification. Then state the trade-offs explicitly. This is the move that separates a senior answer from a junior one — every choice you made bought something (latency, simplicity, scale) and cost something (consistency, money, operational burden). Saying so out loud proves you understand the design rather than reciting it.
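To make "buys and costs" concrete, here is one common mitigation for a hot key, sketched under assumptions: a counter whose writes are spread across several sub-keys. The `ShardedCounter` class and the `key#shard` naming are hypothetical, and a `Counter` stands in for a key-value store. It is one option among several, not the prescribed fix.

```python
import random
from collections import Counter


class ShardedCounter:
    """Spread increments for one logical key across num_shards sub-keys."""

    def __init__(self, num_shards=16, rng=None):
        self.num_shards = num_shards
        self.store = Counter()  # stand-in for a distributed key-value store
        self._rng = rng or random.Random()

    def incr(self, key, amount=1):
        # BUYS: each write lands on a random sub-key, so no single key
        # (and no single partition) absorbs the whole write load.
        shard = self._rng.randrange(self.num_shards)
        self.store[f"{key}#{shard}"] += amount

    def read(self, key):
        # COSTS: a read must now fetch and sum num_shards sub-keys.
        return sum(self.store[f"{key}#{s}"] for s in range(self.num_shards))


likes = ShardedCounter(num_shards=16, rng=random.Random(0))
for _ in range(1000):
    likes.incr("post:42")

print("total:", likes.read("post:42"))
print("busiest sub-key:", max(likes.store.values()))
```

The trade-off is explicit in the two methods: write throughput on the hot key improves by roughly the shard count, while every read becomes a multi-key fetch and the total is only as fresh as the slowest sub-key.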

This part contains eight designs: four detailed in depth here, plus four companions in this same directory. Each reuses the framework above; together they cover the major archetypes you’ll meet.

| Case study | Archetype it teaches |
| --- | --- |
| Design a URL Shortener | Read-heavy KV store, unique ID generation, caching |
| Design a News Feed | Fan-out, the celebrity problem, ranking |
| Design a Chat System | Stateful connections, ordering, real-time delivery |
| Design a Rate Limiter | Distributed counters, algorithm trade-offs |
| Design a Notification System | Multi-channel fan-out, queues, retries |
| Design a Typeahead Autocomplete | Tries, prefix search, latency budgets |
| Design a Web Crawler | Frontier queues, politeness, dedup at scale |
| Design a Payment System | Idempotency, exactly-once, consistency & audit |

Read the four deep dives first. They establish patterns — caching the hot path, fanning out work, holding stateful connections, counting under contention — that the companion four recombine.

What does this buy us, and what does it cost? Carry that question through every page. A framework is only useful if it makes the trade-offs visible: estimation buys you the right to choose, the API buys you a contract, the data model buys you predictable access, and the deep dive is where you pay — in consistency, in money, in complexity — for the scale you asked for. Master the six moves and any “design X” prompt becomes the same problem wearing a different hat.

  1. Name the six moves of the framework in order. Why is jumping straight to the architecture diagram a mistake?
  2. What is the difference between functional and non-functional requirements, and why do the non-functional ones do most of the design work?
  3. Why is the read:write ratio the first number to establish?
  4. How do you turn “100M daily active users” into a peak write-QPS figure?
  5. Give an example of a design choice and state explicitly what it buys and what it costs.