Skip to content

Design a Rate Limiter

A rate limiter caps how many requests a client may make in a window (“100 requests/minute”). It’s a deceptively rich design problem: the single-machine version is a few lines, but doing it correctly across a fleet of servers forces you to confront shared state, latency, and accuracy trade-offs. (For the concept, see Rate Limiting; here we design one.)

Functional

  • Limit requests per client (by user ID, API key, or IP) to N per time window.
  • Return 429 Too Many Requests (with a Retry-After header) when exceeded.
  • Limits must apply across the whole fleet, not per server.

Non-functional

  • Low latency — the limiter is on every request’s hot path, so it must add single-digit milliseconds at most.
  • Highly available — if the limiter is down, fail open (allow) or closed (block)? A deliberate choice (see below).
  • Memory-efficient at scale (millions of clients).

Say 1M active clients and 10k requests/second globally. The limiter does at least one read+write of counter state per request → ~10k ops/s against the counter store, and must hold ~1M small counters. That comfortably fits an in-memory store like Redis — which points us at the architecture.

The heart of the design. Each trades accuracy for simplicity/cost:

AlgorithmIdeaProCon
Fixed windowCount per clock-aligned windowTrivial, 1 counterBurst at edges: 2× limit across a boundary
Sliding window logStore timestamp of every requestExactMemory-heavy (one entry per request)
Sliding window counterWeighted blend of current + previous windowSmooth, cheapSlight approximation
Token bucketTokens refill at a steady rate; each request spends oneAllows controlled burstsTwo values to track (tokens, last-refill)
Leaky bucketRequests queue and drain at a fixed rateSmooths outputAdds queuing latency

Token bucket is the common default: it permits short bursts (good UX) while enforcing a long-run rate, and needs just (tokens, last_refill_time) per client.

tokens refill at R/sec, capacity = B
request arrives → refill tokens based on elapsed time → if tokens ≥ 1: allow, tokens-- ; else: 429

A per-server counter is wrong: with 10 servers each allowing 100/min, a client gets 1,000/min. The limit must live in shared state.

request ──► any app server ──► [ Redis: client:123 → {tokens, ts} ] ──► allow / 429
(stateless) single source of truth, fleet-wide

The update must be atomic — read-modify-write across many servers races. Use Redis with an atomic Lua script (or INCR + EXPIRE for window counters) so the check-and-decrement happens in one indivisible operation.

  • Where to enforce: an API gateway is the natural home — limit once at the edge before requests fan into the system.
  • Latency vs accuracy: a strictly-correct global counter means a network hop to Redis per request. For extreme throughput, allow a small local token cache per server (approximate, occasionally over-permits) and reconcile with the central store — trading perfect accuracy for speed.
  • Fail open or closed? If Redis is unreachable, failing open (allow) preserves availability but removes protection; failing closed (block) protects the backend but causes an outage. Most user-facing APIs fail open for availability; abuse-sensitive endpoints fail closed.

What does this buy us, and what does it cost? A rate limiter buys protection (against overload, abuse, runaway clients) and fairness — at the cost of latency on every request and a shared-state dependency. The central design tension is accuracy vs speed: perfectly global counting is a hop away on every call; cheap local approximations are fast but leak a little capacity. Pick the point that matches whether you’re protecting a fragile backend or just shaping traffic.

  1. Why is a per-server counter incorrect, and where must the limit’s state live?
  2. Why is token bucket often preferred over fixed window?
  3. Describe the race condition in a naïve GET-then-SET limiter and how atomicity fixes it.
  4. Explain the accuracy-vs-latency trade-off of a per-server local token cache.
  5. When would you fail open vs fail closed if the counter store is unreachable?