Message Queues

Some work doesn’t need to happen now, in the request, while the user waits. Sending a welcome email, transcoding a video, updating a search index, charging a card — these can happen moments later, off to the side, without the user staring at a spinner. A message queue is the block that makes “later, somewhere else” possible. A producer drops a message into the queue and immediately moves on; one or more consumers pick messages up and process them on their own schedule. The queue sits between them as a durable buffer.

The deep idea is decoupling. Without a queue, the component that creates work and the component that does work are bound together: same speed, same uptime, same failure. With a queue between them, they’re independent — they can run at different speeds, scale separately, and one can be down while the other keeps going. That independence is the whole point.

What does a queue buy you, and what does it cost? It buys decoupling, resilience to spikes, and the ability to do slow work without blocking users. It costs you asynchrony (the work is no longer done-or-failed by the time you respond), a new piece of infrastructure to run, and a set of subtle delivery and ordering problems you now have to reason about.

What decoupling actually buys

   Synchronous (no queue):
     Web request ──► do slow work inline ──► respond     (user waits; a spike overwhelms you)

   Asynchronous (with a queue):
     Web request ──► enqueue job ──► respond instantly
                          │
                          ▼
                      [ queue ]  ──► worker(s) process at their own pace

Spike absorption. A flood of work piles up in the queue instead of crashing your workers. The queue is a shock absorber: producers can briefly outrun consumers and the backlog just grows, then drains.
Independent scaling. Too much backlog? Add workers. The producing side doesn’t change.
Failure isolation. If the email service is down, emails wait in the queue and send when it recovers — instead of failing the user’s whole request.
Responsiveness. The user gets an instant “got it,” and the slow part happens behind the scenes.
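The asynchronous path in the diagram can be sketched in a single Python process. This is a minimal illustration, assuming Python's standard `queue.Queue` as a stand-in for a real broker: it is in memory, not durable, and lives inside one process, so it shows the shape of producer, buffer, and workers but none of the failure isolation a separate queue service provides. `handle_signup` and `send_welcome_email` are hypothetical names.

```python
import queue
import threading
import time

jobs: queue.Queue = queue.Queue()   # the buffer between producer and consumers
sent: list = []                     # records the slow side effect for the demo


def handle_signup(email: str) -> str:
    """Producer: enqueue the slow work and respond immediately."""
    jobs.put(email)
    return "got it"


def send_welcome_email(email: str) -> None:
    time.sleep(0.01)                # pretend this call is slow
    sent.append(email)


def worker() -> None:
    """Consumer: take jobs off the queue at its own pace, forever."""
    while True:
        email = jobs.get()
        try:
            send_welcome_email(email)
        finally:
            jobs.task_done()        # lets jobs.join() know this item is finished


for _ in range(2):                  # scale the read side by starting more workers
    threading.Thread(target=worker, daemon=True).start()

responses = [handle_signup(f"user{i}@example.com") for i in range(5)]
print(responses)                    # returns at once, before any email is sent
jobs.join()                         # demo only: wait for the backlog to drain
print(sorted(sent))
```

Adding workers means starting more threads; `handle_signup` does not change, which is the independent-scaling point from the list above.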

Queue vs log: two shapes that look alike

People say “message queue” for two genuinely different designs. The distinction is what happens to a message after it’s read.

   BROKER-STYLE QUEUE (RabbitMQ, SQS)        LOG-STYLE STREAM (Kafka, Pulsar)
   ───────────────────────────────          ───────────────────────────────
   message consumed → removed                message appended → STAYS (retained)
   each message goes to ONE consumer         many independent consumers re-read the same log
   think: a to-do list, items checked off    think: an append-only ledger, replayable
   great for: task/work distribution         great for: event streaming, replay, multiple readers

A broker-style queue treats a message as a task: deliver it to one worker, and once it’s acknowledged, it’s gone. This is the model for distributing work, where each job goes to one of N workers rather than to all of them.
A log-style stream treats a message as an event written to an append-only log, kept for a retention period (hours, days). Many different consumers can read the same stream independently, each tracking its own position (offset), and can rewind to replay history. This is the backbone of event-driven architecture, where one event (“order placed”) is consumed by many unrelated systems (billing, shipping, analytics) at once.
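A toy model makes the difference concrete. The two classes below are hypothetical in-memory sketches, not the API of RabbitMQ, SQS, or Kafka: `ToyQueue` removes a message once it is acknowledged, while `ToyLog` keeps every entry and lets each named consumer track, and rewind, its own offset. Real brokers also redeliver unacknowledged messages after a timeout or disconnect; that part is left out here.

```python
from collections import deque
from typing import Optional


class ToyQueue:
    """Broker-style: each message goes to one consumer and is removed once acked."""

    def __init__(self) -> None:
        self._ready: deque = deque()   # waiting to be delivered
        self._in_flight: dict = {}     # delivered but not yet acknowledged
        self._next_tag = 0

    def publish(self, msg: str) -> None:
        self._ready.append(msg)

    def receive(self) -> Optional[tuple]:
        """Hand the next message to ONE consumer; no other consumer will see it."""
        if not self._ready:
            return None
        msg = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._in_flight[tag] = msg
        return tag, msg

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]       # acknowledged: gone for good


class ToyLog:
    """Log-style: entries are retained; each consumer tracks its own offset."""

    def __init__(self) -> None:
        self._entries: list = []
        self._offsets: dict = {}

    def append(self, msg: str) -> None:
        self._entries.append(msg)

    def poll(self, group: str) -> list:
        """Return everything past this group's offset, then advance the offset."""
        start = self._offsets.get(group, 0)
        batch = self._entries[start:]
        self._offsets[group] = len(self._entries)
        return batch

    def seek(self, group: str, offset: int) -> None:
        self._offsets[group] = offset  # rewind to replay history


q = ToyQueue()
q.publish("resize img-1")
tag, job = q.receive()
q.ack(tag)
print(q.receive())           # None: the task was consumed

log = ToyLog()
log.append("order placed #1")
print(log.poll("billing"))   # ['order placed #1']
print(log.poll("shipping"))  # ['order placed #1'], an independent reader
print(log.poll("billing"))   # []
log.seek("billing", 0)
print(log.poll("billing"))   # ['order placed #1'], replayed
```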

Neither is “better.” A queue is for “do this work”; a log is for “this happened; anyone who cares, react.” Choosing depends on whether messages are tasks to be consumed once or events to be observed by many.

Delivery guarantees: the at-least-once reality

A distributed queue cannot perfectly promise “exactly once,” because networks lose acknowledgments. There are three guarantees, and the middle one is what you almost always get:

Guarantee	Meaning	Reality
At-most-once	every message delivered 0 or 1 times	may lose messages; rarely acceptable
At-least-once	every message delivered 1 or more times	the practical default; may duplicate
Exactly-once	delivered precisely once	very hard end-to-end; often faked atop at-least-once
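Which of the first two guarantees you get often comes down to when the consumer acknowledges. The simulation below is a hypothetical model, not a real client library: one message, a consumer that crashes on its first delivery at the worst possible moment, and a broker that redelivers anything unacknowledged. Acking before processing risks losing the work (at-most-once); processing before acking risks doing it twice (at-least-once).

```python
def deliver_until_acked(ack_first: bool, max_deliveries: int = 3) -> int:
    """Return how many times the side effect ran for ONE message.

    On the first delivery the consumer crashes right after its first step:
    after acking (ack_first=True) or after processing (ack_first=False).
    The broker keeps redelivering until it sees an ack.
    """
    processed = 0
    acked = False
    for attempt in range(max_deliveries):
        if acked:
            break                  # broker considers the message done
        crash = attempt == 0
        if ack_first:
            acked = True           # ack on receipt...
            if crash:
                continue           # ...then crash: the work never happens
            processed += 1
        else:
            processed += 1         # process first...
            if crash:
                continue           # ...then crash before acking: redelivered
            acked = True
    return processed


print(deliver_until_acked(ack_first=True))   # 0: at-most-once, message lost
print(deliver_until_acked(ack_first=False))  # 2: at-least-once, work duplicated
```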

The crux: if a consumer processes a message but crashes before sending its acknowledgment, the queue doesn’t know it succeeded, so it redelivers — and now the work happens twice. Because at-least-once is the realistic default, your consumers must be idempotent: processing the same message twice must produce the same result as processing it once (e.g. “set status = shipped,” not “increment count”). Idempotency is how you turn the messy at-least-once guarantee into effectively-once behavior.
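A sketch of the two usual routes to idempotency, assuming each message carries a unique `msg_id` set by the producer (a hypothetical field; every name here is illustrative). `set_status_handler` is idempotent by construction, while `dedup_increment_handler` makes a non-idempotent update safe by remembering which IDs it has already applied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: str      # hypothetical: unique per logical message
    order_id: str


@dataclass
class Store:
    status: dict[str, str] = field(default_factory=dict)
    ship_count: int = 0
    seen_ids: set[str] = field(default_factory=set)


def increment_handler(store: Store, msg: Message) -> None:
    # NOT idempotent: every delivery bumps the counter again.
    store.ship_count += 1


def set_status_handler(store: Store, msg: Message) -> None:
    # Idempotent by construction: writing the same state twice is harmless.
    store.status[msg.order_id] = "shipped"


def dedup_increment_handler(store: Store, msg: Message) -> None:
    # Idempotent via a record of processed IDs. In a real system the ID record
    # and the side effect should commit atomically (e.g. one DB transaction),
    # or a crash between them reopens the duplicate window.
    if msg.msg_id in store.seen_ids:
        return
    store.seen_ids.add(msg.msg_id)
    store.ship_count += 1


msg = Message(msg_id="m-1", order_id="order-42")
naive, safe, deduped = Store(), Store(), Store()
for _ in range(2):                  # at-least-once: same message delivered twice
    increment_handler(naive, msg)
    set_status_handler(safe, msg)
    dedup_increment_handler(deduped, msg)

print(naive.ship_count)    # 2: double-counted
print(safe.status)         # {'order-42': 'shipped'}
print(deduped.ship_count)  # 1
```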

Consumer groups: scaling the read side

To process faster, you run many consumers — but you don’t want all of them handling the same message. A consumer group is a set of cooperating consumers that share a stream’s partitions so each message is handled by exactly one member of the group. Add a consumer and the partitions rebalance, spreading the load; meanwhile a different group can read the same stream independently for a different purpose. This is how a log serves both “scale one job across many workers” and “many distinct jobs each read everything.”

   Partitioned log:  [P0][P1][P2][P3]
        Group "billing":   c1←P0,P1   c2←P2,P3     (split the work; each msg once per group)
        Group "analytics": c1←P0..P3                (a separate group reads it all again)
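Deciding which consumer gets which partition is the group's assignment step. Below is a simplified range-style assignment that reproduces the layout in the diagram. It is similar in spirit to Kafka's range assignor, but it is a hypothetical sketch that ignores membership tracking, heartbeats, and offset commits, which a real group coordinator handles. Rebalancing amounts to rerunning it whenever the member list changes.

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer in ONE group a contiguous run of partitions.

    The first len(partitions) % len(consumers) consumers (in sorted order)
    get one extra partition; surplus consumers get an empty list.
    """
    if not consumers:
        return {}
    ordered = sorted(consumers)
    per, extra = divmod(len(partitions), len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        size = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment


parts = [0, 1, 2, 3]
print(range_assign(parts, ["c1", "c2"]))        # group "billing"
# {'c1': [0, 1], 'c2': [2, 3]}
print(range_assign(parts, ["c1"]))              # group "analytics"
# {'c1': [0, 1, 2, 3]}
print(range_assign(parts, ["c1", "c2", "c3"]))  # billing after a consumer joins
# {'c1': [0, 1], 'c2': [2], 'c3': [3]}
print(range_assign(parts, ["c1", "c2", "c3", "c4", "c5"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}
```

The last case shows a design limit: a group with more consumers than partitions leaves the extras idle, so the partition count caps how far one group can scale.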

(Ordering note: partitioned logs such as Kafka guarantee order only within a partition, not across the whole stream. If you need related events in order, route them to the same partition by a key, such as an order ID.)
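Routing by key only works if every producer computes the same partition for the same key. A minimal sketch, assuming CRC32 from Python's `zlib` as the hash (Kafka's default partitioner uses murmur2 instead); `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition with a hash that is stable across processes.

    Python's built-in hash() is unsuitable here: it is randomized per process
    for strings, so two producers could route the same key differently.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-7", "placed"),
    ("order-9", "placed"),
    ("order-7", "paid"),
    ("order-7", "shipped"),
]

by_partition: dict[int, list[str]] = {}
for key, event in events:
    by_partition.setdefault(partition_for(key, 4), []).append(f"{key}:{event}")

p7 = partition_for("order-7", 4)
print([e for e in by_partition[p7] if e.startswith("order-7")])
# ['order-7:placed', 'order-7:paid', 'order-7:shipped']
```

One consequence of hashing modulo the partition count: adding partitions later remaps existing keys, so per-key ordering is not preserved across that change.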

Backpressure: when consumers can’t keep up

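The shock-absorber property has a limit. If producers persistently outrun consumers, the backlog grows without bound, eventually exhausting storage and ballooning latency (messages sit for hours before processing). Worse, the failure is quiet: enqueues keep succeeding and the producing side stays fast, so the system looks healthy while work piles up unseen. Queue depth and the age of the oldest waiting message (consumer lag, for a log) are the metrics that reveal it. Backpressure is the family of mechanisms that push back when this happens: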

Slow or reject producers when the queue is full, so the pressure propagates upstream instead of silently piling up.
Autoscale consumers off queue depth — when the backlog grows past a threshold, add workers.
Shed load / set TTLs — drop or expire low-value messages rather than drowning.
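The three mechanisms can be sketched together. This is a single-threaded toy with hypothetical names throughout (`BoundedQueue`, `desired_workers`), taking an explicit `now` timestamp so the behavior is deterministic: `offer` rejects when the buffer is full, `poll` drops messages older than the TTL, and `desired_workers` is one possible autoscaling rule driven by queue depth. It also exposes queue depth and the age of the oldest waiting message, the two measurements that reveal a growing backlog.

```python
import math
from collections import deque
from typing import Optional


class BoundedQueue:
    """Single-threaded toy buffer with a hard capacity and a message TTL."""

    def __init__(self, capacity: int, ttl_seconds: float) -> None:
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._items: deque = deque()   # (enqueued_at, payload), oldest first

    def offer(self, payload: str, now: float) -> bool:
        """Producer side: refuse when full so the pressure reaches the caller."""
        if len(self._items) >= self._capacity:
            return False               # e.g. surface as HTTP 429/503 upstream
        self._items.append((now, payload))
        return True

    def poll(self, now: float) -> Optional[str]:
        """Consumer side: shed expired messages, return the oldest live one."""
        while self._items:
            enqueued_at, payload = self._items.popleft()
            if now - enqueued_at <= self._ttl:
                return payload
            # older than the TTL: drop it rather than process stale work
        return None

    def depth(self) -> int:
        return len(self._items)

    def oldest_age(self, now: float) -> float:
        """How long the oldest waiting message has been queued (0 if empty)."""
        return now - self._items[0][0] if self._items else 0.0


def desired_workers(depth: int, per_worker: int, min_w: int = 1, max_w: int = 20) -> int:
    """One possible autoscaling rule: about `per_worker` queued messages per worker."""
    return max(min_w, min(max_w, math.ceil(depth / per_worker)))


q = BoundedQueue(capacity=3, ttl_seconds=60.0)
print([q.offer(f"job-{i}", now=float(i)) for i in range(5)])  # [True, True, True, False, False]
print(q.depth(), q.oldest_age(now=10.0))                       # 3 10.0
print(desired_workers(q.depth(), per_worker=2))                # 2
print(q.poll(now=100.0))                                       # None: all three expired
```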

Check your understanding

What does “decoupling” mean precisely, and name three concrete things it lets the producer and consumer do independently?
Distinguish a broker-style queue from a log-style stream by what happens to a message after it’s read. Give a use case where only the log shape works.
Why is exactly-once delivery so hard, and why does at-least-once make consumer idempotency mandatory rather than optional?
What problem does a consumer group solve, and how does a log serve both “one job across many workers” and “many jobs each reading everything”?
Why can an unbounded queue make an overloaded system look healthy, and which two metrics expose the truth?