
The Dual-Write Problem & the Outbox Pattern

Here is a piece of code that appears in nearly every microservice ever written, and is almost always subtly broken:

saveOrderToDatabase(order) // step 1
publishEvent("OrderCreated", order) // step 2

You wrote to your database, and you published an event so other services find out. Two writes, two systems. The trouble is that no ordinary local transaction can make those two steps atomic, that is, guarantee that either both happen or neither does. This is the dual-write problem, and it is one of the most common sources of data drift in event-driven systems. It’s the operational shadow of everything in Event-Driven Architecture: the moment you tell other services about a change by emitting an event, you’ve signed up for this problem.

Why it’s genuinely impossible, not just hard


The database and the message broker are two separate systems with two separate transaction domains. Your local database transaction can guarantee atomicity within the database. It cannot reach across the network into Kafka or RabbitMQ and enroll that send in the same all-or-nothing unit. So whichever order you choose, a crash in the gap between the two steps leaves you inconsistent:

DB first, then publish:
  ✓ save order
  ✗ CRASH before publish
  → order exists, NO event
  → downstream never hears about it (lost notification, no shipment)

Publish first, then DB:
  ✓ publish "OrderCreated"
  ✗ CRASH before save
  → event exists, NO order
  → ghost event for an order that doesn't exist
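Both failure modes can be reproduced in a few lines. The following is a minimal sketch, not production code: `FakeBroker` is a hypothetical in-memory stand-in for Kafka or RabbitMQ, SQLite stands in for the service database, and the "crash" is simulated by raising an exception between the two steps.

```python
import sqlite3


class CrashBeforeSecondStep(Exception):
    """Stands in for a process crash between the two writes."""


class FakeBroker:
    """Hypothetical in-memory stand-in for Kafka or RabbitMQ."""

    def __init__(self):
        self.events = []

    def publish(self, event_type, order_id):
        self.events.append((event_type, order_id))


def db_first(conn, broker, order_id, crash=False):
    conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
    conn.commit()  # step 1 is now durable
    if crash:
        raise CrashBeforeSecondStep
    broker.publish("OrderCreated", order_id)  # never runs on a crash


def publish_first(conn, broker, order_id, crash=False):
    broker.publish("OrderCreated", order_id)  # step 1 has already left the process
    if crash:
        raise CrashBeforeSecondStep
    conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
    conn.commit()


def run_scenario(write_fn):
    """Run one ordering with a simulated crash; return (order rows, events)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    broker = FakeBroker()
    try:
        write_fn(conn, broker, 42, crash=True)
    except CrashBeforeSecondStep:
        pass
    order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return order_count, len(broker.events)


db_first_result = run_scenario(db_first)
publish_first_result = run_scenario(publish_first)
print(db_first_result, publish_first_result)  # (1, 0) (0, 1)
```

The first tuple is the order without an event; the second is the ghost event without an order. Swapping the order of the two steps only changes which inconsistency you get.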

A natural instinct is “wrap them in a distributed transaction (two-phase commit, XA).” You can, and it’s a cure worse than the disease: 2PC couples the availability of your database to the availability of your broker (if either is down, neither write proceeds), it holds locks across a network round trip, it scales poorly, and many modern brokers don’t support it well anyway. You’d trade a rare inconsistency for chronic unavailability and latency.

The dual-write problem is unsolvable as stated because it’s two writes to two systems. So we refuse to do two writes. The transactional outbox pattern turns the event into a row in the same database as the business data, written in the same local transaction:

BEGIN TRANSACTION
INSERT INTO orders (...) -- the business change
INSERT INTO outbox_events (type, payload) -- the "I should publish this" record
COMMIT

Now there is exactly one atomic write. Either both rows commit or neither does — the database’s own transaction guarantees it. There is no longer a gap where the order exists but the event doesn’t, because the event is in the order’s transaction. The outbox_events table is just an ordinary table holding “messages I intend to send, but haven’t yet.”
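As a runnable illustration of that single transaction, here is a sketch using Python's built-in `sqlite3` module. The column names beyond `type` and `payload` (`total_cents`, `sent`) and the `fail_midway` switch are assumptions added for the demo; any relational database with local transactions would behave the same way.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def create_order(conn, order_id, total_cents, fail_midway=False):
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two inserts")
        conn.execute(
            "INSERT INTO outbox_events (type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def counts(conn):
    """Return (number of orders, number of outbox events)."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
    return orders, events


conn = open_db()
create_order(conn, 42, 1999)
after_success = counts(conn)

try:
    create_order(conn, 43, 500, fail_midway=True)
except RuntimeError:
    pass
after_failure = counts(conn)
print(after_success, after_failure)  # (1, 1) (1, 1)
```

The failed attempt for order 43 leaves no trace in either table: the order row is rolled back together with the event row that was never written.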

┌─────────────────── ONE local transaction ───────────────────┐
│ orders table  ◄── business row                              │
│ outbox table  ◄── event row (type, payload, created_at)     │
└─────────────────────────────────────────────────────────────┘
                │
                ▼  (separate, asynchronous process)
      relay reads new outbox rows
                │
                ▼
      publish to Kafka / RabbitMQ
                │
                ▼
      mark row as sent (or delete it)

The event now sits safely in the outbox. A separate message relay moves it to the broker. Two ways to build the relay:

  1. Polling publisher. A background loop runs SELECT … FROM outbox_events WHERE sent = false ORDER BY id, publishes each row, and marks it sent. Dead simple, and it works on any database. The costs are polling latency and constant query load; you tune the interval to trade freshness against database pressure.

  2. Change Data Capture (CDC). Instead of polling the table, tail the database’s write-ahead log — the append-only record of every committed change the DB already keeps for crash recovery. A tool like Debezium reads the log, sees new outbox rows the instant they commit, and streams them to Kafka. No polling, near-real-time, near-zero added load on the database, and it reads committed data only — so it never sees a row that rolled back.
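The polling publisher from the first option can be sketched as a single pass over the table. This is an illustrative sketch only: the `publish` callback is a hypothetical stand-in for a real broker client, the `sent` column and `batch_size` parameter are assumptions, and SQLite is used so the example is self-contained.

```python
import sqlite3


def relay_once(conn, publish, batch_size=100):
    """Publish unsent outbox rows in id order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox_events"
        " WHERE sent = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        publish(event_type, payload)  # if this raises, the row stays unsent
        with conn:  # mark the row only after the publish succeeded
            conn.execute("UPDATE outbox_events SET sent = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " type TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " sent INTEGER NOT NULL DEFAULT 0)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox_events (type, payload) VALUES (?, ?)",
        [("OrderCreated", '{"order_id": 42}'), ("OrderCreated", '{"order_id": 43}')],
    )

published = []  # stands in for the broker topic
first_pass = relay_once(conn, lambda t, p: published.append((t, p)))
second_pass = relay_once(conn, lambda t, p: published.append((t, p)))
print(first_pass, second_pass, len(published))  # 2 0 2
```

A real relay would call `relay_once` in a loop with a sleep between passes; that sleep is the polling interval being tuned. Because the row is marked only after the publish returns, a crash between the two can cause the same row to be published again on the next pass.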

WAL (write-ahead log): ...│order#42│outbox#42│order#43│outbox#43│...
                                     │
                           Debezium tails the log
                                     │  (only committed entries)
                                     ▼
                                Kafka topic

The outbox pattern buys you the one thing 2PC couldn’t without wrecking availability: atomic, guaranteed publication of an event alongside the data change that caused it. No more orders without events, no more ghost events for orders that don’t exist. Your service stays available even when the broker is down — events simply queue in the outbox table until the relay can drain them.

The costs are honest and worth naming. You add a table, a relay process (or a CDC pipeline to operate and monitor), and end-to-end latency: the event reaches the broker a beat after the commit, not synchronously. You inherit at-least-once delivery, because the relay can crash after publishing a row but before marking it sent and will then publish it again on restart; that pushes the dedup burden onto consumers. And the outbox table needs housekeeping (delete or archive sent rows so it doesn’t grow unbounded). For almost any system that emits events about its own state changes, that’s a price well paid: you’ve converted an impossible atomicity problem into a routine, observable pipeline.
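One common way for a consumer to carry that dedup burden is to record each event's identifier in the same transaction as the effect it triggers. The sketch below assumes each event carries a unique `event_id` (for example, derived from the outbox row id); the `processed_events` and `shipments` tables and the `handle_once` helper are hypothetical names used for illustration.

```python
import sqlite3


def handle_once(conn, event_id, apply_effect):
    """Apply the effect only the first time event_id is seen; return True if applied."""
    with conn:  # the dedup record and the effect commit together
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cursor.rowcount == 0:
            return False  # duplicate delivery, already handled
        apply_effect(conn)
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE shipments (order_id INTEGER NOT NULL)")


def create_shipment(c):
    c.execute("INSERT INTO shipments (order_id) VALUES (42)")


# Deliver the same event twice, as an at-least-once relay might.
results = [handle_once(conn, "outbox-42", create_shipment) for _ in range(2)]
shipment_count = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
print(results, shipment_count)  # [True, False] 1
```

The second delivery is recognized by the primary-key conflict and skipped, so the order is shipped once even though the event arrived twice.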

  1. Explain precisely why a database write and a message-broker publish cannot be made atomic, in terms of transaction domains.
  2. Show the failure outcome for both orderings (DB-first and publish-first) when the process crashes between the two steps.
  3. Why is two-phase commit (XA) a poor fix here? Name at least two concrete downsides.
  4. What is the single conceptual move that makes the outbox pattern work, and why does it eliminate the atomicity gap?
  5. Compare polling vs CDC for the relay. Why does CDC add almost no load and never publish an uncommitted row?