Time, Clocks & Ordering
In a single program, “which happened first?” is answered by a single clock you can simply read. In a distributed system there is no global now — every machine has its own clock, and those clocks disagree. Yet ordering is the thing everything else is built on: causality, consistency, consensus terms, conflict resolution. This page confronts the problem head-on: physical clocks cannot give you reliable order, so we construct order with logical clocks instead.
Wall clocks drift, and NTP only narrows the lie
Section titled “Wall clocks drift, and NTP only narrows the lie”Every computer keeps time with a quartz oscillator, and no two tick at exactly the same rate. A typical crystal drifts on the order of tens of parts per million — roughly a few milliseconds per few minutes — and the rate changes with temperature. Left alone, two servers’ clocks wander apart continuously.
NTP (Network Time Protocol) corrects this by periodically syncing each machine to time servers. It helps enormously, but it cannot make the problem disappear, for reasons baked into the network:
- NTP must estimate the round-trip delay and halve it to guess one-way latency — an estimate that’s wrong whenever the path is asymmetric. Residual error is commonly single-digit to tens of milliseconds over the internet, better on a tuned LAN.
- A correction can step the clock backward, so reading the wall clock twice can show time going in reverse — disastrous if you used it to order events.
Machine A clock: ───|────|────|────|──► (runs a touch fast)Machine B clock: ──|────|────|────|───► (runs a touch slow) ▲ ▲ "A at 12:00:00.030" "B at 12:00:00.010" Did A's event happen after B's? You CANNOT tell from the timestamps — the skew between them dwarfs the difference.Happens-before: ordering without a clock
Section titled “Happens-before: ordering without a clock”Leslie Lamport’s insight (1978) was to stop asking “what time did it happen?” and ask only “could
this event have caused that one?” He defined the happens-before relation, written →:
- If
aandbare in the same process andacomes first, thena → b. - If
ais the sending of a message andbis its receipt, thena → b. - Transitivity: if
a → bandb → c, thena → c.
If neither a → b nor b → a, the events are concurrent — and crucially, concurrency means
genuinely no causal relationship, so the system is free to order them however it likes. Happens-before
captures the only ordering that actually matters for correctness: cause must precede effect.
Lamport timestamps: a counter, not a clock
Section titled “Lamport timestamps: a counter, not a clock”Lamport made happens-before measurable with a tiny rule. Each process keeps an integer counter C:
- Before any local event: C = C + 1- When SENDING a message: C = C + 1, attach C to the message- When RECEIVING a message (ts): C = max(C, ts) + 1The max is the whole trick: receiving a message drags your counter forward to account for the
sender’s “knowledge,” so causality is preserved. The guarantee:
If
a → b, thenC(a) < C(b).
P1: e1[1] ──send──► ................. \P2: ........ e2[1] .......► recv[max(1,1)+1 = 2] ── e3[3]
e1 → recv → e3, and indeed 1 < 2 < 3.Note the direction of the arrow carefully — Lamport timestamps give you one half of an
implication. a → b ⟹ C(a) < C(b). But the converse does not hold: C(a) < C(b) does not
prove a → b; the two events might be concurrent and just happened to get those numbers. So Lamport
timestamps can produce a total order (break ties by process ID) that’s consistent with causality
— perfect for things like ordering requests for a lock — but they cannot detect concurrency. To
ask “are these two events concurrent or causally related?” you need a richer structure.
When you need to detect concurrency: vector clocks
Section titled “When you need to detect concurrency: vector clocks”Vector clocks fix exactly that gap by having each process track a vector of counters — one per process — so you can compare two timestamps and tell whether one happened-before the other or whether they’re truly concurrent. That capability is the foundation for conflict detection in eventually-consistent stores and for CRDTs, which merge concurrent updates without a central coordinator. We develop the full mechanism and its payoff in Vector Clocks & CRDTs — read this page first, then go there for the sequel.
What this buys us, and what it costs
Section titled “What this buys us, and what it costs”Logical clocks buy you a reliable, causality-respecting order with nothing but integers and the messages you were already sending — no special hardware, no synchronized wall clocks, immune to drift and NTP step-backs. The cost is that you give up any tie to real time (a Lamport timestamp tells you nothing about wall-clock seconds), and vector clocks cost storage and bandwidth that grow with the number of participants. The deeper payoff is conceptual: once you stop demanding a global “now” and ask only “what causally precedes what,” most of the ordering problems in distributed systems — consistency, consensus, conflict resolution — become tractable. Ordering, it turns out, was never about clocks. It was about cause and effect.
Check your understanding
Section titled “Check your understanding”- Why can’t NTP make wall-clock timestamps a reliable basis for ordering events across machines? Name two distinct reasons.
- Define the three rules of the happens-before relation, and explain what it means for two events to be “concurrent.”
- In the Lamport timestamp update rule, what specific job does the
maxoperation do, and what would break without it? - Lamport timestamps guarantee
a → b ⟹ C(a) < C(b). Why does the converse fail, and what can’t you conclude fromC(a) < C(b)alone? - What can vector clocks tell you that Lamport timestamps cannot, and why does that capability matter for eventually-consistent stores?