
Distributed Tracing

In a monolith, a slow request has a stack trace: one process, one call stack, one profiler tells you which function ate the time. Split that monolith into a dozen services talking over the network and the stack trace vanishes, because the call stack is now spread across machines that don’t share memory. “Checkout is slow” becomes a whodunit: was it the cart service, the pricing service, the inventory lookup, or the database three hops down? Distributed tracing reconstructs that lost call stack across the network. It is the pillar built specifically for the question “the whole thing is slow, but no single service looks slow.”

Tracing has exactly two concepts:

  • A span is one unit of work with a start and end time — a single service handling a request, a database query, an outbound HTTP call. It carries a name, a duration, and metadata (status, the endpoint, error flags).
  • A trace is the full tree of spans for one end-to-end request. Each span records its parent, so the spans assemble into a tree that mirrors the actual call hierarchy.
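To make the parent references concrete, here is a minimal Python sketch. It assumes a hypothetical `Span` record (not any real tracing library's type) with `span_id`, `parent_id`, and millisecond timestamps. `build_tree` takes a flat, arbitrarily ordered list of spans, as they might arrive from different services, and links them into the call tree by following each span's parent reference.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span record, not a real library type.
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: float
    duration_ms: float
    children: list = field(default_factory=list)


def build_tree(spans):
    """Link a flat list of one trace's spans into a tree via parent_id."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            # A parent that never arrived raises KeyError: the trace is broken.
            by_id[s.parent_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    for s in spans:
        # Order siblings by start time so the tree reads left to right in time.
        s.children.sort(key=lambda c: c.start_ms)
    return root


def walk(span, depth=0):
    """Yield (depth, span) pairs in call-hierarchy order."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


# Spans arrive out of order, from different services.
spans = [
    Span("s4", "s3", "db query (prices)", 100, 180),
    Span("s1", None, "gateway", 0, 820),
    Span("s3", "s1", "pricing-service", 90, 210),
    Span("s2", "s1", "cart-service", 0, 90),
]
root = build_tree(spans)
for depth, s in walk(root):
    print("  " * depth + s.name)
```

Nothing in a span says where it sits in the tree except its one parent pointer, so any service can emit its spans independently and the backend still reassembles the hierarchy.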

Drawn on a time axis, a trace becomes the famous waterfall:

trace: POST /checkout (total 820 ms)
gateway               |==========================| 820 ms
  cart-service        |====|                        90 ms
  pricing-service           |==========|           210 ms
    db query (prices)        |=========|           180 ms  <- here
  inventory-service                     |==|        70 ms
  (response assembled)

One glance answers the whodunit: the pricing service’s database query is the long bar. You didn’t have to guess or add logging — the shape of the waterfall localizes the cost. That is the entire value proposition: tracing turns “somewhere it’s slow” into “there it’s slow.”
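A waterfall is that same tree drawn against time. The sketch below assumes hypothetical span records of the form `(depth, name, start_ms, duration_ms)` listed in call order, with start offsets invented to be consistent with the durations in the diagram. `longest_leaf` is a deliberately crude stand-in for eyeballing the longest bottom-level bar, not a real analysis algorithm.

```python
# Hypothetical span records (depth, name, start_ms, duration_ms), listed in
# call order; start offsets are invented to be consistent with the durations.
SPANS = [
    (0, "gateway", 0, 820),
    (1, "cart-service", 0, 90),
    (1, "pricing-service", 90, 210),
    (2, "db query (prices)", 100, 180),
    (1, "inventory-service", 300, 70),
]


def render_waterfall(spans, ms_per_char=20.0, label_width=22):
    """Draw each span as a bar whose offset and length are proportional to time."""
    lines = []
    for depth, name, start, dur in spans:
        label = ("  " * depth + name).ljust(label_width)
        offset = int(start / ms_per_char)
        width = max(1, round(dur / ms_per_char))
        lines.append(f"{label}{' ' * offset}|{'=' * width}| {dur} ms")
    return "\n".join(lines)


def leaf_spans(spans):
    """In call order, a span is a leaf if the next span is not deeper than it."""
    for i, span in enumerate(spans):
        next_depth = spans[i + 1][0] if i + 1 < len(spans) else -1
        if next_depth <= span[0]:
            yield span


def longest_leaf(spans):
    """Crude heuristic: the name of the longest bottom-level bar."""
    return max(leaf_spans(spans), key=lambda s: s[3])[1]


print(render_waterfall(SPANS))
print("longest leaf:", longest_leaf(SPANS))
```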

For spans on different machines to assemble into one tree, every service must agree on which trace this is and who its parent span is. That shared identity travels with the request as trace context — typically a traceparent HTTP header (the W3C Trace Context standard) carrying a trace_id (constant for the whole request) and the current span_id (which becomes the parent of the next hop’s span).
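For reference, a version-00 `traceparent` value has four dash-separated lowercase-hex fields: a version, a 32-digit trace ID, a 16-digit parent ID (the sender's span ID), and trace flags whose lowest bit marks the trace as sampled. A minimal parser and formatter might look like this. The helper names are hypothetical, and a production parser should also handle versions other than `00`, which this sketch does not attempt.

```python
import re
import secrets
from typing import NamedTuple

# version-traceid-parentid-flags, lowercase hex (W3C Trace Context, version 00 layout).
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str  # 32 hex digits, constant for the whole request
    span_id: str   # 16 hex digits; the spec calls this field parent-id
    sampled: bool  # lowest bit of trace-flags


def parse_traceparent(value):
    """Return a TraceContext, or None if the header is missing or invalid."""
    m = _TRACEPARENT.fullmatch(value.strip()) if value else None
    if m is None:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx):
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def child_context(parent):
    """Same trace, fresh span ID: what a hop forwards after starting its span."""
    return parent._replace(span_id=secrets.token_hex(8))  # 8 bytes -> 16 hex digits


incoming = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
outgoing = format_traceparent(child_context(incoming))
print(incoming)
print(outgoing)  # same trace ID, new 16-digit span ID
```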

gateway                 service A                   service B
trace_id = T            trace_id = T                trace_id = T
span = S1               span = S2 (parent S1)       span = S3 (parent S2)
│ traceparent: T-S1 ──► │ traceparent: T-S2 ──►     │
└─ each hop reads the context, starts a child span, and forwards the updated context

This is context propagation, and it is where tracing lives or dies. If even one service in the chain fails to forward the header — an old library, a queue hop, a thread boundary that loses the context — the trace breaks there, and everything downstream becomes an orphaned, unconnected trace. Notice this is the same trace_id you stamp onto every log line: the correlation ID and the trace ID are the same thread, which is why a good trace and a good log search reinforce each other.
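The failure mode is easy to reproduce in a toy simulation. In this sketch (all names hypothetical, IDs random), each hop continues the incoming trace if a `traceparent` header is present and otherwise starts a new root. `service-a` is made to drop the header, so `service-b` and `service-c` end up in a separate trace with no link back to the gateway.

```python
import secrets


def handle(service, incoming_headers, spans, forward=True):
    """Hypothetical hop: continue the incoming trace (or start a new one),
    record a span, and return the headers sent to the next hop."""
    tp = incoming_headers.get("traceparent")
    if tp:
        _, trace_id, parent_id, flags = tp.split("-")
    else:
        # No context arrived, so this hop becomes the root of a brand-new trace.
        trace_id, parent_id, flags = secrets.token_hex(16), None, "01"
    span_id = secrets.token_hex(8)
    spans.append({"service": service, "trace_id": trace_id,
                  "span_id": span_id, "parent_id": parent_id})
    if not forward:
        return {}  # the bug: the outbound call carries no trace context
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


spans = []
headers = handle("gateway", {}, spans)
headers = handle("service-a", headers, spans, forward=False)  # e.g. an old library
headers = handle("service-b", headers, spans)
headers = handle("service-c", headers, spans)

for s in spans:
    print(s["service"], s["trace_id"][:8], "parent:", s["parent_id"])
```

The printout shows two trace IDs instead of one: the gateway and `service-a` share the first, while `service-b` appears as a root with no parent and `service-c` hangs off it.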

Propagation must cross every boundary your request does — synchronous HTTP and RPC/gRPC calls, but also asynchronous ones: when a request hands work to a message queue, the trace context must ride along in the message so the worker that picks it up minutes later can continue the same trace.
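The same extract-and-continue step applies to a queue, provided the context is stored in the message itself and not in some in-memory state of the producer. A sketch with Python's standard `queue.Queue` standing in for a real broker (many brokers support per-message headers, for example Kafka record headers). The message layout and the function names are assumptions.

```python
import queue
import secrets


def publish(q, body, trace_id, span_id):
    """Producer side: the trace context travels inside the message headers."""
    q.put({"headers": {"traceparent": f"00-{trace_id}-{span_id}-01"}, "body": body})


def consume_one(q, spans):
    """Worker side: extract the context and continue the same trace."""
    msg = q.get()
    tp = msg["headers"].get("traceparent")
    if tp:
        _, trace_id, parent_id, _flags = tp.split("-")
    else:
        trace_id, parent_id = secrets.token_hex(16), None  # context lost: new trace
    spans.append({"name": f"process {msg['body']}", "trace_id": trace_id,
                  "span_id": secrets.token_hex(8), "parent_id": parent_id})
    q.task_done()


q = queue.Queue()
producer_trace, producer_span = secrets.token_hex(16), secrets.token_hex(8)
publish(q, "order-42", producer_trace, producer_span)  # hypothetical payload
# In a real system the consumer could run minutes later, in another process.
worker_spans = []
consume_one(q, worker_spans)
print(worker_spans[0]["trace_id"] == producer_trace)  # True: same trace continues
```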

A trace is far heavier than a metric and richer than most logs. Tracing every request at high traffic could cost more to store than the system itself costs to run, and would bury the interesting traces under millions of boring ones. So you sample: keep some traces, drop the rest. There are two strategies, and the difference matters:

HEAD-BASED sampling                  TAIL-BASED sampling
decide at the START of the request   decide at the END, after seeing the whole trace
──────────────────────────────────   ───────────────────────────────────────────────
e.g. "keep 1% at random"             e.g. "keep ALL errors + ALL slow traces"
cheap, simple, decided at the edge   expensive (must buffer every trace), but smart
may miss the rare slow/error trace   keeps exactly the traces you'd want to look at

Head-based sampling decides up front (flip a coin at the edge) — cheap, but it might discard the one slow request you needed. Tail-based sampling buffers complete traces and keeps the interesting ones — every error, every trace over 1 second — which is exactly what you want, at the cost of buffering everything long enough to decide. Many systems combine them: a low baseline head sample for healthy traffic plus tail rules that guarantee errors and slow requests are always kept.
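Both strategies, and their combination, fit in a few lines. This is a sketch under simplifying assumptions: `head_sample` treats the last 16 hex digits of the trace ID as a uniform random number, so every service reaches the same verdict for a given trace, and `TailSampler` treats the root span finishing as the trace being complete, whereas real tail samplers typically wait a fixed decision window for late spans. The 1% ratio and the 1-second threshold come from the examples above; all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

HEAD_RATIO = 0.01  # "keep 1% at random"
SLOW_MS = 1000.0   # "every trace over 1 second"


def head_sample(trace_id, ratio=HEAD_RATIO):
    """Head decision: made up front, from the trace ID alone. Deterministic,
    so every service that sees this trace ID reaches the same verdict."""
    return int(trace_id[-16:], 16) < ratio * 2**64


@dataclass
class FinishedSpan:
    trace_id: str
    duration_ms: float
    error: bool
    is_root: bool


class TailSampler:
    """Tail decision: buffer every span, decide once the whole trace is seen."""

    def __init__(self):
        self.buffer = defaultdict(list)  # trace_id -> spans so far (the cost)
        self.kept = []

    def on_span_end(self, span):
        self.buffer[span.trace_id].append(span)
        if span.is_root:  # simplification: root finished == trace complete
            spans = self.buffer.pop(span.trace_id)
            if self.keep(spans, span):
                self.kept.append(span.trace_id)

    @staticmethod
    def keep(spans, root):
        # Tail rules guarantee errors and slow traces are kept;
        # the head ratio supplies a small baseline of healthy traffic.
        if any(s.error for s in spans) or root.duration_ms > SLOW_MS:
            return True
        return head_sample(root.trace_id)


sampler = TailSampler()
sampler.on_span_end(FinishedSpan("f" * 31 + "1", 30, True, False))   # failing child
sampler.on_span_end(FinishedSpan("f" * 31 + "1", 90, False, True))   # its root
sampler.on_span_end(FinishedSpan("f" * 31 + "2", 120, False, True))  # fast, healthy
print(sampler.kept)
```

The `buffer` dictionary is where the cost in the table shows up: every span of every in-flight trace sits in memory until its trace is decided.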

The payoff is diagnosing failures that no single pillar can explain:

  • Serial vs parallel calls. The waterfall instantly shows whether five downstream calls ran one-after-another (a fixable latency bug) or overlapped. A staircase of sequential bars is a classic, invisible-without-tracing slowdown.
  • The unexpected hop. A trace often reveals a service is being called far more times than you thought — the N+1 query problem made visible across the network.
  • Tail latency attribution. Sample the slow traces (tail-based) and you can see which hop is responsible for your p99, not just that p99 is bad.
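The first two patterns can also be checked mechanically once the spans are collected. A rough sketch, assuming the child spans of a single parent are given as hypothetical `(name, start_ms, end_ms)` tuples; the tolerance and the repeat threshold of 5 are arbitrary illustration values, not established defaults.

```python
from collections import Counter


def is_staircase(children, tolerance_ms=1.0):
    """True if each call starts only after the previous one ended (serial).
    The tolerance absorbs small clock differences between hosts."""
    ordered = sorted(children, key=lambda c: c[1])
    return len(ordered) > 1 and all(
        nxt[1] >= prev[2] - tolerance_ms for prev, nxt in zip(ordered, ordered[1:])
    )


def wall_time(children):
    """Elapsed time from the first call's start to the last call's end."""
    return max(c[2] for c in children) - min(c[1] for c in children)


def repeated_calls(children, threshold=5):
    """Operations called suspiciously often under one parent (possible N+1)."""
    counts = Counter(name for name, _, _ in children)
    return {name: n for name, n in counts.items() if n >= threshold}


# Five price lookups of roughly 40 ms each, once back to back and once overlapped.
serial = [("GET /price", 40 * i, 40 * i + 40) for i in range(5)]
parallel = [("GET /price", 0, 40 + i) for i in range(5)]
print(is_staircase(serial), wall_time(serial))      # True 200
print(is_staircase(parallel), wall_time(parallel))  # False 44
print(repeated_calls(serial))                       # {'GET /price': 5}
```

The same five calls cost 200 ms in a staircase and 44 ms when overlapped; a latency metric on the parent would only show the larger number, not why.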

This is why traces complete the trio. Metrics tell you p99 latency rose. Logs give you the detail of one failing event. Traces tell you where in the request’s journey the time and failures actually live — the connective tissue between the “how much” of metrics and the “what exactly” of logs.

  1. Define span and trace, and explain how parent references turn a flat list of spans into a waterfall.
  2. What is context propagation, what travels in the trace context, and what happens to the trace if one service forgets to forward it?
  3. Why does trace context also have to ride inside a queued message, not just an HTTP header?
  4. Contrast head-based and tail-based sampling. Which one reliably captures the slow/error traces you most want, and what does it cost?
  5. Give one concrete latency bug that a trace waterfall reveals at a glance but that metrics and logs alone would hide.