What is distributed tracing?

Distributed tracing follows a single request as it hops across services, stitching per-service spans into one end-to-end trace so you can see where time went and which hop failed in a microservice system.

Distributed tracing is an observability technique that reconstructs the full journey of a single request as it flows through many services. Each unit of work (an HTTP handler, a database call, a queue consumer) records a span: a timed operation with a start time, a duration, and metadata. Spans are tied together by a shared trace ID that is propagated between services in request headers, typically the traceparent header defined by the W3C Trace Context standard. Each span also records the ID of its parent span, so the collector can assemble them into one tree, the trace, that shows the entire request from the user's click down to the slowest leaf call.

This is often the only practical way to answer "where did the time go?" and "which service failed?" in a system where a single request touches a dozen services and no single log file tells the whole story. Distributed tracing is commonly described as one of the three pillars of observability, alongside metrics and logs, and it is a large part of what makes event-driven and microservice architectures debuggable.

OpenTelemetry has become the de facto standard instrumentation layer, with backends such as Honeycomb, Datadog, Grafana Tempo, and Jaeger storing and querying traces. For an AI agent debugging latency or an incident, traces are the richest evidence, and a shared memory of which services are chronically slow or flaky lets the agent narrow its search before it even opens a trace.
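To make the span and trace model concrete, here is a minimal sketch in plain Python of what a collector does conceptually: read the trace ID and parent span ID from a traceparent header, link spans to their parents, and walk the resulting tree to find the slowest leaf. The names Span, parse_traceparent, build_trace, and slowest_leaf are hypothetical and chosen for illustration; this is not the OpenTelemetry API, and a real collector also handles late, missing, and out-of-order spans.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags,
# all lowercase hex (2, 32, 16, and 2 characters respectively).
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> tuple[str, str]:
    """Return (trace_id, parent_span_id) from a traceparent header value."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, parent_id, _flags = match.groups()
    return trace_id, parent_id


@dataclass
class Span:
    """Hypothetical, simplified span record (illustration only)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: float
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def build_trace(spans: list[Span]) -> Span:
    """Link each span to its parent and return the root of the tree."""
    by_id = {span.span_id: span for span in spans}
    roots = []
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        if parent is None:
            # No known parent: either the true root or an orphaned span.
            roots.append(span)
        else:
            parent.children.append(span)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


def slowest_leaf(root: Span) -> Span:
    """Return the leaf span (no children) with the longest duration."""
    stack, leaves = [root], []
    while stack:
        span = stack.pop()
        if span.children:
            stack.extend(span.children)
        else:
            leaves.append(span)
    return max(leaves, key=lambda s: s.duration_ms)
```

Given a root span for an HTTP handler with two children, say an auth check and a database query, build_trace returns the handler span with both children attached, and slowest_leaf returns whichever child took longer. Treating more than one parentless span as an error is a simplifying assumption of this sketch; production backends usually tolerate incomplete traces.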