What is LLM observability?

LLM observability is the practice of capturing and inspecting what an AI application actually does at runtime (prompts, responses, tool calls, tokens, latency, and cost) so you can debug failures, monitor quality, and control spend.

LLM observability is monitoring and debugging applied to language-model applications. Ordinary logs and metrics are not enough here, because the interesting behavior lives inside prompts, model responses, and tool calls. It captures each interaction in detail: the full prompt and completion, which tools the agent called and with what arguments, token counts, latency, cost, and the model and parameters used.

With that record you can answer the questions that matter in production: why did this answer go wrong, which prompt version regressed quality, where is the latency, and which calls are burning the budget? You can also attach evaluation scores to live traffic and watch quality over time, not just in offline evals.

For agents the value compounds. A single user request can fan out into many model and tool steps, and without a recorded trace a failure is nearly impossible to reconstruct. Tools in this space, including Langfuse, Honeycomb, and SigNoz, increasingly build on OpenTelemetry so that LLM traces sit alongside traces from the rest of your system, and several expose Model Context Protocol (MCP) servers so an agent can query its own telemetry. LLM observability pairs naturally with tracing and spans, which give the data its structure.
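To make the trace structure concrete, here is a minimal sketch, using only the Python standard library, of how one agent request might be recorded as a tree of spans carrying the fields listed above. Everything in it is hypothetical: the `Tracer`, `Span`, `record_llm`, and `trace_cost` names, the `example-model` prices, and all prompts and token counts. It is not the API of Langfuse, Honeycomb, SigNoz, or the OpenTelemetry SDK. A real setup would export spans instead of keeping them in a list.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and model.
PRICES_PER_MTOK = {"example-model": {"input": 0.50, "output": 1.50}}


@dataclass
class Span:
    """One step in a trace: the whole request, a model call, or a tool call."""
    name: str
    kind: str  # "request", "llm", or "tool"
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict[str, Any] = field(default_factory=dict)
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    @property
    def latency_ms(self) -> float:
        end = self.end if self.end is not None else time.perf_counter()
        return (end - self.start) * 1000


class Tracer:
    """Keeps spans in memory (hypothetical); a real system would export them."""

    def __init__(self) -> None:
        self.spans: list[Span] = []

    @contextmanager
    def span(self, name: str, kind: str, trace_id: str,
             parent: Optional[Span] = None, **attrs: Any) -> Iterator[Span]:
        s = Span(name, kind, trace_id, parent.span_id if parent else None,
                 attributes=dict(attrs))
        self.spans.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()  # latency is recorded even if the step raises


def record_llm(s: Span, model: str, prompt: str, completion: str,
               input_tokens: int, output_tokens: int) -> None:
    """Attach prompt, completion, token counts, and derived cost to an LLM span."""
    price = PRICES_PER_MTOK[model]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    s.attributes.update(model=model, prompt=prompt, completion=completion,
                        input_tokens=input_tokens, output_tokens=output_tokens,
                        cost_usd=cost)


def trace_cost(spans: list[Span], trace_id: str) -> float:
    """Total model spend for one request, summed over its LLM spans."""
    return sum(s.attributes.get("cost_usd", 0.0) for s in spans
               if s.trace_id == trace_id and s.kind == "llm")


# Usage: one user request fans out into two model calls and one tool call.
# The model calls are stand-ins; prompts, completions, and token counts are made up.
tracer = Tracer()
trace_id = uuid.uuid4().hex
with tracer.span("handle_request", "request", trace_id) as root:
    with tracer.span("llm.plan", "llm", trace_id, root, temperature=0.2) as s:
        record_llm(s, "example-model", "What's the weather in Paris?",
                   'get_weather(city="Paris")', input_tokens=1200, output_tokens=300)
    with tracer.span("tool.get_weather", "tool", trace_id, root,
                     args={"city": "Paris"}) as s:
        s.attributes["result"] = "18C, cloudy"
    with tracer.span("llm.answer", "llm", trace_id, root, temperature=0.2) as s:
        record_llm(s, "example-model", "Tool result: 18C, cloudy. Answer the user.",
                   "It's 18C and cloudy in Paris.", input_tokens=1500, output_tokens=100)
    # An evaluation score attached to live traffic (the value is illustrative).
    root.attributes["eval.helpfulness"] = 0.9

for s in tracer.spans:
    print(f"{s.name:18} parent={s.parent_id or '-':16} {s.latency_ms:7.2f} ms")
print(f"request cost: ${trace_cost(tracer.spans, trace_id):.6f}")
```

Because every span carries its `trace_id` and `parent_id`, the same records can answer the production questions above: sort spans by `latency_ms` to find the slow step, or sum `cost_usd` per trace to find the expensive requests. With the made-up token counts here, the two model calls cost $0.00105 and $0.0009.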