What is an SLI (Service Level Indicator)?
A Service Level Indicator is a concrete metric of service health, such as the ratio of successful requests to total requests, that you measure directly and compare against a Service Level Objective (SLO) target.
A Service Level Indicator (SLI) is a quantitative measure of some aspect of a service's behavior that users actually care about, usually expressed as the ratio of good events to valid events. Common SLIs include request success rate (non-error responses over total responses), latency (the fraction of requests served faster than a threshold), freshness (the fraction of reads that return data updated more recently than a threshold), and durability (the fraction of stored data that can still be read back). The discipline of good SLI design is choosing indicators that track user experience rather than internal convenience: a 99% CPU utilization figure is a system metric, not an SLI, because users do not feel CPU; they feel slow or failing requests. SLIs feed directly into Service Level Objectives (SLOs), which set the target an SLI must meet, and into error budgets, which quantify how far the SLI is allowed to fall short of that target. Getting SLIs right is the foundation of reliability engineering: pick the wrong indicator and you will optimize a number nobody experiences. For an AI agent investigating a degradation, the relevant SLIs and their definitions are exactly the kind of durable context worth recording in shared memory, so the next agent does not have to rediscover which metric the team treats as the source of truth.
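The good-events-over-valid-events definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Request` record, the function names, and the choice to count only 5xx responses as errors are all assumptions made for the example, and a real service would normally compute these ratios from its monitoring system rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; field names are illustrative."""
    status: int        # HTTP status code of the response
    latency_ms: float  # time taken to serve the request


def ratio(good: int, valid: int) -> Optional[float]:
    """Good events over valid events; None when there is nothing to measure."""
    return good / valid if valid else None


def availability_sli(requests: Iterable[Request]) -> Optional[float]:
    """Fraction of requests with a non-error response.

    Assumption: only 5xx responses count as errors. Teams differ on
    whether 4xx responses are good, bad, or excluded from valid events.
    """
    reqs = list(requests)
    good = sum(1 for r in reqs if r.status < 500)
    return ratio(good, len(reqs))


def latency_sli(requests: Iterable[Request], threshold_ms: float) -> Optional[float]:
    """Fraction of requests served faster than threshold_ms."""
    reqs = list(requests)
    good = sum(1 for r in reqs if r.latency_ms < threshold_ms)
    return ratio(good, len(reqs))


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means the SLO
    has been missed. The budget is the shortfall the SLO allows (1 - slo).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_shortfall = 1.0 - slo
    actual_shortfall = 1.0 - sli
    return 1.0 - actual_shortfall / allowed_shortfall
```

For example, given four requests of which one returns a 503, `availability_sli` returns 0.75. Returning `None` for an empty window, rather than 0 or 1, is a deliberate choice in this sketch: a window with no traffic says nothing about user experience, so it should not be read as either an outage or perfect health.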