Best MCP servers for monitoring & incidents

When something breaks in production, the job is investigation: read the error and its stack trace, query the metrics and logs around the spike, check dashboards and alerts, and see whether a recent release or a user-facing regression lines up with the incident. A monitoring MCP setup lets an agent run that investigation across your observability stack, instead of an on-call engineer flipping between consoles at 3am. The servers below cover error tracking, full APM and log search, dashboards and alerting, and product-side regressions. Install the ones that match your stack; the workflow is the same loop either way: detect, query, correlate. Each ships with a verified, current install config.
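
The correlate step is mostly a time-window question: which changes landed shortly before the incident started? Below is a minimal sketch, assuming you already have the incident start time and a list of recent releases or feature-flag flips pulled from your tools. The `Change` type, the `suspects` function, the two-hour default lookback, and the sample data are all hypothetical illustrations, not part of any of these servers' APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    # Hypothetical record of something that changed in production.
    kind: str  # e.g. "release" or "feature_flag" (illustrative labels)
    name: str
    at: datetime


def suspects(incident_start: datetime, changes: list[Change],
             lookback: timedelta = timedelta(hours=2)) -> list[Change]:
    """Return changes that landed within `lookback` before the incident,
    most recent first, since the latest change is usually the first suspect."""
    window_start = incident_start - lookback
    hits = [c for c in changes if window_start <= c.at <= incident_start]
    return sorted(hits, key=lambda c: c.at, reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    incident = datetime(2024, 5, 1, 3, 0)
    changes = [
        Change("release", "api v1.42.0", datetime(2024, 5, 1, 2, 40)),
        Change("feature_flag", "new-checkout enabled", datetime(2024, 5, 1, 1, 30)),
        Change("release", "api v1.41.0", datetime(2024, 4, 30, 18, 0)),
    ]
    for c in suspects(incident, changes):
        print(c.kind, c.name, c.at.isoformat())
```

With the sample data, the release at 02:40 and the flag flip at 01:30 fall inside the window and are listed newest first; the previous day's release is excluded. In practice the agent would fill the change list from the tools listed here (Sentry releases, PostHog flag history) rather than from hard-coded values.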

Top pick

Sentry

Official server by Sentry

Sentry's official MCP server: pull issues, stack traces, and events, and run Seer root-cause analysis from your editor.

monitoring-observability · 712

Sentry's official server pulls issues, stack traces, and events and can run Seer root-cause analysis, making it the fastest path from an error alert to what actually broke.

Pick 2

Datadog

Official server by Datadog

Datadog's official remote MCP server lets agents search logs, query metrics, pull APM traces, inspect monitors, and investigate incidents.

monitoring-observability

Datadog's official remote server lets the agent search logs, query metrics, pull APM traces, inspect monitors, and investigate incidents the way an on-call engineer would.

Pick 3

Grafana

Official server by Grafana Labs

Grafana Labs' official MCP server: query dashboards, Prometheus, Loki, incidents, alerts, and OnCall from your agent.

monitoring-observability · 3,083

Grafana Labs' official server queries dashboards, Prometheus, Loki, incidents, alerts, and OnCall, covering the open observability stack in one place.

Pick 4

PostHog

Official server by PostHog

PostHog's official MCP server: query product analytics, manage feature flags and experiments, run HogQL, and triage errors from your editor.

data-analytics

PostHog's official server triages errors and queries product analytics, so the agent can correlate an incident with a user-facing regression or a feature-flag change.