New Relic vs Honeycomb

New Relic and Honeycomb are both observability platforms an on-call engineer might query during an incident, and both ship official MCP servers, but they represent two different schools of observability. New Relic is the broad, all-in-one platform: through its server an agent translates natural language to NRQL, runs it against NRDB, and gets back metrics, traces, and logs with analysis layered on top. Its tools span entity discovery, alerts, incidents, golden-metric and transaction analytics, and impact reports. Honeycomb is the high-cardinality, trace-first platform built for fast debugging: its server lets an agent write time-series queries with calculations and breakdowns, run BubbleUp to explain anomalies, render traces as waterfalls, drill into spans and the service map, and manage Boards, Triggers, and SLOs. Teams comparing them are usually weighing a wide single-vendor suite against a sharp, exploratory debugging tool. Here is how the two servers compare when an agent is the one investigating production.

How they compare

Dimension	New Relic	Honeycomb
Philosophy	Broad single-vendor observability — metrics, traces, logs, alerts, and incidents unified in NRDB and queried as one platform.	High-cardinality, exploratory debugging — ask precise questions of events and traces rather than relying on pre-built views.
Querying	NRQL-centric, with natural_language_to_nrql_query plus execute_nrql_query so the model drafts and runs queries against NRDB.	Structured queries with calculations and breakdowns (run_query), tuned for slicing high-cardinality dimensions.
Anomaly and root cause	analyze_golden_metrics, analyze_transactions, error groups, deployment-impact and user-impact reports — opinionated, packaged analytics.	BubbleUp automatically surfaces the dimensions explaining an anomaly, plus anomaly service profiles — fast, interactive root-cause.
Tracing	Pulls traces and spans as part of a unified telemetry surface alongside metrics and logs.	Trace-first — get_trace renders waterfalls, list_spans/get_span_details drill in, and get_service_map maps dependencies.
Best-fit task	Teams standardized on New Relic that want natural-language NRQL and broad, packaged performance analytics in one place.	Teams that debug by exploring events and traces and want BubbleUp-style explanation with SLOs and Triggers managed in-agent.
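
To make the Querying row concrete, here is a minimal sketch of what the agent's requests might look like on the wire. The `tools/call` envelope follows the MCP JSON-RPC convention and the tool names come from the table above. The argument keys (`query`, `nrql`, `dataset`, `calculations`, `breakdowns`, `time_range`), the dataset name, and the column names are assumptions for illustration, not confirmed schemas for either server. The Honeycomb arguments are loosely modeled on Honeycomb's public query specification.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# New Relic: draft NRQL from plain English, then run it.
# The argument keys "query" and "nrql" are assumed, not taken from the server's schema.
draft = tool_call(
    "natural_language_to_nrql_query",
    {"query": "p95 transaction duration by app over the last 30 minutes"},
)
run_nrql = tool_call(
    "execute_nrql_query",
    {"nrql": "SELECT percentile(duration, 95) FROM Transaction "
             "FACET appName SINCE 30 minutes ago"},
)

# Honeycomb: a structured query with calculations and breakdowns.
# Dataset and column names are hypothetical; the argument shape is an assumption.
run_hc = tool_call(
    "run_query",
    {
        "dataset": "api-gateway",
        "calculations": [{"op": "P95", "column": "duration_ms"}],
        "breakdowns": ["http.route", "customer_id"],
        "time_range": 1800,  # seconds
    },
)

for request in (draft, run_nrql, run_hc):
    print(json.dumps(request))
```

The contrast is the point: on New Relic the query is a string in a query language, so the model drafts text. On Honeycomb the query is structured data, so the model fills in fields, and adding a high-cardinality dimension is one more entry in `breakdowns`.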

Verdict

Choose New Relic's server when your stack is on New Relic and you value breadth plus natural-language NRQL: one platform for metrics, traces, logs, alerts, and incidents, with packaged analytics like golden metrics and transaction analysis. Choose Honeycomb's server when your debugging style is exploratory and trace-centric: write breakdown queries, run BubbleUp to explain anomalies, walk trace waterfalls, and manage SLOs and Triggers from the agent. The decision comes down to a wide single-vendor suite versus a sharp exploratory debugger. If your team thinks in dashboards and packaged reports, New Relic fits; if it thinks in high-cardinality questions and traces, Honeycomb fits. In practice, the platform you already run usually settles it.

FAQ

Can the agent write the query for me?

Yes, on both. New Relic offers natural_language_to_nrql_query to draft NRQL from plain English. Honeycomb's run_query takes structured calculations and breakdowns, and the agent can assemble those from your request to explore high-cardinality data.

Which is stronger for tracing?

Honeycomb is explicitly trace-first, with waterfall rendering, span drill-down, and a service map. New Relic includes traces within a broader, unified telemetry surface alongside metrics and logs.