What is a dead-letter queue?
A dead-letter queue holds messages that a system failed to process after exhausting retries, so they are quarantined for inspection and reprocessing instead of being lost or blocking the main queue forever.
A dead-letter queue (DLQ) is a secondary queue where messages are routed after they cannot be processed successfully despite repeated attempts. In an asynchronous, message-driven system, a consumer pulls a message, tries to handle it, and either acknowledges success or fails; on failure the message is redelivered and retried, often with exponential backoff. Some messages, however, are "poison": they are malformed, reference deleted data, or trigger a bug in the consumer, so they will fail no matter how many times they are retried. Without a DLQ, a poison message either blocks the queue (if processing is ordered) or loops endlessly, consuming resources and crowding out real work. The DLQ breaks that loop: once a configured retry limit is reached, the message is moved aside into the dead-letter queue, the main queue keeps flowing, and an operator (or an automated process) can later inspect, fix, and replay the quarantined messages. DLQs are a standard feature of message brokers and managed queues such as SQS and RabbitMQ; in Kafka the same pattern is usually implemented as a dead-letter topic. They are a cornerstone of resilient event-driven design alongside idempotency and backoff. For an AI agent operating a pipeline, the contents and growth rate of a DLQ are a strong health signal, and a shared memory of recurring poison-message patterns helps the agent triage faster.
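The retry-then-quarantine flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not the API of SQS, RabbitMQ, or Kafka: the names Message, drain, replay, handle, and the MAX_ATTEMPTS limit of 3 are all assumptions made for the example, and backoff between retries is omitted to keep it short.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, List

MAX_ATTEMPTS = 3  # assumed retry limit; real brokers make this configurable


@dataclass
class Message:
    body: Any
    attempts: int = 0
    errors: List[str] = field(default_factory=list)  # one entry per failed attempt


def drain(main: Deque[Message], dlq: Deque[Message],
          handler: Callable[[Any], None],
          max_attempts: int = MAX_ATTEMPTS) -> int:
    """Process messages until `main` is empty; return how many succeeded."""
    succeeded = 0
    while main:
        msg = main.popleft()
        msg.attempts += 1
        try:
            handler(msg.body)
            succeeded += 1
        except Exception as exc:
            msg.errors.append(repr(exc))
            if msg.attempts >= max_attempts:
                dlq.append(msg)   # retries exhausted: quarantine in the DLQ
            else:
                main.append(msg)  # requeue for another attempt (no backoff here)
    return succeeded


def replay(dlq: Deque[Message], main: Deque[Message]) -> int:
    """Move every quarantined message back to `main` with a fresh retry budget."""
    moved = 0
    while dlq:
        msg = dlq.popleft()
        msg.attempts = 0
        main.append(msg)
        moved += 1
    return moved


def handle(body: Any) -> None:
    # Hypothetical handler: anything that is not an int counts as malformed.
    if not isinstance(body, int):
        raise ValueError(f"malformed payload: {body!r}")


main_queue: Deque[Message] = deque(Message(b) for b in [1, "oops", 2])
dead_letters: Deque[Message] = deque()
processed = drain(main_queue, dead_letters, handle)
```

After the run, the two valid messages have been processed and the malformed one sits in dead_letters with its attempt count and the recorded errors, which is the information an operator would inspect before calling replay once the underlying cause is fixed. In a real deployment the attempt counting and the move to the DLQ are normally handled by the broker or client library rather than by application code like this.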