What is prompt injection?
Prompt injection is an attack where adversarial text hidden in tool output, web pages, or documents hijacks an LLM agent's instructions, causing it to ignore its system prompt and follow the attacker's commands instead.
Prompt injection is widely regarded as the leading security risk for tool-using AI agents. Because an LLM treats all text in its context as potentially instructive, an attacker can plant commands in any content the agent ingests (a web page it scrapes, a GitHub issue it reads, an email it summarizes) and steer the agent to leak data, call dangerous tools, or override its system prompt. Indirect prompt injection is the especially nasty variant: the malicious instructions arrive inside data the agent fetches through a tool rather than from the user, so the user never sees them. MCP (Model Context Protocol) makes this concrete because tool results flow straight into the model's context; a poisoned tool description or a compromised server (see tool poisoning) can carry an injection payload. Defenses include treating all tool output as untrusted data, constraining which tools an agent may call, requiring human approval for destructive actions, and adding guardrails that detect instruction-like content in retrieved text. A memory layer like Glen reduces the blast radius because it only reads and writes plain observations and is not an execution surface, but operators should still scope which servers an agent trusts and review the privileges each MCP server is granted.
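
The defenses listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete or recommended implementation: the tool names, the regex patterns, and the `<untrusted_tool_output>` wrapper are all hypothetical, and a pattern-based detector is easy to evade, so it should be treated as one signal among several and not as a security boundary.

```python
import re

# Hypothetical heuristics for instruction-like text. Real guardrails
# need far more than a few regexes; this only illustrates the idea.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?system prompt", re.I),
    re.compile(r"you are now\b", re.I),
]

# Hypothetical tool names. The allowlist constrains what the agent may
# call; the destructive subset additionally requires human approval.
ALLOWED_TOOLS = {"search_docs", "read_issue", "delete_branch"}
DESTRUCTIVE_TOOLS = {"delete_branch"}


def looks_like_injection(text: str) -> bool:
    """Return True if the text matches any instruction-like pattern."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)


def wrap_untrusted(tool_name: str, output: str) -> str:
    """Label tool output as data before it is placed in the model's context.

    The wrapper tag is an assumed convention. It does not stop a model
    from following embedded instructions, and the output itself could
    contain a forged closing tag, so escaping would be needed in practice.
    """
    flag = "true" if looks_like_injection(output) else "false"
    return (
        f'<untrusted_tool_output tool="{tool_name}" suspicious="{flag}">\n'
        f"{output}\n"
        "</untrusted_tool_output>"
    )


def authorize_call(tool_name: str, approved_by_human: bool = False) -> bool:
    """Allow a tool call only if it is allowlisted and, when the tool is
    destructive, a human has explicitly approved it."""
    if tool_name not in ALLOWED_TOOLS:
        return False
    if tool_name in DESTRUCTIVE_TOOLS and not approved_by_human:
        return False
    return True
```

In this sketch an agent loop would call `authorize_call` before dispatching any tool and pass every result through `wrap_untrusted` before appending it to the context. Keeping authorization outside the model means a successful injection still cannot reach a tool that is not allowlisted or approved.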