What is agentic RAG?

Agentic RAG is retrieval-augmented generation driven by an agent that decides when and what to retrieve, can issue multiple searches, refine its queries, and use tools, instead of running a single fixed retrieve-then-answer step.

Agentic RAG combines retrieval-augmented generation with an agent loop, so the model actively controls retrieval instead of a pipeline doing one fixed fetch before answering. In classic RAG, the system embeds the question, pulls the top matching chunks, and pastes them into the prompt: one shot, the same path every time. Agentic RAG instead lets the model reason about what it needs. It can decide whether to retrieve at all, choose among multiple sources or tools, reformulate a query that returned weak results, perform several searches and combine them, and judge whether the evidence it has is sufficient before answering. This tends to handle multi-hop questions and ambiguous queries better than single-shot retrieval, because the agent can iterate the way a researcher would: search, read, and search again based on what it found. Exposing retrieval as MCP tools fits this pattern naturally: each search, lookup, or memory recall becomes a tool the agent calls on demand, and the agent orchestrates them. The tradeoff is cost and latency, since several model turns and retrievals replace one, so good implementations cap the number of loop iterations and cache repeated retrievals.
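The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus is a toy in-memory dictionary, `search` is an exact-match lookup standing in for real vector search, and `next_query` is a rule-based stand-in for the model's reasoning step. All of these names are hypothetical and do not come from any library.

```python
from functools import lru_cache
from typing import List, Optional, Set, Tuple

# Toy corpus standing in for a real index (hypothetical data).
CORPUS = {
    "agentic rag": "Agentic RAG lets the model control retrieval in a loop.",
    "classic rag": "Classic RAG retrieves once, then answers.",
    "mcp": "MCP exposes search and lookup as tools an agent can call.",
}


@lru_cache(maxsize=256)
def search(query: str) -> Tuple[str, ...]:
    """Exact-match lookup standing in for vector search; cached per query."""
    return (CORPUS[query],) if query in CORPUS else ()


def next_query(question: str, tried: Set[str]) -> Optional[str]:
    """Stand-in for the model's decision step (an assumption, not a real LLM).

    Returns the next sub-query to run, or None when no further retrieval
    is judged necessary.
    """
    q = question.lower()
    for topic in CORPUS:
        if topic in q and topic not in tried:
            return topic
    return None


def agentic_retrieve(question: str, max_steps: int = 3) -> List[str]:
    """Run a bounded retrieve-and-decide loop and return the evidence found."""
    evidence: List[str] = []
    tried: Set[str] = set()
    for _ in range(max_steps):  # hard cap keeps cost and latency bounded
        query = next_query(question, tried)
        if query is None:  # the "agent" decides it has enough, or needs nothing
            break
        tried.add(query)
        evidence.extend(search(query))
    return evidence
```

Calling `agentic_retrieve("How does agentic RAG differ from classic RAG?")` runs two searches, one per topic mentioned in the question, while a question that mentions no known topic triggers no retrieval at all. In a real system the decision step would be a model call, and each search would typically be a tool call (for example over MCP) rather than a local function.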