What is reranking?

Reranking is a second-stage pass over retrieval results that reorders an initial set of candidates by relevance using a more accurate model, so the best passages rise to the top before they reach the agent.

Reranking is a refinement stage in a retrieval pipeline that takes the candidates from a fast first-pass search and reorders them by how relevant they actually are to the query. It exists because of a speed-versus-accuracy trade-off. The first pass, typically vector similarity search over the whole corpus, is fast enough to scan millions of items but only approximates relevance, so it casts a wide net and returns, say, the top 50 candidates. A reranker then scores each candidate against the query with a more expensive, more accurate model, commonly a cross-encoder that reads the query and document together rather than comparing pre-computed embeddings, and produces a sharper ordering. You keep only the top handful after reranking. This two-stage design gets most of the accuracy of the heavy model while paying its cost only on a small shortlist instead of the entire index. In RAG and agent retrieval, reranking matters because the language model's context budget is limited: putting the genuinely best passages first, and dropping marginal ones, improves answer quality and avoids wasting context on near-misses. Several vector database and search providers expose reranking as a managed step alongside their similarity search.
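
The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production setup: `embed` is a bag-of-words stand-in for a real embedding model, `joint_score` is a hand-written stand-in for a cross-encoder (it only mimics the idea of scoring the query and document together), and the corpus, function names, and the values of `k` and `top_n` are all made up for the example.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def first_pass(query, doc_vectors, k):
    # Fast stage: compare the query vector with pre-computed document vectors
    # and return the indices of the k most similar documents.
    q = embed(query)
    scored = [(cosine(q, vec), i) for i, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scored[:k]]


def joint_score(query, doc):
    # Toy stand-in for a cross-encoder: looks at query and document text
    # together, rewarding adjacent query word pairs found in the document
    # plus the fraction of distinct query terms the document covers.
    q = query.lower().split()
    d = doc.lower().split()
    if not q:
        return 0.0
    doc_bigrams = set(zip(d, d[1:]))
    bigram_hits = sum(1 for pair in zip(q, q[1:]) if pair in doc_bigrams)
    coverage = sum(1 for t in set(q) if t in d) / len(set(q))
    return bigram_hits + coverage


def rerank(query, docs, candidate_ids, top_n):
    # Slow stage: score only the shortlist, then keep the best top_n.
    ranked = sorted(
        candidate_ids, key=lambda i: joint_score(query, docs[i]), reverse=True
    )
    return ranked[:top_n]


DOCS = [
    "search search search tips for shoppers",
    "a cross-encoder can rerank search results by reading the query and the document together",
    "search results pages show ads",
    "bananas are rich in potassium",
    "vector indexes trade accuracy for speed",
]
DOC_VECTORS = [embed(d) for d in DOCS]  # computed once, at index time

QUERY = "rerank search results"
candidates = first_pass(QUERY, DOC_VECTORS, k=3)
final = rerank(QUERY, DOCS, candidates, top_n=2)
print("first pass:", candidates)
print("after rerank:", final)
```

On this toy corpus the first pass returns documents `[2, 0, 1]`, so the most relevant passage (document 1) sits last in the shortlist because its longer text dilutes the vector similarity. After reranking, the kept results are `[1, 2]`: the best passage moves to the top and the keyword-stuffed document 0 is dropped. In a real pipeline the expensive scorer would be an actual cross-encoder or a provider's rerank endpoint, and it would still run only on the shortlist.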