What is chunking?

Chunking is splitting a large document into smaller passages before embedding it, so retrieval can return focused, relevant pieces that fit a model's context window instead of whole files.

Chunking is the preprocessing step in a retrieval pipeline that breaks long source documents into smaller units before they are embedded and indexed. It exists because of two constraints. First, an embedding condenses a span of text into a single vector, so it represents that span best when the span is reasonably focused. Second, a model's context window is limited, so you want to retrieve and inject tight, relevant passages rather than entire files.

The strategy you choose shapes retrieval quality. Naive fixed-size splits (every N tokens or characters) are simple but can cut sentences or ideas in half. Better approaches split on natural boundaries such as paragraphs, headings, and code blocks, and add a small overlap between adjacent chunks so that context spanning a boundary is not lost.

Chunk size is a tradeoff: too large, and a retrieved chunk is mostly irrelevant filler that dilutes the signal and wastes the context budget; too small, and a single chunk lacks enough surrounding context to be useful on its own. Each chunk is embedded into its own vector and stored, so chunking directly determines the smallest retrievable unit, and therefore how precise semantic search and RAG can be over your corpus.
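The two splitting strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `chunk_fixed` and `chunk_paragraphs`, the default sizes, and the choice to measure length in characters rather than tokens are all assumptions made for the example. A real pipeline would more likely count tokens with the embedding model's own tokenizer.

```python
import re


def chunk_fixed(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Naive fixed-size split: character windows of `size`, where each
    window repeats the last `overlap` characters of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reached the end
            break
    return chunks


def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Boundary-aware split: pack whole paragraphs (separated by blank
    lines) into chunks of at most `max_chars` characters. A single
    paragraph longer than `max_chars` is kept whole as an oversized chunk
    (an assumption of this sketch; it could instead be split further)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate      # paragraph still fits in this chunk
        else:
            chunks.append(current)   # close the chunk at a paragraph boundary
            current = para
    if current:
        chunks.append(current)
    return chunks
```

With `size=4` and `overlap=1`, `chunk_fixed("abcdefghij")` yields windows that each share one character with their neighbor, which is how the overlap preserves context across a boundary. `chunk_paragraphs` never cuts inside a paragraph, at the cost of less uniform chunk sizes.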