What is a context window?

A context window is the maximum amount of text, measured in tokens, that a language model can consider at once, covering the prompt, conversation history, retrieved data, and the model's own output.

The context window is the fixed budget of tokens a language model can attend to in a single pass. Everything the model reasons over has to fit inside it: the system prompt, the conversation so far, any documents or tool results pulled in, and the response being generated. Modern models advertise large windows of hundreds of thousands of tokens or more, but the window is still finite and shared among all of these inputs and outputs, so it is a scarce resource to be managed, not an excuse to dump in everything.

Two practical pressures arise. First, cost and latency rise with the number of tokens, so stuffing the window is wasteful. Second, models exhibit a lost-in-the-middle tendency: information buried in the middle of a huge context gets less reliable attention than content near the beginning or end.

This is why retrieval, summarization, and memory matter: rather than pasting an entire codebase or every past conversation, you fetch and inject only what is relevant. MCP servers feed the window in precisely this way, returning targeted tool results, and a memory server like Glen returns just the observations relevant to the current task instead of the whole history.
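As a rough illustration of treating the window as a budget, the sketch below keeps the most relevant snippets that fit once room has been reserved for the response. Everything in it is an assumption made for illustration: the `Snippet` type, the `relevance` score (imagined as coming from some retriever), the `pack_context` helper, and above all the word-count token estimate, which only approximates what a real tokenizer would report.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # hypothetical retriever score; higher means more relevant


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A real tokenizer will give different (usually higher) counts.
    return len(text.split())


def pack_context(snippets, window_tokens, reserve_for_output):
    """Greedily pick the most relevant snippets that fit the remaining budget.

    Returns the chosen snippets and the estimated number of tokens they use.
    """
    available = window_tokens - reserve_for_output
    chosen = []
    used = 0
    # Consider the most relevant snippets first.
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost <= available:
            chosen.append(snippet)
            used += cost
    return chosen, used


if __name__ == "__main__":
    candidates = [
        Snippet("user prefers concise answers", 0.9),
        Snippet("full transcript of a long unrelated meeting from last year", 0.2),
        Snippet("project uses Python", 0.7),
    ]
    picked, used = pack_context(candidates, window_tokens=12, reserve_for_output=4)
    for snippet in picked:
        print(snippet.relevance, snippet.text)
    print("estimated tokens used:", used)
```

Reserving output tokens up front reflects the point that the response shares the same window as the input. A greedy pass by relevance is only one possible policy; summarizing lower-ranked snippets instead of dropping them would be another.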