What is a token?

A token is the atomic unit an LLM reads and generates: a word, a sub-word, or a character fragment. Context length, throughput, and pricing are all measured in tokens, not words or characters.

A token is the basic chunk of text a language model operates on. A tokenizer splits text into tokens before the model sees it, and the model generates its output one token at a time. Tokens are usually sub-word units: common words may be a single token, while rare words, code, or unusual strings split into several. A rough rule of thumb for English is that one token is about four characters, or roughly three-quarters of a word, but this varies by tokenizer and language (many non-English scripts use more tokens per character).

Tokens are the unit of nearly everything that matters operationally. The context window is a token budget shared by the prompt and the response. API pricing is quoted per input and output token, often at different rates. Latency scales with the number of tokens generated. Rate limits are frequently expressed in tokens per minute.

Understanding tokens explains otherwise surprising behavior: why a model miscounts letters in a word (it sees tokens, not characters), why pasting a huge file blows the budget, and why trimming or compacting context saves real money. For agent and RAG pipelines, token accounting drives chunk sizes, retrieval limits, and when to compact history.
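
The accounting described above can be sketched in a few lines of Python. This is a rough illustration, not a real tokenizer: it assumes the four-characters-per-token rule of thumb for English, and every name in it (estimate_tokens, fits_context, estimate_cost, trim_history) is hypothetical, as are the per-million-token prices a caller passes in. For exact counts, use the tokenizer that ships with the model in question.

```python
import math

# Assumption: the rough English average of ~4 characters per token.
# Real tokenizers differ by model and by language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (heuristic only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """The prompt and the response share a single token budget."""
    return prompt_tokens + max_output_tokens <= context_window


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost for one call, given hypothetical prices per million tokens.

    Input and output are priced separately because the rates often differ.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget.

    Walks backwards from the newest message and stops at the first one
    that would exceed the budget, so older messages are dropped first.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

A pipeline might call estimate_tokens on each retrieved chunk, use fits_context to decide whether the assembled prompt still leaves room for the response, and fall back to trim_history when it does not. Dropping the oldest messages is only the simplest compaction strategy; summarizing them is a common alternative.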