What is temperature (LLM)?

Temperature is a sampling parameter that controls randomness in an LLM's output: low values make responses focused and near-deterministic, high values make them more varied and creative.

Temperature rescales the probability distribution the model samples its next token from. Before sampling, the model produces a score (logit) for every possible next token; temperature divides those scores before softmax turns them into probabilities. A temperature of 1 leaves the distribution unchanged. A low temperature (near 0) sharpens the distribution so the most likely token almost always wins, yielding focused, repeatable, near-deterministic output, which suits extraction, classification, code, or any task with a single right answer. A higher temperature (around 0.7 to 1.0 and above) keeps the distribution flatter so less likely tokens get a real chance, and values above 1 flatten it beyond what the model originally predicted; the result is more diverse, creative, or surprising text, useful for brainstorming and writing. Because dividing by 0 is undefined, implementations typically treat a temperature of 0 as greedy decoding (always pick the top-scoring token), though true determinism is still not guaranteed across hardware and batching. Temperature interacts with other sampling controls such as top-p (nucleus sampling) and top-k, which restrict the candidate pool; whether that filtering happens before or after temperature scaling depends on the implementation, so tuning one often means revisiting the others. For agentic and tool-using pipelines, low temperature is usually the safer default because it makes tool-argument generation and structured output more reliable and reproducible, which matters when you are debugging a chain or asserting on results in tests.
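
The scaling step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `softmax_with_temperature` and the three example logits are made up for the demonstration, and treating a temperature of 0 as greedy selection is an assumption about how an implementation might handle that edge case.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then convert them to probabilities."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy decoding, since dividing by 0 is undefined.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the top token's probability shrinking as the temperature rises, while the less likely tokens gain share. A real sampler would then draw the next token from these probabilities, usually after or before applying top-k or top-p filtering.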