Temperature

Temperature is a scalar that scales the model’s logits before softmax, controlling how sharp or flat the next-token probability distribution is — and thus how random the output gets.

Why it matters

It is the single biggest knob between deterministic, reproducible and creative, diverse generation. Agents that call tools or emit JSON want low temperature so the structure is stable; a brainstorming or NPC-dialogue node wants it high so outputs don’t collapse to one phrasing. Pick wrong and you either get rigid, repetitive text or malformed tool calls.

How it works

Each logit z_i is divided by T before softmax: p_i = softmax(z_i / T).

T → 0 — the distribution spikes toward the single highest logit (greedy/argmax). Near-deterministic.
T = 1 — the model’s raw, untempered probabilities.
T > 1 — logits flatten, low-probability tokens gain mass, output gets wilder and eventually incoherent.

T	Effect	Typical use
0.0–0.3	Sharp, near-greedy	Tool calls, extraction, code, react-reason-act steps
0.7–0.9	Balanced	Chat, drafting
1.0–1.5	Flat, diverse	Brainstorm, fiction, npc-game-ai

Temperature and top-p compose: temperature reshapes the curve, then top-p/top-k truncate its tail. Most providers apply temperature first. Note T=0 is “as close to greedy as the API allows” — not a true determinism guarantee.

Example

Logits for three tokens: [2.0, 1.0, 0.1].

T = 1.0 → softmax = [0.65, 0.24, 0.11]
T = 0.5 → softmax = [0.85, 0.12, 0.02]   # sharper, top token dominates
T = 2.0 → softmax = [0.46, 0.28, 0.18]   # flatter, tail gains mass

Pitfalls

T=0 is not reproducible — GPU non-determinism, batching, and MoE routing still cause drift across calls and model versions.
Stacking with top-p — setting both aggressively low (T=0.2, top_p=0.5) over-truncates; tune one, leave the other near default.
High T over a long horizon — small per-token randomness compounds across an agent-loop, so a multi-step agent can wander off task far more than a single completion suggests.

tech-studies

Explorer

Temperature

Temperature

Why it matters

How it works

Example

Pitfalls

See also

Graph View

Table of Contents

Backlinks