Model Configuration / Tuning

Lessons in this group, roughly in build order:

  • temperature — Temperature is a scalar that scales the model’s logits before softmax, controlling how sharp or flat the…
  • top-p-top-k — Top-p (nucleus) and top-k are truncation samplers that restrict next-token choice to a shortlist of…
  • max-length-max-tokens — max_tokens caps how many tokens the model is allowed to generate in one response — a hard upper bound on…
  • stopping-criteria — Stopping criteria are the conditions that end token generation — the model’s own end-of-sequence token, an…
  • frequency-penalty — Frequency penalty subtracts from a token’s logit in proportion to how many times it has already appeared…
  • presence-penalty — Presence penalty subtracts a flat, one-time amount from a token’s logit as soon as it has appeared at all…
  • streamed-vs-unstreamed-responses — A response is either unstreamed (the API buffers the whole completion and returns it once) or streamed…
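The logit-level knobs listed above (temperature, top-k, top-p, frequency penalty and presence penalty) can be seen working together in one small pipeline. The following is a minimal sketch, not any provider's implementation. The function names `process_logits` and `sample` are hypothetical. The order of operations (penalties, then temperature, then softmax, then top-k, then top-p) is an assumption that matches common open-source samplers, and real APIs may differ. It does not cover max_tokens, stopping criteria or streaming, which act on the generation loop rather than on a single next-token distribution.

```python
import math
import random
from collections import Counter


def _normalize(items):
    """Renormalize a list of (token_id, prob) pairs so the probs sum to 1."""
    z = sum(p for _, p in items)
    return [(tok, p / z) for tok, p in items]


def process_logits(logits, generated, temperature=1.0, top_k=None, top_p=None,
                   frequency_penalty=0.0, presence_penalty=0.0):
    """Hypothetical helper: turn raw logits into a truncated next-token distribution.

    logits    -- one float per token id
    generated -- token ids already produced (used by the penalties)
    Returns a dict {token_id: probability} over the surviving shortlist.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    counts = Counter(generated)

    adjusted = []
    for tok, logit in enumerate(logits):
        c = counts.get(tok, 0)
        logit -= c * frequency_penalty   # grows with every repeat
        if c > 0:
            logit -= presence_penalty    # flat, one-time deduction
        adjusted.append(logit)

    # Temperature divides the logits: < 1 sharpens, > 1 flattens.
    scaled = [x / temperature for x in adjusted]

    # Softmax, subtracting the max for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((tok, e / total) for tok, e in enumerate(exps)),
                    key=lambda kv: kv[1], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = _normalize(ranked[:top_k])

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cum = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        ranked = _normalize(kept)

    return dict(ranked)


def sample(dist, rng):
    """Draw one token id from a {token_id: prob} distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, with logits `[2.0, 1.0, 0.0]`, token 0 already generated twice, and both penalties at 0.5, token 0's logit drops to 2.0 - 2*0.5 - 0.5 = 0.5. It then falls below token 1, so `process_logits([2.0, 1.0, 0.0], [0, 0], top_k=2, frequency_penalty=0.5, presence_penalty=0.5)` keeps tokens 0 and 1 and ranks token 1 higher.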