LLM Fundamentals
Lessons in this group, roughly in build order:
- closed-weight-models — A closed-weight model is an LLM whose parameters are never released — you rent inference over an HTTP API…
- open-weight-models — An open-weight model ships its trained parameters publicly (Llama 3, Mistral, Qwen, Gemma) so you can…
- reasoning-vs-standard-models — Reasoning models (o3, Claude with extended thinking, Gemini Thinking, DeepSeek-R1) spend extra hidden…
- context-windows — The context window is the maximum number of tokens a model can attend to at once — system prompt, history,…
- fine-tuning-vs-prompt-engineering — Two ways to steer an LLM without training from scratch: change the input at inference time (prompting) or…
- embeddings-and-vector-search — An embedding maps text to a fixed-length vector so that semantically similar text lands nearby; vector…
- token-based-pricing — LLM APIs bill per token, not per request, and almost always charge output tokens several times more than…
- pricing-of-common-models — A practical map of what the major LLM tiers cost per million tokens, and how to pick a tier so an agent…
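
As a rough illustration of the per-token billing idea in the token-based-pricing entry above, the sketch below estimates the cost of one request from its input and output token counts. The function name and the prices in the usage line are hypothetical placeholders, not any provider's real rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate the cost of one request in the same currency as the prices.

    Prices are given per million tokens. Input and output tokens are
    priced separately, with output usually the more expensive of the two.
    """
    input_cost = input_tokens * input_price_per_mtok
    output_cost = output_tokens * output_price_per_mtok
    return (input_cost + output_cost) / 1_000_000


# Hypothetical prices: 1.00 per million input tokens, 4.00 per million output tokens.
cost = request_cost(input_tokens=2000, output_tokens=500,
                    input_price_per_mtok=1.0, output_price_per_mtok=4.0)
print(cost)
```

With these placeholder prices, the 500 output tokens cost as much as the 2000 input tokens, which is why output length tends to dominate the bill when responses are long.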