Long-term Memory

Long-term memory is the agent’s durable store that survives across sessions — facts, preferences, and past events kept in an external database and retrieved back into context only when relevant.

Why it matters

Short-term memory dies when the session ends and is capped by the window; long-term memory is how an assistant remembers your name, your stack, and a decision made three weeks ago. It decouples total knowledge (unbounded, on disk) from active knowledge (the small slice retrieved per turn), so the agent can “know” gigabytes while only ever paying for a few thousand tokens of context.

How it works

The pattern is write on the way out, retrieve on the way in: extract durable facts from a session, persist them, and pull the top-k relevant ones into the next prompt.

Storage backends. A vector DB for fuzzy semantic recall; a key-value/SQL store for exact lookups (user settings); a knowledge graph for linked entities.
Write path. Summarize or run an LLM “extractor” over the transcript → emit atomic memories (user prefers dark mode) → embed + upsert with metadata (timestamp, source, user_id).
Read path. Embed the current query → ANN search → inject top-k memories as a pinned block in the system prompt.
Dedup + update. New memories that near-duplicate or contradict old ones should merge/overwrite, not pile up (see forgetting-aging-strategies).

Memory kind	Backend	Retrieval
Semantic facts	vector DB	similarity top-k
Exact prefs / state	key-value / SQL	direct lookup by key
Linked entities	graph DB	traversal

Example

A coding assistant across two sessions:

session 1: user says "we use pnpm, never yarn"
  → extractor writes memory{text:"project uses pnpm, not yarn",
                            user:u_12, ts:...} → embed → upsert
session 2 (days later): "add the axios dependency"
  → query embed → recalls the pnpm memory
  → agent runs `pnpm add axios`  (not yarn/npm)

Nothing from session 1 is in the live history; the right command comes purely from retrieved long-term memory.

Pitfalls

Storing raw transcripts. Dumping whole chats makes retrieval noisy; store distilled atomic facts, not logs.
Unbounded accumulation. Without dedup/aging, contradictory memories (“uses yarn” vs “uses pnpm”) both surface — apply forgetting-aging-strategies.
Retrieval ≠ truth. A high cosine score isn’t relevance; threshold and re-rank or you inject confidently wrong “memories”.
PII leakage. Long-term stores accumulate sensitive data across users; scope by user_id and redact, or you cross-contaminate.

tech-studies

Explorer

Long-term Memory

Long-term Memory

Why it matters

How it works

Example

Pitfalls

See also

Graph View

Table of Contents

Backlinks