Observation / Reflection

Observation is taking a tool’s result back into context; reflection is the model judging that result — did it work, what did I learn, what next — and using it to adjust the plan.

Why it matters

This closes the loop: without it an agent acts blind, never noticing a failed call or a wrong answer. Reflection is exactly what separates an agent from a fixed pipeline — the capacity to react to results it couldn’t predict, recover from errors, and stop when actually done. It’s also the cheapest reliability lever: a single “check your work” pass catches a large share of hallucinated or malformed outputs before they reach the user.

How it works

The raw tool output is first observed (normalized and appended to context), then optionally reflected on (the model critiques it and decides to continue, retry, or finish).

Observe: trim and shape the result — a 404, a 10K-row table, or a stack trace — into a compact message; never dump megabytes back verbatim.
Reflect: ask “does this satisfy the goal? is it correct? what’s the next step?” — implicit each turn in ReAct, or an explicit self-critique pass.
Verify against ground truth when possible — re-run the test, re-read the file, diff the value — far more reliable than the model grading itself.
Promote durable lessons (“API X needs ISO dates”) into long-term-memory or a scratchpad so they survive the turn.

Observation	Reflection → action
Tool returned 404	URL wrong → try the search tool instead
Test still failing	fix incomplete → read the error, patch again
Result matches goal	done → emit final answer, exit loop

Example

A coding agent runs the test suite (action), observes 2 failed (observation), reflects that its patch missed a null case, edits again, re-runs, and observes 0 failed — only then does it stop. The verify-by-re-running step is what makes the stop trustworthy; self-assessment alone would often declare victory early.

Pitfalls

Skipping verification — trusting the model’s “looks correct” instead of re-checking against reality lets errors through.
Reflection loops — endless self-critique with no progress; cap reflection passes and require a concrete next action.
Over-trusting self-grading — a model that wrote the answer is a biased judge of it; prefer external checks.
Echoing raw output — appending huge unfiltered results blows the window and buries the signal.

tech-studies

Explorer

Observation / Reflection

Observation / Reflection

Why it matters

How it works

Example

Pitfalls

See also

Graph View

Table of Contents

Backlinks