Basic Backend Development

The server-side skills — handling requests, talking to data stores, managing secrets and state — that an agent needs because an agent is a backend service that happens to call an LLM.

Why it matters

An agent loop runs somewhere: it receives input, calls model and tool APIs, persists memory, and returns a response. That “somewhere” is a backend. Long model calls (5-60s), streaming, retries, and per-user secrets are all backend concerns, and getting them wrong shows up as dropped sessions, leaked API keys, or a $400 bill from a retry storm.

How it works

A typical agent service is a thin HTTP layer over the loop:

Concern          | Why the agent needs it                | Typical tool
Request handling | Accept user turns, stream tokens back | FastAPI, Express
Persistence      | Conversation and memory state         | Postgres, Redis
Secrets          | Per-tenant model/tool keys            | env vars, vault
Background work  | Long tool calls, async jobs           | queue, worker

Key shifts from a normal CRUD backend:

  • Latency budget is huge — a single turn may block on the model for tens of seconds, so synchronous request/response with a 30s gateway timeout breaks; use streaming or async jobs.
  • State lives outside the process — keep the loop stateless and push conversation/memory to a store keyed by session, so any worker can resume a turn.
  • Idempotency and retries — model/tool calls fail transiently; wrap them in retry-with-backoff, but make tool side effects idempotent or you double-send the email.
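The retry and idempotency point can be sketched as follows. This is a minimal illustration, not a production implementation: TransientError, with_retry, and EmailTool are hypothetical names, and the in-memory dict stands in for whatever shared store a real service would use to remember idempotency keys.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 429s, 5xx)."""


async def with_retry(call, *, attempts=4, base_delay=0.5, max_delay=8.0,
                     sleep=asyncio.sleep):
    """Await call() until it succeeds; back off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except TransientError:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            await sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads retries out


class EmailTool:
    """Hypothetical tool that deduplicates sends by idempotency key."""

    def __init__(self):
        self.outbox = []      # stands in for the real side effect
        self._results = {}    # key -> result; a shared store in a real service

    async def send(self, key, to, body):
        if key in self._results:           # retry of a send that already happened
            return self._results[key]
        self.outbox.append((to, body))
        self._results[key] = f"sent:{key}"
        return self._results[key]
```

The caller derives the key from something stable for the turn (for example a session id plus a turn counter), so a retry after a lost response reuses the same key and the email goes out once. Capping attempts also bounds the cost of a retry storm.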

Example

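# Minimal agent endpoint (FastAPI), streaming model output.
# `store` and `run_agent_loop` are application code, assumed defined elsewhere.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class Msg(BaseModel):
    text: str                                   # request body: {"text": "..."}

@app.post("/chat/{session_id}")
async def chat(session_id: str, msg: Msg):
    history = await store.load(session_id)      # state lives in the DB
    history.append({"role": "user", "content": msg.text})

    async def gen():
        # Assumes the loop appends the assistant reply to history and yields SSE-framed chunks.
        async for chunk in run_agent_loop(history):   # may run 10-30s
            yield chunk                                # stream tokens out
        await store.save(session_id, history)          # persist after turn

    return StreamingResponse(gen(), media_type="text/event-stream")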
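
The endpoint relies on a store object with async load and save methods. One possible shape for it is sketched below, under the assumption that history is a JSON-serializable list of messages. InMemoryStore is a hypothetical stand-in for local testing; a real deployment would back the same two methods with Postgres or Redis so that any worker can resume a session.

```python
import asyncio
import json


class InMemoryStore:
    """Hypothetical stand-in for Postgres/Redis: history keyed by session id."""

    def __init__(self):
        self._rows = {}                    # session_id -> JSON string

    async def load(self, session_id):
        raw = self._rows.get(session_id)
        return json.loads(raw) if raw else []   # new sessions start empty

    async def save(self, session_id, history):
        # Serialize on write, as an external store would, so later
        # mutations of the caller's list do not leak into saved state.
        self._rows[session_id] = json.dumps(history)
```

Serializing on save keeps the sketch honest about the process boundary: what comes back from load is a copy, exactly as it would be from a database.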

Pitfalls

  • Blocking the event loop — a synchronous requests.post to the model stalls every other request on an async server; use the async client or a thread pool.
  • Hardcoding one API key — fine for a demo, fatal for multi-tenant; scope keys per user and never log them.
  • No timeout/circuit breaker on tools — one hung scrape pins a worker forever; bound every outbound call.
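
Two of the pitfalls above, blocking the event loop and unbounded outbound calls, can be addressed with the standard library alone. The helper names call_bounded and run_blocking are hypothetical, and asyncio.to_thread requires Python 3.9 or later; treat this as a sketch rather than a full circuit breaker.

```python
import asyncio
import time


async def call_bounded(coro, timeout_s):
    """Await coro, but raise asyncio.TimeoutError after timeout_s seconds."""
    return await asyncio.wait_for(coro, timeout=timeout_s)


async def run_blocking(fn, *args, timeout_s=30.0):
    """Run a synchronous function in a worker thread so the event loop stays free.

    The timeout stops the caller from waiting; it cannot kill the thread,
    so fn should also set its own network timeout.
    """
    return await asyncio.wait_for(asyncio.to_thread(fn, *args), timeout=timeout_s)
```

A synchronous client call such as requests.post could be passed to run_blocking when no async client is available, while call_bounded wraps any awaitable tool call so that one hung request cannot pin a worker indefinitely.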

See also