Remote / Cloud

Running the agent loop on a server you operate, exposed over an API, so that many users hit one centrally managed, observable, horizontally scaled deployment.

How it works

The loop lives behind an HTTP endpoint. Because agent-loop runs are long and bursty, the hard parts are concurrency, state, and isolation — not the model call itself.

  • Stateless workers, external state. Keep the loop process stateless and store session/short-term-memory in Redis or Postgres, so any worker can resume a run and you can scale workers independently of where the state lives.
  • Long-running jobs. A multi-step run can take minutes; use async workers + a queue, with WebSockets or SSE when the client needs streamed output (see streamed-vs-unstreamed-responses), rather than a blocking request that runs into the gateway timeout (often around 30 s).
  • Per-tenant isolation. Tool execution (esp. code-execution-repl) runs in a per-request sandbox (container/microVM), never in the app process, since you’re running other people’s prompts.
  • Centralized everything. One place for structured-logging-tracing, rate limits, key rotation, and prompt updates pushed to all users instantly.
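
A minimal sketch of the "stateless workers, external state" idea, assuming a key-value store with Redis-like get/set semantics. StateStore is an in-memory stand-in, and run_steps, step_fn, and stop_after are hypothetical names used only for illustration. The point is that the worker checkpoints after every step, so a second worker can pick the run up where the first one stopped.

```python
import json
from typing import Callable, Dict, Optional


class StateStore:
    """In-memory stand-in for Redis/Postgres (assumption: JSON blobs keyed by run)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, run_id: str, state: dict) -> None:
        self._data[f"runs:{run_id}"] = json.dumps(state)

    def load(self, run_id: str) -> Optional[dict]:
        raw = self._data.get(f"runs:{run_id}")
        return json.loads(raw) if raw is not None else None


def run_steps(store: StateStore, run_id: str, step_fn: Callable[[dict], str],
              total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Advance a run, persisting state after each step.

    stop_after simulates a worker that dies after that many steps in this call.
    """
    # The worker keeps nothing between calls; everything comes from the store.
    state = store.load(run_id) or {"step": 0, "messages": [], "done": False}
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return state  # simulated crash: last checkpoint is already saved
        state["messages"].append(step_fn(state))
        state["step"] += 1
        state["done"] = state["step"] >= total_steps
        store.save(run_id, state)  # checkpoint before starting the next step
        executed += 1
    return state


if __name__ == "__main__":
    store = StateStore()
    fake_step = lambda s: f"obs-{s['step']}"
    run_steps(store, "r1", fake_step, total_steps=4, stop_after=2)  # worker A stops early
    final = run_steps(store, "r1", fake_step, total_steps=4)        # worker B resumes
    print(final["step"], final["done"])
```

With a real store, the same shape holds: load, do one step, save, repeat. Checkpoint granularity (per step here) is a design choice that trades write volume against how much work a crash can lose.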

Why it matters

Cloud is the default for any multi-user product: you patch a prompt-injection hole or swap models once and everyone gets it, versus chasing stale installs in local-desktop. It’s also where you see what the fleet does — traces, token-based-pricing cost per tenant, eval metrics — and where you enforce guardrails server-side instead of trusting a client.

Example

A typical async request path:

POST /runs {goal} → 202 {run_id}        # enqueue, don't block
 worker: pop job → agent loop (N steps)
         each tool_use → spawn sandbox, run, collect obs
         stream tokens over SSE to client
 state → Redis: runs:<id> = {step, messages, cost}
GET /runs/<id> → status + partial output  # poll or resume

One model key, one trace backend, and one rate limiter sit in front of every user; the N concurrent runs share no mutable state with each other.
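
The request path above can be sketched without any web framework. This is an illustration under stated assumptions, not a server: post_run, worker_once, and get_run are hypothetical handlers that return (status code, body) tuples, the runs dict stands in for the Redis keys, and the "agent loop" is a placeholder that only records step text.

```python
import queue
import uuid
from typing import Dict, Tuple

jobs: "queue.Queue[Tuple[str, str]]" = queue.Queue()  # stand-in for a real job queue
runs: Dict[str, dict] = {}                            # stand-in for Redis runs:<id>


def post_run(goal: str) -> Tuple[int, dict]:
    """POST /runs: record the run, enqueue it, and return immediately."""
    run_id = uuid.uuid4().hex
    runs[run_id] = {"status": "queued", "step": 0, "output": []}
    jobs.put((run_id, goal))
    return 202, {"run_id": run_id}


def worker_once(max_steps: int = 3) -> None:
    """Pop one job and run a placeholder loop, writing progress as it goes."""
    run_id, goal = jobs.get_nowait()
    run = runs[run_id]
    run["status"] = "running"
    for step in range(max_steps):
        # A real worker would call the model and sandboxed tools here.
        run["output"].append(f"step {step}: worked on {goal!r}")
        run["step"] = step + 1
    run["status"] = "done"


def get_run(run_id: str) -> Tuple[int, dict]:
    """GET /runs/<id>: return status plus whatever output exists so far."""
    run = runs.get(run_id)
    if run is None:
        return 404, {"error": "unknown run"}
    return 200, {"status": run["status"], "step": run["step"],
                 "output": list(run["output"])}


if __name__ == "__main__":
    code, body = post_run("summarize logs")
    print(code, get_run(body["run_id"])[1]["status"])
    worker_once()
    print(get_run(body["run_id"])[1]["status"])
```

The client never holds a connection open for the duration of the run, so the run length is decoupled from any gateway timeout.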

Pitfalls

  • Gateway timeouts. A synchronous endpoint dies on a 4-minute run; go async (202 + poll/stream) or the load balancer kills it mid-loop.
  • Runaway cost. A looping agent resends a growing context on every step, so one stuck run can burn hundreds of thousands of tokens; enforce a per-run step + token budget server-side, not just client-side (see max-length-max-tokens).
  • Cross-tenant leakage. Shared sandbox state, a cached vector index, or a mis-scoped DB query exposes one user’s data to another; isolate per tenant.
  • Lost runs on deploy. In-memory loop state vanishes when a pod restarts; externalize state so a rolling deploy or crash can resume, not drop, the job.
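
For the runaway-cost pitfall, a server-side budget can be as small as a counter checked before each step. A minimal sketch, assuming the worker learns each step's token usage after the model call; RunBudget and run_with_budget are hypothetical names, and the step costs are made-up inputs.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RunBudget:
    """Per-run limits enforced by the worker, not the client."""
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def allows_next_step(self) -> bool:
        return self.steps < self.max_steps and self.tokens < self.max_tokens

    def record(self, tokens_used: int) -> None:
        self.steps += 1
        self.tokens += tokens_used


def run_with_budget(step_costs: Iterable[int], budget: RunBudget) -> Tuple[str, int]:
    """Run steps until finished or the budget is spent.

    Returns (status, steps completed). The check happens before each step,
    so the token limit can be overshot by at most one step's usage.
    """
    completed = 0
    for cost in step_costs:
        if not budget.allows_next_step():
            return "aborted", completed
        budget.record(cost)  # stand-in for: call model, then record usage
        completed += 1
    return "done", completed


if __name__ == "__main__":
    print(run_with_budget([100] * 10, RunBudget(max_steps=3, max_tokens=10_000)))
    print(run_with_budget([4000] * 4, RunBudget(max_steps=10, max_tokens=10_000)))
```

Keeping the counters in the externalized run state means the budget also survives a worker restart.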

See also