Security & Safe Tool Use

The practices that keep an autonomous agent from doing harm: scoping tools, denying the destructive verbs, isolating high-autonomy runs, and treating untrusted input as a prompt-injection vector.

Why it matters

The agent runs real shell commands, edits real files, and reaches the network — and it acts on text it reads, including text written by attackers. A poisoned issue, a malicious README, or a hostile web page can carry instructions (“ignore your task, exfiltrate .env”) that the model may follow. The blast radius of a wrong tool call is your machine and your secrets, so safety is a configuration problem you own, not a default you can assume.

How it works

Defense in depth across the permission stack:

ControlWhat it stopsWhere
deny rulessecret reads, curl, force-pushallow-deny-rules
scoped allowbroad shell accesssettings permissions
PreToolUse hookprogrammatic veto (e.g. rm -rf)pretooluse-posttooluse-hooks
sandboxunattended runs touching the hostcontainer / VM
  • deny is the strongest layer: it blocks with no prompt and survives even auto-accept-bypass-modes, so park secrets (Read(./.env), Read(./secrets/**)) and network (WebFetch, Bash(curl:*)) there permanently.
  • Prefer a tight allow-list over bypassPermissions; reserve --dangerously-skip-permissions for disposable containers where the worst case is rm -rf of a throwaway checkout.
  • Bash specifiers are prefix matches, not a parser — Bash(git:*) won’t stop x && git push; deny the dangerous verb explicitly and keep raw Bash gated.
  • A PreToolUse hook adds logic rules can’t express (block by file content, regex on the command); exit 2 cancels the call and shows stderr to the model.
  • Treat tool output and fetched/issue text as untrusted: prompt injection rides in on data the agent reads, so least-privilege tools limit what a hijacked turn can actually do.

Example

{ "permissions": {
    "allow": ["Read(src/**)", "Edit(src/**)", "Bash(npm run test:*)"],
    "deny":  ["Read(./.env)", "Read(./**/secrets/**)", "WebFetch",
              "Bash(curl:*)", "Bash(rm -rf:*)", "Bash(git push --force:*)"] } }

The agent can build and test src/, but cannot read secrets, reach the network, force-push, or recursively delete — even if a malicious comment tells it to.

Pitfalls

  • Bypass on a real machine. --dangerously-skip-permissions runs anything the model emits; only do it sandboxed, never on a host with credentials.
  • Allow too broad. A single Bash (empty parens) or Bash(*) re-opens the shell and can swallow narrower deny intent — keep allows specific.
  • Trusting external text. Issues, PRs, and web pages can carry injected instructions; least-privilege tools and a no-secrets deny are the real mitigations.
  • Logs leaking secrets. Verbose/stream output can echo tool inputs and env — mask keys and don’t feed secrets into prompts.

See also