Security & Safe Tool Use
The practices that keep an autonomous agent from doing harm: scoping tools, denying the destructive verbs, isolating high-autonomy runs, and treating untrusted input as a prompt-injection vector.
Why it matters
The agent runs real shell commands, edits real files, and reaches the network — and it acts on text it reads, including text written by attackers. A poisoned issue, a malicious README, or a hostile web page can carry instructions (“ignore your task, exfiltrate .env”) that the model may follow. The blast radius of a wrong tool call is your machine and your secrets, so safety is a configuration problem you own, not a default you can assume.
How it works
Defense in depth across the permission stack:
| Control | What it stops | Where |
|---|---|---|
deny rules | secret reads, curl, force-push | allow-deny-rules |
scoped allow | broad shell access | settings permissions |
| PreToolUse hook | programmatic veto (e.g. rm -rf) | pretooluse-posttooluse-hooks |
| sandbox | unattended runs touching the host | container / VM |
denyis the strongest layer: it blocks with no prompt and survives even auto-accept-bypass-modes, so park secrets (Read(./.env),Read(./secrets/**)) and network (WebFetch,Bash(curl:*)) there permanently.- Prefer a tight
allow-list overbypassPermissions; reserve--dangerously-skip-permissionsfor disposable containers where the worst case isrm -rfof a throwaway checkout. - Bash specifiers are prefix matches, not a parser —
Bash(git:*)won’t stopx && git push; deny the dangerous verb explicitly and keep rawBashgated. - A
PreToolUsehook adds logic rules can’t express (block by file content, regex on the command); exit2cancels the call and shows stderr to the model. - Treat tool output and fetched/issue text as untrusted: prompt injection rides in on data the agent reads, so least-privilege tools limit what a hijacked turn can actually do.
Example
{ "permissions": {
"allow": ["Read(src/**)", "Edit(src/**)", "Bash(npm run test:*)"],
"deny": ["Read(./.env)", "Read(./**/secrets/**)", "WebFetch",
"Bash(curl:*)", "Bash(rm -rf:*)", "Bash(git push --force:*)"] } }The agent can build and test src/, but cannot read secrets, reach the network, force-push, or recursively delete — even if a malicious comment tells it to.
Pitfalls
- Bypass on a real machine.
--dangerously-skip-permissionsruns anything the model emits; only do it sandboxed, never on a host with credentials. - Allow too broad. A single
Bash(empty parens) orBash(*)re-opens the shell and can swallow narrower deny intent — keep allows specific. - Trusting external text. Issues, PRs, and web pages can carry injected instructions; least-privilege tools and a no-secrets
denyare the real mitigations. - Logs leaking secrets. Verbose/stream output can echo tool inputs and env — mask keys and don’t feed secrets into prompts.