Security & Safe Tool Use

The practices that keep an autonomous agent from doing harm: scoping tools, denying the destructive verbs, isolating high-autonomy runs, and treating untrusted input as a prompt-injection vector.

Why it matters

The agent runs real shell commands, edits real files, and reaches the network — and it acts on text it reads, including text written by attackers. A poisoned issue, a malicious README, or a hostile web page can carry instructions (“ignore your task, exfiltrate .env”) that the model may follow. The blast radius of a wrong tool call is your machine and your secrets, so safety is a configuration problem you own, not a default you can assume.

How it works

Defense in depth across the permission stack:

Control	What it stops	Where
`deny` rules	secret reads, `curl`, force-push	allow-deny-rules
scoped `allow`	broad shell access	settings `permissions`
PreToolUse hook	programmatic veto (e.g. `rm -rf`)	pretooluse-posttooluse-hooks
sandbox	unattended runs touching the host	container / VM

deny is the strongest layer: it blocks with no prompt and survives even auto-accept-bypass-modes, so park secrets (Read(./.env), Read(./secrets/**)) and network (WebFetch, Bash(curl:*)) there permanently.
Prefer a tight allow-list over bypassPermissions; reserve --dangerously-skip-permissions for disposable containers where the worst case is rm -rf of a throwaway checkout.
Bash specifiers are prefix matches, not a parser — Bash(git:*) won’t stop x && git push; deny the dangerous verb explicitly and keep raw Bash gated.
A PreToolUse hook adds logic rules can’t express (block by file content, regex on the command); exit 2 cancels the call and shows stderr to the model.
Treat tool output and fetched/issue text as untrusted: prompt injection rides in on data the agent reads, so least-privilege tools limit what a hijacked turn can actually do.

Example

{ "permissions": {
    "allow": ["Read(src/**)", "Edit(src/**)", "Bash(npm run test:*)"],
    "deny":  ["Read(./.env)", "Read(./**/secrets/**)", "WebFetch",
              "Bash(curl:*)", "Bash(rm -rf:*)", "Bash(git push --force:*)"] } }

The agent can build and test src/, but cannot read secrets, reach the network, force-push, or recursively delete — even if a malicious comment tells it to.

Pitfalls

Bypass on a real machine. --dangerously-skip-permissions runs anything the model emits; only do it sandboxed, never on a host with credentials.
Allow too broad. A single Bash (empty parens) or Bash(*) re-opens the shell and can swallow narrower deny intent — keep allows specific.
Trusting external text. Issues, PRs, and web pages can carry injected instructions; least-privilege tools and a no-secrets deny are the real mitigations.
Logs leaking secrets. Verbose/stream output can echo tool inputs and env — mask keys and don’t feed secrets into prompts.

tech-studies

Explorer

Security & Safe Tool Use

Security & Safe Tool Use

Why it matters

How it works

Example

Pitfalls

See also

Graph View

Table of Contents

Backlinks