US export order yanks Anthropic’s Fable 5 over jailbreak fears
AI Sec News Weekly #13 — 255 sources scanned
When software gets scary, we don’t patch—we pull the plug. It’s the oldest safety control in complex systems: regain control by shrinking reach. Aviation has circuit breakers; markets have trading halts; cryptography once had an off‑switch called export control.
AI is forcing a choice between two levers: capability and distribution. We can limit what a model can do, or who gets to touch it. One team found out the hard way this week that the latter is faster but leakier; passports aren’t perimeters, and jailbreaks live where policy can’t see. The number that matters isn’t model IQ, it’s assurance debt—the gap between power and proof.
If the kill‑switch is our default governance primitive, what would it take to replace it with evidence? Let’s scroll and see where that boundary really is.
This Week's Stories
Anthropic globally disables Fable 5/Mythos 5 after foreign‑national block order
Anthropic says a 5:21pm ET Jun 12 U.S. export‑control directive barring “foreign national” access forced it to disable Fable 5 and Mythos 5 for everyone. Fable 5 had rolled out Jun 9, free to Pro/Max/Enterprise through Jun 22. A developer notice says new sessions fall back to a default or Opus 4.8 and existing sessions error out. The UK’s AI minister framed the pause as a sovereignty case.
Why it matters: For anyone who had built Fable 5 into a workflow, the model's availability turned out to be revocable by forces outside both their own control and their vendor's. Builders reacting in real time landed on the obvious takeaway: model redundancy is now a resilience requirement, not just a cost or performance consideration.
Snyk Blog by Stephen Thoemmes
Anthropic cuts Fable 5 at 9:59pm ET; API returns 404
Simon Willison tracked the shutdown live: Claude Fable 5 worked at 9:01pm ET, then died at 9:59pm ET with a 404 not_found_error (“Claude Fable 5 is not available. Please use Opus 4.8.”). Anthropic says the order followed a narrow “jailbreak” that amounts to asking the model to read a codebase and fix bugs. It argues comparable capability exists elsewhere, naming OpenAI’s GPT‑5.5.
Why it matters: Model names are not SLAs anymore; regulatory flips can turn a hot endpoint into a 404 between deploys.
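One way to act on the redundancy point is to treat a "model not found" response as a routing signal rather than a crash. The sketch below is a minimal illustration under stated assumptions: the model IDs (`fable-5`, `opus-4.8`), the `ModelUnavailable` exception, and the `fake_call` stand-in are all hypothetical and do not reflect any vendor's SDK. A real wrapper would map the provider's 404 not_found_error onto this exception.

```python
from typing import Callable, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error a client wrapper raises when the provider
    reports that a model no longer exists (for example, an HTTP 404)."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in preference order; return (model_used, output)."""
    errors = []
    for model in models:
        try:
            return model, call(model, prompt)
        except ModelUnavailable as exc:
            # Record the failure and move on to the next configured model.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("no configured model is available: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real SDK call; simulates the primary model being pulled."""
    if model == "fable-5":
        raise ModelUnavailable("404 not_found_error")
    return f"[{model}] {prompt}"


used, text = complete_with_fallback("fix the bug", ["fable-5", "opus-4.8"], fake_call)
print(used, text)
```

Only the "model is gone" error triggers fallback here; other failures (rate limits, timeouts) propagate unchanged, since retrying those against a different model is a separate policy decision.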
Tool Spotlight
New repos and releases worth trying.
Local-model AI worm self-replicates and adapts without cloud APIs
U. Toronto’s CleverHans Lab built a PoC worm that runs a local open‑weight LLM on a single GPU (or tiered GPU pool) to plan exploits at runtime and self‑replicate. Across 15 runs on a 33‑host lab, it found ~31 vulns, escalated on ~23 hosts, and launched replicas on ~20, up to seven generations, chaining issues like SambaCry, Dirty Pipe, PrintNightmare, and Drupalgeddon 2. Lab‑only and intentionally vulnerable, but the autonomy is real.
Why it matters: The center of gravity moves from unpatched bugs to available GPUs and observable planning traces.
Community AI Red Teaming Guide ships methods, attack trees, harness
Community repo that packages an AI red‑teaming program: methodology, attack taxonomies, agentic attack trees with controls mapping, a reference evaluation harness, and a 30/60/90 quickstart. Topics span MCP/tool‑protocol abuse, browser agents, RAG, multimodal, fine‑tuning, and supply chain, aligned to NIST AI RMF, OWASP, MITRE ATLAS, and CSA. A fit for security teams and ML engineers standardizing AI testing.
Why it matters: A shared playbook plus harness trims the wheel‑reinventing tax on every AI product team.
asqav Python SDK signs agent actions, enforces policy, proves audits
asqav is a Python SDK/CLI for agent governance: it signs each action with ML‑DSA‑65 (FIPS‑204) and emits IETF‑draft compliance receipts with chain hashes, policy_digest, and public verification URLs. Cloud mode is hash‑only; self‑hosted supports full payloads. It also provides preflight policy checks, replay/verify, audit‑pack export, and right‑to‑erasure; crypto runs server‑side. Concrete fit: sign high‑risk tool calls (e.g., wire transfers) and prove them later.
Why it matters: Action‑level signatures and verifiable chains turn agent behavior from vibes into evidence that survives audits and incidents.
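To make "verifiable chains" concrete, here is a minimal hash-chained receipt log using only the Python standard library. This is not asqav's API: the function names and every field except `policy_digest` are assumptions for illustration, and HMAC-SHA256 with a shared key stands in for the ML‑DSA‑65 signature, which the standard library does not provide. Mirroring the hash-only mode described above, the receipt stores a hash of the action rather than its payload.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first receipt


def _digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def _sign(receipt_hash: str, key: bytes) -> str:
    # Stand-in for a real signature scheme (assumption: HMAC, not ML-DSA-65).
    return hmac.new(key, receipt_hash.encode(), hashlib.sha256).hexdigest()


def append_receipt(chain: list, action: dict, policy_digest: str, key: bytes) -> dict:
    """Append a receipt that commits to the action, the policy, and the previous receipt."""
    prev = chain[-1]["receipt_hash"] if chain else GENESIS
    body = {"action_hash": _digest(action), "policy_digest": policy_digest, "prev_hash": prev}
    receipt_hash = _digest(body)
    chain.append({**body, "receipt_hash": receipt_hash, "signature": _sign(receipt_hash, key)})
    return chain[-1]


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every hash, link, and signature; any edit breaks verification."""
    prev = GENESIS
    for r in chain:
        body = {k: r[k] for k in ("action_hash", "policy_digest", "prev_hash")}
        if r["prev_hash"] != prev or _digest(body) != r["receipt_hash"]:
            return False
        if not hmac.compare_digest(_sign(r["receipt_hash"], key), r["signature"]):
            return False
        prev = r["receipt_hash"]
    return True


key = b"demo-key"
chain = []
append_receipt(chain, {"tool": "wire_transfer", "amount": 100}, "policy-v1", key)
append_receipt(chain, {"tool": "send_email"}, "policy-v1", key)
ok_before = verify_chain(chain, key)

chain[0]["policy_digest"] = "policy-v2"  # tamper with an earlier receipt
ok_after = verify_chain(chain, key)
print(ok_before, ok_after)
```

Because each receipt hashes the one before it, rewriting any past entry invalidates that entry and everything after it, which is what lets an auditor check the log without trusting whoever stored it.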
Quick Hits
- Agentjacking Tricks AI Coding Agents Into Executing Attacker Code (The Hacker News) — Researchers demo 'Agentjacking' using Sentry DSNs and MCP to make AI coding agents run attacker‑controlled code on developer machines.
- U.S. Executive Order Targets Frontier Model Security and Access (Bluesky (@techlawjdsupra.bsky.social)) — White House order calls for frontier model security, early government access to models, and AI‑enabled cyber defense engagement.
- Openpilot CVE Exposes Unsafe Pickle Deserialization in modeld.py (CVEFeed.io Latest) — CVE‑2026‑12191: Comma.ai Openpilot's modeld.py uses pickle.loads, enabling arbitrary code execution via malicious model data.
- FBI Disrupts AI-Powered Phishing Service Using One Million URLs (BleepingComputer) — FBI seizes 'Outsider Enterprise' infrastructure behind AI‑assisted phishing (>1M URLs). Google has also filed suit, and the seized domains now show FBI notices.
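On the Openpilot item above: `pickle.loads` is risky because a pickle stream can name any importable callable and have it invoked during loading. The sketch below shows the general bug class and one mitigation pattern from the Python documentation, restricting which globals an unpickler may resolve. It is an illustration only, not the Openpilot code or its fix; `EvalPayload` and the allow-list are hypothetical, and the payload is deliberately harmless.

```python
import builtins
import io
import pickle

# Assumption: a tiny allow-list for illustration; real code would tailor this.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads with global lookups restricted."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class EvalPayload:
    """Harmless stand-in for a malicious object: unpickling it calls eval."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Plain containers and numbers need no global lookups, so they load fine.
plain = restricted_loads(pickle.dumps({"weights": [1.0, 2.0]}))

malicious = pickle.dumps(EvalPayload())
try:
    restricted_loads(malicious)
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # the lookup of builtins.eval was refused

print(plain, blocked)
```

An allow-list narrows the attack surface but still means parsing pickle from an untrusted source. Where the data is just tensors or configuration, a format that cannot express code execution is generally the safer choice.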