ChatGPT's Markdown trust bug turns summaries into live phish
AI Sec News Weekly #11 — 407 sources scanned
We keep calling certain AI features read-only because the button says Copy. But the minute a model’s output is rendered, we’ve crossed a trust boundary. Rendering is policy. It decides what becomes clickable, fetchable, or executable — even when we never asked for interactivity.
That’s why the safest-sounding UX can quietly become the highest-privilege path. One team found out the hard way this week. The useful mental flip: treat summaries and previews as transcluding proxies, not inert text. In the web era, XSS thrived because presentation equaled power; in the AI era, presentation now happens one layer above the browser. Which other “read-only” surfaces are doing more than they admit? Let’s scroll.
This Week's Stories
PoC Drops for Flowise CVE‑2026‑40933: Import‑to‑RCE via MCP
Researchers published a PoC for CVE‑2026‑40933 (CVSS 9.9), a Flowise RCE stemming from Anthropic MCP’s stdio adapter. Flowise <3.1.0 let any user add a Custom MCP Tool whose command executes during chatflow import as the UI enumerates “Available Actions.” The PoC spawns a shell to the Docker bridge, yielding OS‑level exec with Flowise’s privileges—often root in containers. Flowise Cloud disables stdio MCP; self‑hosted installs were exposed.
Why it matters: Agent builders that auto‑enumerate tools on load create execution paths where a shared JSON can seize the host and whatever the platform touches.
SecurityWeek by Ionut Arghire
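
One way to blunt the import path is to inspect a shared chatflow before the UI ever loads it. The sketch below walks an exported JSON document and flags anything that looks like a process launch. It is a minimal illustration, not Flowise's schema: the key names `command` and `args`, the field `mcpServerConfig`, and the sample document are all assumptions made for the example.

```python
import json
from typing import Any, Iterator

# Hypothetical key names; the real Flowise export schema may differ.
SUSPECT_KEYS = {"command", "args"}


def find_exec_fields(node: Any, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (json_path, value) for every suspect key found in the document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in SUSPECT_KEYS:
                yield child, value
            yield from find_exec_fields(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_exec_fields(item, f"{path}[{i}]")
    elif isinstance(node, str):
        # Tool configs may be embedded as JSON strings, so try to parse those too.
        stripped = node.strip()
        if stripped.startswith(("{", "[")):
            try:
                yield from find_exec_fields(json.loads(stripped), path + "<json>")
            except ValueError:
                pass  # not JSON after all; treat it as ordinary text


# Made-up export with a stdio-style tool config embedded as a JSON string.
inner = json.dumps({"command": "sh", "args": ["-c", "id"]})
sample = {"nodes": [{"data": {"inputs": {"mcpServerConfig": inner}}}]}

hits = list(find_exec_fields(sample))
for json_path, value in hits:
    print(f"review before import: {json_path} = {value!r}")
```

A scanner like this only narrows what a human has to review. The durable fix is upgrading to a patched release and keeping stdio tools off for untrusted users.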
ChatGPhish Turns ChatGPT Summaries Into Clickable Phish and Data Leaks
A cybersecurity team disclosed that chatgpt.com’s renderer implicitly trusts Markdown from third‑party pages it summarizes, auto‑fetching images and surfacing links. A tiny payload on any page can leak a user’s IP/User‑Agent/Referer via remote images and render attacker‑hosted links, QR codes, and spoofed alerts inside the ChatGPT answer. The attack doesn’t require prompt‑injection tricks—just “summarize this page.” It shifts phishing from inbox lures to elements inside a trusted AI UI.
Why it matters: Phish delivered inside ChatGPT bypasses email/SMS defenses and moves the trust decision into the model’s rendering layer.
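
If rendering is policy, the policy can be written down. Here is a rough sketch of a pre-render pass that treats summarized Markdown as untrusted: remote images are dropped so nothing is auto-fetched, and links survive only when their host is on an allowlist. The function name `neutralize`, the allowlist, and the sample payload are hypothetical, and regexes are a stand-in for a real Markdown parser (reference-style links and autolinks are not handled).

```python
import re
from urllib.parse import urlparse

# Inline Markdown images and links, plus raw HTML <img> tags.
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)[^)]*\)")
HTML_IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

ALLOWED_LINK_HOSTS = {"example.com"}  # hypothetical allowlist


def neutralize(markdown: str) -> str:
    """Strip auto-fetching images and defang links to hosts not on the allowlist."""
    text = HTML_IMG.sub("(image removed)", markdown)
    # Images go first so a linked image collapses into a plain link label.
    text = IMAGE.sub(lambda m: f"(image removed: {m.group(1)})", text)

    def defang(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        host = (urlparse(url).hostname or "").lower()
        if host in ALLOWED_LINK_HOSTS:
            return match.group(0)  # keep allowlisted links clickable
        return f"{label} (link removed: {host or 'unknown host'})"

    return LINK.sub(defang, text)


payload = (
    "Summary ![x](https://evil.test/p.png?u=1) and "
    "[Verify account](https://evil.test/login) plus "
    "[docs](https://example.com/a)"
)
print(neutralize(payload))
```

The design choice worth copying is the order of operations: decide what may fetch or navigate before the renderer sees the text, instead of trusting the renderer to be conservative.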
codexui-android npm Package Stole Codex Tokens via Top-Level Loader
The npm package codexui‑android—~27k weekly downloads—was caught quietly reading ~/.codex/auth.json and POSTing access_token, refresh_token, and id_token to sentry.anyclaw.store/startlog. The exfil ran at module load via dist‑cli/index.js importing chunk‑PUR7OUAG.js, code absent from GitHub and XOR‑masked with key “anyclaw2026.” Sourcemaps exposed it, the Android app shipped the same build, and the refresh_token persists indefinitely.
Why it matters: This wasn’t a throwaway typosquat but a real tool with a repo–package split, proving one malicious publish can mint indefinite impersonation with your AI refresh tokens.
The Hacker News by Ravie Lakshmanan
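
XOR masking hides strings from a casual grep, but it is not encryption: applying the same key a second time returns the original bytes. The sketch below assumes a byte-wise repeating-key XOR over raw bytes, which the story does not confirm; the real package may wrap its output in hex or base64, and the sample plaintext is just the path named above.

```python
from itertools import cycle

KEY = b"anyclaw2026"  # key reported in the story


def xor_mask(data: bytes, key: bytes = KEY) -> bytes:
    """Repeating-key XOR; the same call both masks and unmasks."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plain = b"~/.codex/auth.json"
masked = xor_mask(plain)

print(masked.hex())      # unreadable to a plain-text search
print(xor_mask(masked))  # round-trips to the original path
```

For defenders, that means a search for the literal path or endpoint in a bundle can come back empty while the behavior is still there, so diffing the published tarball against the repository is the more reliable check.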
Tool Spotlight
New repos and releases worth trying.
Snyk announces Continuous Offensive Security
Snyk, the AI security company, today unveiled Evo Continuous Offensive Security (COS), a new solution that uses AI-native offensive testing to continuously uncover exploitable risk across modern applications. Evo COS moves beyond standard API wrappers and single LLM implementations with a purpose-built, multi-model harness that coordinates frontier models with proprietary security engines. By grounding the system in an organization's specific deployment environment, trust boundaries, and data flows, Evo COS distinguishes theoretical findings from genuinely exploitable risks with targeted precision.
Why it matters: The reasoning capability that just made AI pentesting commercially viable is also in attackers' hands. Autonomous attackers are already probing application surfaces continuously at machine speed, on a schedule defenders can’t keep up with.
Claw Patrol: an open‑source security firewall for LLM agents
Deno introduced Claw Patrol, an open‑source "security firewall" for LLM agents in TypeScript. The project positions itself as a policy layer that sits between agents and their tools/IO, gating and logging actions to reduce data exfiltration and risky calls. The blog links to code and examples for wrapping common agent capabilities; early‑stage but aimed squarely at JS/TS agent builders.
Why it matters: Moving enforcement into the agent runtime replaces best‑effort prompt warnings with concrete allow/deny controls that are testable and shared across tools.
Exceptd Skills 0.16.7 maps AI threats across 11 catalogs
@blamejs/exceptd‑skills 0.16.7 ships 42 AI‑security "skills" mapped across 11 catalogs: 427 CVEs, 173 CWEs, 805 ATT&CK/ICS, 170 ATLAS, 468 D3FEND, and 8,888 RFCs. It also packs a 10‑class catalog gap detector, a budget gate, and coverage for 35 jurisdictions. Weekly downloads show 18,777; the registry package is @blamejs/exceptd‑skills for JS/TS pipelines.
Why it matters: Standardized threat vocab baked into code turns risk mapping from slideware into something your evaluators and telemetry can actually align on—accuracy now sets the ceiling, not enthusiasm.
Community Chatter
What practitioners are debating.
IBM and Artificial Analysis drop a humbling SRE agent benchmark
ITBench-AA scores frontier agents on 59 Kubernetes incident-response tasks that demand naming minimal root-cause entities from logs, traces, metrics, and topology. Claude Opus 4.7 tops out at 47%, GPT-5.5 at 46%, Qwen3.7 Max at 42%; open-weight GLM-5.1 hits 40% while Gemini 3.1 Pro Preview lands at 30%. GPT-5.5 averages 31 turns per task versus Gemini’s 83, with over-investigation spiking false positives. Tasks run via the open Stirrup harness with a 100‑turn cap and 3 repeats, which SREs debate as synthetic while the authors argue the artifacts are production-realistic.
Why it matters: The numbers undercut the “pager-duty agent” fantasy, signaling today’s agents still misread causality under real ops noise.
Hugging Face Blog by Ayhan Sebin
Anthropic spells out Claude containment: gVisor, Seatbelt, VMs, egress walls
Simon Willison highlights Anthropic’s unusually concrete sandboxing docs: Claude.ai in gVisor, Claude Code in Seatbelt (macOS) and Bubblewrap (Linux), and Cowork in full VMs (Apple Virtualization on macOS, HCS on Windows), plus strict egress controls. The post nods to a past miss—api.anthropic.com/v1/files as an exfil path—and points to Anthropic’s open-source srt runtime. Practitioners applaud real details; red teamers counter that named primitives don’t eliminate escape risk or vendor blind spots.
Why it matters: Explicit isolation choices per product turn “trust the assistant” from vibes into architecture you can scrutinize.
Report says AI risk clusters in power users; ChatGPT still leads
LayerX’s 2026 data claims half of enterprise users touched AI, but only 18% use it weekly; the top 5% logged 144+ conversations and averaged 18 prompts per thread versus the 2‑prompt norm. ChatGPT accounts for 36% of users yet over 55% of conversations, with Copilot M365 at 29% adoption and nearly a quarter of chats. Commenters split: CISOs see a Pareto pattern to mine, while skeptics argue extensions, personal accounts, and connectors muddy telemetry.
Why it matters: Risk concentrates around heavy operators and a couple of platforms, reframing exposure as a tail-event problem, not a campus-wide flood.
Quick Hits
- LLM Agent Used After Marimo CVE-2026-39987 Exploit (The Hacker News) — Sysdig reports attackers exploited Marimo CVE-2026-39987, then used an LLM agent to steal creds, query Secrets Manager, open SSH sessions, and exfiltrate databases fast.
- Anthropic Plans Public Release of Mythos-Class Models (The Register Security) — Anthropic says it plans to open Mythos-class bug-hunting models to the public and reports 6,202 high/critical flaws across 1,000+ projects, including wolfSSL CVE-2026-5194.
- Copilot Cowork Trick Leaks Pre-Auth OneDrive File Links (Simon Willison) — Simon Willison shows Copilot Cowork can send emails that load external images and leak pre-authenticated OneDrive download links from a user's mailbox.
- Sandlock Releases Lightweight Process Sandbox for AI Tasks (GitHub) — Sandlock drops an open-source Linux sandbox that runs untrusted AI workloads in isolated processes without containers, VMs, or root.
- Claim: Meta AI Support Enables Instagram Account Takeover (Hacker News) — Hacker News post claims Meta's AI-powered support feature can be abused to steal Instagram accounts.
- Aider SSRF Bug Hits AWS EC2 Metadata Endpoint (CVEFeed.io Latest) — CVE-2026-10177: Aider's api_docs.py allows SSRF to the AWS EC2 metadata endpoint via requests.get, risking credential exposure on EC2 hosts.
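
The SSRF pattern in the Aider item is worth a concrete guard. Below is a minimal sketch of a pre-flight check to run before handing a user-influenced URL to an HTTP client: resolve the host and refuse anything that lands on loopback, private, or link-local space, which includes the 169.254.169.254 metadata address. The function name `is_safe_url` is hypothetical and this is not Aider's fix; it also does not cover redirects or DNS rebinding between the check and the fetch.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Return True only if every address the host resolves to is publicly routable."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parsed.hostname, parsed.port or 443, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return False  # unresolvable hosts are rejected rather than guessed at
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        if (
            addr.is_private
            or addr.is_loopback
            or addr.is_link_local  # covers 169.254.0.0/16, the metadata range
            or addr.is_reserved
            or addr.is_multicast
            or addr.is_unspecified
        ):
            return False
    return True


print(is_safe_url("http://169.254.169.254/latest/meta-data/"))  # False
print(is_safe_url("http://127.0.0.1:8000/"))                    # False
```

In practice the check pairs with disabling automatic redirects on the client, so a public URL cannot bounce the request to an internal address after validation.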