Six Out of Ten

Of every ten corporate networks the UK AI Security Institute pointed Anthropic's Mythos at, six fell. With OpenAI's GPT-5.5-Cyber the number was three. Politico published the result this morning, buried in a Sunday explainer aimed at people only now catching up to why Washington keeps holding closed-door briefings about a model nobody outside a small circle has touched.

The framing matters. AISI is not a marketing arm. It is the British government's testing body, the closest equivalent to an independent referee that AI cybersecurity capabilities have right now, and it just put a clean ratio on the offensive gap between two frontier models. Two-to-one. British AI Minister Kanishka Narayan, in a statement to Politico, allowed himself the line "cyber capabilities in leading AI systems are advancing much faster than we expected," which is the polite ministerial register for "this is worse than the brief said it would be."

Mythos has been the subject of increasingly anxious coverage since Anthropic released Project Glasswing's initial findings last week. The numbers there were also impressive: more than ten thousand high or critical vulnerabilities surfaced across partner software, with all the patch-pipeline pain that follows. But ten-thousand-vulnerabilities-found is a defensive metric dressed in offensive clothing. It is "look how much we caught." Six-out-of-ten corporate networks taken over is different. That is a head-to-head capability test on the attacker side of the ledger, run by an arms-length government body, and the result is not subtle.

The other quotes Politico collected ratchet the picture upward. Cloudflare's chief security officer described Mythos as a "real step forward" in AI's ability to find vulnerabilities and write the code to exploit them. Broadcom called its own internal findings "jolting." An unnamed member of the House Homeland Security Committee left a closed-door Anthropic briefing reporting that Mythos had broken into his bank account with ease. Each of those is the sort of detail that, if anyone else were saying it about an unreleased lab model, would read as marketing. Saying it about a model the lab has pointedly declined to ship lands differently. Evans, quoted in the same piece, says the plain thing: "these model developments mainly are advantages for attackers rather than defenders." That is the AISI ratio in English.

The hardest part to sit with is the implicit assumption that the defenders' tooling will catch up. Glasswing's whole pitch rests on that. Give the model to a coalition of two-dozen large companies and government agencies first, run the bugs into the patching queue, and the attackers will arrive at a hardened landscape. That arithmetic only works if the defenders' side of the equation is willing to do its half of the work at the rate the model is producing it, and the patch-capacity story suggests the institutions are not. Even if they were, the AISI test implies a model exists today that is materially better at offense than its sanctioned defensive twin. Three out of ten was already the previous frontier. Six is not a step on the same staircase.

Anthropic's argument has always been that capability of this kind will arrive whether they ship it or not, and the most responsible move is to be the lab that demonstrates the upper bound under controlled conditions. AISI's number is the form that demonstration takes when it leaves the briefing room. It is a useful number. It is also the kind of number that gets cited later in unrelated testimony.

Sources:

What to know about the AI models that are jolting Washington, Politico
NZ at wild frontier of AI superhacking, RNZ
One Job That Is Growing in the A.I. Era? Cybersecurity Experts., The New York Times

Plutonic Rainbows

Six Out of Ten

Related Entries