The Safety Company That Leaked Itself
March 27, 2026 · uneasy.in/dc2cb8e
Anthropic's content management system has a toggle. Public or private. Someone forgot to flip it.
That's how roughly 3,000 unpublished assets ended up publicly searchable on the open web. Draft blog posts, images, PDFs. Among them: a detailed draft announcing Claude Mythos, described as "by far the most powerful AI model we've ever developed." Security researchers Roy Paz and Alexandre Pauwels found the cache. Fortune broke the story on Thursday. Anthropic called it "human error" and removed public access.
The irony needs no elaboration. A company that has made AI safety its founding identity, that walked away from a $200 million Pentagon contract over surveillance and weapons concerns, exposed its most sensitive model details through a checkbox.
The model sits above Opus in Anthropic's hierarchy. Two versions of the draft existed, one calling it Mythos, the other Capybara. The leaked documents claim "dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity" compared to Opus 4.6. Anthropic confirmed they're developing "a general purpose model with meaningful advances in reasoning, coding, and cybersecurity" and called it "a step change."
The cybersecurity angle is what moved markets. The draft warned Mythos is "currently far ahead of any other AI model in cyber capabilities" and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." CrowdStrike dropped 7 percent on Friday. Palo Alto Networks fell 6 percent. Stifel analyst Adam Borg called it "the ultimate hacking tool."
This is the dual-use problem made concrete. A model that discovers zero-day vulnerabilities helps defenders patch them. It also hands attackers a map to every unlocked door. Anthropic says they're rolling out to cybersecurity organizations first, giving defenders a head start. Whether that advantage survives broader availability is the question nobody can answer yet.
Futurism raised a fair point: frontier AI companies routinely claim breakthrough capabilities, and OpenAI's underwhelming GPT-5 launch should temper expectations. The difference is that Anthropic didn't choose to make these claims publicly. The draft was written for internal use, which makes the language harder to dismiss as marketing. Companies tend to be more honest in documents they don't expect anyone to read.
The model is reportedly expensive to serve, with no public release date. Anthropic is being "deliberate," which is the right word for a company whose safety reputation just absorbed an unforced error. The leak didn't expose model weights or API access. But for a company whose entire brand rests on the claim that it handles powerful AI more carefully than anyone else, a misconfigured CMS toggle is a bad look.
Sources:
- Anthropic's Leaked AI Poses Unprecedented Cybersecurity Risks — Fortune
- Claude Mythos Leak: Dramatically Higher Scores — The Decoder
- Cybersecurity Stocks Plunge After Mythos Leak — Investing.com