The Leak Anthropic Couldn't DMCA Away
April 3, 2026 · uneasy.in/d693c64
A 59.8 megabyte source map file. That is what
separated Anthropic's most sophisticated product from
the public domain. The @anthropic-ai/claude-code npm
package shipped with a .map file that pointed to a
zip archive sitting on Anthropic's own Cloudflare R2
storage bucket. Anyone
could download it. Inside: approximately 1,900
TypeScript files, 512,000 lines of unobfuscated code,
and the complete architectural blueprint of the
agentic harness that makes Claude Code work.
Security researcher Chaofan Shou found it on March 31. By the time Anthropic responded, the source had been forked 41,500 times on GitHub.
The root cause was not exotic. Bun, the JavaScript
runtime Claude Code uses, generates source maps by
default. Somebody needed to add *.map to
.npmignore or the files field of package.json.
Nobody did. Gabriel Anhaia, a software engineer who
analysed the leak, put it plainly: "A single
misconfigured .npmignore or files field in
package.json can expose everything." Anthropic
engineer Boris Cherny later acknowledged "a manual
deploy step that should have been better automated."
The identical vector had leaked source code thirteen
months earlier, in February 2025. The fix was never
properly automated.
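The fix is a one-line allowlist. Here is a minimal sketch of the `files` approach; the entry-point path is hypothetical, since the package's real layout is not public:

```json
{
  "name": "@anthropic-ai/claude-code",
  "files": [
    "dist/cli.js"
  ]
}
```

With `files` set, npm publishes only what is listed (plus a handful of always-included files such as package.json and the README), so stray `.map` output never reaches the tarball. Running `npm pack --dry-run` before publishing prints the exact file list that would ship, which is the automated check the "manual deploy step" apparently lacked.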
This was Anthropic's second public exposure in five days. I wrote last week about the CMS misconfiguration that left 3,000 unpublished files searchable, including draft blog posts revealing the internal codename Mythos for an unreleased model family above Opus. That leak was embarrassing. This one was structural.
The distinction matters. A CMS toggle is a configuration error. Shipping your entire source tree to npm is a pipeline failure, one that had already happened before and was supposedly addressed. The question of whether the Mythos leak was accidental is interesting in its own right, but nobody is suggesting Anthropic wanted 512,000 lines of TypeScript indexed on every package manager mirror on Earth.
What the code revealed is more interesting than how it escaped.
The leak exposed Claude Code's full tool system: fewer than twenty default tools, sixty-plus in total, including file editing, bash execution, and web search. It revealed a three-tier memory architecture built around context window conservation: an index layer always loaded into the conversation, topic files pulled on demand, and transcripts searchable via grep but never loaded directly. The system treats memory as hints rather than truth, a surprisingly honest design philosophy for a product that markets itself on reliability.
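The three tiers can be sketched in a few lines. This is an illustrative reconstruction of the layout as described, not the leaked implementation; the file and directory names are hypothetical:

```python
from pathlib import Path


class MemoryStore:
    """Three-tier memory sketch: always-loaded index, on-demand topics,
    grep-only transcripts. Names and layout are hypothetical."""

    def __init__(self, root: Path):
        self.root = root

    def index(self) -> str:
        # Tier 1: a small index file, injected into every conversation.
        return (self.root / "MEMORY.md").read_text()

    def topic(self, name: str) -> str:
        # Tier 2: topic files, loaded only when explicitly requested.
        return (self.root / "topics" / f"{name}.md").read_text()

    def search_transcripts(self, needle: str) -> list[str]:
        # Tier 3: transcripts are searchable but never loaded wholesale;
        # only matching lines enter the context, as hints rather than truth.
        hits: list[str] = []
        for f in sorted((self.root / "transcripts").glob("*.txt")):
            hits += [ln for ln in f.read_text().splitlines() if needle in ln]
        return hits
```

The design choice worth noting is tier 3: keeping transcripts out of the context entirely, and surfacing only grep hits, is what makes the conservation budget hold as session history grows.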
More revealing was KAIROS, an unreleased autonomous
daemon mode that runs continuously via a heartbeat
prompt asking "anything worth doing right now?" It
integrates with GitHub webhooks, operates on five-minute
cron cycles, and includes a /dream command for
background memory consolidation. Forty-four hidden
feature flags gate unreleased capabilities including
voice commands, browser control via Playwright, and
multi-agent orchestration. The source comments
reference internal model codenames: Capybara for v8
with a one-million-token context window, Numbat and
Fennec for upcoming releases, and Tengu, which appears
in connection with something called "undercover mode."
Undercover mode deserves its own paragraph. It is enabled by default for Anthropic employees working in public repositories. The system suppresses internal codenames, unreleased version numbers, references to "Claude Code," and Co-Authored-By attribution lines. The leaked configuration exposed 22 private Anthropic repository names. The opacity is not inherently sinister; companies routinely scrub internal references from public commits. But for a lab that has built its brand on transparency and careful stewardship, the discovery of a system specifically designed to hide AI involvement in public code contributions is not a great look.
The codebase also contained anti-distillation
defences: decoy tool definitions injected into system
prompts to poison any training data scraped from Claude
Code sessions, plus cryptographically signed
server-side summaries that prevent access to full
reasoning chains. A 9,707-line bash security system
uses tree-sitter WASM AST parsing with 22 unique
validators. And buried in the source comments, a
documented parser differential vulnerability where
carriage return characters could bypass command
validation, because shell-quote and bash disagree on
what constitutes whitespace.
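The shape of that parser differential is easy to demonstrate. The sketch below uses toy stand-ins, not shell-quote or bash themselves, and the direction of the disagreement is illustrative: one tokenizer treats carriage return as whitespace, the other does not, so the two sides extract different command words from the same string, and any check applied on the wrong side of that divide can be slipped past:

```python
# Bash's lexer splits words on space, tab, and newline; \r is NOT among them.
BASH_METACHARS = " \t\n"


def bash_like_tokens(cmd: str) -> list[str]:
    """Toy stand-in for bash word splitting (quoting ignored for brevity)."""
    tokens, word = [], ""
    for ch in cmd:
        if ch in BASH_METACHARS:
            if word:
                tokens.append(word)
            word = ""
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def naive_validator_tokens(cmd: str) -> list[str]:
    """A validator built on generic whitespace splitting: str.split()
    treats \r as a separator, unlike the bash-like lexer above."""
    return cmd.split()


payload = "rm\r-rf /tmp/x"
naive_validator_tokens(payload)  # ['rm', '-rf', '/tmp/x']
bash_like_tokens(payload)        # ['rm\r-rf', '/tmp/x']
```

The two sides disagree on what the command word even is; a blocklist or allowlist keyed on one tokenization says nothing about what the other side will execute.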
An internal BigQuery comment, timestamped March 10, noted that 1,279 sessions had experienced fifty or more consecutive compaction failures, wasting approximately 250,000 API calls daily before a cap of three retries was applied. That is the kind of detail that transforms a leak from an IP issue into a product credibility question.
One function in the codebase spans 3,100 lines and carries a cyclomatic complexity of 486. The Hacker News thread, which accumulated 2,074 points and over a thousand comments, featured a lively debate about whether traditional code quality standards apply to AI-generated software. Some argued that velocity matters more than structure when models write the code. Others pointed out that humans still have to maintain it. I find myself in the second camp, but the argument is genuinely unsettled.
The community response was immediate and aggressive. The primary mirror repository hit 32,600 stars before Anthropic's legal team intervened. A developer using the handle @realsigridjin released Claw Code, a ground-up Python port built using OpenAI's Codex to sidestep copyright claims. It reached 75,000 stars and remains online. SafeRL-Lab published nano-claude-code, a minimal 900-line reimplementation supporting Claude, GPT, Gemini, DeepSeek, and local models. Multiple analysis repositories appeared, mapping the architecture in detail. The genie is not going back in the bottle.
Between 00:21 and 03:29 UTC on March 31, attackers published typosquatted npm packages targeting users attempting to compile the leaked code, bundling a remote access trojan. The supply chain attack was discovered quickly, but it illustrates a second-order risk that Anthropic's official statement did not address. "No sensitive customer data or credentials were involved" is technically accurate and completely beside the point when your leaked code is being weaponised as a lure within hours.
The DMCA response made things worse. Anthropic filed takedown notices that accidentally removed approximately 8,100 GitHub repositories, including legitimate forks of Anthropic's own public Claude Code repository that contained none of the leaked source. Boris Cherny acknowledged: "This was not intentional, we've been working with GitHub to fix it." Anthropic retracted notices for all but one repository and 96 forks containing the actual leaked material. The formal DMCA filing is publicly visible on GitHub's transparency repository. Nuking eight thousand innocent repos to protect code that was already mirrored across dozens of platforms is not a strategy. It is damage compounding.
The broader pattern is what concerns me. Anthropic has positioned itself as the careful lab, the one that thinks about safety before shipping, the one that walks away from defence contracts over ethical concerns. Then: two major leaks in five days, one of them a repeat of a known vector from thirteen months earlier, followed by a DMCA overreach that punished thousands of uninvolved developers. The engineering quality of the leaked codebase was broadly praised; the memory architecture is clever and the anti-distillation measures are sophisticated. But operational security is not about how good your code is. It is about whether your release pipeline remembers to exclude the source map.
Security researcher Roy Paz, writing for LayerX, noted that the exposure reveals "nonpublic details about how the systems work, such as internal APIs and processes," potentially informing attempts to circumvent existing safeguards. The compaction system's inability to distinguish user instructions from injected file content was specifically flagged as an attack surface. The bash parser differential is a concrete, exploitable vulnerability.
Competitors now have a detailed map of Anthropic's product direction. The feature flags, the model codenames, the KAIROS architecture, the anti-distillation approach. This is the kind of intelligence that normally costs months of reverse engineering or a well-placed hire. Anthropic handed it out for free, twice in one week, because somebody forgot a line in a config file.
I keep thinking about the Cursor situation from the week before, where a model identifier leaked through an API endpoint and revealed that Composer 2 was running on Moonshot AI's open-source Kimi K2.5. The AI developer tools space has a transparency problem that runs deeper than any single incident. Companies build proprietary products on foundations they do not fully disclose, then act surprised when the seams show. The difference with Anthropic is that the seams showed everything.
Sources:
- Anthropic leaks its own AI coding tool's source code — Fortune
- Anthropic accidentally exposes Claude Code source code — The Register
- Diving into Claude Code's Source Code Leak — Engineer's Codex
- Comprehensive Analysis of Claude Code Source Leak — sabrina.dev
- Anthropic took down thousands of GitHub repos — TechCrunch
- Anthropic DMCA Notice — GitHub DMCA Repository