A Million Tokens, No Asterisk
March 13, 2026 · uneasy.in/5f24d18
Anthropic quietly dropped the long-context pricing premium today. When Opus 4.6 launched five weeks ago, the million-token context window carried a surcharge — anything over 200K tokens was billed at double the input rate and 1.5x on output. That caveat is gone. A full million-token prompt to Opus 4.6 now costs exactly what a 50K prompt costs per token: $5 input, $25 output per million. Sonnet 4.6 follows the same pattern at $3/$15.
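To make the flat pricing concrete, here is a quick cost calculation using the rates above. The function name and defaults are mine, not anything from Anthropic's SDK; it is just arithmetic:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 5.00, out_rate: float = 25.00) -> float:
    """Cost in USD at Opus 4.6 rates, where rates are $ per million tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A full million-token prompt with a 4K-token reply:
print(request_cost(1_000_000, 4_000))  # 5.0 input + 0.1 output = 5.1
```

Same math at Sonnet 4.6 rates: swap in `in_rate=3.00, out_rate=15.00` and the same request runs $3.06.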
The beta header is gone too. If your code was setting anthropic-beta: long-context on every request, it still works — the API just ignores it now. No migration, no breaking change. That's the kind of rollout I wish happened more often.
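As a sketch of why nothing breaks: the header just rides along in the request and the server disregards it. Nothing below sends a request — it only builds the header dict, with the beta value exactly as the post names it:

```python
def build_headers(api_key: str, include_beta: bool = True) -> dict:
    """Assemble request headers; the beta header is now a harmless no-op."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if include_beta:
        # Ignored server-side now, per the post — safe to leave in old code.
        headers["anthropic-beta"] = "long-context"
    return headers
```

Old clients that set the header and new clients that don't produce equally valid requests, which is what makes this a zero-migration change.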
A million tokens is roughly 750,000 words. Thirty thousand lines of code. An entire mid-size codebase with room to spare. For anyone using Claude Code on Max or Team plans, this is where it gets tangible — sessions that previously hit compaction walls after an hour of deep work can now hold the full conversation history. I've been running Opus 4.6 with the extended context since February and the difference isn't subtle. Fewer moments where the model forgets a file you discussed twenty minutes ago. Fewer re-explanations.
The less obvious addition is context compaction. When conversations approach the token ceiling, the API automatically compresses earlier turns into a summary and continues from there. It's lossy — you lose verbatim recall of compressed sections — but it means long-running agentic workflows don't just crash into a wall. They degrade gracefully instead of failing. For coding agents that accumulate tool calls, observations, and intermediate reasoning over hours of work, this matters more than the raw number.
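The compaction itself happens server-side, but the mechanism is easy to sketch client-side. A toy version, assuming roughly four characters per token and a `summarize()` stub standing in for a real model-generated summary — both assumptions mine:

```python
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def summarize(turns: list[str]) -> str:
    # Stand-in for an actual model summary; lossy by design.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str], budget: int) -> list[str]:
    """Fold the oldest turns into a summary until the history fits the budget."""
    while sum(approx_tokens(t) for t in history) > budget and len(history) > 2:
        # Compress the two oldest entries into a single summary turn.
        history = [summarize(history[:2])] + history[2:]
    return history
```

The key property is the one the post describes: recent turns stay verbatim, old turns degrade into summaries, and the conversation never hits a hard wall.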
I should be honest about the limitations. Research on retrieval accuracy at these context lengths consistently shows a U-shaped curve — the model handles information at the beginning and end of the window reliably, but accuracy dips in the middle. Effective capacity is probably closer to 600-700K tokens of dependable recall. The window is a million tokens. The model's attention isn't uniformly distributed across all of them.
However. The competitive framing matters here. GPT-5 tops out at 400K tokens. Gemini offers larger windows but at significantly higher per-token cost for comparable output quality. Anthropic removing the premium essentially says: this is standard now, not a luxury feature. That's the right move. Context shouldn't be metered like roaming charges.
The media limit bumped to 600 images or PDF pages per request, up from 100. I haven't stress-tested that yet, but for anyone doing document analysis or legal review workflows, six hundred pages in a single pass changes the architecture of those pipelines entirely.
Sources:
- 1M Context GA Announcement — Anthropic
- Introducing Claude Opus 4.6 — Anthropic
- Long Context vs RAG — SitePoint