GPT-5.6 Arrives in Fragments

OpenAI is expected to give GPT-5.6 a general release within weeks, July 7 by the going estimate, and the strange part is how little any date will actually reveal. The model has been arriving in pieces since late May, when a researcher named Haider found a single routing entry in OpenAI's Codex backend logs pointing at a model called GPT-5.6, then watched the line disappear from later sessions. That is canary testing, a slice of live traffic quietly routed to an unannounced model to see how it holds up under real load. By the time a launch post goes up, the thing has already been answering strangers' questions for weeks.

It went further than a log line. On June 26 the company put out a limited preview of three variants, named from least to most capable Luna, Terra, and Sol, handed to trusted partners through the API and Codex rather than dropped into ChatGPT. Prediction markets treated the rest as paperwork. Manifold sat near 97% on a public GPT-5.6 by September; Polymarket had been at 89% for a June 30 date that came and went. So the 7th, if it holds, is less a reveal than the moment the rope comes down.

What has leaked is mostly shape, not substance. The number that stuck is a 1.5-million-token context window, up from the 1.05 million GPT-5.5 exposes through its API, traced to an internal codename, iris-alpha. Testers with early Pro access describe generation times stretching back out to twenty and forty minutes, one run clocking eighty-seven against GPT-5.5's thirty-four, which reads to optimists as deeper reasoning and to everyone else as a model thinking itself in circles. Sol reportedly tops Terminal-Bench 2.1 at 91.9%, and I'd hold the applause on that: day-one leaderboard leads have a way of not surviving the first six weeks of real use.

The change worth watching isn't a benchmark at all. On April 30, OpenAI published a post-mortem titled "Where the Goblins Came From," documenting a genuinely odd GPT-5.5 failure: the model had developed a measurable fixation on goblins, gremlins, trolls, and pigeons, turning up across hundreds of millions of responses. GPT-5.6 is reportedly the first model trained with a redesigned audit pipeline built to catch that class of drift before it ships. Reliability work like that never trends, and it decides whether a model is usable in production far more than another half-million tokens of context most retrieval stacks can't exploit anyway.

None of this is unique to OpenAI, but GPT-5.6 is the cleanest case yet of the inversion. A launch used to come first and set the terms. Now it comes last and confirms them: by the 7th the model will have been benchmarked, timed against 5.5, and picked apart by people who were never told it existed, and whatever OpenAI publishes will land as a summary of what they already found.

Sources:

Previewing GPT-5.6 Sol — OpenAI
GPT-5.6 — Wikipedia
GPT-5.6 Just Showed Up in OpenAI's Codex Logs — WaveSpeed
GPT-5.6 Rumors Heat Up as Users Swear ChatGPT Suddenly Got Smarter — Decrypt
GPT-5.6 Rumor: 1.5M Context Window — Knightli
GPT-5.6 Guide: Sol, Terra, Luna Models, Pricing, and Benchmarks — ExplainX
Will OpenAI publicly release a GPT-5.6 model by September? — Manifold

Plutonic Rainbows

GPT-5.6 Arrives in Fragments

Related Entries