OpenAI shipped GPT-5.5 this morning, and the two numbers that matter are not on the benchmark slide. They are on the pricing page. Five dollars per million input tokens. Thirty per million output. That is steep enough to make you ask who, exactly, this model is for.

The benchmarks are real. Terminal-Bench 2.0 lands at 82.7%, Expert-SWE at 73.1%, both meaningful jumps on the kinds of tasks where a model has to plan, execute, recover from a failed step, and try again. The model ships with a one-million-token context window, available across the Plus, Pro, Business, and Enterprise tiers via ChatGPT and the API. For coding agents and computer-use workflows, this is a serious upgrade. For the person opening ChatGPT to draft an email, it is not. Gizmodo's headline put it best: "0.1% more excited about the future of ChatGPT."

Sam Altman's own framing was telling. "I personally like it," he posted on X, which is the kind of endorsement you give a sandwich. The line he was pushing harder was about the inference team: "Really excellent work by the inference team to serve this model so efficiently. To a significant degree, we have to become an AI inference company now." That is not an aside. That is OpenAI admitting, in public, that the differentiation game has moved from training to serving. Anyone can train a frontier model now, or close to it. Whether you can run it cheaply at scale, with predictable latency, against contracts that will not bankrupt you when token usage spikes: that is the actual moat.

The pricing makes more sense in that frame. OpenAI is not aiming this at the consumer who wants a slightly better autocomplete. It is aiming at the enterprise customer building agents that consume tokens by the billion, where higher per-token cost is offset by the model burning fewer tokens per task and finishing more reliably. The MLQ writeup notes that 5.5 maintains prior latency while using fewer tokens for efficiency, which is the language of someone who has been told to optimise the unit economics, not the wow factor.
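The "fewer tokens per task, finishing more reliably" argument can be sketched as a toy model. Everything below is hypothetical: the function name, the token counts, and the success rate are placeholders I have invented for illustration, not measured figures. The only point is the shape of the comparison: a failed attempt still burns tokens, so reliability compounds with per-token price.

```python
def cost_per_completed_task(price_in, price_out, tokens_in, tokens_out, success_rate):
    """Expected dollars to get one *successful* task completion.

    Prices are dollars per million tokens. A failed attempt still burns
    tokens, so the expected number of attempts per success is 1 / success_rate.
    """
    per_attempt = (price_in * tokens_in + price_out * tokens_out) / 1_000_000
    return per_attempt / success_rate

# Hypothetical agent run at GPT-5.5 list prices: 200k input tokens,
# 20k output tokens, 90% task success. Task numbers are made up.
cost = cost_per_completed_task(5.00, 30.00, 200_000, 20_000, 0.9)
print(f"${cost:.2f} per completed task")  # $1.78
```

On this framing, the per-token sticker price is the wrong axis: what the enterprise buyer is comparing is cost per completed task, where both token efficiency and failure rate enter the denominator.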

Compare this with what happened the same morning across the Pacific. DeepSeek previewed V4 Pro at $0.145 per million input tokens and $3.48 per million output, with a million-token context window, open weights, running on Huawei Ascend silicon. GPT-5.5 is a better model, almost certainly. It is also roughly thirty-five times more expensive on input and nearly nine times more expensive on output. Whether that gap is defensible depends on whether the agentic tasks OpenAI is targeting actually need GPT-5.5-grade reasoning, or whether V4 Pro is good enough at a fraction of the cost.

The other thing worth noting is the cadence. In the past week alone, OpenAI shipped a new image generator, workspace agents, a personally-identifiable-information redaction model, a Codex update, and now 5.5. The shipping pace I wrote about in March has not slowed; if anything, it has accelerated. None of these launches individually feels like an event. Stacked together, they look like a company trying to occupy every adjacent surface before someone else does.

GPT-5.5 is not the model that takes us to AGI, despite Magic Path's Pietro Schirano reposting that exact framing and Altman amplifying it. It is a competent step on a long curve, priced like a strategic asset rather than a consumer product, optimised for a customer profile that is not most of us. The interesting question is whether the inference-company pivot actually works. Models keep getting commoditised. Serving them well, at predictable cost, might not.

Sources: