Thinking Machines Talks Back
May 13, 2026 · uneasy.in/befa91f
On Monday, Mira Murati's Thinking Machines Lab previewed a product it calls an interaction model. The pitch, summarised by the Business Insider piece that ran today, is a system that handles speech the way two people actually handle it: listening and talking at the same time, taking interruptions in stride, translating between languages on the fly. The wider launch is promised for later this year. What was shown on Monday is a preview.
The architectural claim is the interesting bit, not the demo reel. Every voice assistant most people have used, including the current crop of LLM front-ends, is fundamentally turn-based. You speak, it waits, it transcribes, it thinks, it responds. Even when the latency drops to under a second, the structure is still strictly sequential. An interaction model, by contrast, runs the listening loop and the speaking loop in parallel, so the foreground stays responsive while the heavier reasoning happens underneath. The reported latency is around 0.4 seconds with internal micro-turns of about 200 milliseconds, which is roughly the point at which human conversation stops feeling like a walkie-talkie call.
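For a sense of what that parallel structure means mechanically, here is a minimal asyncio sketch of the difference. Everything in it is my own stand-in, not a description of Thinking Machines' system: the function names, the 200-millisecond chunking, the fake barge-in are all hypothetical. The only point is that the listening loop keeps consuming input while the speaking loop runs, and a flagged interruption makes the speaker yield mid-utterance instead of finishing its turn.

```python
import asyncio

# Hypothetical names throughout; a structural sketch, not a claim about how
# Thinking Machines builds its interaction model.

async def capture_audio(queue: asyncio.Queue) -> None:
    """Push ~200 ms micro-turns of 'audio' onto the queue, then a sentinel."""
    for i in range(10):
        await asyncio.sleep(0.2)              # stand-in for reading a mic buffer
        await queue.put(f"chunk-{i}")
    await queue.put(None)                     # end of input

async def speak(text: str, interrupted: asyncio.Event) -> None:
    """Stream a reply word by word, yielding as soon as a barge-in is flagged."""
    for word in text.split():
        if interrupted.is_set():
            print("[speaker] yielding to interruption")
            return
        print(f"[speaker] {word}")
        await asyncio.sleep(0.2)              # stand-in for synthesising audio

async def listen(queue: asyncio.Queue, interrupted: asyncio.Event) -> None:
    """Keep consuming input while the speaker runs; flag barge-ins on arrival."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            return
        print(f"[listener] heard {chunk}")
        if chunk.endswith("-3"):              # pretend chunk 3 is the user cutting in
            interrupted.set()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    interrupted = asyncio.Event()
    # Turn-based: await listen() to completion, then call speak().
    # Duplex: both loops run at once, and the speaker defers when flagged.
    await asyncio.gather(
        capture_audio(queue),
        speak("right, so the first thing you want to check is the stopcock", interrupted),
        listen(queue, interrupted),
    )

asyncio.run(main())
```

In a turn-based system the comment in `main` is the whole architecture: listen, then speak, strictly in sequence. The duplex version is the same two loops scheduled together, which is why the hard part is not the concurrency itself but keeping the generator coherent while the input keeps arriving.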
Whether this generalises beyond a demo is a separate question. Full-duplex audio is not a new idea in research; it has been sitting around in conversational systems work for years, and shipping it as a product is mostly an engineering exercise in keeping a generative model coherent while it is being talked over. The hard part, historically, has been preventing the model from collapsing into either a babbling overlap or a panicked silence the moment the input pattern departs from the training distribution. Real interruptions are messy. People trail off, backtrack, change their minds mid-clause. You can build a duplex system that handles a scripted interview beautifully and falls apart on a phone call to a plumber.
The other half of the story, in the same Business Insider report, is that Thinking Machines has now lost roughly a third of its founding team to OpenAI, Meta and xAI, with the one-year vesting cliff as one of the levers. I wrote about an earlier wave of those departures in January. The pattern has not changed since. A lab founded by a charismatic ex-incumbent raises a fortune at a valuation that prices in superintelligence, and the talent it gathered to justify that valuation gets bought back, individually, by the labs with even more compute and even larger compensation envelopes. The product gets shipped or it doesn't. The founders mostly end up somewhere else.
So I'm reading the interaction-model preview with two different kinds of attention. As a technical demonstration, it's a genuinely fresh framing of what an LLM-fronted voice product can be, and it points at a future where the dominant mode is duplex rather than turn-based. As a corporate signal, it is the kind of thing a lab puts out when it needs to show the market that the engineering core still functions even while the org chart is being rewritten in real time. Both readings can be true. They usually are.
The thing I want to know is whether the underlying model is small enough to run on a phone, or whether the 0.4-second latency depends on a hyperscaler-grade GPU sitting on the other end of a private fibre. The press release does not say, which is itself a signal. Watch for the API price when the wider launch happens. That will tell you which of those two worlds we're in.