Plutonic Rainbows

When Talent Returns to Where the Compute Lives

The news from Thinking Machines Lab landed this week with a thud that reverberated across the AI industry. Barret Zoph, the startup's co-founder and chief technology officer, has departed — reportedly dismissed after Mira Murati discovered he had shared confidential company information with competitors. Shortly afterward, OpenAI confirmed that Zoph, along with fellow co-founders Luke Metz and Sam Schoenholz, would be returning to the company they left barely a year ago. Additional departures followed: researcher Lia Guy heading to OpenAI, and at least one other senior staff member, Ian O'Connell, also leaving. The exodus comes just six months after Thinking Machines closed a record-breaking $2 billion funding round that valued the company at $12 billion.

I have watched this pattern before. A star executive leaves a dominant incumbent to start something new. They raise enormous sums on the strength of their reputation and the promise of a different approach. They recruit top talent with equity stakes and the allure of building from scratch. Then reality intrudes. The resources that seemed abundant prove insufficient. The freedom that attracted them becomes indistinguishable from the absence of infrastructure. The gravitational pull of the incumbents — with their data, their compute, their distribution — proves difficult to escape. Talent returns to where the leverage lives.

The circumstances of Zoph's departure are murky and contested. WIRED reported allegations of confidential information being shared with competitors. OpenAI's statement claimed they "do not share these concerns" about the conduct in question. The truth likely lies somewhere in the middle, obscured by competing narratives and legal considerations. However, the specific reasons matter less than what the broader departure pattern reveals about the structural challenges facing AI startups in the current moment.

Thinking Machines was supposed to be different. Murati brought impeccable credentials — former CTO of OpenAI during its most transformative period, architect of the GPT-4 launch, experienced navigator of the complex terrain where research meets product. The founding team combined deep technical expertise with operational experience at the frontier. The funding — $2 billion in a seed round led by Andreessen Horowitz, with participation from Nvidia, AMD, and Jane Street — should have provided runway measured in years, not months. If any startup could challenge the incumbents, this one had the pedigree.

What went wrong remains subject to speculation, but the Fortune reporting offers clues: concerns about compute constraints, uncertainty about product direction, questions about business model clarity. These are not idiosyncratic failures. They are the predictable challenges that emerge when you attempt to build a frontier AI lab from scratch in an industry where the moat is measured in data centre capacity and the cost of a training run can exceed the GDP of small nations.

The compute problem deserves particular attention. Modern AI capabilities emerge from scale — vast datasets processed through enormous models on clusters of specialised hardware that cost hundreds of millions of dollars to build and operate. The incumbents have spent years and billions securing this infrastructure. They have negotiated long-term contracts with cloud providers, built their own data centres, and cultivated relationships with chip manufacturers that give them privileged access to scarce supply. A startup with $2 billion can rent compute. It cannot replicate a decade of infrastructure investment.

This creates a dynamic where the most talented researchers face a stark choice. They can join a startup and spend their time waiting for training runs that never quite have enough capacity, debugging infrastructure that more established labs solved years ago, and watching their equity stakes lose value as funding conditions tighten. Or they can return to the incumbents, where the compute is plentiful, the infrastructure is mature, and the work can proceed at pace. The choice is not about loyalty or courage. It is about where one can have the most impact with limited time.

Additionally, the talent dynamics compound the resource constraints. Each departure from a startup makes subsequent departures more likely. When senior researchers leave, the remaining team inherits their responsibilities without inheriting their expertise. Projects stall. Institutional knowledge evaporates. The researchers who remain watch their colleagues depart for better-resourced environments and wonder whether they should follow. The startup that loses its CTO must either promote from within — elevating someone who now lacks the team they were supposed to lead — or recruit externally into a situation that looks increasingly precarious. Soumith Chintala, the PyTorch co-creator appointed as Thinking Machines' new CTO, inherits a formidable challenge.

I find myself thinking about what Murati must be experiencing. She left OpenAI at the peak of her influence to build something independent. She assembled a team of people she had worked with, people she trusted. She raised more money in a seed round than most companies raise in their entire existence. Yet here she is, less than eighteen months later, watching the founding team scatter back to the place they left together. The personal dimension of this — the sense of a shared vision unravelling — must be acute.

However, I resist the temptation to read this as a story of individual failure. The structural forces arrayed against AI startups are formidable. The incumbents have compounding advantages that grow with each passing quarter. They have the compute, the data, the distribution channels, the customer relationships, and the regulatory relationships that startups must build from nothing. They have the ability to hire talent at compensation levels that would destroy a startup's cap table. They have the staying power that comes from diversified revenue streams and patient capital.

The implications extend beyond Thinking Machines. Every AI startup must now confront the question of whether the independent path remains viable. The investors who funded Murati's venture will scrutinise future pitches more carefully. The researchers contemplating startup opportunities will weight the risks more heavily. The narrative that talented people can leave incumbents and build competitive alternatives — a narrative that sustained much of the tech industry's dynamism over the past decades — will face renewed scepticism.

Perhaps this is simply the maturation of a young industry. In the early days of any technology, garage-scale innovation can compete with established players because the technology itself is immature and advantage accrues to insight rather than infrastructure. As the technology matures, scale becomes decisive. The semiconductor industry consolidated. The cloud computing industry consolidated. The AI industry may be following the same trajectory, compressing a decades-long pattern into a handful of years.

The talent will go where it can be most effective. The compute will remain where it has already been built. The startups that survive will be those that find niches the incumbents cannot easily address — vertical applications, specialised domains, markets too small to attract attention from companies optimising for billion-user scale. The era of challenging OpenAI and Anthropic and Google head-on may already be closing. Thinking Machines' struggles suggest the window was narrower than anyone wanted to believe.

I watch the departures from Thinking Machines Lab and I see not failure but physics. Talent flows toward leverage. Leverage concentrates where resources accumulate. Resources accumulate where previous advantages compound. The gravity is real. The escape velocity is higher than anyone expected.

When Speed Becomes the Only Moat

I have watched the AI industry obsess over latency for the past eighteen months with growing unease. Every product announcement now leads with response time. Every benchmark comparison highlights milliseconds saved. Every funding pitch emphasizes infrastructure speed above all else. This fixation on velocity has calcified into something more concerning than a mere trend — it has become the primary competitive moat that companies believe will protect them from disruption.

The logic seems straightforward at first. Users prefer faster responses. Developers build applications around snappy interactions. Products that feel instant create better experiences than those that lag. Therefore, the reasoning goes, the company with the lowest latency wins the market. However, this reasoning collapses when you examine what gets sacrificed in pursuit of pure speed.

I find myself increasingly troubled by how latency optimization crowds out other forms of innovation. When a company invests billions in custom silicon and global edge networks to shave milliseconds off response times, those resources cannot simultaneously fund research into more capable models or better reasoning architectures. The opportunity cost becomes staggering. We optimize for speed at the expense of depth, reliability, and genuine capability improvements.

The infrastructure arms race this creates benefits nobody except hardware vendors and cloud providers. Smaller companies cannot compete on latency alone. They lack the capital to build worldwide inference networks or manufacture specialized chips. As a result, the entire competitive landscape narrows to a handful of well-funded players who can afford the infrastructure. This consolidation stifles the diversity of approaches that drives meaningful progress in any technical field.

Additionally, the emphasis on latency moats encourages companies to optimize for metrics that users care about least. When I use an AI system, I rarely notice whether it responds in 200 milliseconds versus 400 milliseconds. The difference feels imperceptible in practice. What I do notice — what genuinely affects my experience — is whether the system understands my intent, provides accurate information, and handles edge cases gracefully. These qualities have nothing to do with infrastructure speed and everything to do with model quality and system design.

The pursuit of latency advantages also creates technical debt that compounds over time. Companies optimize their inference pipelines so aggressively that they become brittle and difficult to modify. They lock themselves into specific hardware platforms or network architectures. When better modeling approaches emerge, these companies find themselves unable to adopt them because their entire system has been fine-tuned for speed above flexibility. The moat they built to keep competitors out also walls them in.

I have seen this pattern before in other industries. Database companies once competed primarily on query speed. Web hosting providers marketed themselves on page load times. Content delivery networks built entire businesses around millisecond improvements. In each case, the performance advantage proved temporary. Competitors eventually caught up, and the companies that survived were those that had invested in differentiated value beyond raw speed.

The danger becomes more acute when companies mistake infrastructure advantages for product advantages. A fast inference engine is not a product — it is merely infrastructure. Users do not purchase infrastructure; they purchase solutions to problems. A system that responds instantly but provides mediocre answers loses to one that thinks for three seconds but gets things right. Yet the obsession with latency moats pushes companies to prioritize the former over the latter.

Furthermore, the latency focus creates perverse incentives around model development. If your primary competitive advantage stems from fast inference, you naturally gravitate toward smaller, simpler models that run quickly. You avoid complex reasoning approaches that might improve accuracy but add latency. You resist architectures that could unlock new capabilities but require more compute. The entire research agenda becomes constrained by infrastructure considerations rather than driven by what would make the systems genuinely more useful.

I worry particularly about how this affects the trajectory of AI development broadly. When the industry's most successful companies anchor their competitive strategy on infrastructure speed, they signal to everyone else that this is where value lives. Startups mimic the approach. Investors reward it. Researchers orient their work around it. The entire field converges on a narrow definition of progress that may not align with what we actually need from these systems.

The environmental cost also deserves consideration. Building global inference networks and manufacturing custom silicon at scale consumes enormous energy and resources. When companies compete primarily on latency, they must continuously expand this infrastructure to maintain their advantage. This creates an escalating resource consumption cycle that seems divorced from any proportional increase in actual utility delivered to users. We optimize for milliseconds while burning through electricity and rare earth metals.

I have also observed how latency moats affect the talent market in troubling ways. The most capable engineers get funneled into infrastructure optimization rather than working on fundamental advances in AI capabilities — a concentration of talent flowing toward where the infrastructure lives. Companies hire brilliant researchers and set them to work on CUDA kernel optimization and network topology refinement. These are valuable skills, but they represent a misallocation when we still have so many unsolved problems in making AI systems reliable, truthful, and genuinely helpful.

The alternative approach seems obvious yet gets surprisingly little attention. Companies could compete on the quality of their outputs, the reliability of their systems, their ability to handle complex tasks, their transparency about limitations, or their success at solving real user problems. These dimensions of competition would drive innovation toward making AI systems actually better rather than merely faster.

I recognize that latency matters for certain applications. Real-time systems legitimately require quick responses. Interactive experiences benefit from snappiness. However, the current industry dynamic has elevated latency from one consideration among many to the primary basis for competitive differentiation. This represents a fundamental misalignment between what companies optimize for and what users need.

The path forward requires consciously resisting the latency moat trap. We need companies willing to compete on dimensions other than pure speed. We need investors who reward sustainable advantages built on genuine capability improvements. We need users who demand quality over quickness. Most importantly, we need industry leaders who recognize that the race to zero latency is ultimately a race to nowhere — a competition that consumes enormous resources while delivering diminishing returns.

I remain cautiously optimistic that this phase will pass. As infrastructure commoditizes and latency advantages narrow, companies will have no choice but to compete on other dimensions. The question is how much time, money, and talent we waste before reaching that inevitable conclusion. The longer we remain fixated on speed as the primary moat, the longer we delay building AI systems that genuinely serve human needs rather than just serving them quickly.

Claude Pro Subscription

I’m really struggling with the Pro subscription because the usage allowance runs out far too quickly to be genuinely useful for my workflow. As a result, my project tasks are now backing up — I’ve already hit the usage limits with more than a week still to go before the monthly reset. At this rate, I’m going to have to seriously consider moving back up to the £90 tier so I have enough capacity to keep work moving without frequent interruptions.

No System Can Verify Its Own Blind Spots

I have spent considerable time thinking about a question that recurs in nearly every serious discussion of AI safety: can a large language model police itself? The answer, I believe, is no — and the reasons why illuminate something important about the nature of intelligence, accountability, and the limits of self-knowledge.

The appeal of self-policing AI is obvious. If we could build systems that monitor their own outputs, detect their own errors, and correct their own behaviour, we would have solved one of the most difficult problems in AI safety. We could deploy increasingly capable systems without proportionally increasing human oversight. The machines would watch themselves. The mathematics of scale would work in our favour rather than against us.

However, this vision collapses under scrutiny. The fundamental problem is epistemic: an LLM has no privileged access to truth. It does not possess an internal oracle that distinguishes correct outputs from incorrect ones. What it has instead is a vast pattern-matching apparatus trained on human-generated text — a system that infers probable responses based on statistical regularities in its training data. When the model evaluates its own output, it does so using the same apparatus that generated the output in the first place. The blind spots that produced the error are the same blind spots that evaluate the result.

This limitation runs deeper than it might initially appear. Consider what happens when a model attempts self-critique. The critique emerges from the same learned distributions, the same embedded assumptions, the same correlated errors. If the model's training data contained systematic biases — and all training data contains systematic biases — those biases will appear in both the original output and the evaluation of that output. The model cannot see what it was never shown. It cannot correct for patterns it does not recognise as patterns. A self-evaluation loop that uses the same flawed instrument to assess itself does not reduce error. It amplifies it.

I find it useful to distinguish between what models can actually do and what we might wish they could do. Models can compare outputs against predefined rules. They can check whether a response violates explicit constraints, matches specified formats, or contains forbidden content. This is procedural compliance — following instructions, not making judgments. The model does not decide what counts as harmful; it executes rules that humans wrote. The safety comes from the human-authored constraints, not from any capacity for moral or epistemic evaluation within the system itself.
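To make the distinction concrete, here is a minimal sketch of what such a procedural layer looks like, assuming hypothetical rules: an illustrative length cap, a required answer format, and a small blocklist. None of these names or thresholds come from any real deployment; the point is simply that every judgment is encoded in advance by a human author, and the model contributes nothing to deciding what counts as a violation.

```python
import re

# Hypothetical, human-authored constraints; the model plays no part in defining them.
FORBIDDEN_TERMS = {"credit card number", "home address"}   # illustrative blocklist
MAX_LENGTH = 2000                                          # illustrative length cap
REQUIRED_FORMAT = re.compile(r"^Answer:\s.+", re.DOTALL)   # illustrative format rule

def check_output(text: str) -> list[str]:
    """Return a list of rule violations. An empty list means the output passes."""
    violations = []
    if len(text) > MAX_LENGTH:
        violations.append("exceeds maximum length")
    if not REQUIRED_FORMAT.match(text):
        violations.append("does not match required format")
    lowered = text.lower()
    for term in FORBIDDEN_TERMS:
        if term in lowered:
            violations.append(f"contains forbidden term: {term!r}")
    return violations

if __name__ == "__main__":
    draft = "Answer: here is a summary of the requested document."
    problems = check_output(draft)
    print("pass" if not problems else problems)
```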

Models can also engage in model-on-model critique, where a separate system evaluates the output of the primary model. This architecture reduces certain error modes and catches some failures that would otherwise slip through. However, it does not escape the fundamental limitation. Both models derive from similar training distributions. Both share overlapping blind spots. The critic model may catch errors that differ from its own systematic biases, but it will miss errors that align with them. We have added a filter, not achieved genuine oversight.
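As an architectural sketch only, and assuming two opaque callables named generate and critique that stand in for separately hosted models, the pattern looks roughly like this. The names, the boolean verdict, and the retry count are all hypothetical; the structure is what matters, because it shows the critic acting as a filter over candidates rather than as genuine oversight, and collapsing the two callables into one reproduces the self-evaluation loop described earlier.

```python
from typing import Callable

# Hypothetical stand-ins for two separately hosted models.
GenerateFn = Callable[[str], str]          # prompt -> candidate answer
CritiqueFn = Callable[[str, str], bool]    # (prompt, answer) -> True if acceptable

def answer_with_critic(prompt: str,
                       generate: GenerateFn,
                       critique: CritiqueFn,
                       max_attempts: int = 3) -> str | None:
    """Generate an answer, let a second model veto it, and retry a few times.

    Errors shared by both models' training distributions are never caught:
    the critic approves exactly the kind of mistake it would make itself.
    If generate and critique are the same model, this degenerates into the
    self-evaluation loop discussed above.
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if critique(prompt, candidate):
            return candidate
    return None  # escalate to a human rather than return an unvetted answer
```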

The most robust form of model self-regulation I have encountered is uncertainty estimation — systems that express confidence levels and defer to humans when confidence is low. This approach has genuine value, as Stuart Russell argues in his case for machines that doubt themselves. A model that knows when it does not know, and that refuses to act in conditions of high uncertainty, provides a meaningful safety buffer. Yet even here, the limitation persists. Uncertainty calibration degrades under distribution shift. The model may be confidently wrong precisely when the situation differs most from its training data — which is exactly when accurate uncertainty estimation matters most. And regardless of how well calibrated the uncertainty signal becomes, someone must decide what to do when the model defers. That someone cannot be the model itself.
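A minimal sketch of the deferral pattern, assuming a hypothetical model reply that carries a self-reported confidence score and an illustrative threshold chosen by someone outside the system, might look like the following. Both the score and the threshold are precisely the things that degrade under distribution shift, which is why the low-confidence branch hands the case to a person rather than back to the model.

```python
from dataclasses import dataclass

@dataclass
class ModelReply:
    answer: str
    confidence: float  # hypothetical self-reported score in [0, 1]

CONFIDENCE_THRESHOLD = 0.8  # illustrative; chosen and owned by humans, not the model

def respond_or_defer(reply: ModelReply) -> str:
    """Act on the model's answer only when its reported confidence is high.

    Note the limits: the confidence score is produced by the same system it
    is meant to guard, and calibration is weakest exactly where the input
    differs most from the training data.
    """
    if reply.confidence >= CONFIDENCE_THRESHOLD:
        return reply.answer
    return "Deferred to human review: model confidence below threshold."
```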

The comparison to humans clarifies both the limitation and its implications. Humans make mistakes constantly. We hold contradictory beliefs, act against our stated values, and rationalise failures with impressive creativity. In this respect, LLMs are not worse than humans — they exhibit similar failure modes. However, humans operate within corrective systems that do not apply to machines. We receive physical feedback from the environment. We face social and legal consequences for our actions. We experience direct, embodied costs when we err. These feedback mechanisms do not guarantee good behaviour, but they provide external pressure that shapes behaviour over time.

LLMs lack intrinsic stakes. Nothing happens to the model when it produces a harmful output. It does not suffer consequences, learn from punishment, or feel the weight of responsibility. The system processes inputs and generates outputs according to its training. The concept of accountability has no purchase on a process that cannot experience anything at all. Responsibility, if it exists, must be imposed from outside — through human oversight, institutional constraints, and designed corrigibility. It cannot emerge from within.

This leads me to what I consider the correct framing of the problem. The question is not whether an LLM can police itself. The question is what minimum external structures are required to keep an autonomous system corrigible rather than merely consistent. Consistency is easy. A model can be perfectly internally coherent while being catastrophically wrong. Corrigibility — the property of remaining open to correction, deferring to appropriate authorities, and not resisting shutdown or modification — requires something the model cannot provide for itself: an external reference point against which its behaviour can be judged.

The implications for AI development are significant. We cannot rely on self-governance as a safety mechanism. We cannot assume that sufficiently capable models will somehow develop the capacity to constrain themselves. We must design systems that assume failure and build external structures to detect, contain, and correct it. The safety does not come from the model. It comes from the architecture around the model — the human oversight, the institutional checks, the guardrails that the model cannot unilaterally remove.

I recognise this conclusion is unsatisfying to those who hoped that AI safety could be solved from within. It would be convenient if the systems could watch themselves. It would scale better. It would require less human effort. However, convenience is not an argument. The structure of the problem does not change because we wish it were different. A system cannot audit itself with tools it controls. A judge cannot preside over their own trial. A model cannot verify its own blind spots. These are not engineering challenges to be overcome. They are structural impossibilities that constrain what we can reasonably expect from self-policing AI.

The path forward requires accepting this limitation and building accordingly. External oversight is not a temporary measure until the models become good enough to govern themselves. It is a permanent requirement, built into the architecture of safe deployment. The models will improve. The need for human judgment will not disappear.

When Realism Becomes a Disguise for Resignation

I have noticed a particular thought pattern that arrives quietly, dressed as wisdom, and proceeds to corrode everything it touches. It is the conviction that everything becomes a lesser version of what it once was — that diminishment is not merely a feature of certain experiences but the fundamental direction of life itself. The thought does not announce itself as despair. It announces itself as realism. That disguise is what makes it dangerous.

The mechanism works in two directions simultaneously. Retrospectively, it reframes the past as a lost summit: intensity, clarity, authenticity, connection. Prospectively, it narrows the future into a corridor of weaker repetitions. The present becomes an uncomfortable interval — never enough to justify itself, always compared against something already gone. Once this framing settles in, it produces effects that compound over time.

The first effect is the invalidation of the present. Even objectively positive experiences are dismissed as inferior versions of what came before. Enjoyment is permitted but never trusted. Satisfaction remains provisional, always conditional on a comparison it cannot win. The second effect is the undermining of agency. If everything is already in decline, effort feels cosmetic. Engagement feels naive. Withdrawal begins to feel like intelligence rather than what it actually is: resignation wearing a clever mask. The third effect is the hardening of perception into destiny. What begins as observation becomes belief becomes background truth. The belief stops being tested because it no longer registers as belief at all.

I find this pattern genuinely sinister because of how it operates. It is quiet, rational, internally coherent. It does not arrive with the melodrama of despair. It arrives with the measured tone of someone who has seen enough to know how things work. The mind prefers clean narratives, and "everything is less than it was" is emotionally economical. It is difficult to falsify because memory collaborates with it so willingly.

Memory edits ruthlessly. It removes boredom, anxiety, confusion, and uncertainty, leaving behind intensity and meaning. The present, unedited and unresolved, cannot compete with this reconstruction. I have caught myself romanticising periods of my life that I know — from journals, from contemporaneous evidence — were marked by significant difficulty. The past becomes a highlight reel competing against raw footage. The comparison is unfair by design.

This state is often mistaken for wisdom. It is not. Wisdom differentiates between genuine loss and cognitive distortion. The diminishment narrative collapses them into one. Some loss is real. Time does close doors. However, the narrative does not content itself with acknowledging specific losses. It insists on a universal frame. Everything. Always. The absolute is the tell.

The corrective is not optimism. I have no patience for the suggestion that one should simply think positive thoughts and watch the problem dissolve. The corrective is control — specifically, preventing the emotion from becoming totalising while remaining honest about what is actually happening.

The first discipline is separating sensation from judgment. There is a critical fork in the mental process: the sensation that something feels muted, and the judgment that it is therefore inferior and always will be. I cannot control the sensation. I can interrupt the judgment. The practice is learning to pause where description turns into conclusion. I do not need to replace the negative judgment with a positive one. I only need to refuse finality.

The second discipline is refusing global conclusions. The sinister move is always absolute: everything, nothing, always. I force specificity instead. This experience lacks intensity. This phase feels emotionally thin. These statements may be true without licensing the conclusion that all experiences will lack intensity or that life itself has entered permanent decline. Specificity keeps mood from hardening into worldview.

The third discipline involves changing the metric entirely. Early life delivers meaning through intensity. The experiences are new, the emotions are unregulated, the stakes feel absolute even when they are not. Later life, if it delivers meaning at all, does so through texture: subtlety, restraint, depth, irony, contrast. If intensity remains the only metric, decline is guaranteed by definition. The measurement system must change. Texture is quieter than intensity. It must be attended to deliberately. It does not announce itself.

The fourth discipline is containing rumination. This mindset feeds on unlimited reflection. I have learned to set boundaries — defined time to think about loss and comparison, and outside that window, acknowledgment followed by deferral. This is not avoidance. Avoidance pretends the thought does not exist. Containment acknowledges the thought and refuses to let it colonise every waking hour.

The fifth discipline is acting without emotional permission. Waiting to feel engagement before acting hands control to the very force I am trying to resist. I act because the action is structurally sound, not because it promises emotional return. Meaning sometimes follows action. Sometimes it does not. Agency must be preserved regardless. The alternative is waiting for permission that the diminishment narrative will never grant.

I do not mistake these disciplines for a cure. They are maintenance. The narrative does not disappear; it recedes, returns, recedes again. The work is ongoing because the tendency is structural. Some minds incline toward this pattern more than others. Mine does.

The quiet corrective is not hope. It is precision. Not everything is a lesser version. Some things are worse. Some are better. Some are simply different in ways that do not map onto decline at all. The sinister narrative insists on a single story. Emotional control comes from insisting on plurality — even when none of the alternatives are comforting.

That insistence is not denial. It is discipline. The distinction matters more than it might appear.