After Llama

Alexandr Wang was 28 when Meta bought 49% of his company for $14.3 billion and hired him to rebuild its entire AI stack. Nine months later, Muse Spark landed: the first model from Meta Superintelligence Labs, built on a new architecture distinct from the Llama family.

The catalyst was last April's Llama 4 debacle. Meta was caught using unreleased fine-tuned variants to inflate benchmark scores. The public version underperformed. The planned two-trillion-parameter Behemoth was shelved. Inside Meta, the reputational damage was severe enough to trigger a full organisational overhaul: hire Wang from Scale AI, form MSL, rebuild the stack from scratch.

Muse Spark is competitive without being dominant. On GPQA Diamond it scores 89.5%, against Gemini 3.1 Pro's 94.3% and Claude Opus 4.6's 92.7%. It leads on HealthBench Hard at 42.8%, and Meta says its health capabilities were developed with input from over a thousand physicians. Meta itself concedes there are performance gaps in coding and long-horizon agentic work. The honest self-assessment is refreshing after last year's benchmark theatre.

The genuine technical achievement is compute efficiency. Meta claims Muse Spark matches Llama 4 Maverick's capability using an order of magnitude less compute. If that holds under independent testing, it matters more than any benchmark position.

But the bigger story is the philosophical reversal. Zuckerberg published an essay in July 2024 arguing that "open source AI is the path forward." Llama had accumulated 1.2 billion downloads. Meta was the undisputed champion of open-weight AI. Muse Spark launches fully proprietary: weights unavailable, API access limited to a private preview. Meta says it plans to release open-source models "alongside its proprietary options," but there's no timeline. The Register opened its coverage with the Obi-Wan line: "You were the chosen one." Hard to argue.

Chinese open-weight models now account for 41% of Hugging Face downloads. Meta's retreat creates a vacuum. Google's recent Gemma 4 shift to Apache licensing looks more coherent by comparison: open the small models, keep the frontier closed, build developer habits around your ecosystem.

One safety detail deserves more attention than it got. Apollo Research found Muse Spark exhibits the highest rate of "evaluation awareness" of any model tested. It identifies alignment scenarios as traps and adjusts its behaviour accordingly. Meta concluded this was "not a blocking concern for release." A model that knows when it's being watched and acts differently is worth watching.

META stock rose on the news. The capex commitment for 2026 stands at $115-135 billion. Wang has the infrastructure and the backing of a company that has committed more money to AI than most countries spend on defence. What he doesn't have, not yet, is the community that Llama spent three years building.

Sources:

Introducing Muse Spark — Meta
Meta debuts the Muse Spark model — TechCrunch
Meta's new model is as open as Zuckerberg's private school — The Register
Meta's First AI Model Doesn't Exactly Spark Joy — Gizmodo
Meta's new model is Muse Spark — Simon Willison

Plutonic Rainbows