Meta has unveiled Muse Spark, the first AI model produced by its Meta Superintelligence Labs, the new AI research unit it created last year and has spent billions of dollars to staff and equip.

The model is, according to benchmark tests that Meta published, competitive with leading AI models from OpenAI, Anthropic, and Google across many tasks, although it does not surpass them across the board. Still, if the benchmark results hold up when tested by independent experts, Muse Spark seems to put Meta back in the AI race after its last AI model, Llama 4, which was released in April 2025, was widely panned as a dud.

In the past, however, Meta has been caught manipulating the published benchmark results of an AI model to make it appear more capable than the version available to most users actually was. This was the case with Meta’s Llama 4 benchmarks, in which the company later admitted to using specialized, unreleased versions of the model, fine-tuned for specific tasks, to boost benchmark scores in those areas, while the general version made available to all users did not perform as well.

And there’s another catch. Few people will be able to use the new Meta model outside of the company’s own product ecosystem. Unlike Meta’s previous AI models, which were released as “open weight” models—meaning anyone could download the models for free and run them on their own equipment, as well as modify and fine-tune them as they wished—Muse Spark is, at least for the moment, primarily an in-house tool for Meta.

The model currently powers the Meta AI assistant in the company’s standalone Meta AI app and on meta.ai. The company said it will be rolling it out to WhatsApp, Instagram, Facebook, Messenger, and Meta’s Ray-Ban AI glasses in the coming weeks. It also said it will offer the model in a “private preview” to select partners through an application programming interface (API). That makes Muse Spark even more proprietary than the paid proprietary models offered by Meta’s rivals. (Meta said in a blog post that it hopes to open-source future versions of the model.)

Muse Spark is Meta’s first reasoning model, meaning it can work through a problem step by step, trying different strategies if its initial approach doesn’t work. The company’s previous models were all designed to produce an instant answer based on the model’s training. Muse Spark is also a multimodal model that can take in and output both text and images. In addition, the model supports the use of other software tools and can help orchestrate the work of multiple subagents, according to a technical blog post released by Meta.

In its blog post announcing the new model, Meta describes Muse Spark as “small and fast by design, yet capable enough to reason through complex questions in science, math, and health.” It describes the model as the first in a series of new models, with Muse Spark being used to validate the architecture and training regime Meta is using, before the company scales this up to larger and even more powerful models in the same family.

The model also has a “Contemplating” or “Thinking” mode in which it can spin up subagents to reason about different parts of a task in parallel. Meta said in a technical blog post it published on the new model that this mode allows Muse Spark “to compete with the extreme reasoning modes of frontier models such as Gemini Deep Think and GPT Pro.”

The benchmark results published alongside the launch paint a picture of a model that is competitive but not dominant. For instance, on the GPQA Diamond benchmark, which is designed to test PhD-level reasoning skill, Muse Spark scored 89.5%, slightly trailing Gemini 3.1 Pro’s 94.3% as well as the 92.7% and 92.8% scored by Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4, respectively. On a leading health benchmark, HealthBench Hard, Muse Spark beat all rival models with a score of 42.8%, which was far better than either Opus 4.6 or Gemini 3.1 Pro, and slightly better than GPT-5.4.

Meta acknowledged the performance gaps. Its technical blog post states that the company continues “to invest in areas with current performance gaps, specifically long-horizon agentic systems and coding workflows.”

The Muse Spark launch is the most tangible product yet of the sweeping reorganization Meta undertook after the Llama 4 fiasco. In June 2025, Meta spent $14.3 billion to acquire a 49 percent non-voting stake in Scale AI and brought in its cofounder and CEO, Alexandr Wang, as Meta’s first-ever chief AI officer.

Wang has been tasked with leading a newly created Meta Superintelligence Labs unit. Wang and Zuckerberg went on a talent acquisition spree, offering AI researchers at rival AI labs pay packages that reportedly climbed into the hundreds of millions of dollars when equity was included. The company has also committed hundreds of billions of dollars to build out AI computing infrastructure to support its new AI drive.

There has since been further reorgani

Meta unveils Muse Spark, its first AI model since hiring Alexandr Wang and a bellwether for CEO Mark Zuckerberg’s multi-billion dollar AI push