Muse Spark

Muse Spark is the first model in the Muse family developed by Meta Superintelligence Labs. It is a natively multimodal reasoning model that supports tool use, visual chain of thought, and multi-agent orchestration. Its Contemplating mode runs multiple agents that reason in parallel. The model reports competitive performance on multimodal perception, reasoning, health, and agentic tasks, with Contemplating mode achieving 58.4% on Humanity's Last Exam and 38.3% on FrontierScience Research (both self-reported).

Benchmark results

Benchmark                 Score   Tags           Source
ARC-AGI v2                42.5%   self-reported  llm-stats
CharXiv-R                 86.4%   self-reported  llm-stats
DeepSearchQA              74.8%   self-reported  llm-stats
ERQA                      64.7%   self-reported  llm-stats
FrontierScience Research  38.3%   self-reported  llm-stats
GDPval-AA                 1,444   self-reported  llm-stats
GPQA                      89.5%   self-reported  llm-stats
HealthBench Hard          42.8%   self-reported  llm-stats
Humanity's Last Exam      58.4%   self-reported  llm-stats
IPhO 2025                 82.6%   self-reported  llm-stats
LiveCodeBench Pro         0.8     self-reported  llm-stats
MedXpertQA                78.4%   self-reported  llm-stats
MMMU-Pro                  80.4%   self-reported  llm-stats
ScreenSpot Pro            84.1%   self-reported  llm-stats
SimpleVQA                 71.3%   self-reported  llm-stats
SWE-Bench Pro             52.4%   self-reported  llm-stats
SWE-Bench Verified        77.4%   self-reported  llm-stats
Tau2 Telecom              91.5%   self-reported  llm-stats
Terminal-Bench 2.0        59.0%   self-reported  llm-stats
ZEROBench                 33.0%   self-reported  llm-stats