MiniMax M3

MiniMax M3 is the first open-weight model to combine three frontier capabilities: top-tier coding and agentic performance, a 1M-token context window, and native multimodality. It is built on MiniMax Sparse Attention (MSA), a new sparse attention architecture that partitions the KV cache into blocks to cut per-token compute at long context. At 1M tokens, MSA runs at roughly 1/20 the cost of the previous generation, with more than 9x faster prefill and more than 15x faster decode, while matching full attention on most capabilities. Trained on mixed-modality data from step zero across more than 100T tokens, M3 natively supports image and video input and can operate a desktop computer. It scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro and approaching Opus 4.7, and 83.5% on BrowseComp, surpassing Opus 4.7. Thinking can be toggled on or off at request time.
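The idea of partitioning the KV cache into blocks can be illustrated with a generic block-sparse attention sketch for a single decode step. This is not MSA itself, whose details are not given here. The block-selection rule (dot product of the query with each block's mean key), the function names, and the `block_size` and `top_k` parameters are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(q, keys, values):
    """Standard scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend only to the top_k highest-scoring blocks of the KV cache.

    Hypothetical selection rule: score each block by q . mean(keys in block).
    A real system would cache the per-block summaries instead of recomputing
    them for every query, as this sketch does.
    """
    n = len(keys)
    # Partition cache positions into contiguous blocks.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    def block_score(idx):
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # keep the original position order
    selected = [i for b in chosen for i in blocks[b]]

    out = full_attention(q, [keys[i] for i in selected], [values[i] for i in selected])
    return out, selected
```

With cached block summaries, each decoded token would score `n / block_size` summaries and then attend to about `top_k * block_size` keys instead of all `n`, which is where the per-token saving at long context would come from in a scheme like this. Setting `top_k` to the number of blocks recovers full attention exactly.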

Benchmark results

Benchmark                 Score      Tags           Source
APEX-Agents               27.7%      self-reported  llm-stats
BankerToolBench           76.1%      self-reported  llm-stats
BrowseComp                83.5%      self-reported  llm-stats
CL-bench                  20.5%      self-reported  llm-stats
Claw-Eval                 74.5%      self-reported  llm-stats
DRACO                     73.2%      self-reported  llm-stats
GDPval-Rubrics            74.8%      self-reported  llm-stats
IMO 2025                  35         self-reported  llm-stats
KernelBench Hard          28.8%      self-reported  llm-stats
LiveSQLBench              40.2%      self-reported  llm-stats
LOCA-Bench (256k)         49.3%      self-reported  llm-stats
MCP Atlas                 74.2%      self-reported  llm-stats
MMMU-Pro                  78.1%      self-reported  llm-stats
NL2Repo                   42.1%      self-reported  llm-stats
OfficeQA Pro              45.1%      self-reported  llm-stats
OmniDocBench 1.5          91.6%      self-reported  llm-stats
OSWorld-Verified          70.1%      self-reported  llm-stats
PaperBench                52.6%      self-reported  llm-stats
PostTrainBench            37.1%      self-reported  llm-stats
SpreadSheetBench-v1       89.3%      self-reported  llm-stats
SVG-Bench                 63.7%      self-reported  llm-stats
SWE Atlas - Codebase QnA  37.9%      self-reported  llm-stats
SWE Atlas - Test Writing  30.8%      self-reported  llm-stats
SWE-Bench Pro             59.0%      self-reported  llm-stats
SWE-Bench Verified        80.5%      self-reported  llm-stats
SWE-fficiency             34.8%      self-reported  llm-stats
Terminal-Bench 2.1        66.0%      self-reported  llm-stats
USAMO 2026                36         self-reported  llm-stats
VIBE-V2                   50.1%      self-reported  llm-stats
Video-MME                 85.4%      self-reported  llm-stats
VideoMMMU                 84.6%      self-reported  llm-stats
YC-Bench                  2,100,000  self-reported  llm-stats