MiniMax M3
MiniMax M3 is the first open-weight model to combine three frontier capabilities: top-tier coding and agentic performance, a 1M-token context window, and native multimodality. It is powered by MiniMax Sparse Attention (MSA), a new sparse attention architecture that partitions the KV cache into blocks to cut per-token compute at long context. At 1M tokens this comes to roughly 1/20 the cost of the previous generation, with more than 9x faster prefill and more than 15x faster decode, while matching full attention on most capabilities. Trained on mixed-modality data from step zero, across more than 100T tokens, M3 natively supports image and video input and can operate a desktop computer. On SWE-Bench Pro it scores 59.0%, surpassing GPT-5.5 and Gemini 3.1 Pro and approaching Opus 4.7; on BrowseComp it scores 83.5%, surpassing Opus 4.7. Thinking can be toggled on or off per request.
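The description says only that MSA partitions the KV cache into blocks; it does not say how blocks are chosen. As a rough illustration of why block partitioning can cut per-token compute, the sketch below shows a generic block-sparse attention step for one query. The function names, the mean-key block summary, and the `block_size` and `top_k` parameters are all assumptions made for this example. They are not taken from MSA and should not be read as its actual design.

```python
import numpy as np

def full_attention(q, K, V):
    """Dense softmax attention for one query over all cached tokens."""
    logits = K @ q / np.sqrt(K.shape[1])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V

def block_sparse_attention(q, K, V, block_size=4, top_k=2):
    """Attend one query to only the top_k highest-scoring KV blocks.

    q: (d,) query; K, V: (n, d) cached keys and values.
    Scoring blocks by their mean key is a hypothetical choice for
    illustration, not the selection rule used by MSA.
    """
    n, d = K.shape
    n_blocks = -(-n // block_size)  # ceiling division
    starts = np.arange(n_blocks) * block_size
    # Summarize each block by the mean of its keys (assumption).
    summaries = np.stack([K[s:s + block_size].mean(axis=0) for s in starts])
    # Score every block against the query and keep the top_k best.
    block_scores = summaries @ q
    keep = np.sort(np.argsort(block_scores)[-top_k:])
    # Collect the token indices covered by the selected blocks.
    idx = np.concatenate(
        [np.arange(s, min(s + block_size, n)) for s in starts[keep]]
    )
    # Ordinary softmax attention, restricted to the selected tokens.
    logits = K[idx] @ q / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[idx], idx
```

In this sketch the query touches `top_k * block_size` cached tokens plus one summary vector per block, instead of all `n` tokens, which is where the savings at long context come from. Selecting every block reproduces dense attention exactly, so the approximation error comes only from the blocks that are skipped.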
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| APEX-Agents | 27.7% | self-reported llm-stats |
| BankerToolBench | 76.1% | self-reported llm-stats |
| BrowseComp | 83.5% | self-reported llm-stats |
| CL-bench | 20.5% | self-reported llm-stats |
| Claw-Eval | 74.5% | self-reported llm-stats |
| DRACO | 73.2% | self-reported llm-stats |
| GDPval-Rubrics | 74.8% | self-reported llm-stats |
| IMO 2025 | 35 | self-reported llm-stats |
| KernelBench Hard | 28.8% | self-reported llm-stats |
| LiveSQLBench | 40.2% | self-reported llm-stats |
| LOCA-Bench (256k) | 49.3% | self-reported llm-stats |
| MCP Atlas | 74.2% | self-reported llm-stats |
| MMMU-Pro | 78.1% | self-reported llm-stats |
| NL2Repo | 42.1% | self-reported llm-stats |
| OfficeQA Pro | 45.1% | self-reported llm-stats |
| OmniDocBench 1.5 | 91.6% | self-reported llm-stats |
| OSWorld-Verified | 70.1% | self-reported llm-stats |
| PaperBench | 52.6% | self-reported llm-stats |
| PostTrainBench | 37.1% | self-reported llm-stats |
| SpreadSheetBench-v1 | 89.3% | self-reported llm-stats |
| SVG-Bench | 63.7% | self-reported llm-stats |
| SWE Atlas - Codebase QnA | 37.9% | self-reported llm-stats |
| SWE Atlas - Test Writing | 30.8% | self-reported llm-stats |
| SWE-Bench Pro | 59.0% | self-reported llm-stats |
| SWE-Bench Verified | 80.5% | self-reported llm-stats |
| SWE-fficiency | 34.8% | self-reported llm-stats |
| Terminal-Bench 2.1 | 66.0% | self-reported llm-stats |
| USAMO 2026 | 36 | self-reported llm-stats |
| VIBE-V2 | 50.1% | self-reported llm-stats |
| Video-MME | 85.4% | self-reported llm-stats |
| VideoMMMU | 84.6% | self-reported llm-stats |
| YC-Bench | 2,100,000 | self-reported llm-stats |