MiniMax M2.7
MiniMax M2.7 features model self-improvement aimed at productivity gains. It can build complex agent harnesses on its own to carry out highly complex productivity tasks. M2.7 performs well in real-world software engineering, including end-to-end project delivery, log analysis, code security, and ML tasks; on SWE-Bench Pro it scores 56.22%, nearly matching Opus. In professional office domains, it achieves the highest ELO among open-source models on GDPval-AA (1495) and shows significant improvement in complex editing for Office Suite. M2.7 also maintains 97% skill adherence across 40 complex skill cases.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Artificial Analysis | 50.0% | self-reported llm-stats |
| GDPval-AA | 1,494 | self-reported llm-stats |
| MLE-Bench Lite | 66.6% | self-reported llm-stats |
| MM-ClawBench | 62.7% | self-reported llm-stats |
| Multi-SWE-Bench | 52.7% | self-reported llm-stats |
| NL2Repo | 39.8% | self-reported llm-stats |
| SWE-bench Multilingual | 76.5% | self-reported llm-stats |
| SWE-Bench Pro | 56.2% | self-reported llm-stats |
| Terminal-Bench 2.0 | 57.0% | self-reported llm-stats |
| Toolathlon | 46.3% | self-reported llm-stats |
| VIBE-Pro | 55.6% | self-reported llm-stats |