MiniMax M2.7

MiniMax M2.7 centers on model self-improvement as a driver of productivity. It can build complex agent harnesses on its own to carry out highly complex productivity tasks. M2.7 performs well on real-world software engineering, including end-to-end project delivery, log analysis, code security, and ML tasks. On SWE-Bench Pro it scores 56.22%, nearly matching Opus. In professional office work, it achieves the highest Elo among open-source models on GDPval-AA (1495) and shows significant improvement in complex editing across the Office Suite. M2.7 also maintains 97% skill adherence across 40 complex skill cases.

Benchmark results

Benchmark | Score | Tags | Source
Artificial Analysis | 50.0% | self-reported | llm-stats
GDPval-AA | 1,494 | self-reported | llm-stats
MLE-Bench Lite | 66.6% | self-reported | llm-stats
MM-ClawBench | 62.7% | self-reported | llm-stats
Multi-SWE-Bench | 52.7% | self-reported | llm-stats
NL2Repo | 39.8% | self-reported | llm-stats
SWE-bench Multilingual | 76.5% | self-reported | llm-stats
SWE-Bench Pro | 56.2% | self-reported | llm-stats
Terminal-Bench 2.0 | 57.0% | self-reported | llm-stats
Toolathlon | 46.3% | self-reported | llm-stats
VIBE-Pro | 55.6% | self-reported | llm-stats