Kimi K2.5
Kimi K2.5 is Moonshot AI's flagship agentic model, presented as the new state of the art among open models. It unifies vision and text, thinking and non-thinking modes, and single-agent and multi-agent execution in one model. Trained with full-parameter RL tuning, it reports state-of-the-art results on agentic, coding, image, and video benchmarks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AA-LCR | 70.0% | self-reported, llm-stats |
| AIME 2025 | 96.1% | self-reported, llm-stats |
| BrowseComp | 74.9% | self-reported, llm-stats |
| CharXiv-R | 77.5% | self-reported, llm-stats |
| CyberGym | 41.3% | self-reported, llm-stats |
| DeepSearchQA | 77.1% | self-reported, llm-stats |
| FinSearchComp T2&T3 | 67.8% | self-reported, llm-stats |
| GPQA | 87.6% | self-reported, llm-stats |
| HMMT 2025 | 95.4% | self-reported, llm-stats |
| Humanity's Last Exam | 50.2% | self-reported, llm-stats |
| IMO-AnswerBench | 81.8% | self-reported, llm-stats |
| InfoVQA (test) | 92.6% | self-reported, llm-stats |
| LiveCodeBench v6 | 85.0% | self-reported, llm-stats |
| LongBench v2 | 61.0% | self-reported, llm-stats |
| LongVideoBench | 79.8% | self-reported, llm-stats |
| LVBench | 75.9% | self-reported, llm-stats |
| MathVision | 84.2% | self-reported, llm-stats |
| MathVista-Mini | 90.1% | self-reported, llm-stats |
| MMLU-Pro | 87.1% | self-reported, llm-stats |
| MMMU-Pro | 78.5% | self-reported, llm-stats |
| MMVU | 80.4% | self-reported, llm-stats |
| MotionBench | 70.4% | self-reported, llm-stats |
| OCRBench | 92.3% | self-reported, llm-stats |
| OJBench (C++) | 57.4% | self-reported, llm-stats |
| OmniDocBench 1.5 | 88.8% | self-reported, llm-stats |
| PaperBench | 63.5% | self-reported, llm-stats |
| SciCode | 48.7% | self-reported, llm-stats |
| Seal-0 | 57.4% | self-reported, llm-stats |
| SimpleVQA | 71.2% | self-reported, llm-stats |
| SWE-bench Multilingual | 73.0% | self-reported, llm-stats |
| SWE-Bench Pro | 50.7% | self-reported, llm-stats |
| SWE-Bench Verified | 76.8% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 50.8% | self-reported, llm-stats |
| Video-MME | 87.4% | self-reported, llm-stats |
| VideoMMMU | 86.6% | self-reported, llm-stats |
| WideSearch | 79.0% | self-reported, llm-stats |
| WorldVQA | 46.3% | self-reported, llm-stats |
| ZEROBench | 11.0% | self-reported, llm-stats |
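
For readers who want to work with these numbers programmatically, the following is a minimal sketch of reading rows of the table above into a Python dictionary. The function name `parse_scores` and the `SAMPLE` string are hypothetical and not part of any Moonshot AI or llm-stats tooling; only the first two cells of each row are read, so any further columns are ignored. Three rows are copied from the table for illustration. The scores are self-reported and come from different benchmarks, so the sort at the end only demonstrates the data structure and is not a meaningful ranking.

```python
def parse_scores(table: str) -> dict[str, float]:
    """Map benchmark name -> score in percent from markdown table rows."""
    scores: dict[str, float] = {}
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the |---| separator, and any row whose
        # second cell is not a percentage.
        if len(cells) < 2 or not cells[1].endswith("%"):
            continue
        scores[cells[0]] = float(cells[1].rstrip("%"))
    return scores


# Illustrative subset: three rows copied from the table above.
SAMPLE = """
| Benchmark | Score |
|---|---|
| AIME 2025 | 96.1% |
| SWE-Bench Verified | 76.8% |
| ZEROBench | 11.0% |
"""

scores = parse_scores(SAMPLE)

# Sort by score, highest first, purely to show how the dict can be used.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, pct in ranked:
    print(f"{name}: {pct:.1f}%")
```

Passing the full table text instead of `SAMPLE` should work the same way, provided each data row keeps the benchmark name in the first cell and the percentage in the second.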