Kimi K2.5

Kimi K2.5 is Moonshot AI's flagship agentic model, presented as a new state-of-the-art (SOTA) open model. It unifies vision and text, thinking and non-thinking modes, and single-agent and multi-agent execution in one model. Built with full-parameter RL tuning, it reports state-of-the-art performance across agent, coding, image, and video benchmarks.

Benchmark results

Benchmark               Score   Tags           Source
AA-LCR                  70.0%   self-reported  llm-stats
AIME 2025               96.1%   self-reported  llm-stats
BrowseComp              74.9%   self-reported  llm-stats
CharXiv-R               77.5%   self-reported  llm-stats
CyberGym                41.3%   self-reported  llm-stats
DeepSearchQA            77.1%   self-reported  llm-stats
FinSearchComp T2&T3     67.8%   self-reported  llm-stats
GPQA                    87.6%   self-reported  llm-stats
HMMT 2025               95.4%   self-reported  llm-stats
Humanity's Last Exam    50.2%   self-reported  llm-stats
IMO-AnswerBench         81.8%   self-reported  llm-stats
InfoVQA (test)          92.6%   self-reported  llm-stats
LiveCodeBench v6        85.0%   self-reported  llm-stats
LongBench v2            61.0%   self-reported  llm-stats
LongVideoBench          79.8%   self-reported  llm-stats
LVBench                 75.9%   self-reported  llm-stats
MathVision              84.2%   self-reported  llm-stats
MathVista-Mini          90.1%   self-reported  llm-stats
MMLU-Pro                87.1%   self-reported  llm-stats
MMMU-Pro                78.5%   self-reported  llm-stats
MMVU                    80.4%   self-reported  llm-stats
MotionBench             70.4%   self-reported  llm-stats
OCRBench                92.3%   self-reported  llm-stats
OJBench (C++)           57.4%   self-reported  llm-stats
OmniDocBench 1.5        88.8%   self-reported  llm-stats
PaperBench              63.5%   self-reported  llm-stats
SciCode                 48.7%   self-reported  llm-stats
Seal-0                  57.4%   self-reported  llm-stats
SimpleVQA               71.2%   self-reported  llm-stats
SWE-bench Multilingual  73.0%   self-reported  llm-stats
SWE-Bench Pro           50.7%   self-reported  llm-stats
SWE-Bench Verified      76.8%   self-reported  llm-stats
Terminal-Bench 2.0      50.8%   self-reported  llm-stats
Video-MME               87.4%   self-reported  llm-stats
VideoMMMU               86.6%   self-reported  llm-stats
WideSearch              79.0%   self-reported  llm-stats
WorldVQA                46.3%   self-reported  llm-stats
ZEROBench               11.0%   self-reported  llm-stats