Kimi K2 Instruct
Kimi K2 is a mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the MuonClip optimizer, it performs strongly on frontier knowledge, reasoning, and coding tasks and is optimized for agentic capabilities. The instruct variant is post-trained for drop-in, general-purpose chat and agentic use without long thinking.
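To make the two parameter counts concrete, the following minimal sketch computes the share of the model that is active at once. It uses only the two figures stated in the description. The constant and function names are hypothetical, and the "per token" reading reflects how activated parameters are usually counted for MoE models rather than anything stated on this page.

```python
# Parameter counts as stated in the description above.
TOTAL_PARAMS = 1_000_000_000_000     # 1 trillion total parameters
ACTIVATED_PARAMS = 32_000_000_000    # 32 billion activated parameters


def activated_fraction(activated: int, total: int) -> float:
    """Return the share of all parameters that are activated.

    Assumption: "activated" means the parameters used for a single
    token, which is the usual convention for MoE models.
    """
    return activated / total


fraction = activated_fraction(ACTIVATED_PARAMS, TOTAL_PARAMS)
print(f"Activated share: {fraction:.1%}")  # prints "Activated share: 3.2%"
```

Under that assumption, roughly one parameter in thirty participates in any single forward step.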
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ACEBench | 76.5% | self-reported llm-stats |
| Aider-Polyglot | 60.0% | self-reported llm-stats |
| AIME 2024 | 69.6% | self-reported llm-stats |
| AIME 2025 | 49.5% | self-reported llm-stats |
| AutoLogi | 89.5% | self-reported llm-stats |
| CBNSL | 95.6% | self-reported llm-stats |
| CNMO 2024 | 74.3% | self-reported llm-stats |
| CSimpleQA | 78.4% | self-reported llm-stats |
| GPQA | 75.1% | self-reported llm-stats |
| GSM8k | 97.3% | self-reported llm-stats |
| HMMT 2025 | 38.8% | self-reported llm-stats |
| HumanEval | 93.3% | self-reported llm-stats |
| HumanEval-ER | 81.1% | self-reported llm-stats |
| Humanity's Last Exam | 4.7% | self-reported llm-stats |
| IFEval | 89.8% | self-reported llm-stats |
| LiveBench | 76.4% | self-reported llm-stats |
| LiveCodeBench v6 | 53.7% | self-reported llm-stats |
| MATH-500 | 97.4% | self-reported llm-stats |
| MMLU | 89.5% | self-reported llm-stats |
| MMLU-Pro | 81.1% | self-reported llm-stats |
| MMLU-Redux | 92.7% | self-reported llm-stats |
| Multi-Challenge | 54.1% | self-reported llm-stats |
| MultiPL-E | 85.7% | self-reported llm-stats |
| MuSR | 76.4% | self-reported llm-stats |
| OJBench | 27.1% | self-reported llm-stats |
| PolyMath-en | 65.1% | self-reported llm-stats |
| SimpleQA | 31.0% | self-reported llm-stats |
| SuperGPQA | 57.2% | self-reported llm-stats |
| SWE-bench Multilingual | 47.3% | self-reported llm-stats |
| SWE-bench Verified (Agentic Coding) | 65.8% | self-reported llm-stats |
| SWE-bench Verified (Agentless) | 51.8% | self-reported llm-stats |
| SWE-bench Verified (Multiple Attempts) | 71.6% | self-reported llm-stats |
| Tau2 Airline | 56.5% | self-reported llm-stats |
| Tau2 Retail | 70.6% | self-reported llm-stats |
| Tau2 Telecom | 65.8% | self-reported llm-stats |
| Terminal-Bench | 30.0% | self-reported llm-stats |
| Terminus | 25.0% | self-reported llm-stats |
| ZebraLogic | 89.0% | self-reported llm-stats |