Kimi K2 Instruct

Kimi K2 is a mixture-of-experts (MoE) language model with 1 trillion total parameters, of which 32 billion are activated. Trained with the MuonClip optimizer, it is reported to perform strongly across frontier knowledge, reasoning, and coding tasks, and it is specifically optimized for agentic capabilities. The instruct variant is post-trained for drop-in, general-purpose chat and agentic use, and it responds without a long thinking phase.
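As a rough illustration of what the sparse MoE design implies, the sketch below computes the share of parameters that are activated, using only the two counts stated in the description. The constant and function names are illustrative, and the counts are rounded figures, so the result is an approximation.

```python
# Parameter counts as stated in the model description (rounded figures).
TOTAL_PARAMS = 1_000_000_000_000    # 1 trillion total parameters
ACTIVATED_PARAMS = 32_000_000_000   # 32 billion activated parameters


def activated_fraction(activated: int, total: int) -> float:
    """Return the share of the model's parameters that are activated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return activated / total


fraction = activated_fraction(ACTIVATED_PARAMS, TOTAL_PARAMS)
print(f"Activated share: {fraction:.1%}")  # prints "Activated share: 3.2%"
```

In other words, only about 3% of the weights are activated at once, which is why the model is described by its activated parameter count as well as its total size.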

Benchmark results

Benchmark | Score | Tags | Source
ACEBench | 76.5% | self-reported | llm-stats
Aider-Polyglot | 60.0% | self-reported | llm-stats
AIME 2024 | 69.6% | self-reported | llm-stats
AIME 2025 | 49.5% | self-reported | llm-stats
AutoLogi | 89.5% | self-reported | llm-stats
CBNSL | 95.6% | self-reported | llm-stats
CNMO 2024 | 74.3% | self-reported | llm-stats
CSimpleQA | 78.4% | self-reported | llm-stats
GPQA | 75.1% | self-reported | llm-stats
GSM8k | 97.3% | self-reported | llm-stats
HMMT 2025 | 38.8% | self-reported | llm-stats
HumanEval | 93.3% | self-reported | llm-stats
HumanEval-ER | 81.1% | self-reported | llm-stats
Humanity's Last Exam | 4.7% | self-reported | llm-stats
IFEval | 89.8% | self-reported | llm-stats
LiveBench | 76.4% | self-reported | llm-stats
LiveCodeBench v6 | 53.7% | self-reported | llm-stats
MATH-500 | 97.4% | self-reported | llm-stats
MMLU | 89.5% | self-reported | llm-stats
MMLU-Pro | 81.1% | self-reported | llm-stats
MMLU-Redux | 92.7% | self-reported | llm-stats
Multi-Challenge | 54.1% | self-reported | llm-stats
MultiPL-E | 85.7% | self-reported | llm-stats
MuSR | 76.4% | self-reported | llm-stats
OJBench | 27.1% | self-reported | llm-stats
PolyMath-en | 65.1% | self-reported | llm-stats
SimpleQA | 31.0% | self-reported | llm-stats
SuperGPQA | 57.2% | self-reported | llm-stats
SWE-bench Multilingual | 47.3% | self-reported | llm-stats
SWE-bench Verified (Agentic Coding) | 65.8% | self-reported | llm-stats
SWE-bench Verified (Agentless) | 51.8% | self-reported | llm-stats
SWE-bench Verified (Multiple Attempts) | 71.6% | self-reported | llm-stats
Tau2 Airline | 56.5% | self-reported | llm-stats
Tau2 Retail | 70.6% | self-reported | llm-stats
Tau2 Telecom | 65.8% | self-reported | llm-stats
Terminal-Bench | 30.0% | self-reported | llm-stats
Terminus | 25.0% | self-reported | llm-stats
ZebraLogic | 89.0% | self-reported | llm-stats