Kimi K2-Instruct-0905

Kimi K2-Instruct-0905 is the latest and most capable version of Kimi K2. Among non-thinking models, it achieves state-of-the-art performance in frontier knowledge, math, and coding. It is a Mixture-of-Experts (MoE) model with 1 trillion total parameters, of which 32 billion are activated per token, and it is optimized for agentic tasks. Key features include stronger agentic coding, a context length extended to 256K tokens, and training on 15.5T tokens with the MuonClip optimizer. The model scores 65.8% on SWE-bench Verified (single attempt) and 47.3% on SWE-bench Multilingual, and excels at tool use, scoring 70.6% on Tau2 Retail. It is a "reflex-grade" model without long thinking: instead of producing an extended reasoning trace, it is designed to act directly and carry out complex tasks.
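The gap between activated and total parameters is what makes an MoE model comparatively cheap per token. A router sends each token to a small subset of expert networks, so only a fraction of the weights do work for that token. The following Python sketch is a generic top-k routing example and is not Kimi K2's actual router. The function names, the four scalar "experts", the logits and k = 2 are all hypothetical toy values. It also works out the activated share implied by the headline figures (32B / 1T, about 3.2%).

```python
import math

# Headline figures from the description: ~32B activated of ~1T total parameters.
ACTIVATED_PARAMS = 32e9
TOTAL_PARAMS = 1e12


def activated_fraction(activated: float, total: float) -> float:
    """Share of the model's weights used for a single token."""
    return activated / total


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their scores.

    Generic top-k gating for illustration (hypothetical helper),
    not Kimi K2's implementation.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # sorted() is stable, so ties keep their original expert order.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x: float, experts, logits: list[float], k: int) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


if __name__ == "__main__":
    frac = activated_fraction(ACTIVATED_PARAMS, TOTAL_PARAMS)
    print(f"activated fraction: {frac:.1%}")  # 3.2%
    # Toy layer: 4 scalar "experts" (multiply by 1..4); only 2 run per token.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 2.0]
    print(top_k_route(logits, k=2))  # [(1, 0.5), (3, 0.5)]
    print(moe_forward(1.0, experts, logits, k=2))  # 3.0
```

In the toy run, experts 1 and 3 tie for the highest score and share the gate weight equally, so the output is 0.5 × 2 + 0.5 × 4 = 3.0. The other two experts are never evaluated, which is the source of the compute saving. Real MoE layers apply the same idea to vectors and much larger expert pools.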

Benchmark results

Benchmark                   Score   Tags           Source
ACEBench                    76.5%   self-reported  llm-stats
Aider-Polyglot              60.0%   self-reported  llm-stats
AIME 2024                   69.6%   self-reported  llm-stats
AIME 2025                   49.5%   self-reported  llm-stats
AutoLogi                    89.5%   self-reported  llm-stats
CNMO 2024                   74.3%   self-reported  llm-stats
GPQA                        75.1%   self-reported  llm-stats
HMMT 2025                   38.8%   self-reported  llm-stats
Humanity's Last Exam (HLE)  4.7%    self-reported  llm-stats
IFEval                      89.8%   self-reported  llm-stats
LiveBench                   76.4%   self-reported  llm-stats
LiveCodeBench               53.7%   self-reported  llm-stats
MATH-500                    97.4%   self-reported  llm-stats
MMLU                        89.5%   self-reported  llm-stats
MMLU-Pro                    81.1%   self-reported  llm-stats
MMLU-Redux                  92.7%   self-reported  llm-stats
Multi-Challenge             54.1%   self-reported  llm-stats
MultiPL-E                   85.7%   self-reported  llm-stats
OJBench                     27.1%   self-reported  llm-stats
PolyMath-en                 65.1%   self-reported  llm-stats
SimpleQA                    31.0%   self-reported  llm-stats
SuperGPQA                   57.2%   self-reported  llm-stats
SWE-bench Multilingual      47.3%   self-reported  llm-stats
SWE-bench Verified          65.8%   self-reported  llm-stats
Tau2 Airline                56.5%   self-reported  llm-stats
Tau2 Retail                 70.6%   self-reported  llm-stats
Tau2 Telecom                65.8%   self-reported  llm-stats
Terminal-Bench              25.0%   self-reported  llm-stats
ZebraLogic                  89.0%   self-reported  llm-stats