Kimi K2.6

Kimi K2.6 is Moonshot AI's open-source, natively multimodal agentic model, focused on state-of-the-art coding, long-horizon execution, and agent swarm capabilities. It scales horizontally to 300 sub-agents executing 4,000 coordinated steps, dynamically decomposing tasks into parallel, domain-specialized subtasks. K2.6 accepts text, image, and video input, offers thinking and non-thinking modes, and supports a 256K-token context. It also powers proactive 24/7 background agents that manage schedules, execute code, and orchestrate cross-platform operations without human oversight.
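
The fan-out pattern described above (one task split into parallel, domain-specialized subtasks under a cap on concurrent sub-agents) might be sketched roughly as follows. This is an illustrative sketch only, not Moonshot AI's implementation or API: `decompose`, `run_subtask`, and `run_swarm` are hypothetical names, the decomposition is hard-coded rather than dynamic, and the sub-agent simply echoes its input instead of calling a model.

```python
import asyncio

# Cap on concurrent sub-agents, taken from the figure quoted in the description.
MAX_SUB_AGENTS = 300


def decompose(task: str) -> list[tuple[str, str]]:
    """Hypothetical fixed split into (domain, subtask) pairs.

    The model is described as doing this dynamically; this stub does not.
    """
    return [
        ("research", f"collect sources for {task}"),
        ("coding", f"implement {task}"),
        ("testing", f"write tests for {task}"),
    ]


async def run_subtask(domain: str, subtask: str) -> str:
    """Hypothetical sub-agent: a real one would call a model; this only echoes."""
    await asyncio.sleep(0)  # yield control, standing in for model or tool latency
    return f"[{domain}] done: {subtask}"


async def run_swarm(
    subtasks: list[tuple[str, str]], limit: int = MAX_SUB_AGENTS
) -> list[str]:
    """Run all subtasks concurrently with at most `limit` in flight at once."""
    gate = asyncio.Semaphore(limit)

    async def guarded(domain: str, subtask: str) -> str:
        async with gate:
            return await run_subtask(domain, subtask)

    # asyncio.gather preserves input order, so results line up with subtasks.
    return await asyncio.gather(*(guarded(d, s) for d, s in subtasks))
```

Under these assumptions, `asyncio.run(run_swarm(decompose("a CSV parser")))` returns one result string per subtask, in the order the subtasks were listed. The semaphore is what keeps the swarm bounded: extra subtasks wait for a free slot instead of all starting at once.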

Benchmark results

| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AIME 2026 | 96.4% | self-reported | llm-stats |
| APEX-Agents | 27.9% | self-reported | llm-stats |
| BabyVision | 68.5% | self-reported | llm-stats |
| BrowseComp | 86.3% | self-reported | llm-stats |
| CharXiv-R | 86.7% | self-reported | llm-stats |
| Claw-Eval | 80.9% | self-reported | llm-stats |
| DeepSearchQA | 83.0% | self-reported | llm-stats |
| GPQA | 90.5% | self-reported | llm-stats |
| HMMT Feb 26 | 92.7% | self-reported | llm-stats |
| Humanity's Last Exam | 36.4% | self-reported | llm-stats |
| IMO-AnswerBench | 86.0% | self-reported | llm-stats |
| LiveCodeBench v6 | 89.6% | self-reported | llm-stats |
| MathVision | 93.2% | self-reported | llm-stats |
| MCP-Mark | 55.9% | self-reported | llm-stats |
| MMMU-Pro | 80.1% | self-reported | llm-stats |
| OJBench | 60.6% | self-reported | llm-stats |
| OSWorld-Verified | 73.1% | self-reported | llm-stats |
| SciCode | 52.2% | self-reported | llm-stats |
| SWE-bench Multilingual | 76.7% | self-reported | llm-stats |
| SWE-Bench Pro | 58.6% | self-reported | llm-stats |
| SWE-Bench Verified | 80.2% | self-reported | llm-stats |
| Terminal-Bench 2.0 | 66.7% | self-reported | llm-stats |
| Toolathlon | 50.0% | self-reported | llm-stats |
| V* | 96.9% | self-reported | llm-stats |
| WideSearch | 80.8% | self-reported | llm-stats |