LongCat-Flash-Thinking-2601

LongCat-Flash-Thinking-2601 is an upgraded version of LongCat-Flash-Thinking, a Mixture-of-Experts (MoE) model with 560B total parameters, of which about 27B are activated. It achieves open-source state-of-the-art (SOTA) performance on core evaluation benchmarks, including Agentic Search, Agentic Tool Use, and Tool-Integrated Reasoning (TIR). The model features a Heavy Thinking mode that contributes +4 to +6 points on demanding agentic reasoning benchmarks. Mid-training with structured agentic trajectories improves pass@k by up to +12 points, and context management yields a +17.5 improvement.
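For readers unfamiliar with pass@k, the metric mentioned above, here is a minimal sketch of how it is commonly estimated. It assumes the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The description above does not say which estimator or which values of n and k were used, so the function name `pass_at_k` and the numbers in the usage lines are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem (assumed setup)
    c: number of those samples that are correct
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k subset
    # contains at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative values only, not taken from the model's evaluation.
print(pass_at_k(n=10, c=5, k=1))
print(pass_at_k(n=4, c=1, k=2))
```

A benchmark-level pass@k is then typically the mean of this per-problem value over all problems, so a "+12 points" gain would correspond to a 0.12 increase in that mean under this reading.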

Benchmark results

Benchmark             Score   Tags           Source
AIME 2025             99.6%   self-reported  llm-stats
BrowseComp            56.6%   self-reported  llm-stats
BrowseComp-zh         69.0%   self-reported  llm-stats
GPQA                  80.5%   self-reported  llm-stats
Humanity's Last Exam  25.2%   self-reported  llm-stats
IMO-AnswerBench       78.6%   self-reported  llm-stats
LiveCodeBench         82.8%   self-reported  llm-stats
SWE-Bench Verified    70.0%   self-reported  llm-stats
Tau2 Airline          76.5%   self-reported  llm-stats
Tau2 Retail           88.6%   self-reported  llm-stats
Tau2 Telecom          99.3%   self-reported  llm-stats