LongCat-Flash-Thinking-2601
LongCat-Flash-Thinking-2601 is an upgraded version of LongCat-Flash-Thinking. It is a Mixture-of-Experts (MoE) model with 560B total parameters, of which about 27B are activated. It reports open-source state-of-the-art results on core evaluation benchmarks, including Agentic Search, Agentic Tool Use, and Tool-Integrated Reasoning (TIR). It also features a Heavy Thinking mode that contributes +4 to +6 points on demanding agentic reasoning benchmarks. Mid-training with structured agentic trajectories improves pass@k by up to +12 points, and context management yields a +17.5 improvement.
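For readers unfamiliar with the pass@k metric mentioned above, here is a minimal sketch of the commonly used unbiased estimator: given n sampled attempts per task of which c are correct, pass@k is the probability that at least one of k attempts drawn without replacement is correct. The description does not say which estimator or sampling setup was used for the reported gains, so the function name `pass_at_k` and the example numbers are illustrative assumptions only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which are correct.

    Assumption: this is the standard combinatorial estimator,
    1 - C(n - c, k) / C(n, k); it may not match the evaluation
    protocol behind the numbers quoted in the description.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect attempts: every draw of k contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 4 attempts on one task, 1 correct, evaluated at k = 2.
score = pass_at_k(n=4, c=1, k=2)
```

A benchmark-level pass@k is then typically the mean of this per-task value over all tasks.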
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AIME 2025 | 99.6% | self-reported llm-stats | link → |
| BrowseComp | 56.6% | self-reported llm-stats | link → |
| BrowseComp-zh | 69.0% | self-reported llm-stats | link → |
| GPQA | 80.5% | self-reported llm-stats | link → |
| Humanity's Last Exam | 25.2% | self-reported llm-stats | link → |
| IMO-AnswerBench | 78.6% | self-reported llm-stats | link → |
| LiveCodeBench | 82.8% | self-reported llm-stats | link → |
| SWE-Bench Verified | 70.0% | self-reported llm-stats | link → |
| Tau2 Airline | 76.5% | self-reported llm-stats | link → |
| Tau2 Retail | 88.6% | self-reported llm-stats | link → |
| Tau2 Telecom | 99.3% | self-reported llm-stats | link → |