LongCat-Flash-Chat
LongCat-Flash-Chat is Meituan's first open-source foundation model: a 560B-parameter Mixture-of-Experts (MoE) model that dynamically activates between 18.6B and 31.3B parameters (about 27B on average) depending on contextual demands. It uses Zero-Computation Experts for efficient routing and supports a 128K-token context window. Optimized for conversational and agentic tasks, it performs competitively on reasoning, coding, instruction-following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions. It achieves an inference speed of over 100 tokens per second on H800 GPUs.
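To put the activation range in perspective, the short sketch below computes what share of the 560B total parameters is active at the low end, the high end, and on average. It uses only the figures quoted above; the variable and function names are illustrative and not part of any LongCat API.

```python
# Parameter counts in billions, taken from the description above.
TOTAL_PARAMS_B = 560.0
ACTIVE_PARAMS_B = {
    "min": 18.6,  # lower bound of dynamically activated parameters
    "avg": 27.0,  # approximate average quoted in the description
    "max": 31.3,  # upper bound of dynamically activated parameters
}


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Return the fraction of total parameters that is active."""
    return active_b / total_b


# Hypothetical helper dict for illustration: fraction of the model used in each case.
fractions = {name: active_fraction(value) for name, value in ACTIVE_PARAMS_B.items()}

for name, frac in fractions.items():
    print(f"{name}: {ACTIVE_PARAMS_B[name]}B of {TOTAL_PARAMS_B:.0f}B = {frac:.1%}")
```

Under these figures, roughly 3% to 6% of the model's parameters are active at any one time, which is the main reason an MoE model of this size can still be served at high token throughput.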
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 61.3% | self-reported, llm-stats |
| CMMLU | 84.3% | self-reported, llm-stats |
| DROP | 79.1% | self-reported, llm-stats |
| GPQA | 73.2% | self-reported, llm-stats |
| HumanEval | 88.4% | self-reported, llm-stats |
| IFEval | 89.6% | self-reported, llm-stats |
| LiveCodeBench | 48.0% | self-reported, llm-stats |
| MATH-500 | 96.4% | self-reported, llm-stats |
| MMLU | 89.7% | self-reported, llm-stats |
| MMLU-Pro | 82.7% | self-reported, llm-stats |
| SWE-Bench Verified | 60.4% | self-reported, llm-stats |
| Tau2 Airline | 58.0% | self-reported, llm-stats |
| Tau2 Retail | 71.3% | self-reported, llm-stats |
| Tau2 Telecom | 73.7% | self-reported, llm-stats |
| Terminal-Bench | 39.5% | self-reported, llm-stats |
| ZebraLogic | 89.3% | self-reported, llm-stats |