LongCat-Flash-Lite
LongCat-Flash-Lite is a lightweight mixture-of-experts (MoE) model from Meituan with 68.5B total parameters, of which only 2.9B-4.5B are activated per token. It explores N-gram embedding expansion as a new scaling direction and supports a 256K context length via YaRN. The model is optimized for agent tooling and programming tasks, reaching an inference speed of 500-700 tokens per second while maintaining strong performance on coding, math, and agentic benchmarks.
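To put the sparsity in perspective, the short sketch below computes what share of the total parameters is active per token. It uses only the figures quoted above; the constant and function names are illustrative and not part of any LongCat tooling.

```python
# Figures taken from the description above (billions of parameters).
TOTAL_PARAMS_B = 68.5
ACTIVE_PARAMS_B = (2.9, 4.5)  # reported lower and upper bound per token


def activation_fraction(active_b: float, total_b: float) -> float:
    """Return the fraction of parameters used for a single token."""
    return active_b / total_b


low, high = (activation_fraction(a, TOTAL_PARAMS_B) for a in ACTIVE_PARAMS_B)
print(f"Active share per token: {low:.1%} to {high:.1%}")
```

Under these numbers, roughly 4% to 7% of the weights take part in any single forward step, which is consistent with the high reported throughput for a model of this total size.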
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 72.2% | self-reported llm-stats |
| AIME 2025 | 63.2% | self-reported llm-stats |
| CMMLU | 82.5% | self-reported llm-stats |
| GPQA | 66.8% | self-reported llm-stats |
| MATH-500 | 96.8% | self-reported llm-stats |
| MMLU | 85.5% | self-reported llm-stats |
| MMLU-Pro | 78.3% | self-reported llm-stats |
| SWE-bench Multilingual | 38.1% | self-reported llm-stats |
| SWE-bench Verified | 54.4% | self-reported llm-stats |
| Tau2 Airline | 58.0% | self-reported llm-stats |
| Tau2 Retail | 73.1% | self-reported llm-stats |
| Tau2 Telecom | 72.8% | self-reported llm-stats |
| Terminal-Bench | 33.8% | self-reported llm-stats |