LongCat-Flash-Lite

LongCat-Flash-Lite is a lightweight Mixture-of-Experts (MoE) model from Meituan with 68.5B total parameters, of which only 2.9B to 4.5B are activated per token. It explores N-gram embedding expansion as a new scaling direction and supports a 256K context length via YaRN. The model is optimized for agent tooling and programming tasks, achieving an inference speed of 500-700 tokens per second while maintaining strong performance on coding, math, and agentic benchmarks.
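
To put the sparsity figures in perspective, the short sketch below computes what share of the 68.5B total parameters is active per token, using only the numbers quoted above. The function and constant names are illustrative and do not come from any LongCat tooling.

```python
# Parameter counts in billions, taken from the description above.
TOTAL_PARAMS_B = 68.5
ACTIVE_PARAMS_B = (2.9, 4.5)  # lower and upper bound of activated parameters per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the fraction of total parameters that is activated per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


# Evaluate the fraction at both ends of the quoted activation range.
low, high = (active_fraction(a, TOTAL_PARAMS_B) for a in ACTIVE_PARAMS_B)
print(f"Active share per token: {low:.1%} to {high:.1%}")
```

By this arithmetic, roughly 4% to 7% of the model's parameters take part in processing any single token.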

Benchmark results

Benchmark                 Score   Tags            Source
AIME 2024                 72.2%   self-reported   llm-stats
AIME 2025                 63.2%   self-reported   llm-stats
CMMLU                     82.5%   self-reported   llm-stats
GPQA                      66.8%   self-reported   llm-stats
MATH-500                  96.8%   self-reported   llm-stats
MMLU                      85.5%   self-reported   llm-stats
MMLU-Pro                  78.3%   self-reported   llm-stats
SWE-bench Multilingual    38.1%   self-reported   llm-stats
SWE-Bench Verified        54.4%   self-reported   llm-stats
Tau2 Airline              58.0%   self-reported   llm-stats
Tau2 Retail               73.1%   self-reported   llm-stats
Tau2 Telecom              72.8%   self-reported   llm-stats
Terminal-Bench            33.8%   self-reported   llm-stats