MiniCPM-SALA
MiniCPM-SALA (Sparse Attention and Linear Attention) is a 9B-parameter hybrid model built from a MiniCPM-4.0 checkpoint through continual training on about 2T tokens, roughly 25% of the cost of training from scratch. Its layers interleave InfLLM-V2 sparse attention (25% of layers) with Lightning Attention linear-attention layers (75%), giving up to 3.5x faster inference than dense baselines at 256K tokens. With HyPE (Hybrid Positional Encoding) and NoPE (no positional encoding) in the sparse layers, the model extrapolates to 2048K tokens despite a 520K training length, which enables 1M-token inference on consumer GPUs such as the RTX 5090.
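To make the 25%/75% layer mix and its memory effect concrete, here is a minimal Python sketch. The card states only the ratio, so everything else is an assumption: a hypothetical 32-layer stack with 2 KV heads of dimension 128 in 16-bit precision, sparse layers spaced evenly, and illustrative helpers `layer_schedule` and `kv_cache_bytes` that are not part of any MiniCPM API. It also assumes that the sparse layers cache keys and values for every token while the Lightning Attention layers, as is typical for linear attention, keep a fixed-size state instead of a per-token cache.

```python
"""Hypothetical sketch of a 25% sparse / 75% linear hybrid layer stack."""

SPARSE_FRACTION = 0.25  # share of sparse-attention layers, per the model card


def layer_schedule(num_layers: int, sparse_fraction: float = SPARSE_FRACTION) -> list[str]:
    """Label each layer 'sparse' or 'linear', spacing sparse layers evenly.

    The even spacing is an assumption; the card gives only the ratio.
    """
    num_sparse = round(num_layers * sparse_fraction)
    if num_sparse == 0:
        return ["linear"] * num_layers
    stride = num_layers / num_sparse
    # Place one sparse layer at the end of each stride-sized group of layers.
    sparse_idx = {int((i + 1) * stride) - 1 for i in range(num_sparse)}
    return ["sparse" if i in sparse_idx else "linear" for i in range(num_layers)]


def kv_cache_bytes(seq_len: int, num_cached_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes to store keys and values (hence the factor 2) for every token."""
    return 2 * seq_len * num_cached_layers * num_kv_heads * head_dim * bytes_per_elem


# Hypothetical configuration (NOT taken from the model card).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 32, 2, 128
SEQ_LEN = 2**20  # ~1M tokens

schedule = layer_schedule(NUM_LAYERS)
num_sparse = schedule.count("sparse")

# Dense baseline: every layer caches K/V; hybrid: only the sparse layers do.
dense_gib = kv_cache_bytes(SEQ_LEN, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM) / 2**30
hybrid_gib = kv_cache_bytes(SEQ_LEN, num_sparse, NUM_KV_HEADS, HEAD_DIM) / 2**30

print(layer_schedule(8))  # ['linear', 'linear', 'linear', 'sparse', 'linear', 'linear', 'linear', 'sparse']
print(f"sparse layers: {num_sparse}/{NUM_LAYERS}")  # sparse layers: 8/32
print(f"KV cache at {SEQ_LEN} tokens: dense {dense_gib:.1f} GiB, hybrid {hybrid_gib:.1f} GiB")
```

Under these assumptions the per-token cache shrinks by 4x (32.0 GiB to 8.0 GiB at 2^20 tokens), because three of every four layers carry no per-token cache. Real figures depend on the model's actual configuration and on the linear layers' fixed-size state, which this sketch ignores.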
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 83.8% | self-reported, llm-stats |
| AIME 2025 | 78.3% | self-reported, llm-stats |
| BBH | 81.5% | self-reported, llm-stats |
| CMMLU | 81.5% | self-reported, llm-stats |
| HumanEval | 95.1% | self-reported, llm-stats |
| IFEval | 76.3% | self-reported, llm-stats |
| LiveCodeBench v5 | 60.5% | self-reported, llm-stats |
| LiveCodeBench v6 | 52.0% | self-reported, llm-stats |
| MBPP | 89.1% | self-reported, llm-stats |
| MMLU-Pro | 67.0% | self-reported, llm-stats |
| MRCR 64K (2-needle) | 29.8% | self-reported, llm-stats |
| MRCR 64K (4-needle) | 20.6% | self-reported, llm-stats |
| MRCR 64K (8-needle) | 16.6% | self-reported, llm-stats |
| MRCR 128K (2-needle) | 28.6% | self-reported, llm-stats |
| MRCR 128K (4-needle) | 19.6% | self-reported, llm-stats |
| MRCR 128K (8-needle) | 10.1% | self-reported, llm-stats |
| NoLiMa 32K | 54.5% | self-reported, llm-stats |
| NoLiMa 64K | 43.0% | self-reported, llm-stats |
| NoLiMa 128K | 23.9% | self-reported, llm-stats |
| RULER 64K | 92.7% | self-reported, llm-stats |
| RULER 128K | 89.4% | self-reported, llm-stats |
| RULER 512K | 87.1% | self-reported, llm-stats |
| RULER 1000K | 86.3% | self-reported, llm-stats |
| RULER 2048K | 81.6% | self-reported, llm-stats |
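As a rough reading of the RULER rows, the short Python sketch below expresses each RULER score as a fraction of the 64K score and flags which lengths exceed the stated 520K training length. The scores are copied from the table; the helper name `retention` and the choice of 64K as the baseline are illustrative assumptions, not part of any published evaluation.

```python
# RULER scores (%) from the table, keyed by context length in thousands of tokens.
RULER = {64: 92.7, 128: 89.4, 512: 87.1, 1000: 86.3, 2048: 81.6}
TRAIN_LEN_K = 520  # training length stated in the model card


def retention(scores: dict[int, float], base_len: int) -> dict[int, float]:
    """Return each score divided by the score at base_len, ordered by length."""
    base = scores[base_len]
    return {length: score / base for length, score in sorted(scores.items())}


for length, ratio in retention(RULER, 64).items():
    where = "beyond" if length > TRAIN_LEN_K else "within"
    print(f"{length:>5}K  {ratio:6.1%}  ({where} training length)")
```

With these values the model keeps about 88% of its 64K RULER score at 2048K tokens, nearly four times the training length.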