MiMo-V2.5

MiMo-V2.5 is Xiaomi's native omnimodal sparse Mixture-of-Experts model with 310B total parameters, 15B activated parameters, and a 1M-token context window. Built on the MiMo-V2-Flash backbone, it adds dedicated vision and audio encoders so that it can understand text, image, video, and audio inputs. It is post-trained with supervised fine-tuning (SFT), agentic reinforcement learning, and Multi-Teacher On-Policy Distillation, targeting multimodal perception, long-context reasoning, and agentic workflows.
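
To put the sparse-activation figures in perspective, the sketch below computes the share of parameters active per token from the 310B and 15B numbers above. The weight-memory estimate is an assumption, not something stated on this page: it takes 1 byte per parameter for fp8 weights and ignores the KV cache, activations, and any tensors kept at higher precision. The function names are illustrative.

```python
TOTAL_PARAMS = 310e9   # total parameters, from the model description
ACTIVE_PARAMS = 15e9   # activated parameters per token, from the model description


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float = 1.0) -> float:
    """Rough weight footprint in GB.

    Assumption: 1 byte per parameter (fp8), weights only.
    """
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"approx. fp8 weights: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions, roughly 5% of the parameters are used per token, while all 310B still have to be stored by whoever serves the model.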

Benchmark	Score
HR-Bench (4k)	88.5%
Video-MME	87.7%
OmniDocBench	87.2%
GraphWalks	87.0%
DailyOmni	83.5%
CharXiv-R	81.0%
MMMU-Pro	77.9%
MiMo Coding Bench	71.8%
Terminal-Bench 2.0	65.8%
VideoHolmes	64.0%
Claw-Eval	63.2%
SWE-Bench Pro	56.1%
ResearchClawBench	16.9%

Pricing, uptime, and speed via OpenRouter, updated Jun 12, 2026, 04:59 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Xiaomi	available	$0.14/Mtok (cache: $0.00/Mtok)	$0.28/Mtok	1.0M tokens context, 131K tokens max output	99.9% 5m 99.8%	2,581 ms p50 TTFT, 41 tok/s p50	fp8
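
As a rough illustration of how the listed rates translate into per-request numbers, the sketch below estimates cost and response time from the Xiaomi row above. It rests on a few assumptions that the table does not confirm: cached input tokens are billed at the cache rate instead of the input rate, and latency is modeled as p50 time-to-first-token plus output tokens divided by p50 throughput. The class and function names are illustrative, not part of any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float         # USD per million uncached input tokens
    output_per_mtok: float        # USD per million output tokens
    cached_input_per_mtok: float  # USD per million cached input tokens


# Values copied from the Xiaomi row of the provider table.
XIAOMI_FP8 = Pricing(input_per_mtok=0.14, output_per_mtok=0.28,
                     cached_input_per_mtok=0.00)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Assumption: cached tokens replace part of the input and are billed
    at the cache rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * p.input_per_mtok
             + cached_tokens * p.cached_input_per_mtok
             + output_tokens * p.output_per_mtok)
    return total / 1_000_000


def estimated_latency_s(output_tokens: int, ttft_ms: float = 2581.0,
                        tok_per_s: float = 41.0) -> float:
    """Median-case response time: TTFT plus steady-state generation time."""
    return ttft_ms / 1000.0 + output_tokens / tok_per_s


if __name__ == "__main__":
    cost = request_cost(XIAOMI_FP8, input_tokens=100_000, output_tokens=2_000)
    print(f"cost: ${cost:.5f}")
    print(f"latency: {estimated_latency_s(2_000):.1f} s")
```

With these rates, a request with 100,000 input tokens and 2,000 output tokens comes to about $0.0146 if nothing is cached. The latency figure is only a median-case estimate, since p50 values say nothing about tail behavior.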