MiMo-V2.5
MiMo-V2.5 is Xiaomi's native omnimodal sparse Mixture-of-Experts model with 310B total parameters, of which 15B are activated per token, and a 1M-token context window. Built on the MiMo-V2-Flash backbone, it adds dedicated vision and audio encoders to support text, image, video, and audio understanding. Post-training combines supervised fine-tuning (SFT), agentic reinforcement learning, and Multi-Teacher On-Policy Distillation, targeting multimodal perception, long-context reasoning, and agentic workflows.
| Provider | Status | Input price | Output price | Limits | Uptime | Latency | Notes |
|---|---|---|---|---|---|---|---|
| Xiaomi | available | $0.14/Mtok | $0.28/Mtok | 1.0M tokens context | 99.9% | 2,581 ms p50 TTFT | fp8 |
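
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single request from the listed rates. The function name `estimate_cost_usd` and the constants are hypothetical and not part of any provider SDK. It assumes that "Mtok" means one million tokens, that the 1.0M-token limit covers prompt and completion together, and that no other charges (caching, multimodal surcharges) apply.

```python
# Rates copied from the provider table above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.14
OUTPUT_USD_PER_MTOK = 0.28

# Assumption: "1.0M tokens context" is read as exactly 1,000,000 tokens
# shared between the prompt and the completion.
CONTEXT_LIMIT_TOKENS = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD from the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_LIMIT_TOKENS:
        raise ValueError("request exceeds the assumed 1M-token context limit")
    input_cost = input_tokens * INPUT_USD_PER_MTOK
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token completion.
    print(f"${estimate_cost_usd(200_000, 4_000):.5f}")
```

Under these assumptions, output tokens cost twice as much as input tokens, so long prompts with short completions are dominated by the input rate.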