Qwen2.5-Omni-7B
Qwen2.5-Omni is the flagship end-to-end multimodal model in the Qwen series. It accepts text, image, audio, and video input and streams responses in real time as both generated text and natural synthesized speech, using a novel Thinker-Talker architecture.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 83.2% | self-reported llm-stats |
| ChartQA | 85.3% | self-reported llm-stats |
| Common Voice 15 | 7.6% | self-reported llm-stats |
| CoVoST2 en-zh | 41.4% | self-reported llm-stats |
| CRPErelation | 76.5% | self-reported llm-stats |
| DocVQA | 95.2% | self-reported llm-stats |
| EgoSchema | 68.6% | self-reported llm-stats |
| FLEURS | 4.1% | self-reported llm-stats |
| GiantSteps Tempo | 88.0% | self-reported llm-stats |
| GPQA | 30.8% | self-reported llm-stats |
| GSM8k | 88.7% | self-reported llm-stats |
| HumanEval | 78.7% | self-reported llm-stats |
| LiveBench | 29.6% | self-reported llm-stats |
| MATH | 71.5% | self-reported llm-stats |
| MathVision | 25.0% | self-reported llm-stats |
| MathVista | 67.9% | self-reported llm-stats |
| MBPP | 73.2% | self-reported llm-stats |
| Meld | 57.0% | self-reported llm-stats |
| MM-MT-Bench | 0.06 | self-reported llm-stats |
| MMAU | 65.6% | self-reported llm-stats |
| MMAU Music | 69.2% | self-reported llm-stats |
| MMAU Sound | 67.9% | self-reported llm-stats |
| MMAU Speech | 59.8% | self-reported llm-stats |
| MMBench-V1.1 | 81.8% | self-reported llm-stats |
| MME-RealWorld | 61.6% | self-reported llm-stats |
| MMLU-Pro | 47.0% | self-reported llm-stats |
| MMLU-Redux | 71.0% | self-reported llm-stats |
| MMMU | 59.2% | self-reported llm-stats |
| MMMU-Pro | 36.6% | self-reported llm-stats |
| MMStar | 64.0% | self-reported llm-stats |
| MuirBench | 59.2% | self-reported llm-stats |
| MultiPL-E | 65.8% | self-reported llm-stats |
| MusicCaps | 32.8% | self-reported llm-stats |
| MVBench | 70.3% | self-reported llm-stats |
| NMOS | 4.5% | self-reported llm-stats |
| OCRBench_V2 | 57.8% | self-reported llm-stats |
| ODinW | 42.4% | self-reported llm-stats |
| OmniBench | 56.1% | self-reported llm-stats |
| OmniBench Music | 52.8% | self-reported llm-stats |
| PointGrounding | 66.5% | self-reported llm-stats |
| RealWorldQA | 70.3% | self-reported llm-stats |
| TextVQA | 84.4% | self-reported llm-stats |
| VideoMME w sub. | 72.4% | self-reported llm-stats |
| VocalSound | 93.9% | self-reported llm-stats |
| VoiceBench Avg | 74.1% | self-reported llm-stats |
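
Not every score in the table points the same way. The table does not name its metrics, but speech-recognition benchmarks such as Common Voice 15 and FLEURS are conventionally reported as word error rate (WER), where lower is better. Assuming that convention applies to those two rows, the low figures there are a strength and not a weakness. The sketch below shows how WER is typically computed; the function name `word_error_rate` and the sample sentences are illustrative only and are not taken from the Qwen2.5-Omni evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (an assumption, not the benchmark's official scorer):
    real evaluations usually normalize casing and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.1%}")
```

Under this definition a WER of 7.6% would mean roughly 7 to 8 word-level errors per 100 reference words, which is why such rows should not be compared directly with the accuracy-style percentages elsewhere in the table.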