Qwen2.5-Omni-7B

Qwen2.5-Omni is the flagship end-to-end multimodal model in the Qwen series. It accepts text, image, audio, and video inputs and, through its Thinker-Talker architecture, streams responses in real time as both generated text and synthesized natural speech.

Benchmark results

Benchmark         Score   Tags           Source
AI2D              83.2%   self-reported  llm-stats
ChartQA           85.3%   self-reported  llm-stats
Common Voice 15   7.6%    self-reported  llm-stats
CoVoST2 en-zh     41.4%   self-reported  llm-stats
CRPErelation      76.5%   self-reported  llm-stats
DocVQA            95.2%   self-reported  llm-stats
EgoSchema         68.6%   self-reported  llm-stats
FLEURS            4.1%    self-reported  llm-stats
GiantSteps Tempo  88.0%   self-reported  llm-stats
GPQA              30.8%   self-reported  llm-stats
GSM8k             88.7%   self-reported  llm-stats
HumanEval         78.7%   self-reported  llm-stats
LiveBench         29.6%   self-reported  llm-stats
MATH              71.5%   self-reported  llm-stats
MathVision        25.0%   self-reported  llm-stats
MathVista         67.9%   self-reported  llm-stats
MBPP              73.2%   self-reported  llm-stats
Meld              57.0%   self-reported  llm-stats
MM-MT-Bench       0.06    self-reported  llm-stats
MMAU              65.6%   self-reported  llm-stats
MMAU Music        69.2%   self-reported  llm-stats
MMAU Sound        67.9%   self-reported  llm-stats
MMAU Speech       59.8%   self-reported  llm-stats
MMBench-V1.1      81.8%   self-reported  llm-stats
MME-RealWorld     61.6%   self-reported  llm-stats
MMLU-Pro          47.0%   self-reported  llm-stats
MMLU-Redux        71.0%   self-reported  llm-stats
MMMU              59.2%   self-reported  llm-stats
MMMU-Pro          36.6%   self-reported  llm-stats
MMStar            64.0%   self-reported  llm-stats
MuirBench         59.2%   self-reported  llm-stats
MultiPL-E         65.8%   self-reported  llm-stats
MusicCaps         32.8%   self-reported  llm-stats
MVBench           70.3%   self-reported  llm-stats
NMOS              4.5%    self-reported  llm-stats
OCRBench_V2       57.8%   self-reported  llm-stats
ODinW             42.4%   self-reported  llm-stats
OmniBench         56.1%   self-reported  llm-stats
OmniBench Music   52.8%   self-reported  llm-stats
PointGrounding    66.5%   self-reported  llm-stats
RealWorldQA       70.3%   self-reported  llm-stats
TextVQA           84.4%   self-reported  llm-stats
VideoMME w sub.   72.4%   self-reported  llm-stats
VocalSound        93.9%   self-reported  llm-stats
VoiceBench Avg    74.1%   self-reported  llm-stats