Qwen2.5 VL 72B Instruct
Qwen2.5-VL is Qwen's new flagship vision-language model and a significant improvement over Qwen2-VL. It excels at recognizing objects; analyzing text, charts, and layouts in images; acting as a visual agent; understanding long videos (over 1 hour) and pinpointing events within them; localizing objects with bounding boxes or points; and generating structured outputs from documents.
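To make the visual localization and structured-output capabilities concrete, here is a minimal sketch of how a client might parse a grounding reply. It assumes the model returns a JSON list whose items carry a `bbox_2d` field (`[x1, y1, x2, y2]` in pixels) and a `label` field; that shape, the `Box` class, the `parse_boxes` helper, and the sample reply are illustrative assumptions rather than a documented contract, so check the model's own documentation before relying on them.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker that replies are often wrapped in


@dataclass
class Box:
    """One detection: a label plus pixel corner coordinates (hypothetical type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        # Clamp to zero so a malformed box never yields a negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def parse_boxes(reply: str) -> list[Box]:
    """Parse a JSON list of detections from a reply (assumed output format)."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    boxes = []
    for item in json.loads(text):
        x1, y1, x2, y2 = item["bbox_2d"]  # assumed key name
        boxes.append(Box(item.get("label", ""), x1, y1, x2, y2))
    return boxes


# Hand-written sample reply for illustration; not real model output.
sample_reply = (
    FENCE + 'json\n[{"bbox_2d": [10, 20, 110, 220], "label": "cat"}]\n' + FENCE
)
boxes = parse_boxes(sample_reply)
```

Parsing into a small typed record rather than passing raw dictionaries around keeps downstream code (cropping, drawing, filtering by area) independent of the exact JSON keys, which only need to be changed in one place if the real output format differs.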
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 88.4% | self-reported, llm-stats |
| AITZ_EM | 83.2% | self-reported, llm-stats |
| Android Control High_EM | 67.4% | self-reported, llm-stats |
| Android Control Low_EM | 93.7% | self-reported, llm-stats |
| AndroidWorld_SR | 35.0% | self-reported, llm-stats |
| CC-OCR | 79.8% | self-reported, llm-stats |
| ChartQA | 89.5% | self-reported, llm-stats |
| DocVQA | 96.4% | self-reported, llm-stats |
| EgoSchema | 76.2% | self-reported, llm-stats |
| Hallusion Bench | 55.2% | self-reported, llm-stats |
| LVBench | 47.3% | self-reported, llm-stats |
| MathVision | 38.1% | self-reported, llm-stats |
| MathVista-Mini | 74.8% | self-reported, llm-stats |
| MLVU-M | 74.6% | self-reported, llm-stats |
| MMBench | 88.0% | self-reported, llm-stats |
| MMBench-Video | 2.0 (0 to 3 scale) | self-reported, llm-stats |
| MMMU | 70.2% | self-reported, llm-stats |
| MMMU-Pro | 51.1% | self-reported, llm-stats |
| MMStar | 70.8% | self-reported, llm-stats |
| MMVet | 76.2% | self-reported, llm-stats |
| MobileMiniWob++_SR | 68.0% | self-reported, llm-stats |
| MVBench | 70.4% | self-reported, llm-stats |
| OCRBench | 88.5% | self-reported, llm-stats |
| OCRBench-V2 (en) | 61.5% | self-reported, llm-stats |
| OSWorld | 8.8% | self-reported, llm-stats |
| PerceptionTest | 73.2% | self-reported, llm-stats |
| ScreenSpot | 87.1% | self-reported, llm-stats |
| ScreenSpot Pro | 43.6% | self-reported, llm-stats |
| TempCompass | 74.8% | self-reported, llm-stats |
| VideoMME w/o sub. | 73.3% | self-reported, llm-stats |