Qwen3 VL 235B A22B Instruct
Qwen3-VL is a large multimodal model that unifies vision, language, and reasoning to achieve human-level perception and cognition across text, images, and video. Built on a 235B-parameter architecture, it integrates early joint training of visual and textual modalities for strong language grounding. The model supports up to a 1 million-token context window and excels at visual understanding, spatial reasoning, long video comprehension, and tool-based interaction. It can generate code from images, perform precise 2D/3D object grounding, and operate digital interfaces like a visual agent. The “Instruct” version rivals Gemini 2.5 Pro in perception benchmarks, while the “Thinking” version leads in multimodal reasoning and STEM tasks. With multilingual OCR, creative writing, and fine-grained scene interpretation, Qwen3-VL establishes a new open-source frontier for integrated vision-language intelligence.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AI2D | 89.7% | self-reported llm-stats | link → |
| AIME 2025 | 74.7% | self-reported llm-stats | link → |
| AndroidWorld_SR | 63.7% | self-reported llm-stats | link → |
| Arena-Hard v2 | 77.4% | self-reported llm-stats | link → |
| BFCL-v3 | 67.7% | self-reported llm-stats | link → |
| BLINK | 70.7% | self-reported llm-stats | link → |
| CC-OCR | 82.2% | self-reported llm-stats | link → |
| CharadesSTA | 64.8% | self-reported llm-stats | link → |
| CharXiv-R | 62.1% | self-reported llm-stats | link → |
| Creative Writing v3 | 0.865 | self-reported llm-stats | link → |
| CSimpleQA | 83.4% | self-reported llm-stats | link → |
| DocVQAtest | 97.1% | self-reported llm-stats | link → |
| ERQA | 51.3% | self-reported llm-stats | link → |
| Hallusion Bench | 63.2% | self-reported llm-stats | link → |
| HMMT25 | 57.4% | self-reported llm-stats | link → |
| IFEval | 87.8% | self-reported llm-stats | link → |
| Include | 80.0% | self-reported llm-stats | link → |
| InfoVQAtest | 89.2% | self-reported llm-stats | link → |
| LiveBench 20241125 | 74.8% | self-reported llm-stats | link → |
| LiveCodeBench v5 | 61.4% | self-reported llm-stats | link → |
| LiveCodeBench v6 | 54.3% | self-reported llm-stats | link → |
| LVBench | 67.7% | self-reported llm-stats | link → |
| MathVision | 66.5% | self-reported llm-stats | link → |
| MathVista-Mini | 84.9% | self-reported llm-stats | link → |
| MLVU | 84.3% | self-reported llm-stats | link → |
| MM-MT-Bench | 8.5 | self-reported llm-stats | link → |
| MMBench-V1.1 | 89.9% | self-reported llm-stats | link → |
| MMLU | 88.8% | self-reported llm-stats | link → |
| MMLU-Pro | 81.8% | self-reported llm-stats | link → |
| MMLU-ProX | 77.8% | self-reported llm-stats | link → |
| MMLU-Redux | 92.2% | self-reported llm-stats | link → |
| MMMU-Pro | 68.1% | self-reported llm-stats | link → |
| MMMUval | 78.7% | self-reported llm-stats | link → |
| MMStar | 78.4% | self-reported llm-stats | link → |
| MuirBench | 72.8% | self-reported llm-stats | link → |
| Multi-IF | 76.3% | self-reported llm-stats | link → |
| MultiPL-E | 86.1% | self-reported llm-stats | link → |
| OCRBench | 92.0% | self-reported llm-stats | link → |
| OCRBench-V2 (en) | 67.1% | self-reported llm-stats | link → |
| OCRBench-V2 (zh) | 61.8% | self-reported llm-stats | link → |
| ODinW | 48.6% | self-reported llm-stats | link → |
| OSWorld | 66.7% | self-reported llm-stats | link → |
| RealWorldQA | 79.3% | self-reported llm-stats | link → |
| ScreenSpot | 95.4% | self-reported llm-stats | link → |
| ScreenSpot Pro | 62.0% | self-reported llm-stats | link → |
| SimpleQA | 51.9% | self-reported llm-stats | link → |
| SuperGPQA | 60.4% | self-reported llm-stats | link → |
| VideoMME w/o sub. | 79.2% | self-reported llm-stats | link → |
| VideoMMMU | 74.7% | self-reported llm-stats | link → |
| WritingBench | 85.5% | self-reported llm-stats | link → |