Qwen3 VL 30B A3B Thinking
Qwen3-VL is a family of large multimodal models that unifies vision, language, and reasoning across text, images, and video, with the stated aim of human-level perception and cognition. The flagship is built on a 235B-parameter architecture; the variant on this page, as its name indicates, is a smaller mixture-of-experts model with about 30B total parameters and about 3B active per token. Visual and textual modalities are trained jointly from an early stage for strong language grounding. The model supports a context window of up to 1 million tokens and targets visual understanding, spatial reasoning, long video comprehension, and tool-based interaction. It can generate code from images, perform 2D/3D object grounding, and operate digital interfaces as a visual agent. Within the family, the “Instruct” version is reported to rival Gemini 2.5 Pro on perception benchmarks, while the “Thinking” version is reported to lead in multimodal reasoning and STEM tasks. Qwen3-VL also covers multilingual OCR, creative writing, and fine-grained scene interpretation, and is positioned as a new open-source frontier for integrated vision-language intelligence.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 86.9% | self-reported llm-stats |
| AIME 2025 | 83.1% | self-reported llm-stats |
| Arena-Hard v2 | 56.7% | self-reported llm-stats |
| BFCL-v3 | 68.6% | self-reported llm-stats |
| BLINK | 65.4% | self-reported llm-stats |
| CC-OCR | 77.8% | self-reported llm-stats |
| Charades-STA | 62.7% | self-reported llm-stats |
| CharXiv-D | 86.9% | self-reported llm-stats |
| CharXiv-R | 56.6% | self-reported llm-stats |
| Creative Writing v3 | 0.825 | self-reported llm-stats |
| DocVQA (test) | 95.0% | self-reported llm-stats |
| ERQA | 45.3% | self-reported llm-stats |
| GPQA | 74.4% | self-reported llm-stats |
| HallusionBench | 66.0% | self-reported llm-stats |
| HMMT25 | 67.6% | self-reported llm-stats |
| IFEval | 81.7% | self-reported llm-stats |
| INCLUDE | 74.5% | self-reported llm-stats |
| InfoVQA (test) | 86.0% | self-reported llm-stats |
| LiveBench 20241125 | 72.1% | self-reported llm-stats |
| LiveCodeBench v6 | 64.2% | self-reported llm-stats |
| LVBench | 59.2% | self-reported llm-stats |
| MathVision | 65.7% | self-reported llm-stats |
| MathVista-Mini | 81.9% | self-reported llm-stats |
| MLVU-M | 78.9% | self-reported llm-stats |
| MM-MT-Bench | 7.9 | self-reported llm-stats |
| MMBench-V1.1 | 88.9% | self-reported llm-stats |
| MMLU | 87.6% | self-reported llm-stats |
| MMLU-Pro | 80.5% | self-reported llm-stats |
| MMLU-ProX | 76.1% | self-reported llm-stats |
| MMLU-Redux | 90.9% | self-reported llm-stats |
| MMMU (val) | 76.0% | self-reported llm-stats |
| MMMU-Pro | 63.0% | self-reported llm-stats |
| MMStar | 75.5% | self-reported llm-stats |
| MuirBench | 77.6% | self-reported llm-stats |
| Multi-IF | 73.0% | self-reported llm-stats |
| MVBench | 72.0% | self-reported llm-stats |
| OCRBench | 83.9% | self-reported llm-stats |
| OCRBench-V2 (en) | 62.6% | self-reported llm-stats |
| OCRBench-V2 (zh) | 60.4% | self-reported llm-stats |
| ODinW | 42.3% | self-reported llm-stats |
| OSWorld | 30.6% | self-reported llm-stats |
| PolyMATH | 51.7% | self-reported llm-stats |
| RealWorldQA | 77.4% | self-reported llm-stats |
| ScreenSpot | 94.7% | self-reported llm-stats |
| ScreenSpot Pro | 57.3% | self-reported llm-stats |
| SimpleQA | 23.9% | self-reported llm-stats |
| SuperGPQA | 56.4% | self-reported llm-stats |
| Video-MME | 73.3% | self-reported llm-stats |
| VideoMMMU | 75.0% | self-reported llm-stats |
| WritingBench | 85.2% | self-reported llm-stats |
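
Most scores in the table are percentages, but two entries (Creative Writing v3 and MM-MT-Bench) are bare numbers on other scales, so they should not be averaged or ranked together without care. The following is a minimal sketch of one way to read rows from the table and keep the two kinds apart. The `Score` type and `parse_row` helper are hypothetical names introduced here for illustration, and the sample rows are abbreviated to their first two columns; the sketch deliberately does not rescale the bare numbers, since the page does not state their scales.

```python
from typing import NamedTuple


class Score(NamedTuple):
    benchmark: str
    value: float
    unit: str  # "percent" if the cell ended in "%", otherwise "raw"


def parse_row(row: str) -> Score:
    """Parse one pipe-delimited table row; only the first two cells are used."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    name, raw = cells[0], cells[1]
    if raw.endswith("%"):
        return Score(name, float(raw[:-1]), "percent")
    # Bare numbers are kept as-is: their scale is not stated on the page.
    return Score(name, float(raw), "raw")


# Sample rows copied from the table, abbreviated to benchmark and score.
rows = [
    "| AI2D | 86.9% |",
    "| Creative Writing v3 | 0.825 |",
    "| MM-MT-Bench | 7.9 |",
]

scores = [parse_row(r) for r in rows]
percent_only = [s for s in scores if s.unit == "percent"]
raw_only = [s for s in scores if s.unit == "raw"]
```

Filtering on `unit` before aggregating keeps the non-percentage entries from skewing any summary statistic computed over the table.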