Qwen3-VL-235B-A22B-Instruct

Qwen3-VL is a large multimodal model that unifies vision, language, and reasoning across text, images, and video, with the stated goal of human-level perception and cognition. This variant uses a Mixture-of-Experts architecture with 235B total parameters, of which about 22B are active per token (the "A22B" in the name). It trains the visual and textual modalities jointly from an early stage to strengthen language grounding. The model supports a context window of up to 1 million tokens and is built for visual understanding, spatial reasoning, long-video comprehension, and tool-based interaction. It can generate code from images, perform precise 2D and 3D object grounding, and operate digital interfaces as a visual agent. According to its developers, the "Instruct" version rivals Gemini 2.5 Pro on perception benchmarks, while the "Thinking" version leads in multimodal reasoning and STEM tasks. With multilingual OCR, creative writing, and fine-grained scene interpretation, Qwen3-VL is presented as a new open-source frontier for integrated vision-language intelligence.

Benchmark results

All scores below are self-reported (source: llm-stats). Values are percentages except Creative Writing v3 and MM-MT-Bench, which are listed as raw scores.

Benchmark                 Score
AI2D                      89.7%
AIME 2025                 74.7%
AndroidWorld (SR)         63.7%
Arena-Hard v2             77.4%
BFCL-v3                   67.7%
BLINK                     70.7%
CC-OCR                    82.2%
Charades-STA              64.8%
CharXiv-R                 62.1%
Creative Writing v3       0.865
CSimpleQA                 83.4%
DocVQA (test)             97.1%
ERQA                      51.3%
HallusionBench            63.2%
HMMT25                    57.4%
IFEval                    87.8%
Include                   80.0%
InfoVQA (test)            89.2%
LiveBench (2024-11-25)    74.8%
LiveCodeBench v5          61.4%
LiveCodeBench v6          54.3%
LVBench                   67.7%
MathVision                66.5%
MathVista-Mini            84.9%
MLVU                      84.3%
MM-MT-Bench               8.5
MMBench-V1.1              89.9%
MMLU                      88.8%
MMLU-Pro                  81.8%
MMLU-ProX                 77.8%
MMLU-Redux                92.2%
MMMU-Pro                  68.1%
MMMU (val)                78.7%
MMStar                    78.4%
MuirBench                 72.8%
Multi-IF                  76.3%
MultiPL-E                 86.1%
OCRBench                  92.0%
OCRBench-V2 (en)          67.1%
OCRBench-V2 (zh)          61.8%
ODinW                     48.6%
OSWorld                   66.7%
RealWorldQA               79.3%
ScreenSpot                95.4%
ScreenSpot Pro            62.0%
SimpleQA                  51.9%
SuperGPQA                 60.4%
VideoMME (w/o subtitles)  79.2%
VideoMMMU                 74.7%
WritingBench              85.5%