Qwen3 VL 8B Instruct

Qwen3-VL is a family of multimodal models that unify vision, language, and reasoning, aiming for strong perception and cognition across text, images, and video. This page covers the 8B-parameter Instruct variant; the family's flagship is built on a 235B-parameter architecture. The models are trained jointly on visual and textual data from an early stage to strengthen language grounding. They support a context window of up to 1 million tokens and target visual understanding, spatial reasoning, long-video comprehension, and tool-based interaction. They can generate code from images, perform 2D/3D object grounding, and operate digital interfaces as a visual agent. According to Qwen, the flagship “Instruct” version rivals Gemini 2.5 Pro on perception benchmarks, while the flagship “Thinking” version leads in multimodal reasoning and STEM tasks. Other capabilities include multilingual OCR, creative writing, and fine-grained scene interpretation, and Qwen describes Qwen3-VL as a new open-source frontier for integrated vision-language intelligence.

Benchmark results

Benchmark               Score     Tags           Source
AI2D                    85.7%     self-reported  llm-stats
AIME 2025               45.9%     self-reported  llm-stats
BFCL-v3                 66.3%     self-reported  llm-stats
BLINK                   69.1%     self-reported  llm-stats
CC-OCR                  79.9%     self-reported  llm-stats
Charades-STA            56.0%     self-reported  llm-stats
CharXiv-D               83.0%     self-reported  llm-stats
CharXiv-R               46.4%     self-reported  llm-stats
DocVQA (test)           96.1%     self-reported  llm-stats
ERQA                    45.8%     self-reported  llm-stats
HallusionBench          61.1%     self-reported  llm-stats
HMMT25                  32.5%     self-reported  llm-stats
IFEval                  83.7%     self-reported  llm-stats
INCLUDE                 67.0%     self-reported  llm-stats
InfoVQA (test)          83.1%     self-reported  llm-stats
LiveBench (2024-11-25)  62.0%     self-reported  llm-stats
LiveCodeBench v6        39.3%     self-reported  llm-stats
LVBench                 58.0%     self-reported  llm-stats
MathVision              53.9%     self-reported  llm-stats
MathVista-Mini          77.2%     self-reported  llm-stats
MLVU-M                  78.1%     self-reported  llm-stats
MM-MT-Bench             7.7 / 10  self-reported  llm-stats
MMBench-V1.1            85.0%     self-reported  llm-stats
MMLU                    80.7%     self-reported  llm-stats
MMLU-Pro                71.6%     self-reported  llm-stats
MMLU-ProX               65.4%     self-reported  llm-stats
MMLU-Redux              84.9%     self-reported  llm-stats
MMMU (val)              69.6%     self-reported  llm-stats
MMMU-Pro                55.9%     self-reported  llm-stats
MMStar                  70.9%     self-reported  llm-stats
MuirBench               64.4%     self-reported  llm-stats
Multi-IF                75.1%     self-reported  llm-stats
MVBench                 68.7%     self-reported  llm-stats
OCRBench                89.6%     self-reported  llm-stats
OCRBench-V2 (en)        65.4%     self-reported  llm-stats
OCRBench-V2 (zh)        61.2%     self-reported  llm-stats
ODinW                   44.7%     self-reported  llm-stats
OSWorld                 33.9%     self-reported  llm-stats
PolyMATH                30.4%     self-reported  llm-stats
RealWorldQA             71.5%     self-reported  llm-stats
ScreenSpot              94.4%     self-reported  llm-stats
ScreenSpot-Pro          54.6%     self-reported  llm-stats
SuperGPQA               44.5%     self-reported  llm-stats
Video-MME               71.4%     self-reported  llm-stats
VideoMMMU               65.3%     self-reported  llm-stats
WritingBench            83.1%     self-reported  llm-stats