Qwen3 VL 235B A22B Thinking

Qwen3-VL-235B-A22B-Thinking is billed as the most powerful vision-language model in the Qwen series to date. It is a reasoning-enhanced Mixture-of-Experts (MoE) model with about 235B total parameters, of which about 22B are activated per token, as the name indicates.

Key capabilities include: Visual Agent (operates PC and mobile GUIs, recognizes interface elements, invokes tools); Visual Coding (generates Draw.io diagrams and HTML/CSS/JS from images or videos); Advanced Spatial Perception (2D and 3D grounding for spatial reasoning and embodied AI); Long Context and Video Understanding (native 256K-token context, expandable to 1M, with second-level indexing over hours-long video); Enhanced Multimodal Reasoning (strong on STEM and math problems, including causal analysis); Upgraded Visual Recognition (celebrities, anime, products, landmarks, flora and fauna); and Expanded OCR (32 languages, robust to low light, blur, and tilt).

Architecture innovations include Interleaved-MRoPE for positional embeddings, DeepStack for fusing multi-level ViT features, and Text-Timestamp Alignment for precise temporal modeling of video.
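As a rough illustration of what the "235B-A22B" part of the name implies for an MoE model, the sketch below computes the share of parameters active per token. It assumes the usual Qwen naming convention (total parameters, then "A" for activated parameters) and uses the rounded nominal figures from the name, not exact counts; the function name `active_fraction` is only illustrative.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of parameters activated per token in an MoE model."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


# Nominal figures taken from the model name (assumption: 235B total, 22B active).
fraction = active_fraction(235, 22)
print(f"Roughly {fraction:.1%} of parameters are active per token")
```

Under these nominal figures, per-token compute scales with the roughly 22B active parameters, while memory requirements still scale with the full model.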

Benchmark results

Benchmark             Score   Tags           Source
AI2D                  89.2%   self-reported  llm-stats
AIME 2025             89.7%   self-reported  llm-stats
ARKitScenes           53.7%   self-reported  llm-stats
BFCL-v3               71.9%   self-reported  llm-stats
BLINK                 67.1%   self-reported  llm-stats
CC-OCR                81.5%   self-reported  llm-stats
CharadesSTA           63.5%   self-reported  llm-stats
CharXiv-R             66.1%   self-reported  llm-stats
CountBench            93.7%   self-reported  llm-stats
Creative Writing v3   0.857   self-reported  llm-stats
Design2Code           93.4%   self-reported  llm-stats
DocVQA (test)         96.5%   self-reported  llm-stats
EmbSpatialBench       84.3%   self-reported  llm-stats
ERQA                  52.5%   self-reported  llm-stats
Hallusion Bench       66.7%   self-reported  llm-stats
HMMT25                77.4%   self-reported  llm-stats
Humanity's Last Exam  13.6%   self-reported  llm-stats
Hypersim              11.0%   self-reported  llm-stats
IFEval                88.2%   self-reported  llm-stats
Include               80.0%   self-reported  llm-stats
InfoVQA (test)        89.5%   self-reported  llm-stats
LiveBench 20241125    79.6%   self-reported  llm-stats
LiveCodeBench v6      70.1%   self-reported  llm-stats
LVBench               63.6%   self-reported  llm-stats
MathVerse-Mini        85.0%   self-reported  llm-stats
MathVision            74.6%   self-reported  llm-stats
MathVista-Mini        85.8%   self-reported  llm-stats
MIABench              92.7%   self-reported  llm-stats
MLVU                  83.8%   self-reported  llm-stats
MM-MT-Bench           8.5     self-reported  llm-stats
MMBench-V1.1          90.6%   self-reported  llm-stats
MMLongBench-Doc       56.2%   self-reported  llm-stats
MMLU                  90.6%   self-reported  llm-stats
MMLU-Pro              83.8%   self-reported  llm-stats
MMLU-ProX             80.6%   self-reported  llm-stats
MMLU-Redux            93.7%   self-reported  llm-stats
MMMU-Pro              69.3%   self-reported  llm-stats
MMMU (val)            80.6%   self-reported  llm-stats
MMStar                78.7%   self-reported  llm-stats
MuirBench             80.1%   self-reported  llm-stats
Multi-IF              79.1%   self-reported  llm-stats
Objectron             71.2%   self-reported  llm-stats
OCRBench              87.5%   self-reported  llm-stats
OCRBench-V2 (en)      66.8%   self-reported  llm-stats
OCRBench-V2 (zh)      63.5%   self-reported  llm-stats
ODinW                 43.2%   self-reported  llm-stats
OSWorld               38.1%   self-reported  llm-stats
OSWorld-G             68.3%   self-reported  llm-stats
RealWorldQA           81.3%   self-reported  llm-stats
RefCOCO-avg           92.4%   self-reported  llm-stats
RefSpatialBench       69.9%   self-reported  llm-stats
RoboSpatialHome       73.9%   self-reported  llm-stats
ScreenSpot            95.4%   self-reported  llm-stats
ScreenSpot Pro        61.8%   self-reported  llm-stats
SIFO                  77.3%   self-reported  llm-stats
SIFO-Multiturn        71.1%   self-reported  llm-stats
SimpleQA              44.4%   self-reported  llm-stats
SimpleVQA             61.3%   self-reported  llm-stats
SUNRGBD               34.9%   self-reported  llm-stats
SuperGPQA             64.3%   self-reported  llm-stats
VideoMME w/o sub.     79.0%   self-reported  llm-stats
VideoMMMU             80.0%   self-reported  llm-stats
VisuLogic             34.4%   self-reported  llm-stats
WritingBench          86.7%   self-reported  llm-stats
ZebraLogic            97.3%   self-reported  llm-stats
ZEROBench             4.0%    self-reported  llm-stats
ZEROBench-Sub         27.7%   self-reported  llm-stats
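
The scores above mix percentages with a few non-percentage values (Creative Writing v3 and MM-MT-Bench), and some benchmark names contain spaces or digits, so naive whitespace splitting misreads rows such as "AIME 2025". A minimal parsing sketch follows, assuming every row carries the "self-reported" tag right after the score; the names `BenchmarkScore`, `ROW_RE`, and `parse_row` are hypothetical helpers, not part of any llm-stats tooling.

```python
import re
from typing import NamedTuple


class BenchmarkScore(NamedTuple):
    name: str
    value: float
    is_percent: bool  # False for raw scores such as 0.857 or 8.5


# Name (lazy), then the numeric score with an optional "%", then the tag.
# Anchoring on the tag keeps digits inside names ("AIME 2025") out of the score.
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<value>\d+(?:\.\d+)?)(?P<pct>%)?\s+self-reported\b"
)


def parse_row(line: str) -> BenchmarkScore:
    """Parse one benchmark row; raise ValueError if it does not match."""
    match = ROW_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized benchmark row: {line!r}")
    return BenchmarkScore(
        name=match["name"],
        value=float(match["value"]),
        is_percent=match["pct"] is not None,
    )


rows = [
    "AIME 2025 89.7% self-reported llm-stats",
    "Creative Writing v3 0.857 self-reported llm-stats",
    "MM-MT-Bench 8.5 self-reported llm-stats",
]
for row in rows:
    print(parse_row(row))
```

Keeping `is_percent` separate avoids silently comparing percentage scores with values reported on other scales.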