Qwen3 VL 235B A22B Thinking
Qwen3-VL-235B-A22B-Thinking is the most powerful vision-language model in the Qwen series. It is a Mixture-of-Experts (MoE) model with 236B parameters, tuned for reasoning-enhanced multimodal understanding. Key capabilities:

- Visual Agent: operates PC and mobile GUIs, recognizes interface elements, and invokes tools.
- Visual Coding: generates Draw.io diagrams and HTML/CSS/JS from images and videos.
- Advanced Spatial Perception: 2D and 3D grounding for spatial reasoning and embodied AI.
- Long Context & Video Understanding: native 256K context, expandable to 1M; handles hours-long video with second-level indexing.
- Enhanced Multimodal Reasoning: excels at STEM and math problems, including causal analysis.
- Upgraded Visual Recognition: celebrities, anime, products, landmarks, flora, and fauna.
- Expanded OCR: 32 languages, robust to low light, blur, and tilt.

Architecture innovations include Interleaved-MRoPE for positional embeddings, DeepStack for multi-level ViT feature fusion, and Text-Timestamp Alignment for precise temporal modeling of video.
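As a rough illustration of what the MoE sizing means in practice, the sketch below works through two back-of-the-envelope numbers. It assumes the usual Qwen naming convention, in which "235B" is the total parameter count and "A22B" means roughly 22B parameters are activated per token. It also assumes that all expert weights stay resident in memory during inference. The helper names `active_fraction` and `weight_memory_gb` are hypothetical, and the figures ignore the KV cache, activations, and runtime overhead.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token (both arguments in billions)."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal GB.

    total_b is in billions of parameters, so billions * bytes-per-parameter
    gives gigabytes directly (1 GB = 1e9 bytes).
    """
    return total_b * bytes_per_param


if __name__ == "__main__":
    # Assumption: 235B total / 22B active, read off the model name.
    print(f"active share per token: {active_fraction(235, 22):.1%}")

    # Assumption: 236B stored parameters, as stated in the description.
    for label, nbytes in [("bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
        print(f"{label}: ~{weight_memory_gb(236, nbytes):.0f} GB of weights")
```

The point of the split is that per-token compute scales with the active parameters, while memory for weights scales with the total.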
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AI2D | 89.2% | self-reported llm-stats | link → |
| AIME 2025 | 89.7% | self-reported llm-stats | link → |
| ARKitScenes | 53.7% | self-reported llm-stats | link → |
| BFCL-v3 | 71.9% | self-reported llm-stats | link → |
| BLINK | 67.1% | self-reported llm-stats | link → |
| CC-OCR | 81.5% | self-reported llm-stats | link → |
| Charades-STA | 63.5% | self-reported llm-stats | link → |
| CharXiv-R | 66.1% | self-reported llm-stats | link → |
| CountBench | 93.7% | self-reported llm-stats | link → |
| Creative Writing v3 | 0.857 | self-reported llm-stats | link → |
| Design2Code | 93.4% | self-reported llm-stats | link → |
| DocVQA (test) | 96.5% | self-reported llm-stats | link → |
| EmbSpatialBench | 84.3% | self-reported llm-stats | link → |
| ERQA | 52.5% | self-reported llm-stats | link → |
| HallusionBench | 66.7% | self-reported llm-stats | link → |
| HMMT25 | 77.4% | self-reported llm-stats | link → |
| Humanity's Last Exam | 13.6% | self-reported llm-stats | link → |
| Hypersim | 11.0% | self-reported llm-stats | link → |
| IFEval | 88.2% | self-reported llm-stats | link → |
| Include | 80.0% | self-reported llm-stats | link → |
| InfoVQA (test) | 89.5% | self-reported llm-stats | link → |
| LiveBench 20241125 | 79.6% | self-reported llm-stats | link → |
| LiveCodeBench v6 | 70.1% | self-reported llm-stats | link → |
| LVBench | 63.6% | self-reported llm-stats | link → |
| MathVerse-Mini | 85.0% | self-reported llm-stats | link → |
| MathVision | 74.6% | self-reported llm-stats | link → |
| MathVista-Mini | 85.8% | self-reported llm-stats | link → |
| MIABench | 92.7% | self-reported llm-stats | link → |
| MLVU | 83.8% | self-reported llm-stats | link → |
| MM-MT-Bench | 8.5 | self-reported llm-stats | link → |
| MMBench-V1.1 | 90.6% | self-reported llm-stats | link → |
| MMLongBench-Doc | 56.2% | self-reported llm-stats | link → |
| MMLU | 90.6% | self-reported llm-stats | link → |
| MMLU-Pro | 83.8% | self-reported llm-stats | link → |
| MMLU-ProX | 80.6% | self-reported llm-stats | link → |
| MMLU-Redux | 93.7% | self-reported llm-stats | link → |
| MMMU-Pro | 69.3% | self-reported llm-stats | link → |
| MMMU (val) | 80.6% | self-reported llm-stats | link → |
| MMStar | 78.7% | self-reported llm-stats | link → |
| MuirBench | 80.1% | self-reported llm-stats | link → |
| Multi-IF | 79.1% | self-reported llm-stats | link → |
| Objectron | 71.2% | self-reported llm-stats | link → |
| OCRBench | 87.5% | self-reported llm-stats | link → |
| OCRBench-V2 (en) | 66.8% | self-reported llm-stats | link → |
| OCRBench-V2 (zh) | 63.5% | self-reported llm-stats | link → |
| ODinW | 43.2% | self-reported llm-stats | link → |
| OSWorld | 38.1% | self-reported llm-stats | link → |
| OSWorld-G | 68.3% | self-reported llm-stats | link → |
| RealWorldQA | 81.3% | self-reported llm-stats | link → |
| RefCOCO-avg | 92.4% | self-reported llm-stats | link → |
| RefSpatialBench | 69.9% | self-reported llm-stats | link → |
| RoboSpatialHome | 73.9% | self-reported llm-stats | link → |
| ScreenSpot | 95.4% | self-reported llm-stats | link → |
| ScreenSpot Pro | 61.8% | self-reported llm-stats | link → |
| SIFO | 77.3% | self-reported llm-stats | link → |
| SIFO-Multiturn | 71.1% | self-reported llm-stats | link → |
| SimpleQA | 44.4% | self-reported llm-stats | link → |
| SimpleVQA | 61.3% | self-reported llm-stats | link → |
| SUNRGBD | 34.9% | self-reported llm-stats | link → |
| SuperGPQA | 64.3% | self-reported llm-stats | link → |
| VideoMME w/o sub. | 79.0% | self-reported llm-stats | link → |
| VideoMMMU | 80.0% | self-reported llm-stats | link → |
| VisuLogic | 34.4% | self-reported llm-stats | link → |
| WritingBench | 86.7% | self-reported llm-stats | link → |
| ZebraLogic | 97.3% | self-reported llm-stats | link → |
| ZEROBench | 4.0% | self-reported llm-stats | link → |
| ZEROBench-Sub | 27.7% | self-reported llm-stats | link → |
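
The Score column mixes scales: most rows are percentages, but Creative Writing v3 (0.857) and MM-MT-Bench (8.5) are raw scores on their own scales, so sorting the column as plain numbers would be misleading. A minimal sketch of reading the table while keeping the two groups apart, assuming the table is available as markdown text; `BenchmarkRow`, `parse_row`, and `split_by_scale` are hypothetical names introduced for illustration:

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class BenchmarkRow(NamedTuple):
    name: str
    value: float
    is_percent: bool


def parse_row(line: str) -> Optional[BenchmarkRow]:
    """Parse one markdown table row; return None for header/separator lines."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) < 2:
        return None
    # Second cell must be a number with an optional trailing percent sign.
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(%?)", cells[1])
    if match is None:
        return None
    return BenchmarkRow(cells[0], float(match.group(1)), match.group(2) == "%")


def split_by_scale(table: str) -> Tuple[List[BenchmarkRow], List[BenchmarkRow]]:
    """Return (percent rows sorted high to low, non-percent rows in table order)."""
    rows = [r for r in map(parse_row, table.splitlines()) if r is not None]
    percents = sorted((r for r in rows if r.is_percent),
                      key=lambda r: r.value, reverse=True)
    others = [r for r in rows if not r.is_percent]
    return percents, others


# A few rows copied from the table above.
SAMPLE = """\
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| Creative Writing v3 | 0.857 | self-reported llm-stats | link → |
| MM-MT-Bench | 8.5 | self-reported llm-stats | link → |
| ZebraLogic | 97.3% | self-reported llm-stats | link → |
| ZEROBench | 4.0% | self-reported llm-stats | link → |
"""

if __name__ == "__main__":
    percents, others = split_by_scale(SAMPLE)
    for row in percents:
        print(f"{row.name}: {row.value}%")
    for row in others:
        print(f"{row.name}: {row.value} (raw scale, not a percentage)")
```

Keeping the non-percentage rows in a separate list avoids ranking an 8.5 on one scale against an 85.0% on another.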