Qwen3.5-27B
Qwen3.5-27B is a dense multimodal foundation model with 27 billion parameters. It is an open-weight model with a native 262K-token context window, suited to production deployment, and it combines strong performance across reasoning, coding, multilingual, long-context, and visual understanding tasks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AA-LCR | 66.1% | self-reported llm-stats |
| AI2D | 92.9% | self-reported llm-stats |
| AndroidWorld_SR | 64.2% | self-reported llm-stats |
| BabyVision | 44.6% | self-reported llm-stats |
| BFCL-V4 | 68.5% | self-reported llm-stats |
| BrowseComp | 61.0% | self-reported llm-stats |
| BrowseComp-zh | 62.1% | self-reported llm-stats |
| C-Eval | 90.5% | self-reported llm-stats |
| CC-OCR | 81.0% | self-reported llm-stats |
| CharXiv-R | 79.5% | self-reported llm-stats |
| CodeForces | 80.7% | self-reported llm-stats |
| CountBench | 97.8% | self-reported llm-stats |
| DeepPlanning | 22.6% | self-reported llm-stats |
| DynaMath | 87.7% | self-reported llm-stats |
| EmbSpatialBench | 84.5% | self-reported llm-stats |
| ERQA | 60.5% | self-reported llm-stats |
| FullStackBench en | 60.1% | self-reported llm-stats |
| FullStackBench zh | 57.4% | self-reported llm-stats |
| Global PIQA | 87.5% | self-reported llm-stats |
| GPQA | 85.5% | self-reported llm-stats |
| Hallusion Bench | 70.0% | self-reported llm-stats |
| HMMT 2025 | 92.0% | self-reported llm-stats |
| HMMT25 | 89.8% | self-reported llm-stats |
| Humanity's Last Exam | 48.5% | self-reported llm-stats |
| Hypersim | 13.0% | self-reported llm-stats |
| IFBench | 76.5% | self-reported llm-stats |
| IFEval | 95.0% | self-reported llm-stats |
| Include | 81.6% | self-reported llm-stats |
| LingoQA | 82.0% | self-reported llm-stats |
| LiveCodeBench v6 | 80.7% | self-reported llm-stats |
| LongBench v2 | 60.6% | self-reported llm-stats |
| LVBench | 73.6% | self-reported llm-stats |
| MathVision | 86.0% | self-reported llm-stats |
| MathVista-Mini | 87.8% | self-reported llm-stats |
| MAXIFE | 88.0% | self-reported llm-stats |
| MedXpertQA | 62.4% | self-reported llm-stats |
| MLVU | 85.9% | self-reported llm-stats |
| MMBench-V1.1 | 92.6% | self-reported llm-stats |
| MMLongBench-Doc | 60.2% | self-reported llm-stats |
| MMLU-Pro | 86.1% | self-reported llm-stats |
| MMLU-ProX | 82.2% | self-reported llm-stats |
| MMLU-Redux | 93.2% | self-reported llm-stats |
| MMMLU | 85.9% | self-reported llm-stats |
| MMMU | 82.3% | self-reported llm-stats |
| MMMU-Pro | 75.0% | self-reported llm-stats |
| MMStar | 81.0% | self-reported llm-stats |
| MMVU | 73.3% | self-reported llm-stats |
| Multi-Challenge | 60.8% | self-reported llm-stats |
| MVBench | 74.6% | self-reported llm-stats |
| NOVA-63 | 58.1% | self-reported llm-stats |
| Nuscene | 15.2% | self-reported llm-stats |
| OCRBench | 89.4% | self-reported llm-stats |
| ODinW | 41.1% | self-reported llm-stats |
| OJBench | 40.1% | self-reported llm-stats |
| OmniDocBench 1.5 | 88.9% | self-reported llm-stats |
| OSWorld-Verified | 56.2% | self-reported llm-stats |
| PMC-VQA | 62.4% | self-reported llm-stats |
| PolyMATH | 71.2% | self-reported llm-stats |
| RealWorldQA | 83.7% | self-reported llm-stats |
| RefCOCO-avg | 90.9% | self-reported llm-stats |
| RefSpatialBench | 67.7% | self-reported llm-stats |
| ScreenSpot Pro | 70.3% | self-reported llm-stats |
| Seal-0 | 47.2% | self-reported llm-stats |
| SimpleVQA | 56.0% | self-reported llm-stats |
| SlakeVQA | 80.0% | self-reported llm-stats |
| SUNRGBD | 35.4% | self-reported llm-stats |
| SuperGPQA | 65.6% | self-reported llm-stats |
| SWE-Bench Verified | 72.4% | self-reported llm-stats |
| t2-bench | 79.0% | self-reported llm-stats |
| Terminal-Bench 2.0 | 41.6% | self-reported llm-stats |
| TIR-Bench | 59.8% | self-reported llm-stats |
| V* | 93.7% | self-reported llm-stats |
| VideoMME w sub. | 87.0% | self-reported llm-stats |
| VideoMME w/o sub. | 82.8% | self-reported llm-stats |
| VideoMMMU | 82.3% | self-reported llm-stats |
| VITA-Bench | 41.9% | self-reported llm-stats |
| VLMsAreBlind | 96.9% | self-reported llm-stats |
| WideSearch | 61.1% | self-reported llm-stats |
| WMT24++ | 77.6% | self-reported llm-stats |
| ZEROBench | 10.0% | self-reported llm-stats |
| ZEROBench-Sub | 36.2% | self-reported llm-stats |
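
For readers who want to work with these numbers programmatically, the following is a minimal sketch of one way to parse a pipe-delimited table like the one above and rank benchmarks by score. The function name `parse_scores` and the `SAMPLE` string are hypothetical and not part of any Qwen or llm-stats tooling; the sample holds three rows copied from the table, trimmed to the two columns the parser reads. All scores are self-reported, so the ranking only orders the published figures and says nothing about their comparability across benchmarks.

```python
def parse_scores(markdown_table: str) -> dict[str, float]:
    """Map benchmark name -> score in percent from a pipe-delimited table."""
    scores: dict[str, float] = {}
    for line in markdown_table.strip().splitlines():
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Rows whose second cell is not a percentage (the header and the
        # |---| separator) are skipped; any columns after the second are ignored.
        if len(cells) < 2 or not cells[1].endswith("%"):
            continue
        scores[cells[0]] = float(cells[1].rstrip("%"))
    return scores


# Hypothetical excerpt: three rows taken from the table above.
SAMPLE = """
| Benchmark | Score |
|---|---|
| SWE-Bench Verified | 72.4% |
| IFEval | 95.0% |
| ZEROBench | 10.0% |
"""

scores = parse_scores(SAMPLE)

# Sort from highest to lowest reported score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, pct in ranked:
    print(f"{name:<20} {pct:5.1f}%")
```

Because the parser reads only the first two cells of each row, it should work unchanged if the full table, including its tag column, is pasted in place of `SAMPLE`.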