Qwen3.6 Plus
Qwen3.6 Plus is Alibaba's next-generation flagship model, with a native context window of 1 million tokens, up to 65,536 output tokens, and always-on chain-of-thought reasoning. It uses a new hybrid architecture optimized for efficiency and scalability. On agentic coding it leads Terminal-Bench 2.0 with a score of 61.6, surpassing Claude Opus 4.5, and it achieves strong results on document understanding (OmniDocBench 1.5: 91.2) and multimodal reasoning (MMMU: 86.0). Compared to Qwen3.5, it is significantly more decisive in its reasoning, spends fewer tokens on straightforward tasks, and is more stable when running as an agent.
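The two limits quoted above imply a simple budgeting rule when sizing a prompt. The following is a minimal sketch, not part of any Qwen SDK: the function names are hypothetical, and it assumes that prompt and output tokens share the same 1 million token window, which the description does not state explicitly.

```python
# Limits quoted in the model description.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_536


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the prompt budget left after reserving room for the output.

    Assumption: prompt and output tokens count against one shared window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW_TOKENS - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of this size still leaves the reserved output room."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens())    # 934464
print(fits(950_000))          # False
print(fits(950_000, 32_768))  # True
```

Because reasoning is always on, the chain-of-thought tokens presumably draw from the same output allowance, so reserving the full 65,536 tokens is the conservative choice in this sketch.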
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AA-LCR | 68.3% | self-reported llm-stats |
| AI2D | 94.4% | self-reported llm-stats |
| AIME 2026 | 95.3% | self-reported llm-stats |
| C-Eval | 93.3% | self-reported llm-stats |
| CC-OCR | 83.4% | self-reported llm-stats |
| CharXiv-R | 81.5% | self-reported llm-stats |
| Claw-Eval | 58.7% | self-reported llm-stats |
| CountBench | 97.6% | self-reported llm-stats |
| DeepPlanning | 41.5% | self-reported llm-stats |
| DynaMath | 88.0% | self-reported llm-stats |
| ERQA | 65.7% | self-reported llm-stats |
| Global PIQA | 89.8% | self-reported llm-stats |
| GPQA | 90.4% | self-reported llm-stats |
| HMMT 2025 | 96.7% | self-reported llm-stats |
| HMMT Feb 26 | 87.8% | self-reported llm-stats |
| HMMT25 | 94.6% | self-reported llm-stats |
| Humanity's Last Exam | 28.8% | self-reported llm-stats |
| IFBench | 74.2% | self-reported llm-stats |
| IFEval | 94.3% | self-reported llm-stats |
| IMO-AnswerBench | 83.8% | self-reported llm-stats |
| Include | 85.1% | self-reported llm-stats |
| LiveCodeBench v6 | 87.1% | self-reported llm-stats |
| LongBench v2 | 62.0% | self-reported llm-stats |
| MathVision | 88.0% | self-reported llm-stats |
| MAXIFE | 88.2% | self-reported llm-stats |
| MCP Atlas | 74.1% | self-reported llm-stats |
| MCP-Mark | 48.2% | self-reported llm-stats |
| MLVU | 86.7% | self-reported llm-stats |
| MMLongBench-Doc | 62.0% | self-reported llm-stats |
| MMLU-Pro | 88.5% | self-reported llm-stats |
| MMLU-ProX | 84.7% | self-reported llm-stats |
| MMLU-Redux | 94.5% | self-reported llm-stats |
| MMMLU | 89.5% | self-reported llm-stats |
| MMMU | 86.0% | self-reported llm-stats |
| MMMU-Pro | 78.8% | self-reported llm-stats |
| MMStar | 83.3% | self-reported llm-stats |
| NL2Repo | 37.9% | self-reported llm-stats |
| NOVA-63 | 57.9% | self-reported llm-stats |
| ODinW | 51.8% | self-reported llm-stats |
| OmniDocBench 1.5 | 91.2% | self-reported llm-stats |
| OSWorld-Verified | 62.5% | self-reported llm-stats |
| PolyMATH | 77.4% | self-reported llm-stats |
| RealWorldQA | 85.4% | self-reported llm-stats |
| RefCOCO-avg | 93.5% | self-reported llm-stats |
| ScreenSpot Pro | 68.2% | self-reported llm-stats |
| SimpleVQA | 67.3% | self-reported llm-stats |
| SkillsBench | 45.7% | self-reported llm-stats |
| SuperGPQA | 71.6% | self-reported llm-stats |
| SWE-bench Multilingual | 73.8% | self-reported llm-stats |
| SWE-Bench Pro | 56.6% | self-reported llm-stats |
| SWE-Bench Verified | 78.8% | self-reported llm-stats |
| TAU3-Bench | 70.7% | self-reported llm-stats |
| Terminal-Bench 2.0 | 61.6% | self-reported llm-stats |
| TIR-Bench | 61.6% | self-reported llm-stats |
| Toolathlon | 39.8% | self-reported llm-stats |
| V* | 96.9% | self-reported llm-stats |
| Video-MME | 84.2% | self-reported llm-stats |
| VideoMMMU | 84.0% | self-reported llm-stats |
| VITA-Bench | 44.3% | self-reported llm-stats |
| We-Math | 89.0% | self-reported llm-stats |
| WideSearch | 74.3% | self-reported llm-stats |
| WMT24++ | 84.3% | self-reported llm-stats |