Qwen3.6 Plus

Qwen3.6 Plus is Alibaba's next-generation flagship model, with a native context window of 1 million tokens, up to 65,536 output tokens, and always-on chain-of-thought reasoning. It uses a hybrid architecture optimized for efficiency and scalability. It leads on agentic coding with a self-reported 61.6% on Terminal-Bench 2.0, surpassing Claude 4.5 Opus, and achieves strong results on document understanding (OmniDocBench 1.5, 91.2%) and multimodal reasoning (MMMU, 86.0%). Compared with Qwen3.5, it is more decisive in its reasoning: it uses fewer tokens on straightforward tasks and is more stable when running as an agent.
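
The two limits quoted above (a 1 million token context window and 65,536 output tokens) can be turned into a simple pre-flight check before sending a request. The following is a minimal sketch, not an official client: the names `TokenBudget` and `plan_budget` are hypothetical, and it assumes that prompt and output tokens share the same 1M-token window, which the description does not state explicitly.

```python
from dataclasses import dataclass

# Limits quoted in the model description.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_536


@dataclass(frozen=True)
class TokenBudget:
    """Hypothetical container for a planned request size."""
    prompt_tokens: int
    max_output_tokens: int


def plan_budget(prompt_tokens: int, requested_output_tokens: int) -> TokenBudget:
    """Clamp a requested output length to the quoted limits.

    Assumption (not confirmed by the source): prompt and output tokens
    are counted against the same 1M-token context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt does not fit in the context window")

    # Room left in the window once the prompt is accounted for.
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    allowed = min(requested_output_tokens, MAX_OUTPUT_TOKENS, remaining)
    return TokenBudget(prompt_tokens=prompt_tokens, max_output_tokens=allowed)
```

For example, `plan_budget(990_000, 50_000)` would cap the output at the 10,000 tokens left in the window, while `plan_budget(10_000, 100_000)` would cap it at the 65,536-token output limit. If the provider counts output tokens separately from the context window, the `remaining` term should be dropped.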

Benchmark results

Benchmark               Score   Tags           Source
AA-LCR                  68.3%   self-reported  llm-stats
AI2D                    94.4%   self-reported  llm-stats
AIME 2026               95.3%   self-reported  llm-stats
C-Eval                  93.3%   self-reported  llm-stats
CC-OCR                  83.4%   self-reported  llm-stats
CharXiv-R               81.5%   self-reported  llm-stats
Claw-Eval               58.7%   self-reported  llm-stats
CountBench              97.6%   self-reported  llm-stats
DeepPlanning            41.5%   self-reported  llm-stats
DynaMath                88.0%   self-reported  llm-stats
ERQA                    65.7%   self-reported  llm-stats
Global PIQA             89.8%   self-reported  llm-stats
GPQA                    90.4%   self-reported  llm-stats
HMMT 2025               96.7%   self-reported  llm-stats
HMMT Feb 26             87.8%   self-reported  llm-stats
HMMT25                  94.6%   self-reported  llm-stats
Humanity's Last Exam    28.8%   self-reported  llm-stats
IFBench                 74.2%   self-reported  llm-stats
IFEval                  94.3%   self-reported  llm-stats
IMO-AnswerBench         83.8%   self-reported  llm-stats
Include                 85.1%   self-reported  llm-stats
LiveCodeBench v6        87.1%   self-reported  llm-stats
LongBench v2            62.0%   self-reported  llm-stats
MathVision              88.0%   self-reported  llm-stats
MAXIFE                  88.2%   self-reported  llm-stats
MCP Atlas               74.1%   self-reported  llm-stats
MCP-Mark                48.2%   self-reported  llm-stats
MLVU                    86.7%   self-reported  llm-stats
MMLongBench-Doc         62.0%   self-reported  llm-stats
MMLU-Pro                88.5%   self-reported  llm-stats
MMLU-ProX               84.7%   self-reported  llm-stats
MMLU-Redux              94.5%   self-reported  llm-stats
MMMLU                   89.5%   self-reported  llm-stats
MMMU                    86.0%   self-reported  llm-stats
MMMU-Pro                78.8%   self-reported  llm-stats
MMStar                  83.3%   self-reported  llm-stats
NL2Repo                 37.9%   self-reported  llm-stats
NOVA-63                 57.9%   self-reported  llm-stats
ODinW                   51.8%   self-reported  llm-stats
OmniDocBench 1.5        91.2%   self-reported  llm-stats
OSWorld-Verified        62.5%   self-reported  llm-stats
PolyMATH                77.4%   self-reported  llm-stats
RealWorldQA             85.4%   self-reported  llm-stats
RefCOCO-avg             93.5%   self-reported  llm-stats
ScreenSpot Pro          68.2%   self-reported  llm-stats
SimpleVQA               67.3%   self-reported  llm-stats
SkillsBench             45.7%   self-reported  llm-stats
SuperGPQA               71.6%   self-reported  llm-stats
SWE-bench Multilingual  73.8%   self-reported  llm-stats
SWE-Bench Pro           56.6%   self-reported  llm-stats
SWE-Bench Verified      78.8%   self-reported  llm-stats
TAU3-Bench              70.7%   self-reported  llm-stats
Terminal-Bench 2.0      61.6%   self-reported  llm-stats
TIR-Bench               61.6%   self-reported  llm-stats
Toolathlon              39.8%   self-reported  llm-stats
V*                      96.9%   self-reported  llm-stats
Video-MME               84.2%   self-reported  llm-stats
VideoMMMU               84.0%   self-reported  llm-stats
VITA-Bench              44.3%   self-reported  llm-stats
We-Math                 89.0%   self-reported  llm-stats
WideSearch              74.3%   self-reported  llm-stats
WMT24++                 84.3%   self-reported  llm-stats