Qwen3-Next-80B-A3B-Thinking
Qwen3-Next-80B-A3B-Thinking is the thinking variant of the Qwen3-Next series and shares its architecture with the instruct model. It is trained with GSPO to address the stability and efficiency challenges that the combination of hybrid attention and high-sparsity MoE poses for RL training. With 80B total parameters and only 3B activated, it performs strongly on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even the proprietary Gemini-2.5-Flash-Thinking across multiple benchmarks.

Architecture: Hybrid Attention, which combines Gated DeltaNet and Gated Attention for efficient ultra-long context modeling; High-Sparsity MoE with 512 experts (10 activated + 1 shared); and Multi-Token Prediction. The model has 48 layers in a hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) and was trained on 15T tokens. It supports only thinking mode: the <think> tag is included automatically, and the model may generate longer thinking content than its predecessors.
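The layout string 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) can be checked against the stated 48 layers with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes the shared expert is counted separately from the 512 routed experts, which the description does not state explicitly.

```python
from collections import Counter

# Values quoted in the description.
NUM_GROUPS = 12            # outer repeat count in 12*(...)
DELTANET_PER_GROUP = 3     # 3*(Gated DeltaNet -> MoE)
ATTENTION_PER_GROUP = 1    # (Gated Attention -> MoE)
TOTAL_PARAMS_B = 80        # total parameters, in billions
ACTIVE_PARAMS_B = 3        # activated parameters, in billions
ROUTED_EXPERTS = 512       # assumption: the shared expert is not part of the 512
ACTIVE_ROUTED_EXPERTS = 10


def expand_layout():
    """Return the token-mixer type of every layer, in order."""
    group = (["gated_deltanet"] * DELTANET_PER_GROUP
             + ["gated_attention"] * ATTENTION_PER_GROUP)
    return group * NUM_GROUPS


layers = expand_layout()
counts = Counter(layers)

print(len(layers))                                          # 48
print(counts["gated_deltanet"], counts["gated_attention"])  # 36 12
print(f"{ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.2%}")            # 3.75%
print(f"{ACTIVE_ROUTED_EXPERTS / ROUTED_EXPERTS:.2%}")      # 1.95%
```

Under these assumptions, every fourth layer uses Gated Attention, roughly 3.75% of the parameters are active per token, and about 2% of the routed experts are selected in each MoE block.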
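Because the model only runs in thinking mode, client code usually needs to separate the reasoning from the final answer. The helper below is a minimal sketch, not an official API: `split_thinking` is a hypothetical name, and it assumes the opening <think> tag may be supplied by the prompt, so that the generated text can contain only the closing </think> tag. It also assumes that output with no closing tag is unfinished reasoning, for example after hitting a token limit.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Split raw model output into (thinking, answer).

    Hypothetical helper: works whether or not the opening tag is
    present in the generated text.
    """
    head, sep, tail = text.partition(close_tag)
    thinking = head.strip()
    # Drop the opening tag if the model (or template) emitted it.
    if thinking.startswith(open_tag):
        thinking = thinking[len(open_tag):].strip()
    if not sep:
        # Assumption: no closing tag means the reasoning was cut off.
        return thinking, ""
    return thinking, tail.strip()


raw = "2 + 2 is 4.\n</think>\n\nThe answer is 4."
thinking, answer = split_thinking(raw)
print(thinking)  # 2 + 2 is 4.
print(answer)    # The answer is 4.
```

Treating tag-less output as reasoning rather than as an answer is a design choice; a caller that prefers the opposite can swap the two values in the early return.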
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 87.8% | self-reported llm-stats |
| Arena-Hard v2 | 62.3% | self-reported llm-stats |
| BFCL-v3 | 72.0% | self-reported llm-stats |
| CFEval | 2,071 | self-reported llm-stats |
| GPQA | 77.2% | self-reported llm-stats |
| HMMT25 | 73.9% | self-reported llm-stats |
| IFEval | 88.9% | self-reported llm-stats |
| Include | 78.9% | self-reported llm-stats |
| LiveBench 20241125 | 76.6% | self-reported llm-stats |
| LiveCodeBench v6 | 68.7% | self-reported llm-stats |
| MMLU-Pro | 82.7% | self-reported llm-stats |
| MMLU-ProX | 78.7% | self-reported llm-stats |
| MMLU-Redux | 92.5% | self-reported llm-stats |
| Multi-IF | 77.8% | self-reported llm-stats |
| OJBench | 29.7% | self-reported llm-stats |
| PolyMATH | 56.3% | self-reported llm-stats |
| SuperGPQA | 60.8% | self-reported llm-stats |
| TAU-bench Airline | 49.0% | self-reported llm-stats |
| TAU-bench Retail | 69.6% | self-reported llm-stats |
| Tau2 Airline | 60.5% | self-reported llm-stats |
| Tau2 Retail | 67.8% | self-reported llm-stats |
| Tau2 Telecom | 43.9% | self-reported llm-stats |
| WritingBench | 84.6% | self-reported llm-stats |