Qwen3-Next-80B-A3B-Thinking

Qwen3-Next-80B-A3B-Thinking is the thinking variant of the Qwen3-Next series and shares its architecture with the instruct model. It is trained with GSPO, which addresses the stability and efficiency challenges that the combination of hybrid attention and high-sparsity MoE poses for RL training. The architecture uses Hybrid Attention, combining Gated DeltaNet and Gated Attention for efficient ultra-long context modeling, a High-Sparsity MoE with 512 experts (10 activated + 1 shared), and Multi-Token Prediction.

With 80B total parameters and only 3B activated, it demonstrates strong performance on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even the proprietary Gemini-2.5-Flash-Thinking across multiple benchmarks.

Architecture: 48 layers, 15T training tokens, and a hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)). The model supports only thinking mode, with the <think> tag included automatically, and may generate longer thinking content.
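As a quick sanity check on the figures in the description, the following Python sketch expands the layout string 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) into a per-layer list and works out the sparsity ratios. It assumes that each "mixer -> MoE" pair counts as one layer, which is the reading that matches the stated 48 layers. The function and variable names are hypothetical and are not taken from any Qwen code.

```python
# Illustrative arithmetic only. All names are hypothetical, not part of any Qwen library.

def expand_layout(num_blocks: int = 12, deltanet_per_block: int = 3) -> list[str]:
    """Expand 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)) into one entry per layer.

    Assumption: each "mixer -> MoE" pair is counted as a single layer.
    """
    block = ["Gated DeltaNet -> MoE"] * deltanet_per_block + ["Gated Attention -> MoE"]
    return block * num_blocks


layers = expand_layout()
num_layers = len(layers)
deltanet_layers = sum(1 for name in layers if name.startswith("Gated DeltaNet"))
attention_layers = num_layers - deltanet_layers

# Expert counts from the description: 512 experts, 10 activated + 1 shared.
TOTAL_EXPERTS = 512
ROUTED_ACTIVE = 10
SHARED_ACTIVE = 1
experts_per_token = ROUTED_ACTIVE + SHARED_ACTIVE

# Parameter counts from the description, in billions.
TOTAL_PARAMS_B = 80
ACTIVE_PARAMS_B = 3
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"layers: {num_layers} ({deltanet_layers} Gated DeltaNet, {attention_layers} Gated Attention)")
print(f"experts used per token: {experts_per_token} of {TOTAL_EXPERTS}")
print(f"activated parameters: {active_param_fraction:.2%} of total")
```

Under this reading, three out of every four layers use Gated DeltaNet and one uses Gated Attention, and roughly 3.75% of the parameters are active for any given token.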

Benchmark results

Benchmark            Score    Tags            Source
AIME 2025            87.8%    self-reported   llm-stats
Arena-Hard v2        62.3%    self-reported   llm-stats
BFCL-v3              72.0%    self-reported   llm-stats
CFEval               2,071    self-reported   llm-stats
GPQA                 77.2%    self-reported   llm-stats
HMMT25               73.9%    self-reported   llm-stats
IFEval               88.9%    self-reported   llm-stats
Include              78.9%    self-reported   llm-stats
LiveBench 20241125   76.6%    self-reported   llm-stats
LiveCodeBench v6     68.7%    self-reported   llm-stats
MMLU-Pro             82.7%    self-reported   llm-stats
MMLU-ProX            78.7%    self-reported   llm-stats
MMLU-Redux           92.5%    self-reported   llm-stats
Multi-IF             77.8%    self-reported   llm-stats
OJBench              29.7%    self-reported   llm-stats
PolyMATH             56.3%    self-reported   llm-stats
SuperGPQA            60.8%    self-reported   llm-stats
TAU-bench Airline    49.0%    self-reported   llm-stats
TAU-bench Retail     69.6%    self-reported   llm-stats
Tau2 Airline         60.5%    self-reported   llm-stats
Tau2 Retail          67.8%    self-reported   llm-stats
Tau2 Telecom         43.9%    self-reported   llm-stats
WritingBench         84.6%    self-reported   llm-stats