Qwen3-Next-80B-A3B-Instruct

Qwen3-Next-80B-A3B-Instruct is the first model in the Qwen3-Next series. It combines three architectural changes: Hybrid Attention, which pairs Gated DeltaNet with Gated Attention for efficient ultra-long-context modeling; a High-Sparsity MoE with 512 experts (10 activated + 1 shared), which gives an extremely low activation ratio; and Multi-Token Prediction, which improves performance and speeds up inference.
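To make the "low activation ratio" concrete, the arithmetic can be sketched in a few lines of Python. The expert counts come from the description above. The 80B total / 3B active parameter split is read off the model name ("80B-A3B") and is an assumption here, as is the choice to count only the 10 routed experts against the 512 (the description does not say whether the shared expert is part of that pool).

```python
# Expert counts from the model description.
TOTAL_EXPERTS = 512
ROUTED_ACTIVE = 10   # experts selected per token by the router
SHARED_ACTIVE = 1    # always-on shared expert (not counted in the ratio below)

# Assumption: "80B-A3B" means 80B total parameters, about 3B active per token.
TOTAL_PARAMS_B = 80.0
ACTIVE_PARAMS_B = 3.0


def ratio(active: float, total: float) -> float:
    """Return active / total as a fraction between 0 and 1."""
    return active / total


# Share of routed experts that fire for a single token.
expert_ratio = ratio(ROUTED_ACTIVE, TOTAL_EXPERTS)
# Share of all parameters that take part in a single forward pass.
param_ratio = ratio(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

print(f"routed experts active per token: {expert_ratio:.2%}")
print(f"parameters active per token:     {param_ratio:.2%}")
```

Under these assumptions, roughly 2% of the routed experts and under 4% of the parameters are used for any one token, which is where the inference savings come from.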

Benchmark	Score
MMLU-Redux	90.9%
MultiPL-E	87.8%
IFEval	87.6%
WritingBench	87.3%
Arena-Hard v2	82.7%
MMLU-Pro	80.6%
Include	78.9%
MMLU-ProX	76.7%
LiveBench 20241125	75.8%
Multi-IF	75.8%
GPQA	72.9%
BFCL-v3	70.3%
AIME 2025	69.5%
TAU-bench Retail	60.9%
SuperGPQA	58.8%
Tau2 Retail	57.3%
LiveCodeBench v6	56.6%
HMMT25	54.1%
Aider-Polyglot	49.8%
PolyMATH	45.9%
Tau2 Airline	45.5%
TAU-bench Airline	44.0%
Tau2 Telecom	13.2%
Creative Writing v3	85.3

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Context	Max output	Uptime	Uptime (5m)	TTFT (p50)	Speed (p50)	Notes
DeepInfra	available	$0.09/Mtok	$1.10/Mtok	262K tokens	16K tokens	98%	98%	533 ms	55 tok/s	fp8
Alibaba	available	$0.10/Mtok	$0.78/Mtok	131K tokens	33K tokens	100.0%	100.0%	564 ms	134 tok/s	-
Parasail	available	$0.10/Mtok (cache $0.07/Mtok)	$1.10/Mtok	262K tokens	262K tokens	100.0%	100.0%	534 ms	90 tok/s	fp8
Google	available	$0.15/Mtok	$1.20/Mtok	262K tokens	262K tokens	100.0%	100.0%	584 ms	101 tok/s	-
Novita	available	$0.15/Mtok	$1.50/Mtok	131K tokens	33K tokens	99.5%	100.0%	976 ms	96 tok/s	bf16
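Because input and output prices differ by provider, the cheapest option depends on the shape of the workload. The following is a minimal sketch of that comparison, using the per-million-token prices from the table above. The `Provider` class, the function names, and the example token counts are hypothetical, chosen only for illustration; cache discounts, context limits, and throughput are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    input_usd_per_mtok: float   # USD per million input tokens
    output_usd_per_mtok: float  # USD per million output tokens


# Prices copied from the provider table (snapshot, subject to change).
PROVIDERS = [
    Provider("DeepInfra", 0.09, 1.10),
    Provider("Alibaba", 0.10, 0.78),
    Provider("Parasail", 0.10, 1.10),
    Provider("Google", 0.15, 1.20),
    Provider("Novita", 0.15, 1.50),
]


def request_cost(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, ignoring any cache discount."""
    total = (input_tokens * p.input_usd_per_mtok
             + output_tokens * p.output_usd_per_mtok)
    return total / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> Provider:
    """Return the provider with the lowest cost for this token mix."""
    return min(PROVIDERS,
               key=lambda p: request_cost(p, input_tokens, output_tokens))


# Hypothetical workloads: one prompt-heavy, one generation-heavy.
print("prompt-heavy:    ", cheapest(100_000, 2_000).name)
print("generation-heavy:", cheapest(1_000, 10_000).name)
```

With these prices, a prompt-heavy request favors the lowest input price while a generation-heavy request favors the lowest output price, so it is worth estimating the token mix before picking a provider on price alone.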