Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507 is a state-of-the-art thinking-enabled Mixture-of-Experts (MoE) model with 235B total parameters, of which 22B are activated per token. It has 94 layers and 128 experts (8 activated per token), and natively supports a 262K-token context.
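
To illustrate what "128 experts (8 activated)" means in practice, here is a minimal sketch of generic top-k expert routing for a single token. It is an illustration only: the function name `route_token`, the random router logits, and the softmax-over-selected-experts weighting are assumptions, not details of this model's actual router.

```python
import math
import random

NUM_EXPERTS = 128  # expert count from the description above
TOP_K = 8          # experts activated per token, from the description above


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. This normalization
    is an assumption for the sketch; the real scheme is not stated here.
    """
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts active for this token")
```

Only the selected experts run for a given token, which is why the activated parameter count (22B) is far smaller than the total (235B).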

Benchmark	Score
MMLU-Redux	93.8%
AIME 2025	92.3%
WritingBench	88.3%
IFEval	87.8%
MMLU-Pro	84.4%
HMMT25	83.9%
GPQA	81.1%
Include	81.0%
MMLU-ProX	81.0%
Multi-IF	80.6%
Arena-Hard v2	79.7%
LiveBench 20241125	78.4%
LiveCodeBench v6	74.1%
BFCL-v3	71.9%
Tau2 Retail	71.9%
TAU-bench Retail	67.8%
SuperGPQA	64.9%
PolyMATH	60.1%
Tau2 Airline	58.0%
TAU-bench Airline	46.0%
Tau2 Telecom	45.6%
OJBench	32.5%
Humanity's Last Exam	18.2%
CFEval	2,134
Creative Writing v3	0.861

Pricing, uptime, and speed via OpenRouter (updated Jul 17, 2026, 04:19 AM).

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Alibaba	available	$0.15/Mtok	$1.50/Mtok	131K tokens context	100.0% (5m: 100.0%)	638 ms p50 TTFT, 58 tok/s p50	
DeepInfra	available	$0.23/Mtok (cache: $0.20/Mtok)	$2.30/Mtok	262K tokens context, 33K tokens max output	100.0% (5m: 100.0%)	410 ms p50 TTFT, 34 tok/s p50	fp8
Novita	available	$0.30/Mtok	$3.00/Mtok	131K tokens context, 33K tokens max output	100.0%	1,910 ms p50 TTFT, 22 tok/s p50	fp8
Venice	available	$0.45/Mtok	$3.50/Mtok	128K tokens context, 16K tokens max output	n/a	847 ms p50 TTFT, 30 tok/s p50	fp8
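
The per-million-token prices above translate into a per-request cost with simple arithmetic. The sketch below shows one way to compare providers for a given request size. The helper `request_cost` and the example token counts are hypothetical, the prices are copied from the table, and the cached-input rate listed for DeepInfra is ignored. It also assumes that thinking tokens are billed as output tokens, which this page does not confirm.

```python
# (input $/Mtok, output $/Mtok), copied from the provider table above.
PRICES_PER_MTOK = {
    "Alibaba": (0.15, 1.50),
    "DeepInfra": (0.23, 2.30),
    "Novita": (0.30, 3.00),
    "Venice": (0.45, 3.50),
}


def request_cost(provider, input_tokens, output_tokens):
    """Estimated cost in USD of one request, ignoring any cache discount."""
    in_price, out_price = PRICES_PER_MTOK[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical request: 20K prompt tokens, 4K generated tokens
# (assumed to include any thinking tokens).
for name in PRICES_PER_MTOK:
    print(f"{name}: ${request_cost(name, 20_000, 4_000):.4f}")
```

Price is only one axis: the context limits, maximum output length, latency, and fp8 notes in the table may matter more for long reasoning traces.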