Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is the updated instruct version of Qwen3-235B-A22B, with significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also provides substantial gains in long-tail knowledge coverage across multiple languages and markedly better alignment with user preferences in subjective and open-ended tasks.

Benchmark results

Benchmark            Score   Tags           Source
Aider-Polyglot       57.3%   self-reported  llm-stats
AIME 2025            70.3%   self-reported  llm-stats
ARC-AGI              41.8%   self-reported  llm-stats
Arena-Hard v2        79.2%   self-reported  llm-stats
BFCL-v3              70.9%   self-reported  llm-stats
Creative Writing v3  0.875   self-reported  llm-stats
CSimpleQA            84.3%   self-reported  llm-stats
GPQA                 77.5%   self-reported  llm-stats
HMMT25               55.4%   self-reported  llm-stats
IFEval               88.7%   self-reported  llm-stats
Include              79.5%   self-reported  llm-stats
LiveBench 20241125   75.4%   self-reported  llm-stats
LiveCodeBench v6     51.8%   self-reported  llm-stats
MMLU-Pro             83.0%   self-reported  llm-stats
MMLU-ProX            79.4%   self-reported  llm-stats
MMLU-Redux           93.1%   self-reported  llm-stats
Multi-IF             77.5%   self-reported  llm-stats
MultiPL-E            87.9%   self-reported  llm-stats
PolyMATH             50.2%   self-reported  llm-stats
SimpleQA             54.3%   self-reported  llm-stats
SuperGPQA            62.6%   self-reported  llm-stats
Tau2 Airline         44.0%   self-reported  llm-stats
Tau2 Retail          71.3%   self-reported  llm-stats
WritingBench         85.2%   self-reported  llm-stats
ZebraLogic           95.0%   self-reported  llm-stats