Qwen3-235B-A22B-Instruct-2507
Qwen3-235B-A22B-Instruct-2507 is the updated instruct version of Qwen3-235B-A22B, with significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also provides substantial gains in long-tail knowledge coverage across multiple languages and markedly better alignment with user preferences in subjective and open-ended tasks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot | 57.3% | self-reported, llm-stats |
| AIME 2025 | 70.3% | self-reported, llm-stats |
| ARC-AGI | 41.8% | self-reported, llm-stats |
| Arena-Hard v2 | 79.2% | self-reported, llm-stats |
| BFCL-v3 | 70.9% | self-reported, llm-stats |
| Creative Writing v3 | 87.5% | self-reported, llm-stats |
| CSimpleQA | 84.3% | self-reported, llm-stats |
| GPQA | 77.5% | self-reported, llm-stats |
| HMMT25 | 55.4% | self-reported, llm-stats |
| IFEval | 88.7% | self-reported, llm-stats |
| Include | 79.5% | self-reported, llm-stats |
| LiveBench 20241125 | 75.4% | self-reported, llm-stats |
| LiveCodeBench v6 | 51.8% | self-reported, llm-stats |
| MMLU-Pro | 83.0% | self-reported, llm-stats |
| MMLU-ProX | 79.4% | self-reported, llm-stats |
| MMLU-Redux | 93.1% | self-reported, llm-stats |
| Multi-IF | 77.5% | self-reported, llm-stats |
| MultiPL-E | 87.9% | self-reported, llm-stats |
| PolyMATH | 50.2% | self-reported, llm-stats |
| SimpleQA | 54.3% | self-reported, llm-stats |
| SuperGPQA | 62.6% | self-reported, llm-stats |
| Tau2 Airline | 44.0% | self-reported, llm-stats |
| Tau2 Retail | 71.3% | self-reported, llm-stats |
| WritingBench | 85.2% | self-reported, llm-stats |
| ZebraLogic | 95.0% | self-reported, llm-stats |