Qwen3-235B-A22B-Thinking-2507
Qwen3-235B-A22B-Thinking-2507 is a thinking-only Mixture-of-Experts (MoE) model with 235B total parameters, of which 22B are activated per token. It has 94 layers and 128 experts (8 activated per token), and natively supports a context length of 262,144 tokens (256K). This version reports significantly improved reasoning performance, with state-of-the-art results among open-source thinking models on logical reasoning, mathematics, science, coding, and academic benchmarks. Other enhancements include markedly better general capabilities (instruction following, tool usage, text generation), stronger 256K long-context understanding, and increased thinking depth. The model supports thinking mode only, and the <think> tag is included automatically.
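Because the <think> tag is included automatically, client code usually has to separate the reasoning trace from the final answer. The following is a minimal sketch of that step, not the official parsing logic. It assumes the opening tag may be missing from the generated text (for example, if it was added on the prompt side) and that the closing tag is the literal string </think>. The helper name `split_thinking` is hypothetical.

```python
def split_thinking(generated: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumption: the final answer is whatever follows the last closing tag.
    The opening <think> tag is optional, since it may have been added on
    the prompt side rather than generated by the model.
    """
    head, sep, tail = generated.rpartition(close_tag)
    if not sep:
        # No closing tag found. For a thinking-only model this most likely
        # means generation stopped mid-reasoning, so no answer is reported.
        return generated.strip(), ""
    # Drop a leading opening tag if the text happens to contain one.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    # Hypothetical model output, used only to illustrate the split.
    sample = "2 + 2 is basic addition.\n</think>\n\nThe answer is 4."
    reasoning, answer = split_thinking(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Treating text without a closing tag as unfinished reasoning is a design choice of this sketch. A caller that prefers to surface such text as the answer can swap the two return values in that branch.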
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 92.3% | self-reported, llm-stats |
| Arena-Hard v2 | 79.7% | self-reported, llm-stats |
| BFCL-v3 | 71.9% | self-reported, llm-stats |
| CFEval | 2,134 | self-reported, llm-stats |
| Creative Writing v3 | 0.861 | self-reported, llm-stats |
| GPQA | 81.1% | self-reported, llm-stats |
| HMMT25 | 83.9% | self-reported, llm-stats |
| Humanity's Last Exam | 18.2% | self-reported, llm-stats |
| IFEval | 87.8% | self-reported, llm-stats |
| Include | 81.0% | self-reported, llm-stats |
| LiveBench 20241125 | 78.4% | self-reported, llm-stats |
| LiveCodeBench v6 | 74.1% | self-reported, llm-stats |
| MMLU-Pro | 84.4% | self-reported, llm-stats |
| MMLU-ProX | 81.0% | self-reported, llm-stats |
| MMLU-Redux | 93.8% | self-reported, llm-stats |
| Multi-IF | 80.6% | self-reported, llm-stats |
| OJBench | 32.5% | self-reported, llm-stats |
| PolyMATH | 60.1% | self-reported, llm-stats |
| SuperGPQA | 64.9% | self-reported, llm-stats |
| TAU-bench Airline | 46.0% | self-reported, llm-stats |
| TAU-bench Retail | 67.8% | self-reported, llm-stats |
| Tau2 Airline | 58.0% | self-reported, llm-stats |
| Tau2 Retail | 71.9% | self-reported, llm-stats |
| Tau2 Telecom | 45.6% | self-reported, llm-stats |
| WritingBench | 88.3% | self-reported, llm-stats |