Qwen3.7 Max
Qwen3.7 Max is Alibaba Cloud's Qwen Team's proprietary flagship model for agent-driven workflows. It targets coding agents, office automation, MCP and multi-agent orchestration, and long-horizon autonomous execution, and supports a 1 million token context window with up to 65,536 output tokens. Qwen's self-reported agentic coding results include 69.7 on Terminal-Bench 2.0-Terminus, 80.4 on SWE-bench Verified, 60.6 on SWE-Bench Pro, and 78.3 on SWE-bench Multilingual. It also reports 92.4 on GPQA Diamond and 97.1 on HMMT Feb 2026.
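The stated context and output limits can be turned into a simple pre-flight check before sending a long agentic request. This is a minimal sketch, not a provider API. It assumes two things the model card does not state: that "1 million" means exactly 1,000,000 tokens, and that prompt tokens and output tokens share the same window. The function names are hypothetical.

```python
# Minimal sketch: check whether a request fits Qwen3.7 Max's stated limits.
# ASSUMPTIONS (not stated in the model card): "1 million" is taken as exactly
# 1,000,000 tokens, and prompt tokens plus requested output tokens must
# together fit inside the context window. Confirm against the provider's docs.

CONTEXT_WINDOW = 1_000_000   # assumed exact value of "1 million tokens"
MAX_OUTPUT_TOKENS = 65_536   # stated output cap


def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """Return True if the request respects both limits under the assumptions above."""
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW


def max_output_for(prompt_tokens: int) -> int:
    """Largest output budget allowed for a given prompt size (0 if none is left)."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))


if __name__ == "__main__":
    print(fits_budget(900_000, 65_536))   # True: 965,536 <= 1,000,000
    print(fits_budget(950_000, 65_536))   # False: 1,015,536 > 1,000,000
    print(max_output_for(950_000))        # 50000
```

Under these assumptions, a very long prompt eats into the output budget well before the 65,536-token cap becomes the limiting factor.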
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| BFCL-V4 | 75.0% | self-reported, llm-stats |
| Claw-Eval | 65.2% | self-reported, llm-stats |
| CoWorkBench | 67.2% | self-reported, llm-stats |
| CritPT | 11.4% | self-reported, llm-stats |
| Global PIQA | 91.4% | self-reported, llm-stats |
| GPQA Diamond | 92.4% | self-reported, llm-stats |
| HMMT Feb 2026 | 97.1% | self-reported, llm-stats |
| Humanity's Last Exam | 41.4% | self-reported, llm-stats |
| IFBench | 79.1% | self-reported, llm-stats |
| IFEval | 94.3% | self-reported, llm-stats |
| IMO-AnswerBench | 90.0% | self-reported, llm-stats |
| Include | 86.2% | self-reported, llm-stats |
| Kernel Bench L3 | 96.0% | self-reported, llm-stats |
| LiveCodeBench v6 | 91.6% | self-reported, llm-stats |
| MathArena Apex | 44.5% | self-reported, llm-stats |
| MAXIFE | 89.2% | self-reported, llm-stats |
| MCP Atlas | 76.4% | self-reported, llm-stats |
| MCP-Mark | 60.8% | self-reported, llm-stats |
| MMLU-Pro | 89.6% | self-reported, llm-stats |
| MMLU-ProX | 87.0% | self-reported, llm-stats |
| MMLU-Redux | 95.0% | self-reported, llm-stats |
| MMMLU | 90.3% | self-reported, llm-stats |
| MRCR 128K (8-needle) | 90.4% | self-reported, llm-stats |
| NL2Repo | 47.2% | self-reported, llm-stats |
| NOVA-63 | 59.0% | self-reported, llm-stats |
| PolyMATH | 86.5% | self-reported, llm-stats |
| QwenSVG | 1,608 | self-reported, llm-stats |
| QwenWebBench | 1,568 | self-reported, llm-stats |
| QwenWorldBench | 57.3% | self-reported, llm-stats |
| SciCode | 53.5% | self-reported, llm-stats |
| SkillsBench | 59.2% | self-reported, llm-stats |
| SpreadSheetBench-v1 | 87.0% | self-reported, llm-stats |
| SuperGPQA | 73.6% | self-reported, llm-stats |
| SWE-bench Multilingual | 78.3% | self-reported, llm-stats |
| SWE-Bench Pro | 60.6% | self-reported, llm-stats |
| SWE-Bench Verified | 80.4% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 69.7% | self-reported, llm-stats |
| VITA-Bench | 47.9% | self-reported, llm-stats |
| WMT24++ | 85.8% | self-reported, llm-stats |
| ZClawBench | 64.3% | self-reported, llm-stats |

QwenSVG and QwenWebBench are reported as raw scores, not percentages.