Qwen3.7 Max

Qwen3.7 Max is the Alibaba Cloud Qwen Team's proprietary flagship model for agent-driven workflows. It is designed for coding agents, office automation, MCP and multi-agent orchestration, and long-horizon autonomous execution, and it offers a 1-million-token context window with up to 65,536 output tokens. Qwen reports strong agentic coding results, including 69.7 on Terminal-Bench 2.0-Terminus, 80.4 on SWE-bench Verified, 60.6 on SWE-Bench Pro, and 78.3 on SWE-bench Multilingual, alongside 92.4 on GPQA Diamond and 97.1 on HMMT February 2026.
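
The two limits quoted above (context window and maximum output) can be turned into a simple request-budget check. The sketch below is illustrative only: it assumes "1 million" means exactly 1,000,000 tokens and that prompt and output tokens share one context window, neither of which the description confirms. The function name `plan_output_budget` is hypothetical and not part of any Qwen API.

```python
# Limits taken from the model description; the exact context size is an assumption.
CONTEXT_WINDOW_TOKENS = 1_000_000  # "1 million" read as exactly 1,000,000 (assumption)
MAX_OUTPUT_TOKENS = 65_536         # stated maximum output length


def plan_output_budget(prompt_tokens: int, requested_output_tokens: int) -> int:
    """Return how many output tokens can be requested alongside a prompt.

    Assumes prompt and output share a single context window, which the
    source does not confirm.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    # The budget is capped by the request, the output limit, and the space left.
    return min(requested_output_tokens, MAX_OUTPUT_TOKENS, remaining)
```

Under these assumptions, a 900,000-token prompt asking for 100,000 output tokens would be capped at 65,536 by the output limit, while a 990,000-token prompt would be capped at 10,000 by the remaining window.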

Benchmark results

Benchmark               Score   Tags           Source
BFCL-V4                 75.0%   self-reported  llm-stats
Claw-Eval               65.2%   self-reported  llm-stats
CoWorkBench             67.2%   self-reported  llm-stats
CritPT                  11.4%   self-reported  llm-stats
Global PIQA             91.4%   self-reported  llm-stats
GPQA                    92.4%   self-reported  llm-stats
HMMT Feb 26             97.1%   self-reported  llm-stats
Humanity's Last Exam    41.4%   self-reported  llm-stats
IFBench                 79.1%   self-reported  llm-stats
IFEval                  94.3%   self-reported  llm-stats
IMO-AnswerBench         90.0%   self-reported  llm-stats
Include                 86.2%   self-reported  llm-stats
Kernel Bench L3         96.0%   self-reported  llm-stats
LiveCodeBench v6        91.6%   self-reported  llm-stats
MathArena Apex          44.5%   self-reported  llm-stats
MAXIFE                  89.2%   self-reported  llm-stats
MCP Atlas               76.4%   self-reported  llm-stats
MCP-Mark                60.8%   self-reported  llm-stats
MMLU-Pro                89.6%   self-reported  llm-stats
MMLU-ProX               87.0%   self-reported  llm-stats
MMLU-Redux              95.0%   self-reported  llm-stats
MMMLU                   90.3%   self-reported  llm-stats
MRCR 128K (8-needle)    90.4%   self-reported  llm-stats
NL2Repo                 47.2%   self-reported  llm-stats
NOVA-63                 59.0%   self-reported  llm-stats
PolyMATH                86.5%   self-reported  llm-stats
QwenSVG                 1,608   self-reported  llm-stats
QwenWebBench            1,568   self-reported  llm-stats
QwenWorldBench          57.3%   self-reported  llm-stats
SciCode                 53.5%   self-reported  llm-stats
SkillsBench             59.2%   self-reported  llm-stats
SpreadSheetBench-v1     87.0%   self-reported  llm-stats
SuperGPQA               73.6%   self-reported  llm-stats
SWE-bench Multilingual  78.3%   self-reported  llm-stats
SWE-Bench Pro           60.6%   self-reported  llm-stats
SWE-Bench Verified      80.4%   self-reported  llm-stats
Terminal-Bench 2.0      69.7%   self-reported  llm-stats
VITA-Bench              47.9%   self-reported  llm-stats
WMT24++                 85.8%   self-reported  llm-stats
ZClawBench              64.3%   self-reported  llm-stats