GLM-5.1

GLM-5.1 is Z.AI's next-generation flagship foundation model, designed for long-horizon agentic engineering tasks. Built on a 754B-parameter Mixture-of-Experts (MoE) architecture with 40B active parameters, it can work continuously and autonomously on a single task for up to 8 hours, completing the full loop from planning and execution to iterative optimization and delivery. GLM-5.1 reports a state-of-the-art score on SWE-Bench Pro (58.4) and strong performance across coding, reasoning, and agentic benchmarks. It supports a 200K-token context length, up to 128K output tokens, thinking mode, function calling, structured output, context caching, and Model Context Protocol (MCP) integration. Overall performance is described as comparable to Claude Opus 4.6, with particular strengths in sustained execution and complex engineering optimization.
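To make the size and context figures above concrete, here is a minimal Python sketch of the arithmetic they imply. It is not part of any Z.AI SDK: the function names are hypothetical, "200K" and "128K" are read as 200,000 and 128,000 tokens, and the `shared_window` flag encodes an assumption (not stated in the description) that prompt and output tokens may share the same context window.

```python
# Figures quoted in the model description.
TOTAL_PARAMS_B = 754          # total parameters, in billions
ACTIVE_PARAMS_B = 40          # active parameters per token, in billions
CONTEXT_TOKENS = 200_000      # "200K" read as 200,000 (assumption)
MAX_OUTPUT_TOKENS = 128_000   # "128K" read as 128,000 (assumption)


def active_fraction(total_b: float = TOTAL_PARAMS_B,
                    active_b: float = ACTIVE_PARAMS_B) -> float:
    """Share of the MoE model's parameters that is active for each token."""
    return active_b / total_b


def max_output_budget(prompt_tokens: int, shared_window: bool = True) -> int:
    """Largest output length that still fits for a given prompt size.

    shared_window=True assumes prompt and output tokens count against the
    same context window; this is an assumption, not a documented rule.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > CONTEXT_TOKENS:
        return 0  # the prompt alone does not fit in the context window
    if not shared_window:
        return MAX_OUTPUT_TOKENS
    return min(MAX_OUTPUT_TOKENS, CONTEXT_TOKENS - prompt_tokens)


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    print(f"output budget for a 150,000-token prompt: {max_output_budget(150_000)}")
```

Under these assumptions, roughly 5% of the parameters are active per token, and a 150,000-token prompt would leave room for at most 50,000 output tokens rather than the full 128K.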

Benchmark results

Benchmark              Score      Tags           Source
AIME 2026              95.3%      self-reported  llm-stats
BrowseComp             79.3%      self-reported  llm-stats
CyberGym               68.7%      self-reported  llm-stats
GPQA                   86.2%      self-reported  llm-stats
HMMT 2025              94.0%      self-reported  llm-stats
HMMT Feb 26            82.6%      self-reported  llm-stats
Humanity's Last Exam   52.3%      self-reported  llm-stats
IMO-AnswerBench        83.8%      self-reported  llm-stats
MCP Atlas              71.8%      self-reported  llm-stats
NL2Repo                42.7%      self-reported  llm-stats
SWE-Bench Pro          58.4%      self-reported  llm-stats
TAU3-Bench             70.6%      self-reported  llm-stats
Terminal-Bench 2.0     69.0%      self-reported  llm-stats
Toolathlon             40.7%      self-reported  llm-stats
Vending-Bench 2        5,634.41   self-reported  llm-stats