GLM-5.1

GLM-5.1 is Z.AI's next-generation flagship foundation model, designed for long-horizon agentic engineering tasks. It is built on a 754B-parameter Mixture-of-Experts (MoE) architecture with 40B active parameters, and can work continuously and autonomously on a single task for up to 8 hours, completing the full loop from planning and execution through iterative optimization to delivery. GLM-5.1 reports a state-of-the-art score of 58.4% on SWE-Bench Pro and strong results across coding, reasoning, and agentic benchmarks. It supports a 200K-token context length, up to 128K output tokens, thinking mode, function calling, structured output, context caching, and Model Context Protocol (MCP) integration. Overall performance is described as comparable to Claude Opus 4.6, with particular strengths in sustained execution and complex engineering optimization.
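To put the figures above in context, the following is a minimal Python sketch that works out the fraction of parameters active per token and a rough prompt budget. The helper names (`active_fraction`, `max_prompt_tokens`) are hypothetical, and the sketch assumes that prompt and output tokens share the same 200K window, which the description does not state explicitly.

```python
# Figures quoted in the model description above.
TOTAL_PARAMS_B = 754         # total parameters, in billions
ACTIVE_PARAMS_B = 40         # active parameters, in billions
CONTEXT_TOKENS = 200_000     # context length
MAX_OUTPUT_TOKENS = 128_000  # maximum output tokens


def active_fraction(total_b: float = TOTAL_PARAMS_B,
                    active_b: float = ACTIVE_PARAMS_B) -> float:
    """Fraction of the MoE parameters that are active."""
    return active_b / total_b


def max_prompt_tokens(requested_output: int,
                      context: int = CONTEXT_TOKENS,
                      max_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Prompt tokens left after reserving room for the output.

    ASSUMPTION: prompt and output share one context window; check the
    provider's documentation before relying on this.
    """
    if not 0 <= requested_output <= max_output:
        raise ValueError("requested_output must be between 0 and max_output")
    return context - requested_output


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction():.1%}")
    print(f"Prompt budget at full output: {max_prompt_tokens(MAX_OUTPUT_TOKENS)}")
```

Under that assumption, roughly 5% of the parameters are active at a time, and a request that reserves the full 128K output leaves 72K tokens for the prompt.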

Benchmark results

Benchmark	Score	Tags
AIME 2026	95.3%	self-reported, llm-stats
BrowseComp	79.3%	self-reported, llm-stats
CyberGym	68.7%	self-reported, llm-stats
GPQA	86.2%	self-reported, llm-stats
HMMT 2025	94.0%	self-reported, llm-stats
HMMT Feb 26	82.6%	self-reported, llm-stats
Humanity's Last Exam	52.3%	self-reported, llm-stats
IMO-AnswerBench	83.8%	self-reported, llm-stats
MCP Atlas	71.8%	self-reported, llm-stats
NL2Repo	42.7%	self-reported, llm-stats
SWE-Bench Pro	58.4%	self-reported, llm-stats
TAU3-Bench	70.6%	self-reported, llm-stats
Terminal-Bench 2.0	69.0%	self-reported, llm-stats
Toolathlon	40.7%	self-reported, llm-stats
Vending-Bench 2	5,634.41	self-reported, llm-stats