GLM-4.5

GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents. It uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, of which 32 billion are active. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model with two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities and supports a 128K context length. It scores 63.2 across 12 industry-standard benchmarks, placing 3rd among all proprietary and open-source models. It is released under the MIT open-source license, which allows commercial use and secondary development.
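To make the MoE figures concrete, the short sketch below computes what share of the model's parameters is active, using only the two counts quoted above. The function and constant names are illustrative, and the calculation treats the rounded counts (355B and 32B) as exact.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters that are active (0.0 to 1.0)."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Parameter counts as quoted in the description (rounded figures).
TOTAL_PARAMS = 355e9
ACTIVE_PARAMS = 32e9

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share of parameters: {fraction:.1%}")  # prints 9.0%
```

Roughly one parameter in eleven is active, which is the main reason an MoE model of this size can be cheaper to run than a dense model with the same total parameter count.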

Benchmark results

Benchmark	Score	Tags
AA-Index	67.7%	self-reported llm-stats
AIME 2024	91.0%	self-reported llm-stats
BFCL-v3	77.8%	self-reported llm-stats
BrowseComp	26.4%	self-reported llm-stats
GPQA	79.1%	self-reported llm-stats
HLE	17.2%	self-reported llm-stats
Humanity's Last Exam	14.4%	self-reported llm-stats
LiveCodeBench	72.9%	self-reported llm-stats
MATH-500	98.2%	self-reported llm-stats
MMLU-Pro	84.6%	self-reported llm-stats
SciCode	41.7%	self-reported llm-stats
SWE-Bench Verified	64.2%	self-reported llm-stats
TAU-bench Airline	60.4%	self-reported llm-stats
TAU-bench Retail	79.7%	self-reported llm-stats
Terminal-Bench	37.5%	self-reported llm-stats
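
For readers who want to work with these numbers programmatically, here is a minimal sketch that parses the tab-separated rows and ranks the benchmarks by score. The rows are copied from the table above, only the first two columns are read, and the names `RAW_TABLE` and `parse_scores` are illustrative rather than part of any published tooling.

```python
# Rows copied from the benchmark table; "\t" separates the columns.
RAW_TABLE = "\n".join([
    "Benchmark\tScore\tTags",
    "AA-Index\t67.7%\tself-reported llm-stats",
    "AIME 2024\t91.0%\tself-reported llm-stats",
    "BFCL-v3\t77.8%\tself-reported llm-stats",
    "BrowseComp\t26.4%\tself-reported llm-stats",
    "GPQA\t79.1%\tself-reported llm-stats",
    "HLE\t17.2%\tself-reported llm-stats",
    "Humanity's Last Exam\t14.4%\tself-reported llm-stats",
    "LiveCodeBench\t72.9%\tself-reported llm-stats",
    "MATH-500\t98.2%\tself-reported llm-stats",
    "MMLU-Pro\t84.6%\tself-reported llm-stats",
    "SciCode\t41.7%\tself-reported llm-stats",
    "SWE-Bench Verified\t64.2%\tself-reported llm-stats",
    "TAU-bench Airline\t60.4%\tself-reported llm-stats",
    "TAU-bench Retail\t79.7%\tself-reported llm-stats",
    "Terminal-Bench\t37.5%\tself-reported llm-stats",
])


def parse_scores(raw: str) -> dict[str, float]:
    """Map each benchmark name to its score as a float percentage."""
    scores: dict[str, float] = {}
    for line in raw.splitlines()[1:]:  # skip the header row
        name, score, *_ = line.split("\t")  # ignore any trailing columns
        scores[name] = float(score.rstrip("%"))
    return scores


scores = parse_scores(RAW_TABLE)
# Highest score first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<22}{value:5.1f}%")
```

The benchmarks listed here are not necessarily the same 12 behind the 63.2 aggregate quoted in the description, so an average over these rows should not be expected to reproduce that figure.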