GLM-4.5
GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents. It uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, of which 32 billion are active. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model with two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities and supports a 128K context length. It scores 63.2 across 12 industry-standard benchmarks, placing 3rd among all proprietary and open-source models. It is released under the MIT open-source license, which allows commercial use and secondary development.
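To put the parameter counts in perspective, the short sketch below works out what share of the model is active and how many training tokens were seen per parameter. It uses only the figures quoted in the description (355B total, 32B active, 23T tokens). The function names are illustrative, and the ratios are back-of-the-envelope arithmetic, not figures published with the model.

```python
# Figures quoted in the description above.
TOTAL_PARAMS = 355e9      # total parameters
ACTIVE_PARAMS = 32e9      # active parameters
TRAINING_TOKENS = 23e12   # training tokens


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that are active (illustrative helper)."""
    return active / total


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Training tokens seen per parameter (illustrative helper)."""
    return tokens / params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
per_total = tokens_per_parameter(TRAINING_TOKENS, TOTAL_PARAMS)

print(f"Active share of parameters: {frac:.1%}")                # 9.0%
print(f"Training tokens per total parameter: {per_total:.1f}")  # 64.8
```

Roughly 9% of the parameters are active, which is why an MoE model's compute cost tracks its active parameter count more closely than its total size.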
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AA-Index | 67.7% | self-reported | llm-stats |
| AIME 2024 | 91.0% | self-reported | llm-stats |
| BFCL-v3 | 77.8% | self-reported | llm-stats |
| BrowseComp | 26.4% | self-reported | llm-stats |
| GPQA | 79.1% | self-reported | llm-stats |
| HLE | 17.2% | self-reported | llm-stats |
| Humanity's Last Exam | 14.4% | self-reported | llm-stats |
| LiveCodeBench | 72.9% | self-reported | llm-stats |
| MATH-500 | 98.2% | self-reported | llm-stats |
| MMLU-Pro | 84.6% | self-reported | llm-stats |
| SciCode | 41.7% | self-reported | llm-stats |
| SWE-Bench Verified | 64.2% | self-reported | llm-stats |
| TAU-bench Airline | 60.4% | self-reported | llm-stats |
| TAU-bench Retail | 79.7% | self-reported | llm-stats |
| Terminal-Bench | 37.5% | self-reported | llm-stats |