GLM-4.7
GLM 4.7 is a coding‑centric model that thinks before acting, preserves its reasoning across turns, and lets you control thinking per request for speed or accuracy. It upgrades agentic workflows with stronger multi‑step tool use, better terminal and multilingual coding, and a noticeable jump in UI output quality for modern, clean webpages and slides. You can use it in popular coding agents, call it via the Z.ai API, and even run it locally with public weights on HuggingFace and ModelScope using vLLM or SGLang.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AIME 2025 | 95.7% | self-reported llm-stats | link → |
| BrowseComp | 52.0% | self-reported llm-stats | link → |
| BrowseComp-zh | 66.6% | self-reported llm-stats | link → |
| GPQA | 85.7% | self-reported llm-stats | link → |
| Humanity's Last Exam | 42.8% | self-reported llm-stats | link → |
| IMO-AnswerBench | 82.0% | self-reported llm-stats | link → |
| LiveCodeBench v6 | 84.9% | self-reported llm-stats | link → |
| MMLU-Pro | 84.3% | self-reported llm-stats | link → |
| SWE-bench Multilingual | 66.7% | self-reported llm-stats | link → |
| SWE-Bench Verified | 73.8% | self-reported llm-stats | link → |
| Tau-bench | 87.4% | self-reported llm-stats | link → |
| Terminal-Bench | 33.3% | self-reported llm-stats | link → |
| Terminal-Bench 2.0 | 41.0% | self-reported llm-stats | link → |