GLM-4.7-Flash
GLM-4.7-Flash is a high-speed, cost-efficient variant of GLM-4.7 optimized for fast inference and lower latency. It retains the coding-centric capabilities of GLM-4.7 including thinking before acting, preserved reasoning across turns, and per-request thinking control for speed or accuracy trade-offs. Ideal for applications requiring quick responses while maintaining strong performance on coding, agentic workflows, and general reasoning tasks.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AIME 2025 | 91.6% | self-reported llm-stats | link → |
| BrowseComp | 42.8% | self-reported llm-stats | link → |
| GPQA | 75.2% | self-reported llm-stats | link → |
| Humanity's Last Exam | 14.4% | self-reported llm-stats | link → |
| SWE-Bench Verified | 59.2% | self-reported llm-stats | link → |
| Tau-bench | 79.5% | self-reported llm-stats | link → |