GLM-4.7-Flash

GLM-4.7-Flash is a high-speed, cost-efficient variant of GLM-4.7 optimized for fast inference and lower latency. It retains the coding-centric capabilities of GLM-4.7 including thinking before acting, preserved reasoning across turns, and per-request thinking control for speed or accuracy trade-offs. Ideal for applications requiring quick responses while maintaining strong performance on coding, agentic workflows, and general reasoning tasks.

Benchmark results

Benchmark Score Tags Source
AIME 2025 91.6% self-reported llm-stats link →
BrowseComp 42.8% self-reported llm-stats link →
GPQA 75.2% self-reported llm-stats link →
Humanity's Last Exam 14.4% self-reported llm-stats link →
SWE-Bench Verified 59.2% self-reported llm-stats link →
Tau-bench 79.5% self-reported llm-stats link →