GLM-4.7-Flash

GLM-4.7-Flash is a high-speed, cost-efficient variant of GLM-4.7 optimized for fast inference and lower latency. It retains the coding-centric capabilities of GLM-4.7 including thinking before acting, preserved reasoning across turns, and per-request thinking control for speed or accuracy trade-offs.

AIME 2025

91.6%

i
Tau-bench

79.5%

i
GPQA

75.2%

i
SWE-Bench Verified

59.2%

i
BrowseComp

42.8%

i
Humanity's Last Exam

14.4%

i

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
DeepInfra	available	$0.06/Mtok cache $0.01/Mtok	$0.40/Mtok	203K tokens context 16K tokens max output	99.8% 5m 99.9%	453 ms p50 TTFT 28 tok/s p50	bf16
Cloudflare	available	$0.06/Mtok	$0.40/Mtok	131K tokens context 131K tokens max output	98% 5m 96%	420 ms p50 TTFT 21 tok/s p50
Venice	available	$0.13/Mtok	$0.50/Mtok	128K tokens context 16K tokens max output	99% 5m 100.0%	625 ms p50 TTFT 41 tok/s p50	fp8
Novita	-2	$0.07/Mtok cache $0.01/Mtok	$0.40/Mtok	200K tokens context 128K tokens max output	82% 5m 75%	1,170 ms p50 TTFT 52 tok/s p50	bf16