GLM-5V-Turbo

GLM-5V-Turbo is Z.AI's first multimodal coding foundation model, built for vision-based coding tasks. It natively processes multimodal inputs, including images, video, text, and files, and excels at long-horizon planning, complex coding, and action execution. It is optimized for agent workflows and works with agents such as Claude Code and OpenClaw to complete the full loop of perceiving, planning, and executing tasks. Its upgrades span model architecture, training methods, data construction, and tooling, and include native multimodal fusion, joint reinforcement learning across more than 30 tasks, and an expanded multimodal toolchain.

Benchmark results

Benchmark                     Score   Tags           Source
AndroidWorld                  75.7%   self-reported  llm-stats
BrowseComp-VL                 51.9%   self-reported  llm-stats
CC-Bench-V2 Backend           22.8%   self-reported  llm-stats
CC-Bench-V2 Frontend          68.4%   self-reported  llm-stats
CC-Bench-V2 Repo Exploration  72.2%   self-reported  llm-stats
Claw-Eval                     75.0%   self-reported  llm-stats
Design2Code                   94.8%   self-reported  llm-stats
FACTS Grounding               58.6%   self-reported  llm-stats
Flame-VLM-Code                93.8%   self-reported  llm-stats
ImageMining                   30.7%   self-reported  llm-stats
MMSearch                      72.9%   self-reported  llm-stats
MMSearch-Plus                 30.0%   self-reported  llm-stats
OSWorld                       62.3%   self-reported  llm-stats
PinchBench                    80.7%   self-reported  llm-stats
SimpleVQA                     78.2%   self-reported  llm-stats
V*                            89.0%   self-reported  llm-stats
Vision2Web                    31.0%   self-reported  llm-stats
WebVoyager                    88.5%   self-reported  llm-stats
ZClawBench                    57.6%   self-reported  llm-stats