GLM-5V-Turbo
GLM-5V-Turbo is Z.AI's first multimodal coding foundation model, built for vision-based coding tasks. It natively processes multimodal inputs including images, video, text, and files, while excelling at long-horizon planning, complex coding, and action execution. Deeply optimized for agent workflows, it works seamlessly with agents such as Claude Code and OpenClaw to complete the full loop of perceiving, planning, and executing tasks. It features systematic upgrades across model architecture, training methods, data construction, and tooling, including native multimodal fusion, 30+ task joint reinforcement learning, and an expanded multimodal toolchain.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| AndroidWorld | 75.7% | self-reported llm-stats | link → |
| BrowseComp-VL | 51.9% | self-reported llm-stats | link → |
| CC-Bench-V2 Backend | 22.8% | self-reported llm-stats | link → |
| CC-Bench-V2 Frontend | 68.4% | self-reported llm-stats | link → |
| CC-Bench-V2 Repo Exploration | 72.2% | self-reported llm-stats | link → |
| Claw-Eval | 75.0% | self-reported llm-stats | link → |
| Design2Code | 94.8% | self-reported llm-stats | link → |
| FACTS Grounding | 58.6% | self-reported llm-stats | link → |
| Flame-VLM-Code | 93.8% | self-reported llm-stats | link → |
| ImageMining | 30.7% | self-reported llm-stats | link → |
| MMSearch | 72.9% | self-reported llm-stats | link → |
| MMSearch-Plus | 30.0% | self-reported llm-stats | link → |
| OSWorld | 62.3% | self-reported llm-stats | link → |
| PinchBench | 80.7% | self-reported llm-stats | link → |
| SimpleVQA | 78.2% | self-reported llm-stats | link → |
| V* | 89.0% | self-reported llm-stats | link → |
| Vision2Web | 31.0% | self-reported llm-stats | link → |
| WebVoyager | 88.5% | self-reported llm-stats | link → |
| ZClawBench | 57.6% | self-reported llm-stats | link → |