GroundUI-1K
A subset of GroundUI-18K for evaluating UI grounding. Given a screenshot and a single-step instruction, a model must predict the coordinates of the action to perform. Tasks span web, desktop, and mobile platforms.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: grounding, multimodal, vision. Language: en. Verified by llm-stats: no.
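The metadata gives a maximum score of 1 but does not describe the scoring rule. A common convention for coordinate-based grounding, assumed here rather than confirmed by this page, is point-in-box accuracy: a prediction counts as correct when the predicted (x, y) falls inside the target element's bounding box, and the score is the fraction of correct predictions. The sketch below illustrates that assumption. `BBox` and `grounding_accuracy` are hypothetical names, not part of any official GroundUI tooling.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical target-element box in screenshot pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box (an assumed convention).
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    preds: Sequence[Tuple[float, float]], targets: Sequence[BBox]
) -> float:
    """Assumed metric: fraction of predicted points that land in their target box.

    The result lies in [0, 1], consistent with the stated max score of 1.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for (x, y), box in zip(preds, targets) if box.contains(x, y))
    return hits / len(preds)


# Example: one hit and one miss give an accuracy of 0.5.
preds = [(50, 20), (300, 400)]
targets = [BBox(40, 10, 80, 30), BBox(0, 0, 100, 100)]
print(grounding_accuracy(preds, targets))  # 0.5
```

Under this assumption, a single click either hits or misses, so per-sample scores are binary and the benchmark score is their mean.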