Android Control Low_EM

Categories: multimodal, reasoning
Modality: multimodal
Language: en
Multilingual: No
Max score: 1
Scoring: %, higher is better
Verified by llm-stats: No

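AndroidControl benchmark evaluating autonomous agents on mobile device interaction tasks, scored by exact match (EM) between the predicted and reference actions. The "Low" in the name appears to refer to the low-level (step-by-step) instruction setting rather than to a lenient scoring criterion.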
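As an illustration of how an exact-match score with a maximum of 1 maps to the percentages shown on the leaderboard, here is a minimal sketch. The function names, the dictionary action format, and the use of strict equality are all assumptions made for this example; the benchmark's actual evaluation may compare actions differently (for instance, with a tolerance on tap coordinates).

```python
from typing import Sequence


def exact_match_score(predicted: Sequence[dict], reference: Sequence[dict]) -> float:
    """Fraction of steps whose predicted action equals the reference action.

    Hypothetical helper: strict dict equality stands in for whatever
    matching rule the real evaluation uses.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)


def as_percent(score: float) -> str:
    """Render a score in [0, 1] as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


# Illustrative data only, not taken from the benchmark.
reference = [
    {"action": "click", "target": "Settings"},
    {"action": "type", "text": "wifi"},
]
predicted = [
    {"action": "click", "target": "Settings"},
    {"action": "type", "text": "wi-fi"},
]

score = exact_match_score(predicted, reference)
print(score, as_percent(score))
```

Under these assumptions a score of 1 means every step matched, which corresponds to 100% on the leaderboard.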

Leaderboard

Showing 3 of 3 results

Model                      Score
Qwen2.5 VL 72B Instruct    93.7%
Qwen2.5 VL 32B Instruct    93.3%
Qwen2.5 VL 7B Instruct     91.4%

Content licensed CC BY-SA 4.0.