Android Control Low_EM
reasoning
AndroidControl benchmark evaluating autonomous agents on mobile device interaction tasks, scored by exact match (EM) in the low-level instruction setting.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, reasoning. Language: en. Verified by llm-stats: no.
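The metadata above gives only the scoring range (max score 1) and the exact-match criterion, not the formula. As a minimal sketch, assuming the score is the fraction of steps whose predicted action equals the reference action exactly, it could be computed as below. The function name `exact_match_score` and the dictionary action format are hypothetical and are not taken from the benchmark; an actual evaluator may compare fields differently.

```python
from typing import Any, Dict, Sequence

# Hypothetical action format: a flat dict such as
# {"type": "click", "x": 10, "y": 20}. Not the benchmark's own schema.
Action = Dict[str, Any]


def exact_match_score(predicted: Sequence[Action],
                      reference: Sequence[Action]) -> float:
    """Return the fraction of steps where the predicted action equals
    the reference action exactly. The result lies in [0, 1]."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0  # assumption: an empty evaluation set scores 0
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


if __name__ == "__main__":
    refs = [
        {"type": "click", "x": 10, "y": 20},
        {"type": "input_text", "text": "hello"},
    ]
    preds = [
        {"type": "click", "x": 10, "y": 20},   # matches exactly
        {"type": "input_text", "text": "hi"},  # text differs, no match
    ]
    print(exact_match_score(preds, refs))
```

Strict dictionary equality is the simplest reading of "exact match": any difference in action type or argument counts as a miss for that step.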