MobileMiniWob++_SR

MobileMiniWob++ SR (Success Rate) reports the success rate on MobileMiniWob++, an adaptation of the MiniWob++ web interaction benchmark to mobile Android environments within AndroidWorld. The benchmark comprises 92 web interaction tasks adapted for touch-based mobile interfaces and evaluates an agent's ability to navigate and interact with web applications on mobile devices.

Methodology

Imported from llm-stats public benchmark metadata.

Modality: multimodal
Max score: 1
Categories: agents, frontend_development, multimodal
Language: en
Verified by llm-stats: no
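
The metadata gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how a success rate could be computed and displayed is given below, assuming each task yields a single pass/fail outcome and that the score is the plain fraction of tasks passed. The function names and the example outcomes are hypothetical and are not taken from AndroidWorld or llm-stats; the actual evaluation protocol (for example, averaging over repeated runs) may differ.

```python
from typing import Sequence

NUM_TASKS = 92  # task count stated in the benchmark description


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of tasks completed successfully, in [0, 1]."""
    if not outcomes:
        raise ValueError("outcomes must contain at least one task result")
    # True counts as 1 and False as 0, so sum() gives the number of successes.
    return sum(outcomes) / len(outcomes)


def as_percent(score: float) -> str:
    """Format a score in [0, 1] as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


# Hypothetical run (illustrative only): 46 of the 92 tasks succeed.
outcomes = [True] * 46 + [False] * (NUM_TASKS - 46)
print(as_percent(success_rate(outcomes)))  # prints "50.0%"
```

Under this assumption, a score of 1 corresponds to 100% and means every task was completed.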

Leaderboard

  1. Qwen2.5 VL 7B Instruct: 91.4% (self-reported, llm-stats)
  2. Qwen2.5 VL 72B Instruct: 68.0% (self-reported, llm-stats)