MM-Mind2Web

A multimodal web navigation benchmark comprising 2,000 open-ended tasks that span 137 websites across 31 domains. Each task pairs HTML documents with webpage screenshots and includes action sequences covering complex web interactions.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: agents, frontend_development, multimodal, reasoning. Language: en. Verified by llm-stats: no.
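The metadata gives a maximum score of 1, while the leaderboard reports percentages. The sketch below shows one way the two scales could relate. It assumes scores are stored as fractions of the maximum; `to_percent` and the `leaderboard` dictionary are hypothetical names used only for illustration and are not part of llm-stats.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a score as a percentage of max_score, with one decimal place."""
    return f"{score / max_score * 100:.1f}%"


# Leaderboard values rewritten on the 0-to-1 scale (an assumption about storage).
leaderboard = {"Nova Lite": 0.607, "Nova Pro": 0.637}

# Rank models from highest to lowest score.
ranked = sorted(leaderboard.items(), key=lambda item: item[1], reverse=True)

for rank, (model, score) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {to_percent(score)}")
```

If the stored values were already percentages, passing `max_score=100` would keep the same formatting.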

Leaderboard

  1. Nova Pro: 63.7% (self-reported; source: llm-stats)
  2. Nova Lite: 60.7% (self-reported; source: llm-stats)