MME-RealWorld

multimodal official site →

A comprehensive evaluation benchmark for Multimodal Large Language Models featuring over 13,366 high-resolution images and 29,429 question-answer pairs across 43 subtasks and 5 real-world scenarios. The largest manually annotated multimodal benchmark to date, designed to test MLLMs on challenging high-resolution real-world scenarios.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, multimodal, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen2.5-Omni-7B self-reported llm-stats
    61.6%