MMLongBench-Doc

multimodal

MMLongBench-Doc evaluates how well vision-language models understand long documents.

Methodology

Imported from llm-stats public benchmark metadata. Modality: image. Max score: 100. Categories: long_context, multimodal, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3.6 Plus: 62.0% (self-reported, llm-stats)
  2. Qwen3.5-27B: 60.2% (self-reported, llm-stats)
  3. Qwen3.5-35B-A3B: 59.5% (self-reported, llm-stats)
  4. Qwen3.5-122B-A10B: 59.0% (self-reported, llm-stats)
  5. Qwen3 VL 235B A22B Thinking: 56.2% (self-reported, llm-stats)
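
For readers who want to work with these numbers programmatically, the following is a minimal sketch of one way to hold the leaderboard in memory and compute each model's gap to the top score. The `Entry` class and the `ranked` and `gap_to_leader` helpers are hypothetical names introduced here for illustration; they are not part of any llm-stats API. The scores are copied from the list above and are self-reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, not an llm-stats schema)."""
    model: str
    score: float  # percent, on the benchmark's 0-100 scale
    self_reported: bool = True  # every row above is marked self-reported


# Scores copied verbatim from the leaderboard above.
LEADERBOARD = [
    Entry("Qwen3.6 Plus", 62.0),
    Entry("Qwen3.5-27B", 60.2),
    Entry("Qwen3.5-35B-A3B", 59.5),
    Entry("Qwen3.5-122B-A10B", 59.0),
    Entry("Qwen3 VL 235B A22B Thinking", 56.2),
]


def ranked(entries):
    """Return entries sorted from highest to lowest score."""
    return sorted(entries, key=lambda e: e.score, reverse=True)


def gap_to_leader(entries):
    """Map each model to its distance, in percentage points, from the top score."""
    top = max(e.score for e in entries)
    # Round to one decimal to match the precision of the published scores.
    return {e.model: round(top - e.score, 1) for e in entries}


if __name__ == "__main__":
    gaps = gap_to_leader(LEADERBOARD)
    for rank, entry in enumerate(ranked(LEADERBOARD), start=1):
        print(f"{rank}. {entry.model}: {entry.score:.1f}% (gap {gaps[entry.model]:.1f})")
```

Because all five results are self-reported and unverified, gaps of a point or two between adjacent entries should be read with caution.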