OmniDocBench 1.5


OmniDocBench 1.5 is a comprehensive benchmark for evaluating multimodal large language models on document understanding tasks across diverse document types, including OCR, document parsing, information extraction, and visual question answering. On the Overall Edit Distance metric, lower scores are better. The leaderboard below is sorted from highest to lowest listed value, so not every entry may be reported on the same metric.
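To make "lower Overall Edit Distance is better" concrete, here is a minimal sketch of a normalized edit distance between a model's output and the reference text. It assumes plain Levenshtein distance (insertions, deletions, substitutions) scaled by the length of the longer string, averaged with a simple mean. The benchmark's actual text normalization and aggregation are not described here, and `overall_edit_distance` is a hypothetical helper, not part of any official evaluation toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string; 0.0 is an exact match."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


def overall_edit_distance(pairs) -> float:
    """Hypothetical aggregate: mean normalized distance over (prediction, reference) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (prediction, reference) pair")
    return sum(normalized_edit_distance(p, r) for p, r in pairs) / len(pairs)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(overall_edit_distance([("abc", "abc"), ("kitten", "sitting")]))  # 3/14, about 0.214
```

Because each pair contributes a value between 0 (identical) and 1 (completely different), a perfect parser would score 0.0, which is why lower values rank higher on this metric.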

Methodology

Source: imported from llm-stats public benchmark metadata.
Modality: multimodal
Max score: 1
Categories: multimodal, reasoning, structured_output, vision
Language: en
Verified by llm-stats: no

Leaderboard

  1. MiniMax M3: 91.6% (self-reported)
  2. Qwen3.6 Plus: 91.2% (self-reported)
  3. Qwen3.5-122B-A10B: 89.8% (self-reported)
  4. Qwen3.5-35B-A3B: 89.3% (self-reported)
  5. GPT-5.4: 89.1% (self-reported)
  6. Qwen3.5-27B: 88.9% (self-reported)
  7. Kimi K2.5: 88.8% (self-reported)
  8. GPT-5.5 Instant: 87.5% (self-reported)
  9. GPT-5.4 mini: 87.4% (self-reported)
  10. GPT-5.4 nano: 75.8% (self-reported)
  11. Gemini 3 Flash: 12.1% (self-reported)
  12. Gemini 3 Pro: 11.5% (self-reported)