OCRBench-V2 (en)

OCRBench v2, English subset: an enhanced benchmark for evaluating large multimodal models on visual text localization and reasoning over English text content.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: image_to_text, vision. Language: en. Verified by llm-stats: no.
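
The metadata gives a maximum score of 1, while the leaderboard lists percentages. The source does not state how the two relate. The sketch below assumes the displayed percentage is the raw score divided by the maximum score, times 100. The function name `to_percent` and the sample raw value are hypothetical and used for illustration only.

```python
MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


def to_percent(raw_score: float, max_score: float = MAX_SCORE) -> str:
    """Format a raw score as a percentage of the maximum score.

    Assumption: the leaderboard percentage is raw_score / max_score * 100,
    rounded to one decimal place.
    """
    if not 0 <= raw_score <= max_score:
        raise ValueError(f"score {raw_score} is outside [0, {max_score}]")
    return f"{100 * raw_score / max_score:.1f}%"


# Illustrative raw value on the 0-1 scale (assumed, not taken from the source)
print(to_percent(0.684))
```

Under this assumption, a leaderboard entry such as 68.4% would correspond to a raw score of about 0.684 on the 0 to 1 scale.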

Leaderboard

All scores are self-reported (source: llm-stats).

  1. Qwen3 VL 32B Thinking: 68.4%
  2. Qwen3 VL 32B Instruct: 67.4%
  3. Qwen3 VL 235B A22B Instruct: 67.1%
  4. Qwen3 VL 235B A22B Thinking: 66.8%
  5. Qwen3 VL 8B Instruct: 65.4%
  6. Qwen3 VL 8B Thinking: 63.9%
  7. Qwen3 VL 4B Instruct: 63.7%
  8. Qwen3 VL 30B A3B Instruct: 63.2%
  9. Qwen3 VL 30B A3B Thinking: 62.6%
  10. Qwen3 VL 4B Thinking: 61.8%
  11. Qwen2.5 VL 72B Instruct: 61.5%
  12. Qwen2.5 VL 32B Instruct: 57.2%