OCRBench-V2 (zh)


OCRBench v2 Chinese subset: an enhanced benchmark for evaluating large multimodal models (LMMs) on visual text localization and reasoning over Chinese text.

Methodology

Imported from llm-stats public benchmark metadata.
Modality: multimodal
Max score: 1 (shown as 100% in the leaderboard below)
Categories: image_to_text, vision
Language: zh (Chinese)
Verified by llm-stats: no

Leaderboard

  1. Qwen3 VL 235B A22B Thinking: 63.5% (self-reported; source: llm-stats)
  2. Qwen3 VL 32B Thinking: 62.1% (self-reported; source: llm-stats)
  3. Qwen3 VL 235B A22B Instruct: 61.8% (self-reported; source: llm-stats)
  4. Qwen3 VL 8B Instruct: 61.2% (self-reported; source: llm-stats)
  5. Qwen3 VL 30B A3B Thinking: 60.4% (self-reported; source: llm-stats)
  6. Qwen3 VL 32B Instruct: 59.2% (self-reported; source: llm-stats)
  7. Qwen3 VL 8B Thinking: 59.2% (self-reported; source: llm-stats)
  8. Qwen2.5 VL 32B Instruct: 59.1% (self-reported; source: llm-stats)
  9. Qwen3 VL 30B A3B Instruct: 57.8% (self-reported; source: llm-stats)
  10. Qwen3 VL 4B Instruct: 57.6% (self-reported; source: llm-stats)
  11. Qwen3 VL 4B Thinking: 55.8% (self-reported; source: llm-stats)