CC-OCR

multimodal

A comprehensive OCR benchmark for evaluating the literacy of Large Multimodal Models (LMMs). It comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It contains 39 subsets with 7,058 fully annotated images, 41% of which are sourced from real applications. It tests capabilities including text grounding, multi-orientation text recognition, and detection of hallucination and repetition across diverse visual challenges.

Leaderboard

Model                          Score
Qwen3.6 Plus                   83.4%
Qwen3 VL 235B A22B Instruct    82.2%
Qwen3.6-35B-A3B                81.9%
Qwen3.5-122B-A10B              81.8%
Qwen3 VL 235B A22B Thinking    81.5%
Qwen3.6-27B                    81.2%
Qwen3.5-27B                    81.0%
Qwen3 VL 30B A3B Instruct      80.7%
Qwen3.5-35B-A3B                80.7%
Qwen3 VL 32B Instruct          80.3%
Qwen3 VL 8B Instruct           79.9%
Qwen2.5 VL 72B Instruct        79.8%
Qwen2.5 VL 7B Instruct         77.8%
Qwen3 VL 30B A3B Thinking      77.8%
Qwen2.5 VL 32B Instruct        77.1%
Qwen3 VL 8B Thinking           76.3%
Qwen3 VL 4B Instruct           76.2%
Qwen3 VL 4B Thinking           73.8%