DocVQA test

multimodal

DocVQA is a visual question answering benchmark comprising 50,000 questions over more than 12,000 document images. It tests whether a model can use a document's structure and content to answer questions about it. The documents, drawn from the UCSF Industry Documents Library, include letters, memos, notes, and reports.

Leaderboard

Model                          Score
Qwen3 VL 235B A22B Instruct    97.1%
Qwen3 VL 32B Instruct          96.9%
Qwen2-VL-72B-Instruct          96.5%
Qwen3 VL 235B A22B Thinking    96.5%
Qwen3 VL 32B Thinking          96.1%
Qwen3 VL 8B Instruct           96.1%
Qwen3 VL 4B Instruct           95.3%
Qwen3 VL 8B Thinking           95.3%
Qwen3 VL 30B A3B Instruct      95.0%
Qwen3 VL 30B A3B Thinking      95.0%
Qwen3 VL 4B Thinking           94.2%