DeepSeek VL2 Small

DeepSeek-VL2 is an advanced series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It demonstrates superior capabilities across a range of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

DocVQA: 92.3%
ChartQA: 84.5%
OCRBench: 83.4%
TextVQA: 83.4%
MMBench: 80.3%
AI2D: 80.0%
MMBench-V1.1: 79.3%
InfoVQA: 75.8%
RealWorldQA: 65.4%
MMT-Bench: 62.9%
MathVista: 60.7%
MMStar: 57.0%
MMMU: 48.0%
MME: 21.2%