DeepSeek-VL2

DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It demonstrates superior capabilities across a range of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

DocVQA: 93.3%
ChartQA: 86.0%
TextVQA: 84.2%
AI2D: 81.4%
OCRBench: 81.1%
MMBench: 79.6%
MMBench-V1.1: 79.2%
InfoVQA: 78.1%
RealWorldQA: 68.4%
MMT-Bench: 63.6%
MathVista: 62.8%
MMStar: 61.3%
MMMU: 51.1%
MME: 22.5%