DeepSeek-VL2

DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that improves on its predecessor, DeepSeek-VL. The series demonstrates strong capabilities across tasks including visual question answering, optical character recognition, document, table, and chart understanding, and visual grounding.

Benchmark results

Benchmark     Score  Tags           Source
AI2D          81.4%  self-reported  llm-stats
ChartQA       86.0%  self-reported  llm-stats
DocVQA        93.3%  self-reported  llm-stats
InfoVQA       78.1%  self-reported  llm-stats
MathVista     62.8%  self-reported  llm-stats
MMBench       79.6%  self-reported  llm-stats
MMBench-V1.1  79.2%  self-reported  llm-stats
MME           22.5%  self-reported  llm-stats
MMMU          51.1%  self-reported  llm-stats
MMStar        61.3%  self-reported  llm-stats
MMT-Bench     63.6%  self-reported  llm-stats
OCRBench      81.1%  self-reported  llm-stats
RealWorldQA   68.4%  self-reported  llm-stats
TextVQA       84.2%  self-reported  llm-stats