DeepSeek VL2 Small

DeepSeek-VL2 is an advanced series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It demonstrates superior capabilities across a range of tasks, including visual question answering, optical character recognition, document, table, and chart understanding, and visual grounding.

Benchmark results

Benchmark      Score   Tags           Source
AI2D           80.0%   self-reported  llm-stats
ChartQA        84.5%   self-reported  llm-stats
DocVQA         92.3%   self-reported  llm-stats
InfoVQA        75.8%   self-reported  llm-stats
MathVista      60.7%   self-reported  llm-stats
MMBench        80.3%   self-reported  llm-stats
MMBench-V1.1   79.3%   self-reported  llm-stats
MME            21.2%   self-reported  llm-stats
MMMU           48.0%   self-reported  llm-stats
MMStar         57.0%   self-reported  llm-stats
MMT-Bench      62.9%   self-reported  llm-stats
OCRBench       83.4%   self-reported  llm-stats
RealWorldQA    65.4%   self-reported  llm-stats
TextVQA        83.4%   self-reported  llm-stats