DeepSeek VL2 Small
DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that improves on its predecessor, DeepSeek-VL. Its reported strengths include visual question answering, optical character recognition, document, table, and chart understanding, and visual grounding.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 80.0% | self-reported llm-stats |
| ChartQA | 84.5% | self-reported llm-stats |
| DocVQA | 92.3% | self-reported llm-stats |
| InfoVQA | 75.8% | self-reported llm-stats |
| MathVista | 60.7% | self-reported llm-stats |
| MMBench | 80.3% | self-reported llm-stats |
| MMBench-V1.1 | 79.3% | self-reported llm-stats |
| MME | 21.2% | self-reported llm-stats |
| MMMU | 48.0% | self-reported llm-stats |
| MMStar | 57.0% | self-reported llm-stats |
| MMT-Bench | 62.9% | self-reported llm-stats |
| OCRBench | 83.4% | self-reported llm-stats |
| RealWorldQA | 65.4% | self-reported llm-stats |
| TextVQA | 83.4% | self-reported llm-stats |