DeepSeek-VL2
DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that improves significantly on its predecessor, DeepSeek-VL. It performs strongly across a range of tasks, including visual question answering, optical character recognition, document, table, and chart understanding, and visual grounding.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 81.4% | self-reported, llm-stats |
| ChartQA | 86.0% | self-reported, llm-stats |
| DocVQA | 93.3% | self-reported, llm-stats |
| InfoVQA | 78.1% | self-reported, llm-stats |
| MathVista | 62.8% | self-reported, llm-stats |
| MMBench | 79.6% | self-reported, llm-stats |
| MMBench-V1.1 | 79.2% | self-reported, llm-stats |
| MME | 22.5% | self-reported, llm-stats |
| MMMU | 51.1% | self-reported, llm-stats |
| MMStar | 61.3% | self-reported, llm-stats |
| MMT-Bench | 63.6% | self-reported, llm-stats |
| OCRBench | 81.1% | self-reported, llm-stats |
| RealWorldQA | 68.4% | self-reported, llm-stats |
| TextVQA | 84.2% | self-reported, llm-stats |