MathVision

MATH-Vision is a dataset designed to measure multimodal mathematical reasoning. It evaluates how well models solve mathematical problems that require both visual understanding and mathematical reasoning.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: math, multimodal, vision. Language: en. Verified by llm-stats: no.
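
The metadata gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how such a score could be computed and displayed, assuming the score is plain accuracy (the fraction of problems answered correctly), which the metadata does not state. The function names accuracy and as_percent are hypothetical and are not part of llm-stats or the benchmark's own tooling.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that match their reference answer.

    Assumption: answers are compared as case-insensitive, whitespace-trimmed
    strings. The real benchmark may use a different matching rule.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty problem set")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)  # value in [0, 1], so the max score is 1


def as_percent(score, digits=1):
    """Format a score in [0, 1] as a percentage string for display."""
    return f"{score * 100:.{digits}f}%"


# Toy example: 3 of 4 answers match after normalization.
score = accuracy(["A", "b", "C", "D"], ["a", "B", "c", "E"])
print(score, as_percent(score))
```

Under this assumption, a leaderboard entry of 93.2% corresponds to a stored score of 0.932.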

Leaderboard

  Rank  Model                          Score   Source
  1     Kimi K2.6                      93.2%   self-reported, llm-stats
  2     Qwen3.6 Plus                   88.0%   self-reported, llm-stats
  3     Qwen3.5-122B-A10B              86.2%   self-reported, llm-stats
  4     Qwen3.5-27B                    86.0%   self-reported, llm-stats
  5     Gemma 4 31B                    85.6%   self-reported, llm-stats
  6     Kimi K2.5                      84.2%   self-reported, llm-stats
  7     Qwen3.5-35B-A3B                83.9%   self-reported, llm-stats
  8     Gemma 4 26B-A4B                82.4%   self-reported, llm-stats
  9     Qwen3 VL 235B A22B Thinking    74.6%   self-reported, llm-stats
  10    Qwen3 VL 32B Thinking          70.2%   self-reported, llm-stats
  11    Qwen3 VL 235B A22B Instruct    66.5%   self-reported, llm-stats
  12    Qwen3 VL 30B A3B Thinking      65.7%   self-reported, llm-stats
  13    Qwen3 VL 32B Instruct          63.4%   self-reported, llm-stats
  14    Qwen3 VL 8B Thinking           62.7%   self-reported, llm-stats
  15    Qwen3 VL 30B A3B Instruct      60.2%   self-reported, llm-stats
  16    Qwen3 VL 4B Thinking           60.0%   self-reported, llm-stats
  17    Gemma 4 E4B                    59.5%   self-reported, llm-stats
  18    Qwen3 VL 8B Instruct           53.9%   self-reported, llm-stats
  19    Gemma 4 E2B                    52.4%   self-reported, llm-stats
  20    Qwen3 VL 4B Instruct           51.6%   self-reported, llm-stats