MathVista-Mini

MathVista-Mini is a smaller version of MathVista, a benchmark that evaluates mathematical reasoning in visual contexts. Its examples are derived from multimodal datasets involving mathematics and combine diverse mathematical and visual tasks. The benchmark assesses whether foundation models can solve problems that require both visual understanding and mathematical reasoning.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: math, multimodal, vision. Language: en. Verified by llm-stats: no.
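
The metadata lists a maximum score of 1, while the leaderboard shows percentages. The following is a minimal sketch of that conversion, assuming the raw score is a fraction between 0 and the maximum score. The function name `to_percent` is hypothetical and is not part of llm-stats.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw benchmark score as a percentage with one decimal place.

    Assumption: the raw score lies in [0, max_score], where max_score
    defaults to 1 as stated in the benchmark metadata.
    """
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} is outside [0, {max_score}]")
    return f"{score / max_score * 100:.1f}%"


if __name__ == "__main__":
    # Illustrative input only: a raw score of 0.901 on a 0-1 scale.
    print(to_percent(0.901))
```

Under this assumption, a raw score of 0.901 would be displayed as 90.1%.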

Leaderboard

  All scores are self-reported; source: llm-stats.

  1. Kimi K2.5: 90.1%
  2. Qwen3.5-27B: 87.8%
  3. Qwen3.5-122B-A10B: 87.4%
  4. Qwen3.6-27B: 87.4%
  5. Qwen3.5-35B-A3B: 86.2%
  6. Qwen3 VL 32B Thinking: 85.9%
  7. Qwen3 VL 235B A22B Thinking: 85.8%
  8. Qwen3 VL 235B A22B Instruct: 84.9%
  9. Qwen3 VL 32B Instruct: 83.8%
  10. Qwen3 VL 30B A3B Thinking: 81.9%
  11. Qwen3 VL 8B Thinking: 81.4%
  12. Qwen3 VL 30B A3B Instruct: 80.1%
  13. Qwen3 VL 4B Thinking: 79.5%
  14. Qwen3 VL 8B Instruct: 77.2%
  15. Qwen2.5 VL 72B Instruct: 74.8%
  16. Qwen2.5 VL 32B Instruct: 74.7%
  17. Qwen3 VL 4B Instruct: 73.7%
  18. Qwen2-VL-72B-Instruct: 70.5%
  19. Qwen2.5 VL 7B Instruct: 68.2%
  20. Gemma 3 27B: 67.6%