MMVetGPT4Turbo


MM-Vet evaluation with GPT-4 Turbo as the grader. This variant of MM-Vet tests large multimodal models on complex multimodal tasks that require combining six core vision-language capabilities: recognition, knowledge, spatial awareness, language generation, OCR, and math.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, math, multimodal, reasoning, spatial_reasoning, vision. Language: en. Verified by llm-stats: no.
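Below is a minimal sketch of how per-sample grades could roll up into the reported score. It assumes the original MM-Vet protocol: the grader (here GPT-4 Turbo) gives each sample a score between 0 and 1, the overall score is the mean over all samples (so the maximum is 1, shown as 100%), and each capability score is the mean over only the samples that require that capability. The sample data and the short capability keys are hypothetical.

```python
from statistics import mean

# Hypothetical short keys for the six MM-Vet capabilities.
CAPABILITIES = ("rec", "know", "spat", "gen", "ocr", "math")

# Hypothetical graded samples: the capabilities each sample requires,
# plus the grader-assigned score in [0, 1].
samples = [
    {"caps": {"rec", "ocr"}, "score": 1.0},
    {"caps": {"rec", "know", "gen"}, "score": 0.5},
    {"caps": {"ocr", "math"}, "score": 0.0},
    {"caps": {"rec", "spat"}, "score": 0.75},
]


def overall_score(graded):
    """Mean per-sample score on the benchmark's 0-1 scale."""
    return mean(s["score"] for s in graded)


def capability_scores(graded):
    """Mean score over the samples that require each capability."""
    result = {}
    for cap in CAPABILITIES:
        scores = [s["score"] for s in graded if cap in s["caps"]]
        if scores:  # skip capabilities with no tagged samples
            result[cap] = mean(scores)
    return result


print(f"overall: {overall_score(samples):.2%}")
for cap, value in capability_scores(samples).items():
    print(f"{cap}: {value:.2%}")
```

Because a sample can require several capabilities, it contributes to more than one capability score. As a result, the capability scores need not average to the overall score.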

Leaderboard

  1. Qwen2-VL-72B-Instruct: 74.0% (self-reported; source: llm-stats)