Vibe-Eval

multimodal

Vibe-Eval is a hard evaluation suite for measuring the progress of multimodal language models. It consists of 269 visual understanding prompts, each with a gold-standard response authored by experts. The benchmark has two objectives: vibe-checking multimodal chat models on day-to-day tasks, and rigorously testing frontier models. More than 50% of the questions in its hard set are answered incorrectly by all frontier models.

Leaderboard


Gemini 2.5 Pro Preview 06-05: 67.2%
Gemini 2.5 Pro: 65.6%
Gemini 2.5 Flash: 65.4%
Gemini 2.0 Flash: 56.3%
Gemini 1.5 Pro: 53.9%
Gemini 2.5 Flash-Lite: 51.3%
Gemini 1.5 Flash: 48.9%
Gemini 1.5 Flash 8B: 40.9%