BLINK
reasoning official site →
BLINK: Multimodal Large Language Models Can See but Not Perceive. A benchmark for multimodal language models focusing on core visual perception abilities. Reformats 14 classic computer vision tasks into 3,807 multiple-choice questions paired with single or multiple images and visual prompting. Tasks include relative depth estimation, visual correspondence, forensics detection, multi-view reasoning, counting, object localization, and spatial reasoning that humans can solve 'within a blink'.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: 3d, multimodal, reasoning, spatial_reasoning, vision. Language: en. Verified by llm-stats: no.