HallusionBench
A comprehensive benchmark designed to evaluate image-context reasoning in large vision-language models (LVLMs). It challenges models with 346 images and 1,129 carefully crafted questions that assess two failure modes: language hallucination and visual illusion.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: reasoning, vision. Language: en. Verified by llm-stats: no.
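The metadata lists a maximum score of 1, which suggests that results are normalized to the range [0, 1]. The snippet below is a minimal sketch that assumes the score is plain accuracy over the 1,129 questions. The official evaluation may use different or additional metrics. The `Result` type and the `normalized_score` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-question record; not part of any official HallusionBench API.
    question_id: str
    correct: bool


def normalized_score(results: list[Result]) -> float:
    """Return the fraction of questions answered correctly, a value in [0, 1].

    Assumption: the benchmark score is simple accuracy. The official
    scoring may aggregate results differently.
    """
    if not results:
        raise ValueError("no results to score")
    # bool values sum as 0/1, so this counts the correct answers.
    return sum(r.correct for r in results) / len(results)


# Total question count reported in the benchmark description.
TOTAL_QUESTIONS = 1129
```

Under this assumption, a model that answers 700 of the 1,129 questions correctly would score 700 / 1129, or about 0.620. A model that answers every question correctly reaches the stated maximum of 1.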