POPE

multimodal

Polling-based Object Probing Evaluation (POPE) is a benchmark for evaluating object hallucination in Large Vision-Language Models (LVLMs). Object hallucination occurs when an LVLM generates objects that are inconsistent with the target image. POPE measures this with a polling-based query method: the model is asked yes/no questions about whether specific objects are present in an image. This provides a more stable and flexible evaluation of object hallucination.
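The polling idea can be illustrated with a minimal Python sketch. It is not the benchmark's official implementation: the question template, the answer-parsing rule, and the helper names `build_question`, `parse_answer`, and `score` are assumptions made for illustration. It shows how yes/no probes with known ground truth can be turned into standard binary-classification metrics.

```python
from typing import Dict, Sequence


def build_question(object_name: str) -> str:
    """Format a yes/no probe about one object (the wording is an assumption)."""
    article = "an" if object_name[:1].lower() in "aeiou" else "a"
    return f"Is there {article} {object_name} in the image?"


def parse_answer(response: str) -> bool:
    """Treat a response starting with 'yes' as True and anything else as False."""
    return response.strip().lower().startswith("yes")


def score(labels: Sequence[bool], predictions: Sequence[bool]) -> Dict[str, float]:
    """Compute binary metrics where True means 'object is present' / 'model said yes'."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")

    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # present, answered yes
    fp = sum(1 for y, p in pairs if not y and p)      # absent, answered yes (hallucination)
    fn = sum(1 for y, p in pairs if y and not p)      # present, answered no
    tn = sum(1 for y, p in pairs if not y and not p)  # absent, answered no

    total = len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total,  # fraction of probes answered "yes"
    }


if __name__ == "__main__":
    # Hypothetical probes for one image: two objects present, two absent.
    labels = [True, True, False, False]
    responses = ["Yes, there is.", "yes", "Yes.", "No, there is not."]
    predictions = [parse_answer(r) for r in responses]
    print(build_question("umbrella"))
    print(score(labels, predictions))
```

In this sketch a false positive corresponds to a hallucinated object, so precision and the yes ratio are the values most sensitive to a model that answers "yes" too readily. Which metric the leaderboard percentages report is not stated here.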

Leaderboard


Phi-3.5-vision-instruct: 86.1%
Phi-4-multimodal-instruct: 85.6%