POPE


Polling-based Object Probing Evaluation (POPE) is a benchmark for evaluating object hallucination in Large Vision-Language Models (LVLMs). Object hallucination occurs when an LVLM mentions objects that are inconsistent with the target image. POPE tests for this with a polling-based query method: the model is asked yes/no questions about whether specific objects are present in an image. The benchmark description presents this as a more stable and flexible way to evaluate object hallucination.
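The polling idea can be illustrated with a short Python sketch. It is not the official POPE implementation: the `Probe` class, the helper names, the question wording, and the choice of summary metrics (accuracy, precision, recall, F1, and the share of "yes" answers) are all assumptions made for illustration. The model's answers are passed in as plain strings, so no particular LVLM API is implied.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One yes/no poll about a single object in a single image (hypothetical type)."""
    image_id: str
    obj: str
    present: bool  # ground truth: is the object actually in the image?


def make_question(obj: str) -> str:
    # Assumed question wording; the real benchmark's prompt may differ.
    article = "an" if obj[0].lower() in "aeiou" else "a"
    return f"Is there {article} {obj} in the image?"


def parse_answer(text: str) -> bool:
    # Treat any reply that starts with "yes" as a positive answer.
    return text.strip().lower().startswith("yes")


def score(probes: list[Probe], answers: list[str]) -> dict[str, float]:
    """Compare parsed yes/no answers with ground truth and summarize them."""
    preds = [parse_answer(a) for a in answers]
    tp = sum(p and q.present for p, q in zip(preds, probes))
    fp = sum(p and not q.present for p, q in zip(preds, probes))
    tn = sum(not p and not q.present for p, q in zip(preds, probes))
    fn = sum(not p and q.present for p, q in zip(preds, probes))
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A high yes ratio suggests the model tends to affirm objects regardless of the image.
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }


# Toy data: two objects that are present and two that are absent.
probes = [
    Probe("img1", "dog", True),
    Probe("img1", "umbrella", False),
    Probe("img2", "car", False),
    Probe("img2", "apple", True),
]
# Hypothetical model replies; the second one is a hallucination.
answers = ["Yes", "yes, there is", "No", "Yes."]
result = score(probes, answers)
```

In this toy run the model affirms one absent object, so precision drops below 1 while recall stays at 1. Scoring binary answers this way avoids having to parse free-form captions for object mentions.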

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, safety, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Phi-3.5-vision-instruct: 86.1% (self-reported, via llm-stats)
  2. Phi-4-multimodal-instruct: 85.6% (self-reported, via llm-stats)