MM IF-Eval

A challenging multimodal instruction-following benchmark that combines compose-level constraints on output responses with perception-level constraints tied to input images, and comes with a comprehensive evaluation pipeline.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, reasoning, structured_output. Language: en. Verified by llm-stats: no.
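
The metadata gives a max score of 1, while leaderboard entries are shown as percentages. A minimal sketch of that conversion follows, assuming the displayed percentage is simply the raw score divided by the max score; the helper name `to_percent` is hypothetical and not part of any llm-stats API.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score on a [0, max_score] scale as a percentage string.

    Assumption: the leaderboard percentage is score / max_score * 100.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return f"{score / max_score * 100:.1f}%"


# A raw score of 0.527 on the 0-to-1 scale is displayed as a percentage.
print(to_percent(0.527))  # prints "52.7%"
```

Under this assumption, a raw score of 0.527 corresponds to a leaderboard entry of 52.7%.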

Leaderboard

  1. Pixtral-12B: 52.7% (self-reported; source: llm-stats)