MM-MT-Bench
multimodal
A multi-turn, LLM-as-a-judge benchmark that tests whether multimodal instruction-tuned models can follow user instructions across multi-turn dialogues and answer open-ended questions zero-shot.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 100. Categories: communication, multimodal. Language: en. Verified by llm-stats: no.
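For readers who want to consume this record programmatically, the following is a minimal sketch of turning the "Key: value." fields into a typed structure. The helper name `parse_metadata` is hypothetical, and treating "yes" as the affirmative value of the verification flag is an assumption; the record only shows "no".

```python
# Metadata line copied verbatim from the record above.
RECORD = (
    "Imported from llm-stats public benchmark metadata. Modality: multimodal. "
    "Max score: 100. Categories: communication, multimodal. Language: en. "
    "Verified by llm-stats: no."
)


def parse_metadata(record: str) -> dict:
    """Split 'Key: value.' sentences into a dict (hypothetical helper)."""
    fields = {}
    for sentence in record.split(". "):
        key, sep, value = sentence.partition(": ")
        if not sep:
            # Sentences without a 'Key: value' shape (the provenance note) are skipped.
            continue
        fields[key.strip()] = value.strip().rstrip(".")
    return fields


meta = parse_metadata(RECORD)

# Convert the string values to more useful types.
meta["Max score"] = int(meta["Max score"])
meta["Categories"] = [c.strip() for c in meta["Categories"].split(",")]
# Assumption: "yes" marks a verified entry; only "no" appears in the record.
meta["Verified by llm-stats"] = meta["Verified by llm-stats"] == "yes"

print(meta)
```

Splitting on ". " works here because none of the values contains a period followed by a space; a record with free-text values would need a stricter format.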