IF
The Instruction-Following Evaluation (IFEval) benchmark tests how well large language models follow verifiable instructions. It covers 25 instruction types across roughly 500 prompts, each containing one or more verifiable constraints.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, structured_output. Language: en. Verified by llm-stats: no.
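To illustrate what a "verifiable" constraint means and how a result can land on a 0 to 1 scale, here is a minimal Python sketch. It is not the benchmark's actual implementation: the checker names (min_words, all_lowercase, mentions_keyword) and the fraction_followed aggregation are hypothetical, chosen only to show that each constraint can be checked programmatically without a human or model judge.

```python
from typing import Callable, List

# Hypothetical constraint checkers for illustration only;
# these are NOT the instruction types or code used by IFEval itself.
Check = Callable[[str], bool]


def min_words(n: int) -> Check:
    """Constraint: the response has at least n whitespace-separated words."""
    return lambda response: len(response.split()) >= n


def all_lowercase(response: str) -> bool:
    """Constraint: the response contains no uppercase letters."""
    return response == response.lower()


def mentions_keyword(keyword: str) -> Check:
    """Constraint: the response contains the keyword (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def fraction_followed(response: str, checks: List[Check]) -> float:
    """Return the share of constraints the response satisfies, in [0, 1].

    This aggregation is an assumption made for the sketch.
    """
    if not checks:
        raise ValueError("at least one constraint is required")
    results = [check(response) for check in checks]
    return sum(results) / len(results)


if __name__ == "__main__":
    checks = [min_words(5), all_lowercase, mentions_keyword("fox")]
    print(fraction_followed("the quick brown fox jumps over the lazy dog", checks))
```

Because every check is a deterministic function of the response text, the same response always yields the same score, which is the property that makes this style of evaluation reproducible.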