IF


The Instruction-Following Evaluation (IFEval) benchmark for large language models focuses on verifiable instructions, meaning constraints whose satisfaction can be checked programmatically. It covers 25 instruction types across around 500 prompts, each containing one or more verifiable constraints.
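To make "verifiable constraint" concrete, here is a minimal sketch of how such checks might be scored. The checker names (`min_words`, `contains_keyword`, `no_commas`) and the example responses are hypothetical illustrations, not the official IFEval implementation. The sketch assumes a prompt counts as passed only when every one of its constraints is satisfied.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a checker maps a model response to pass/fail.
Checker = Callable[[str], bool]


def min_words(n: int) -> Checker:
    """Constraint: the response has at least n whitespace-separated words."""
    return lambda response: len(response.split()) >= n


def contains_keyword(keyword: str) -> Checker:
    """Constraint: the response mentions the keyword (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def no_commas() -> Checker:
    """Constraint: the response contains no comma characters."""
    return lambda response: "," not in response


def prompt_passes(response: str, checkers: List[Checker]) -> bool:
    """Assumption: a prompt passes only if ALL of its constraints hold."""
    return all(check(response) for check in checkers)


def accuracy(results: List[bool]) -> float:
    """Fraction of prompts passed, on a 0-to-1 scale."""
    return sum(results) / len(results) if results else 0.0


# Made-up responses paired with the constraints attached to their prompts.
examples: List[Tuple[str, List[Checker]]] = [
    (
        "Paris is the capital of France and a major European city",
        [min_words(5), contains_keyword("Paris"), no_commas()],
    ),
    ("Yes, it is.", [no_commas()]),
]

results = [prompt_passes(response, checkers) for response, checkers in examples]
score = accuracy(results)
print(f"{score:.1%}")
```

Because each check is a deterministic function of the response text, no human or model judge is needed to grade it. A score on the 0-to-1 scale maps directly to a percentage.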

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, structured_output. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Mistral Small 3.2 24B Instruct (self-reported, llm-stats): 84.8%
  2. MiniMax M2 (self-reported, llm-stats): 72.0%