DRACO

reasoning

DRACO is a deep research benchmark that evaluates an agent's ability to gather, synthesize, and reason over information to answer complex research questions. Each question is scored against its official rubric, and the final score is the average of the per-question scores.
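
The aggregation step described above can be illustrated with a minimal sketch. It assumes each per-question rubric score has already been normalized to the range 0 to 1 (consistent with the listed max score of 1); the function name `final_score` and the example values are hypothetical and are not taken from the DRACO harness or its data.

```python
from statistics import mean


def final_score(per_question_scores):
    """Return the unweighted mean of per-question rubric scores.

    Assumption: every score is already normalized to [0, 1].
    """
    if not per_question_scores:
        raise ValueError("at least one question score is required")
    for score in per_question_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return mean(per_question_scores)


# Hypothetical rubric scores for three questions (not real DRACO results).
example_scores = [0.5, 0.75, 1.0]
print(f"{final_score(example_scores):.1%}")  # prints 75.0%
```

Under this assumption, a leaderboard percentage is simply the mean score multiplied by 100. How the official rubrics turn individual criteria into a per-question score is not specified here, so that step is left out of the sketch.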

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning, search. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MiniMax M3 (self-reported, llm-stats): 73.2%