DeepSearchQA

reasoning

DeepSearchQA is a benchmark for deep search and question answering. It tests a model's ability to perform multi-hop reasoning and retrieve information across complex knowledge domains.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning, search. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Claude Opus 4.8 (self-reported, llm-stats): 93.1%
  2. Claude Opus 4.6 (self-reported, llm-stats): 91.3%
  3. MiMo-V2-Pro (self-reported, llm-stats): 86.7%
  4. Kimi K2.6 (self-reported, llm-stats): 83.0%
  5. Kimi K2.5 (self-reported, llm-stats): 77.1%
  6. Muse Spark (self-reported, llm-stats): 74.8%
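
The metadata gives a maximum score of 1, while the leaderboard above reports percentages. The sketch below relates the two scales and computes each model's gap to the top entry. It assumes a percentage is the 0-to-1 score multiplied by 100, which the source does not state. The helper names `to_raw_score` and `summarize` are hypothetical and not part of any llm-stats API.

```python
# Leaderboard percentages copied from the list above (all self-reported).
LEADERBOARD = [
    ("Claude Opus 4.8", 93.1),
    ("Claude Opus 4.6", 91.3),
    ("MiMo-V2-Pro", 86.7),
    ("Kimi K2.6", 83.0),
    ("Kimi K2.5", 77.1),
    ("Muse Spark", 74.8),
]

MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


def to_raw_score(percent, max_score=MAX_SCORE):
    """Map a percentage onto the raw scale (ASSUMPTION: linear scaling)."""
    return percent / 100.0 * max_score


def summarize(entries):
    """Return (model, raw score, gap to the leader in percentage points)."""
    best = max(percent for _, percent in entries)
    return [
        (name, round(to_raw_score(percent), 3), round(best - percent, 1))
        for name, percent in entries
    ]


if __name__ == "__main__":
    for name, raw, gap in summarize(LEADERBOARD):
        print(f"{name:<16} raw={raw:.3f}  gap={gap:.1f} pts")
```

Because every entry is self-reported and unverified by llm-stats, the gaps are best read as indicative rather than as controlled comparisons.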