CRAG

CRAG (Comprehensive RAG Benchmark) is a factual question-answering benchmark of 4,409 question-answer pairs spanning 5 domains (finance, sports, music, movie, open domain) and 8 question categories. It includes mock APIs that simulate web search and Knowledge Graph search. The questions are designed to reflect the diverse and dynamic nature of real-world QA tasks, with answers that change on timescales ranging from years to seconds. The benchmark evaluates retrieval-augmented generation (RAG) systems on trustworthy question answering.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: economics, finance, reasoning, search. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Nova Pro: 50.3% (self-reported, llm-stats)
  2. Nova Lite: 43.8% (self-reported, llm-stats)
  3. Nova Micro: 43.1% (self-reported, llm-stats)
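
The metadata lists a max score of 1, while the leaderboard reports percentages. A minimal sketch of how the two scales might relate is shown below, assuming each percentage is simply the score divided by the max score and multiplied by 100. The names LeaderboardEntry and to_percent are hypothetical, and the fractional scores are back-derived from the percentages above rather than taken from any published raw data.

```python
from dataclasses import dataclass

# From the metadata above: "Max score: 1".
MAX_SCORE = 1.0


@dataclass(frozen=True)
class LeaderboardEntry:
    """Hypothetical container for one leaderboard row."""
    model: str
    score: float  # assumed to be a fraction of MAX_SCORE, in [0, MAX_SCORE]


def to_percent(score: float, max_score: float = MAX_SCORE) -> str:
    """Format a raw score as a percentage string with one decimal place."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside [0, {max_score}]")
    return f"{100.0 * score / max_score:.1f}%"


# Fractions back-derived from the leaderboard percentages (an assumption).
entries = [
    LeaderboardEntry("Nova Micro", 0.431),
    LeaderboardEntry("Nova Pro", 0.503),
    LeaderboardEntry("Nova Lite", 0.438),
]

# Rank from highest to lowest score, as the leaderboard does.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)

for rank, entry in enumerate(ranked, start=1):
    print(f"{rank}. {entry.model}: {to_percent(entry.score)}")
```

Under these assumptions the loop reproduces the ordering shown in the leaderboard. Whether llm-stats stores scores as fractions or as percentages internally is not stated in the metadata.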