CRAG
CRAG (Comprehensive RAG Benchmark) is a factual question-answering benchmark of 4,409 question-answer pairs spanning 5 domains (finance, sports, music, movie, and open domain) and 8 question categories. It includes mock APIs that simulate web search and Knowledge Graph search. The benchmark is designed to reflect the diverse and dynamic nature of real-world QA tasks, with temporal dynamism ranging from years to seconds, and it evaluates whether retrieval-augmented generation (RAG) systems answer questions in a trustworthy way.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: economics, finance, reasoning, search. Language: en. Verified by llm-stats: no.
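To make the "Max score: 1" field concrete, here is a minimal sketch of how per-answer grades could be aggregated into a single score whose upper bound is 1. The three grade labels and their weights are assumptions chosen for illustration (a scheme that penalizes wrong answers more than abstentions); they are not taken from the metadata above, and `mean_score` is a hypothetical helper rather than part of any official tooling.

```python
# Hypothetical per-answer weights: an assumption for illustration only.
# A wrong answer is penalized, an abstention ("missing") is neutral.
WEIGHTS = {"correct": 1.0, "missing": 0.0, "incorrect": -1.0}


def mean_score(grades):
    """Average the per-answer weights; the result lies in [-1, 1]."""
    if not grades:
        raise ValueError("grades must not be empty")
    unknown = set(grades) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown grade labels: {sorted(unknown)}")
    return sum(WEIGHTS[g] for g in grades) / len(grades)


# Two correct answers, one abstention, one wrong answer.
grades = ["correct", "correct", "missing", "incorrect"]
print(mean_score(grades))  # 0.25
```

Under these assumed weights, a system reaches the maximum of 1 only when every answer is graded correct, and abstaining scores higher than answering incorrectly.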