CSimpleQA

Chinese SimpleQA is the first comprehensive Chinese benchmark for evaluating the factuality of language models when answering short questions. It contains 3,000 high-quality questions spanning 6 major topics and 99 diverse subtopics, and is designed to assess Chinese factual knowledge across humanities, science, engineering, culture, and society.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, language. Language: en. Verified by llm-stats: no.
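
The metadata gives a maximum score of 1, while leaderboard entries are shown as percentages. A minimal sketch of that conversion follows, assuming the score is a plain fraction of correctly answered questions; the exact metric is not stated in this metadata, and the helper names `fraction_correct` and `to_percent` are hypothetical.

```python
def fraction_correct(graded: list[bool]) -> float:
    """Return the share of questions graded correct, in [0, 1].

    Assumption: one boolean per question; the benchmark's real
    grading scheme may be more detailed than this.
    """
    if not graded:
        raise ValueError("need at least one graded answer")
    return sum(graded) / len(graded)


def to_percent(score: float, max_score: float = 1.0) -> str:
    """Render a raw score as a percentage string with one decimal place."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score must lie in [0, {max_score}]")
    return f"{score / max_score * 100:.1f}%"


if __name__ == "__main__":
    # A raw score of 0.844 on a max-score-1 scale is displayed as 84.4%.
    print(to_percent(0.844))
    # Three of four toy answers correct.
    print(to_percent(fraction_correct([True, True, False, True])))
```

Under this reading, a raw score of 0.844 corresponds to a leaderboard entry of 84.4%.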

Leaderboard

  1. DeepSeek-V4-Pro-Max: 84.4% (self-reported, via llm-stats)
  2. Qwen3-235B-A22B-Instruct-2507: 84.3% (self-reported, via llm-stats)
  3. Qwen3 VL 235B A22B Instruct: 83.4% (self-reported, via llm-stats)
  4. DeepSeek-V4-Flash-Max: 78.9% (self-reported, via llm-stats)
  5. Kimi K2 Instruct: 78.4% (self-reported, via llm-stats)
  6. Kimi K2 Base: 77.6% (self-reported, via llm-stats)
  7. DeepSeek-V3: 64.8% (self-reported, via llm-stats)