ARC-C

The AI2 Reasoning Challenge (ARC) Challenge Set (ARC-C) is a multiple-choice question-answering benchmark of grade-school-level science questions that require advanced reasoning. ARC-C contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly. This makes it a particularly challenging subset of ARC, designed to test commonsense reasoning in AI systems rather than simple fact lookup or word association.
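The selection rule for ARC-C can be sketched as a simple filter: a question qualifies only if both baseline algorithms get it wrong. This is a minimal illustration, not the original construction code. The `Question` record and its `retrieval_correct` / `cooccurrence_correct` fields are hypothetical stand-ins for the outcomes of the two baselines, and the remaining questions (which, in the original release, form the ARC-Easy set) are simply collected separately.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    # Hypothetical fields: whether each baseline answered this question correctly.
    retrieval_correct: bool
    cooccurrence_correct: bool


def in_challenge_set(q: Question) -> bool:
    """A question qualifies only if BOTH baselines answered it incorrectly."""
    return not q.retrieval_correct and not q.cooccurrence_correct


def split_arc(questions: list[Question]) -> tuple[list[Question], list[Question]]:
    """Partition questions into (challenge, remaining) subsets, preserving order."""
    challenge = [q for q in questions if in_challenge_set(q)]
    remaining = [q for q in questions if not in_challenge_set(q)]
    return challenge, remaining


if __name__ == "__main__":
    pool = [
        Question("q1", retrieval_correct=True, cooccurrence_correct=True),
        Question("q2", retrieval_correct=False, cooccurrence_correct=True),
        Question("q3", retrieval_correct=False, cooccurrence_correct=False),
    ]
    challenge, remaining = split_arc(pool)
    print([q.qid for q in challenge])  # → ['q3']
    print([q.qid for q in remaining])  # → ['q1', 'q2']
```

Note that `q2` is excluded even though one baseline failed on it: a single baseline failure is not enough to place a question in the Challenge Set.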

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1 (the leaderboard below shows scores as percentages, so 1 corresponds to 100%). Categories: general, reasoning. Language: English. Verified by llm-stats: no.
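Since the maximum score is 1 and results are displayed as percentages, the score can be read as a fraction of questions answered correctly. The sketch below assumes plain accuracy (one point per correct answer, no partial credit); llm-stats does not state its exact scoring rule here, and the `accuracy` and `as_percent` helpers are hypothetical illustrations.

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions whose predicted choice label matches the gold label (0.0 to 1.0)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def as_percent(score: float) -> str:
    """Render a 0-1 score the way the leaderboard does, e.g. 0.969 -> '96.9%'."""
    return f"{score * 100:.1f}%"


if __name__ == "__main__":
    preds = ["A", "C", "B", "D"]  # model's chosen answer labels
    gold = ["A", "C", "D", "D"]   # correct answer labels
    s = accuracy(preds, gold)
    print(s)              # → 0.75
    print(as_percent(s))  # → 75.0%
```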

Leaderboard

All scores are self-reported, as listed on llm-stats, and have not been independently verified.

  1. Llama 3.1 405B Instruct: 96.9%
  2. Claude 3 Opus: 96.4%
  3. Nova Pro: 94.8%
  4. Llama 3.1 70B Instruct: 94.8%
  5. Claude 3 Sonnet: 93.2%
  6. Jamba 1.5 Large: 93.0%
  7. Nova Lite: 92.4%
  8. Mistral Small 3 24B Base: 91.3%
  9. Phi-3.5-MoE-instruct: 91.0%
  10. Nova Micro: 90.2%
  11. Claude 3 Haiku: 89.2%
  12. Jamba 1.5 Mini: 85.7%
  13. Phi-3.5-mini-instruct: 84.6%
  14. Phi 4 Mini: 83.7%
  15. Llama 3.1 8B Instruct: 83.4%
  16. Llama 3.2 3B Instruct: 78.6%
  17. Ministral 8B Instruct: 71.9%
  18. Gemma 2 27B: 71.4%
  19. Command R+: 71.0%
  20. Qwen2.5-Coder 32B Instruct: 70.5%