Social IQa


The first large-scale benchmark for commonsense reasoning about social situations. It contains 38,000 multiple-choice questions that probe emotional and social intelligence in everyday situations, testing both commonsense understanding of social interactions and theory-of-mind reasoning about the implied emotions and behavior of others.
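To make the multiple-choice format concrete, here is a minimal sketch of how one item might be represented and answered. It assumes three answer options per question and a hypothetical `score_option` function standing in for a model's plausibility score. The field names, the example item, and the "pick the highest-scoring option" rule are illustrative assumptions, not the benchmark's official data format or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Item:
    """Hypothetical multiple-choice item: a situation, a question, candidate answers."""
    context: str
    question: str
    options: Sequence[str]
    label: int  # index of the correct option (0-based in this sketch)


def predict(item: Item, score_option: Callable[[str, str, str], float]) -> int:
    """Return the index of the option the scorer rates highest (first one on ties)."""
    scores = [score_option(item.context, item.question, opt) for opt in item.options]
    return max(range(len(scores)), key=scores.__getitem__)


# Invented example item, not drawn from the dataset.
item = Item(
    context="Alex forgot Jordan's birthday and apologized the next day.",
    question="How would Jordan likely feel?",
    options=["Hurt but understanding", "Thrilled about the delay", "Indifferent to Alex"],
    label=0,
)


def toy_scorer(context: str, question: str, option: str) -> float:
    """Toy stand-in for a model: prefers options that mention 'understanding'."""
    return 1.0 if "understanding" in option else 0.0


print(predict(item, toy_scorer) == item.label)  # True
```

A real evaluation would replace `toy_scorer` with a model's likelihood or choice for each option; the selection step stays the same.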

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1 (shown as a percentage in the leaderboard). Categories: creativity, psychology, reasoning. Language: en. Verified by llm-stats: no.
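Because the maximum score is 1, a result is presumably the fraction of questions answered correctly, which the leaderboard displays as a percentage. The sketch below assumes plain accuracy over predicted and gold option indices. It is not llm-stats' or the benchmark authors' scoring code, and since the leaderboard entries are self-reported, individual submissions may have used different prompting or evaluation setups.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of items whose predicted option index matches the gold index (0 to 1)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


def as_percent(score: float) -> str:
    """Format a score in [0, 1] as the leaderboard shows it, e.g. 0.78 -> '78.0%'."""
    return f"{score * 100:.1f}%"


preds = [0, 2, 1, 1, 0]
gold = [0, 2, 2, 1, 1]
score = accuracy(preds, gold)
print(score, as_percent(score))  # 0.6 60.0%
```

Three of the five invented predictions match, so the score is 0.6, displayed as 60.0%.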

Leaderboard

  1. Phi-3.5-MoE-instruct: 78.0% (self-reported, llm-stats)
  2. Phi-3.5-mini-instruct: 74.7% (self-reported, llm-stats)
  3. Phi 4 Mini: 72.5% (self-reported, llm-stats)
  4. Gemma 2 27B: 53.7% (self-reported, llm-stats)
  5. Gemma 2 9B: 53.4% (self-reported, llm-stats)
  6. Gemma 3n E4B: 50.0% (self-reported, llm-stats)
  7. 50.0%
  8. Gemma 3n E2B: 48.8% (self-reported, llm-stats)
  9. 48.8%