Global PIQA

reasoning

Global PIQA is a multilingual commonsense reasoning benchmark that evaluates physical interaction knowledge across 100 languages and cultures. It uses multiple-choice questions about everyday situations to test whether AI systems understand how the physical world works in diverse cultural contexts.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, physics, reasoning. Language: en. Verified by llm-stats: no.
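The metadata gives a max score of 1, while leaderboard entries are shown as percentages. The sketch below shows one way the two could relate, assuming the score is plain accuracy over multiple-choice items. The `Item` fields, function names, and the sample items are hypothetical illustrations, not the benchmark's actual data schema or its official scoring code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Item:
    """One hypothetical multiple-choice item (field names are assumptions)."""
    prompt: str
    choices: Sequence[str]
    gold: int  # index of the correct choice


def accuracy(items: Sequence[Item], predictions: Sequence[int]) -> float:
    """Return the fraction of items answered correctly, in the range [0, 1]."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot score an empty item list")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.gold)
    return correct / len(items)


def as_percent(score: float) -> str:
    """Format a [0, 1] score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


if __name__ == "__main__":
    # Made-up items, purely for illustration.
    items = [
        Item("To keep soup warm longer,", ["cover the pot", "remove the lid"], 0),
        Item("To dry wet shoes,", ["seal them in a bag", "stuff them with paper"], 1),
        Item("To stop a door from slamming,", ["use a doorstop", "oil the hinges"], 0),
        Item("To cool tea quickly,", ["add ice", "cover the cup"], 0),
    ]
    predictions = [0, 1, 0, 1]  # the last answer is wrong
    score = accuracy(items, predictions)
    print(score, as_percent(score))
```

Under this assumption, a score of 1 corresponds to 100%, so a leaderboard value such as 93.4% would correspond to 0.934 on the 0 to 1 scale.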

Leaderboard

  1. Gemini 3 Pro: 93.4% (self-reported, llm-stats)
  2. Gemini 3 Flash: 92.8% (self-reported, llm-stats)
  3. Qwen3.7 Max: 91.4% (self-reported, llm-stats)
  4. Qwen3.5-397B-A17B: 89.8% (self-reported, llm-stats)
  5. Qwen3.6 Plus: 89.8% (self-reported, llm-stats)
  6. Qwen3.5-122B-A10B: 88.4% (self-reported, llm-stats)
  7. Qwen3.5-27B: 87.5% (self-reported, llm-stats)
  8. Qwen3.5-35B-A3B: 86.6% (self-reported, llm-stats)