PIQA

reasoning

PIQA (Physical Interaction: Question Answering) is a benchmark dataset for physical commonsense reasoning in natural language. It tests whether AI systems can answer questions that require knowledge of the physical world, using multiple-choice questions about everyday situations with an emphasis on atypical solutions inspired by instructables.com. The dataset contains 21,000 questions. Each question pairs a goal with two candidate solutions, and the model must choose the more appropriate one.
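The following is a minimal sketch of how a PIQA-style two-choice item can be scored. It assumes each item has a goal, two candidate solutions and a 0/1 label; the field names goal, sol1, sol2 and label are assumed to follow the commonly distributed version of the dataset. The scoring function is a hypothetical stand-in for whatever a model actually uses (for example, the log-likelihood of each solution given the goal), and toy_score and TOY_ITEMS are illustrative only. Assuming accuracy is the reported metric, a leaderboard percentage corresponds to this fraction multiplied by 100.

```python
from typing import Callable, List, Sequence, TypedDict


class PiqaItem(TypedDict):
    # Field names are an assumption based on the common PIQA distribution.
    goal: str
    sol1: str
    sol2: str
    label: int  # 0 if sol1 is correct, 1 if sol2 is correct


# A scorer maps (goal, solution) to a number; higher means "more plausible".
Scorer = Callable[[str, str], float]


def predict(item: PiqaItem, score: Scorer) -> int:
    """Return 0 or 1: the index of the higher-scoring solution (ties pick sol1)."""
    s1 = score(item["goal"], item["sol1"])
    s2 = score(item["goal"], item["sol2"])
    return 0 if s1 >= s2 else 1


def accuracy(items: Sequence[PiqaItem], score: Scorer) -> float:
    """Fraction of items whose predicted solution matches the gold label."""
    if not items:
        raise ValueError("no items to evaluate")
    correct = sum(predict(item, score) == item["label"] for item in items)
    return correct / len(items)


def toy_score(goal: str, solution: str) -> float:
    # Hypothetical scorer for illustration only: it simply prefers shorter
    # solutions. A real evaluation would query a language model here.
    return -float(len(solution))


# Hypothetical toy items, not taken from the PIQA dataset.
TOY_ITEMS: List[PiqaItem] = [
    {
        "goal": "To open a jar with a stuck lid,",
        "sol1": "run the lid under hot water.",
        "sol2": "run the lid under hot water and then freeze the entire jar overnight.",
        "label": 0,
    },
    {
        "goal": "To dry wet shoes quickly,",
        "sol1": "stuff them with newspaper.",
        "sol2": "put them in the oven.",
        "label": 0,
    },
]

if __name__ == "__main__":
    # The toy scorer gets the first item right and the second wrong.
    print(f"accuracy: {100 * accuracy(TOY_ITEMS, toy_score):.1f}%")
```

Keeping the scorer as a plain callable keeps the selection and accuracy logic independent of any particular model or inference library.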

Leaderboard


Model                                     Score
Phi-3.5-MoE-instruct                      88.6%
Hermes 3 70B                              84.4%
Gemma 2 27B                               83.2%
Gemma 2 9B                                81.7%
Gemma 3n E4B                              81.0%
Gemma 3n E4B Instructed LiteRT Preview    81.0%
Phi-3.5-mini-instruct                     81.0%
Gemma 3n E2B                              78.9%
Gemma 3n E2B Instructed LiteRT (Preview)  78.9%
Phi 4 Mini                                77.6%
ERNIE 4.5                                 55.2%