PIQA
PIQA (Physical Interaction: Question Answering) is a benchmark dataset for physical commonsense reasoning in natural language. It tests whether AI systems can answer questions that require knowledge of the physical world, using multiple-choice questions about everyday situations. The questions emphasize atypical solutions and were inspired by instructables.com. The dataset contains 21,000 questions. Each pairs a goal with two candidate solutions, and the model must choose the more appropriate one.
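The item format described above (a goal plus two candidate solutions) can be sketched as follows. This is a minimal illustration, not the official evaluation code. The field names goal, sol1, sol2 and label follow the commonly distributed format but are assumptions here. The scorer is a hypothetical stand-in for whatever a model uses to rate a solution, such as a log-likelihood. The example item is invented for illustration and is not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PiqaItem:
    # Assumed field names; check them against the copy of the data you use.
    goal: str
    sol1: str
    sol2: str
    label: int  # 0 if sol1 is the correct solution, 1 if sol2 is


def predict(item: PiqaItem, score: Callable[[str, str], float]) -> int:
    """Return 0 or 1, the index of the solution the scorer rates higher.

    `score(goal, solution)` is a hypothetical scoring function; ties go to sol1.
    """
    s1 = score(item.goal, item.sol1)
    s2 = score(item.goal, item.sol2)
    return 0 if s1 >= s2 else 1


# Invented example item (not from the PIQA data).
item = PiqaItem(
    goal="Keep a paintbrush from drying out overnight.",
    sol1="Wrap the bristles tightly in plastic wrap.",
    sol2="Leave the brush uncovered in a sunny window.",
    label=0,
)

# Hypothetical fixed scores standing in for model log-likelihoods.
fake_scores = {item.sol1: -12.3, item.sol2: -15.8}
pred = predict(item, lambda goal, sol: fake_scores[sol])
print(pred, pred == item.label)  # 0 True
```

Because every item has exactly two options, prediction reduces to comparing two scores. This is why a random guesser is expected to get about half the items right.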
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, physics, reasoning. Language: en. Verified by llm-stats: no.
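A max score of 1 is consistent with results reported as accuracy, meaning the fraction of questions answered correctly. That this is exactly how llm-stats computes the score is an assumption here. The following is a minimal sketch of that metric, where predictions and labels are solution indices (0 or 1). The function name is hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of items whose predicted solution index matches the label.

    The result lies in [0, 1], matching a max score of 1.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot score an empty set of items")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```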