FunctionalMATH
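A functional variant of the MATH benchmark. It tests whether language models can apply the same reasoning to different instances of a problem, not just to the single fixed version in the original dataset. It also measures the reasoning gap: the difference between a model's performance on the static problems and on their functional variants.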
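The reasoning gap can be illustrated with a minimal Python sketch. It rests on assumptions that this page does not confirm:

- A functional problem counts as solved only if the model answers every generated instance ("snapshot") correctly.
- The gap is the relative drop, in percent, from static accuracy to functional accuracy.

The names `ProblemResult`, `static_accuracy`, `functional_accuracy` and `reasoning_gap` are hypothetical and not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class ProblemResult:
    # Hypothetical record: did the model solve the original (static) problem?
    static_correct: bool
    # Correctness on each generated instance (snapshot) of the functional problem.
    snapshot_correct: list[bool]


def static_accuracy(results: list[ProblemResult]) -> float:
    # Fraction of static problems solved (0 to 1, matching a max score of 1).
    if not results:
        raise ValueError("no results")
    return sum(r.static_correct for r in results) / len(results)


def functional_accuracy(results: list[ProblemResult]) -> float:
    # Assumption: a problem is solved only if every snapshot is solved.
    if not results:
        raise ValueError("no results")
    return sum(all(r.snapshot_correct) for r in results) / len(results)


def reasoning_gap(results: list[ProblemResult]) -> float:
    # Assumption: percentage drop from static to functional accuracy.
    s = static_accuracy(results)
    f = functional_accuracy(results)
    return 100.0 * (s - f) / s if s > 0 else 0.0


if __name__ == "__main__":
    results = [
        ProblemResult(True, [True, True, True]),
        ProblemResult(True, [True, False, True]),   # solved statically, fails one snapshot
        ProblemResult(False, [False, False, False]),
        ProblemResult(True, [True, True, True]),
    ]
    print(static_accuracy(results))        # 0.75
    print(functional_accuracy(results))    # 0.5
    print(round(reasoning_gap(results), 1))  # 33.3
```

In this toy data the second problem is solved in its static form but fails one snapshot. That single failure lowers functional accuracy below static accuracy, and the difference is what the gap captures.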
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.