FunctionalMATH

math

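A functional variant of the MATH benchmark that tests whether language models can generalize a reasoning pattern across different instances of the same problem, rather than solve only one fixed instance. The difference between a model's accuracy on the static benchmark and its accuracy on the functional variant is its reasoning gap.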
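
To make the idea concrete, here is a minimal sketch of a functionalized problem and a gap computation. It is not the benchmark's actual code: the problem template, the names `make_instance` and `reasoning_gap`, and the choice to express the gap as a relative drop in accuracy are all assumptions made for illustration.

```python
import random


def make_instance(rng):
    """Hypothetical functional problem: same reasoning pattern, fresh numbers per call."""
    a = rng.randint(2, 9)
    b = rng.randint(10, 99)
    question = f"What is the remainder when {b} is divided by {a}?"
    answer = b % a
    return question, answer


def reasoning_gap(static_acc, functional_acc):
    """Relative drop from static to functional accuracy (assumed definition)."""
    if static_acc <= 0:
        raise ValueError("static accuracy must be positive")
    return (static_acc - functional_acc) / static_acc


# Draw a few instances of the same underlying problem from a seeded generator.
rng = random.Random(0)
instances = [make_instance(rng) for _ in range(3)]
for question, answer in instances:
    print(question, "->", answer)
```

A model that has memorized one fixed instance would tend to fail on the fresh draws, and that failure shows up as a positive gap. Both accuracies passed to `reasoning_gap` should be fractions in the range 0 to 1.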

Leaderboard

Gemini 1.5 Pro: 64.6%
Gemini 1.5 Flash: 53.6%