FunctionalMATH


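A functional variant of the MATH benchmark. It tests whether language models generalize a reasoning pattern across different instances of the same problem, rather than answering one fixed question correctly. The difference between a model's accuracy on the static problems and on their functional variants is the reasoning gap.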
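The idea can be illustrated with a minimal Python sketch. It is not the benchmark's own code: the problem template `make_instance` is a hypothetical example, and `reasoning_gap` assumes the gap is reported as the relative drop, in percent, from static to functional accuracy.

```python
import random


def make_instance(rng):
    """Hypothetical functionalized problem: the wording is fixed, the value n varies."""
    n = rng.randint(5, 50)
    question = f"What is the sum of the first {n} positive odd integers?"
    answer = n * n  # the sum of the first n odd integers is n squared
    return question, answer


def reasoning_gap(static_acc, functional_acc):
    """Relative drop in percent from static to functional accuracy (assumed definition)."""
    if static_acc <= 0:
        raise ValueError("static accuracy must be positive")
    return 100.0 * (static_acc - functional_acc) / static_acc


# A seeded generator makes one reproducible snapshot of problem instances.
rng = random.Random(0)
snapshot = [make_instance(rng) for _ in range(3)]
for question, answer in snapshot:
    print(question, "->", answer)
```

A model that solves the fixed question but fails on freshly generated instances would show a large gap under this definition; a model that reasons consistently would show a gap near zero.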

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Gemini 1.5 Pro: 64.6% (self-reported, llm-stats)
  2. Gemini 1.5 Flash: 53.6% (self-reported, llm-stats)