PinchBench

coding

PinchBench evaluates coding agents on complex, real-world software engineering tasks, measuring both their best-case and their average performance.
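The difference between best-case and average performance can be illustrated with a small sketch. The source does not say how PinchBench aggregates runs, so everything here is an assumption: the function name `summarize_runs`, the task names, the per-run scores on a 0 to 1 scale, and the choice to take the per-task maximum for "best" and the per-task mean for "average".

```python
from statistics import mean

def summarize_runs(scores_by_task):
    """Aggregate repeated runs per task into best-case and average scores.

    scores_by_task maps a task name to a list of per-run scores in [0, 1].
    This aggregation is a hypothetical illustration, not PinchBench's
    documented method.
    """
    # Best case: take the best run on each task, then average over tasks.
    best = mean(max(runs) for runs in scores_by_task.values())
    # Average: take the mean over runs on each task, then average over tasks.
    average = mean(mean(runs) for runs in scores_by_task.values())
    return {"best": best, "average": average}

# Made-up example data: two tasks, three runs each.
runs = {
    "fix-failing-test": [1.0, 0.0, 1.0],
    "add-cli-flag": [0.5, 0.5, 1.0],
}
summary = summarize_runs(runs)
print(f"best: {summary['best']:.1%}, average: {summary['average']:.1%}")
```

With these made-up numbers, the best-case score is higher than the average because each task has at least one run that outperforms its mean, which is why a benchmark may report the two figures separately.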

Methodology

Imported from llm-stats public benchmark metadata.

  Modality: text
  Max score: 1
  Categories: agents, coding
  Language: en
  Verified by llm-stats: no

Leaderboard

  1. MiMo-V2-Omni: 81.2% (self-reported, source: llm-stats)
  2. MiMo-V2-Pro: 81.0% (self-reported, source: llm-stats)
  3. GLM-5V-Turbo: 80.7% (self-reported, source: llm-stats)