NL2Repo

coding

NL2Repo evaluates long-horizon coding capabilities, including repository-level understanding: models must generate or modify code across an entire repository from a natural-language specification.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, coding. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3.7 Max: 47.2% (self-reported, llm-stats)
  2. GLM-5.1: 42.7% (self-reported, llm-stats)
  3. MiniMax M3: 42.1% (self-reported, llm-stats)
  4. MiniMax M2.7: 39.8% (self-reported, llm-stats)
  5. Qwen3.6 Plus: 37.9% (self-reported, llm-stats)
  6. Qwen3.6-27B: 36.2% (self-reported, llm-stats)
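
The metadata lists a maximum score of 1, while the leaderboard shows percentages. The sketch below is one way to relate the two, assuming (this is not stated in the source) that a displayed percentage equals 100 times the score on the 0 to 1 scale. The names `Entry`, `LEADERBOARD`, and `gap_to_leader` are hypothetical and chosen for illustration only; the figures are copied from the leaderboard above.

```python
from dataclasses import dataclass

MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


@dataclass(frozen=True)
class Entry:
    rank: int
    model: str
    percent: float  # score as displayed on the leaderboard

    @property
    def normalized(self) -> float:
        # Assumption: displayed percent = 100 * score / MAX_SCORE.
        return round(self.percent / 100.0 * MAX_SCORE, 3)


# Self-reported results as listed on the leaderboard.
LEADERBOARD = [
    Entry(1, "Qwen3.7 Max", 47.2),
    Entry(2, "GLM-5.1", 42.7),
    Entry(3, "MiniMax M3", 42.1),
    Entry(4, "MiniMax M2.7", 39.8),
    Entry(5, "Qwen3.6 Plus", 37.9),
    Entry(6, "Qwen3.6-27B", 36.2),
]


def gap_to_leader(entries):
    """Return each model's gap to the top score, in percentage points."""
    top = max(e.percent for e in entries)
    return {e.model: round(top - e.percent, 1) for e in entries}


if __name__ == "__main__":
    for e in LEADERBOARD:
        print(f"{e.rank}. {e.model}: {e.percent}% (normalized {e.normalized})")
    print(gap_to_leader(LEADERBOARD))
```

Because every entry is self-reported and not verified by llm-stats, gaps computed this way are best read as indicative rather than as a controlled comparison.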