Aider

coding

Aider is a code editing benchmark built on 133 practice exercises from Exercism's Python repository. It evaluates an AI model's ability to turn natural language coding requests into executable code that passes unit tests. The benchmark measures end-to-end code editing: the model must both edit existing code and format its changes so they can be saved automatically to local files. The Aider Polyglot variant extends the evaluation to 225 challenging exercises spanning C++, Go, Java, JavaScript, Python, and Rust, and it has become a standard benchmark for assessing multilingual code editing performance in AI research.
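As a rough illustration of how a score of this kind can be computed, the sketch below assumes the reported percentage is the share of exercises whose unit tests pass after the model's edit. The `ExerciseResult` type, the `pass_rate` function, and the sample exercise names are hypothetical and are not part of the Aider tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExerciseResult:
    """Outcome of one benchmark exercise (hypothetical structure)."""
    name: str           # exercise identifier, illustrative only
    tests_passed: bool  # True if the edited code passed all unit tests


def pass_rate(results: List[ExerciseResult]) -> float:
    """Return the percentage of exercises whose unit tests passed.

    Assumption: the score is a simple pass percentage over all exercises.
    """
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for r in results if r.tests_passed)
    return 100.0 * passed / len(results)


# Made-up sample data, not real benchmark output.
results = [
    ExerciseResult("two-fer", True),
    ExerciseResult("bob", False),
    ExerciseResult("leap", True),
    ExerciseResult("anagram", True),
]
print(f"{pass_rate(results):.1f}%")  # prints 75.0%
```

Under this assumption, every exercise carries equal weight, so the score does not distinguish between an exercise that fails one test and one that fails all of them.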

Leaderboard

Model                        Score
DeepSeek-V2.5                72.2%
Qwen3 235B A22B              61.8%
Qwen2.5-Coder 7B Instruct    55.6%
Qwen3 32B                    50.2%