Multi-SWE-Bench

coding

A multilingual issue-resolution benchmark that evaluates how well large language models resolve software issues across diverse programming ecosystems. It covers 7 programming languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances annotated by 68 expert annotators. It addresses a limitation of existing benchmarks, which focus almost exclusively on Python.

Leaderboard


MiniMax M2.7: 52.7%
MiniMax M2.5: 51.3%
MiniMax M2.1: 49.4%
Kimi K2-Thinking-0905: 41.9%
MiniMax M2: 36.2%
Qwen3-Coder 480B A35B Instruct: 25.8%