SWE-bench Multilingual
coding official site →
A multilingual benchmark for issue resolving in software engineering that covers Java, TypeScript, JavaScript, Go, Rust, C, and C++. Contains 1,632 high-quality instances carefully annotated from 2,456 candidates by 68 expert annotators, designed to evaluate Large Language Models across diverse software ecosystems beyond Python.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: code, reasoning. Language: en. Verified by llm-stats: no.