SWE-Lancer (IC-Diamond subset)
SWE-Lancer (IC-Diamond subset) is a benchmark of real-world freelance software engineering tasks from Upwork, ranging from $50 bug fixes to $32,000 feature implementations. It evaluates AI models on independent engineering tasks using end-to-end tests triple-verified by experienced software engineers, and includes managerial tasks where models choose between technical implementation proposals.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: code, reasoning. Language: en. Verified by llm-stats: no.
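The metadata above gives a maximum score of 1 but does not say how that score is computed. As a rough illustration only, the sketch below shows two plausible ways a normalized score in [0, 1] could be derived from per-task results: the fraction of tasks whose end-to-end tests pass, and the fraction of total task payout earned. The `TaskResult` type, its field names, both scoring functions, and the sample values are all hypothetical. They are not taken from llm-stats or from the benchmark's own tooling.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One hypothetical task outcome; field names are illustrative only."""
    task_id: str
    payout_usd: float  # price attached to the freelance task
    passed: bool       # True if the task's end-to-end tests all passed


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks solved, in [0, 1] (assumed scoring rule)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def earned_fraction(results: list[TaskResult]) -> float:
    """Fraction of the total payout earned, in [0, 1] (assumed scoring rule)."""
    total = sum(r.payout_usd for r in results)
    if total == 0:
        return 0.0
    earned = sum(r.payout_usd for r in results if r.passed)
    return earned / total


# Made-up results for illustration; not real benchmark data.
results = [
    TaskResult("task-a", 50.0, True),
    TaskResult("task-b", 1000.0, False),
    TaskResult("task-c", 250.0, True),
    TaskResult("task-d", 700.0, False),
]

print(pass_rate(results))
print(earned_fraction(results))
```

The two numbers can diverge sharply when task prices vary as widely as described above: a model that solves only the cheap tasks can have a reasonable pass rate but a low earned fraction. Which definition, if either, underlies the reported score is not stated in the metadata.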