DS-Arena-Code

reasoning official site →

Data Science Arena Code benchmark for evaluating LLMs on realistic data science code generation tasks. Tests capabilities in complex data processing, analysis, and programming across popular Python libraries used in data science workflows.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. DeepSeek-V2.5 self-reported llm-stats
    63.1%