MLS-Bench Lite

MLS-Bench Lite is the official 30-task subset of MLS-Bench for evaluating whether AI systems can invent generalizable and scalable machine learning methods across LLM pretraining and post-training, robotics, world models, computer vision, reinforcement learning, optimization, ML systems, and AI for Science.

Leaderboard