YC-Bench

agents

YC-Bench evaluates agents on long-horizon, open-ended business and investment decision-making. The reported metric is the agent's final assets (fund value, in US dollars) at the end of the simulation.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 10,000,000. Categories: agents, finance. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MiniMax M3 (self-reported; llm-stats): 2,100,000
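
To put a leaderboard score in context, it can be compared with the maximum score listed in the metadata. The sketch below is a minimal illustration, not part of YC-Bench or llm-stats: the helper name `normalized_score` is hypothetical, and it assumes the maximum score is expressed in the same unit (US dollars) as the final-assets metric.

```python
# Max score from the benchmark metadata; assumed to be in US dollars,
# the same unit as the final-assets metric.
MAX_SCORE = 10_000_000


def normalized_score(final_assets: float, max_score: float = MAX_SCORE) -> float:
    """Return final assets as a fraction of the maximum score (hypothetical helper)."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return final_assets / max_score


# Leaderboard entry from above: 2,100,000 in final assets.
print(f"{normalized_score(2_100_000):.2%}")  # prints 21.00%
```

Under these assumptions, the listed score of 2,100,000 corresponds to 21% of the maximum.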