LOCA-Bench (256k)

reasoning

LOCA-Bench is a long-context agentic benchmark. The 256k variant runs agents in the official ReAct mode with a 256k-token environment description and measures how well models reason and act over very long contexts.
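For readers unfamiliar with ReAct, the loop alternates a reasoning step, an action, and an observation fed back into the context. The sketch below is only an illustration of that pattern, not the LOCA-Bench harness: `TOOLS`, `scripted_policy`, and `react_loop` are hypothetical names, and the scripted policy stands in for a real model call. In the 256k setting, the task and environment description at the start of the history would presumably account for most of the context.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy environment: one tool backed by a tiny lookup table.
TOOLS = {
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}


def scripted_policy(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for a model call; returns (thought, action, argument)."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return ("I need to look this up.", "lookup", "capital_of_france")
    # Reuse the most recent observation as the final answer.
    answer = observations[-1].split(": ", 1)[1]
    return ("I have the answer.", "finish", answer)


def react_loop(
    task: str,
    policy: Callable[[List[str]], Tuple[str, str, str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate thought, action, and observation until the policy finishes."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        # Run the chosen tool and feed its result back into the context.
        history.append(f"Observation: {TOOLS[action](arg)}")
    return None, history  # step budget exhausted without a final answer


answer, trace = react_loop("What is the capital of France?", scripted_policy)
```

Every step appends to `history`, so the context grows with each tool call on top of whatever the environment description already occupies.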

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MiniMax M3 (self-reported, llm-stats): 49.3%
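
The metadata lists a max score of 1 while the leaderboard shows a percentage. A minimal sketch of that conversion follows, assuming the leaderboard value is simply the raw score divided by the max score; `as_percentage` is a hypothetical helper, and the raw value 0.493 is inferred from the 49.3% entry rather than taken from the source.

```python
def as_percentage(raw_score: float, max_score: float = 1.0) -> str:
    """Format a raw benchmark score as a percentage of the maximum score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{raw_score / max_score:.1%}"


# 0.493 is an assumed raw score corresponding to the 49.3% leaderboard entry.
print(as_percentage(0.493))  # prints 49.3%
```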