LOCA-Bench (256k)
reasoning
LOCA-Bench is a long-context agentic benchmark. The 256k variant evaluates agents in the official ReAct mode with an environment description of 256k tokens, measuring how well models reason and act over very long contexts.
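For readers unfamiliar with ReAct, the following is a minimal sketch of the general reason-then-act loop, not LOCA-Bench's official harness. All names here (`react_loop`, `toy_policy`, the `lookup` tool, and the one-line environment description) are hypothetical and chosen for illustration; in the real benchmark the environment description would be roughly 256k tokens and the policy would be a language model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy maps the transcript so far to (thought, action name, action argument).
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    env_description: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate thoughts and actions until the policy emits 'finish'."""
    transcript = [env_description]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg, transcript
        # Run the chosen tool and feed its result back as an observation.
        observation = tools[action](arg)
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    # Step budget exhausted without a final answer.
    return None, transcript


def toy_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Scripted stand-in for a model: look up one key, then finish."""
    last = transcript[-1]
    if last.startswith("Observation: "):
        return ("I have the value.", "finish", last[len("Observation: "):])
    return ("I need to look up the key.", "lookup", "capital")


# Hypothetical single-tool environment backed by a small dictionary.
tools = {"lookup": lambda key: {"capital": "Paris"}.get(key, "unknown")}

answer, transcript = react_loop(toy_policy, tools, "Environment: a key-value store.")
```

In this sketch the transcript grows with every step, which is why a long environment description makes the setting demanding: the agent has to keep acting correctly while the relevant details sit far back in its context.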
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning. Language: en. Verified by llm-stats: no.