LongCodeBench

coding

LongCodeBench evaluates how well large language models understand code at very long context lengths, scaling up to 1M tokens. It tests whether a model can reason about an extensive codebase supplied in a single prompt by answering multiple-choice questions about that code.
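As a rough illustration of this setup, the sketch below places a codebase and one multiple-choice question into a single prompt and scores predicted letters against an answer key. It assumes the reported score is plain accuracy; the `Question` schema, `build_prompt`, and `accuracy` are hypothetical names for illustration and not the benchmark's actual harness.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical schema, not the benchmark's own)."""
    prompt: str          # question text about the codebase
    choices: list[str]   # answer options, labeled A, B, C, ... in order
    answer: str          # letter of the correct option


def build_prompt(codebase: str, q: Question) -> str:
    """Place the whole codebase and one question into a single prompt."""
    options = "\n".join(
        f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(q.choices)
    )
    return f"{codebase}\n\nQuestion: {q.prompt}\n{options}\nAnswer with one letter."


def accuracy(predictions: list[str], questions: list[Question]) -> float:
    """Fraction of questions whose predicted letter matches the key."""
    if len(predictions) != len(questions):
        raise ValueError("one prediction per question is required")
    if not questions:
        return 0.0
    correct = sum(
        p.strip().upper() == q.answer.upper()
        for p, q in zip(predictions, questions)
    )
    return correct / len(questions)
```

Under these assumptions, a leaderboard percentage would correspond to `accuracy(...) * 100` over the full question set.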

Leaderboard


Nova 2 Lite: 84.0%

Nova 2 Pro: 84.0%