NIH/Multi-needle

long context

A multi-needle-in-a-haystack benchmark that evaluates the long-context comprehension of language models by testing whether they can retrieve multiple target pieces of information from extended documents.
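As a rough illustration of the general multi-needle idea, the sketch below hides several "needle" sentences at random positions in filler text and scores an answer by the fraction of needle values it contains. This is not the benchmark's actual harness: the function names `build_haystack` and `recall_score`, the filler text, and the substring-based scoring rule are all assumptions made for the example.

```python
import random


def build_haystack(filler, needles, seed=0):
    """Insert each needle sentence at a random position among the filler sentences.

    Hypothetical helper: the real benchmark's document construction is not
    described in the source.
    """
    rng = random.Random(seed)
    sentences = list(filler)
    for needle in needles:
        # randint is inclusive on both ends, so a needle may also land at the very end
        position = rng.randint(0, len(sentences))
        sentences.insert(position, needle)
    return " ".join(sentences)


def recall_score(answer, expected_values):
    """Return the fraction of expected needle values found in the answer.

    Assumed scoring rule: case-insensitive substring match.
    """
    if not expected_values:
        return 0.0
    answer_lower = answer.lower()
    found = sum(1 for value in expected_values if value.lower() in answer_lower)
    return found / len(expected_values)


# Made-up example data
filler = ["The weather was unremarkable that day."] * 200
needles = [
    "The secret code for Paris is 4817.",
    "The secret code for Lima is 9325.",
    "The secret code for Oslo is 1160.",
]
haystack = build_haystack(filler, needles, seed=42)

# A model would be prompted with the haystack plus a question such as
# "List the secret codes for Paris, Lima, and Oslo."
# Here a hand-written answer stands in for the model output.
model_answer = "The codes are 4817 and 9325."
score = recall_score(model_answer, ["4817", "9325", "1160"])
```

In this sketch an answer that recovers two of the three values gets partial credit, which is one plausible way a percentage score could arise; the source does not say how the leaderboard figure is computed.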

Leaderboard

Llama 3.2 3B Instruct: 84.7%
