TydiQA

A multilingual question answering benchmark covering 11 typologically diverse languages, with 204K question-answer pairs. Questions are written by people seeking genuine information, and the data is collected directly in each language rather than translated, so the benchmark tests how well models generalize across diverse linguistic structures.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: language, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Llama 4 Maverick: 31.7% (self-reported, llm-stats)
  2. Llama 4 Scout: 31.5% (self-reported, llm-stats)
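
The metadata gives a maximum score of 1, while the leaderboard shows percentages. The relationship between the two can be sketched as below, assuming the percentages are simply the raw 0-to-1 score scaled by 100. The helper `to_percent` and the raw values 0.317 and 0.315 are illustrative: they are back-computed from the listed percentages and are not taken from llm-stats data.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score on a 0..max_score scale as a one-decimal percentage."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside the range 0..{max_score}")
    return f"{100.0 * score / max_score:.1f}%"


# Hypothetical raw scores, back-computed from the leaderboard percentages.
leaderboard = {
    "Llama 4 Maverick": 0.317,
    "Llama 4 Scout": 0.315,
}

# Rank models from highest to lowest score, as the leaderboard does.
ranked = sorted(leaderboard.items(), key=lambda item: item[1], reverse=True)

for rank, (model, score) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {to_percent(score)}")
```

Under this assumption, the 0.2-point gap between the two models corresponds to a difference of 0.002 on the raw scale.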