LMArena Text Leaderboard


LMArena Text Leaderboard is a blind human-preference benchmark that ranks models by pairwise comparison in real-world conversations. Users choose between two model responses in head-to-head battles, and the leaderboard converts those votes into Elo ratings, so a model's score reflects how often its answers are preferred overall, in both capability and style.
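To make the rating scale concrete, here is a minimal sketch of the textbook Elo model in Python. It is an illustration only, not necessarily the procedure the leaderboard itself runs: the function names, the 400-point logistic scale, and the K-factor of 32 are conventional assumptions that do not come from the source.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under the
    standard Elo logistic model (scale=400 is the conventional value,
    assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one battle.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 for a tie. k=32 is an assumed, illustrative step size.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings of the top two leaderboard entries (1,483 and 1,465).
    p = expected_score(1483, 1465)
    print(f"Expected preference rate for the higher-rated model: {p:.3f}")

    # Hypothetical single battle in which the lower-rated model is preferred.
    print(elo_update(1483, 1465, outcome_a=0.0))
```

Under these assumptions, the 18-point gap between the top two entries corresponds to an expected preference rate of only about 0.526 for the higher-rated model, so small rating differences near the top of the table indicate a narrow edge rather than a decisive one.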

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 2000. Categories: general, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Grok-4.1 Thinking: 1,483 (self-reported, llm-stats)
  2. Grok-4.1: 1,465 (self-reported, llm-stats)