LMArena Text Leaderboard

reasoning

LMArena Text Leaderboard is a blind human-preference benchmark that ranks models by pairwise comparison in real-world conversations. Users vote between two models in head-to-head battles, and the leaderboard computes Elo ratings from those votes. Because the ratings reflect which response users prefer, they capture both overall capability and response style.
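To make the rating scale concrete, here is a minimal sketch of the classic Elo model in Python. It is an illustration under stated assumptions, not the leaderboard's actual pipeline: the page does not say how its ratings are fitted, and the function names, the 400-point scale, and the K-factor of 32 are conventional textbook choices introduced here for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one battle.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the sum of ratings is preserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    p = expected_score(1483, 1465)
    print(f"Expected preference rate for the higher-rated model: {p:.3f}")
    print(update(1483, 1465, score_a=0.0))  # the lower-rated model wins the vote
```

Under this model, an 18-point gap such as 1,483 versus 1,465 corresponds to the higher-rated model being preferred in roughly 53% of battles, so small rating differences imply near-even matchups.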

Leaderboard

Model               Elo rating
Grok-4.1 Thinking   1,483
Grok-4.1            1,465