Creative Writing v3

EQ-Bench Creative Writing v3 is an LLM-judged creative writing benchmark that evaluates models on 32 writing prompts, with 3 iterations per prompt. It uses a hybrid scoring system that combines rubric assessment with Elo ratings derived from pairwise comparisons. The prompts challenge models in areas such as humor, romance, spatial awareness, and unique perspectives, in order to assess emotional intelligence and creative writing capability.
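To make the pairwise-comparison part of the scoring concrete, here is a minimal sketch of a textbook Elo update in Python. It is not necessarily the exact rating procedure the benchmark uses. The starting rating of 1500, the K-factor of 32, the 400-point scale, and the model names are conventional or hypothetical values chosen for illustration, not taken from the benchmark.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if the judge preferred A, 0.0 if it preferred B,
    and 0.5 for a tie. k=32 is an assumed, conventional K-factor.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate(models, comparisons, start: float = 1500.0):
    """Apply a sequence of (model_a, model_b, outcome_a) judgments in order."""
    ratings = {name: start for name in models}
    for a, b, outcome_a in comparisons:
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], outcome_a)
    return ratings


# Hypothetical judgments between two placeholder models.
ratings = rate(
    ["model_x", "model_y"],
    [("model_x", "model_y", 1.0), ("model_x", "model_y", 0.5)],
)
```

With equal starting ratings, a single win moves the winner up by k/2 = 16 points and the loser down by the same amount, so the total rating in the pool is conserved.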

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: creativity, writing. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Grok-4.1 Thinking: 1,721.9 (self-reported, llm-stats)
  2. Grok-4.1: 1,708.6 (self-reported, llm-stats)
  3. Qwen3-Next-80B-A3B-Instruct: 85.3 (self-reported, llm-stats)
  4. Qwen3-235B-A22B-Instruct-2507: 0.875 (self-reported, llm-stats)
  5. Qwen3 VL 235B A22B Instruct: 0.865 (self-reported, llm-stats)
  6. Qwen3-235B-A22B-Thinking-2507: 0.861 (self-reported, llm-stats)
  7. Qwen3 VL 235B A22B Thinking: 0.857 (self-reported, llm-stats)
  8. Qwen3 VL 32B Instruct: 0.856 (self-reported, llm-stats)
  9. Qwen3 VL 30B A3B Instruct: 0.846 (self-reported, llm-stats)
  10. Qwen3 VL 32B Thinking: 0.833 (self-reported, llm-stats)
  11. Qwen3 VL 30B A3B Thinking: 0.825 (self-reported, llm-stats)
  12. Qwen3 VL 8B Thinking: 0.824 (self-reported, llm-stats)
  13. Qwen3 VL 4B Thinking: 0.761 (self-reported, llm-stats)
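
The self-reported scores above appear to sit on different scales: values above 1,000, one value of 85.3, and values below the stated max score of 1. The Python sketch below groups the entries by apparent scale before ranking within each group. The magnitude thresholds and the labels "elo", "percent", and "fraction" are assumptions made for illustration. The source does not say which scale each entry uses.

```python
from collections import defaultdict

# Scores copied from the leaderboard above.
LEADERBOARD = [
    ("Grok-4.1 Thinking", 1721.9),
    ("Grok-4.1", 1708.6),
    ("Qwen3-Next-80B-A3B-Instruct", 85.3),
    ("Qwen3-235B-A22B-Instruct-2507", 0.875),
    ("Qwen3 VL 235B A22B Instruct", 0.865),
    ("Qwen3-235B-A22B-Thinking-2507", 0.861),
    ("Qwen3 VL 235B A22B Thinking", 0.857),
    ("Qwen3 VL 32B Instruct", 0.856),
    ("Qwen3 VL 30B A3B Instruct", 0.846),
    ("Qwen3 VL 32B Thinking", 0.833),
    ("Qwen3 VL 30B A3B Thinking", 0.825),
    ("Qwen3 VL 8B Thinking", 0.824),
    ("Qwen3 VL 4B Thinking", 0.761),
]


def guess_scale(score: float) -> str:
    """Heuristic guess at the reporting scale, based only on magnitude.

    Assumption: the thresholds and labels are illustrative, not from the source.
    """
    if score > 100:
        return "elo"
    if score > 1:
        return "percent"
    return "fraction"


def group_by_scale(rows):
    """Bucket (model, score) rows by guessed scale, best score first in each bucket."""
    groups = defaultdict(list)
    for name, score in rows:
        groups[guess_scale(score)].append((name, score))
    for entries in groups.values():
        entries.sort(key=lambda row: row[1], reverse=True)
    return dict(groups)


groups = group_by_scale(LEADERBOARD)
for scale, entries in groups.items():
    print(scale, [name for name, _ in entries])
```

Comparing models only within a group avoids ranking a value such as 85.3 against 0.875 as if both were on the same scale.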