Include

Include benchmark: specific documentation was not found in official sources.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Claude Opus 4.8: 87.6% (self-reported, llm-stats)
  2. Qwen3.7 Max: 86.2% (self-reported, llm-stats)
  3. Qwen3.5-397B-A17B: 85.6% (self-reported, llm-stats)
  4. Qwen3.6 Plus: 85.1% (self-reported, llm-stats)
  5. Qwen3.5-122B-A10B: 82.8% (self-reported, llm-stats)
  6. Qwen3.5-27B: 81.6% (self-reported, llm-stats)
  7. Qwen3-235B-A22B-Thinking-2507: 81.0% (self-reported, llm-stats)
  8. Qwen3 VL 235B A22B Instruct: 80.0% (self-reported, llm-stats)
  9. Qwen3 VL 235B A22B Thinking: 80.0% (self-reported, llm-stats)
  10. Qwen3.5-35B-A3B: 79.7% (self-reported, llm-stats)
  11. Qwen3-235B-A22B-Instruct-2507: 79.5% (self-reported, llm-stats)
  12. Qwen3-Next-80B-A3B-Instruct: 78.9% (self-reported, llm-stats)
  13. Qwen3-Next-80B-A3B-Thinking: 78.9% (self-reported, llm-stats)
  14. Qwen3 VL 32B Thinking: 76.3% (self-reported, llm-stats)
  15. Qwen3 VL 30B A3B Thinking: 74.5% (self-reported, llm-stats)
  16. Qwen3 VL 32B Instruct: 74.0% (self-reported, llm-stats)
  17. Qwen3 235B A22B: 73.5% (self-reported, llm-stats)
  18. Qwen3 VL 30B A3B Instruct: 71.6% (self-reported, llm-stats)
  19. Qwen3 VL 8B Thinking: 69.5% (self-reported, llm-stats)
  20. Qwen3 VL 8B Instruct: 67.0% (self-reported, llm-stats)