QwenWebBench

coding

QwenWebBench is an internal front-end code generation benchmark from Qwen. It is bilingual (EN/CN) and spans 7 categories: Web Design, Web Apps, Games, SVG, Data Visualization, Animation, and 3D. Generated code is rendered automatically, and a multimodal judge assesses both code and visual correctness. Scores are reported as Bradley-Terry (BT)/Elo ratings.
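The description does not say how the BT/Elo ratings are fitted, so the following is only a generic sketch of how Bradley-Terry strengths can be estimated from pairwise win counts and mapped onto an Elo-like scale. The function names, the iteration count, the base rating of 1000, and the scale factor of 400 are all assumptions made for illustration, not QwenWebBench's documented procedure.

```python
import math


def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths with the standard MM update.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1. Assumes every item has at
    least one win, so that no strength collapses to zero.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]
    return p


def to_elo_scale(strengths, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (hypothetical base and scale).

    The geometric mean of the strengths maps to `base`.
    """
    logs = [math.log10(s) for s in strengths]
    mean_log = sum(logs) / len(logs)
    return [base + scale * (v - mean_log) for v in logs]


# Toy data (made up): model 0 was preferred 3 times, model 1 once.
toy_wins = [[0, 3], [1, 0]]
toy_strengths = fit_bradley_terry(toy_wins)
toy_ratings = to_elo_scale(toy_strengths)
```

With this mapping, a rating gap of 400 points corresponds to a 10:1 ratio of estimated strengths. Real leaderboards may use a different anchor, scale, or fitting method.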

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 2000. Categories: agents, coding, multimodal. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3.7 Max: 1,568 (self-reported, llm-stats)
  2. Qwen3.6-27B: 1,487 (self-reported, llm-stats)
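
As a rough reading aid, the gap between two ratings can be turned into an expected preference rate using the conventional Elo logistic formula. This sketch assumes the common base-10, 400-point scale. The benchmark metadata does not confirm that QwenWebBench uses this scale, so the result is illustrative only.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the conventional Elo formula.

    The 400-point scale is an assumption, not a documented property of
    this benchmark.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the leaderboard above.
expected = elo_expected_score(1568, 1487)
print(f"Expected preference rate for the higher-rated model: {expected:.2f}")
```

Under that assumption, an 81-point gap corresponds to the higher-rated model being preferred in roughly 61% of head-to-head comparisons.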