QwenWorldBench


QwenWorldBench is Qwen's internal benchmark for evaluating LLMs as world models that simulate agentic environments across Terminal, SWE, MCP, Search, OS, Android, and Web domains.

Methodology

Imported from llm-stats public benchmark metadata.
Modality: text
Max score: 1
Categories: agents, reasoning, simulation
Language: en
Verified by llm-stats: no

Leaderboard

  1. Qwen3.7 Max: 57.3% (self-reported; source: llm-stats)