QwenWorldBench
reasoning
QwenWorldBench is Qwen's internal benchmark for evaluating LLMs as world models that simulate agentic environments across seven domains: Terminal, SWE (software engineering), MCP (Model Context Protocol), Search, OS, Android, and Web.
Methodology
Imported from the public llm-stats benchmark metadata; not verified by llm-stats. Modality: text. Max score: 1. Categories: agents, reasoning, simulation. Language: en. Verified by llm-stats: no.
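For readers who want to consume this entry programmatically, the metadata fields above could be loaded into a small record. This is only a minimal sketch: `BenchmarkCard` and `parse_metadata` are hypothetical names invented for illustration, not part of any llm-stats or Qwen API, and the "Key: value." layout is assumed from the line above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCard:
    """Hypothetical container for the metadata fields listed on this card."""
    name: str
    modality: str
    max_score: float
    categories: tuple
    language: str
    verified: bool


def parse_metadata(name, line):
    """Parse 'Key: value.' pairs from a single metadata line (assumed layout)."""
    fields = {}
    for part in line.split(". "):
        part = part.strip().rstrip(".")
        if ": " in part:
            key, value = part.split(": ", 1)
            fields[key.lower()] = value
    return BenchmarkCard(
        name=name,
        modality=fields["modality"],
        max_score=float(fields["max score"]),
        categories=tuple(c.strip() for c in fields["categories"].split(",")),
        language=fields["language"],
        verified=fields["verified by llm-stats"] == "yes",
    )


# Field values copied from the card itself.
METADATA = (
    "Modality: text. Max score: 1. Categories: agents, reasoning, simulation. "
    "Language: en. Verified by llm-stats: no."
)

card = parse_metadata("QwenWorldBench", METADATA)
print(card)
```

Because the maximum score is 1, any reported result would presumably be a fraction of 1 rather than a percentage; the card does not say how individual scores are computed.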