OSWorld

multimodal

OSWorld is a first-of-its-kind, scalable, real computer environment for multimodal agents. It supports task setup, execution-based evaluation, and interactive learning across Ubuntu, Windows, and macOS, and includes 369 computer tasks involving real web and desktop applications, OS file I/O, and multi-application workflows.
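"Execution-based evaluation" means an agent is scored by checking the state the environment ends up in, not by comparing its action sequence to a reference. The sketch below illustrates only that idea and is not OSWorld's actual harness or API. Everything in it (`Task`, `file_has_content`, `evaluate`, `toy_agent`, and the use of a temporary directory as the "environment") is a hypothetical stand-in. The score is reported as a percentage of tasks passed, which is the form the leaderboard numbers take.

```python
# Hypothetical sketch of execution-based evaluation (NOT the OSWorld harness):
# each run is judged by inspecting the resulting environment state.
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    # Hypothetical task record: an instruction plus a checker over final state.
    instruction: str
    checker: Callable[[str], bool]  # receives the task's working directory


def file_has_content(name: str, expected: str) -> Callable[[str], bool]:
    """Build a checker that passes if file `name` exists with exactly `expected` text."""
    def check(workdir: str) -> bool:
        path = os.path.join(workdir, name)
        if not os.path.isfile(path):
            return False
        with open(path, encoding="utf-8") as f:
            return f.read() == expected
    return check


def success_rate(results: List[bool]) -> float:
    """Percentage of tasks whose final state passed its checker."""
    return 100.0 * sum(results) / len(results) if results else 0.0


def evaluate(tasks: List[Task], agent: Callable[[str, str], None]) -> float:
    results = []
    for task in tasks:
        # Fresh, isolated "environment" per task (a temp dir stands in for a VM).
        with tempfile.TemporaryDirectory() as workdir:
            agent(task.instruction, workdir)       # the agent acts on the environment
            results.append(task.checker(workdir))  # only the outcome is judged
    return success_rate(results)


def toy_agent(instruction: str, workdir: str) -> None:
    # Hypothetical agent that always writes "hello", whatever it was asked.
    with open(os.path.join(workdir, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("hello")


tasks = [
    Task("Write hello to notes.txt", file_has_content("notes.txt", "hello")),
    Task("Write bye to notes.txt", file_has_content("notes.txt", "bye")),
]
print(evaluate(tasks, toy_agent))  # 50.0
```

Because the checkers only look at the final file contents, the toy agent passes the first task and fails the second regardless of how it produced the file. That outcome-only judgment is what lets such a benchmark accept any valid way of completing a task.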

Leaderboard


Claude Opus 4.6: 72.7%
Claude Sonnet 4.6: 72.5%
Qwen3 VL 235B A22B Instruct: 66.7%
Claude Opus 4.5: 66.3%
GLM-5V-Turbo: 62.3%
Claude Sonnet 4.5: 61.4%
Claude Haiku 4.5: 50.7%
Claude Haiku 4.5: 44.9%
Qwen3 VL 32B Thinking: 41.0%
Qwen3 VL 235B A22B Thinking: 38.1%
Qwen3 VL 8B Instruct: 33.9%
Qwen3 VL 8B Thinking: 33.9%
Qwen3 VL 32B Instruct: 32.6%
Qwen3 VL 4B Thinking: 31.4%
Qwen3 VL 30B A3B Thinking: 30.6%
Qwen3 VL 30B A3B Instruct: 30.3%
Qwen3 VL 4B Instruct: 26.2%
Qwen2.5 VL 72B Instruct: 8.8%
Qwen2.5 VL 32B Instruct: 5.9%