OSWorld Screenshot-only

multimodal

OSWorld Screenshot-only: A variant of the OSWorld benchmark that evaluates multimodal AI agents using only screenshot observations to complete open-ended computer tasks across real operating systems (Ubuntu, Windows, macOS). Tests agents' ability to perform complex workflows involving web apps, desktop applications, file I/O, and multi-application tasks through visual interface understanding and GUI grounding.

Leaderboard

Showing 1 of 1 result

Claude 3.5 Sonnet

14.9%

i