RealWorldQA

vision

RealWorldQA is a benchmark designed to evaluate the basic real-world spatial understanding of multimodal models. The initial release consists of over 700 anonymized images taken from vehicles and other real-world scenarios, each accompanied by a question and an easily verifiable answer. xAI released it alongside its Grok-1.5 Vision preview to test models' ability to understand natural scenes and spatial relationships in everyday visual contexts.
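Because each item pairs a question with an easily verifiable answer, a percentage score can in principle be computed by comparing model outputs against reference answers. The following is a minimal sketch of that idea, not xAI's official scoring procedure: the `Item` class, the `normalize` rule (trim, drop a trailing period, lowercase), and the sample questions and predictions are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One hypothetical benchmark item with a model's answer attached."""
    question: str
    reference: str   # the verifiable ground-truth answer
    prediction: str  # hypothetical model output for this item


def normalize(text: str) -> str:
    # Assumed normalization: trim whitespace, drop a trailing period, lowercase.
    return text.strip().rstrip(".").strip().lower()


def accuracy(items: list[Item]) -> float:
    """Percentage of items whose normalized prediction equals the reference."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(normalize(i.prediction) == normalize(i.reference) for i in items)
    return 100.0 * correct / len(items)


# Illustrative items only; these are not taken from the benchmark.
items = [
    Item("Which object is closer to the camera, the car or the sign?", "The car", "the car."),
    Item("How many lanes go in our direction?", "2", "3"),
    Item("Can we turn left here? Answer yes or no.", "No", "no"),
    Item("Is the traffic light green? Answer yes or no.", "Yes", "Yes"),
]
print(f"{accuracy(items):.1f}%")  # prints "75.0%"
```

Exact match after light normalization is only one possible choice. Published scores may rely on different answer extraction or prompting, so numbers from a sketch like this are not directly comparable to the leaderboard.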

Leaderboard

Showing 20 of 22 results

Model                          Score
Qwen3.6 Plus                   85.4%
Qwen3.6-35B-A3B                85.3%
Qwen3.5-122B-A10B              85.1%
Qwen3.5-35B-A3B                84.1%
Qwen3.6-27B                    84.1%
Qwen3.5-27B                    83.7%
Qwen3 VL 235B A22B Thinking    81.3%
Qwen3 VL 235B A22B Instruct    79.3%
Qwen3 VL 32B Instruct          79.0%
Qwen3 VL 32B Thinking          78.4%
Qwen2-VL-72B-Instruct          77.8%
Qwen3 VL 30B A3B Thinking      77.4%
Qwen3 VL 30B A3B Instruct      73.7%
Qwen3 VL 8B Thinking           73.5%
Qwen3 VL 4B Thinking           73.2%
Qwen3 VL 8B Instruct           71.5%
Qwen3 VL 4B Instruct           70.9%
Qwen2.5-Omni-7B                70.3%
Grok-1.5V                      68.7%
DeepSeek VL2                   68.4%