RefCOCOg

multimodal

RefCOCOg is a referring expression comprehension benchmark that evaluates spatial grounding in images. Given a natural-language expression describing an object, the model must localize the region the expression refers to. Performance is reported as accuracy, where a prediction counts as correct if its intersection over union (IoU) with the ground-truth region meets the 0.5 threshold. Its expressions are longer and more descriptive than those in RefCOCO and RefCOCO+.
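The metric described above can be illustrated with a minimal sketch. It is not the benchmark's official evaluation script. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates and that a prediction exactly at the threshold counts as correct (`>=`); the function names `iou` and `accuracy_at_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    predictions: Sequence[Box],
    ground_truths: Sequence[Box],
    threshold: float = 0.5,
) -> float:
    """Fraction of predictions whose IoU with the paired ground truth
    is at least `threshold` (the `>=` comparison is an assumption)."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        iou(p, g) >= threshold for p, g in zip(predictions, ground_truths)
    )
    return hits / len(predictions)
```

For example, a predicted box identical to the ground truth has an IoU of 1 and counts as a hit, while two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square and have an IoU of 1/7, which counts as a miss.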

Leaderboard

Nova 2 Omni: 86.3%
