ODinW

Object Detection in the Wild (ODinW) is a benchmark that evaluates the task-level transfer ability of object detection models across diverse real-world datasets, measured by prediction accuracy and adaptation efficiency.

Methodology

Imported from llm-stats public benchmark metadata. Modality: image. Max score: 1. Categories: vision. Language: en. Verified by llm-stats: no.
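
The metadata gives a maximum score of 1, while leaderboard entries are displayed as percentages. A minimal sketch of that conversion follows, assuming raw scores are stored as fractions of the maximum. The function name `to_percent` and the raw values are hypothetical: the values are simply the displayed percentages divided by 100, not figures taken from llm-stats.

```python
MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


def to_percent(raw_score: float, max_score: float = MAX_SCORE) -> str:
    """Format a raw score as a percentage of the maximum score."""
    if not 0.0 <= raw_score <= max_score:
        raise ValueError(f"score {raw_score} is outside [0, {max_score}]")
    return f"{100.0 * raw_score / max_score:.1f}%"


# Assumed raw values: the displayed leaderboard percentages divided by 100.
raw_scores = {
    "Qwen3 VL 4B Instruct": 0.482,
    "Qwen3.6 Plus": 0.518,
    "Qwen3 VL 235B A22B Instruct": 0.486,
}

# Rank models from highest to lowest score, as a leaderboard would.
ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)

for rank, (model, score) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {to_percent(score)}")
```

Under this assumption, a stored score of 0.518 is shown as 51.8%. Scores outside the range from 0 to the maximum are rejected rather than silently clipped.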

Leaderboard

  1. Qwen3.6 Plus: 51.8% (self-reported, llm-stats)
  2. Qwen3 VL 235B A22B Instruct: 48.6% (self-reported, llm-stats)
  3. Qwen3 VL 4B Instruct: 48.2% (self-reported, llm-stats)
  4. Qwen3 VL 30B A3B Instruct: 47.5% (self-reported, llm-stats)
  5. Qwen3 VL 32B Instruct: 46.6% (self-reported, llm-stats)
  6. Qwen3 VL 8B Instruct: 44.7% (self-reported, llm-stats)
  7. Qwen3.5-122B-A10B: 44.5% (self-reported, llm-stats)
  8. Qwen3 VL 235B A22B Thinking: 43.2% (self-reported, llm-stats)
  9. Qwen3.5-35B-A3B: 42.6% (self-reported, llm-stats)
  10. Qwen2.5-Omni-7B: 42.4% (self-reported, llm-stats)
  11. Qwen3 VL 30B A3B Thinking: 42.3% (self-reported, llm-stats)
  12. Qwen3.5-27B: 41.1% (self-reported, llm-stats)
  13. Qwen3 VL 8B Thinking: 39.8% (self-reported, llm-stats)
  14. Qwen3 VL 4B Thinking: 39.4% (self-reported, llm-stats)