ActivityNet


A large-scale video benchmark for human activity understanding. It provides samples from 203 activity classes, with an average of 137 untrimmed videos per class and 1.41 activity instances per video, for a total of 849 video hours. The benchmark covers a wide range of complex human activities from everyday life and can be used to compare algorithms in three scenarios: untrimmed video classification, trimmed activity classification, and activity detection.
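The averages above imply approximate dataset totals. Below is a minimal Python sketch of that arithmetic. It assumes each video is counted under a single class. The constant and function names are hypothetical. Because 137 and 1.41 are averages, the results are estimates, not official counts.

```python
# Headline statistics as stated in the benchmark description.
NUM_CLASSES = 203
AVG_VIDEOS_PER_CLASS = 137      # an average, so derived totals are approximate
AVG_INSTANCES_PER_VIDEO = 1.41  # also an average
TOTAL_VIDEO_HOURS = 849


def implied_totals():
    """Back-of-the-envelope totals implied by the stated averages.

    Assumes each video is counted under exactly one class.
    """
    videos = NUM_CLASSES * AVG_VIDEOS_PER_CLASS
    instances = videos * AVG_INSTANCES_PER_VIDEO
    avg_video_seconds = TOTAL_VIDEO_HOURS * 3600 / videos
    return videos, instances, avg_video_seconds


videos, instances, avg_secs = implied_totals()
print(f"~{videos} videos, ~{instances:.0f} instances, ~{avg_secs:.0f} s per video")
```

Running it prints roughly 27,811 videos and 39,214 activity instances. The average untrimmed video length comes out to about 110 seconds, just under two minutes.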

Methodology

Imported from llm-stats public benchmark metadata. Modality: video. Max score: 1. Categories: video, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. GPT-4o: 61.9% (self-reported, via llm-stats)