LVBench


LVBench is an extreme long-video understanding benchmark designed to evaluate multimodal models on videos up to two hours long. It contains 6 major categories and 21 subcategories, and its videos are on average about five times longer than those in existing datasets. The benchmark targets applications that require comprehension of extremely long videos.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: long_context, multimodal, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Kimi K2.5: 75.9% (self-reported, llm-stats)
  2. Qwen3.5-122B-A10B: 74.4% (self-reported, llm-stats)
  3. Qwen3.5-27B: 73.6% (self-reported, llm-stats)
  4. Qwen3.5-35B-A3B: 71.4% (self-reported, llm-stats)
  5. Qwen3 VL 235B A22B Instruct: 67.7% (self-reported, llm-stats)
  6. Qwen3 VL 32B Instruct: 63.8% (self-reported, llm-stats)
  7. Qwen3 VL 235B A22B Thinking: 63.6% (self-reported, llm-stats)
  8. Qwen3 VL 32B Thinking: 62.6% (self-reported, llm-stats)
  9. Qwen3 VL 30B A3B Instruct: 62.5% (self-reported, llm-stats)
  10. Qwen3 VL 30B A3B Thinking: 59.2% (self-reported, llm-stats)
  11. Qwen3 VL 8B Instruct: 58.0% (self-reported, llm-stats)
  12. Qwen3 VL 4B Instruct: 56.2% (self-reported, llm-stats)
  13. Qwen3 VL 8B Thinking: 55.8% (self-reported, llm-stats)
  14. Qwen3 VL 4B Thinking: 53.5% (self-reported, llm-stats)
  15. Qwen2.5 VL 32B Instruct: 49.0% (self-reported, llm-stats)
  16. Qwen2.5 VL 72B Instruct: 47.3% (self-reported, llm-stats)
  17. Qwen2.5 VL 7B Instruct: 45.3% (self-reported, llm-stats)
  18. Nova Pro: 41.6% (self-reported, llm-stats)
  19. Nova Lite: 40.4% (self-reported, llm-stats)
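The metadata gives a maximum score of 1, so each percentage above presumably expresses a score as a fraction of that maximum. As a minimal sketch, assuming scores are stored on that 0-1 scale, the Python below re-sorts a handful of the self-reported entries by score and formats them as percentages. The `Entry` class and `rank` function are hypothetical helpers for illustration, not part of llm-stats or LVBench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float  # assumption: stored on the benchmark's 0-1 scale (max score 1)


# A subset of the self-reported results listed in the leaderboard, in arbitrary order.
ENTRIES = [
    Entry("Qwen2.5 VL 72B Instruct", 0.473),
    Entry("Kimi K2.5", 0.759),
    Entry("Qwen2.5 VL 32B Instruct", 0.490),
    Entry("Nova Lite", 0.404),
]


def rank(entries):
    """Return (rank, model, percentage string) tuples, highest score first."""
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    # Convert the 0-1 score to a percentage with one decimal place, as displayed above.
    return [(i, e.model, f"{e.score * 100:.1f}%") for i, e in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for position, model, pct in rank(ENTRIES):
        print(f"{position}. {model}: {pct}")
```

Ranking strictly by score, not by parameter count, is what places Qwen2.5 VL 32B Instruct (49.0%) above Qwen2.5 VL 72B Instruct (47.3%).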