VocalSound

A dataset for improving recognition of human vocal sounds. It contains over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects, and is used for audio event classification and for recognizing human non-speech vocalizations.

Methodology

Imported from llm-stats public benchmark metadata. Modality: audio. Max score: 1. Categories: audio. Language: en. Verified by llm-stats: no.
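
The metadata gives a maximum score of 1, while leaderboard entries are shown as percentages. The sketch below illustrates one way the two could relate, assuming the score is plain classification accuracy over the six sound classes. That scoring rule is an assumption, as the source does not state the metric. The label strings and the helper names `accuracy` and `as_percent` are hypothetical.

```python
# Hypothetical sketch: label names follow the six classes in the dataset
# description; plain accuracy as the metric is an assumption, not something
# the source confirms.
LABELS = ["laughter", "sigh", "cough", "throat clearing", "sneeze", "sniff"]


def accuracy(predictions, references):
    """Return the fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def as_percent(score, max_score=1.0):
    """Format a score on a 0..max_score scale as a percentage string."""
    return f"{100 * score / max_score:.1f}%"


# Toy example with made-up predictions (not real benchmark data).
refs = ["cough", "sneeze", "sigh", "laughter"]
preds = ["cough", "sniff", "sigh", "laughter"]
score = accuracy(preds, refs)
print(score, as_percent(score))
```

Under this reading, a stored score of 0.939 on the 0 to 1 scale would be displayed as 93.9%.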

Leaderboard

  1. Qwen2.5-Omni-7B: 93.9% (self-reported; source: llm-stats)