VoiceBench Avg


VoiceBench is the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. It covers general knowledge, instruction following, reasoning, and safety, using both synthetic and real spoken instructions that vary in speaker characteristics and environmental conditions.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: communication, general, reasoning, safety, speech_to_text. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen2.5-Omni-7B: 74.1% (self-reported; source: llm-stats)