ComplexFuncBench
ComplexFuncBench is a benchmark for evaluating how well large language models handle complex function calling. It covers multi-step and constrained function calling tasks that require long-parameter filling, parameter value reasoning, and working within contexts of up to 128k tokens. The benchmark includes 1,000 samples across five real-world scenarios.
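To make "multi-step" and "parameter value reasoning" concrete, here is a minimal sketch of a two-step function calling exchange in which the second call's argument has to be derived from the first call's result. Everything in it is an assumption for illustration: the tool names (`search_hotels`, `book_hotel`), their parameters, the catalog data, and the call format are hypothetical and are not taken from the benchmark's actual schema or scenarios.

```python
import json

# Hypothetical tools standing in for real-world APIs (not the benchmark's own).
def search_hotels(city: str, max_price: int) -> list[dict]:
    catalog = [
        {"hotel_id": "H1", "city": "Paris", "price": 180},
        {"hotel_id": "H2", "city": "Paris", "price": 120},
        {"hotel_id": "H3", "city": "Rome", "price": 90},
    ]
    # Constraint handling: filter by city and by the price ceiling.
    return [h for h in catalog if h["city"] == city and h["price"] <= max_price]

def book_hotel(hotel_id: str, check_in: str, nights: int) -> dict:
    return {"status": "confirmed", "hotel_id": hotel_id,
            "check_in": check_in, "nights": nights}

TOOLS = {"search_hotels": search_hotels, "book_hotel": book_hotel}

def run_call(call: dict):
    """Dispatch a model-style call of the form {'name': ..., 'arguments': {...}}."""
    return TOOLS[call["name"]](**call["arguments"])

# Step 1: the model issues a constrained search.
first_call = {"name": "search_hotels",
              "arguments": {"city": "Paris", "max_price": 150}}
hotels = run_call(first_call)

# Step 2: the hotel_id is not in the user request; it must be reasoned
# out of the step 1 response (here: pick the cheapest match).
cheapest = min(hotels, key=lambda h: h["price"])
second_call = {"name": "book_hotel",
               "arguments": {"hotel_id": cheapest["hotel_id"],
                             "check_in": "2025-03-01", "nights": 2}}
booking = run_call(second_call)

print(json.dumps(booking))
```

An evaluation of this kind of task would typically compare each generated call, including its arguments, against a reference sequence; the exact scoring procedure used by ComplexFuncBench is not described here.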
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: long_context, reasoning, structured_output, tool_calling. Language: en. Verified by llm-stats: no.