BFCL

The Berkeley Function Calling Leaderboard (BFCL) is the first comprehensive, executable function-calling evaluation dedicated to assessing the ability of Large Language Models (LLMs) to invoke functions. It evaluates serial and parallel function calls across several languages and interfaces (Python, Java, JavaScript, and REST APIs) using a novel Abstract Syntax Tree (AST) evaluation method. The benchmark consists of over 2,000 question-function-answer pairs that cover diverse application domains and complex use cases, including multiple function calls, parallel function calls, and multi-turn interactions.
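To make the idea of AST-based evaluation concrete, here is a minimal sketch in Python. It is not BFCL's actual checker: the helper names (`parse_call`, `matches`), the keyword-only restriction, and the "list of acceptable values per parameter" format are all assumptions chosen for illustration. The sketch parses a model's output as a single function call and compares the function name and arguments structurally, without executing anything.

```python
import ast


def parse_call(source: str) -> tuple[str, dict]:
    """Parse a single call such as f(a=1, b="x") into (name, kwargs).

    Hypothetical helper: only keyword arguments with literal values
    are supported in this sketch.
    """
    tree = ast.parse(source.strip(), mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")
    if call.args:
        raise ValueError("this sketch only supports keyword arguments")
    # ast.unparse also handles dotted names such as math.sqrt
    name = ast.unparse(call.func)
    # literal_eval rejects anything that is not a plain literal
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return name, kwargs


def matches(source, expected_name, allowed, optional=frozenset()):
    """Return True if the call in `source` is structurally acceptable.

    `allowed` maps each parameter name to a list of acceptable values
    (an assumed answer format); `optional` names parameters that may
    be omitted.
    """
    try:
        name, kwargs = parse_call(source)
    except (SyntaxError, ValueError):
        return False
    if name != expected_name:
        return False
    for param, values in allowed.items():
        if param not in kwargs:
            if param in optional:
                continue
            return False
        if kwargs[param] not in values:
            return False
    # Reject parameters that the expected answer does not know about
    return set(kwargs) <= set(allowed)


# Hypothetical test case: one expected call with acceptable values
allowed = {"city": ["Berkeley", "Berkeley, CA"], "unit": ["celsius", "C"]}
print(matches('get_weather(city="Berkeley", unit="C")', "get_weather", allowed))
print(matches('get_weather(city="Oakland", unit="C")', "get_weather", allowed))
```

Comparing parsed structure rather than raw strings means that differences in whitespace, quoting style, or argument order do not affect the verdict, which is the main appeal of an AST-style check over exact string matching.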

Leaderboard

Model                      Score
Llama 3.1 405B Instruct    88.5%
Llama 3.1 70B Instruct     84.8%
Llama 3.1 8B Instruct      76.1%
Nova 2 Sonic               74.5%
Qwen3 235B A22B            70.8%
Qwen3 32B                  70.3%
Qwen3 30B A3B              69.1%
Nova Pro                   68.4%
Nova Lite                  66.6%
QwQ-32B                    66.4%
Nova Micro                 56.2%