BFCL-V4

agents

Berkeley Function Calling Leaderboard V4 (BFCL-V4) evaluates how accurately LLMs call functions and APIs, covering simple, multiple, parallel, and nested function calls across diverse programming scenarios.

Leaderboard


Model              Score
Qwen3.7 Max        75.0%
Qwen3.5-397B-A17B  72.9%
Qwen3.5-122B-A10B  72.2%
Qwen3.5-27B        68.5%
Qwen3.5-35B-A3B    67.3%
Qwen3.5-9B         66.1%
Nova 2 Pro         61.6%
Nova 2 Lite        60.3%
Nova 2 Omni        58.3%
Qwen3.5-4B         50.3%
Qwen3.5-2B         43.6%
Qwen3.5-0.8B       25.3%