API-Bank

reasoning

API-Bank is a benchmark for tool-augmented LLMs that evaluates three capabilities: API planning, API retrieval, and API calling. It contains 314 tool-use dialogues with 753 API calls across 73 API tools, and is designed to assess how effectively LLMs use external tools and what obstacles they face in doing so.
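To make the idea of scoring API calls concrete, here is a minimal sketch of an exact-match comparison between predicted and reference calls for one dialogue. It is an illustration only and should not be read as API-Bank's official scoring procedure: the `ApiCall` type, the `make_call` and `call_accuracy` helpers, and the API names `QueryWeather` and `AddReminder` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """A single tool invocation: an API name plus its arguments (hypothetical type)."""
    name: str
    arguments: tuple  # sorted (key, value) pairs, so calls compare by value


def make_call(name: str, **kwargs) -> ApiCall:
    """Build a call with arguments in a canonical (sorted) order."""
    return ApiCall(name, tuple(sorted(kwargs.items())))


def call_accuracy(predicted: list[ApiCall], reference: list[ApiCall]) -> float:
    """Fraction of reference calls matched exactly, compared position by position."""
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical dialogue: the second predicted call has a wrong argument value.
reference = [
    make_call("QueryWeather", city="Berlin"),
    make_call("AddReminder", text="Bring umbrella", time="08:00"),
]
predicted = [
    make_call("QueryWeather", city="Berlin"),
    make_call("AddReminder", text="Bring umbrella", time="09:00"),
]
print(call_accuracy(predicted, reference))  # 0.5
```

Exact matching is the strictest choice: a call counts only if both the API name and every argument agree. A real evaluation might instead execute the calls and compare results, which this sketch does not attempt.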

Leaderboard

Llama 3.1 405B Instruct    92.0%
Llama 3.1 70B Instruct     90.0%
Llama 3.1 8B Instruct      82.6%