ResearchClawBench


ResearchClawBench evaluates research agents on realistic research tasks that require tool use, code execution, and interaction with a filesystem workspace.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, research, tool_calling. Language: en.

Leaderboard

  1. 16.9%
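
The metadata gives a maximum score of 1, while the leaderboard shows a percentage. A minimal sketch of how the two might relate, assuming the displayed figure is simply the raw score divided by the maximum score (the source does not confirm this). The helper name to_percent is hypothetical and not part of any llm-stats tooling.

```python
def to_percent(raw_score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score.

    Assumption: the leaderboard percentage is raw_score / max_score.
    This helper is illustrative only.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{100 * raw_score / max_score:.1f}%"


# Under this assumption, a raw score of 0.169 on the 0-to-1 scale
# would be displayed as the leaderboard entry above.
print(to_percent(0.169))
```

Under this assumption, the top entry of 16.9% corresponds to a raw score of about 0.169 out of 1.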