ResearchClawBench
ResearchClawBench evaluates research agents on realistic, tool-using research tasks that require executing code and interacting with a filesystem workspace.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, research, tool_calling. Language: en.