Claw-Eval

coding

Claw-Eval tests real-world agentic task completion across complex multi-step scenarios, evaluating a model's ability to use tools, navigate environments, and complete end-to-end tasks autonomously.

Leaderboard

Model             Score
Kimi K2.6         80.9%
GLM-5V-Turbo      75.0%
MiniMax M3        74.5%
Qwen3.7 Max       65.2%
MiMo-V2.5-Pro     64.0%
MiMo-V2.5         63.2%
MiMo-V2-Pro       61.5%
Qwen3.6-27B       60.6%
Qwen3.6 Plus      58.7%
MiMo-V2-Omni      54.8%
Qwen3.6-35B-A3B   50.0%