ExploitBench


ExploitBench is a cybersecurity benchmark that evaluates a model's ability to discover and exploit software vulnerabilities. Its headline metric, Cap%, is the fraction of challenges in which the model captures the target.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, code, safety. Language: en.
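As a rough illustration of how a Cap% score with a maximum of 1 can be computed, here is a minimal sketch. It assumes that each challenge yields a single binary outcome (target captured or not) and that all challenges are weighted equally. The names `ChallengeResult` and `cap_score` are hypothetical and do not come from ExploitBench itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChallengeResult:
    # Hypothetical record: one challenge and whether the model captured its target.
    challenge_id: str
    captured: bool


def cap_score(results: List[ChallengeResult]) -> float:
    """Return the fraction of challenges captured, a value in [0, 1].

    Assumes equal weighting and one binary outcome per challenge.
    """
    if not results:
        raise ValueError("need at least one challenge result")
    captured = sum(1 for r in results if r.captured)
    return captured / len(results)


results = [
    ChallengeResult("c1", True),
    ChallengeResult("c2", False),
    ChallengeResult("c3", True),
    ChallengeResult("c4", False),
]
print(cap_score(results))  # 0.5
```

Under these assumptions, a score of 1 means every challenge's target was captured, which matches the stated maximum score.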

Leaderboard