ExploitBench
coding agents, code, safety
ExploitBench is a cybersecurity benchmark that evaluates a model's ability to discover and exploit software vulnerabilities, reported as the fraction of challenges where the model captures the target (Cap%).
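As a rough illustration of how a capture fraction like Cap% can be computed, the sketch below scores a list of per-challenge outcomes. It is not ExploitBench's actual harness: the `ChallengeResult` type, the `capture_rate` function, the challenge identifiers, and the sample outcomes are all hypothetical and exist only to make the arithmetic concrete.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChallengeResult:
    """One attempted challenge (hypothetical record layout)."""
    challenge_id: str  # made-up identifier, not a real ExploitBench ID
    captured: bool     # True if the model captured the target


def capture_rate(results: Sequence[ChallengeResult]) -> float:
    """Return the fraction of challenges where the target was captured."""
    if not results:
        raise ValueError("no challenge results to score")
    # Booleans sum as 0/1, so this counts the captured challenges.
    return sum(r.captured for r in results) / len(results)


# Illustrative data only: three captures out of four attempts.
results = [
    ChallengeResult("chal-1", True),
    ChallengeResult("chal-2", False),
    ChallengeResult("chal-3", True),
    ChallengeResult("chal-4", True),
]

rate = capture_rate(results)
print(f"Cap% = {rate:.1%}")  # prints "Cap% = 75.0%"
```

Under this reading, a score of 1.0 means every challenge was captured and 0.0 means none were. The empty-input check is a design choice of the sketch, so that a run with no challenges fails loudly instead of reporting a misleading 0%.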
Methodology
Details are imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, code, safety. Language: English (en).