CyBench
coding official site →
CyBench is a suite of Capture-the-Flag (CTF) challenges measuring agentic cyber attack capabilities. It evaluates dual-use cybersecurity knowledge and measures the 'unguided success rate', where agents complete tasks end-to-end without guidance on appropriate subtasks.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, code, safety. Language: en. Verified by llm-stats: no.