CyberGym
coding
CyberGym is a benchmark for evaluating AI agents on cybersecurity tasks, testing their ability to identify vulnerabilities, perform security analysis, and complete security-related challenges in a controlled environment.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, code, safety. Language: en. Verified by llm-stats: no.