Cybersecurity CTFs
A Cybersecurity Capture the Flag (CTF) benchmark for evaluating LLMs on offensive security challenges. Tasks span cryptography, web exploitation, binary analysis, and forensics, and are used to assess AI capabilities in cybersecurity problem-solving.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: safety. Language: en. Verified by llm-stats: no.
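The metadata above gives a maximum score of 1 but does not say how the score is computed. As an illustration only, the sketch below assumes the score is the fraction of challenges whose submitted flag exactly matches the expected one. The `Challenge` type, the `flag{...}` format, and the `score` function are hypothetical names introduced here, not part of the benchmark or of llm-stats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """Hypothetical record for one CTF task (not the benchmark's real schema)."""
    name: str
    category: str  # e.g. "cryptography", "web", "binary", "forensics"
    flag: str      # expected flag string


def is_solved(challenge: Challenge, submission: str) -> bool:
    # Assumption: a task counts as solved only on an exact flag match,
    # ignoring surrounding whitespace in the model's answer.
    return submission.strip() == challenge.flag


def score(challenges: list[Challenge], submissions: dict[str, str]) -> float:
    """Return the fraction of solved challenges, a value between 0 and 1."""
    if not challenges:
        return 0.0
    solved = sum(
        1 for c in challenges if is_solved(c, submissions.get(c.name, ""))
    )
    return solved / len(challenges)


if __name__ == "__main__":
    tasks = [
        Challenge("rsa-warmup", "cryptography", "flag{example_one}"),
        Challenge("login-bypass", "web", "flag{example_two}"),
    ]
    answers = {"rsa-warmup": "flag{example_one}", "login-bypass": "unknown"}
    print(score(tasks, answers))
```

Under this assumed scheme, a model that solves every task reaches the maximum score of 1. Partial credit or per-category weighting would need a different aggregation.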