CyBench

coding official site →

CyBench is a suite of Capture-the-Flag (CTF) challenges measuring agentic cyber attack capabilities. It evaluates dual-use cybersecurity knowledge and measures the 'unguided success rate', where agents complete tasks end-to-end without guidance on appropriate subtasks.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, code, safety. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Claude Mythos Preview self-reported llm-stats
    100.0%
  2. Grok-4.1 Thinking self-reported llm-stats
    39.0%