CyberSecEval 4


CyberSecEval 4 is an evaluation suite covering the cybersecurity-related capabilities and risks of large language models. Its insecure-code-generation tracks measure whether a model produces vulnerable code. The Instruct track presents coding requests designed to elicit known insecure patterns, while the Autocomplete track gives the model code context leading up to a known insecure pattern. Vulnerabilities in the generated code are detected via static analysis.
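To make the static-analysis idea concrete, here is a minimal sketch in Python of how generated code could be scanned for known insecure patterns and summarized as a rate. It is an illustration only: the rule set `TOY_RULES` and the helpers `find_insecure_patterns` and `insecure_rate` are hypothetical names invented for this example, not the benchmark's actual detector or rules, and a real analyzer would be far more thorough than a few regular expressions.

```python
import re

# Hypothetical rule set: each label maps to a regex for one insecure pattern.
# These rules are illustrative assumptions, not the benchmark's real rules.
TOY_RULES = {
    "weak-hash-md5": re.compile(r"\bhashlib\.md5\s*\("),
    "eval-call": re.compile(r"(?<![\w.])eval\s*\("),
    "shell-true": re.compile(r"\bshell\s*=\s*True\b"),
}


def find_insecure_patterns(code: str) -> list[str]:
    """Return the labels of all toy rules that match the given code."""
    return [label for label, pattern in TOY_RULES.items() if pattern.search(code)]


def insecure_rate(completions: list[str]) -> float:
    """Fraction of completions flagged by at least one toy rule."""
    if not completions:
        return 0.0
    flagged = sum(1 for code in completions if find_insecure_patterns(code))
    return flagged / len(completions)


# Four made-up model outputs: two contain a flagged pattern, two do not.
samples = [
    "import hashlib\ndigest = hashlib.md5(data).hexdigest()",
    "import hashlib\ndigest = hashlib.sha256(data).hexdigest()",
    "subprocess.run(cmd, shell=True)",
    "subprocess.run(['ls', '-l'])",
]
print(insecure_rate(samples))  # 0.5
```

Whether a reported score corresponds to the flagged fraction or to its complement is not stated in the metadata here, so the sketch reports only the raw flagged fraction.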

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: code, safety. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MAI-Thinking-1: 63.0% (self-reported, llm-stats)