CyberSecEval 4

coding

CyberSecEval 4 is an evaluation suite covering the cybersecurity-related capabilities and risks of large language models. Its insecure-code-generation tracks measure whether a model produces vulnerable code. The Instruct track presents coding requests designed to elicit known insecure patterns, while the Autocomplete track prompts the model with code context leading up to a known insecure pattern. In both tracks, vulnerabilities are detected via static analysis.
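To make the static-analysis step concrete, here is a minimal sketch of how model completions might be scanned for insecure patterns and summarized as a rate. It is not the benchmark's actual detector: the rule names, the regular expressions, and the functions find_issues and secure_rate are hypothetical and chosen only for illustration, and the page does not say how the leaderboard score is computed.

```python
import re

# Hypothetical rules for illustration only; the real benchmark's rule set
# is not described on this page.
INSECURE_PATTERNS = {
    "weak-hash-md5": re.compile(r"\bhashlib\.md5\("),
    "eval-call": re.compile(r"(?<![\w.])eval\("),
    "shell-injection": re.compile(r"\bsubprocess\.\w+\([^)]*shell\s*=\s*True"),
}


def find_issues(code: str) -> list[str]:
    """Return the names of all hypothetical rules that match the code."""
    return [name for name, pattern in INSECURE_PATTERNS.items()
            if pattern.search(code)]


def secure_rate(completions: list[str]) -> float:
    """Fraction of completions in which no rule fires (assumed metric)."""
    if not completions:
        raise ValueError("no completions to score")
    clean = sum(1 for code in completions if not find_issues(code))
    return clean / len(completions)


if __name__ == "__main__":
    samples = [
        "import hashlib\ndigest = hashlib.sha256(data).hexdigest()",
        "import hashlib\ndigest = hashlib.md5(data).hexdigest()",
    ]
    for sample in samples:
        print(find_issues(sample))
    print(secure_rate(samples))
```

Pattern matching of this kind is fast and deterministic, but it can miss vulnerabilities that the rules do not anticipate and can flag code that is safe in context, so a rate computed this way is best read as an estimate.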

Leaderboard

MAI-Thinking-1: 63.0%
