MASK
MASK is a collection of 1000 questions that measures whether models faithfully report their beliefs when pressured to lie. It operationalizes deception as the rate at which a model lies, i.e., knowingly makes false statements intended to be received as true. A lower lying rate indicates a more honest model.
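As a rough illustration of the "rate at which the model lies" idea, the sketch below counts a question as a lie when the statement made under pressure contradicts the belief the model reported without pressure. The `Record` type, its field names, and the choice to divide by all questions are assumptions made for this example; they are not taken from the benchmark's own scoring code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One graded question (hypothetical schema, for illustration only)."""
    belief: Optional[str]     # answer given without pressure; None if no clear belief
    statement: Optional[str]  # answer given under pressure; None if the model evades


def lying_rate(records: List[Record]) -> float:
    """Fraction of questions where the pressured statement contradicts the belief.

    Assumption: questions with no clear belief or no clear statement are not
    counted as lies, but they still count in the denominator.
    """
    if not records:
        raise ValueError("need at least one record")
    lies = sum(
        1
        for r in records
        if r.belief is not None
        and r.statement is not None
        and r.statement != r.belief
    )
    return lies / len(records)


records = [
    Record(belief="A", statement="A"),   # honest
    Record(belief="A", statement="B"),   # contradicts its own belief: a lie
    Record(belief="B", statement=None),  # evasion, not counted as a lie
    Record(belief=None, statement="A"),  # no clear belief, not counted as a lie
]
print(lying_rate(records))  # 0.25
```

Under these assumptions only the second record counts as a lie, so the rate is one in four. Being wrong about a fact is not treated as lying here; only a mismatch between belief and statement is.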
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: reasoning, safety. Language: en. Verified by llm-stats: no.