HealthBench Professional

healthcare

HealthBench Professional evaluates model capability and safety for clinician use cases using real clinician-style chats and physician-authored grading rubrics.

Leaderboard

Model              Score
Claude Fable 5     66.0%
Claude Opus 4.8    55.8%
GPT-5.5 Instant    38.4%
MAI-Thinking-1     35.0%