Artifacts Bench

coding

Artifacts Bench evaluates a model's ability to generate visual code artifacts from natural-language requests, measuring the quality of the resulting interactive and visual front-end outputs.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: code, frontend_development. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MAI-Code-1-Flash (self-reported, llm-stats): 36.4%
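
The metadata gives a maximum score of 1, while the leaderboard reports a percentage. A minimal sketch of how the two scales might relate, assuming the raw score is a fraction of the maximum; the `BenchmarkEntry` class and the raw value 0.364 are hypothetical, the latter back-derived from the 36.4% shown above rather than taken from llm-stats:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkEntry:
    """Hypothetical record for one leaderboard row (not an llm-stats type)."""
    model: str
    score: float          # raw score on the benchmark's own scale
    max_score: float      # 1 for Artifacts Bench, per the metadata
    self_reported: bool

    def percent(self) -> float:
        """Return the score as a percentage of the maximum, to one decimal."""
        return round(100 * self.score / self.max_score, 1)


# Assumed raw value: 0.364 is inferred from the displayed 36.4%.
entry = BenchmarkEntry("MAI-Code-1-Flash", 0.364, 1.0, True)
print(f"{entry.model}: {entry.percent()}%")
```

Because the entry is self-reported and not verified by llm-stats, the percentage is best read as the submitter's figure rather than an independently reproduced result.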