AutoLogi
reasoning official site →
AutoLogi is an automated method for synthesizing open-ended logic puzzles to evaluate reasoning abilities of Large Language Models. The benchmark addresses limitations of existing multiple-choice reasoning evaluations by featuring program-based verification and controllable difficulty levels. It includes 1,575 English and 883 Chinese puzzles, enabling more reliable evaluation that better distinguishes models' reasoning capabilities across languages.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: reasoning. Language: en. Verified by llm-stats: no.