Llama 3.1 8B Instruct

Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It supports a 128K-token context length and is tuned for tool use and reasoning.

Benchmark results

Benchmark                    Score  Tags           Source
API-Bank                     82.6%  self-reported  llm-stats
ARC-C                        83.4%  self-reported  llm-stats
BFCL                         76.1%  self-reported  llm-stats
DROP                         59.5%  self-reported  llm-stats
Gorilla Benchmark API Bench  8.2%   self-reported  llm-stats
GPQA                         30.4%  self-reported  llm-stats
GSM-8K (CoT)                 84.5%  self-reported  llm-stats
HumanEval                    72.6%  self-reported  llm-stats
IFEval                       80.4%  self-reported  llm-stats
MATH (CoT)                   51.9%  self-reported  llm-stats
MBPP EvalPlus (base)         72.8%  self-reported  llm-stats
MMLU                         69.4%  self-reported  llm-stats
MMLU (CoT)                   73.0%  self-reported  llm-stats
MMLU-Pro                     48.3%  self-reported  llm-stats
Multilingual MGSM (CoT)      68.9%  self-reported  llm-stats
Multipl-E HumanEval          50.8%  self-reported  llm-stats
Multipl-E MBPP               52.4%  self-reported  llm-stats
Nexus                        38.5%  self-reported  llm-stats