Llama 3.1 405B Instruct

Llama 3.1 405B Instruct is a large language model optimized for multilingual dialogue. It outperforms many available open-source and closed chat models on common industry benchmarks. The model supports 8 languages and has a context length of 128K tokens.

Benchmark results

Benchmark                    Score   Tags           Source
API-Bank                     92.0%   self-reported  llm-stats
ARC-C                        96.9%   self-reported  llm-stats
BFCL                         88.5%   self-reported  llm-stats
DROP                         84.8%   self-reported  llm-stats
Gorilla Benchmark API Bench  35.3%   self-reported  llm-stats
GPQA                         50.7%   self-reported  llm-stats
GSM8k                        96.8%   self-reported  llm-stats
HumanEval                    89.0%   self-reported  llm-stats
IFEval                       88.6%   self-reported  llm-stats
MATH                         73.8%   self-reported  llm-stats
MBPP EvalPlus                88.6%   self-reported  llm-stats
MMLU                         87.3%   self-reported  llm-stats
MMLU (CoT)                   88.6%   self-reported  llm-stats
MMLU-Pro                     73.3%   self-reported  llm-stats
Multilingual MGSM (CoT)      91.6%   self-reported  llm-stats
Multipl-E HumanEval          75.2%   self-reported  llm-stats
Multipl-E MBPP               65.7%   self-reported  llm-stats
Nexus                        58.7%   self-reported  llm-stats