Llama 3.1 70B Instruct

Llama 3.1 70B Instruct is a large language model optimized for multilingual dialogue use cases. It outperforms many of the available open-source and closed chat models on common industry benchmarks.

Benchmark results

Benchmark                    Score   Tags           Source
API-Bank                     90.0%   self-reported  llm-stats
ARC-C                        94.8%   self-reported  llm-stats
BFCL                         84.8%   self-reported  llm-stats
DROP                         79.6%   self-reported  llm-stats
Gorilla Benchmark API Bench  29.7%   self-reported  llm-stats
GPQA                         41.7%   self-reported  llm-stats
GSM-8K (CoT)                 95.1%   self-reported  llm-stats
HumanEval                    80.5%   self-reported  llm-stats
IFEval                       87.5%   self-reported  llm-stats
MATH (CoT)                   68.0%   self-reported  llm-stats
MBPP ++ base version         86.0%   self-reported  llm-stats
MMLU                         83.6%   self-reported  llm-stats
MMLU (CoT)                   86.0%   self-reported  llm-stats
MMLU-Pro                     66.4%   self-reported  llm-stats
Multilingual MGSM (CoT)      86.9%   self-reported  llm-stats
Multipl-E HumanEval          65.5%   self-reported  llm-stats
Multipl-E MBPP               62.0%   self-reported  llm-stats
Nexus                        56.7%   self-reported  llm-stats