Llama 3.1 70B Instruct
Llama 3.1 70B Instruct is a large language model optimized for multilingual dialogue. According to the self-reported results below, it outperforms many available open-source and closed chat models on common industry benchmarks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| API-Bank | 90.0% | self-reported, llm-stats |
| ARC-C | 94.8% | self-reported, llm-stats |
| BFCL | 84.8% | self-reported, llm-stats |
| DROP | 79.6% | self-reported, llm-stats |
| Gorilla Benchmark API Bench | 29.7% | self-reported, llm-stats |
| GPQA | 41.7% | self-reported, llm-stats |
| GSM-8K (CoT) | 95.1% | self-reported, llm-stats |
| HumanEval | 80.5% | self-reported, llm-stats |
| IFEval | 87.5% | self-reported, llm-stats |
| MATH (CoT) | 68.0% | self-reported, llm-stats |
| MBPP ++ base version | 86.0% | self-reported, llm-stats |
| MMLU | 83.6% | self-reported, llm-stats |
| MMLU (CoT) | 86.0% | self-reported, llm-stats |
| MMLU-Pro | 66.4% | self-reported, llm-stats |
| Multilingual MGSM (CoT) | 86.9% | self-reported, llm-stats |
| Multipl-E HumanEval | 65.5% | self-reported, llm-stats |
| Multipl-E MBPP | 62.0% | self-reported, llm-stats |
| Nexus | 56.7% | self-reported, llm-stats |