Llama 3.1 405B Instruct
Llama 3.1 405B Instruct is a large language model optimized for multilingual dialogue. It outperforms many available open-source and closed chat models on common industry benchmarks. The model supports 8 languages and has a context length of 128K tokens.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| API-Bank | 92.0% | self-reported llm-stats |
| ARC-C | 96.9% | self-reported llm-stats |
| BFCL | 88.5% | self-reported llm-stats |
| DROP | 84.8% | self-reported llm-stats |
| Gorilla Benchmark API Bench | 35.3% | self-reported llm-stats |
| GPQA | 50.7% | self-reported llm-stats |
| GSM8k | 96.8% | self-reported llm-stats |
| HumanEval | 89.0% | self-reported llm-stats |
| IFEval | 88.6% | self-reported llm-stats |
| MATH | 73.8% | self-reported llm-stats |
| MBPP EvalPlus | 88.6% | self-reported llm-stats |
| MMLU | 87.3% | self-reported llm-stats |
| MMLU (CoT) | 88.6% | self-reported llm-stats |
| MMLU-Pro | 73.3% | self-reported llm-stats |
| Multilingual MGSM (CoT) | 91.6% | self-reported llm-stats |
| Multipl-E HumanEval | 75.2% | self-reported llm-stats |
| Multipl-E MBPP | 65.7% | self-reported llm-stats |
| Nexus | 58.7% | self-reported llm-stats |