Llama 3.1 8B Instruct
Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It features a 128K-token context length, state-of-the-art tool use, and strong reasoning capabilities.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| API-Bank | 82.6% | self-reported, llm-stats |
| ARC-C | 83.4% | self-reported, llm-stats |
| BFCL | 76.1% | self-reported, llm-stats |
| DROP | 59.5% | self-reported, llm-stats |
| Gorilla Benchmark API Bench | 8.2% | self-reported, llm-stats |
| GPQA | 30.4% | self-reported, llm-stats |
| GSM-8K (CoT) | 84.5% | self-reported, llm-stats |
| HumanEval | 72.6% | self-reported, llm-stats |
| IFEval | 80.4% | self-reported, llm-stats |
| MATH (CoT) | 51.9% | self-reported, llm-stats |
| MBPP EvalPlus (base) | 72.8% | self-reported, llm-stats |
| MMLU | 69.4% | self-reported, llm-stats |
| MMLU (CoT) | 73.0% | self-reported, llm-stats |
| MMLU-Pro | 48.3% | self-reported, llm-stats |
| Multilingual MGSM (CoT) | 68.9% | self-reported, llm-stats |
| Multipl-E HumanEval | 50.8% | self-reported, llm-stats |
| Multipl-E MBPP | 52.4% | self-reported, llm-stats |
| Nexus | 38.5% | self-reported, llm-stats |