Llama 3.3 70B Instruct
Llama 3.3 is a multilingual large language model optimized for dialogue. It is a pretrained and instruction-tuned generative model with 70 billion parameters, and it outperforms many open-source and closed chat models on common industry benchmarks. Llama 3.3 supports a context length of 128,000 tokens and is intended for commercial and research use.
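As a rough illustration of what the 128,000-token context length means in practice, the sketch below budgets a request against that limit. It assumes that the prompt and the generated reply share the same window, and that token counts come from the model's own tokenizer, which is not shown here. The names `CONTEXT_LIMIT`, `max_output_tokens`, and `fits_in_context` are hypothetical helpers, not part of any Llama API.

```python
# Context length stated in the description above, in tokens.
CONTEXT_LIMIT = 128_000


def max_output_tokens(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens remain for the reply once the prompt is counted.

    Assumption: prompt and reply share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero so an oversized prompt reports no remaining budget.
    return max(context_limit - prompt_tokens, 0)


def fits_in_context(
    prompt_tokens: int,
    reserved_output_tokens: int,
    context_limit: int = CONTEXT_LIMIT,
) -> bool:
    """Check whether a prompt plus a reserved reply budget fits in the window."""
    return prompt_tokens + reserved_output_tokens <= context_limit


if __name__ == "__main__":
    # Hypothetical token counts, chosen only for illustration.
    print(max_output_tokens(100_000))
    print(fits_in_context(120_000, 8_000))
```

A caller would typically run a check like this before sending a long document, and truncate or chunk the input when it does not fit.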
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| BFCL v2 | 77.3% | self-reported llm-stats |
| GPQA | 50.5% | self-reported llm-stats |
| HumanEval | 88.4% | self-reported llm-stats |
| IFEval | 92.1% | self-reported llm-stats |
| MATH | 77.0% | self-reported llm-stats |
| MBPP EvalPlus | 87.6% | self-reported llm-stats |
| MGSM | 91.1% | self-reported llm-stats |
| MMLU | 86.0% | self-reported llm-stats |
| MMLU-Pro | 68.9% | self-reported llm-stats |