Llama 3.3 70B Instruct

Llama 3.3 is a multilingual large language model optimized for dialogue. It is a pretrained and instruction-tuned generative model with 70 billion parameters that outperforms many open-source and closed chat models on common industry benchmarks. It supports a context length of 128,000 tokens and is intended for commercial and research use in multiple languages.
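
As a rough illustration of what the 128,000-token context length means in practice, the sketch below checks whether a request fits in the window. It assumes the window is shared between the prompt and the generated tokens, and that token counts come from the model's own tokenizer. The names `fits_context` and `max_completion_tokens` are hypothetical helpers, not part of any Llama API.

```python
# Context length taken from the model description; everything else is illustrative.
CONTEXT_WINDOW = 128_000  # tokens


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if prompt plus requested completion fits in the window.

    Assumption: prompt and generated tokens share one window.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_window


def max_completion_tokens(prompt_tokens: int,
                          context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for generation (never negative)."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, context_window - prompt_tokens)


if __name__ == "__main__":
    # Hypothetical request: a 120,000-token prompt asking for 8,000 new tokens.
    print(fits_context(120_000, 8_000))
    print(max_completion_tokens(120_000))
```

A serving stack may reserve part of the window or enforce a lower limit, so treat the constant as an upper bound rather than a guaranteed budget.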

Benchmark results

Benchmark        Score    Tags            Source
BFCL v2          77.3%    self-reported   llm-stats
GPQA             50.5%    self-reported   llm-stats
HumanEval        88.4%    self-reported   llm-stats
IFEval           92.1%    self-reported   llm-stats
MATH             77.0%    self-reported   llm-stats
MBPP EvalPlus    87.6%    self-reported   llm-stats
MGSM             91.1%    self-reported   llm-stats
MMLU             86.0%    self-reported   llm-stats
MMLU-Pro         68.9%    self-reported   llm-stats