Llama 3.2 90B Instruct

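Llama 3.2 90B Instruct is a large multimodal language model that accepts text and image input and is optimized for visual recognition, image reasoning, and captioning tasks. It supports a context length of 128,000 tokens. Within the Llama 3.2 family, the lightweight 1B and 3B text models are the ones aimed at edge and mobile devices; the 90B model is the largest vision variant and targets image understanding and generative tasks, with benchmark results listed below.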

AI2D: 92.3%
DocVQA: 90.1%
MGSM: 86.9%
MMLU: 86.0%
ChartQA: 85.5%
VQAv2: 78.1%
TextVQA: 73.5%
MATH: 68.0%
MMMU: 60.3%
MathVista: 57.3%
InfographicsQA: 56.8%
GPQA: 46.7%
MMMU-Pro: 45.2%