Gemma 4 12B

Gemma 4 12B is Google DeepMind's encoder-free multimodal instruction-tuned model with 11.95 billion parameters and a 256K context window. It supports text, image, audio, and video inputs with text output, projecting image patches and audio waveforms directly into a single decoder-only transformer for streamlined local deployment.
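As a rough illustration of what "projecting image patches directly into a decoder-only transformer" can look like, the sketch below splits an image into patches, maps each patch to the embedding width with a single linear projection, and concatenates the result with text token embeddings. It is not Gemma's actual implementation: the patch size, embedding width, random weights, and the `patchify` helper are all hypothetical and chosen only to keep the example small.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (patch_row, patch_col, y, x, C), then flatten each patch.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
d_model = 64   # hypothetical embedding width
patch = 16     # hypothetical patch size

image = rng.random((64, 64, 3))          # stand-in for a decoded RGB image
text_embeds = rng.random((5, d_model))   # stand-in for 5 embedded text tokens

# One linear projection replaces a separate vision encoder in this sketch.
w_proj = rng.normal(scale=0.02, size=(patch * patch * 3, d_model))
image_embeds = patchify(image, patch) @ w_proj

# Image and text embeddings share one sequence fed to the decoder.
sequence = np.concatenate([image_embeds, text_embeds], axis=0)
print(sequence.shape)  # (21, 64)
```

A 64x64 image with 16x16 patches yields 16 patch embeddings, so the combined sequence has 16 + 5 = 21 positions. Audio could be handled the same way in principle, by framing the waveform into fixed-length windows and projecting each window.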

FLEURS: 93.1%
MMMLU: 83.4%
MathVision: 79.7%
GPQA: 78.8%
AIME 2026: 77.5%
MMLU-Pro: 77.2%
LiveCodeBench v6: 72.0%
MMMU-Pro: 69.1%
CodeForces: 55.3%
BIG-Bench Extra Hard: 53.0%
MedXpertQA: 48.7%
MRCR v2: 43.4%
CoVoST2: 38.5%
OmniDocBench 1.5: 16.4%
Humanity's Last Exam: 5.2%