Gemma 3 12B

Gemma 3 12B is a 12-billion-parameter vision-language model from Google that accepts text and image input and generates text output. It has a 128K-token context window, multilingual support, and open weights.

Benchmark	Score
GSM8k	94.4%
IFEval	88.9%
DocVQA	87.1%
BIG-Bench Hard	85.7%
HumanEval	85.4%
AI2D	84.2%
MATH	83.8%
Natural2Code	80.7%
FACTS Grounding	75.8%
ChartQA	75.7%
MBPP	73.0%
VQAv2 (val)	71.6%
Global-MMLU-Lite	69.5%
TextVQA	67.7%
InfoVQA	64.9%
MathVista-Mini	62.9%
MMLU-Pro	60.6%
MMMU (val)	59.6%
HiddenMath	54.5%
WMT24++	51.6%
Bird-SQL (dev)	47.9%
GPQA	40.9%
LiveCodeBench	24.6%
BIG-Bench Extra Hard	16.3%
ECLeKTic	10.3%
SimpleQA	6.3%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
DeepInfra	available	$0.05/Mtok	$0.15/Mtok	131K tokens context, 16K tokens max output	100.0% 5m 100.0%	414 ms p50 TTFT, 34 tok/s p50	bf16
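
As a rough illustration of how the per-token prices and p50 speed figures in the DeepInfra row translate into a single request, the sketch below estimates cost and latency. It assumes that "Mtok" means one million tokens, that latency is approximately time to first token plus output tokens divided by throughput, and that the listed p50 values hold for the request. The function and constant names are hypothetical and are not part of any OpenRouter or DeepInfra API.

```python
# Figures copied from the DeepInfra row above; names are illustrative only.
INPUT_USD_PER_MTOK = 0.05    # USD per 1,000,000 input tokens
OUTPUT_USD_PER_MTOK = 0.15   # USD per 1,000,000 output tokens
TTFT_P50_S = 0.414           # p50 time to first token, in seconds
TOKENS_PER_S_P50 = 34.0      # p50 generation speed, in tokens per second


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the price of one request in USD from the listed rates."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


def estimate_latency_s(output_tokens: int) -> float:
    """Rough median latency: time to first token plus generation time."""
    return TTFT_P50_S + output_tokens / TOKENS_PER_S_P50


if __name__ == "__main__":
    # Example request: 10,000 input tokens and 1,000 output tokens.
    print(f"cost: ${estimate_cost_usd(10_000, 1_000):.5f}")
    print(f"latency: {estimate_latency_s(1_000):.1f} s")
```

Output tokens cost three times as much as input tokens at these rates, and at 34 tok/s the generation time dominates latency for all but very short completions. Real requests will vary around these medians.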