Gemma 3 4B

Gemma 3 4B is a 4-billion-parameter vision-language model from Google that accepts text and image input and generates text output. It has a 128K-token context window, multilingual support, and open weights.

Benchmark	Score
IFEval	90.2%
GSM8k	89.2%
DocVQA	75.8%
MATH	75.6%
AI2D	74.8%
BIG-Bench Hard	72.2%
HumanEval	71.3%
Natural2Code	70.3%
FACTS Grounding	70.1%
ChartQA	68.8%
MBPP	63.2%
VQAv2 (val)	62.4%
TextVQA	57.8%
Global-MMLU-Lite	54.5%
InfoVQA	50.0%
MathVista-Mini	50.0%
MMMU (val)	48.8%
WMT24++	46.8%
MMLU-Pro	43.6%
HiddenMath	43.0%
Bird-SQL (dev)	36.3%
GPQA	30.8%
LiveCodeBench	12.6%
BIG-Bench Extra Hard	11.0%
ECLeKTic	4.6%
SimpleQA	4.0%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input (per M tokens)	Output (per M tokens)	Context	Max output	Uptime	TTFT (p50)	Throughput (p50)	Notes
DeepInfra	available	$0.05	$0.10	131K tokens	16K tokens	100.0% (5m: 100.0%)	217 ms	16 tok/s	bf16
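The per-token rates and median speeds in the DeepInfra row support quick back-of-the-envelope estimates. The following is a minimal Python sketch, not a billing or benchmarking tool. It assumes that "131K" means 131,072 tokens and "16K" means 16,384 tokens, and that the prompt and the output share the context window. All function and constant names are hypothetical. Actual charges and latencies will vary with tokenization, load, and any later price changes.

```python
# Rates from the DeepInfra row (USD per million tokens).
INPUT_USD_PER_MTOK = 0.05
OUTPUT_USD_PER_MTOK = 0.10

# Median (p50) speed figures from the same row.
TTFT_P50_S = 0.217       # time to first token, in seconds
DECODE_TOK_PER_S = 16    # output throughput, in tokens per second

# Limits from the same row. ASSUMPTION: 131K = 131,072 and 16K = 16,384 tokens.
CONTEXT_TOKENS = 131_072
MAX_OUTPUT_TOKENS = 16_384


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def estimated_latency_s(output_tokens: int) -> float:
    """Rough end-to-end time: median TTFT plus output tokens at median decode speed."""
    return TTFT_P50_S + output_tokens / DECODE_TOK_PER_S


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """True if output respects the max-output cap and prompt + output fit the
    context window (ASSUMPTION: both share the window)."""
    return (output_tokens <= MAX_OUTPUT_TOKENS
            and input_tokens + output_tokens <= CONTEXT_TOKENS)


if __name__ == "__main__":
    # A 10,000-token prompt with a 1,000-token reply.
    print(f"cost: ${request_cost_usd(10_000, 1_000):.4f}")       # cost: $0.0006
    print(f"latency: {estimated_latency_s(1_000):.1f} s")        # latency: 62.7 s
    print(fits_limits(10_000, 1_000), fits_limits(130_000, 2_000))  # True False
```

At these rates the output side dominates latency far more than cost. A 1,000-token reply costs about a tenth of a cent or less, but at 16 tok/s it takes roughly a minute to stream.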