Qwen3 VL 30B A3B Thinking

Qwen3-VL is a family of large multimodal models that combines vision, language, and reasoning across text, images, and video, with the stated goal of human-level perception and cognition. This variant is built on a 30B-parameter mixture-of-experts architecture with about 3B parameters active per token (the "30B A3B" in its name), and it uses early joint training of visual and textual modalities for strong language grounding.

Benchmark	Score
DocVQA (test)	95.0%
ScreenSpot	94.7%
MMLU-Redux	90.9%
MMBench-V1.1	88.9%
MMLU	87.6%
AI2D	86.9%
CharXiv-D	86.9%
InfoVQA (test)	86.0%
WritingBench	85.2%
OCRBench	83.9%
AIME 2025	83.1%
MathVista-Mini	81.9%
IFEval	81.7%
MMLU-Pro	80.5%
MLVU-M	78.9%
CC-OCR	77.8%
MuirBench	77.6%
RealWorldQA	77.4%
MMLU-ProX	76.1%
MMMU (val)	76.0%
MMStar	75.5%
VideoMMMU	75.0%
Include	74.5%
GPQA	74.4%
Video-MME	73.3%
Multi-IF	73.0%
LiveBench 20241125	72.1%
MVBench	72.0%
BFCL-v3	68.6%
HMMT25	67.6%
Hallusion Bench	66.0%
MathVision	65.7%
BLINK	65.4%
LiveCodeBench v6	64.2%
MMMU-Pro	63.0%
CharadesSTA	62.7%
OCRBench-V2 (en)	62.6%
OCRBench-V2 (zh)	60.4%
LVBench	59.2%
ScreenSpot Pro	57.3%
Arena-Hard v2	56.7%
CharXiv-R	56.6%
SuperGPQA	56.4%
PolyMATH	51.7%
ERQA	45.3%
ODinW	42.3%
OSWorld	30.6%
SimpleQA	23.9%
MM-MT-Bench	7.9
Creative Writing v3	0.825

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Alibaba	available	$0.13/Mtok	$1.56/Mtok	131K tokens context, 33K tokens max output	100.0%	413 ms p50 TTFT, 120 tok/s p50	—
SiliconFlow	available	$0.29/Mtok	$1.00/Mtok	262K tokens context, 262K tokens max output	—	—	fp8
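
The per-Mtok prices above translate into a per-request cost by simple arithmetic. The following is a minimal sketch of that calculation, not an OpenRouter API call: the `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced for illustration, and the token counts in the example are an assumed workload, not measured data.

```python
# Prices in USD per million tokens, copied from the provider table above.
PRICES = {
    "Alibaba": {"input": 0.13, "output": 1.56},
    "SiliconFlow": {"input": 0.29, "output": 1.00},
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-Mtok rates."""
    rates = PRICES[provider]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000  # rates are quoted per million tokens


if __name__ == "__main__":
    # Assumed workload: 10,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.5f}")
```

At these listed rates, Alibaba is the cheaper option when output tokens are fewer than roughly 0.29 times the input tokens (0.16 / 0.56), and SiliconFlow is cheaper for more output-heavy requests.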