Qwen3 VL 8B Thinking

Qwen3-VL is a family of large multimodal models that unifies vision, language, and reasoning across text, images, and video, with the stated aim of human-level perception and cognition. The family's flagship is built on a 235B-parameter architecture; this listing covers the 8B Thinking variant. The models are trained jointly on visual and textual modalities from an early stage for strong language grounding.

Benchmark	Score
DocVQA (test)	95.3%
ScreenSpot	93.6%
MMLU-Redux	88.8%
MMBench-V1.1	87.5%
InfoVQA (test)	86.0%
CharXiv-D	85.9%
WritingBench	85.5%
MMLU	85.2%
AI2D	84.9%
IFEval	83.2%
OCRBench	81.9%
MathVista-Mini	81.4%
AIME 2025	80.3%
MMLU-Pro	77.3%
MuirBench	76.8%
CC-OCR	76.3%
MMStar	75.3%
MLVU-M	75.1%
Multi-IF	75.1%
MMMU (val)	74.1%
RealWorldQA	73.5%
VideoMMMU	72.8%
Video-MME	71.8%
MMLU-ProX	70.7%
GPQA	69.9%
LiveBench 20241125	69.8%
Include	69.5%
MVBench	69.0%
BLINK	68.7%
HallusionBench	65.4%
OCRBench-V2 (en)	63.9%
BFCL-v3	63.0%
MathVision	62.7%
HMMT25	60.6%
MMMU-Pro	60.4%
CharadesSTA	59.9%
OCRBench-V2 (zh)	59.2%
LiveCodeBench v6	58.6%
LVBench	55.8%
CharXiv-R	53.0%
SuperGPQA	51.2%
Arena-Hard v2	51.1%
SimpleQA	49.6%
PolyMATH	47.5%
ERQA	46.8%
ScreenSpot Pro	46.6%
ODinW	39.8%
OSWorld	33.9%
MM-MT-Bench	8
Creative Writing v3	0.824

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Alibaba	available	$0.12/Mtok	$1.36/Mtok	131K tokens context, 33K tokens max output	—	437 ms p50 TTFT, 141 tok/s p50
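
The per-token prices and p50 speed figures in the provider row can be turned into a rough per-request estimate. The sketch below is illustrative only: the function names and the example token counts are hypothetical, it assumes "Mtok" means one million tokens, and it assumes generation proceeds at the listed p50 throughput after the p50 time to first token. It also does not model how a Thinking model's reasoning tokens are billed, which this page does not state.

```python
# Figures copied from the Alibaba provider row; everything else is an assumption.
INPUT_USD_PER_MTOK = 0.12    # price per one million input tokens
OUTPUT_USD_PER_MTOK = 1.36   # price per one million output tokens
TTFT_P50_S = 0.437           # p50 time to first token, in seconds
TOKENS_PER_S_P50 = 141.0     # p50 generation throughput


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Approximate request cost in USD from token counts."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


def estimate_latency_s(output_tokens: int) -> float:
    """Approximate wall-clock time: first-token delay plus generation time."""
    return TTFT_P50_S + output_tokens / TOKENS_PER_S_P50


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens, 2,000 output tokens.
    print(f"cost:    ${estimate_cost_usd(10_000, 2_000):.5f}")
    print(f"latency: {estimate_latency_s(2_000):.1f} s")
```

Because output tokens cost more than ten times as much as input tokens here, the output length is likely to dominate both cost and latency for long generations.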