Qwen3.5-9B

Qwen3.5-9B is a 9-billion-parameter vision-language model. It is built on a hybrid architecture that interleaves Gated DeltaNet (linear attention) layers with full softmax attention layers at a 3:1 ratio. It supports a native context length of 262K tokens and performs strongly across knowledge, reasoning, coding, and multilingual benchmarks.
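The hybrid design trades some full-attention layers for linear-attention layers. Those layers keep a fixed-size state per head instead of a KV cache that grows with context length. Below is a minimal NumPy sketch of that idea under stated assumptions:

- The recurrence is the sequential (non-chunked) gated delta rule as published for Gated DeltaNet. Whether Qwen3.5-9B uses exactly this form, gating parametrization, or normalization is an assumption.
- The `layer_schedule` helper is hypothetical. So are the layer count and the placement of one full-attention layer after every three DeltaNet layers.

```python
import numpy as np


def layer_schedule(num_layers: int, ratio: int = 3) -> list[str]:
    """Hypothetical layout: every (ratio + 1)-th layer is full softmax
    attention; all other layers are Gated DeltaNet (linear attention)."""
    return ["full" if (i + 1) % (ratio + 1) == 0 else "deltanet"
            for i in range(num_layers)]


def gated_delta_rule(q, k, v, alpha, beta):
    """Sequential gated delta rule for a single head (illustrative sketch).

    q, k: (T, d_k); v: (T, d_v); alpha (decay gate), beta (write strength): (T,)
    State S has shape (d_v, d_k) regardless of sequence length T:
        S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        o_t = S_t q_t
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))          # constant-size memory, no KV cache
    out = np.zeros((T, d_v))
    eye = np.eye(d_k)
    for t in range(T):
        kt = k[t]
        # Decay the old state, erase what was stored under k_t, then write v_t.
        S = alpha[t] * S @ (eye - beta[t] * np.outer(kt, kt)) + beta[t] * np.outer(v[t], kt)
        out[t] = S @ q[t]             # read the state with the query
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_k, d_v = 16, 8, 8
    k = rng.standard_normal((T, d_k))
    k /= np.linalg.norm(k, axis=1, keepdims=True)  # unit keys keep (I - beta k k^T) non-expansive
    q = rng.standard_normal((T, d_k))
    v = rng.standard_normal((T, d_v))
    alpha = np.full(T, 0.95)
    beta = np.full(T, 0.5)
    print(layer_schedule(8))
    print(gated_delta_rule(q, k, v, alpha, beta).shape)  # (16, 8)
```

Set `alpha` and `beta` to 1 and use unit-norm keys. Then writing a second value under the same key replaces the first rather than adding to it. That "delta" behavior distinguishes this rule from the purely additive update of plain linear attention.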

Benchmark	Score
IFEval	91.5%
MMLU-Redux	91.1%
C-Eval	88.2%
MAXIFE	83.4%
Global PIQA	83.2%
HMMT 2025	83.2%
HMMT25	82.9%
MMLU-Pro	82.5%
GPQA	81.7%
MMMLU	81.2%
τ²-bench	79.1%
MMLU-ProX	76.3%
Include	75.6%
WMT24++	72.6%
BFCL-V4	66.1%
LiveCodeBench v6	65.6%
IFBench	64.5%
AA-LCR	63.0%
SuperGPQA	58.2%
PolyMATH	57.3%
NOVA-63	55.9%
LongBench v2	55.2%
Multi-Challenge	54.5%
VITA-Bench	29.8%
DeepPlanning	18.0%

Pricing, uptime, and speed data from OpenRouter (updated Jun 12, 2026, 04:59 AM).

Provider	Status	Input ($/Mtok)	Output ($/Mtok)	Context	Max output	Uptime	TTFT (p50)	Throughput (p50)	Precision
DeepInfra	available	0.10	0.15	262K tokens	82K tokens	not listed	581 ms	21 tok/s	bf16
SiliconFlow	available	0.10	0.15	262K tokens	262K tokens	99.9% 5m 100.0%	1,096 ms	33 tok/s	fp8
Venice	available	0.10	0.15	256K tokens	33K tokens	98% 5m 100.0%	889 ms	20 tok/s	fp8
Together	available	0.17	0.25	262K tokens	262K tokens	98% 5m 98%	441 ms	32 tok/s	not listed
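The per-Mtok prices and p50 speeds in the provider table are enough to estimate what one request costs and roughly how long it takes. The sketch below rests on these assumptions:

- It hardcodes the snapshot values, which will drift.
- Cost scales linearly with input and output token counts.
- End-to-end time is about TTFT plus output tokens divided by p50 throughput. Real TTFT also grows with prompt length.
- The names `Offer`, `OFFERS`, `request_cost`, and `rough_latency_s` are illustrative and not part of any OpenRouter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens
    ttft_ms: float          # p50 time to first token, milliseconds
    tok_per_s: float        # p50 decode throughput, tokens per second


# Snapshot copied from the provider table; values change over time.
OFFERS = [
    Offer("DeepInfra", 0.10, 0.15, 581, 21),
    Offer("SiliconFlow", 0.10, 0.15, 1096, 33),
    Offer("Venice", 0.10, 0.15, 889, 20),
    Offer("Together", 0.17, 0.25, 441, 32),
]


def request_cost(o: Offer, input_tokens: int, output_tokens: int) -> float:
    """Linear cost model: tokens times per-million-token price."""
    return (input_tokens * o.input_per_mtok + output_tokens * o.output_per_mtok) / 1_000_000


def rough_latency_s(o: Offer, output_tokens: int) -> float:
    """Crude median estimate: TTFT plus decoding at the p50 rate."""
    return o.ttft_ms / 1000 + output_tokens / o.tok_per_s


if __name__ == "__main__":
    for o in OFFERS:
        cost = request_cost(o, 100_000, 2_000)
        secs = rough_latency_s(o, 2_000)
        print(f"{o.provider:12s} ${cost:.4f}  ~{secs:.1f}s")
```

Consider a 100K-token prompt with a 2K-token reply. The three providers at $0.10/$0.15 come to about $0.0103 per request, and Together to about $0.0175. At these p50 rates, decoding 2K tokens takes roughly 60 to 100 seconds, so throughput rather than TTFT dominates long replies.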