Qwen3.5-9B
Qwen3.5-9B is a 9-billion-parameter vision-language model built on a hybrid Gated DeltaNet architecture with a 3:1 ratio of linear attention to full softmax attention. It supports a native context length of 262K tokens and performs strongly across knowledge, reasoning, coding, and multilingual tasks.
| Provider | Status | Input price | Output price | Limits | Uptime | Speed | Notes |
|---|---|---|---|---|---|---|---|
| DeepInfra | available | $0.10/Mtok | $0.15/Mtok | 262K tokens context | — | 581 ms p50 TTFT | bf16 |
| SiliconFlow | available | $0.10/Mtok | $0.15/Mtok | 262K tokens context | 99.9% | 1,096 ms p50 TTFT | fp8 |
| Venice | available | $0.10/Mtok | $0.15/Mtok | 256K tokens context | 98% | 889 ms p50 TTFT | fp8 |
| Together | available | $0.17/Mtok | $0.25/Mtok | 262K tokens context | 98% | 441 ms p50 TTFT | — |
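
As a rough illustration of how the per-token prices in the table translate into per-request cost, here is a minimal Python sketch. It assumes "Mtok" means one million tokens and that billing is a simple linear function of input and output tokens, with no caching discounts, minimum charges, or other provider-specific rules. The function name `request_cost` and the example workload (200,000 input tokens, 2,000 output tokens) are hypothetical and chosen only for illustration.

```python
# Prices copied from the table above, in USD per million tokens (assumed meaning of "Mtok").
PRICES_PER_MTOK = {
    "DeepInfra": (0.10, 0.15),
    "SiliconFlow": (0.10, 0.15),
    "Venice": (0.10, 0.15),
    "Together": (0.17, 0.25),
}


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Estimated cost in USD of one request, assuming purely linear per-token billing."""
    return (input_tokens / 1_000_000) * input_price \
        + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    # Hypothetical workload: a long-context prompt with a short answer.
    input_tokens, output_tokens = 200_000, 2_000
    for provider, (in_price, out_price) in PRICES_PER_MTOK.items():
        cost = request_cost(input_tokens, output_tokens, in_price, out_price)
        print(f"{provider}: ${cost:.4f}")
```

Under these assumptions, the example request costs about $0.0203 at the $0.10/$0.15 price point and about $0.0345 at $0.17/$0.25. Price is only one axis of the comparison: the table's TTFT, uptime, and precision columns may matter more for a given workload.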