Qwen3.6-35B-A3B
Qwen3.6-35B-A3B is the first open-weight variant of the Qwen3.6 series: a multimodal Mixture-of-Experts (MoE) model with 35B total parameters, of which 3B are activated per token. It pairs a vision encoder with a hybrid 40-layer language model that interleaves Gated DeltaNet linear-attention blocks with Gated Attention blocks in ten repeating groups (10 × (3 × DeltaNet + 1 × Attention)). Its MoE layers draw on 256 experts, with 8 routed experts plus 1 shared expert active per token and an expert dimension of 512. The release prioritizes stability and real-world utility, with substantial gains in agentic coding (frontend workflows, repo-level reasoning) and a new option to preserve reasoning context across turns. Native context length is 262K tokens, extensible to ~1M via YaRN, and thinking mode is enabled by default.
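The layer layout described above can be checked with a few lines of Python. This is only an illustrative sketch of the stated pattern, not the model's actual implementation; the block labels (`gated_deltanet`, `gated_attention`) and the function name are hypothetical names chosen for this example.

```python
# Illustrative sketch: rebuilds the layer ordering from the description,
# 10 x (3 x DeltaNet + 1 x Attention). All names here are hypothetical.
NUM_GROUPS = 10
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1

# Expert counts as stated in the description.
ROUTED_EXPERTS_PER_TOKEN = 8
SHARED_EXPERTS = 1


def build_layer_pattern():
    """Return the ordered list of block types for the language model."""
    pattern = []
    for _ in range(NUM_GROUPS):
        pattern.extend(["gated_deltanet"] * DELTANET_PER_GROUP)
        pattern.extend(["gated_attention"] * ATTENTION_PER_GROUP)
    return pattern


layers = build_layer_pattern()
active_experts = ROUTED_EXPERTS_PER_TOKEN + SHARED_EXPERTS

print(len(layers))                      # total number of layers
print(layers.count("gated_deltanet"))   # linear-attention blocks
print(layers.count("gated_attention"))  # full-attention blocks
print(layers[:4])                       # one repeating group
print(active_experts)                   # experts used per token
```

The counts work out to 30 DeltaNet blocks and 10 Attention blocks, which matches the 40-layer total, so every fourth layer is a Gated Attention block.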
| Provider | Status | Input price | Output price | Context limit | Uptime | Latency (p50 TTFT) | Notes |
|---|---|---|---|---|---|---|---|
| Ambient | available | $0.15/Mtok | $1.00/Mtok | 262K tokens context | 99% | 536 ms p50 TTFT | |
| Parasail | available | $0.15/Mtok | $1.00/Mtok | 262K tokens context | 99.5% | 699 ms p50 TTFT | fp8 |
| AtlasCloud | available | $0.16/Mtok | $0.97/Mtok | 262K tokens context | 99% | 903 ms p50 TTFT | fp8 |
| AkashML | available | $0.17/Mtok | $1.20/Mtok | 262K tokens context | 99.6% | 685 ms p50 TTFT | fp8 |
| WandB | available | $0.25/Mtok | $1.25/Mtok | 262K tokens context | 100.0% | 243 ms p50 TTFT | fp8 |
| SiliconFlow | -2 | $0.20/Mtok | $1.60/Mtok | 262K tokens context | 93% | 1,339 ms p50 TTFT | fp8 |
| Io Net | -2 | $0.27/Mtok | $1.50/Mtok | 262K tokens context | 82% | 557 ms p50 TTFT | fp8 |
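To compare providers for a given workload, the per-million-token prices in the table can be turned into a per-request cost. The sketch below is a rough estimate only: it copies the prices from the table, ignores status, uptime, and latency, and assumes billing is strictly linear in token counts. The names `PRICES`, `request_cost`, and `cheapest` are hypothetical helpers for this example.

```python
# (input $/Mtok, output $/Mtok), copied from the table above.
PRICES = {
    "Ambient": (0.15, 1.00),
    "Parasail": (0.15, 1.00),
    "AtlasCloud": (0.16, 0.97),
    "AkashML": (0.17, 1.20),
    "WandB": (0.25, 1.25),
    "SiliconFlow": (0.20, 1.60),
    "Io Net": (0.27, 1.50),
}

TOKENS_PER_MTOK = 1_000_000


def request_cost(provider, input_tokens, output_tokens):
    """Estimated cost in USD, assuming purely linear per-token billing."""
    input_price, output_price = PRICES[provider]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_MTOK


def cheapest(input_tokens, output_tokens):
    """Provider with the lowest estimated cost; ties go to the first listed."""
    return min(PRICES, key=lambda p: request_cost(p, input_tokens, output_tokens))


# Input-heavy request (long prompt, short answer).
print(cheapest(100_000, 1_000))
# Output-heavy request (short prompt, long reasoning trace).
print(cheapest(1_000, 10_000))
```

Because the model thinks by default, output tokens can dominate the bill, so the output price often matters more than the input price when picking a provider.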