Qwen3.5 4B via DeepInfra: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 4B (Reasoning)
Qwen3.5 4B is a compact, 4-billion-parameter open-weights model released in March 2026 as part of Alibaba Cloud’s Qwen3.5 Small Model Series. It employs an Efficient Hybrid Architecture that combines Gated Delta Networks (a form of linear attention) with a sparse Mixture-of-Experts design, delivering high-throughput inference with minimal latency overhead, a significant architectural […]
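Because Gated Delta Networks come up in every excerpt here, a small illustration may help. This is a minimal sketch of the gated delta rule recurrence as it is commonly written in the Gated DeltaNet literature, S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t. It covers one head with tiny, made-up dimensions and values. The helper names (`matvec`, `gated_delta_step`) are hypothetical. How Qwen3.5 actually computes α and β, normalizes keys, or batches the recurrence into chunked kernels is not described in the excerpt, so this code is only an assumption.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # recurrent state S, shape d_v x d_k


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a d_v x d_k matrix by a length-d_k vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def gated_delta_step(state: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """One recurrent step of the gated delta rule for a single head (sketch).

    S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T
        = alpha * S_{t-1} - alpha * beta * (S_{t-1} k) k^T + beta * v k^T
    o_t = S_t q
    alpha <= 1 decays old memory (1.0 means no decay); beta sets write strength.
    """
    sk = matvec(state, k)  # what the memory currently returns for key k
    new_state = [
        [
            alpha * state[i][j]
            - alpha * beta * sk[i] * k[j]
            + beta * v[i] * k[j]
            for j in range(len(k))
        ]
        for i in range(len(v))
    ]
    return new_state, matvec(new_state, q)


# Demo with made-up values: write two different values to the same unit key.
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
state, out = gated_delta_step(state, key, key, [0.5, -0.5], alpha=1.0, beta=1.0)
state, out = gated_delta_step(state, key, key, [2.0, 1.0], alpha=1.0, beta=1.0)
print(out)  # [2.0, 1.0]: the second write replaced the first
```

The state stays a fixed d_v x d_k matrix no matter how many tokens have been processed, so memory does not grow with context length the way a KV cache does. The delta term also matters. Plain linear attention would simply add both writes and return [2.5, 0.5] for the same key. The delta rule instead overwrites the old association.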

Qwen3.5 9B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 9B
Qwen3.5 9B is the flagship of Alibaba’s Qwen3.5 Small Model Series, released on March 2, 2026. It is a dense multimodal model combining Gated Delta Networks (a form of linear attention) with a sparse Mixture-of-Experts system, enabling higher throughput and lower latency during inference compared to traditional dense architectures. The architecture utilizes […]

Qwen3.5 27B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 27B (Reasoning)
Qwen3.5 27B is part of Alibaba Cloud’s latest-generation foundation model family, released in February 2026. Unlike the Mixture-of-Experts variants in the Qwen3.5 series, the 27B model uses a dense architecture combining Gated Delta Networks and Feed Forward Networks. It achieves strong benchmark scores, including MMLU-Pro (86.1%), GPQA Diamond (85.5%), and SWE-bench […]

Qwen3.5 35B A3B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 35B A3B
Qwen3.5 35B A3B is a native vision-language model released by Alibaba Cloud in February 2026. It uses a hybrid architecture that integrates Gated Delta Networks with a sparse Mixture-of-Experts design to improve inference efficiency. With 35 billion total parameters and only 3 billion activated per token through 256 experts (8 routed […]
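The excerpt says only about 3 billion of 35 billion parameters are activated per token, with 8 of 256 experts routed. The sentence is truncated, so details such as shared experts are not stated. Below is a minimal sketch of generic top-k routing: choose the 8 highest-scoring experts for a token, softmax-normalize their scores, and blend only those experts' outputs. The router logits and the toy experts are hypothetical stand-ins. In a real model the logits come from a learned projection of the token's hidden state. Whether Qwen3.5 normalizes before or after the top-k selection is also an assumption.

```python
import math
from typing import Callable, List, Tuple

NUM_EXPERTS = 256  # total experts, as stated in the excerpt
TOP_K = 8          # routed experts per token, as stated in the excerpt

Expert = Callable[[List[float]], List[float]]


def route_token(logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores.

    Generic top-k gating for illustration; not Qwen3.5's confirmed router.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: List[float], logits: List[float], experts: List[Expert],
                k: int = TOP_K) -> List[float]:
    """Run only the selected experts and blend their outputs by gate weight."""
    out = [0.0] * len(x)
    for idx, weight in route_token(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical experts: expert i simply scales its input by (i + 1).
experts: List[Expert] = [
    (lambda x, s=i + 1: [s * xi for xi in x]) for i in range(NUM_EXPERTS)
]
logits = [math.sin(i) for i in range(NUM_EXPERTS)]  # stand-in router scores
chosen = route_token(logits)
print([i for i, _ in chosen])  # only 8 of the 256 experts run for this token
print(moe_forward([1.0, -1.0], logits, experts))
```

All 256 experts' weights still have to be held in memory. The saving is in per-token compute, because 248 of the experts are skipped for each token.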

Qwen3.5 122B A10B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 122B A10B
Qwen3.5 122B A10B is Alibaba Cloud’s mid-tier multimodal foundation model, released in February 2026. It is a vision-language Mixture-of-Experts model that supports text, image, and video inputs and is designed for native multimodal agent applications. It features 122 billion total parameters with 10 billion activated per token through a hybrid architecture that integrates […]

Qwen3.5 397B A17B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra

About Qwen3.5 397B A17B
Qwen3.5 397B A17B is Alibaba Cloud’s largest and most capable multimodal foundation model, released in February 2026. It features a hybrid Mixture-of-Experts (MoE) architecture with 397 billion total parameters and 17 billion active parameters per inference pass, using 512 experts and a router that selects a subset of them for each token. This sparse […]
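The excerpts give headline total and active parameter counts for three of the MoE models. Dividing the two shows what share of the weights is used for each token. This is a rough proxy for per-token compute. It is not a latency or throughput measurement, and it ignores attention cost and any components that the rounded headline figures may or may not include. The dictionary and function names are illustrative.

```python
# (total parameters, active parameters per token), in billions,
# taken from the headline figures in the excerpts above
MOE_MODELS = {
    "Qwen3.5 35B A3B": (35, 3),
    "Qwen3.5 122B A10B": (122, 10),
    "Qwen3.5 397B A17B": (397, 17),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's weights used to process a single token."""
    if not 0 < active_b <= total_b:
        raise ValueError("need 0 < active <= total")
    return active_b / total_b


for name, (total, active) in MOE_MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of weights active per token")
```

This works out to roughly 8.6%, 8.2%, and 4.3% respectively. The largest model touches the smallest share of its weights per token, but it still needs memory for all 397 billion parameters.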