Qwen3.5 4B via DeepInfra: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 4B (Reasoning)
Qwen3.5 4B is a compact 4-billion parameter open-weights model released in March 2026 as part of Alibaba Cloud’s Qwen3.5 Small Model Series. It employs an Efficient Hybrid Architecture combining Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts, delivering high-throughput inference with minimal latency overhead, a significant architectural […]
Qwen3.5 9B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 9B
Qwen3.5 9B is the flagship of Alibaba’s Qwen3.5 Small Model Series, released on March 2, 2026. It is a dense multimodal model combining Gated Delta Networks (a form of linear attention) with a sparse Mixture-of-Experts system, enabling higher throughput and lower latency during inference compared to traditional dense architectures. The architecture utilizes […]
Qwen3.5 27B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 27B (Reasoning)
Qwen3.5 27B is part of Alibaba Cloud’s latest-generation foundation model family, released in February 2026. Unlike the Mixture-of-Experts variants in the Qwen3.5 series, the 27B model uses a dense architecture combining Gated Delta Networks and Feed Forward Networks. It achieves strong benchmark scores including MMLU-Pro (86.1%), GPQA Diamond (85.5%), and SWE-bench […]
Qwen3.5 35B A3B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 35B A3B
Qwen3.5 35B A3B is a native vision-language model released by Alibaba Cloud in February 2026. It uses a hybrid architecture that integrates Gated Delta Networks with a sparse Mixture-of-Experts model, achieving higher inference efficiency. With 35 billion total parameters and only 3 billion activated per token through 256 experts (8 routed […]
Qwen3.5 122B A10B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 122B A10B
Qwen3.5 122B A10B is Alibaba Cloud’s mid-tier multimodal foundation model, released in February 2026. It is a multimodal vision-language Mixture-of-Experts model supporting text, image, and video inputs, designed for native multimodal agent applications. It features 122 billion total parameters with 10 billion activated per token through a hybrid architecture that integrates […]
Qwen3.5 397B A17B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 397B A17B
Qwen3.5 397B A17B is Alibaba Cloud’s largest and most capable multimodal foundation model, released in February 2026. It features a hybrid Mixture-of-Experts (MoE) architecture with 397 billion total parameters and 17 billion active parameters per inference pass, utilizing 512 experts with a routing mechanism selecting a subset per token. This sparse […]
© 2026 DeepInfra. All rights reserved.