Today we're excited to share that DeepInfra has raised $18 million in Series A funding, led by Felicis and our earliest believer and advisor Georges Harik.
When we founded DeepInfra in 2022, we saw a clear gap: while enormous resources were being poured into training AI models, the infrastructure needed to run these models in production was lagging behind.
The past two years have been a whirlwind. We've scaled our processing volume by over 8,000x since our seed stage. What started as a bet on AI infrastructure has quickly become a critical service for developers deploying increasingly sophisticated models.
Our growth accelerated following the emergence of "thinking models" like DeepSeek. These open-source alternatives showed that the innovation cycle in AI was moving even faster than anticipated, and that the new models require significantly more computation during inference.
The reality of deploying modern AI models is challenging for most organizations. Running these models requires significant compute resources, specialized hardware like GPUs that are difficult to acquire, and deep expertise in infrastructure optimization. Most companies simply can't afford the investment or overcome the supply chain challenges to build this infrastructure themselves.
This challenge has shaped our approach from day one. After years of scaling systems to hundreds of millions of users before founding this company, we've developed a set of core principles that guide how we build DeepInfra:
These principles have guided us as we've expanded our computing capacity. We recently received a large shipment of NVIDIA Blackwell GPUs, with more on order to support our rapid growth, so this funding will be put to good use.
To our customers who have trusted us with their production workloads: thank you. We're just getting started as we continue building the infrastructure that powers the next generation of AI applications.
Follow us on X (formerly Twitter) and LinkedIn to stay updated on our journey. We look forward to sharing more exciting developments in the coming months.

© 2026 DeepInfra. All rights reserved.