NVIDIA Nemotron is a family of open models, datasets, and training recipes engineered for high performance, efficiency, and customization. Nemotron models support synthetic data workflows and supervised fine-tuning, and are optimized for real-time inference, reasoning agents, and production AI systems.
Nemotron 3 Nano Omni is an open multimodal model built on a hybrid Mixture-of-Experts (MoE) architecture, engineered for high efficiency and strong accuracy across image, video, audio, and text inputs. It powers always-on sub-agents for computer use, document intelligence, and audio-video understanding—replacing fragmented vision, speech, and language pipelines with a single unified inference pass.
Price per 1M input tokens: $0.20
Price per 1M output tokens: $0.80
Release Date: 04/28/2026
Context Size: 262,144 tokens
Quantization: bfloat16
# Assumes openai>=1.0.0
import os

from openai import OpenAI

# Create an OpenAI-compatible client pointed at the DeepInfra endpoint.
# The token is read from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the reply, then the prompt and completion token counts.
print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Sample output (actual text and token counts vary by model and run):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
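Since Nemotron 3 Nano Omni takes image input alongside text, a request can carry mixed content parts instead of a plain string. The sketch below only builds such a payload, using the OpenAI-style `image_url` content part with a base64 data URL. The helper name `build_image_message` is hypothetical, and it is an assumption that the DeepInfra endpoint accepts this content format for this model; check the model's API documentation before relying on it.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/png") -> list[dict]:
    """Build an OpenAI-style user message with one text part and one inline image.

    Hypothetical helper: whether the endpoint accepts `image_url` parts for
    this model is an assumption, not something stated on this page.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


# Placeholder bytes (just the PNG signature); real code would read an image file.
image_bytes = b"\x89PNG\r\n\x1a\n"
messages = build_image_message("Describe this screenshot.", image_bytes)
print(messages[0]["content"][0]["text"])
```

The resulting list could then be passed as `messages=` to the same `chat.completions.create` call used for the text-only request.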
NVIDIA Nemotron 3 Super is a hybrid Mixture-of-Experts (MoE) model engineered for high compute efficiency and accuracy in multi-agent applications and specialized agentic systems. It is optimized to run many collaborating agents per application on a single GPU, delivering high accuracy for reasoning, tool use, and instruction following.
Price per 1M input tokens: $0.10
Price per 1M output tokens: $0.50
Release Date: 03/10/2026
Context Size: 262,144 tokens
Quantization: bfloat16
# Assumes openai>=1.0.0
import os

from openai import OpenAI

# Create an OpenAI-compatible client pointed at the DeepInfra endpoint.
# The token is read from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the reply, then the prompt and completion token counts.
print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Sample output (actual text and token counts vary by model and run):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
NVIDIA Nemotron 3 Nano is an open small reasoning model optimized for fast, cost-efficient inference in agentic and production workloads. Built with a hybrid Mixture-of-Experts (MoE) and Mamba-Transformer architecture, it delivers strong multi-step reasoning, high token throughput, stable latency with predictable cost, and efficient deployment for agent-based systems. Designed for real-world AI systems where reasoning can generate significantly more tokens per prompt, Nemotron 3 Nano reduces compute cost while maintaining strong reasoning quality.
Price per 1M input tokens: $0.05
Price per 1M output tokens: $0.20
Release Date: 12/15/2025
Context Size: 262,144 tokens
Quantization: fp4
# Assumes openai>=1.0.0
import os

from openai import OpenAI

# Create an OpenAI-compatible client pointed at the DeepInfra endpoint.
# The token is read from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B-A3B",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the reply, then the prompt and completion token counts.
print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Sample output (actual text and token counts vary by model and run):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
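Because reasoning can produce many more output tokens per prompt, it helps to see how the token counts returned in `usage` translate into cost. The following is a minimal sketch using the Nemotron 3 Nano list prices from this page; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and actual billing may differ.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in USD from token counts and per-1M-token prices."""
    total = prompt_tokens * input_price_per_m + completion_tokens * output_price_per_m
    return total / 1_000_000


# Nemotron 3 Nano list prices from this page (USD per 1M tokens).
NANO_INPUT_PRICE = 0.05
NANO_OUTPUT_PRICE = 0.20

# Hypothetical token counts: a short direct answer vs. a long reasoning trace.
short_answer = estimate_cost_usd(1_000, 200, NANO_INPUT_PRICE, NANO_OUTPUT_PRICE)
long_reasoning = estimate_cost_usd(1_000, 4_000, NANO_INPUT_PRICE, NANO_OUTPUT_PRICE)

print(f"{short_answer:.6f} {long_reasoning:.6f}")
# 0.000090 0.000850
```

In practice the two token arguments would come from `chat_completion.usage.prompt_tokens` and `chat_completion.usage.completion_tokens`.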
The Nemotron family spans Omni, Nano, Super, and specialized instruct variants, enabling you to balance accuracy, reasoning depth, latency, and cost for your specific workload.
| Model | Context | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| Nemotron-3-Nano-Omni-30B-A3B-Reasoning | 256k | $0.20 | $0.80 |
| NVIDIA-Nemotron-3-Super-120B-A12B | 256k | $0.10 | $0.50 |
| Nemotron-3-Nano-30B-A3B | 256k | $0.05 | $0.20 |
| Llama-3.3-Nemotron-Super-49B-v1.5 | 128k | $0.10 | $0.40 |
| NVIDIA-Nemotron-Nano-9B-v2 | 128k | $0.04 | $0.16 |
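One way to use the table when balancing cost against context length is to filter by the context a workload needs and then rank the remaining models by estimated cost. The sketch below hard-codes the rows above; `ModelInfo` and `cheapest_model` are hypothetical names, and treating "128k" as 131,072 tokens is an assumption (the page only states 262,144 for the 256k models). It considers price and context alone, not accuracy or modality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    context_tokens: int
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


# Rows copied from the table above; 131_072 for "128k" is an assumption.
MODELS = [
    ModelInfo("Nemotron-3-Nano-Omni-30B-A3B-Reasoning", 262_144, 0.20, 0.80),
    ModelInfo("NVIDIA-Nemotron-3-Super-120B-A12B", 262_144, 0.10, 0.50),
    ModelInfo("Nemotron-3-Nano-30B-A3B", 262_144, 0.05, 0.20),
    ModelInfo("Llama-3.3-Nemotron-Super-49B-v1.5", 131_072, 0.10, 0.40),
    ModelInfo("NVIDIA-Nemotron-Nano-9B-v2", 131_072, 0.04, 0.16),
]


def cheapest_model(required_context: int, prompt_tokens: int,
                   completion_tokens: int) -> Optional[ModelInfo]:
    """Return the lowest-cost model whose context window fits, or None."""
    def cost(m: ModelInfo) -> float:
        return (prompt_tokens * m.input_price
                + completion_tokens * m.output_price) / 1_000_000

    candidates = [m for m in MODELS if m.context_tokens >= required_context]
    return min(candidates, key=cost) if candidates else None


# A workload needing a 200k-token window rules out the 128k models.
choice = cheapest_model(required_context=200_000,
                        prompt_tokens=150_000, completion_tokens=2_000)
print(choice.name if choice else "no model fits")
# Nemotron-3-Nano-30B-A3B
```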