The Nemotron family is a group of large language models developed by NVIDIA, with a particular focus on generating high-quality synthetic data for training other AI models. Unlike models aimed solely at end-user chat or content creation, Nemotron models are strong at producing diverse, realistic text-based training examples, such as question-answer pairs, instructions, and multi-turn conversations. These examples are the raw material for the supervised fine-tuning (SFT) stage of model development. Used this way, Nemotron acts as a "force multiplier" in the training pipeline: developers can build more capable specialized models efficiently and at scale without relying solely on scarce, human-curated data.
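As a rough illustration of the synthetic-data workflow described above, the sketch below builds a generation prompt and turns a model reply into JSON Lines records for supervised fine-tuning. It assumes the model is asked to answer with one JSON object per line holding `question` and `answer` keys; the prompt wording, the key names, and the `build_prompt`, `parse_pairs`, and `to_jsonl` helpers are hypothetical and are not part of any NVIDIA or DeepInfra API. Malformed lines are skipped rather than trusted, because generated output is not guaranteed to be valid JSON.

```python
import json


def build_prompt(topic: str, n: int) -> str:
    """Hypothetical prompt asking the model for n Q&A pairs as JSON lines."""
    return (
        f"Write {n} diverse question-answer pairs about {topic}. "
        'Output one JSON object per line with keys "question" and "answer", '
        "and nothing else."
    )


def parse_pairs(reply: str) -> list[dict]:
    """Hypothetical parser: turn a model reply into chat-style SFT records.

    Lines that are not JSON objects with non-empty string "question" and
    "answer" fields are skipped.
    """
    records = []
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # generated text is not guaranteed to be valid JSON
        if not isinstance(obj, dict):
            continue
        q, a = obj.get("question"), obj.get("answer")
        if isinstance(q, str) and isinstance(a, str) and q.strip() and a.strip():
            # One training example: a user turn followed by the assistant turn
            records.append({"messages": [
                {"role": "user", "content": q.strip()},
                {"role": "assistant", "content": a.strip()},
            ]})
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    sample_reply = (
        '{"question": "What is 2 + 2?", "answer": "4"}\n'
        "not json\n"
        '{"question": "What is the capital of France?", "answer": "Paris"}\n'
    )
    print(to_jsonl(parse_pairs(sample_reply)))
```

In a real pipeline, `reply` would be the text returned in `choices[0].message.content` by a chat completion request like the ones shown below, sent with `build_prompt(...)` as the user message.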
NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built on a hybrid Mixture-of-Experts (MoE) and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.
| Price per 1M input tokens | Price per 1M output tokens | Release date | Context size (tokens) | Quantization |
|---|---|---|---|---|
| $0.06 | $0.24 | 12/15/2025 | 262,144 | bfloat16 |
```python
# Assumes openai>=1.0.0 and a DEEPINFRA_TOKEN environment variable
import os

from openai import OpenAI

# Create an OpenAI client pointed at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B-A3B",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Example output (text and token counts will vary):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
```
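For interactive use, the same endpoint can be called with `stream=True`, which in `openai>=1.0.0` returns an iterator of chunks whose new text arrives in `choices[0].delta.content`. The sketch below assumes DeepInfra honors streaming for this model (worth confirming in DeepInfra's documentation). The `stream_reply` helper is hypothetical; it takes the client as a parameter so it can be exercised without a network connection.

```python
import sys


def stream_reply(client, prompt: str, model: str = "nvidia/Nemotron-3-Nano-30B-A3B") -> str:
    """Hypothetical helper: stream a chat completion, echo text as it arrives,
    and return the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks (for example a final usage chunk) may carry no choices
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or empty deltas carry no text
            sys.stdout.write(delta)
            sys.stdout.flush()
            parts.append(delta)
    sys.stdout.write("\n")
    return "".join(parts)
```

Pass it the client object created in the example above, e.g. `stream_reply(client, "Hello")`, to print the reply incrementally and get the complete text back.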
NVIDIA Nemotron Nano 12B v2 VL is an auto-regressive vision-language model built on an optimized transformer architecture. It supports multi-image reasoning and video understanding, along with strong document intelligence, visual question answering (Q&A), and summarization capabilities.
| Price per 1M input tokens | Price per 1M output tokens | Release date | Context size (tokens) | Quantization |
|---|---|---|---|---|
| $0.20 | $0.60 | 10/28/2025 | 131,072 | fp8 |
```python
# Assumes openai>=1.0.0 and a DEEPINFRA_TOKEN environment variable
import os

from openai import OpenAI

# Create an OpenAI client pointed at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Example output (text and token counts will vary):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
```
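Because this is a vision-language model, requests will usually carry images as well as text. The sketch below assumes DeepInfra accepts the OpenAI chat format for image input, where a user message's `content` is a list of `text` and `image_url` parts and local images are sent as base64 `data:` URLs; verify that against DeepInfra's documentation before relying on it. The `image_message` helper is hypothetical and only builds the message; several images in one message correspond to the multi-image reasoning described above.

```python
import base64
import mimetypes
from pathlib import Path


def image_message(prompt: str, *image_paths: str) -> dict:
    """Hypothetical helper: one user message with a text prompt plus images,
    in the OpenAI-style list-of-parts format (assumed to be supported)."""
    content = [{"type": "text", "text": prompt}]
    for path in image_paths:
        # Guess the MIME type from the file extension; the JPEG fallback is an assumption
        mime = mimetypes.guess_type(path)[0] or "image/jpeg"
        data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{data}"},
        })
    return {"role": "user", "content": content}
```

The result would be passed as `messages=[image_message("What changed between these two charts?", "q1.png", "q2.png")]` in the `chat.completions.create` call above, with `model="nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL"`; the file names here are placeholders.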
NVIDIA Nemotron is a family of open models customized for efficiency, accuracy, and specialized workloads.
| Model | Context (tokens) | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| Nemotron-3-Nano-30B-A3B | 256k | $0.06 | $0.24 |
| NVIDIA-Nemotron-Nano-12B-v2-VL | 128k | $0.20 | $0.60 |
| Llama-3.1-Nemotron-70B-Instruct | 128k | $1.20 | $1.20 |
| Llama-3.3-Nemotron-Super-49B-v1.5 | 128k | $0.10 | $0.40 |
| NVIDIA-Nemotron-Nano-9B-v2 | 128k | $0.04 | $0.16 |
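The per-token prices above translate directly into request cost: input tokens × input price / 1,000,000 plus output tokens × output price / 1,000,000. The sketch below copies the table's prices into code and, for a given workload, lists the models whose context window fits it, cheapest first. It treats "256k" and "128k" as 262,144 and 131,072 tokens, matching the sizes listed for the first two models; applying 131,072 to the other 128k models is an assumption, as is approximating the context requirement as prompt plus completion tokens. Prices may change, so check current pricing before relying on these numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    context_tokens: int
    usd_per_m_input: float
    usd_per_m_output: float


# Copied from the table above; 128k/256k assumed to mean 131,072/262,144 tokens
MODELS = [
    ModelPrice("Nemotron-3-Nano-30B-A3B", 262_144, 0.06, 0.24),
    ModelPrice("NVIDIA-Nemotron-Nano-12B-v2-VL", 131_072, 0.20, 0.60),
    ModelPrice("Llama-3.1-Nemotron-70B-Instruct", 131_072, 1.20, 1.20),
    ModelPrice("Llama-3.3-Nemotron-Super-49B-v1.5", 131_072, 0.10, 0.40),
    ModelPrice("NVIDIA-Nemotron-Nano-9B-v2", 131_072, 0.04, 0.16),
]


def request_cost(m: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at per-1M-token prices."""
    return (input_tokens * m.usd_per_m_input + output_tokens * m.usd_per_m_output) / 1_000_000


def compare(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Models whose context fits prompt + completion, sorted cheapest first."""
    fits = [m for m in MODELS if input_tokens + output_tokens <= m.context_tokens]
    costs = [(m.name, request_cost(m, input_tokens, output_tokens)) for m in fits]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, usd in compare(200_000, 4_000):
        print(f"{name}: ${usd:.6f}")
```

For a 200,000-token prompt with a 4,000-token completion, only Nemotron-3-Nano-30B-A3B fits, and the script prints `Nemotron-3-Nano-30B-A3B: $0.012960` (12,000 + 960 micro-dollars). For short requests that fit every model, NVIDIA-Nemotron-Nano-9B-v2 comes out cheapest.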
© 2025 Deep Infra. All rights reserved.