

Nemotron Model Family

The Nemotron family is a group of large language models developed by NVIDIA, engineered to excel at generating high-quality synthetic data for training other AI models. Unlike models focused solely on end-user chat or content creation, Nemotron's core strength lies in producing diverse, realistic text-based training examples (question-answer pairs, instructions, and conversations) that are crucial for the supervised fine-tuning stage of AI development. By providing a robust toolkit for creating these datasets, Nemotron acts as a force multiplier in the AI training pipeline, enabling developers to build more capable, refined specialized models efficiently and at scale, without depending entirely on scarce, human-curated data.
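As a concrete illustration of the synthetic-data use case, the sketch below builds an OpenAI-style chat payload that asks a Nemotron model for question-answer pairs suitable for a fine-tuning dataset. The helper name and prompt wording are illustrative, not an official API; the payload shape is the standard OpenAI chat-completions format used throughout this page.

```python
import json

# Hypothetical helper: assembles a chat-completions payload asking the model
# for synthetic QA pairs in a fixed JSON shape (prompt wording is illustrative).
def synthetic_qa_payload(topic: str, n_pairs: int = 3) -> dict:
    prompt = (
        f"Generate {n_pairs} diverse question-answer pairs about {topic}. "
        'Respond only with a JSON list of {"question": ..., "answer": ...} objects.'
    )
    return {
        "model": "nvidia/Llama-3.3-Nemotron-Super-49B-v1.5",
        "messages": [{"role": "user", "content": prompt}],
    }

payload = synthetic_qa_payload("Python decorators")
print(json.dumps(payload, indent=2))
```

Sending this payload through any OpenAI-compatible client (as in the examples below) yields model-generated training examples you can filter and add to a supervised fine-tuning set.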

Featured Model: nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements.
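Since tool-calling is one of the advertised tasks, here is a minimal sketch of what a tool-enabled request looks like in the OpenAI-style format. The `get_weather` tool and its schema are hypothetical placeholders; whether a given deployment honors the `tools` field should be checked against the provider's documentation.

```python
import json

# Illustrative OpenAI-style tool definition (the function name and schema
# are made up for this example).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body a tool-calling chat completion would send.
request_body = {
    "model": "nvidia/Llama-3.3-Nemotron-Super-49B-v1.5",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [weather_tool],
}
print(json.dumps(request_body, indent=2))
```

If the model decides to call the tool, the response's `message.tool_calls` carries the function name and JSON arguments for your application to execute.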

Price per 1M input tokens: $0.10
Price per 1M output tokens: $0.40
Release Date: 09/09/2025
Context Size: 131,072 tokens
Quantization: fp8


# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your deepinfra token and endpoint
client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",  # replace with your DeepInfra API key
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/Llama-3.3-Nemotron-Super-49B-v1.5",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25

Featured Model: nvidia/NVIDIA-Nemotron-Nano-9B-v2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It answers queries by first generating a reasoning trace and then concluding with a final response. Reasoning can be toggled via the system prompt: if you prefer the final answer without an intermediate reasoning trace, the model can be configured to skip it.
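A sketch of the system-prompt toggle described above. The model card reportedly uses "/think" and "/no_think" control strings in the system message; treat the exact tokens as an assumption and verify against the current model documentation before relying on them.

```python
# Hypothetical helper: builds a message list that toggles the reasoning trace
# via the system prompt (control tokens assumed from the model card).
def nano_messages(user_msg: str, reasoning: bool = True) -> list:
    system = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_msg},
    ]

# Usage with the OpenAI-compatible client shown below:
# chat = client.chat.completions.create(
#     model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
#     messages=nano_messages("Hello", reasoning=False),
# )
print(nano_messages("Hello", reasoning=False))
```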

Price per 1M input tokens: $0.04
Price per 1M output tokens: $0.16
Release Date: 09/09/2025
Context Size: 131,072 tokens
Quantization: bfloat16


# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your deepinfra token and endpoint
client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",  # replace with your DeepInfra API key
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25

Available Nemotron Models

NVIDIA Nemotron is a family of open models customized for efficiency, accuracy, and specialized workloads.

Model                               Context    $ per 1M input tokens    $ per 1M output tokens
Llama-3.1-Nemotron-70B-Instruct     128k       $0.60                    $0.60
Llama-3.3-Nemotron-Super-49B-v1.5   128k       $0.10                    $0.40
NVIDIA-Nemotron-Nano-9B-v2          128k       $0.04                    $0.16

FAQ

How do I integrate Nemotron models into my application?

You can integrate Nemotron models seamlessly using DeepInfra’s OpenAI-compatible API. Just replace your existing base URL with DeepInfra’s endpoint and use your DeepInfra API key—no infrastructure setup required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs, making it easy to switch or scale your workloads instantly.
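Under the hood, the integration is a single HTTP call. The standard-library sketch below shows what any OpenAI-compatible client sends to DeepInfra; the chat-completions path follows the OpenAI convention appended to the base URL shown in the examples above.

```python
import json
import urllib.request

# Build the raw HTTP request an OpenAI-compatible client would send.
def deepinfra_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    return urllib.request.Request(
        "https://api.deepinfra.com/v1/openai/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = deepinfra_request(
    "$DEEPINFRA_TOKEN",  # replace with your DeepInfra API key
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    [{"role": "user", "content": "Hello"}],
)
# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any SDK that lets you override the base URL and API key (openai, litellm, and others) is doing exactly this on your behalf.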

What are the pricing details for using Nemotron models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.04 and $0.60 per million
  • Output Tokens: between $0.16 and $0.60 per million
Prices vary slightly by model. There are no upfront fees, and you only pay for what you use.
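The per-token arithmetic is straightforward; a small helper makes the cost of a workload easy to estimate from the rates in the table above.

```python
# Estimate the dollar cost of a request given per-1M-token rates.
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

# 500k input + 250k output tokens on Nemotron-Nano-9B-v2 ($0.04 in / $0.16 out):
print(f"${cost_usd(500_000, 250_000, 0.04, 0.16):.2f}")  # -> $0.06
```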

How do I get started using Nemotron on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Nemotron models—without any infrastructure setup.