


nvidia/Nemotron-3-Nano-30B-A3B

$0.05 in / $0.20 out per 1M tokens

Tier              Input     Output
Priority (1.5×)   $0.075    $0.30

Prices are per 1M tokens.
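As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the tier label `"standard"` for the base price are hypothetical names chosen for this example; only the four prices come from the listing, and actual billing may round or meter differently.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the listing above.
# The tier key "standard" is an assumed label for the base price.
PRICES = {
    "standard": {"input": Decimal("0.05"), "output": Decimal("0.20")},
    "priority": {"input": Decimal("0.075"), "output": Decimal("0.30")},
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    rates = PRICES[tier]
    total = Decimal(input_tokens) * rates["input"] + Decimal(output_tokens) * rates["output"]
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # A reasoning-heavy request: short prompt, long generated answer.
    print(estimate_cost(2_000, 8_000))              # base price
    print(estimate_cost(2_000, 8_000, "priority"))  # Priority tier (1.5x)
```

Because output tokens cost four times as much as input tokens at either tier, long reasoning traces dominate the bill in this sketch, which matches the description's point that reasoning models generate more tokens per prompt.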

NVIDIA Nemotron 3 Nano is an open small reasoning model optimized for fast, cost-efficient inference in agentic and production workloads. Built with a hybrid Mixture-of-Experts (MoE) and Mamba-Transformer architecture, it delivers strong multi-step reasoning, high token throughput, stable latency with predictable cost, and efficient deployment for agent-based systems. Designed for real-world AI systems where reasoning can generate significantly more tokens per prompt, Nemotron 3 Nano reduces compute cost while maintaining strong reasoning quality.

Supports Priority Tier
Public
fp4
262,144
Function
ProjectNemotron

s9hzRI4c

2025-12-15T03:01:09+00:00