nvidia/
$0.05 in / $0.20 out per 1M tokens
| Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Priority (1.5×) | $0.075 | $0.30 |
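The pricing above reduces to simple arithmetic: tokens times the per-million rate, scaled by 1.5 for the Priority tier. The following is a minimal sketch of that calculation, assuming billing is strictly linear in token count with no minimums or rounding rules (the page does not state these). The function name `request_cost` and the constants are illustrative, not part of any DeepInfra API.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens, taken from the pricing above.
BASE_INPUT = Decimal("0.05")
BASE_OUTPUT = Decimal("0.20")
PRIORITY_MULTIPLIER = Decimal("1.5")  # Priority tier is listed as 1.5x
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int, priority: bool = False) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cost is linear in token count; actual billing may differ.
    """
    base = (
        Decimal(input_tokens) * BASE_INPUT
        + Decimal(output_tokens) * BASE_OUTPUT
    ) / TOKENS_PER_UNIT
    multiplier = PRIORITY_MULTIPLIER if priority else Decimal(1)
    return base * multiplier


if __name__ == "__main__":
    # A reasoning-heavy request: short prompt, long generated answer.
    print(request_cost(2_000, 8_000))                 # standard tier
    print(request_cost(2_000, 8_000, priority=True))  # Priority tier
```

Because reasoning models tend to produce many more output tokens than input tokens, the output rate usually dominates the estimate. `Decimal` is used here to avoid floating-point drift when summing many small charges.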
NVIDIA Nemotron 3 Nano is an open small reasoning model optimized for fast, cost-efficient inference in agentic and production workloads. Built on a hybrid architecture that combines Mixture-of-Experts (MoE) with Mamba-Transformer layers, it delivers strong multi-step reasoning, high token throughput, stable latency with predictable cost, and efficient deployment for agent-based systems. It is designed for real-world AI systems in which reasoning can generate significantly more tokens per prompt: Nemotron 3 Nano reduces compute cost while maintaining strong reasoning quality.

© 2026 DeepInfra. All rights reserved.