
DeepSeek V4 Pro is a 1.6-trillion parameter Mixture-of-Experts (MoE) model from DeepSeek, released on April 24, 2026 under the MIT license. It is designed for advanced reasoning, complex software engineering, and long-running agentic tasks, and arrives alongside DeepSeek-V4-Flash, a lighter 284B-parameter variant built for faster, lower-cost inference. The V4 series is DeepSeek’s first two-tier lineup and introduces a new architecture — the first from the lab since V3. Both models are hybrid thinking/non-thinking and support a 1 million token context window.
The V4 series incorporates several technical advances over DeepSeek-V3.2.
The V4-Pro-Base model shows consistent improvements over V3.2 across standard academic benchmarks:
| Benchmark (Metric) | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|
| MMLU (EM) | 87.8 | 88.7 | 90.1 |
| MMLU-Pro (EM) | 65.5 | 68.3 | 73.5 |
| GSM8K (8-shot) | 91.1 | 90.8 | 92.6 |
| HumanEval (Pass@1) | 62.8 | 69.5 | 76.8 |
In its maximum reasoning effort mode (V4-Pro-Max), the model competes directly with leading closed-source systems:
| Benchmark (Metric) | DeepSeek-V4-Pro Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | Opus-4.6 Max |
|---|---|---|---|---|
| LiveCodeBench (Pass@1) | 93.5 | — | 91.7 | 88.8 |
| GPQA Diamond (Pass@1) | 90.1 | 93.0 | 94.3 | 91.3 |
| SWE-bench Verified (Resolved) | 80.6 | — | 80.6 | 80.8 |
DeepSeek-V4-Pro is available for immediate integration via the DeepInfra platform under the model identifier deepseek-ai/DeepSeek-V4-Pro. Access the model at deepinfra.com/deepseek-ai/DeepSeek-V4-Pro.
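As a sketch, a request to the model can be assembled against DeepInfra's OpenAI-compatible endpoint. The endpoint path and model identifier come from the text above; the prompt and API key shown are placeholders.

```python
import json

# DeepInfra exposes an OpenAI-compatible chat completions endpoint.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"
MODEL_ID = "deepseek-ai/DeepSeek-V4-Pro"

def build_chat_request(prompt: str, api_key: str) -> tuple[dict, bytes]:
    """Return (headers, body) for a chat completion POST to DeepInfra."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return headers, body

# Send with any HTTP client, e.g.:
#   urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
headers, body = build_chat_request("Summarize this diff.", api_key="YOUR_KEY")
```

The same payload works with the official OpenAI SDK by pointing its `base_url` at `https://api.deepinfra.com/v1/openai`.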
Reasoning Modes
A key feature of DeepSeek V4 is configurable reasoning depth. Developers can select the level of thinking effort per request, trading latency for analytical depth:
| Reasoning Mode | Characteristics | Typical Use Cases |
|---|---|---|
| Non-think | Fast, intuitive, low-latency | Routine tasks, simple chat, low-risk decisions |
| Think High | Logical analysis, moderate latency | Complex problem-solving, planning, coding |
| Think Max | Maximum reasoning depth | Hard agentic tasks, boundary-pushing logic |
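A per-request mode selection policy might look like the sketch below. The mode names mirror the table above, but the request field name (`reasoning_effort`) and the routing heuristic are illustrative assumptions, not documented API parameters.

```python
# Hypothetical helper for escalating reasoning effort with task difficulty.
MODES = {"non-think", "high", "max"}

def pick_mode(task: str, high_stakes: bool = False) -> str:
    """Heuristic routing: cheap mode for routine work, max for hard cases."""
    if high_stakes:
        return "max"        # hard agentic tasks, boundary-pushing logic
    if task in {"chat", "routing", "summarize"}:
        return "non-think"  # routine, latency-sensitive work
    return "high"           # planning, coding, complex problem-solving

def with_reasoning(payload: dict, mode: str) -> dict:
    """Attach the (assumed) reasoning-effort field to a request payload."""
    assert mode in MODES, f"unknown mode: {mode}"
    return {**payload, "reasoning_effort": mode}
```

Routing cheap traffic to Non-think and reserving Think Max for genuinely hard requests keeps both latency and token spend under control.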
Response Format
The model’s output structure changes based on the selected mode: in the thinking modes, internal chain-of-thought reasoning is encapsulated in <think> tags.
JSON output is supported across all modes. The thinking and summary content are embedded within the standard JSON response body.
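Client code typically wants to separate the chain-of-thought from the final answer. A minimal parser for the <think>-tag convention described above:

```python
import re

# Match <think>...</think> blocks, including multi-line reasoning.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_response(text: str) -> tuple[str, str]:
    """Return (reasoning, answer): reasoning is the concatenated content
    of all <think> blocks, answer is the text with those blocks removed."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

In Non-think mode there are no <think> blocks, so `split_response` simply returns an empty reasoning string and the answer unchanged.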
DeepSeek V4 Pro is available on DeepInfra with usage-based pricing calculated per million tokens:
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $1.74 |
| Output Tokens | $3.48 |
| Cached Input Tokens | $0.145 |
A note on cost in practice: Think Max mode is token-intensive. On the Artificial Analysis Intelligence Index, V4 Pro (Max) used approximately 190M output tokens — far above the median of 47M for comparable open-weights models — bringing the total benchmark run cost to $1,071. That is still more than 4x cheaper than running the same benchmark on Claude Opus 4.7 ($4,811). For general output token pricing, the gap is larger: at $3.48/1M output tokens versus $25/1M for Claude Opus 4.7, V4 Pro is approximately 7x cheaper on output. For applications where Think Max mode generates long responses, monitoring output token usage is important.
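The pricing table above translates directly into a per-request cost estimate. The sketch below assumes cached tokens are billed at the cached rate in place of the full input rate:

```python
# USD per 1M tokens, from the DeepInfra pricing table above.
PRICES = {"input": 1.74, "output": 3.48, "cached_input": 0.145}

def request_cost(input_toks: int, output_toks: int, cached_toks: int = 0) -> float:
    """Estimate cost in USD for one request; cached input tokens are
    billed at the cached rate instead of the full input rate."""
    billable_input = input_toks - cached_toks
    return (
        billable_input * PRICES["input"]
        + output_toks * PRICES["output"]
        + cached_toks * PRICES["cached_input"]
    ) / 1_000_000
```

At these rates, the ~190M output tokens cited for the benchmark run account for roughly $660 of output cost on their own, which is why tracking output token usage matters in Think Max mode.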