DeepSeek Model Family

DeepSeek develops advanced foundation models optimized for computational efficiency and strong generalization across diverse tasks. The architecture incorporates recent advances in transformer-based systems, delivering robust performance in both zero-shot and fine-tuned scenarios. Models are pretrained on rigorously filtered multilingual corpora with specialized optimizations for mathematical reasoning and algorithmic tasks. The inference stack achieves competitive throughput while maintaining low latency, making it suitable for production deployment. Researchers and engineers can leverage these models for tasks ranging from natural language processing to complex analytical problem-solving.

Featured Model: deepseek-ai/DeepSeek-V3.2

DeepSeek-V3.2 is a large language model designed to combine high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance comparable to GPT-5; the high-compute variant, DeepSeek-V3.2-Speciale, is reported to have reached gold-medal level at the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to integrate reasoning into tool-use settings, improving instruction compliance and generalization in interactive environments.
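To make the idea of fine-grained sparse attention concrete, here is a toy single-query sketch in plain Python: the query attends only to the k keys with the highest relevance scores instead of to every key. This is not DeepSeek's DSA implementation. `index_scores` is a hypothetical caller-supplied stand-in for the scoring step DSA uses to pick tokens, and causal masking, multiple heads, and batching are omitted.

```python
import heapq
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    index_scores: Sequence[float],
    k: int,
) -> List[float]:
    """Single-query attention restricted to the k keys with the highest index_scores."""
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")
    # Hypothetical token selection: keep the k positions with the largest scores.
    selected = heapq.nlargest(k, range(len(keys)), key=lambda i: index_scores[i])
    # Scaled dot-product logits, computed only for the selected keys.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += w * v
    return out
```

With k fixed, the softmax and weighted sum cost O(k) per query instead of O(n); with k equal to the number of keys, the function reduces to ordinary dense attention for that query. In a real system the savings also depend on how cheaply the relevance scores themselves can be computed.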

Price per 1M input tokens: $0.26
Price per 1M cached input tokens: $0.13
Price per 1M output tokens: $0.38
Release Date: 12/02/2025
Context Size: 163,840 tokens
Quantization: fp4

License Type

License
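Because cached input tokens are listed at half the standard input rate, the effective input price depends on how much of each prompt is served from cache. A minimal worked example, assuming cached tokens are billed at the cached rate instead of (not in addition to) the standard rate, with h the fraction of input tokens served from cache:

```latex
\begin{aligned}
p_{\text{eff}}(h) &= (1 - h)\,p_{\text{in}} + h\,p_{\text{cached}} = (1 - h)\cdot \$0.26 + h\cdot \$0.13 \\
p_{\text{eff}}(0.5) &= 0.5 \cdot \$0.26 + 0.5 \cdot \$0.13 = \$0.195 \text{ per 1M input tokens}
\end{aligned}
```

At a 50% cache hit rate the input cost falls by 25% (from $0.26 to $0.195 per 1M tokens); output pricing is unaffected.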

# Assume openai>=1.0.0
import os

from openai import OpenAI

# Create an OpenAI client with your DeepInfra token and endpoint
# (the token is read from the DEEPINFRA_TOKEN environment variable)
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Example output (reply text and token counts will vary):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
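The request above waits for the whole reply before printing. For interactive use, the openai SDK's `chat.completions.create` also accepts `stream=True` and returns an iterator of chunks. The sketch below assumes DeepInfra's OpenAI-compatible endpoint honors this parameter; the helper names `collect_stream` and `stream_reply` are our own, not part of any SDK.

```python
import os
from typing import Iterable, List


def collect_stream(chunks: Iterable) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts: List[str] = []
    for chunk in chunks:
        # Skip chunks without choices (for example, a trailing usage-only chunk).
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # delta.content can be None, e.g. on a role-only or final chunk.
        if content:
            print(content, end="", flush=True)
            parts.append(content)
    print()
    return "".join(parts)


def stream_reply(prompt: str, model: str = "deepseek-ai/DeepSeek-V3.2") -> str:
    """Send one streaming chat request to DeepInfra (requires network access)."""
    # Imported here so collect_stream can be reused without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPINFRA_TOKEN"],
        base_url="https://api.deepinfra.com/v1/openai",
    )
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Calling `stream_reply("Hello")` prints the reply incrementally and returns the assembled text. Keeping the chunk handling in a separate function makes it easy to test without a network connection.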

Featured Model: deepseek-ai/DeepSeek-V3.1-Terminus

DeepSeek-V3.1-Terminus is an update to DeepSeek-V3.1 that keeps the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, and further improves its performance as a coding and search agent. It is a large hybrid reasoning model (671B total parameters, 37B active per token) that supports both thinking and non-thinking modes, and it extends the DeepSeek-V3 base with a two-phase long-context training process. Reasoning behaviour can be toggled with a boolean "reasoning enabled" setting; see the DeepInfra docs for details. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
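For a rough sense of what "37B active" means, the fraction of the 671B parameters used for each token is:

```latex
\frac{37\,\text{B}}{671\,\text{B}} \approx 0.055 \quad (\text{about } 5.5\%)
```

This ratio only describes parameter count; actual speed also depends on attention cost, memory bandwidth, and batching, so treat it as an illustration rather than a throughput estimate.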

Price per 1M input tokens: $0.27
Price per 1M cached input tokens: $0.13
Price per 1M output tokens: $0.95
Release Date: 09/22/2025
Context Size: 163,840 tokens
Quantization: fp4

# Assume openai>=1.0.0
import os

from openai import OpenAI

# Create an OpenAI client with your DeepInfra token and endpoint
# (the token is read from the DEEPINFRA_TOKEN environment variable)
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1-Terminus",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Example output (reply text and token counts will vary):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25

Available DeepSeek Models

DeepSeek's models are a suite of advanced AI systems that prioritize efficiency, scalability, and real-world applicability.

Model	Context (tokens)	Input $ per 1M tokens (standard / cached)	Output $ per 1M tokens
DeepSeek-V4-Pro	1024k	$1.30 / $0.10	$2.60
DeepSeek-V4-Flash	1024k	$0.09 / $0.018	$0.18
DeepSeek-V3.2	160k	$0.26 / $0.13	$0.38
DeepSeek-V3.1-Terminus	160k	$0.27 / $0.13	$0.95
DeepSeek-V3.1	160k	$0.25 / $0.13	$0.95
DeepSeek-V3-0324	160k	$0.24 / $0.135	$0.90
DeepSeek-V3	160k	$0.32	$0.89
DeepSeek-R1-0528	160k	$0.50 / $0.35	$2.15
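To compare models for a specific workload, the per-token prices in the table can be combined into a total cost estimate. This is a minimal sketch: the prices are copied from the table, `workload_cost` and `rank_by_cost` are hypothetical helpers, cached tokens are assumed to be billed at the cached rate instead of the standard rate, and DeepSeek-V3 (which lists no cached price) is assumed to bill cached tokens at its standard input rate.

```python
import math  # used by the usage check below

# USD per 1M tokens: (input, cached input, output), copied from the table above.
# Assumption: DeepSeek-V3 lists no cached price, so its cached rate equals its input rate.
PRICES = {
    "DeepSeek-V4-Pro": (1.30, 0.10, 2.60),
    "DeepSeek-V4-Flash": (0.09, 0.018, 0.18),
    "DeepSeek-V3.2": (0.26, 0.13, 0.38),
    "DeepSeek-V3.1-Terminus": (0.27, 0.13, 0.95),
    "DeepSeek-V3.1": (0.25, 0.13, 0.95),
    "DeepSeek-V3-0324": (0.24, 0.135, 0.90),
    "DeepSeek-V3": (0.32, 0.32, 0.89),
    "DeepSeek-R1-0528": (0.50, 0.35, 2.15),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float = 0.0) -> float:
    """Estimate USD cost; cache_hit_rate is the fraction of input tokens billed at the cached rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    input_price, cached_price, output_price = PRICES[model]
    cached_tokens = input_tokens * cache_hit_rate
    uncached_tokens = input_tokens - cached_tokens
    return (uncached_tokens * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int,
                 cache_hit_rate: float = 0.0) -> list:
    """Return model names sorted from cheapest to most expensive for the workload."""
    return sorted(PRICES, key=lambda m: workload_cost(m, input_tokens, output_tokens, cache_hit_rate))


# Example workload: 10M input tokens (30% cache hits) and 2M output tokens.
for name in rank_by_cost(10_000_000, 2_000_000, cache_hit_rate=0.3):
    cost = workload_cost(name, 10_000_000, 2_000_000, cache_hit_rate=0.3)
    assert math.isfinite(cost)
    print(f"{name:<24} ${cost:.2f}")
```

The ranking only reflects price; it ignores context length, latency, and output quality, so the cheapest model is not necessarily the right one for a given task.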

FAQ

What is DeepSeek?

DeepSeek is a family of high-performance, open-source language models developed by DeepSeek AI. These models, including DeepSeek-R1 and DeepSeek-V3, are optimized for reasoning, coding, and general language tasks. DeepInfra hosts them on scalable, low-latency inference infrastructure behind OpenAI-compatible APIs, so you can use them immediately without managing your own GPUs.

How do DeepSeek models compare to OpenAI or Claude models?

DeepSeek-R1 reports performance comparable to OpenAI-o1 on math, reasoning, and coding benchmarks. DeepSeek-V3, a 671B-parameter Mixture-of-Experts (MoE) model with 37B parameters active per token, is competitive with leading closed-source LLMs while its weights remain openly available. DeepInfra provides low-latency access with predictable per-token pricing that is often lower than that of comparable closed-source APIs.

Are the DeepSeek models open source?

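Yes, the model weights are openly released along with technical reports. Most DeepSeek releases, including DeepSeek-R1 and DeepSeek-V3-0324, use the MIT License; the original DeepSeek-V3 weights are covered by the DeepSeek Model License, which also permits commercial use. Check each model's license before deploying it.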

How do I integrate DeepSeek models into my application?

You can integrate DeepSeek models through DeepInfra's OpenAI-compatible API: point your existing client's base URL at DeepInfra's endpoint (https://api.deepinfra.com/v1/openai) and use your DeepInfra API key, with no infrastructure setup required. Because the API follows the OpenAI format, libraries such as openai and litellm, as well as other OpenAI-compatible SDKs, work with it, so switching existing workloads takes minimal code changes.
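Since the API is OpenAI-compatible, you do not strictly need an SDK. This sketch uses only the Python standard library and assumes the endpoint accepts the conventional OpenAI `/chat/completions` path appended to the base URL, with Bearer-token authentication; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "deepseek-ai/DeepSeek-V3.2") -> str:
    """Send the request and return the assistant's reply text (requires network access)."""
    request = build_chat_request(model, prompt, os.environ["DEEPINFRA_TOKEN"])
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Swapping providers here means changing only `BASE_URL`, the token, and the model name, which is the same switch the openai client makes through its `base_url` and `api_key` arguments.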