deepseek-ai/DeepSeek-V4-Flash

Pricing: $0.14 in / $0.28 out / $0.028 cached, per 1M tokens

DeepSeek V4 Flash is an efficiency-focused MoE model with 284B total parameters (13B active) and a 1M-token context window. It's tuned for fast inference and high-throughput use cases while still holding up on reasoning and coding tasks.
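At the listed rates, the cost of a call can be estimated directly; a small helper using the per-1M-token prices above:

```python
def request_cost_usd(in_tokens: int, out_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one DeepSeek-V4-Flash call at the listed per-1M-token rates."""
    PER_M_IN, PER_M_OUT, PER_M_CACHED = 0.14, 0.28, 0.028  # USD per 1M tokens
    return (in_tokens * PER_M_IN
            + out_tokens * PER_M_OUT
            + cached_tokens * PER_M_CACHED) / 1_000_000

# A call with 100k input tokens and 10k output tokens costs about $0.0168.
print(request_cost_usd(100_000, 10_000))
```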

Public deployment (private endpoints available) · fp4 · 1,048,576-token context · JSON output · function calling

Model Information

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Technical Report

Introduction

We present a preview of the DeepSeek-V4 series, comprising two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T total parameters (49B activated) and DeepSeek-V4-Flash with 284B total parameters (13B activated) — both supporting a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

  1. Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. At the 1M-token context length, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
  2. Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
  3. Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
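The CSA/HCA cache layout itself is not detailed here, but the kind of saving involved in point 1 can be illustrated with a generic per-token KV-cache calculation. All dimensions below are illustrative assumptions, not DeepSeek-V4 configuration values:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, per_token_dim: int, bytes_per_elem: float) -> int:
    """Total KV-cache size: one per_token_dim vector cached per token per layer."""
    return int(seq_len * n_layers * per_token_dim * bytes_per_elem)

# Full multi-head attention caches K and V for every head:
# 2 (K+V) x 128 heads x 128 dims, 8-bit, over 60 layers (hypothetical shapes).
mha = kv_cache_bytes(1_000_000, 60, 2 * 128 * 128, 1)

# A compressed/latent cache keeps one small vector per token instead:
latent = kv_cache_bytes(1_000_000, 60, 512, 1)

print(latent / mha)  # 0.015625 -> a 64x smaller cache at 1M tokens
```

The ratio depends only on the per-token cached dimension, which is why compressed-attention designs help most at very long contexts.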

We pre-train both models on more than 32T diverse, high-quality tokens, followed by a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: domain-specific expert models are first cultivated independently (through SFT and RL with GRPO), then consolidated into a single unified model via on-policy distillation, integrating distinct proficiencies across diverse domains.
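GRPO's core step — normalizing each sampled response's reward against its own group, rather than against a learned value function — can be sketched as follows. This is a minimal sketch of the standard group-relative advantage, not DeepSeek's training code:

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantage: reward minus group mean, scaled by group std."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards)
    if sd == 0:  # identical rewards carry no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sd for r in rewards]

# Four sampled answers to one prompt, rewarded 1 if correct:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

These advantages then weight the policy-gradient update for each sampled response.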

DeepSeek-V4-Pro-Max, the maximum-reasoning-effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, establishing itself as the strongest open-source model available today. It achieves top-tier performance on coding benchmarks and substantially narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves reasoning performance comparable to the Pro version when given a larger thinking budget, though its smaller parameter scale places it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads

| Model | # Total Params | # Activated Params | Context Length | Precision | Download |
|---|---|---|---|---|---|
| DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | HuggingFace \| ModelScope |
| DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | HuggingFace \| ModelScope |
| DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | HuggingFace \| ModelScope |
| DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | HuggingFace \| ModelScope |

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
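As a rough illustration of what the mixed format means for checkpoint size: FP4 stores 0.5 bytes per parameter and FP8 stores 1 byte. The expert-parameter fraction used below is a hypothetical knob for illustration, not a published figure:

```python
def checkpoint_gib(total_params: float, expert_frac: float,
                   expert_bytes: float = 0.5, other_bytes: float = 1.0) -> float:
    """Approximate checkpoint size: FP4 (0.5 B) experts, FP8 (1 B) for the rest."""
    size_bytes = total_params * (expert_frac * expert_bytes
                                 + (1 - expert_frac) * other_bytes)
    return size_bytes / 2**30

# 284B total params; the 95% expert share is an assumed split, not a published number.
print(round(checkpoint_gib(284e9, expert_frac=0.95), 1))  # ~138.9 GiB
```

Because MoE models concentrate most parameters in the experts, quantizing just the experts to FP4 captures most of the size reduction.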

Evaluation Results

Base Model

| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|---|
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| **World Knowledge** | | | | |
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| **Language & Reasoning** | | | | |
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | 85.2 |
| **Code & Math** | | | | |
| BigCodeBench (Pass@1) | 3-shot | 63.9 | 56.8 | 59.2 |
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | 76.8 |
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | 92.6 |
| MATH (EM) | 4-shot | 60.5 | 57.4 | 64.5 |
| MGSM (EM) | 8-shot | 81.3 | 85.7 | 84.4 |
| CMath (EM) | 3-shot | 92.6 | 93.6 | 90.9 |
| **Long Context** | | | | |
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | 51.5 |

Instruct Model

DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:

| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
|---|---|---|---|
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think>` summary |
| Think High | Conscious logical analysis, slower but more accurate | Complex problem-solving, planning | `<think>` thinking `</think>` summary |
| Think Max | Push reasoning to its fullest extent | Exploring the boundary of model reasoning capability | Special system prompt + `<think>` thinking `</think>` summary |
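Given these response formats, a client typically wants to separate the thinking trace from the final summary before displaying or logging. A minimal sketch — the exact tag handling of a given serving stack may differ:

```python
import re

def split_response(text: str) -> tuple[str, str]:
    """Split a model response into (thinking, summary) using the <think> tags."""
    m = re.match(r"(?s)\s*<think>(.*?)</think>\s*(.*)", text)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    if "</think>" in text:  # non-think mode emits a bare closing tag before the summary
        return "", text.split("</think>", 1)[1].strip()
    return "", text.strip()

print(split_response("<think>2 + 2 = 4</think>The answer is 4."))
# ('2 + 2 = 4', 'The answer is 4.')
```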

DeepSeek-V4-Pro-Max vs Frontier Models

| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | 93.5 |
| Codeforces (Rating) | - | 3168 | 3052 | - | - | 3206 |
| HMMT 2026 Feb (Pass@1) | 96.2 | 97.7 | 94.7 | 92.7 | 89.4 | 95.2 |
| IMOAnswerBench (Pass@1) | 75.3 | 91.4 | 81.0 | 86.0 | 83.8 | 89.8 |
| Apex (Pass@1) | 34.5 | 54.1 | 60.9 | 24.0 | 11.5 | 38.3 |
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 92.9 | - | 76.3 | - | - | 83.5 |
| CorpusQA 1M (ACC) | 71.7 | - | 53.8 | - | - | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 65.4 | 75.1 | 68.5 | 66.7 | 63.5 | 67.9 |
| SWE Verified (Resolved) | 80.8 | - | 80.6 | 80.2 | - | 80.6 |
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | 58.6 | 58.4 | 55.4 |
| SWE Multilingual (Resolved) | 77.5 | - | - | 76.7 | 73.3 | 76.2 |
| BrowseComp (Pass@1) | 83.7 | 82.7 | 85.9 | 83.2 | 79.3 | 83.4 |
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | 54.0 | 50.4 | 48.2 |
| GDPval-AA (Elo) | 1619 | 1674 | 1314 | 1482 | 1535 | 1554 |
| MCPAtlas Public (Pass@1) | 73.8 | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
| Toolathlon (Pass@1) | 47.2 | 54.6 | 48.8 | 50.0 | 40.7 | 51.8 |

Comparison across Modes

| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | 87.5 |
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | 84.4 |
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | 90.1 |
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | 37.7 |
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | 93.5 |
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | 3206 |
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | 95.2 |
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | 89.8 |
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | 38.3 |
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | 83.5 |
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | 67.9 |
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | 80.6 |
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | 55.4 |
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | 76.2 |
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | 83.4 |
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | 48.2 |
| MCPAtlas (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | 74.2 | 73.6 |
| GDPval-AA (Elo) | - | - | 1395 | - | - | 1554 |
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | 51.8 |