Published: April 24, 2026
DeepSeek has officially released DeepSeek-V4, its latest flagship large language model, in two variants:
- DeepSeek-V4-Pro (1.6 trillion total parameters, 49 billion active)
- DeepSeek-V4-Flash (284 billion total parameters, 13 billion active)
Both models use a Mixture of Experts (MoE) architecture, support a 1 million token context window, and are open-sourced under the Apache 2.0 license. The release positions DeepSeek-V4 as a direct competitor to frontier closed-source models from OpenAI, Anthropic, and Google, while offering API pricing that undercuts those rivals by 50 to 80 percent.
What can DeepSeek-V4 do?
DeepSeek-V4 introduces a hybrid attention mechanism that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), enabling the model to process up to 1 million tokens of context in a single pass. This makes it suitable for tasks that require analyzing entire codebases, lengthy legal documents, or extended conversation histories. The model supports dual reasoning modes: a Thinking mode with selectable effort levels (high and max) and a Non-Thinking mode for faster responses.
Both variants support JSON output, tool calls, and chat prefix completion (currently in beta). The V4-Pro model also supports FIM (Fill-in-the-Middle) completion in non-thinking mode, making it particularly effective for code completion and editing tasks. DeepSeek-V4 is a native multimodal model, capable of generating text, images, and video, though the image and video capabilities are expected to roll out in phases following the initial text-focused launch.
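As a rough sketch, request payloads for these features might look like the following. The field names (`reasoning_effort`, `suffix`) and payload layout are assumptions modeled on DeepSeek's current OpenAI-compatible API and common reasoning-model conventions, not confirmed V4 parameters; consult the official API reference before relying on them.

```python
import json

def chat_request(prompt, model="deepseek-v4-pro", thinking=True, effort="high"):
    # Hypothetical chat-completion payload. "reasoning_effort" is an assumed
    # field name for selecting the Thinking mode's effort level.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},  # structured JSON output
    }
    if thinking:
        payload["reasoning_effort"] = effort  # assumed values: "high" or "max"
    return payload

def fim_request(prefix, suffix, model="deepseek-v4-pro"):
    # Hypothetical Fill-in-the-Middle payload (V4-Pro, non-thinking mode only),
    # modeled on the shape of DeepSeek's existing beta FIM completion endpoint.
    return {"model": model, "prompt": prefix, "suffix": suffix, "max_tokens": 128}

req = fim_request("def add(a, b):\n    return ", "\n\nprint(add(2, 3))")
print(json.dumps(req, indent=2))
```

The FIM shape, with the cursor position implied by the gap between `prompt` and `suffix`, is what makes the model usable as a code-completion backend.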
DeepSeek-V4 benchmarks and technical specs
DeepSeek-V4-Pro delivers benchmark results that place it among the top-performing models available today.
- On MMLU-Pro, the model scores 87.5, matching GPT-5.4.
- On LiveCodeBench it reaches 93.5, exceeding both Gemini 3.1 Pro (91.7) and Claude Opus 4.6 (88.8).
- Its Codeforces rating of 3,206 edges out GPT-5.4 (3,168).
- On Apex Shortlist it scores 90.2, ahead of Claude Opus 4.6 (85.9) and GPT-5.4 (78.1).
- On agentic tasks, DeepSeek-V4-Pro scores 80.6 on SWE-bench Verified (on par with Claude Opus 4.6 at 80.8 and Gemini 3.1 Pro at 80.6).
- On Terminal Bench 2.0 it scores 67.9 (above Claude Opus 4.6 at 65.4).
- On Humanity’s Last Exam (HLE), it trails the leading models with a score of 37.7, compared to Gemini 3.1 Pro at 44.4 and Claude Opus 4.6 at 40.0.
- The V4-Flash variant typically scores 1 to 3 percentage points behind V4-Pro, with larger gaps on factual recall and complex tool use benchmarks.
DeepSeek-V4 pricing and availability
DeepSeek-V4-Flash is priced at $0.14 per million input tokens (cache miss) and $0.28 per million output tokens, with cached input tokens dropping to $0.028 per million. DeepSeek-V4-Pro costs $1.74 per million input tokens (cache miss) and $3.48 per million output tokens, with cached input at $0.145 per million. For comparison, GPT-5.4 costs $2.50 per million input tokens and $15 per million output tokens, while Claude Opus 4.6 costs $5 per million input and $25 per million output. Both variants support a maximum output length of 384,000 tokens.
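The per-token economics are easy to check from the figures above. Here is a minimal cost calculator using the published per-million-token rates (cached-input rates for GPT-5.4 and Claude Opus 4.6 are not quoted here, so uncached pricing is applied to them throughout):

```python
# Per-million-token prices (USD) from the release announcement.
PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "cached_input": 0.028, "output": 0.28},
    "deepseek-v4-pro":   {"input": 1.74, "cached_input": 0.145, "output": 3.48},
    "gpt-5.4":           {"input": 2.50, "cached_input": None,  "output": 15.0},
    "claude-opus-4.6":   {"input": 5.00, "cached_input": None,  "output": 25.0},
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Dollar cost of one request; cached_tokens counts cache-hit input tokens."""
    p = PRICES[model]
    cached_rate = p["cached_input"] if p["cached_input"] is not None else p["input"]
    uncached = input_tokens - cached_tokens
    return (uncached * p["input"]
            + cached_tokens * cached_rate
            + output_tokens * p["output"]) / 1_000_000

# Compare a workload of 1M input + 1M output tokens across models.
for m in PRICES:
    print(f"{m}: ${request_cost(m, 1_000_000, 1_000_000):.2f}")
```

At 1 million input and 1 million output tokens, V4-Pro comes to about $5.22 versus $17.50 for GPT-5.4 and $30.00 for Claude Opus 4.6, roughly 70 to 83 percent cheaper, consistent with the undercutting figure cited above.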
The models are available immediately through the DeepSeek API. The open-source weights are hosted on Hugging Face under the Apache 2.0 license. DeepSeek has announced that the legacy deepseek-chat and deepseek-reasoner API endpoints will be deprecated on July 24, 2026, and will route to deepseek-v4-flash in the interim.
How does DeepSeek-V4 compare to GPT-5.4 and Claude Opus 4.6?
DeepSeek-V4-Pro matches or exceeds GPT-5.4 and Claude Opus 4.6 on most coding and reasoning benchmarks, while costing a fraction of the price. On code generation tasks such as LiveCodeBench and Codeforces, V4-Pro leads both rivals. On knowledge-intensive benchmarks like SimpleQA-Verified and HLE, it trails behind, suggesting that DeepSeek-V4’s strengths lie more in structured reasoning and code than in factual recall. The V4-Flash variant, with its much smaller active parameter count (13 billion versus 49 billion for Pro), offers a compelling option for high-volume applications where cost matters more than peak performance.
The open-source nature of both models is a significant differentiator. Unlike GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, DeepSeek-V4 can be self-hosted, fine-tuned, and inspected. For organizations with data sovereignty requirements or those looking to build customized AI solutions, this remains a meaningful advantage over the closed-source alternatives.
DeepSeek-V4 architecture: what changed from V3?
DeepSeek-V4-Pro was trained on 33 trillion tokens, up from the 14.8 trillion used for V3, while V4-Flash was trained on 32 trillion tokens. The core architectural innovation is the hybrid CSA/HCA attention mechanism, which allows efficient processing of very long sequences without the quadratic scaling that limits standard transformer attention. The MoE approach means that only a subset of parameters activates per token (49 billion out of 1.6 trillion for Pro, and 13 billion out of 284 billion for Flash), keeping inference costs manageable despite the large total parameter count.
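To make the active-parameter idea concrete, here is a toy top-k gating function of the kind MoE routers use: the router scores every expert for each token, but only the k highest-scoring experts actually run. The expert count and k below are illustrative only, not V4's real configuration, which DeepSeek has not detailed here.

```python
import math
import random

def top_k_gate(logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(idx, exps)]

random.seed(0)
n_experts, k = 64, 4  # illustrative sizes, not V4's actual router config
logits = [random.gauss(0, 1) for _ in range(n_experts)]  # stand-in router scores
routed = top_k_gate(logits, k)

print(f"active experts per token: {len(routed)} of {n_experts}")
print(f"fraction of expert parameters touched: {k / n_experts:.1%}")
```

The same principle is what lets a 1.6-trillion-parameter model bill and compute like a 49-billion-parameter one: per-token FLOPs scale with the active subset, not the total.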
SWE-bench Verified performance jumped from 69% with DeepSeek-V3 to 80.6% with V4-Pro, an improvement of nearly 12 percentage points that reflects both the architectural upgrades and the expanded training data. The 1 million token context window, enabled by DeepSeek Sparse Attention (DSA) and token-wise compression, is four times the context length of V3.
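A back-of-envelope calculation shows why some form of sparsity or compression is unavoidable at this context length: a dense attention-score matrix grows with the square of the sequence length. The sketch below assumes fp16 scores for a single head in a single layer; it says nothing about V4's actual memory footprint, only about the naive dense baseline it avoids.

```python
def dense_attention_matrix_gib(seq_len, bytes_per_elem=2):
    # Memory for one full seq_len x seq_len attention-score matrix (fp16).
    return seq_len ** 2 * bytes_per_elem / 2 ** 30

# V3-era context (one quarter of V4's) versus the new 1M-token window.
for n in (250_000, 1_000_000):
    print(f"{n:>9} tokens -> {dense_attention_matrix_gib(n):,.0f} GiB per head per layer")
```

Quadrupling the context multiplies this matrix by sixteen; at 1 million tokens a single dense score matrix would run to terabytes per head per layer, which is why sparse and compressed attention never materializes it in full.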
DeepSeek-V4 is available now via the DeepSeek API and on Hugging Face. The full technical report and model weights can be found on DeepSeek’s official website at deepseek.com.