Image created with gemini-3.1-flash-image-preview and claude-opus-4.7. Image prompt: High-end product photograph of a tall blizzard sundae with deep ocean-blue swirled soft-serve sculpted into a whale-tail curl on top, studded with white chocolate pearls and sapphire rock candy, a chocolate anchor cookie tucked into the side, served in a crisp white paper cup wrapped with a bold red ‘DeepSeek’ band and a tiny ’75 — Milford, DE’ along the rim, soft directional studio light, shallow depth of field, glossy macro detail, landscape composition.

DeepSeek-V4 Pricing gives you glimpses into the future. Imagine in one year using a Mythos-level model that can basically code everything for $4/million tokens
https://x.com/scaling01/status/2047707820552831028

You can now run DeepSeek-V4-Flash on a 256GB Mac. Next up: speed 🚀 PR:
https://x.com/Prince_Canuma/status/2047685898163147125
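
A quick back-of-the-envelope check on why a 256GB Mac is enough (a rough sketch; the 4-bit weight assumption and overhead factor are mine, not from the release):

```python
# Rough weight-memory estimate for DeepSeek-V4-Flash (284B total parameters).
# Assumptions (not from the release): ~4-bit quantized weights plus ~10% overhead
# for quantization scales and runtime buffers.

def weight_memory_gb(total_params: float, bits_per_param: float, overhead: float = 0.10) -> float:
    """Approximate resident weight memory in GB for a quantized checkpoint."""
    return total_params * bits_per_param / 8 * (1 + overhead) / 1e9

print(f"V4-Flash @ 4-bit ≈ {weight_memory_gb(284e9, 4):.0f} GB")  # ~156 GB: fits in 256GB, not 128GB
print(f"V4-Flash @ 8-bit ≈ {weight_memory_gb(284e9, 8):.0f} GB")  # ~312 GB: would need a 512GB machine
```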

.@deepseek_ai v4 Pro’s checkpoint is both in FP4 and FP8, depending on the layer. This means that the entire model can fit on a single NVIDIA 8xB200 node without trouble. @vllm_project: “Checkpoint is FP4+FP8 mixed: MoE expert weights are stored in FP4 while the remaining
https://x.com/LambdaAPI/status/2047654086263320965
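
The "fits on a single 8xB200 node" claim is easy to sanity-check with rough arithmetic. A sketch under assumed numbers (the expert/non-expert split and per-GPU memory below are illustrative, not taken from the checkpoint):

```python
# Can a 1.6T-parameter FP4+FP8 mixed checkpoint fit on one 8xB200 node?
# Illustrative assumptions (not from the checkpoint): ~95% of parameters are MoE expert
# weights stored in FP4 (0.5 bytes each), the rest in FP8 (1 byte); 192 GB HBM per B200.

TOTAL_PARAMS = 1.6e12
EXPERT_FRACTION = 0.95      # assumed share of weights living in the experts
GPU_MEM_GB, GPUS = 192, 8

expert_gb = TOTAL_PARAMS * EXPERT_FRACTION * 0.5 / 1e9        # FP4 weights
other_gb = TOTAL_PARAMS * (1 - EXPERT_FRACTION) * 1.0 / 1e9   # FP8 weights
print(f"weights ≈ {expert_gb + other_gb:.0f} GB vs {GPU_MEM_GB * GPUS} GB HBM on the node")
# ≈ 840 GB of weights against 1536 GB of HBM, leaving room for KV cache and activations.
```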

Thoughts after reading the DeepSeek V4 paper: – NVIDIA really is something else. Remember how back in 2024 people were bashing Blackwell as overspec’d and dismissing FP4 as just marketing? Turns out it was all groundwork for the next generation of models. Maybe NVIDIA’s moat is
https://x.com/jukan05/status/2047861732702662741

✨ DeepSeek-V4 is here — a million-token context, 1.6T parameter powerhouse optimized for agentic workflows. Out of the box, on DeepSeek-V4-Pro, NVIDIA Blackwell Ultra delivers over 150 TPS/user interactivity for agentic workflows. And we’re just getting started. Expect these
https://x.com/NVIDIAAI/status/2047765637808664759

Scores I would like to see from DeepSeek-V4 to confirm it being less than 6 months behind frontier models:
ARC-AGI-1: ~75%
ARC-AGI-2: ~35%
GSO: ~26%
METR: 4.5-5 hours
WeirdML: ~63%
basically Opus 4.5 / GPT-5.2 scores
https://x.com/scaling01/status/2047686712051048598

A few more notes on DeepSeek-V4:
– it seems to be a ~GPT-5.2/Opus 4.5+ tier model, so they are still ~4-5 months behind the frontier, but ahead of other Chinese labs, with Kimi K2.6 being closest
– at 1.6T params they now have a model that’s in the same weight class as GPT-5.4
https://x.com/scaling01/status/2047618271310926151

DeepSeek-V4 is definitely better than GLM-5.1 but not quite Opus 4.7, GPT-5.4 or Gemini 3.1 Pro level. Unfortunately this video had no comparison to Kimi-K2.6
https://x.com/scaling01/status/2047733998714052819

1.6T MoE chad vs 128B dense normie insane price-performance mog
https://x.com/scaling01/status/2049546078664397105

anon do you realize that V4-Pro is straight up the strongest pretrained model we have? Like… 1.6T@49AB (≈280B dense), 33T – even by meme formula it’s > LLaMA 3. Add Muon, mHC, most steps 64K context + extended to 1M… No excuses now. Every “unicorn” can have its brand AGI.
https://x.com/teortaxesTex/status/2047630981364883816
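
The "≈280B dense" figure matches the usual geometric-mean rule of thumb for a MoE's dense-equivalent size, which is presumably the "meme formula" meant here (a community heuristic, not something stated in the report):

```python
import math

# "Meme formula": dense-equivalent ≈ sqrt(total_params * active_params).
# A community heuristic for MoE capacity, not a claim from the DeepSeek-V4 report.
def dense_equivalent(total_params: float, active_params: float) -> float:
    return math.sqrt(total_params * active_params)

print(f"V4-Pro: ≈ {dense_equivalent(1.6e12, 49e9) / 1e9:.0f}B dense-equivalent")  # ≈ 280B
```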

DeepSeek-V4 dropped. 1M context. 10x smaller KV cache. First open model where the context window and the agentic post-training meet.
https://x.com/ben_burtenshaw/status/2047646980139016560

Not much detail about the pretraining data unfortunately beyond the standard math, code, webpages etc. Also they use 32T tokens with a total parameter size of 1.6T. That works out to 20 tokens per parameter. Wait a minute….
https://x.com/nrehiew_/status/2047666048334450754
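
The arithmetic behind the "wait a minute": 32T tokens over 1.6T total parameters lands exactly on the classic ~20 tokens-per-parameter (Chinchilla-style) ratio, computed here against total rather than active parameters:

```python
# Tokens-per-parameter from the quoted numbers: 32T training tokens, 1.6T total parameters.
train_tokens = 32e12
total_params = 1.6e12
print(f"{train_tokens / total_params:.0f} tokens per (total) parameter")  # 20 -- the Chinchilla-style ratio
```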

@NousResearch absolutely crushing the 0-day support! Deepseek-v4-pro is live in the Nous Portal 😍 If you want a real personal agent/assistant/quant/researcher/artist/coworker, Hermes Agent continues to deliver!
https://x.com/mr_r0b0t/status/2047673600900010044

🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis. On @digitalocean’s Serverless Inference, powered by vLLM on NVIDIA HGX B300:
🥇 AA #1 output speed for DeepSeek V3.2 (230 tok/s, 0.96s TTFT) and Qwen 3.5 397B
🔧 MiniMax-M2.5: 23% TPOT gain
https://x.com/vllm_project/status/2049503979898274163

📊 Day 0 performance is here: DeepSeek-V4-Pro running on NVIDIA Blackwell Ultra. Using @vllm_project’s Day 0 recipe, we’ve captured the initial performance Pareto for DeepSeek’s flagship 1M long-context model. This curve highlights the baseline for balancing AI factory
https://x.com/NVIDIAAI/status/2047823093578518758

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params.
https://x.com/deepseek_ai/status/2047516922263285776?s=20

🚨 DeepSeek V4 Pro just dropped 75% OFF API pricing + permanent cache price cut to 1/10! 🔥
4/26 Update: Cache permanently 90% cheaper. Offer ends May 5, 2026.
Insights from Zhihu contributor 普杰 💡 Key Insights:
• DeepSeek doesn’t do loss-leader promotions → ¥3 in / ¥6 out ≥
https://x.com/ZhihuFrontier/status/2049027925920637077

8x VLLM CUDA MOAT ALERT: InferenceX has added @deepseek_ai V4 Pro for @vllm_project for day 3 performance across B200, B300, H200, GB200 disagg. We are seeing that B300 is up to 8x faster than H200. The team is working on benchmarking vLLM 0.20 which has the new DeepGEMM MegaMoE
https://x.com/SemiAnalysis_/status/2048957715955765284

Also, deepseek v4 is available as well
https://x.com/Teknium/status/2047798102091067677

And now a new DeepSeek model, and appears to be fully open weights. Good benchmarks, but with open models, that isn’t always as meaningful. Should be live soon to actually try.
https://x.com/emollick/status/2047516272062058890

Another reason I’m watching Delton closely is that the company works closely with Huawei. As DeepSeek’s comments suggest, Huawei’s 950 is expected to enter heavy mass production starting in the second half of this year, right?
https://x.com/jukan05/status/2047823601462812932

Anyone got DeepSeek-V4-Flash running on a Mac yet? 512GB or 256GB or 128GB or smaller?
https://x.com/simonw/status/2047844236142497850

Compressed Sparse Attention. A Faithful Implementation of CSA from the DeepSeek-V4 paper.
https://x.com/arjunkocher/status/2049066844925936041

DeepSeek cuts V4-Pro prices by 75%
https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent

DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index @deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture
https://x.com/ArtificialAnlys/status/2047735160544841953

DeepSeek removed its “Thinking with Visual Primitives” repo. Here’s a paper link if anyone needs to read it.
https://x.com/arjunkocher/status/2049875566678118898

DeepSeek said Pro pricing could fall sharply once Huawei Ascend 950 supernodes are deployed at scale in the second half of the year
https://x.com/scaling01/status/2047760776769720360

DeepSeek staff has deleted the repo and all mentions of the vision paper. What the hell happened? People who got Vision enabled on web: do you still have it?
https://x.com/teortaxesTex/status/2049880056420298995

DeepSeek themselves estimate the gap to be 3-6 months. I think it’s on the higher end of that range
https://x.com/scaling01/status/2047626000091971811

DeepSeek trains vision capabilities into their v4 Flash model by having the model directly output bounding boxes and point coordinates of an image during reasoning. This is DeepSeek’s Computer Use Agent.
https://x.com/nrehiew_/status/2049840778491662623
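
A toy illustration of how coordinate outputs like that could be consumed by a computer-use harness. The `<box>`/`<point>` tag syntax below is invented for the example; the model's real output format is whatever the paper and tokenizer define:

```python
import re

# Hypothetical parser for bounding boxes and click points emitted inside reasoning text.
# The tag format here is made up for illustration, not DeepSeek-V4-Flash's actual syntax.
BOX_RE = re.compile(r"<box>(\d+),(\d+),(\d+),(\d+)</box>")
POINT_RE = re.compile(r"<point>(\d+),(\d+)</point>")

def extract_targets(reasoning: str):
    boxes = [tuple(map(int, m.groups())) for m in BOX_RE.finditer(reasoning)]
    points = [tuple(map(int, m.groups())) for m in POINT_RE.finditer(reasoning)]
    return boxes, points

demo = "The submit button is at <box>120,340,220,380</box>; click <point>170,360</point>."
print(extract_targets(demo))  # ([(120, 340, 220, 380)], [(170, 360)])
```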

DeepSeek v4 marks the next era of open weight models and is one of the landmark papers for open weight model training. Thread and notes below 🙂
https://x.com/nrehiew_/status/2047665987730993363

DeepSeek V4 just launched on Huawei hardware, and the numbers tell a story the headlines are hiding.
• Huawei’s Ascend 910C delivers roughly 60% of the inference power of an Nvidia H100.
• Production is capped at 750,000 units this year; Nvidia ships that many in a single
https://x.com/PalwinderCFA/status/2047614823102619974

DeepSeek V4 MLX Quants now on MLX community HF repo, Made possible by @LambdaAPI and @TheZachMueller ❤️ Without a GPU cluster it would take me a week to upload the quants… Model collection 👇🏽
https://x.com/Prince_Canuma/status/2047847095466385899

DeepSeek V4 Open Source + vLLM Support LIVE 🚀 | Technical Breakdown
🧠 Core Insight: DeepSeek V4 is built to solve 1M-token long-context inference — the biggest pain point for LLMs today.
⚠️ 2 Key Long-Context Challenges
• KV Cache Explosion: KV cache grows linearly with
https://x.com/ZhihuFrontier/status/2047664976215839021
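
The "KV cache grows linearly" point is easy to see from the standard per-token size formula. A generic estimate; the layer/head/dtype numbers below are placeholders, not V4's actual configuration:

```python
# Generic KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value * seq_len.
# The configuration below is a placeholder, not DeepSeek-V4's actual shape.
def kv_cache_gb(seq_len: int, layers: int = 60, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * seq_len / 1e9

for ctx in (128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):.0f} GB per sequence")
# Growth is linear in context length, so a 1M-token window needs ~8x the cache of a 128K window,
# which is why shrinking or compressing the KV cache is the headline problem at this scale.
```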

DeepSeek writing quality (at least in Chinese) is good because they’ve been obsessing about data for the entire history of the company (tbh “clean data” is an obvious instinct for algo traders too, but I think this is more about Wenfeng’s purism) and have such job listings
https://x.com/teortaxesTex/status/2047614729145745623

DeepSeek_V4.pdf · deepseek-ai/DeepSeek-V4-Pro at main

DeepSeek-V4 is a full-stack redesign of LLMs around long context + efficiency. Here are some of the changes:
– Hybrid attention: Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) for long-context efficiency
– 1M-token context becomes ~3-10× cheaper in memory
https://x.com/TheTuringPost/status/2048566818118545887

DeepSeek-V4 uses our Hash routing approach developed back in 2021 — see screenshot of their tech report! (Looks like a great model, congrats!) Bonus note: our same blogpost (& paper) back in 2021 also introduced ‘looped transformers’, but we called that staircase & ladder (see
https://x.com/jaseweston/status/2047690308217926055
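
For anyone unfamiliar with hash routing (Roller et al., 2021): experts are picked by a fixed hash of the token rather than a learned gating network. A minimal sketch of the general idea only; how V4 actually uses it is described in its tech report:

```python
# Minimal hash-routing sketch (in the spirit of "Hash Layers", Roller et al. 2021):
# each token id maps to a fixed expert via a hash, so there is no learned router to train or balance.
# This shows the general idea only, not DeepSeek-V4's exact scheme.
NUM_EXPERTS = 64

def hash_route(token_id: int, num_experts: int = NUM_EXPERTS) -> int:
    return (token_id * 2654435761) % num_experts  # fixed multiplicative hash, never trained

print([hash_route(t) for t in (101, 2054, 2003, 23456)])  # a given token id always hits the same expert
```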

DeepSeekv4 Pro 1.6T is supported on InferenceX on Day 0! We have already gotten H200 vLLM working, and are working on @vllm_project & @sgl_project MI355, B200, B300, GB200/300 disaggregated DeepSeekv4 day 0 performance benchmarking too, to track the progress of improvement. Thank you
https://x.com/SemiAnalysis_/status/2047726025748930687

Early DeepSeek v4 impressions not great.
https://x.com/mbusigin/status/2047707082007220393

Here’s DeepSeek v4 Pro. Added to the playable gallery as well.
https://x.com/emollick/status/2047527060713664754

I get the impression many Chinese hate Huawei irrationally and suspect it of a conspiracy to deprive DeepSeek of based American chips
https://x.com/teortaxesTex/status/2047631470664020211

I hear similarly it’s not unique to Mythos/5.5 ofc, frontier models have been dealing with >100T for a while, as far as I know. We see even the open source models get close to 50T. A 100T DeepSeek V4 is just V4 + 2 more epochs, 3e25 FLOPs, still below Llama 405B level
https://x.com/teortaxesTex/status/2049830477167526255

I hope the upgrade to DeepSeek v4 will make the bot comments on here more bearable.
https://x.com/emollick/status/2047519187287846937

I’m still confused by some of the decisions made in DeepSeek v4. Main confusion is why the huge focus on reducing KV cache size when with something like HiSparse u can offload most of ur KV cache (making ur decode compute bound). This also is compensated with a huge 128 heads and
https://x.com/Grad62304977/status/2048785005216723072

interesting that deepseek’s also joined the path of not allowing sampler control on their api. i wonder why and how long this has been there
https://x.com/stochasticchasm/status/2047717161070989499

Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance. AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows.
https://x.com/togethercompute/status/2047743446522224987

it’s so messed up that deepseek trained on deepseek reasoning traces. has chinese distillation gone too far?
https://x.com/kalomaze/status/2047762970931827125

Jensen was making a good point, but now it’s too late. DeepSeek is fully committed to ditching CUDA. The rest of the Chinese xiaoren ecosystem can be swayed by Hoppers; Wenfeng believes too much in long-termism. After V4, non-CUDA hardware is guaranteed to live and prosper.
https://x.com/teortaxesTex/status/2049185408785998217

Let’s dive deeper into the difference between DeepSeek V4 Pro & V4 Flash by @DeepSeek_AI. – Both support 1M token context and V4 Flash Thinking shifts the price Pareto frontier. V4 Pro ranks ~30 places higher than the V4 Flash variants, but costs 12x more at launch pricing.
https://x.com/arena/status/2047774037204742255

Let’s see DeepSeek are all nice folks and China’s national heroes, Xi is personally a man of integrity, and they’re not starting wars. American society firebombs Sam Altman, Ant is a weird sex cult, and elected US leader is a murderous monke. Why should compute decide this?
https://x.com/teortaxesTex/status/2047645676234846459

looks like the ~Opus 4.5 estimate for DeepSeek-V4 holds for now, at least on SimpleBench
https://x.com/scaling01/status/2047682465624445015

My first two TiKZ Sparks unicorns from DeepSeek v4. (Expert mode, from the DeepSeek site, which is supposed to be v4 Pro according to the release)
https://x.com/emollick/status/2047523193481547929

My quick paper summary:
DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated)
Two new compressed attention mechanisms for long context
Manifold hyper connections
Muon training
32T tokens
FP4 Quantization-Aware
https://x.com/iscienceluvr/status/2047514399393579235?s=46

not even DeepSeek has any appetite for doing this again. this is evolution tier architecture; they’ll refactor it when they get some time
https://x.com/teortaxesTex/status/2047648219081974034

somewhere in france, still awake at sunrise, adding exclamations to their first read of the deepseek technical report, “one of the best i’ve ever read”
https://x.com/morqon/status/2047643246923325833

Surprisingly a lot of info about the data and process (which is unlike some other deepseek papers). On first read, it sounded like they only cared about specific tasks rather than a general multimodal model. On second thought however, I realized these “visual primitives” and
https://x.com/nrehiew_/status/2049840802562740311

TEHRAN, April 29, 2026 — Less than a week after the release of @deepseek_ai DeepSeek v4 Pro, the cracked team at @vllm_project and @inferact has achieved considerable improvement on GB200 (Dynamo+vLLM). This is largely due to the release of vLLM 0.20.0, which comes with MegaMoE
https://x.com/SemiAnalysis_/status/2049578313111216271

Thank you @NVIDIAAI for highlighting vLLM’s day 0 @deepseek_ai support and enhancing the open source inference ecosystem!
https://x.com/vllm_project/status/2047843293447500069

The strongest open-source agentic model is live on Baseten! DeepSeek V4 is a preview of two powerful MoE models: V4-Pro (1.6T params) and V4-Flash (284B params) with 1M context and SOTA open-source performance. This represents a significant jump from V3.2 (which had a 128k
https://x.com/baseten/status/2047779549644243146

This is great – @deepseek_ai V4 supports prefill! 😀 Most other providers have been dropping support for this critically important capability, so wonderful to see at least one company stepping up.
https://x.com/jeremyphoward/status/2049098509530583199
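
"Prefill" here means seeding the start of the assistant's reply and letting the model continue it. A hedged sketch assuming an OpenAI-compatible endpoint that continues a trailing assistant message; the exact model name and any beta flag required come from DeepSeek's API docs, not from this snippet:

```python
from openai import OpenAI

# Assistant-prefill sketch: seed the beginning of the reply and let the model continue it.
# Assumes an OpenAI-compatible endpoint that continues a trailing assistant message;
# base_url, model name, and any required beta flag are per DeepSeek's docs, not this example.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="deepseek-chat",  # placeholder; use the actual V4 model id from the docs
    messages=[
        {"role": "user", "content": "Return valid JSON describing the weather in Paris."},
        # The partial assistant message below is the prefill the model should continue from.
        {"role": "assistant", "content": '{"city": "Paris", '},
    ],
)
print(resp.choices[0].message.content)
```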

Unless I’m doing it wrong, Kimi K2.6 in Hermes is like 7x slower than DeepSeek V4, not to mention V4-Flash lmao but it can sometimes fix bugs that not even Pro can resolve. it also has some harsh words for them:
https://x.com/teortaxesTex/status/2048820805258059837

vLLM support for DeepSeek V4 base models is on the way! The V4 release includes 4 models: base/instruct × flash/pro. Initial support covers the instruct versions. To extend support to the base models, we worked with @deepseek_ai to add an expert_dtype field in the config, making
https://x.com/vllm_project/status/2048769886483329525

vLLM v0.20.0 is here! 752 commits from 320 contributors (123 new). 🎉 Highlights: DeepSeek V4, Hunyuan v3 preview support, CUDA 13 / PyTorch 2.11 / Transformers v5 baseline, FA4 as default MLA prefill, TurboQuant 2-bit KV (4× capacity), vLLM IR foundation. Thread 👇
https://x.com/vllm_project/status/2048918629144805619
