Image created with gemini-3.1-flash-image-preview and claude-opus-4.7. Image prompt: Using the provided reference image, keep the pure white landscape field, vertical type hierarchy, and galaxy-punchout starfield treatment exactly as in the Alesso cover, but replace ‘HEROES’ with ‘DEEPSEEK’ in the same bold condensed grotesque all-caps with Milky Way texture, keep ‘(we could be)’ unchanged, replace ‘ALESSO’ with ‘OPEN FRONTIER’ in the light geometric all-caps galaxy-punchout, keep ‘FEATURING.’ unchanged, and replace ‘TOVE LO’ with ‘MIXTURE OF EXPERTS’ in the condensed grotesque all-caps galaxy-punchout. Maintain identical tracking, font contrast, centered composition, generous margins, and landscape aspect ratio with no illustrations or decorative elements.

DeepSeek V4 Pro is the biggest open model ever: 1.6T total parameters (49B active), trained on 32T tokens, 1M context, two new attention mechanisms, Muon, mHC, open-source kernels, FP4 QAT, an MIT license, and one of the best tech reports of the year
https://x.com/eliebakouch/status/2047519300399837677

DeepSeek is once again the open-source king, and it’s competitive with frontier models: 1st or 2nd place on 12 of 22 benchmarks
https://x.com/scaling01/status/2047512176856899985

DeepSeek V4 Flash Thinking at 284B parameters (13B activated) shifts the Text Pareto frontier with $0.14 input / $0.28 output per MToken. Congrats again to @DeepSeek_AI on the open model progress!
https://x.com/arena/status/2047524055679729885

Exciting news – DeepSeek V4 Pro is in the Arena with 1.6T parameters (49B activated) alongside V4 Flash at 284B parameters (13B activated). Both support 1M token context. It’s a major leap over DeepSeek V3.2! Code Arena: – DeepSeek V4 Pro (thinking): #3 open model (#14 overall) …
https://x.com/arena/status/2047518354903359697

🎉 Day-0 support for @deepseek_ai V4 Pro and Flash on vLLM — a new generation of DeepSeek model, purpose-built for tasks up to 1M tokens. Alongside the release, we’re publishing a first-principles walkthrough of the new long-context attention and how we implemented it in vLLM.
https://x.com/vllm_project/status/2047520252851105796
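The technical report is said to introduce two new compressed attention mechanisms for long context. As a rough intuition for what “compressed attention” usually means (this is a generic illustration, not the mechanisms in the V4 report or vLLM’s implementation): keys and values are pooled into blocks, so attention cost scales with the number of blocks rather than the raw sequence length.

```python
import numpy as np

def compressed_attention(q, k, v, block):
    """Toy single-query attention over block-pooled keys/values.

    q: (d,) query vector; k, v: (T, d) key/value matrices.
    Mean-pooling every `block` rows reduces the score/softmax work
    from T entries to T // block entries.
    """
    T, d = k.shape
    nb = T // block
    kc = k[: nb * block].reshape(nb, block, d).mean(axis=1)  # pooled keys
    vc = v[: nb * block].reshape(nb, block, d).mean(axis=1)  # pooled values
    scores = kc @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ vc

# With a 1M-token context and block=64, the softmax runs over ~15.6K
# pooled entries instead of 1M raw positions.
out = compressed_attention(np.zeros(2), np.ones((8, 2)),
                           np.arange(16.0).reshape(8, 2), block=4)
```

Real systems combine a compressed branch like this with a selective or sliding-window branch so exact local detail is not lost; the sketch only shows the cost-reduction idea.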

DeepSeek V4 by @deepseek_ai just dropped! SGLang is ready on Day 0 with a full stack of optimizations from architectures to low-level kernels. We also deliver a verified RL training pipeline in Miles (by @radixark) for V4 at launch: 1️⃣ Native “ShadowRadix” design: DeepSeek V4’s …
https://x.com/lmsysorg/status/2047511629919932623

DeepSeek V4 is a huge step up compared to DeepSeek V3: it outperforms Opus 4.6 and GPT-5.4 on SWE-bench Verified and sets a new record on Codeforces. It still needs to be tested against Opus 4.7 and GPT-5.5, though, to see whether real-world usage holds its promise. Big release! SOTA open source
https://x.com/kimmonismus/status/2047514623356579869

DeepSeek-V4 official pricing (input / output per 1M tokens): DeepSeek-V4 Flash: $0.14 / $0.28; DeepSeek-V4 Pro: $1.74 / $3.48
https://x.com/scaling01/status/2047508350238175526
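To make the list prices concrete, here is a small cost helper using the quoted numbers (the function and model keys are mine for illustration, not DeepSeek’s API):

```python
# Quoted list prices in USD per 1M tokens, as (input, output).
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
}

def request_cost(model, input_tokens, output_tokens):
    """USD cost of one request at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: stuffing a full 1M-token context into Flash with a 4K-token
# answer costs about $0.14 per request at these prices.
flash_cost = request_cost("deepseek-v4-flash", 1_000_000, 4_000)
```

At these rates even repeated full-context calls stay in cents-per-request territory, which is the point of the pricing commentary below.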

DeepSeek-V4 Technical Report
https://x.com/scaling01/status/2047510520618516572

DeepSeek-V4 was pre-trained on 32T tokens using Muon and integrates a new hybrid attention mechanism and mHC (manifold hyper connections)
https://x.com/scaling01/status/2047510190044409860

Finally, DeepSeek V4 is here! – MIT license – DeepSeek-V4-Pro: 1.6T params (49B active) – DeepSeek-V4-Pro Max ≈ Opus-4.6 Max / GPT-5.4 xHigh across benchmarks!
https://x.com/Yuchenj_UW/status/2047514092756418757

My quick paper summary: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated); two new compressed attention mechanisms for long context; manifold hyper connections; Muon training; 32T tokens; FP4 quantization-aware …
https://x.com/iScienceLuvr/status/2047514399393579235
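The FP4 quantization-aware training mentioned above typically works by “fake-quantizing” weights in the forward pass: snapping them to the nearest representable FP4 (E2M1) value after scaling, while gradients flow through unchanged (a straight-through estimator). The grid and per-tensor scaling below are a generic illustration, not DeepSeek’s actual recipe:

```python
import numpy as np

# E2M1 magnitudes: {0, 0.5, 1, 1.5, 2, 3, 4, 6}, mirrored for sign.
_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-_POS[::-1], _POS])

def fake_quant_fp4(w):
    """Round each weight to the nearest FP4 value after per-tensor scaling.

    The scale maps the largest-magnitude weight onto the grid endpoint
    (6.0); in QAT this rounding happens every forward pass while the
    backward pass treats it as the identity.
    """
    scale = np.max(np.abs(w)) / 6.0 + 1e-12
    idx = np.abs(w[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale

q = fake_quant_fp4(np.array([0.0, 0.05, 3.0, -6.0]))
```

Training with the quantizer in the loop is what lets the model tolerate 4-bit weights at inference without the accuracy cliff of post-hoc quantization.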

They said it’s next week 🤞 — u/Exciting-Mall192 on r/DeepSeek

Tencent, Alibaba to back DeepSeek at $20B+ valuation: report — TFN


🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params.
https://x.com/deepseek_ai/status/2047516922263285776
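The “1.6T total / 49B active” split comes from sparse mixture-of-experts routing: each token is sent to only the top-k experts, so most parameters sit idle on any given forward pass. A toy sketch of the accounting, with made-up sizes that are not DeepSeek’s actual config:

```python
import random

def topk_route(logits, k):
    """Indices of the k largest router logits (toy top-k gating)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

# Hypothetical sizes, chosen only to show the total-vs-active arithmetic.
num_experts, k = 64, 2
expert_params = 1_000_000   # parameters per expert (made up)
shared_params = 500_000     # attention + shared layers (made up)

logits = [random.random() for _ in range(num_experts)]
chosen = topk_route(logits, k)

total = shared_params + num_experts * expert_params   # stored parameters
active = shared_params + k * expert_params            # touched per token
```

In this toy setup only ~4% of the parameters are active per token; the same mechanism is how a 1.6T-parameter model can run with 49B-parameter per-token compute.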

@teortaxesTex deepseek-v4-flash: $0.14 / $0.28; deepseek-v4-pro: $1.74 / $3.48. This is extremely aggressive pricing. Flash, in particular, really stands out: it is 10 times cheaper than Gemini 3.0 Flash on an output-token basis… If their inference infrastructure is solid, I think there could be …
https://x.com/Hangsiin/status/2047515855949623667

One of DeepSeek-V4’s goals was to make ‘1M context windows practical enough for real world use’, and they appear to have done that remarkably well. It outperformed Gemini 3.1 Pro on long context benchmarks, held up quite well even at 1M tokens, improved compute efficiency by …
https://x.com/Hangsiin/status/2047523724929405328

Discover more from Ethan B. Holland
