Image created with gemini-3.1-flash-image-preview; prompt written with claude-sonnet-4-5. Image prompt: Photorealistic ice drilling research station on frozen winter bay at dusk, vertical cylindrical ice core sample showing thousands of translucent layers with embedded mathematical symbols and data patterns, scientific measurement equipment and grid markers on ice surface, golden hour light passing through ice core creating amber and blue glow, sky gradient from deep blue to warm orange, 4K nature documentary cinematography, bold sans-serif title text 'DeepSeek' across top, landscape format
"GLM-5 was pre-trained on 28.5T tokens and uses DeepSeek Sparse Attention" https://x.com/scaling01/status/2021627498451370331
"Annoyingly, DeepSeek on the API is still V3. And I can get 128K/2024 claims repeated in previous chats, despite the absence of clues in the preceding context. They’re rolling it out very unevenly, cautiously. I guess it’s a repeat of the r1-lite-preview situation. We get a taste." https://x.com/teortaxesTex/status/2021515356951695431
"DeepSeek finally has frontier-level attention. Maybe better than 'frontier'. They announced the plan to solve attention 13 months ago, in the V3 paper. They’re making progress." https://x.com/teortaxesTex/status/2021578213420405134
"DeepSeek has achieved something very Special with attention. I haven’t seen a model that’s so proactive with its context. It doesn’t just have full recall, it *inhabits* a context, feels at home there. It reminds me of Ant hype about Opus self-awareness. Or of test-time training." https://x.com/teortaxesTex/status/2021579901548081353
DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling | AINews https://news.smol.ai/issues/25-12-01-deepseek-32
"Fun to see Deep Think’s real-world impact. Check out how it’s helping researchers catch errors in high-level mathematics research papers. As 'just' a math undergrad, I couldn’t even dream to do any of this myself!" https://x.com/OriolVinyalsML/status/2021982723733438725
"i think we don’t realize the impact that deepseek had on the open ecosystem, there is so much from them that you can find in almost every frontier open llm today > most of the open frontier models follow the 'finegrain + sparse + shared expert' deepseek moe recipe > a lot of…" https://x.com/eliebakouch/status/2021577794480644216
"I’m sorry but if this is DeepSeek-V4 it is unfortunately over" https://x.com/scaling01/status/2021562929728885166
"Tensor Parallelism is killing your DeepSeek-V3 throughput. Period. MLA models only have ONE KV head. If you’re using vanilla TP8, you’re just wasting 7/8 of your VRAM on redundant cache. We just shipped the solution in @sgl_project: 1. DPA (DP Attention): Zero KV redundancy." https://x.com/GenAI_is_real/status/2021512872027656344
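The 7/8 figure is just replication arithmetic: MLA caches one shared latent per token, which vanilla tensor parallelism cannot shard across heads, so every GPU in a TP8 group holds a full copy. A back-of-the-envelope sketch, using the MLA cache dimensions from the DeepSeek-V3 paper (512-dim latent plus a 64-dim decoupled RoPE key per token per layer, 61 layers); the helper is illustrative arithmetic, not SGLang code:

```python
# Back-of-the-envelope KV-cache math for MLA under TP8 vs. DP attention.
# Dimensions follow the DeepSeek-V3 paper: MLA caches one shared latent
# per token per layer (kv_lora_rank=512) plus a decoupled RoPE key (64).

KV_LORA_RANK = 512   # compressed KV latent dim
ROPE_HEAD_DIM = 64   # decoupled RoPE key dim
NUM_LAYERS = 61      # DeepSeek-V3 transformer layers
BYTES = 2            # fp16/bf16 cache

def mla_cache_bytes(num_tokens: int) -> int:
    """Total MLA cache for one request, across all layers."""
    return num_tokens * NUM_LAYERS * (KV_LORA_RANK + ROPE_HEAD_DIM) * BYTES

per_request = mla_cache_bytes(128_000)  # one long-context request

# Vanilla TP8: the single "KV head" (the latent) can't be split 8 ways,
# so every GPU stores a full replica -> 8x the memory, 7/8 of it redundant.
tp8_total = 8 * per_request
# DP attention: each request's cache lives on exactly one GPU.
dpa_total = per_request

print(f"per-request MLA cache : {per_request / 1e9:.1f} GB")   # ~9.0 GB
print(f"TP8 replicas (8 GPUs) : {tp8_total / 1e9:.1f} GB")     # ~72 GB
print(f"DP attention          : {dpa_total / 1e9:.1f} GB "
      f"({1 - dpa_total / tp8_total:.0%} saved)")              # 88% saved
```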
"Within the last few minutes, DeepSeek has been updated. Knowledge cutoff May 2025, context length 1 million tokens. This is likely V4, though it doesn’t admit to being one." https://x.com/teortaxesTex/status/2021511733333131311
"Short post about Engram, a recent paper by DeepSeek: It is essentially very similar to SCONE (link below), where authors train embeddings for a large number of n-grams (e.g. 1B common n-grams like 'Alexander the Great'). [1/2]" https://x.com/gabriberton/status/2020612533502222459
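For context, the SCONE-style idea is to give frequent n-grams their own learned embeddings, looked up at input time and combined with the ordinary token embedding. A minimal PyTorch sketch of that idea follows; the bigram hashing, table size, and combine-by-sum are illustrative simplifications, not the actual Engram or SCONE design (SCONE curates an explicit set of common n-grams rather than hashing everything):

```python
import torch
import torch.nn as nn

class NgramAugmentedEmbedding(nn.Module):
    """Token embedding + embeddings for n-grams (SCONE-style sketch).

    Real systems map a curated set of ~1B common n-grams to rows of a
    huge table; here we simply hash every bigram into a fixed-size table.
    """
    def __init__(self, vocab_size: int, dim: int, ngram_table_size: int = 1 << 20):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.ngram = nn.Embedding(ngram_table_size, dim)
        self.table_size = ngram_table_size

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq) token ids
        x = self.tok(ids)
        # Hash each (prev, cur) bigram to a row of the n-gram table.
        prev = torch.roll(ids, shifts=1, dims=1)
        prev[:, 0] = 0  # no left context for the first position
        bigram_ids = (prev * 1_000_003 + ids) % self.table_size
        return x + self.ngram(bigram_ids)

emb = NgramAugmentedEmbedding(vocab_size=50_000, dim=64)
out = emb(torch.randint(0, 50_000, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 64])
```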
"Average Throughput of GLM-5 on OpenRouter is 14 tps" https://x.com/scaling01/status/2021981416452764058
"Build more. Spend less. GLM-5 is now on YouWare. Landing pages, portfolios, prototypes. All handled fast, with a 200K context window. Save your premium credits for the big builds." https://x.com/YouWareAI/status/2021982784948936874
"Congrats @Zai_org on GLM-5! Love the permissive MIT license (vs K2.5’s modified MIT). Haven’t chatted with it yet so no vibes, but from the numbers I’m not compelled to switch from @Kimi_Moonshot K2.5: • Similar evals, but GLM-5’s are at bf16 while K2.5’s are at int4 – GLM-5…" https://x.com/QuixiAI/status/2021651135615184988
"Day-0 with @Zai_org: GLM-5 is live on DeepInfra 🔥 Built for long-horizon agents that plan, orchestrate, and self-correct. Serving ~100 TPS at launch and as usual the best price on the market!" https://x.com/DeepInfra/status/2021666854088110318
"GLM 5 is 2x the total parameters of GLM 4.5 + DeepSeek sparse attention for efficient long context. This is going to be a crazy model" https://x.com/eliebakouch/status/2020824645868630065
"'GLM MoE DSA' is landing in transformers 👀" https://x.com/xeophon/status/2020815776890909052
"GLM-4.7-Flash-GGUF is now the most downloaded model on @UnslothAI." https://x.com/Zai_org/status/2021207517557051627
"GLM-5 already available on OpenRouter (with even lower prices)" https://x.com/scaling01/status/2021637257103651040
"GLM-5 has a 200k context length and maximum output of 128k" https://x.com/scaling01/status/2021628691357298928
"GLM-5 is massive. 745B params. LETS FUCKING GOOOOO This should be fun!" https://x.com/scaling01/status/2020840989947298156
"GLM-5 Pricing: $1 input and $3.2 output. There is also a GLM-5 Code variant that is more expensive 👀 Almost 8 times cheaper than Opus" https://x.com/scaling01/status/2021628971939418522
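The "almost 8 times cheaper" line works out if it refers to output pricing against Claude Opus 4.5 list rates ($5 input / $25 output per million tokens); treat those Opus numbers as my assumption, since the tweet doesn't state them:

```python
# Price comparison behind the "almost 8x cheaper than Opus" claim.
# GLM-5 figures are from the tweet; the Opus figures assume Claude
# Opus 4.5 list pricing ($5 in / $25 out per 1M tokens).

glm5_in, glm5_out = 1.00, 3.20    # USD per 1M tokens
opus_in, opus_out = 5.00, 25.00   # USD per 1M tokens (assumed)

print(f"output: {opus_out / glm5_out:.1f}x cheaper")  # ~7.8x
print(f"input : {opus_in / glm5_in:.1f}x cheaper")    # 5.0x
```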
"GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It’s quite good in my initial testing and pretty fast as well. It generated a highly functional space invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to @ActuallyIsaak and @kernelpool for the port." https://x.com/awnihannun/status/2022007608811696158
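The 419GB figure is almost exactly what ~745B weights cost at 4-bit group quantization, which spends roughly 4.5 bits per weight once per-group scales and biases are counted (the 4.5 bits/weight is my assumption about the quant config, not something the tweet states):

```python
# Why a 745B model lands near 419 GB at Q4 on a 512 GB M3 Ultra.
params = 745e9
bits_per_weight = 4.5  # assumed: 4-bit weights + per-group scale/bias overhead
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"quantized weights: ~{weights_gb:.0f} GB")  # ~419 GB
# leaves ~90 GB of unified memory for KV cache, activations, and the OS
```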
"https://t.co/ctlyPtiB3j GLM-5 architecture is out: ~740B parameters, ~50B active, 78 layers, MLA attention lifted from DeepSeek V3, plus DeepSeek V3.2’s sparse attention indexer for 200k context. Basically DeepSeek V3 scale with DSA bolted on." https://x.com/QuixiAI/status/2021111352895393960
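Taking these numbers at face value, they square with the earlier "2x the total parameters of GLM 4.5" claim (GLM-4.5 is 355B total / 32B active) and imply that only about 7% of the weights fire per token:

```python
# Sanity-checking the reported GLM-5 architecture numbers.
total, active = 740e9, 50e9
print(f"active fraction : {active / total:.1%}")   # ~6.8% of params per token
print(f"vs GLM-4.5 total: {total / 355e9:.1f}x")   # ~2.1x (355B -> ~740B)
```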




