Image created with OpenAI GPT-Image-1. Image prompt: rich crimson, bright ivory, deep navy Independence-Day palette, vibrant, celebratory, wholesome, authentic, photorealistic apple pie cooling on windowsill with steam curling up scene featuring the neon word “DEEPSEEK” reflected in a nearby window; natural lighting, subtle film grain, high detail

Tencent released Hunyuan-A13B, a new open-source hybrid reasoning model. It nears or matches models like o1 and DeepSeek R1 on major benchmarks while remaining efficient enough to run on a single GPU. It also includes “fast and slow” modes to adjust efficiency levels. https://x.com/rowancheung/status/1939601169271197973

Meet Jan-nano, a 4B model that outscores DeepSeek-v3-671B using MCP. It’s built on Qwen3-4B with DAPO fine-tuning, and it handles: – real-time web search – deep research. Model + GGUF: https://x.com/menloresearch/status/1934809407604576559

A bit later, but Huawei did open-source their 72B MoE. Now they’re in the proud club of those who “did Scout better than Meta.” The DeepSeek ethos spreads… Also, GitCode claims 15T tokens vs. 13T in the tech report. Another detail: MoGE is their original load-balancing solution. Interesting. https://x.com/teortaxesTex/status/1940341153754382688

The METR results for DeepSeek V3 and R1 kinda suck, huh? https://x.com/scaling01/status/1939770925781487779

It’s fun that DeepSeek is being served at lower latencies and the same cost by a number of companies. A few firms have implemented DeepSeek’s high-rank / wide-EP inference setup with higher efficiency too. As such, traffic has left DeepSeek’s direct API for other providers. https://x.com/dylan522p/status/1940872241753039319

RT @ivanfioravanti: DeepSeek-R1-0528-5bit on MLX pushing M3 Ultra 512GB to its limits! 501GB of used memory visible on mactop in the video! Con… https://x.com/awnihannun/status/1940067135054913892

RT @tngtech: Today we release DeepSeek-TNG R1T2 Chimera. This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, nam… https://x.com/swyx/status/1940660469733511388

Policy throttles silicon; DeepSeek’s R2 waits in the queue. Washington’s export clamp has dried up fresh H20 supply, so those chips cannot be freed for training the larger R2. Engineers completed an initial R2 pass, yet CEO Liang Wenfeng says reasoning and coding still lag. Teams… https://x.com/rohanpaul_ai/status/1939242685828927512

DAMN! DeepSeek R1T2 – 200% faster than R1-0528 & 20% faster than R1 🔥 Significantly better than R1 on GPQA & AIME 24. Made via Assembly of Experts with DS V3, R1 & R1-0528. MIT license – available on Hugging Face 🤗 https://x.com/reach_vb/status/1940536684061643239
