Chips and Hardware: AI News Week Ending 04/11/2025

Image created with Flux Pro Ultra. Image prompt: A Minecraft screenshot displaying intricate redstone circuits and processors built from glowing blocks with paths of redstone dust connecting components, with “CHIPS” written in pixelated Minecraft font across the top

“Should I buy some $NVDA tomorrow @AskPerplexity” / X https://x.com/AravSrinivas/status/1909486897334042760

“TPUv7 will use ~25% more power than TPUv6, but has ~2.5x the FLOPS in FP8” / X https://x.com/scaling01/status/1909953954827432278
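
A quick back-of-the-envelope reading of the two ratios quoted above: if power rises by about 25% while FP8 FLOPS rise by about 2.5x, the implied performance per watt roughly doubles. The sketch below only restates that arithmetic; the variable and function names are illustrative, and the inputs are the approximate figures from the post, not measured values.

```python
# Relative figures quoted in the post above (TPUv7 vs. TPUv6); both are approximate.
power_ratio = 1.25  # "~25% more power"
flops_ratio = 2.5   # "~2.5x the FLOPS in FP8"


def perf_per_watt_gain(flops_ratio: float, power_ratio: float) -> float:
    """Return the relative change in FLOPS per watt implied by the two ratios."""
    return flops_ratio / power_ratio


gain = perf_per_watt_gain(flops_ratio, power_ratio)
print(f"Implied FP8 perf/W improvement: ~{gain:.1f}x")
```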

“UALink 1.0 spec is out to compete with NVLink https://x.com/StasBekman/status/1910381014213681537

“spotted @vllm_project at @googlecloud next keynote today! https://x.com/vllm_project/status/1910191668437156154

“lunch in downtown seattle costs 16-20 H100-hours my caloric consumption has dropped by 10x since i started converting $ to H100-hours” / X https://x.com/vikhyatk/status/1909752681742422383

“@AgentOpsAI 1/ Build Reasoning Models to Achieve Advanced Agentic AI Autonomy Hosted by: Joey Conway, Sr. Director, AI Software, NVIDIA Oleksii Kuchaiev, Director of Applied Research, NVIDIA This session explains how reasoning models like DeepSeek-R1 are built using various techniques such” / X https://x.com/AtomSilverman/status/1907897984135868671

“@AgentOpsAI 2/ How to Build an Agentic AI System Using the Best Tools and Frameworks Hosted by: Bartley Richardson, Senior Director of Engineering, NVIDIA Kris Murphy, Technical Product Manager, NVIDIA Join this session to learn about tools and frameworks to more easily build agentic AI” / X https://x.com/AtomSilverman/status/1907898057540395070

“NVIDIA GTC cost over $2000 to attend. I just found out that NVIDIA gave away all their AI Agent sessions for free. Here are the top free agent sessions that they hosted 🧵 (save for later & send to your entering team) https://x.com/AtomSilverman/status/1907897515531448449

“@AgentOpsAI 11/ How to Build Multimodal Agentic AI Retrieval Systems Hosted by: Annie Surla, Developer Advocate Engineer, NVIDIA Tanay Varshney, Developer Advocate Engineer for Deep Learning SW, NVIDIA Join NVIDIA technical product architects for an in-depth tutorial demonstrating how to” / X https://x.com/AtomSilverman/status/1907898434650309069

“@AgentOpsAI 7/ How to Onboard Your Team of AI Agents and Transform Your Enterprise Hosted by: Adel El Hallak, Senior Director of Product Management, NVIDIA Agents combine reasoning, dynamic data retrieval and can access tools to drive outcomes. Teams of AI agents can work together,” / X https://x.com/AtomSilverman/status/1907898293872730597

“@AgentOpsAI 8/ Building Scalable Data Flywheels for Continuously Improving AI Agents Hosted by: Kostikey Mustakas, Director of Data Science, AT&T Julia Gomes, Technical Product Manager, Arize AI Vivienne Zhang, Senior Product Manager, NVIDIA NVIDIA NeMo offers a complete solution for” / X https://x.com/AtomSilverman/status/1907898331302609037

“@AgentOpsAI 3/ Building Future-Ready AI With Agents and Data Flywheels: Insights From NVIDIA’s Enterprise Deployments Hosted by: Aaditya Shukla, Sr. Staff Engineer, NVIDIA Santiago Pombo, Generative AI Product Manager, NVIDIA Rama Akkiraju, VP, AI/ML for IT, NVIDIA Insights, best” / X https://x.com/AtomSilverman/status/1907898137177702844

“@AgentOpsAI 14/ Streamlining Investment Insights for Wealth Management with Generative AI Hosted by: Orest Xherija, Data Science Manager, Director, UBS Lavinia Ghita, Solutions Architect, NVIDIA The collaboration between UBS and NVIDIA focuses on real-time risk assessment and monitoring of” / X https://x.com/AtomSilverman/status/1907898526811697195

“@AgentOpsAI 13/ Create Multilingual 2D Digital Humans for Enterprise Hosted by: Rochelle Pereira, Sr. Director of Engineering, NVIDIA Ragav Venkatesan, Principal Software Engineer, NVIDIA Learn about NVIDIA NIM™ microservices for secure, high-performance AI deployment across various” / X https://x.com/AtomSilverman/status/1907898487154626722

US utilities grapple with Big Tech’s massive power demands for data centers | Reuters https://www.reuters.com/business/energy/us-utilities-grapple-with-big-techs-massive-power-demands-data-centers-2025-04-07/

“@AgentOpsAI 15/ Transform an Enterprise Data Platform With Generative AI and RAG Hosted by: Nave Algarici, Generative AI Product Manager, NVIDIA Sean Sodha, Sr. Software Product Manager, NVIDIA Trillions of PDF files are generated every year, each file likely consisting of multiple pages” / X https://x.com/AtomSilverman/status/1907898568465330673

“@AgentOpsAI 6/ Best Practices to Implement Your AI Strategies in the Enterprise Hosted by: Andrew McMullan, Chief Data and Analytics Officer, Commonwealth Bank of Australia Anne Hecht, Sr. Director of Product Marketing, Enterprise Products, NVIDIA Aaron Chaisson, VP, Product Marketing,” / X https://x.com/AtomSilverman/status/1907898258141438083

“📊 Deep Cogito just dropped models built in 75 days using Iterated Distillation & Amplification (IDA), a new approach that uses extra compute to think better, then bakes those improvements into itself. Rinse & repeat for continuous self-improvement. Could reshape AI development https://x.com/fdaudens/status/1909760681194254397

“Tencent’s new model ‘Hunyuan-T1’ looks incredible —This is world’s first Mamba-powered ultra-Large Model. 👏 Flexes ultra-large reasoning, Hybrid-Mamba-Transformer MoE, tight logic, & 60–80 tokens/sec speed. @TencentHunyuan 👏 🧵 1/n 📌 Key Highlights → Ultra-large-scale https://x.com/rohanpaul_ai/status/1908861892753322142

“MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism “We present MegaScale-Infer, an efficient and cost effective system for serving large-scale MoE models.” “MegaScale-Infer achieves up to 1.90x higher per-GPU throughput than https://x.com/iScienceLuvr/status/1908091264714850707

Ironwood: The first Google TPU for the age of inference https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

“Google’s TPUv7 is out! ML accelerator marketing material is usually pretty inscrutable (what numbers are even comparable?), so here I’ll explain concretely how this compares with Nvidia. 🧵 https://x.com/itsclivetime/status/1910026066129014868

“Elon Musk’s xAI acquired X The deal creates a joint entity with a $113B valuation, combining both xAI and X’s data, models, compute, distribution and talent Will be interesting to see how xAI’s Grok evolves post this move! https://x.com/adcock_brett/status/1908913463407034527

“Llama-4 Series on BigCodeBench-Hard *Inference via NVIDIA NIM Llama-4 Maverick Ranked 41st/192 Similar to Gemini-2.0-Flash-Thinking & GPT-4o-2024-05-13 29.1% Complete 25% Instruct Llama-4-Scout Ranked 97th/192 16.9% Complete 16.9% Instruct Also, new visuals on the leaderboard! https://x.com/terryyuezhuo/status/1909247540379148439

nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 · Hugging Face https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#evaluation-results

“The blog post is quite hyperbolic and dishonest – making a comparison to El Capitan FP64 performance. The fair comparison there is against El Capitan FP8 peak perf, which is 43808 MI300A * 1961 TFLOP/s = 86 exaflops. This means that a TPUv7 9216-chip pod has about half the flops” / X https://x.com/itsclivetime/status/1910026082193138056
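
The arithmetic in the post above can be reproduced directly. This is a minimal sketch that uses only numbers quoted in this roundup: the MI300A count and per-chip FP8 figure from the post itself, and the 42.5 exaflops per 9,216-chip pod figure from the Ironwood announcement posts. The constant names are illustrative.

```python
# Figures as quoted in the posts in this roundup (not independently verified).
MI300A_COUNT = 43_808        # MI300A accelerators in El Capitan, per the post
MI300A_FP8_TFLOPS = 1_961    # peak FP8 TFLOP/s per MI300A, per the post
TPU_POD_FP8_EXAFLOPS = 42.5  # 9,216-chip TPUv7 pod, per the announcement posts

TFLOPS_PER_EXAFLOP = 1_000_000  # 1 exaflop = 10^6 teraflops

# El Capitan peak FP8 throughput in exaflops.
el_capitan_fp8_exaflops = MI300A_COUNT * MI300A_FP8_TFLOPS / TFLOPS_PER_EXAFLOP

# Fraction of El Capitan's FP8 peak that one TPUv7 pod delivers.
pod_vs_el_capitan = TPU_POD_FP8_EXAFLOPS / el_capitan_fp8_exaflops

print(f"El Capitan FP8 peak: {el_capitan_fp8_exaflops:.1f} exaflops")
print(f"TPUv7 pod / El Capitan (FP8): {pod_vs_el_capitan:.2f}")
```

On these inputs the pod comes out at roughly half of El Capitan's FP8 peak, which matches the post's conclusion.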

“AI needs better, faster, more efficient compute! Excited about announcement of Ironwood the first TPU build for inference and thinking models! Ironwood is the 7th-gen TPU and will come in two sizes a 256 chip configuration and a 9,216 chip configuration. 🚀🤯 TL;DR: 🧠 First TPU https://x.com/_philschmid/status/1909979316344979900

“Meet Ironwood, our most powerful, capable and energy-efficient TPU yet. #GoogleCloudNext https://x.com/Google/status/1910775101219389469

“Google just announced Ironwood, their 7th-gen TPU competitor to Nvidia’s Blackwell B200 GPUs – 4,614 TFLOPs per chip (FP8) – 192 GB HBM, 7.2 Tbps HBM bandwidth – 1.2 Tbps bidirectional ICI – 42.5 exaflops per 9,216-chip pod (24x El Capitan) https://x.com/scaling01/status/1909949372965564896
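
As a sanity check, the pod-level number follows from the per-chip figures in the post above. The sketch below multiplies them out; it assumes decimal units (1 exaflop = 10^6 TFLOPs, 1 PB = 10^6 GB), and the constant names are illustrative.

```python
# Per-chip and pod figures as quoted in the post above.
CHIPS_PER_POD = 9_216
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GB_PER_CHIP = 192

# Aggregate FP8 compute for a full pod, in exaflops (decimal units assumed).
pod_fp8_exaflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1_000_000

# Aggregate HBM capacity for a full pod, in petabytes (decimal units assumed).
pod_hbm_petabytes = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1_000_000

print(f"Pod FP8 compute: {pod_fp8_exaflops:.1f} exaflops")
print(f"Pod HBM capacity: {pod_hbm_petabytes:.2f} PB")
```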

“ICYMI during NVIDIA GTC we announced Together Instant GPU Clusters ⚡ Up to 64 interconnected NVIDIA GPUs, available in minutes, entirely self-service, perfect for training models of up to ~7B parameters 🚂 , or running models like DeepSeek-R1 🐋 Now available in Preview,” / X https://x.com/togethercompute/status/1909757415907865059

“Nvidia released Nemotron-Ultra, a 253B parameter reasoning AI —Surpasses DeepSeek R1, Llama 4 Behemoth, and Maverick across benchmarks —Includes a reasoning on/off toggle —Open-source with model code, weights, and post-training data on Hugging Face https://x.com/rowancheung/status/1909845094913417513

“Nvidia just dropped Llama 3.1 Nemotron Ultra 253B on Hugging Face https://x.com/_akhaliq/status/1909614682840744417

“Github 👨‍🔧: A list of free LLM inference resources accessible via API. → Aggregates legitimate services providing free API access to numerous LLMs. → Lists providers such as OpenRouter, Google AI Studio, Mistral, Groq, Cloudflare, GitHub Models. → Details available models https://x.com/rohanpaul_ai/status/1909085975814471795

“In fact they’re very similar in other ways – the package is almost identical. 8 stacks of HBM3e = 192 GB @ ~ 8 TB/s, surrounding two large compute dies. TPU appears to have moved I/O to a thin die at the top, which probably reduces the overall package cost a bit. https://x.com/itsclivetime/status/1910026075415208045

“TransMamba fuses Transformer precision with Mamba speed in one model ▪️It uses shared parameters to switch between attention and SSM mechanisms at different token lengths and layers during training/inference. So TransMamba can “decide”: • When to use Transformer (for shorter https://x.com/TheTuringPost/status/1910406228708385135

“@AgentOpsAI 17/ LLM Pruning and Distillation in Practice: The Minitron Approach Hosted by: Saurav Muralidharan, Sr. Research Scientist, NVIDIA Sharath Turuvekere Sreenivas, Sr. Deep Learning Algorithms Engineer, NVIDIA Deep into the Minitron approach for producing compact language models” / X https://x.com/AtomSilverman/status/1907898702276272387

“Google TPUv7: – 4.6 PFLOP/s FP8 – 192 GB HBM @ 7.4 TB/s – 600 GB/s (unidi) ICI – ~1000 watts Nvidia GB200: – 5 PFLOP/s FP8 / 10 PFLOP/s FP4 – 192 GB HBM @ 8 TB/s – 900 GB/s (unidi) NVLink – ~1200 watts https://x.com/itsclivetime/status/1910026068746289286
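
To put the two spec lists above side by side, one option is to derive FP8 throughput per watt and memory bandwidth per unit of compute. This is a rough sketch using only the approximate figures quoted in the post; the `Accelerator` class and its field names are hypothetical, and the power numbers are the post's estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Per-chip figures as quoted in the post above (all approximate)."""
    name: str
    fp8_pflops: float  # peak FP8 PFLOP/s
    hbm_tb_s: float    # HBM bandwidth in TB/s
    watts: float       # approximate power draw

    def tflops_per_watt(self) -> float:
        # 1 PFLOP/s = 1,000 TFLOP/s
        return self.fp8_pflops * 1_000 / self.watts

    def bytes_per_kiloflop(self) -> float:
        # (TB/s) / (PFLOP/s) is bytes per 1,000 FLOPs, since 10^12 / 10^15 = 10^-3.
        return self.hbm_tb_s / self.fp8_pflops


tpu_v7 = Accelerator("TPUv7", fp8_pflops=4.6, hbm_tb_s=7.4, watts=1_000)
gb200 = Accelerator("GB200", fp8_pflops=5.0, hbm_tb_s=8.0, watts=1_200)

for chip in (tpu_v7, gb200):
    print(
        f"{chip.name}: {chip.tflops_per_watt():.2f} FP8 TFLOP/s per W, "
        f"{chip.bytes_per_kiloflop():.2f} HBM bytes per 1,000 FP8 FLOPs"
    )
```

On these numbers the two chips land within about 10% of each other on FP8 perf/W and have nearly the same ratio of memory bandwidth to compute.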

Trump administration backs off Nvidia’s ‘H20’ chip crackdown : NPR https://www.npr.org/2025/04/09/nx-s1-5356480/nvidia-china-ai-h20-chips-trump

“NVIDIA Blackwell can achieve 303 output tokens/s for DeepSeek R1 in FP4 precision, per our benchmarking of an Avian API endpoint Artificial Analysis benchmarked DeepSeek R1 on an @avian_io private API endpoint. Running DeepSeek R1 in FP4 precision on NVIDIA Blackwell, their https://x.com/ArtificialAnlys/status/1909633232821534935

“One interesting tidbit is that this is likely what was supposed to be TPU v6p – a training chip. But (perhaps after reasoning models went big) it got renamed to TPU v7 and called “the first Google TPU for the age of inference” – quite the pivot :)” / X https://x.com/itsclivetime/status/1910026084575551892

“And I just said that Type-C is the last port we need 192Gbps/480W is pretty good though, you can build compute clusters with these and low power chips” / X https://x.com/teortaxesTex/status/1909433438353961267

A New Era for Generalist Robotics: The Rise of Humanoids | NVIDIA GTC 2025 – YouTube https://www.youtube.com/watch?v=BmD22FNOAY4

“At the system level, TPU wins hard – ICI scales to 9,216 chips. However, the 3D torus topology limits programmability. Compare GB200: only 72 chips, but on a switched network. It’s a much more flexible topology, but the switches consume power, and you have to lean on the” / X https://x.com/itsclivetime/status/1910026078405750792

“Anthropic Chief Scientist Jared Kaplan says Claude 4 will arrive “in the next six months or so.” AI cycles are compressing — “faster than the hardware cycle” — even as new chips arrive. Post-training and RL are accelerating progress. No signs of slowing. https://x.com/vitrupo/status/1908763535351669017

“Google’s new TPUv7 rumored to have 2000x the performance of the latest iPhone” / X https://x.com/scaling01/status/1909958867066175802

“(disclaimer: everything here is just synthesis of public knowledge, and is my own opinion.) Roughly speaking, TPUv7 is about the same or slightly worse spec than GB200. It runs at slightly lower power, so you could probably consider them roughly even on perf/W. Jax / XLA will” / X https://x.com/itsclivetime/status/1910026071434748237