Ethan B. Holland

Over 55,600 manually organized AI links and counting

Chips and Hardware: AI News Week Ending 09/19/2025

September 19, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A brass balance scale on weathered stone in a Lincoln’s Inn chamber, one plate holding a reflective silicon wafer with visible circuit patterns, the other a leather law book, warm window light creating both judicial gravitas and technological precision, painted in the style of a Dutch master with regal blues and golds.

SoftBank to buy $2 billion in Intel shares at $23 each — firm still owns majority share of Arm | Tom’s Hardware https://www.tomshardware.com/tech-industry/semiconductors/softbank-to-buy-usd2-billion-in-intel-shares-at-usd23-each-firm-still-owns-majority-share-of-arm

Our new Waltham Cross data center is part of our two-year, £5 billion investment to help power the UK’s AI economy. https://blog.google/around-the-globe/google-europe/united-kingdom/waltham-cross-data-centre/

Nvidia and Intel announce jointly developed ‘Intel x86 RTX SOCs’ for PCs with Nvidia graphics, also custom Nvidia data center x86 processors — Nvidia buys $5 billion in Intel stock in seismic deal | Tom’s Hardware https://www.tomshardware.com/pc-components/cpus/nvidia-and-intel-announce-jointly-developed-intel-x86-rtx-socs-for-pcs-with-nvidia-graphics-also-custom-nvidia-data-center-x86-processors-nvidia-buys-usd5-billion-in-intel-stock-in-seismic-deal

NVIDIA and Intel to Develop AI Infrastructure and Personal Computing Products | NVIDIA Newsroom https://nvidianews.nvidia.com/news/nvidia-and-intel-to-develop-ai-infrastructure-and-personal-computing-products?ncid=so-twit-672238

Teams at Nvidia and Intel have been working in secret on jointly developed processors for a year — ‘The Trump administration has no involvement in this partnership at all’ | Tom’s Hardware https://www.tomshardware.com/pc-components/cpus/teams-at-nvidia-and-intel-have-been-working-in-secret-on-jointly-developed-processors-for-a-year-the-trump-administration-has-no-involvement-in-this-partnership-at-all

Jensen Huang ‘disappointed’ by reported China Nvidia chip ban https://www.bbc.com/news/articles/cqxz29pe1v0o

The new open-source Qwen3-Next Instruct and Thinking models put state-of-the-art long-context reasoning into the hands of everyone. We collaborated with #opensource frameworks from SGLang (@lmsysorg) and @vllm_project to enable communities to deploy Qwen3-Next across the https://x.com/NVIDIAAIDev/status/1967575419638468667

“How a rock learns to think.” These STEM reels are amazing! Is this the antidote to short form brainrot? https://x.com/bilawalsidhu/status/1966073881103606133

reassuring to know that even when your valuation is $183 billion you still deal with the same type of inference bug as the rest of us. there is no moat https://x.com/vikhyatk/status/1968432341937963257

Really cool to see that Anthropic also uses JAX for inference on Google TPU. I’m curious whether they also use JAX for inference on GPUs (Azure/AWS) or if they developed a separate codebase for it. https://x.com/borisdayma/status/1968697704361468354

@zephyr_z9 😂 Although AMD is now working pretty well for small to medium sized models https://x.com/elonmusk/status/1966412913662669082

If you’ve spent your life of CUDA, come check out ROCm, the open-source alternative by AMD with a big new version upgrade today 🚀 Launch page: https://x.com/realSharonZhou/status/1967995011816997219

Is your training program or your team making full use of your expensive GPUs? 🧐 💻SkyPilot now has a native support for GPU metrics! Check your GPU utilization in the SkyPilot dashboard — a single view for managing resources/jobs on any AI infra. https://x.com/skypilot_org/status/1966592871600890285

this guy designs kernels with spreadsheets to hit shared memory banks evenly, open sourced assembly kernels that basically taught nvidia how to do gpu matmuls and register allocation properly, and authored most prod chatgpt kernels. genuinely incredible https://x.com/itsclivetime/status/1968140448062746651

Today we’re launching Reserved Instances – Request 8–1,000+ GPU clusters – Get quotes from up to 50+ providers in 24h – Re-sell idle GPUs back to our spot market – Support from our research team https://x.com/PrimeIntellect/status/1967724735430791342

Triton is nice if you want to get something onto a GPU but don’t need full performance/TCO. However, if you want peak perf or other HW, then Mojo🔥 could be a better fit. I’m glad OpenAI folk are acknowledging this publicly, but I wrote about it here: https://x.com/clattner_llvm/status/1968174450979070346

v0.10.2 marks the first release with official aarch64 support for vLLM! You can now install vLLM directly onto @nvidia’s GB200. Along with the PyPI release, our docker image is also multi-platform so pulling the right image just works. More perf enhancements on the way! https://x.com/vllm_project/status/1967752683458269282

I was lucky to work in both China and the US LLM labs, and I’ve been thinking this for a while. The current values of pretraining are indeed different: US labs be like: – lots of GPUs and much larger flops run – Treating stabilities more seriously, and could not tolerate spikes https://x.com/JingyuanLiu123/status/1966887747622453560

This is a timestamped transcripts at vLLM speed on modern GPUs: 1. Offload QK tensor to CPU during the forward pass (QK cache) 2. Recompute attention weights post‑generation https://x.com/_yuqiwang/status/1967996028604551534

including one more new feature, you can configure client idle detection with a single parameter. If this threshold is hit, the input_audio_buffer.timeout_triggered event will be fired. https://x.com/juberti/status/1968105091002667356

Serving a model at scale is hard. Serving it across three hardware platforms (AWS Trainium, NVIDIA GPUs, Google TPUs) while maintaining strict equivalence is a whole other level. Makes you wonder if the hardware flexibility is truly worth the hit to development speed and https://x.com/_philschmid/status/1968586407548518565

Towards a Physics Foundation Model Proposes GPhyT (General Physics Transformer), a large transformer trained on 1.8 TB of simulation data across fluid flows, shock waves, heat transfer, and multiphase dynamics. Here are a few key notes: https://x.com/omarsar0/status/1968681177189077366

Excited to share what friends and I have been working on at @Standard_Kernel We’ve raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring: – Matmul 102%-105% perf https://x.com/anneouyang/status/1967610221712519612

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision “This paper asks a simple question: Can inference compute substitute for missing supervision?” “the current policy produces a group of rollouts; a frozen anchor (the initial policy) reconciles https://x.com/iScienceLuvr/status/1968599654507102491

@huggingface Cerebras Inference powers the world’s top coding models, making code generation pretty much instant. And anyone can get a free API key from https://x.com/code/status/1966638514100924846

Scaling AI requires leaps in hardware. NVIDIA Blackwell is the latest step forward. On Oct 1, we’re bringing together Dylan Patel (@SemiAnalysis_), Ian Buck (@nvidia) and Charles Zedlewski (Together AI) to unpack its architecture, optimizations and impact on AI infrastructure. https://x.com/togethercompute/status/1968367704621863154

Stanford Seminar – Nvidia’s H100 GPU Deep dive into H100’s architecture, covering Hopper streaming multiprocessors, Transformer Engine, NVLink interconnects, and HPC/AI workloads optimization strategies. https://x.com/vivekgalatage/status/1968117707812774259

“People who are serious about robot learning should build their own hardware,” says NVIDIA’s embodied AI research co-lead. This is likely a general statement, not a hint at NVIDIA’s plans, but it would be awesome if NVIDIA designed and made its own robot hardware. https://x.com/TheHumanoidHub/status/1966216768290222552

We felt bad about the slowdowns as we were adding GPUs, so we reset everyone’s limits to make up for it: https://x.com/sama/status/1968316161113882665

🔋 Naveen Rao is leaving Databricks (a $100Bn startup) to build a next generation computer to shrink AI compute costs, and Databricks plans to invest. Databricks sits around $100B and just raised $1B. A core problem for the AI industry is that large models are limited by memory https://x.com/rohanpaul_ai/status/1966378718009635087

SAPO – Swarm sAmpling Policy Optimization – is a new RL training method by @gensynai. It works in a decentralized “swarm” of computers instead of synchronized GPU clusters: – Each computer (node) trains its own model – Nodes share rollouts with others in plain text – Any device https://x.com/TheTuringPost/status/1967575689844166834