Ethan B. Holland

Over 51,900 manually organized AI links and counting

Chips and Hardware: AI News Week Ending 09/12/2025

September 12, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Chips, sunlit café table near the pool, metallic microchip keychain clipped to a beach bag zipper, subtle reflections, photorealistic, editorial, minimal, landscape, vacation, no text overlays

Exclusive | Oracle, OpenAI Sign $300 Billion Cloud Deal – WSJ https://www.wsj.com/business/openai-oracle-sign-300-billion-computing-deal-among-biggest-in-history-ff27c8fe?mod=hp_lead_pos1

Oracle Stock Skyrockets as Software Giant Scores Massive AI Deals – WSJ https://www.wsj.com/business/earnings/oracle-stock-orcl-ai-deals-047216cd

⚡️ Efficient weight updates for RL at trillion-parameter scale 💡 Best practice from Kimi @Kimi_Moonshot vLLM is proud to collaborate with checkpoint-engine: • Broadcast weight sync for 1T params in ~20s across 1000s of GPUs • Dynamic P2P updates for elastic clusters • https://x.com/vllm_project/status/1965824120920342916

Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅ https://x.com/Kimi_Moonshot/status/1965785427530629243

@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://x.com/basetenco/status/1966224960223158768

In 2025, getting the right cloud GPU at the right cost depends on capacity, contracts, and timing. Our new report maps providers, pricing models, hardware shifts, and strategies to help ML teams optimize availability and costs. https://x.com/dstackai/status/1965807328508399984

Underdiscussed topic: GPUs aren’t the bottleneck. For fast distributed GenAI post-training, network & storage matter as much. By tuning the network and storage on the cloud, @makneee found 10x speedup on @nebiusai cloud, even with the same GPUs and code. https://x.com/skypilot_org/status/1966208445339807816

Nebius stock soars on AI infrastructure deal with Microsoft https://www.cnbc.com/2025/09/08/nebius-stock-soars-on-ai-infrastructure-deal-with-microsoft-.html

Writing fast GPU kernels is important, though not nearly as important as writing correct ones. That’s why @marksaroufim and folks from Meta have released BackendBench. Now BackendBench also lives on @PrimeIntellect environment hub. 1/3 https://x.com/m_sirovatka/status/1965891832942047350

Heads up: if you’re looking to try CUDA-13 (most useful for Blackwell gpus), PyTorch nightly already has cu130 builds: https://x.com/StasBekman/status/1965826539540590791

This is a handy report on the state of cloud GPUs in 2025: costs, performance, playbooks by @dstackai https://x.com/StasBekman/status/1965817531043811339

🚀 Excited to announce QuTLASS v0.1.0 🎉 QuTLASS is a high-performance library for low-precision deep learning kernels, following NVIDIA CUTLASS. The new release brings 4-bit NVFP4 microscaling and fast transforms to NVIDIA Blackwell GPUs (including the B200!) [1/N] https://x.com/DAlistarh/status/1965157635617087885

Don’t forget, you can try it out at NVIDIA API Catalog https://x.com/Alibaba_Qwen/status/1966206151391064143

remembering the time i tried to buy GPUs from oracle the first sales guy I talked to didn’t know if the $10/hr on their website was per H100 or per 8xH100 he scheduled a follow up call, in which they brought in 2x more sales people than our company’s total headcount https://x.com/vikhyatk/status/1965943667237204069

Writing fast GPU kernels is important, though not nearly as important as writing correct ones. That’s why the folks from Meta have released BackendBench. https://x.com/johannes_hage/status/1965945249274151107