Ethan B. Holland

Over 54,900 manually organized AI links and counting

Chips and Hardware: AI News Week Ending 03/14/2025

March 14, 2025

“Meta is testing a new, in-house chip to cut costs on AI training. Manufactured by TSMC, the chip is part of the company’s MTIA series and is likely to be deployed in 2026. It will help Meta cut reliance on Nvidia’s pricey GPUs for training large models.” https://x.com/rowancheung/status/1899713275127828522

“iPhone manufacturer Foxconn announced FoxBrain, its first LLM with advanced reasoning —Developed in 4 weeks using Nvidia’s tech and support —Optimized for traditional Chinese —Performance near top models —Will be used in manufacturing and supply chain https://x.com/rowancheung/status/1899350947718926670

“AI compilers bring new tech into the AI performance world – and promise to relieve us from having to write CUDA kernels directly. Here we look at TVM and XLA to see what worked well, what didn’t, and why so much of GenAI is still all written in CUDA directly… 🧐” / X https://x.com/clattner_llvm/status/1899913688158798055

“I love GPU benchmarks that measure CPU overhead such as vLLM and KernelBench” / X https://x.com/dylan522p/status/1900379633662779781

“Is that true that thermal dissipation is the key issue for making bigger chips?” / X https://x.com/francoisfleuret/status/1899716309127983535

“Warp divergence is one of the most subtle performance bugs in GPU programming.” / X https://x.com/hyhieu226/status/1899854357354688736

“I’ll be here and talking about ML systems! There’ll be some of the best GPU folk I know here, so come and learn more together about Blackwell GPUs!” / X https://x.com/cHHillee/status/1899655656455692379

“Taiwan’s Foxconn launches its first large language model called ‘FoxBrain’” https://x.com/Reuters/status/1899130627091239366

“R1 in Q4 getting to 18t/s on a new M3 Ultra. That’s what, $9K? getting there.” / X https://x.com/teortaxesTex/status/1899480424834899993

“H100 GPU prices are insanely expensive. If you are an AI startup or looking for cheap H100 GPU access, check the Explorer Tier at Nebius. It’s $1.50 per one H100 GPU-hour (for the first 1,000 hours each month). That’s not a typo. It’s only $1.50 per hour! No upfront” / X https://x.com/svpino/status/1899871762135089314
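
As a rough budget check on the quoted offer, the sketch below multiplies GPU-hours by the $1.50 rate from the tweet. The function name and the choice to reject usage above 1,000 hours are assumptions for illustration; the tweet does not state the price beyond that cap.

```python
# Figures taken from the tweet above; the names are hypothetical.
EXPLORER_RATE_USD_PER_GPU_HOUR = 1.50
EXPLORER_MONTHLY_HOUR_CAP = 1_000


def explorer_tier_cost(gpu_hours: float) -> float:
    """Return the monthly cost in USD for usage within the quoted cap."""
    if gpu_hours < 0:
        raise ValueError("gpu_hours must be non-negative")
    if gpu_hours > EXPLORER_MONTHLY_HOUR_CAP:
        # The rate past the cap is not given in the source, so refuse to guess.
        raise ValueError("rate above the monthly cap is not stated in the tweet")
    return gpu_hours * EXPLORER_RATE_USD_PER_GPU_HOUR


if __name__ == "__main__":
    print(explorer_tier_cost(1_000))  # cost of the full monthly allowance
```

Using the whole 1,000-hour allowance comes to $1,500 for the month at the quoted rate.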

“AMD’s @AnushElangovan talking about making Radeon GPUs first class citizens on Windows at the ROCm User meetup. Multiple GPU architectures supported now. Big focus on CI and shipping constantly. Building ROCm on @FrameworkPuter laptop live. @realGeorgeHotz finally salvation” https://x.com/dylan522p/status/1900352609271300572

Lopsided AI Revenues by @ttunguz https://tomtunguz.com/ai-hardware-software/

Token-Efficient Long Video Understanding for Multimodal LLMs https://research.nvidia.com/labs/lpr/storm/

AMD YOLO | the singularity is nearer https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-YOLO.html

“So it looks like the First Scaling Law (the bigger the model the “smarter”) still holds: order of magnitude increases in compute lead to linear improvements in ability. GPT-3.5 Turbo scored 30% on GPQA, GPT-4 Turbo got 47%, now GPT-4.5 got 70%. And Reasoners add a new Scaling Law https://x.com/emollick/status/1897457930833731979
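
The scores quoted above can be checked with a few lines of arithmetic. This sketch assumes, following the tweet's framing rather than any published compute figures, that each model step corresponds to roughly one order of magnitude more compute; the variable and function names are illustrative.

```python
# GPQA scores as quoted in the tweet above (percent).
gpqa_scores = {
    "GPT-3.5 Turbo": 30,
    "GPT-4 Turbo": 47,
    "GPT-4.5": 70,
}


def gains_per_step(scores: dict[str, int]) -> list[int]:
    """Point gain between consecutive models, in the order listed."""
    values = list(scores.values())
    return [later - earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    print(gains_per_step(gpqa_scores))
```

The gains are 17 and 23 points, so each step adds a broadly similar increment, which is the pattern the tweet describes as linear improvement per order of magnitude.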

Intel jumps nearly 15% as investors cheer appointment of new CEO Tan | Reuters https://www.reuters.com/technology/intel-jumps-10-incoming-ceo-tan-brings-instant-credibility-turnaround-efforts-2025-03-13/

[2405.01303] Joint Sequential Fronthaul Quantization and Hardware Complexity Reduction in Uplink Cell-Free Massive MIMO Networks https://arxiv.org/abs/2405.01303

“We’re sharing our progress as we build the next generation of @Meta’s #AI infrastructure, including: ✅Our first custom silicon chip for running AI models ✅A new AI-optimized data center design ✅The second phase of our AI research supercomputer https://x.com/MetaNewsroom/status/1659227458300792832

“Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning “We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem, which provides a principled perspective on spending test-time compute. This perspective enables us to view the https://x.com/iScienceLuvr/status/1899392429893042485

“Diffusion language models that can arbitrarily reshuffle token positions are probably the most powerful way to scale test time compute for ~bounded sequence length. Not sure how close this gets to it.” / X https://x.com/teortaxesTex/status/1899251749690708412

“Looking at this, you can see some of answer to the “what _____ saw” memes about why insiders at the AI labs got so excited a year ago. Reasoning models (“test-time compute”) really do seem to represent a breakthrough in AI capability, at least in fields like math and coding.” / X https://x.com/emollick/status/1898174702478082279

“Tencent released Hunyuan-TurboS, a new ultra-large Transformer-Mamba MoE model —Surpasses GPT-4o, DeepSeek-V3, and open-source rivals on math and reasoning tasks — Competitive on Knowledge, including MMLU-Pro —Lower cost than Hunyuan Turbo https://x.com/rowancheung/status/1899350978853314854

“New work on optimizing test-time compute as a meta-reinforcement learning (RL) problem, which provides a new perspective on test-time compute: Paper + Code: https://x.com/rsalakhu/status/1899597917016744445

“🚀 Introducing Hunyuan-TurboS – the first ultra-large Hybrid-Transformer-Mamba MoE model! Traditional pure Transformer models struggle with long-text training and inference due to O(N²) complexity and KV-Cache issues. Hunyuan-TurboS combines: ✅ Mamba’s efficient long-sequence https://x.com/TXhunyuan/status/1899105803073958010
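
To make the O(N²) and KV-cache points concrete, here is a minimal sketch of how the two costs grow with sequence length in a standard Transformer. The layer count, head count, head size, and 2-byte element size are placeholder values chosen for illustration; they are not Hunyuan-TurboS parameters.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's full attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


def kv_cache_bytes(
    seq_len: int,
    num_layers: int = 32,      # hypothetical
    num_kv_heads: int = 8,     # hypothetical
    head_dim: int = 128,       # hypothetical
    bytes_per_elem: int = 2,   # e.g. 16-bit values
) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, attention_score_entries(n), kv_cache_bytes(n))
```

A sequence 10 times longer produces 100 times more score entries but only 10 times more cache, so the quadratic term drives compute while the cache grows linearly in memory.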

“I’ll be doing a fireside chat at GTC with Nvidia chief scientist Bill Dally Tuesday next week.” / X https://x.com/ylecun/status/1900298938764202154

Introducing Command A: Max performance, minimal compute https://cohere.com/blog/command-a

“If you’re into AI, you *have to* put NVIDIA’s GTC on your calendar. Stoked to be headed back to bay this year to catch up with my favorite NVIDIA execs and creators. Tune in March 17-21; it’s the epicenter of AI, and online registration is totally free: https://x.com/bilawalsidhu/status/1897454854190325875

“I’ll be talking about optimizing attention on modern hardware. If time permits, will be showing some fun tricks with Blackwell SASS” / X https://x.com/tri_dao/status/1899669458995614179

“> Two CPU dies, 32 cores and 4 DDR4 channels each > 480 Gbit/s inter-chip bandwidth > 40 PCIe Gen 4.0 lanes Frankly not bad at all. For 2019 when it was announced, it was a marvel. Sanctions have really stopped Huawei from dominating datacenter hardware. https://x.com/teortaxesTex/status/1899622820058956280

“Our banger hackathon this Sunday. Over 100 B200 / GB200 to hack on. Participants from every lab. Prizes of Blackwell GPUs + more. Speakers – Phil Tillet OpenAI, Horace He Thinking Machines, Tri Dao Together, Vijay Nvidia, and Mark GPUMode/PT. 30 spots left. Apply NOW. Link in next tweet” https://x.com/dylan522p/status/1899914025674371188