Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Wide-angle interior of a muted, desaturated Chinese semiconductor fabrication plant with workers in white cleanroom suits amid processing machinery under flat fluorescent light, visible through industrial windows a fire horse stands calmly in an adjacent concrete courtyard with construction barriers, observational realism composition with large white Chinese cinema poster text reading CHIPS overlaid, documentary stillness, postindustrial palette of concrete gray and washed teal, patient long-take framing

Meta Builds AI Infrastructure With NVIDIA | NVIDIA Newsroom https://nvidianews.nvidia.com/news/meta-builds-ai-infrastructure-with-nvidia

Meta expands Nvidia deal to use millions of AI data center chips https://www.cnbc.com/2026/02/17/meta-nvidia-deal-ai-data-center-chips.html

🚀 DeepSeek R1 on GB300 with vLLM: 22.5K prefill TGS and 3K decode TGS per GPU — an 8x prefill and 10-20x mixed-context improvement over Hopper. DeepSeek V3.2 on 2 GPUs (NVFP4 + TP2): 7.4K prefill TGS and 2.8K decode TGS. Key recipe: ⚡ NVFP4 weights from HuggingFace ⚡ https://x.com/vllm_project/status/2022308974150975792

DeepSpeed ZeRO 1+2 used to take forever to load huge models on multi-GPU setups, since tensor flattening happened on the CPU, a workaround for the small GPU memory of the era in which it was designed. Now things load much faster thanks to a rework by Kento Sugama that flattens on the GPU. Yay! https://x.com/StasBekman/status/2022354880049082658
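The flattening step itself is simple: the optimizer's parameter shards get concatenated into one contiguous 1-D buffer. A minimal NumPy sketch of the idea (illustrative only; DeepSpeed operates on torch tensors, and the speedup comes from doing this concatenation on-device rather than round-tripping through host memory):

```python
import numpy as np

# Hypothetical parameter shards of different shapes, as a ZeRO
# partition might hold them before flattening.
params = [
    np.ones((4, 4), dtype=np.float32),   # e.g. a small weight matrix
    np.zeros((8,), dtype=np.float32),    # e.g. a bias vector
    np.full((2, 3), 0.5, dtype=np.float32),
]

def flatten_params(tensors):
    """Concatenate tensors into one contiguous 1-D buffer.

    DeepSpeed does the equivalent with torch tensors; performing
    the concat on the GPU avoids a slow host-side copy.
    """
    return np.concatenate([t.ravel() for t in tensors])

flat = flatten_params(params)
print(flat.shape)  # (30,)
```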

My realistic assessment of something like this: if it costs billions to train a model and even more to serve it, spending tens of millions to tape out a custom chip that is 10x more efficient (at 1/10th the latency!) makes financial sense. One major downside is the latency… https://x.com/awnihannun/status/2024868422224671193
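The economics here reduce to a break-even calculation: the one-time tape-out cost divided by the per-token savings. A back-of-envelope sketch with purely hypothetical placeholder numbers (not vendor figures):

```python
# Break-even volume for a custom inference ASIC.
# Every number below is a hypothetical placeholder.

tapeout_cost = 50e6           # one-time tape-out cost, $
gpu_cost_per_1m_tokens = 2.0  # serving cost on general-purpose GPUs, $
efficiency_gain = 10          # assume the ASIC is 10x cheaper per token

asic_cost_per_1m_tokens = gpu_cost_per_1m_tokens / efficiency_gain
savings_per_1m_tokens = gpu_cost_per_1m_tokens - asic_cost_per_1m_tokens

# Tokens you must serve before the tape-out pays for itself.
breakeven_tokens = tapeout_cost / savings_per_1m_tokens * 1e6

print(f"break-even volume: {breakeven_tokens:.3e} tokens")
```

At these placeholder values the tape-out pays off only after tens of trillions of tokens, which is why the argument holds for frontier-scale serving but not for niche models.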

Nvidia, Groq and the limestone race to real-time AI: Why enterprises win or lose here | VentureBeat https://venturebeat.com/infrastructure/nvidia-groq-and-the-limestone-race-to-real-time-ai-why-enterprises-win-or

NVIDIA’s new Blackwell Ultra GB300 NVL72 systems deliver up to 50x higher performance per megawatt and (this is even more important and impressive) 35x lower cost per token versus the Hopper platform. Energy will be the biggest bottleneck, therefore performance per watt is… https://x.com/kimmonismus/status/2023456488782487566
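Why performance per megawatt translates directly into cost per token: at fixed electricity price, energy cost per token is just power draw divided by throughput. A toy model with hypothetical numbers (energy cost only, ignoring capex):

```python
# Toy energy-cost-per-token model. All inputs are hypothetical.

power_mw = 1.0          # deployment power draw, megawatts
tokens_per_sec = 5.0e6  # aggregate throughput at that power
price_per_mwh = 80.0    # electricity price, $/MWh

cost_per_hour = power_mw * price_per_mwh
tokens_per_hour = tokens_per_sec * 3600
cost_per_1m_tokens = cost_per_hour / tokens_per_hour * 1e6

# A 50x gain in tokens/sec at the same megawatt shrinks this 50x.
print(f"${cost_per_1m_tokens:.4f} per 1M tokens (energy only)")
```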

The glory work of GPU scheduling is in the frontier data centers with hundreds of thousands of GPUs, but a lot of research work is done with single GPU jobs on modest clusters, and the scheduling leaves much to be desired. I wish there were a clean way to preempt GPU tasks, so… https://x.com/ID_AA_Carmack/status/2023805426345689198

the inference compute available to you is increasingly going to drive overall software productivity: https://x.com/gdb/status/2024662197692223857

Taalas runs Llama 3 8B at 16k tokens per second per user. That’s almost an order of magnitude increase even compared to SRAM-based systems like Cerebras. Key idea: each chip is specialized to a given model. The chip is the model. The chat demo is pretty wild: https://x.com/awnihannun/status/2024671348782711153

This is false. If no sanctions were present, the gap would be either super small or non-existent. DeepSeek bros (@zheanxu & @chenggang_zhao) would absolutely cook on Rubin & Blackwells. https://x.com/zephyr_z9/status/2024437158988353630

OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators | OpenAI https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/

17,000 tokens per second!! Read that again! The LLM is hard-wired directly into silicon: no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200. The “waiting for the LLM to think” era is dead. Code generates at the speed of human thought. https://x.com/wildmindai/status/2024810128487096357
