Image created with OpenAI GPT-Image-1. Image prompt: rich crimson, bright ivory, deep navy Independence-Day palette, vibrant, celebratory, wholesome, authentic, photorealistic marching band rehearsal with brass instruments catching sun scene featuring an oversized GPU chip painted with stars and stripes; natural lighting, subtle film grain, high detail
The race for LLM “cognitive core” – a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.
Its features are slowly crystallizing:
– Natively multimodal text/vision/audio at both input and output.
– Matryoshka-style architecture allowing a dial of capability up and down at test time.
– Reasoning, also with a dial (System 2).
– Aggressively tool-using.
– On-device finetuning LoRA slots for test-time training, personalization and customization.
– Delegates and double-checks just the right parts with the oracles in the cloud when internet is available.
It doesn’t know that William the Conqueror’s reign ended on September 9, 1087, but it vaguely recognizes the name and can look up the date. It can’t recite the SHA-256 of the empty string as e3b0c442…, but it can calculate it quickly should you really want it.
What LLM personal computing lacks in broad world knowledge and top-tier problem-solving capability it will make up in super low interaction latency (especially as multimodal matures), direct / private access to data and state, offline continuity, sovereignty (“not your weights not your brain”). I.e. many of the same reasons we like, use and buy personal computers instead of having thin clients access a cloud via remote desktop or so. https://x.com/karpathy/status/1938626382248149433
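The SHA-256 claim above is easy to verify; a minimal Python sketch using only the standard library:

```python
import hashlib

# SHA-256 of the empty byte string. A small "cognitive core" model
# need not memorize this digest as long as it can compute it on demand
# via a tool call.
digest = hashlib.sha256(b"").hexdigest()
print(digest)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

This is exactly the kind of lookup-or-compute task the tweet argues belongs in tools rather than in model weights.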
Oracle, OpenAI Ink Additional Stargate Deal for 4.5 Gigawatts of US Data Center – Bloomberg https://www.bloomberg.com/news/articles/2025-07-02/oracle-openai-ink-stargate-deal-for-4-5-gigawatts-of-us-data-center-power?embedded-checkout=true
NVIDIA just dropped a blog on their 8B VLM Llama Nemotron Nano VL 📖 icymi they also release an OCR leaderboard with this release 📑 https://x.com/mervenoyann/status/1938713088020136218
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
https://huggingface.co/blog/nvidia/llama-nemotron-nano-vl
RT @CoreWeave: We’re the first cloud provider to bring up the @NVIDIA GB300 NVL72, delivering up to 50x inference throughput and 10x user r… https://x.com/weights_biases/status/1940818055271272917
NVIDIA Corporation (NVDA) Insiders Cash Out Over $1 Billion In Stock Amid Market Surge: Report https://finance.yahoo.com/news/nvidia-corporation-nvda-insiders-cash-210457901.html
Fast GPU networking for multi-node AI (Infiniband/TCPXO/RDMA/…) requires days to debug and set up. In SkyPilot, we’ve made it as easy as one flag: network_tier: best Get ~4x speedup and save >$2K for debugging from idle GPUs. https://x.com/skypilot_org/status/1940473447739756592
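A sketch of how that flag would sit in a SkyPilot task file; only `network_tier: best` is confirmed by the announcement, and the rest of the layout (accelerator type, node count, run command) is an assumed, typical example:

```yaml
# Hypothetical SkyPilot task file; only `network_tier: best` is from
# the announcement, the other fields are illustrative.
resources:
  accelerators: H100:8
  network_tier: best   # request the fastest inter-node fabric available

num_nodes: 2

run: |
  torchrun --nnodes 2 --nproc-per-node 8 train.py
```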
The first @togethercompute GB200 cluster CDUs imbibing coolant in prep to go live next week! Each rack here is 1.4 exaflops of inference performance! https://x.com/vipulved/status/1940242672138244268
This slide encompasses dozens of different layers of the semiconductor industry https://x.com/dylan522p/status/1940562221626806540
OpenAI turns to Google’s AI chips to power its products, source says | Reuters https://www.reuters.com/business/openai-turns-googles-ai-chips-power-its-products-information-reports-2025-06-27/
LeoAM shows that long-context LLM inference can run on a single consumer GPU by keeping only the key-value chunks that matter in GPU memory, representing the rest with lightweight summaries on disk, and streaming them through a three-tier GPU-CPU-disk pipeline. https://x.com/rohanpaul_ai/status/1940335638714441872
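The tiering idea can be illustrated with a toy sketch (not LeoAM’s actual algorithm): score cached KV chunks by relevance to the current query, keep the top-k “hot” chunks resident, and leave the rest to be represented by summaries off-device. All names and the scoring here are hypothetical:

```python
import random

# Toy illustration of KV-chunk tiering. In a real system the score
# would come from query/key similarity; here it is random stand-in data.
random.seed(0)
n_chunks, k = 16, 4
scores = [random.random() for _ in range(n_chunks)]  # per-chunk relevance

ranked = sorted(range(n_chunks), key=lambda i: scores[i], reverse=True)
hot = set(ranked[:k])                                 # kept in GPU memory
cold = [i for i in range(n_chunks) if i not in hot]   # summarized / offloaded

print(len(hot), len(cold))
```

The point is only the partition: a small hot set stays on the GPU while the long tail is streamed from CPU memory or disk when needed.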
RT @LysandreJik: BOOOM! transformers now has a baked-in HTTP server w/ OpenAI-spec-compatible API. Launch it with `transformers serve` and… https://x.com/TheZachMueller/status/1940195982169579805




