Image created with OpenAI GPT-Image-1. Image prompt: rich crimson, bright ivory, deep navy Independence-Day palette, vibrant, celebratory, wholesome, authentic, photorealistic marching band rehearsal with brass instruments catching sun scene featuring an oversized GPU chip painted with stars and stripes; natural lighting, subtle film grain, high detail
The race for LLM “cognitive core” – a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.
Its features are slowly crystallizing:
– Natively multimodal text/vision/audio at both input and output.
– Matryoshka-style architecture allowing a dial of capability up and down at test time.
– Reasoning, also with a dial (System 2).
– Aggressively tool-using.
– On-device finetuning LoRA slots for test-time training, personalization and customization.
– Delegates and double-checks just the right parts with the oracles in the cloud when internet is available.
It doesn’t know that William the Conqueror’s reign ended on September 9, 1087, but it vaguely recognizes the name and can look up the date. It can’t recite the SHA-256 of the empty string as e3b0c442…, but it can calculate it quickly should you really want it.
What LLM personal computing lacks in broad world knowledge and top-tier problem-solving capability it will make up in super low interaction latency (especially as multimodal matures), direct / private access to data and state, offline continuity, sovereignty (“not your weights not your brain”). I.e. many of the same reasons we like, use and buy personal computers instead of having thin clients access a cloud via remote desktop or so. https://x.com/karpathy/status/1938626382248149433
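The SHA-256 claim above is easy to verify; a minimal Python sketch using only the standard library:

```python
import hashlib

# SHA-256 of the empty byte string. A small "cognitive core" model
# need not memorize this digest as long as it can compute it on demand
# via a tool call.
digest = hashlib.sha256(b"").hexdigest()
print(digest)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

This is exactly the kind of lookup-or-compute task the tweet argues belongs in tools rather than in model weights.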
Oracle, OpenAI Ink Additional Stargate Deal for 4.5 Gigawatts of US Data Center – Bloomberg https://www.bloomberg.com/news/articles/2025-07-02/oracle-openai-ink-stargate-deal-for-4-5-gigawatts-of-us-data-center-power?embedded-checkout=true
NVIDIA just dropped a blog on their 8B VLM Llama Nemotron Nano VL 📖 icymi they also release an OCR leaderboard with this release 📑 https://x.com/mervenoyann/status/1938713088020136218
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
https://huggingface.co/blog/nvidia/llama-nemotron-nano-vl
RT @CoreWeave: We’re the first cloud provider to bring up the @NVIDIA GB300 NVL72, delivering up to 50x inference throughput and 10x user r… https://x.com/weights_biases/status/1940818055271272917
NVIDIA Corporation (NVDA) Insiders Cash Out Over $1 Billion In Stock Amid Market Surge: Report https://finance.yahoo.com/news/nvidia-corporation-nvda-insiders-cash-210457901.html
Fast GPU networking for multi-node AI (Infiniband/TCPXO/RDMA/…) requires days to debug and set up. In SkyPilot, we’ve made it as easy as one flag: network_tier: best Get ~4x speedup and save >$2K for debugging from idle GPUs. https://x.com/skypilot_org/status/1940473447739756592
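A sketch of how that flag would sit in a SkyPilot task file; only `network_tier: best` is confirmed by the announcement, and the rest of the layout (accelerator type, node count, run command) is an assumed, typical example:

```yaml
# Hypothetical SkyPilot task file; only `network_tier: best` is from
# the announcement, the other fields are illustrative.
resources:
  accelerators: H100:8
  network_tier: best   # request the fastest inter-node fabric available

num_nodes: 2

run: |
  torchrun --nnodes 2 --nproc-per-node 8 train.py
```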
The first @togethercompute GB200 cluster CDUs imbibing coolant in prep to go live next week! Each rack here is 1.4 exaflops of inference performance! https://x.com/vipulved/status/1940242672138244268
This slide encompasses dozens of different layers of the semiconductor industry https://x.com/dylan522p/status/1940562221626806540
OpenAI turns to Google’s AI chips to power its products, source says | Reuters https://www.reuters.com/business/openai-turns-googles-ai-chips-power-its-products-information-reports-2025-06-27/
LeoAM shows that long-context LLM inference can run on a single consumer GPU by keeping only the key-value chunks that matter in GPU memory, representing the rest with lightweight summaries on disk, and streaming them through a three-tier GPU-CPU-disk pipeline. https://x.com/rohanpaul_ai/status/1940335638714441872
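The tiering idea can be illustrated with a toy sketch (not LeoAM’s actual algorithm): score cached KV chunks by relevance to the current query, keep the top-k “hot” chunks resident, and leave the rest to be represented by summaries off-device. All names and the scoring here are hypothetical:

```python
import random

# Toy illustration of KV-chunk tiering. In a real system the score
# would come from query/key similarity; here it is random stand-in data.
random.seed(0)
n_chunks, k = 16, 4
scores = [random.random() for _ in range(n_chunks)]  # per-chunk relevance

ranked = sorted(range(n_chunks), key=lambda i: scores[i], reverse=True)
hot = set(ranked[:k])                                 # kept in GPU memory
cold = [i for i in range(n_chunks) if i not in hot]   # summarized / offloaded

print(len(hot), len(cold))
```

The point is only the partition: a small hot set stays on the GPU while the long tail is streamed from CPU memory or disk when needed.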
RT @LysandreJik: BOOOM! transformers now has a baked-in HTTP server w/ OpenAI-spec-compatible API. Launch it with `transformers serve` and… https://x.com/TheZachMueller/status/1940195982169579805




