Image created with gemini-3.1-flash-image-preview, with the prompt written by claude-opus-4.7. Image prompt: High-end product photograph of a glossy soft-serve sundae in a golden waffle-cone bowl etched like a silicon wafer, the cream curl studded with square chocolate chips laid out in a circuit-board grid with caramel traces and pretzel leads radiating across a white plate, a Dairy Queen red foil band wrapping the cone with bold custom ‘CHIPS’ lettering and a tiny ‘75 — Milford, DE 1951’ embossed on the plate rim, soft directional studio light, shallow depth of field, crisp macro detail, landscape composition.

Meta expands Amazon partnership with AWS Graviton chips for AI
https://www.aboutamazon.com/news/aws/meta-aws-graviton-ai-partnership

Meta Partners With AWS on Graviton Chips to Power Agentic AI

Today we’re announcing an agreement with Amazon Web Services to bring tens of millions of AWS Graviton cores to our compute portfolio. This partnership marks an expansion of our diversified AI infrastructure and will help scale systems behind Meta AI and agentic experiences that
https://x.com/AIatMeta/status/2047647617681957207

.@deepseek_ai v4 Pro’s checkpoint is in both FP4 and FP8, depending on the layer. This means that the entire model can fit on a single NVIDIA 8xB200 node without trouble. @vllm_project: “Checkpoint is FP4+FP8 mixed: MoE expert weights are stored in FP4 while the remaining
https://x.com/LambdaAPI/status/2047654086263320965
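The fits-on-one-node claim is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming 1.6T total parameters (the figure NVIDIA quotes for V4 below), ~192 GB of HBM3e per B200, and a guessed 95% of parameters living in FP4 MoE expert weights; all three constants are assumptions, not numbers from the vLLM post.

```python
# Rough memory check: does a 1.6T-parameter FP4/FP8 mixed checkpoint
# fit in an 8x B200 node? All constants below are assumptions.
TOTAL_PARAMS = 1.6e12   # total parameter count (NVIDIA's V4 figure)
FP4_FRACTION = 0.95     # guessed share of params in FP4 MoE expert weights
HBM_PER_GPU_GB = 192    # assumed B200 HBM3e capacity, GB
GPUS = 8

fp4_bytes = TOTAL_PARAMS * FP4_FRACTION * 0.5         # 4 bits = 0.5 B/param
fp8_bytes = TOTAL_PARAMS * (1 - FP4_FRACTION) * 1.0   # 8 bits = 1 B/param
checkpoint_gb = (fp4_bytes + fp8_bytes) / 1e9
node_gb = HBM_PER_GPU_GB * GPUS

print(f"checkpoint ~{checkpoint_gb:.0f} GB vs {node_gb} GB of node HBM")
```

Under these assumptions the weights land around 840 GB against 1536 GB of node HBM, leaving ample headroom for KV cache and activations, consistent with "without trouble."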

Thoughts after reading the DeepSeek V4 paper: – NVIDIA really is something else. Remember how back in 2024 people were bashing Blackwell as overspec’d and dismissing FP4 as just marketing? Turns out it was all groundwork for the next generation of models. Maybe NVIDIA’s moat is
https://x.com/jukan05/status/2047861732702662741

✨ DeepSeek-V4 is here — a million-token context, 1.6T parameter powerhouse optimized for agentic workflows. Out of the box, on DeepSeek-V4-Pro, NVIDIA Blackwell Ultra delivers over 150 TPS/user interactivity for agentic workflows. And we’re just getting started. Expect these
https://x.com/NVIDIAAI/status/2047765637808664759

Google just broke a decade-long tradition. At Cloud Next 2026, the company unveiled not one, but two new AI chips, the TPU 8t for training and TPU 8i for inference. For the first time ever, Google is splitting its custom silicon into specialized architectures instead of relying
https://x.com/kimmonismus/status/2048745304007299230

Google to sell TPU chips to ‘select’ customers in latest shot at Nvidia
https://finance.yahoo.com/markets/stocks/article/google-to-sell-tpu-chips-to-select-customers-in-latest-shot-at-nvidia-214900221.html

@NVIDIA Nemotron 3 Nano Omni is now on Together AI. Enterprise multimodal AI — video, audio, image, documents & text — optimized for speed and scale. ✅ ~3B active params, 9x higher throughput ✅ Fully managed, zero infra headache ✅ Secure, zero-trust architecture Build
https://x.com/togethercompute/status/2049160446708711883

Excited to support @NVIDIA Nemotron 3 Nano Omni, now available on Fireworks. It’s the first open model that handles vision, audio, video, and text in a single inference loop. Built for multimodal sub-agents at scale, with 9× higher throughput than Qwen3 30B. 256K context. Now
https://x.com/FireworksAI_HQ/status/2049159136802398546

Introducing @NVIDIA Nemotron 3 Nano Omni. NVIDIA Nemotron 3 Nano Omni is an open multimodal foundation model that unifies audio, images, text, and video into a single context window. It powers subagents for use cases like computer-use agent, document intelligence, and video and
https://x.com/baseten/status/2049160818575749300

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents
https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇
https://x.com/NVIDIAAI/status/2049159441870717428

NVIDIA Nemotron 3 Nano Omni is now live on fal, available at launch. A single model for multimodal agents: 🔁 text, image, video, audio in one loop 🧠 1 context reasoning across complex workflows ⚡️ ~9× higher throughput with fewer inference hops Built for real-world agent
https://x.com/fal/status/2049160999442198632

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning.
https://x.com/OpenRouter/status/2049164366218772526

NVIDIA releases Nemotron-3-Nano-Omni, a new 30B open multimodal MoE model. Nemotron-3-Nano-Omni-30B-A3B is the strongest omni model for its size and supports audio, video, image and text. Run on ~25GB RAM. GGUF: https://t.co/t4COCqVrLS Guide:
https://x.com/UnslothAI/status/2049161390150365344
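The "~25GB RAM" figure is consistent with a mid-range GGUF quantization of a 30B-parameter model. A rough sketch; the ~6.5 bits-per-weight average (typical of a Q5/Q6-class GGUF once higher-precision embedding and output layers are included) is an assumption, not Unsloth's stated number.

```python
# Estimate weight memory for a quantized 30B model.
# BITS_PER_WEIGHT is an assumed Q5/Q6-class GGUF average.
PARAMS = 30e9
BITS_PER_WEIGHT = 6.5

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"~{weights_gb:.1f} GB of weights")
```

That works out to roughly 24.4 GB for the weights alone, before KV cache, which lines up with the ~25GB RAM claim.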

We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you’d like to do it at your company!
https://x.com/sama/status/2047395562501411058

Building the compute infrastructure for the Intelligence Age | OpenAI
https://openai.com/index/building-the-compute-infrastructure-for-the-intelligence-age/

OpenAI has effectively abandoned first-party Stargate data centers in favor of more flexible deals — company now prefers to lease compute and says Stargate is an umbrella term | Tom’s Hardware
https://www.tomshardware.com/tech-industry/artificial-intelligence/openai-has-effectively-abandoned-first-party-stargate-data-centers-in-favor-of-more-flexible-deals-company-now-prefers-to-lease-compute-and-says-stargate-is-an-umbrella-term

AWS Neuron SDK now available with Neuron Agentic Development for NKI kernel development on Trainium – AWS
https://aws.amazon.com/about-aws/whats-new/2026/04/announcing-neuron-agentic-development/

Worth thinking about the compute gap. Pretraining compute for DeepSeek v4 is ~1e25 flops. OpenAI has 100K GB200s. Assuming all are used, and with a mere 15% MFU, the pretraining run would complete in just over a day (37 hours)
https://x.com/nrehiew_/status/2047840706874749076
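The 37-hour figure reproduces under one assumption about per-GPU throughput: roughly 5 PFLOPS of dense FP8 per GPU, which is my assumption here, not a number stated in the tweet.

```python
# Reproduce the compute-gap estimate: wall-clock time for a 1e25-FLOP
# pretraining run on 100K GPUs at 15% MFU. Peak per-GPU FLOPS is assumed.
PRETRAIN_FLOPS = 1e25
NUM_GPUS = 100_000
PEAK_FLOPS_PER_GPU = 5e15   # assumed ~5 PFLOPS dense FP8 per GPU
MFU = 0.15

effective_flops = NUM_GPUS * PEAK_FLOPS_PER_GPU * MFU  # 7.5e19 FLOP/s
hours = PRETRAIN_FLOPS / effective_flops / 3600
print(f"{hours:.0f} hours")
```

The result is about 37 hours, matching the tweet's "just over a day."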

Google presents a new Transformer alternative at #ICLR2026! Join Nino Scherrer & Yanick Schimpf at the Google booth (#411) at 10AM to learn about MesaNet, proposing a new linear sequence layer that optimally learns in-context given a fixed memory budget.
https://x.com/GoogleResearch/status/2047630714145776053

Canonical and NVIDIA are collaborating to make NVIDIA Nemotron™ 3 Nano Omni easier to deploy on Ubuntu. With Canonical inference snaps, teams can go from setup to a working runtime in a single command – no complex integration required. Less time spent on infrastructure, more
https://x.com/Canonical/status/2049159988174602712

DeepInfra is an official launch partner for @nvidia Nemotron™ 3 Nano Omni — live today. One open multimodal 30B-A3B model. One pass over image, video, audio, docs+ text. No multi-model pipelines. OpenAI-compatible API, usage-based pricing. $0.20 in / $0.80 out per 1M tokens
https://x.com/DeepInfra/status/2049158141070524815

I mean, I would like it if they had Blackwell, but do these folks think that *Huawei* would be more stable? That in this one feature it’s going to be better?
https://x.com/teortaxesTex/status/2047608887616962992

Nemotron 3 Nano Omni is available locally on Ollama! This requires the latest Ollama 0.22 release.
https://x.com/ollama/status/2049194377751437470

Nemotron 3 Nano Omni is now in LM Studio! A new 30B multi-modal MoE from @nvidia Supports Image input, reasoning, and tool use Requires ~25GB to run locally 🔥🚀
https://x.com/lmstudio/status/2049172192705864091

Today we released Nemotron-3-Nano-Omni-30B-A3B – our first Omni model, with speech and audio understanding capabilities powered by parakeet-tdt-0.6b-v2 encoder. 🫡1st position on VoiceBench 🌏English only 🎙️5.95% WER on Open ASR Leaderboard 📽️Video+audio understanding
https://x.com/PiotrZelasko/status/2049162049599455725

OpenAI models, Codex, and Managed Agents come to AWS | OpenAI
https://openai.com/index/openai-on-aws/

We now estimate that only about 0.3 GW of total facility power is operational for Stargate Abilene, not 0.6 GW. We have moved the 0.6 GW milestone to late May and the 1.2 GW milestone from Q3 to Q4 2026, but both are uncertain. More about this change and our methodology in 🧵
https://x.com/EpochAIResearch/status/2047442515608162481
