Image created with gemini-3.1-flash-image-preview using a prompt written by claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic on worn mustard-yellow cotton fabric, deep red ink illustration of a large rectangular surfboard wax bar designed as a computer chip with simple grid patterns and pins, sitting on wooden beach planks with sand, bold arched text reading CHIPS in retro display font, slightly imperfect printed look with fabric texture and minor stains, simple cartoon outlines, nostalgic beach town novelty shirt style.
Thinking Machines Lab and NVIDIA Announce Long-Term Gigawatt-Scale Strategic Partnership – Thinking Machines Lab https://thinkingmachines.ai/news/nvidia-partnership/
We’re thrilled to partner with @thinkymachines to deploy at least 1 gigawatt of NVIDIA Vera Rubin systems for frontier AI model training.
https://x.com/NVIDIAAI/status/2031381911852175868
We’re the first cloud to bring up an NVIDIA Vera Rubin NVL72 system for validation, another big step in building the next generation of AI infrastructure with NVIDIA.
https://x.com/satyanadella/status/2032515189086761005
🚀 Day 0 support for Nvidia’s Nemotron 3 Super! We’re excited to support open source models that push the frontier of model intelligence, cost, and latency. Try it out in deepagents today!
https://x.com/LangChain/status/2031784791251525934
🚀 NVIDIA Nemotron 3 Super is now available on Together AI. A 120B hybrid MoE model with 12B active parameters, it delivers leading efficiency and accuracy for multi-agent AI systems. Run Nemotron 3 Super on Together’s Dedicated inference with reliable infrastructure and 99.9%…
https://x.com/togethercompute/status/2031831368339243454
In collaboration with NVIDIA we announce support for the new NVIDIA Nemotron 3 Super model in llama.cpp NVIDIA Nemotron 3 Super is a 120B open MoE model activating just 12B parameters to deliver maximum compute efficiency and accuracy for complex multi-agent applications.
https://x.com/ggerganov/status/2031819920363733205
New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI | NVIDIA Blog https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/
Nvidia Is Planning to Launch an Open-Source AI Agent Platform | WIRED https://www.wired.com/story/nvidia-planning-ai-agent-platform-launch-open-source/
NVIDIA releases Nemotron-3-Super, a new 120B open hybrid MoE model. Nemotron-3-Super-120B-A12B has a 1M-token context window and achieves competitive agentic coding and chat performance. Run on ~64GB RAM. GGUF: https://t.co/wuFdRZLdSk Guide: …
https://x.com/UnslothAI/status/2031778104306499749
Very grateful to Jensen for working to expand Nvidia capacity at AWS so much for us!
https://x.com/sama/status/2030318958512164966
Great to see vLLM powering a fully local AI assistant on @nvidia Jetson 🦞 The OpenClaw tutorial shows how to serve MoE models like Nemotron 3 Nano 30B with vLLM on Jetson AGX — everything runs on-device, zero cloud APIs. Thanks to the @NVIDIARobotics Jetson team for putting…
https://x.com/vllm_project/status/2030839132512002217
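For context on what “zero cloud APIs” looks like in practice: once vLLM is serving the model on-device, any OpenAI-compatible client can talk to it over localhost. A minimal sketch, assuming `vllm serve` is already running on port 8000; the model id is a placeholder, not necessarily the tutorial’s exact name:

```python
# Minimal sketch: query a local vLLM server through its OpenAI-compatible API.
# Assumes `vllm serve <model>` is already running on the Jetson.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize today's sensor log."}],
)
print(resp.choices[0].message.content)  # generated entirely on-device
```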
NVIDIA Nemotron 3 Super is now available on Ollama. ollama run nemotron-3-super:cloud 🦞Try it with OpenClaw: ollama launch openclaw --model nemotron-3-super:cloud Run it locally on your device: ollama run nemotron-3-super > 120B mixture of experts model with 12B active >…
https://x.com/ollama/status/2031777869681000676
Fun fact: The first transatlantic internet cable is being pulled off the ocean floor right now. Almost no one knows it’s happening. TAT-8 went live in 1988. First fiber-optic cable to connect Europe and the US. Isaac Asimov called it a “maiden voyage across the sea on a beam…
https://x.com/rowancheung/status/2031403030382522559
The emerging role of SRAM-centric chips in AI inference | Gimlet Blog https://gimletlabs.ai/blog/sram-centric-chips
Together GPU Clusters now includes autoscaling, RBAC, full-stack observability, and self-healing operations built in. Move from experimental GPU infrastructure to production-ready AI platforms with elastic capacity, multi-team governance, and automated failure recovery.
https://x.com/togethercompute/status/2031471454311821750
Oracle is building yesterday’s data centers with tomorrow’s debt https://www.cnbc.com/2026/03/09/oracle-is-building-yesterdays-data-centers-with-tomorrows-debt.html
One of the things that makes Nemotron 3 Super so fast is native multi-token prediction. 1. Model predicts several tokens rather than just one, which is essentially free because it’s just a bit of extra work for the last layer of the model. The first token is accepted, the…
https://x.com/ctnzr/status/2031776463029186920
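A rough sketch of the draft-and-verify loop that tweet is describing, under the usual speculative-decoding assumptions; `draft_tokens` and `verify_tokens` are illustrative stand-ins, not Nemotron’s actual interfaces:

```python
# Hedged sketch of multi-token prediction decoding: the MTP head proposes
# k extra tokens almost for free, one verify pass checks them, and the
# accepted prefix is kept. Stand-in callables, not Nemotron's real API.
def mtp_step(draft_tokens, verify_tokens, context, k=4):
    draft = draft_tokens(context, k)          # k speculative tokens from the MTP head
    accepted = verify_tokens(context, draft)  # one forward pass validates the draft
    # The first token always matches the verifier's own next-token output,
    # so at least one token lands per step; every extra match saves a full
    # decode step, which is where the throughput win comes from.
    n = 1
    while n < len(draft) and accepted[n]:
        n += 1
    return context + draft[:n]
```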
The LatentMoE architecture of Nemotron 3 is interesting and a great learning exercise for LLM / MoE inference patterns… MoE basics. An MoE layer basically just makes multiple copies of the feed-forward component of the transformer block. Instead of a single feed-forward neural…
https://x.com/cwolferesearch/status/2032225187949666811
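A minimal PyTorch sketch of the pattern that thread describes: several copies of the feed-forward block plus a router that sends each token to its top-k experts. Generic top-k routing for illustration, not Nemotron 3’s actual LatentMoE implementation; all sizes are made up:

```python
# Generic top-k MoE layer: multiple copies of the transformer block's
# feed-forward component, with a learned router picking k experts per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (n_tokens, d_model)
        scores = self.router(x)                 # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e         # tokens routed to expert e in this slot
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out
```

Only the selected experts run per token, which is how a 120B-parameter model gets away with ~12B active parameters at inference.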
Virtualization overhead compounds at scale. Lambda’s Bare Metal Instances on NVIDIA Vera Rubin NVL72 Superclusters remove the hypervisor entirely. Lambda’s Maxx Garrison is breaking it down at #NVIDIAGTC. Register: …
https://x.com/LambdaAPI/status/2032427317696602575
Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell. Truly open: permissive license, open data, open training infra. See analysis on @ArtificialAnlys. Details in thread 🧵 below:
https://x.com/kuchaev/status/2031765052970393805
Scoop from me: Nvidia will spend a total of $26 billion over the next five years building the world’s best open source models. America is back in the open source AI race!
https://x.com/willknight/status/2031792027390587313
🎉 Congrats to @nvidia on the release of Nemotron 3 Super — day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super: – 5x higher throughput – 2x higher accuracy on Artificial Analysis…
https://x.com/vllm_project/status/2031779213527957732
🔥 Kernel upgrades: – FlashInfer Sparse MLA backend – Triton-based top-k/top-p sampler kernels – TRTLLM DSV3 Router GEMM: 6% batch-1 speedup – Helion kernel framework with autotuning 🖥️ Hardware: – NVIDIA SM100/SM120 optimizations (MXFP8, FP8 GEMM) – AMD ROCm: AITER fused…
https://x.com/vllm_project/status/2030178779331502497
How NVIDIA Builds Open Data for AI https://huggingface.co/blog/nvidia/open-data-for-ai
Maintaining separate attention kernels for every GPU platform doesn’t scale. The vLLM Triton attention backend takes a different approach: ~800 lines of Triton, same source code across NVIDIA, AMD, and Intel GPUs. On H100, it matches state-of-the-art attention performance. On…
https://x.com/vllm_project/status/2029919035924828234
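To make the “same source code across vendors” point concrete, here is a tiny Triton kernel in the same spirit — a fused row-wise softmax, not vLLM’s actual attention backend — one Python source that Triton compiles for NVIDIA, AMD, or Intel GPUs:

```python
# One Triton source, many GPU backends: a fused row-wise softmax kernel.
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK: tl.constexpr):
    row = tl.program_id(0)                       # one program per matrix row
    offs = tl.arange(0, BLOCK)
    mask = offs < n_cols
    x = tl.load(in_ptr + row * n_cols + offs, mask=mask, other=-float("inf"))
    x = x - tl.max(x, axis=0)                    # numerically stable softmax
    num = tl.exp(x)
    den = tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offs, num / den, mask=mask)

def softmax(x: torch.Tensor) -> torch.Tensor:
    x = x.contiguous()
    rows, cols = x.shape
    out = torch.empty_like(x)
    BLOCK = triton.next_power_of_2(cols)         # block must cover the row
    softmax_kernel[(rows,)](out, x, cols, BLOCK=BLOCK)
    return out
```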
NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture. We were given access to this model ahead of launch and evaluated it across…
https://x.com/ArtificialAnlys/status/2031765321233908121
NVIDIA-Nemotron-3-Super-Technical-Report.pdf https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf
the bible for mixture-of-experts training infra, thanks nvidia
https://x.com/eliebakouch/status/2031249241566273764
The new @NVIDIA Nemotron 3 Super is here and it’s live on W&B Inference! 120B hybrid MoE, 12B active params, 1M token context. 5x token efficiency over previous Nemotron Super and highest performance among open models in its class. We’re giving away $20 in credits to try it 👇
https://x.com/wandb/status/2031778471614300563
NVIDIA GTC is just one week away. It will feature plenty of robotics exhibitors: Unitree, Noble Machines, Persona AI, Hexagon Robotics, Humanoid [SKL], Galbot, Dyna Robotics, Generative Bionics, Sharpa. I’ll be there – looking forward to connecting with many of you!
https://x.com/TheHumanoidHub/status/2031084926859358461
Scaling to billions of humanoids won’t happen without the five-layer AI stack. As NVIDIA put it, ‘A humanoid robot is an AI application embodied in a body.’ This industrial foundation represents a fundamental departure from the traditional robotics playbook.
https://x.com/TheHumanoidHub/status/2031765823787090223
Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: https://t.co/CAYpP1iK3i And yes, Ultra is coming!
https://x.com/ctnzr/status/2031762077325406428
Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.
https://x.com/rasbt/status/2032084724743553129
We’re excited to be day-0 launch partners for NVIDIA Nemotron 3 Super! You can try it now on Baseten, or read @rapprach’s blog to learn more about the new model:
https://x.com/baseten/status/2031775755253026965
KV-cache math for Nemotron 3 Super With 8 attention layers, 2 KV heads, and head-dim 128, the sequence-growing KV cache comes out to: 8,192 bytes/token in BF16 4,096 bytes/token in FP8 That means: 1M tokens → 7.63 GiB BF16 / 3.81 GiB FP8 262k tokens → 2.00 GiB BF16 / 1.00 GiB FP8
https://x.com/bnjmn_marie/status/2031821490916905089
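The arithmetic checks out: per-token cache is layers × KV heads × head-dim × 2 (for K and V) × bytes per element. A quick script to reproduce the numbers:

```python
# Reproduce the KV-cache figures quoted above.
layers, kv_heads, head_dim = 8, 2, 128
for name, nbytes in (("BF16", 2), ("FP8", 1)):
    per_token = layers * kv_heads * head_dim * 2 * nbytes  # K and V
    for ctx in (1_000_000, 262_144):
        print(f"{name}: {per_token} B/token, {ctx:>9} tokens -> "
              f"{per_token * ctx / 2**30:.2f} GiB")
# BF16: 8192 B/token -> 7.63 GiB at 1M tokens, 2.00 GiB at 262k
# FP8:  4096 B/token -> 3.81 GiB at 1M tokens, 1.00 GiB at 262k
```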