Image created with gemini-2.5-flash-image, prompted via claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of child aged 6-8 viewing multiple TV screens in bedroom arc, all screens emanating intense NVIDIA green glow, scattered GPU architecture books and technical diagrams on plush rug, small gaming PC tower with green LED fans beside beanbag, warm ambient lighting contrasted with dominant green screen wash, shallow depth of field, soft focus on child’s absorbed profile, bold text ‘NVIDIA’ at top of frame, cozy yet tech-saturated atmosphere
NVIDIA Debuts Nemotron 3 Family of Open Models | NVIDIA Newsroom https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models
Nvidia and Alphabet VC arms back vibe coding startup Lovable https://www.cnbc.com/2025/12/18/google-and-n.html
🚀 Introducing Nemotron-Cascade! 🚀 We’re thrilled to release Nemotron-Cascade, a family of general-purpose reasoning models trained with cascaded, domain-wise reinforcement learning (Cascade RL), delivering best-in-class performance across a wide range of benchmarks. 💻 Coding… https://x.com/_weiping/status/2000947255088701628
Those with Blackwell access have been waiting for Flash Attention 4 to come out, since FA2 on Blackwell is really slow (Blackwell dropped WGMMA and requires a rewrite). But you’re likely to see a 50% or higher end-to-end speedup with FA3 if you’re on Hopper; the longer the… https://x.com/StasBekman/status/2001839591243026593
Nemotron 3 Nano runs nicely with mlx-lm on an M4 Max. Could be a great model for local use on Mac: MoE + hybrid attention make it fast even for very long context. Generating in realtime with the 4-bit model: https://x.com/awnihannun/status/2000718403380691417
Nemotron 3 Nano for MLX is now available in LM Studio. General purpose reasoning and chat model trained from scratch by @nvidia. 30B, 3.5B active MoE runs blazingly fast on Apple Silicon 🍎🚀 https://x.com/lmstudio/status/2001015687003963730
🚀🚀🚀 We’re excited to support @NVIDIA and their new open family of models: NVIDIA Nemotron 3! Open in weights, data, tools, and training, Nemotron 3 is built for multi-agent apps and features: ⚡️An efficient hybrid Mamba‑Transformer MoE architecture 🧾1M token context for… https://x.com/vllm_project/status/2000623058076492276
Agent demos often fail for reasons that are hard to see: unclear tool traces, silent failures, and changes that improve one behavior but break another. Our new course with @Nvidia shows how to use their NeMo Agent Toolkit to surface these issues with OpenTelemetry tracing, run… https://x.com/DeepLearningAI/status/2001329113622073611
Baseten supports @nvidia Nemotron 3 Nano on day zero Up to 4× faster token generation, high accuracy, and predictable inference built for agentic AI. Available to deploy today on Baseten for high-performance inference. Read more here: https://x.com/basetenco/status/2000582868532121688
Introducing NVIDIA Nemotron 3 Nano, a fully open 30B (3B active) hybrid MoE model engineered for maximum efficiency and benchmark-leading accuracy. AI natives can now use Nemotron 3 Nano on Together AI — with fast, reliable inference for specialized agentic systems https://x.com/togethercompute/status/2000572943718314392
NEWS: NVIDIA announces the NVIDIA Nemotron 3 family of open models, data, and libraries, offering a transparent and efficient foundation for building specialized agentic AI across industries. Nemotron 3 features a hybrid mixture-of-experts (MoE) architecture and new open… https://x.com/nvidianewsroom/status/2000588337896198481
.@nvidia Nemotron 3 Nano is now available on Ollama! Local: ollama run nemotron-3-nano · Cloud: ollama run nemotron-3-nano:30b-cloud https://x.com/ollama/status/2000820163231232167
🚀 Day-0 support for @NVIDIA Nemotron 3 Nano in SGLang SGLang now supports Nemotron 3 Nano on Day 0 🎉 A highly efficient, fully open Hybrid MoE model with 1M context, thinking budget, and industry-leading accuracy per compute. ✅ Open weights, data, and recipes ⚡ Fast… https://x.com/lmsysorg/status/2000567938949243111
As AI Grows More Complex, Model Builders Rely on NVIDIA | NVIDIA Blog https://blogs.nvidia.com/blog/leading-models-nvidia/
BREAKING CUDA MOAT EXPANDS: Today, NVIDIA has acquired SchedMD, makers of SLURM, a widely used “open source” workload scheduler. Many AI companies (Mistral, Thinking Machines, parts of Meta’s FAIR division) and university academic labs use SLURM. NVIDIA’s acquisition expands… https://x.com/SemiAnalysis_/status/2000620209262985641
BREAKING: NVIDIA just dropped an open 30B model that beats GPT-OSS and Qwen3-30B — and runs 2.2-3.3× faster Nemotron 3 Nano: • Up to 1M-token context • MoE: 31.6B total params, 3.6B active • Best-in-class performance for SWE-Bench • Open weights + training recipe + … https://x.com/AskPerplexity/status/2000589984818954719
First time I see a major org release @huggingface collections inside collections 🤯 Kudos @nvidia for this brilliant release https://x.com/NielsRogge/status/2000639749514760465
In collaboration with NVIDIA, the new Nemotron 3 Nano model is fully supported in llama.cpp. Nemotron 3 Nano features an efficient hybrid Mamba MoE architecture. It’s a promising model, suitable for local AI applications on mid-range hardware. The large context window makes it… https://x.com/ggerganov/status/2000574990425415765
Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate | NVIDIA Technical Blog https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/
New mlx-lm release: pip install -U mlx-lm Includes support for a few new models: – Nemotron 3 Nano (Nvidia) – Devstral (Mistral) – rnj-1 (Essential AI) https://x.com/awnihannun/status/2000974327660077298
Nvidia continues to put out some of the strongest and fastest open models. Pretraining and post-training data are released as well, something very few orgs have done. https://x.com/tri_dao/status/2000707760288092655
NVIDIA has just released Nemotron 3 Nano, a ~30B MoE model that scores 52 on the Artificial Analysis Intelligence Index with just ~3B active parameters Hybrid Mamba-Transformer architecture: Nemotron 3 Nano combines the hybrid Mamba-Transformer approach @NVIDIAAI has used on… https://x.com/ArtificialAnlys/status/2000602570092675402
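The efficiency claims in these posts come down to MoE routing arithmetic: only a small fraction of the weights participate in each forward pass. A back-of-envelope sketch, using the parameter counts quoted above (31.6B total, 3.6B active — NVIDIA's figures, not derived here):

```python
# Back-of-envelope MoE efficiency arithmetic for Nemotron 3 Nano,
# using the parameter counts quoted in the posts above.
total_params = 31.6e9   # all experts, resident in memory
active_params = 3.6e9   # routed experts actually used per token

# Per-token compute scales with active params, not total:
compute_fraction = active_params / total_params
print(f"compute per token vs dense 31.6B: {compute_fraction:.1%}")   # ~11.4%

# Rough FLOP advantage over a dense model of the same total size:
print(f"FLOP advantage: {total_params / active_params:.1f}x")        # ~8.8x
```

This is why the posts compare its speed class against much smaller dense models: memory footprint follows total parameters, but per-token compute follows active parameters.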
NVIDIA just released Nemotron-Agentic-v1 on Hugging Face This dataset empowers LLMs as interactive, tool-using agents for multi-turn conversations and reliable task completion. Ready for commercial use. https://x.com/HuggingPapers/status/2000628009049760072
NVIDIA just released Nemotron-Cascade-8B on Hugging Face A powerful 8B general-purpose reasoning model that achieves best-in-class performance across diverse benchmarks, from math to coding, by using novel Cascade RL. https://x.com/HuggingPapers/status/2001065870676603333
NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! 🔥 Nemotron 3 has a 1M context window and the best in class performance for SWE-Bench, reasoning and chat. Run the MoE model locally with 24GB RAM. Guide: https://x.com/UnslothAI/status/2000568378407452746
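The "24GB RAM" figure above is consistent with simple quantization arithmetic: at 4 bits per weight (the precision the mlx post above mentions), the full ~31.6B-parameter checkpoint fits well under 24GB, leaving headroom for KV cache and activations. A rough sketch that ignores quantization scales and runtime overhead, which add somewhat more:

```python
# Rough weight-memory estimate for a 31.6B-parameter model at 4-bit quantization.
params = 31.6e9
bits_per_weight = 4
weight_bytes = params * bits_per_weight / 8
print(f"4-bit weights: ~{weight_bytes / 2**30:.1f} GiB")  # ~14.7 GiB
```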
Really impressive release from NVIDIA, who not only went head-to-head with Qwen3, but: – innovated on the architecture (risky for most open labs) – did legit multi-env RL, complete with agentic evals (the first time I’ve seen this from an open lab) – plan to open source the pretraining… https://x.com/_lewtun/status/2000599470099099990
SemiAnalysis InferenceMAX shows GPT-OSS on Blackwell delivering 33% more tokens per dollar in just one month, thanks to the awesome work of @vllm_project and @nvidia. https://x.com/dylan522p/status/2002135815233970295
This is not just another strong open model. Nemotron actually releases training data (!), RL environments, and training code. This is a big difference: almost all model developers just want people to use their models; NVIDIA is enabling people to make their own models. We are… https://x.com/percyliang/status/2000608134205985169
Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months. https://x.com/ctnzr/status/2000567572065091791
vLLM delivers even more inference performance with the same GPU platform. In just 1 month, we’ve worked with NVIDIA to increase @nvidia Blackwell maximum throughput per GPU by up to 33% — significantly reducing cost per token — while also enabling even higher peak speed for… https://x.com/vllm_project/status/2001449658984632699
When @NVIDIA announced Nemotron 3, it marked a symbolic turning point in a year that fundamentally reshaped open-source AI leadership. Is NVIDIA the new open-source king? What’s behind this strategy? Let’s see. ▪️ It releases 3 trillion tokens of new pretraining data, 18 million… https://x.com/TheTuringPost/status/2001087448299065372
SkyPilot + @NVIDIA Dynamo: unmatched flexibility for inference workloads ⚡ Blazing fast MoE inference with PD disaggregation & KV-aware routing ☁️ Deploy on any cloud or k8s in minutes 🔌 Drop-in OpenAI API replacement Recipe: https://x.com/skypilot_org/status/2000999292333666339
Mech interp question: do the new Nemotron models make use of negative zero for any circuits? https://x.com/andrew_n_carr/status/2000744793480270236
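For readers unfamiliar with the premise of that question: IEEE-754 floats have a distinct negative zero that compares equal to positive zero but carries its own bit pattern, so a network could in principle encode a bit of information in the sign of zero-valued activations. A quick standard-library illustration:

```python
import math
import struct

neg_zero, pos_zero = -0.0, 0.0

# Equal under ordinary comparison, so == cannot detect the sign:
print(neg_zero == pos_zero)  # True

# But the bit patterns differ: the sign bit is set for -0.0.
def bits(x: float) -> str:
    return struct.pack(">d", x).hex()

print(bits(pos_zero))  # 0000000000000000
print(bits(neg_zero))  # 8000000000000000

# copysign observes the sign, so the distinction can propagate downstream:
print(math.copysign(1.0, neg_zero))  # -1.0
```

Whether any Nemotron circuit actually exploits this is exactly the open question the tweet poses; the snippet only shows that the representation makes it possible.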