Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic in deep red ink on worn mustard-yellow cotton fabric, showing a simple cartoon charcoal grill viewed from above with multiple hands reaching in from all sides adding food and stirring, bold text reading OPEN SOURCE integrated into the composition, retro local novelty shirt style with slightly imperfect printed texture and aged fabric with minor stains.

🚀 Day 0 support for Nvidia’s Nemotron 3 Super! We’re excited to support open source models that push the frontier of model intelligence, cost, and latency. Try it out in deepagents today!
https://x.com/LangChain/status/2031784791251525934

🚀 NVIDIA Nemotron 3 Super is now available on Together AI. A 120B hybrid MoE model with 12B active parameters, it delivers leading efficiency and accuracy for multi-agent AI systems. Run Nemotron 3 Super on Together’s Dedicated inference with reliable infrastructure and 99.9%
https://x.com/togethercompute/status/2031831368339243454

In collaboration with NVIDIA we announce support for the new NVIDIA Nemotron 3 Super model in llama.cpp NVIDIA Nemotron 3 Super is a 120B open MoE model activating just 12B parameters to deliver maximum compute efficiency and accuracy for complex multi-agent applications.
https://x.com/ggerganov/status/2031819920363733205

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI | NVIDIA Blog https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/

Nvidia Is Planning to Launch an Open-Source AI Agent Platform | WIRED https://www.wired.com/story/nvidia-planning-ai-agent-platform-launch-open-source/

NVIDIA releases Nemotron-3-Super, a new 120B open hybrid MoE model. Nemotron-3-Super-120B-A12B has a 1M-token context window and achieves competitive agentic coding and chat performance. Run on ~64GB RAM. GGUF: https://t.co/wuFdRZLdSk Guide: https://x.com/UnslothAI/status/2031778104306499749

1/6 Today we’re introducing Storage Buckets on the Hugging Face Hub. They’re built for mutable, non-versioned ML artifacts: checkpoints, optimizer states, processed shards, logs, traces, eval outputs, and agent-generated files.
https://x.com/Wauplin/status/2031428845887213922

Introducing Storage Buckets on Hugging Face 🧑‍🚀 The first new repo type on the Hub in 4 years: S3-like object storage, mutable, non-versioned, built on Xet deduplication. – Starting at $8/TB/mo. That’s 3x cheaper than S3. You (and your coding agents) need somewhere to dump
https://x.com/victormustar/status/2031419482292576725
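The “3x cheaper than S3” claim is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming S3 Standard at roughly $23/TB/month (that baseline figure is our assumption, not from the announcement):

```python
# Rough monthly storage cost comparison, ignoring egress and request fees.
BUCKET_PRICE_PER_TB = 8.0   # from the Storage Buckets announcement
S3_PRICE_PER_TB = 23.0      # assumed S3 Standard baseline, for illustration

def monthly_cost(tb: float, price_per_tb: float) -> float:
    """Linear storage cost: terabytes stored times price per TB."""
    return tb * price_per_tb

checkpoints_tb = 5.0  # e.g. a training run's rolling checkpoints
bucket_cost = monthly_cost(checkpoints_tb, BUCKET_PRICE_PER_TB)
s3_cost = monthly_cost(checkpoints_tb, S3_PRICE_PER_TB)
print(f"Buckets: ${bucket_cost:.0f}/mo, S3: ${s3_cost:.0f}/mo, "
      f"ratio {s3_cost / bucket_cost:.1f}x")
```

At these assumed prices the ratio comes out near 3x, consistent with the claim.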

Tried many AI models with OpenClaw; I found Kimi AI to be the most token-efficient, good at coding, and also the easiest to set up.
https://x.com/cz_binance/status/2031313379235606989

Everything Gets Rebuilt: my conversation with Harrison Chase, CEO of @LangChain about agent harnesses, evals, runtimes, sandboxes, MCP and the future of the agent stack 00:00 Intro – meet @hwchase17 – at the Chase Center for the @daytonaio Compute conference 01:32 What changed
https://x.com/mattturck/status/2032141473009823882

Our open source agent harness, Stirrup, now integrates with Slack! Build custom Slack bot agents directly into your workflows The latest release of our lightweight, open source agent framework, Stirrup, now comes with Slack integration, featuring: ➤ 📁 Document input/output:
https://x.com/ArtificialAnlys/status/2032135114914951375

Learn how to run Qwen3.5 locally using Claude Code. Our guide shows you how to run Qwen3.5 on your server for local agentic coding. We then build a Qwen 3.5 agent that autonomously fine-tunes models using Unsloth. Works on 24GB RAM or less. Guide: https://x.com/UnslothAI/status/2031008078850924840

Today we’re releasing our first open source TTS model, TADA! TADA (Text Audio Dual Alignment) is a speech-language model that generates text and audio in one synchronized stream to reduce token-level hallucinations and improve latency. This means: → Zero content hallucinations
https://x.com/hume_ai/status/2031401003078062578
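Hume hasn’t published TADA’s exact token layout in these posts; as a purely hypothetical sketch of what “one synchronized stream” could mean, here is text-audio interleaving where each text unit is emitted adjacent to its audio tokens (all names are ours):

```python
from itertools import zip_longest

def interleave(text_tokens: list[str], audio_tokens: list[str]) -> list[str]:
    """Merge two aligned token streams into a single synchronized one,
    pairing each text token with its corresponding audio token."""
    merged = []
    for t, a in zip_longest(text_tokens, audio_tokens):
        if t is not None:
            merged.append(t)
        if a is not None:
            merged.append(a)
    return merged

stream = interleave(["hel", "lo"], ["<a0>", "<a1>"])
print(stream)   # ['hel', '<a0>', 'lo', '<a1>']
```

The point of such a layout is that the decoder commits to text and acoustics together, so the audio cannot drift from the text it is supposed to speak.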

DeepSeek MoE training efficiency has been completely commoditized (though it’s startling how much of that is reimplementation of DeepSeek’s own Open Source Week releases 1 year ago, updated for Blackwell). Very cool
https://x.com/teortaxesTex/status/2031263831595335702

🧵 1/4 Still waiting for DeepSeek-V4? We (@Zai_org) made DSA 1.8× faster with minimal code change — and it’s ready to deliver real inference gains on GLM-5. IndexCache removes 50% of indexer computations in DeepSeek Sparse Attention with virtually zero quality loss. On GLM-5
https://x.com/realYushiBai/status/2032299919999189107
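IndexCache’s actual mechanism lives inside DeepSeek Sparse Attention’s indexer; as a loose illustration only, the generic pattern is caching per-position indexer results so each decode step scores only the newly appended token instead of the whole context (class and names below are hypothetical):

```python
class ToyIndexer:
    """Illustrative stand-in for a DSA-style indexer: scores each cached
    position once, reusing prior results across decode steps."""
    def __init__(self):
        self.cache = {}         # position -> score
        self.compute_calls = 0  # counts actual score computations

    def _score(self, token: int) -> float:
        self.compute_calls += 1
        return (token * 2654435761 % 1000) / 1000.0  # toy deterministic score

    def scores_for_context(self, tokens: list[int]) -> list[float]:
        # Only positions not yet cached are computed; the rest are reused.
        for pos, tok in enumerate(tokens):
            if pos not in self.cache:
                self.cache[pos] = self._score(tok)
        return [self.cache[p] for p in range(len(tokens))]

indexer = ToyIndexer()
context = [101, 57, 999]
for step in range(3):            # simulate 3 decode steps
    context.append(step)         # one new token per step
    indexer.scores_for_context(context)
print(indexer.compute_calls)     # 6 computations, not 4 + 5 + 6 = 15
```

The real optimization targets redundant indexer work inside the attention kernel, but the payoff is the same shape: recomputation that grows with context length is replaced by incremental updates.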

I know there is some overlap between open source and anti-AI activists, but I have a hard time reconciling it. My million+ open source LOC were always intended as a gift to the world. Yes, I would make arguments about how it would strengthen our communities, and the GPL would
https://x.com/ID_AA_Carmack/status/2032460578669691171

Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory | VentureBeat https://venturebeat.com/orchestration/google-pm-open-sources-always-on-memory-agent-ditching-vector-databases-for

Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and the Deepspeed teams has been integrated into @huggingface Trainer, Accelerate and TRL For extensive details please see this writeup: https://t.co/9laTFfU28P Thanks a lot to @krasul for helping make it
https://x.com/StasBekman/status/2031081858763792574
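Ulysses-style sequence parallelism, very roughly, shards the sequence dimension across workers (and uses all-to-all communication to switch between sequence- and head-parallel layouts for attention). The sharding step alone, as a toy sketch of our own rather than the DeepSpeed implementation:

```python
def shard_sequence(tokens: list, world_size: int) -> list[list]:
    """Split a sequence into near-equal contiguous chunks, one per rank."""
    base, extra = divmod(len(tokens), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        shards.append(tokens[start:start + size])
        start += size
    return shards

seq = list(range(10))
shards = shard_sequence(seq, world_size=4)
print([len(s) for s in shards])               # [3, 3, 2, 2]
assert [t for s in shards for t in s] == seq  # lossless partition
```

Each rank then holds activations for only its slice of the sequence, which is what makes very long contexts fit in memory.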

Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell. Truly open: permissive license, open data, open training infra. See analysis on @ArtificialAnlys Details in thread 🧵below:
https://x.com/kuchaev/status/2031765052970393805

Scoop from me: Nvidia will spend a total of $26 billion over the next five years building the world’s best open source models. America is back in the open source AI race!
https://x.com/willknight/status/2031792027390587313

🎉 Congrats to @nvidia on the release of Nemotron 3 Super — day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super: – 5x higher throughput – 2x higher accuracy on Artificial Analysis
https://x.com/vllm_project/status/2031779213527957732

🔥 Kernel upgrades: – FlashInfer Sparse MLA backend – Triton-based top-k/top-p sampler kernels – TRTLLM DSV3 Router GEMM: 6% batch-1 speedup – Helion kernel framework with autotuning 🖥️ Hardware: – NVIDIA SM100/SM120 optimizations (MXFP8, FP8 GEMM) – AMD ROCm: AITER fused
https://x.com/vllm_project/status/2030178779331502497

How NVIDIA Builds Open Data for AI https://huggingface.co/blog/nvidia/open-data-for-ai

Maintaining separate attention kernels for every GPU platform doesn’t scale. The vLLM Triton attention backend takes a different approach: ~800 lines of Triton, same source code across NVIDIA, AMD, and Intel GPUs. On H100, it matches state-of-the-art attention performance. On
https://x.com/vllm_project/status/2029919035924828234
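The portability claim is easier to appreciate against what the kernel actually computes: plain scaled dot-product attention. A pure-Python single-query reference for the numerics (nothing Triton-specific, and not vLLM’s code):

```python
import math

def attention(q: list[float], keys: list[list[float]],
              values: list[list[float]]) -> list[float]:
    """Single-query scaled dot-product attention over key/value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = attention([1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[1.0, 2.0], [3.0, 4.0]])
print(out)
```

A production kernel fuses these steps, tiles over the KV cache, and streams through GPU memory, which is exactly the part that is usually rewritten per platform and that a single Triton source sidesteps.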

NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture We were given access to this model ahead of launch and evaluated it across
https://x.com/ArtificialAnlys/status/2031765321233908121

NVIDIA-Nemotron-3-Super-Technical-Report.pdf https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf

the bible for mixture of expert training infra, thanks nvidia
https://x.com/eliebakouch/status/2031249241566273764

The new @NVIDIA Nemotron 3 Super is here and it’s live on W&B Inference! 120B hybrid MoE, 12B active params, 1M token context. 5x token efficiency over previous Nemotron Super and highest performance among open models in its class. We’re giving away $20 in credits to try it 👇
https://x.com/wandb/status/2031778471614300563

Our next kernel competition is now open for submissions! A $1.1M cash prize competition sponsored by AMD on optimizing DeepSeek-R1-0528, GPT-OSS-120B on MI355X Registration:
https://x.com/GPU_MODE/status/2029974019018244223

Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: https://t.co/CAYpP1iK3i And yes, Ultra is coming!
https://x.com/ctnzr/status/2031762077325406428
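The “120B total / 12B active” shape comes from MoE routing: each token is dispatched to only a few experts, so only a fraction of the weights participate per forward pass. A minimal, generic top-k gating sketch (not Nemotron’s actual router, whose details are in the tech report):

```python
import math

def topk_gate(logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their
    softmax weights so the selected weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i],
                 reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

# 10 experts with 1 active would mimic a ~12B-of-120B parameter ratio.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3, 1.0, -2.0]
weights = topk_gate(logits, k=2)
print(sorted(weights))   # experts 1 and 4 win
assert abs(sum(weights.values()) - 1.0) < 1e-9
```

Compute per token scales with the active parameters, which is why a 120B-A12B model can post throughput numbers closer to a 12B dense model.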

Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.
https://x.com/rasbt/status/2032084724743553129

We’re excited to be day-0 launch partners for NVIDIA Nemotron 3 Super! You can try it now on Baseten, or read @rapprach’s blog to learn more about the new model: https://x.com/baseten/status/2031775755253026965

Codex for Open Source is an awesome idea. OSS maintainers get API credits, 6 months of ChatGPT Pro with Codex, and access to Codex Security as needed.
https://x.com/kevinweil/status/2030000508342272368

Excited to introduce Codex for Open Source! 🔥 TL;DR – ChatGPT Pro, Codex, and API credits for eligible open-source maintainers Open source has shaped modern software, and so much of it depends on maintainers doing steady, often invisible work to keep critical projects healthy.
https://x.com/reach_vb/status/2029998272945717553

T3 Code is now available for everyone to use. Fully open source. Built on top of the Codex CLI, so you can bring your existing Codex subscription.
https://x.com/theo/status/2030071716530245800

.@reflection_ai’s thesis is huge – build the missing Western open base model, then use RL to push it to the frontier. The problem is that this is also the slowest path in the game. “All hands on deck building the model” means no clear wedge product yet, few concrete proof points,
https://x.com/TheTuringPost/status/2030422345710711008

@ID_AA_Carmack Your position is different than many open source contributors though. You were already compensated for your time via commercial sales for most of those LOC no? Many open source devs may be open sourcing work that is their primary output, or an uncompensated side gig. It’s not
https://x.com/wightmanr/status/2032555294296084755

I open source my code because I want people to be able to do anything at all they like with it. That includes training AI models. People complaining about this do not speak for me, and I suspect do not speak for most people who open source their code. I welcome such use.
https://x.com/perrymetzger/status/2032543203795284218

Introducing v2 of our Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Fully free & open source. We’re releasing everything: evaluation dataset, code, app, and blog 🔥
https://x.com/togethercompute/status/2032524281461223614

Open Weights isn’t Open Training https://www.workshoplabs.ai/blog/open-weights-open-training

Opensourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization | Hume Blog | Hume AI https://www.hume.ai/blog/opensource-tada

Yes!! Exactly!! Open sourcing is making a gift to the world. There are two things about gifting: 1) you let go, it’s not yours anymore 2) you expect absolutely nothing in return. Most of my life I thought everyone in oss had this mindset; over the years I learned that it ain’t so.
https://x.com/giffmana/status/2032528855215276282

We open sourced WAXAL! – Multilingual speech dataset for African languages – 17 languages for TTS – 19 languages for ASR Over 100 million speakers across 40 Sub-Saharan African countries
https://x.com/osanseviero/status/2032452729059045881

New OpenFold3 preview out! (OF3p2) It closes the gap to AlphaFold3 for most modalities. Most critically, we’re releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch🧵1/9
https://x.com/MoAlQuraishi/status/2032471033760903511

🤖 New models: Qwen3.5, COLQwen3, ColModernVBERT, Ring 2.5, Ovis 2.6, Nemotron embed/rerank VL 🎙️ ASR: FunASR, FireRedASR2, Qwen3-ASR realtime streaming 📦 PyTorch 2.10 upgrade (breaking change for env deps) 🔗 Transformers v5 compatibility Speculative decoding: Nemotron-H MTP,
https://x.com/vllm_project/status/2030178782259171382

🚀 Three attention paradigms are emerging in modern LLMs: Hybrid (Linear + Full), GQA, and DSA. Two recent models illustrate these design choices well: Qwen3.5 and MiniMax M2.5. Here’s a quick breakdown of their architectures from Zhihu contributor kaiyuan👇 🧠 Qwen3.5 — Hybrid
https://x.com/ZhihuFrontier/status/2031686944040915152
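Of the three paradigms above, GQA is the simplest to picture: a many-to-one mapping from query heads to shared KV heads. An illustrative sketch (the head counts below are made up, not taken from either model):

```python
def kv_head_for(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """GQA: consecutive query heads share a single KV head."""
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

# 32 query heads sharing 8 KV heads -> groups of 4, a 4x KV-cache saving.
mapping = [kv_head_for(q, 32, 8) for q in range(32)]
print(mapping[:8])   # [0, 0, 0, 0, 1, 1, 1, 1]
```

Shrinking the KV head count shrinks the KV cache proportionally, which is the main lever GQA pulls against long-context memory cost; hybrid and DSA designs attack the same cost from different angles.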

🚀 vLLM v0.17.0 is here! 699 commits from 272 contributors (48 new!) This is a big one. Highlights: ⚡ FlashAttention 4 integration 🧠 Qwen3.5 model family with GDN (Gated Delta Networks) 🏗️ Model Runner V2 maturation: Pipeline Parallel, Decode Context Parallel, Eagle3 + CUDA
https://x.com/vllm_project/status/2030178775212671148

RWKV-7 G1e is here (13B/7B/3B/1B). Although Qwen 3.5 is strong, we are improving every month too 🙂 G1f in April. (G1d models all released too).
https://x.com/BlinkDL_AI/status/2031226189654966418

Discover more from Ethan B. Holland