Image created with gemini-3.1-flash-image-preview, with the prompt drafted by claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic in deep red ink on worn mustard-yellow cotton fabric, bold cartoon illustration of a carnival fortune teller tent with draped curtains and a retro computer monitor displaying Chinese characters, wooden sign reading QWEN KNOWS ALL in large bold carnival letters, simple outlines, slightly imperfect printed texture with minor fabric stains, humorous nostalgic beach town novelty shirt style

Learn how to run Qwen3.5 locally with Claude Code. The Unsloth guide walks through serving Qwen3.5 on your own machine for local agentic coding, then builds a Qwen3.5 agent that autonomously fine-tunes models using Unsloth. Works on 24GB RAM or less. Guide: https://x.com/UnslothAI/status/2031008078850924840

Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: https://t.co/CAYpP1iK3i And yes, Ultra is coming!
https://x.com/ctnzr/status/2031762077325406428

Another week, another noteworthy open-weight LLM release. Nvidia's Nemotron 3 Super 120B-A12B looks pretty good: benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, and the throughput is great. Below is a short, visual architecture rundown.
https://x.com/rasbt/status/2032084724743553129

We’re excited to be day-0 launch partners for NVIDIA Nemotron 3 Super! You can try it now on Baseten, or read @rapprach’s blog to learn more about the new model: https://x.com/baseten/status/2031775755253026965

🤖 New models: Qwen3.5, COLQwen3, ColModernVBERT, Ring 2.5, Ovis 2.6, Nemotron embed/rerank VL 🎙️ ASR: FunASR, FireRedASR2, Qwen3-ASR realtime streaming 📦 PyTorch 2.10 upgrade (breaking change for env deps) 🔗 Transformers v5 compatibility Speculative decoding: Nemotron-H MTP,
https://x.com/vllm_project/status/2030178782259171382

🚀 Three attention paradigms are emerging in modern LLMs: Hybrid (Linear + Full), GQA, and DSA. Two recent models illustrate these design choices well: Qwen3.5 and MiniMax M2.5. Here’s a quick breakdown of their architectures from Zhihu contributor kaiyuan👇 🧠 Qwen3.5 — Hybrid
https://x.com/ZhihuFrontier/status/2031686944040915152
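Of the three paradigms named above, GQA is the simplest to illustrate: several query heads share a single key/value head, shrinking the KV cache by the group factor. The following is a minimal NumPy sketch (not any model's actual kernel; the function name and shapes are my own for illustration):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Causal grouped-query attention.

    q: (n_heads, seq, d) query heads
    k, v: (n_kv_heads, seq, d) shared key/value heads,
          with n_heads an integer multiple of n_kv_heads.
    """
    n_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads
    causal = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    out = np.empty_like(q)
    for h in range(n_heads):
        # Each group of `group` query heads reads the same KV head,
        # which is what cuts the KV cache relative to standard MHA.
        kh, vh = k[h // group], v[h // group]
        scores = (q[h] @ kh.T) / np.sqrt(d)
        scores = np.where(causal, -np.inf, scores)  # mask future positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ vh
    return out
```

With `n_kv_heads == n_heads` this reduces to ordinary multi-head attention; with `n_kv_heads == 1` it is multi-query attention, so GQA interpolates between the two.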

🚀 vLLM v0.17.0 is here! 699 commits from 272 contributors (48 new!) This is a big one. Highlights: ⚡ FlashAttention 4 integration 🧠 Qwen3.5 model family with GDN (Gated Delta Networks) 🏗️ Model Runner V2 maturation: Pipeline Parallel, Decode Context Parallel, Eagle3 + CUDA
https://x.com/vllm_project/status/2030178775212671148
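The GDN layers mentioned in the release notes maintain a fast-weight state instead of a KV cache. As a rough intuition (this is a toy sketch of the gated delta rule from the Gated DeltaNet literature, not vLLM's or Qwen3.5's actual kernel; all names here are my own): each step decays the state with a gate `alpha`, applies a delta-rule correction that erases the old value stored at key `k`, and writes the new value `v` with strength `beta`.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrence step of a gated delta rule (illustrative only).

    S: (d_v, d_k) fast-weight state matrix
    q, k: (d_k,) query and key; v: (d_v,) value
    alpha in [0, 1]: decay gate; beta in [0, 1]: write strength
    Implements S_t = alpha * S_{t-1} (I - beta * k k^T) + beta * v k^T.
    """
    # Decay the state and erase the component previously stored at k...
    S = alpha * (S - beta * np.outer(S @ k, k))
    # ...then write the new value at k.
    S = S + beta * np.outer(v, k)
    # Read out by querying the state, in place of softmax attention.
    return S, S @ q
```

Because the per-token state is a fixed-size matrix rather than a growing cache, decoding cost stays constant in sequence length, which is the appeal of hybrid designs that mix such layers with full attention.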

RWKV-7 G1e is here (13B/7B/3B/1B). Although Qwen 3.5 is strong, we are improving every month too 🙂 G1f in April. (G1d models all released too).
https://x.com/BlinkDL_AI/status/2031226189654966418
