Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic on worn mustard-yellow cotton fabric, single color deep red ink, cartoon illustration of smiling genie character pushing overflowing shopping cart full of boxes with price tags, simple bold outlines, large text ALIBABA arched across top, retro local novelty shirt style with aged fabric texture and slight imperfections
Learn how to run Qwen3.5 locally using Claude Code. Our guide shows you how to run Qwen3.5 on your own server for local agentic coding, then walks through building a Qwen3.5 agent that autonomously fine-tunes models using Unsloth. Works on 24GB RAM or less. Guide: https://x.com/UnslothAI/status/2031008078850924840
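For a sense of what that agent is automating: a minimal Unsloth QLoRA setup looks roughly like the sketch below. The checkpoint name is a placeholder, not necessarily the one the guide uses; 4-bit loading is what keeps the footprint near the 24GB figure.

```python
from unsloth import FastLanguageModel

# Placeholder checkpoint; swap in whichever Qwen3.5 variant the guide targets.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization is what fits this in ~24GB
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```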
GPT-5.4 is great at coding, knowledge work, computer use, etc., and it’s nice to see how much people are enjoying it. But it’s also my favorite model to talk to! We have missed the mark on model personality for a while, so it feels extra good to be moving in the right direction.
https://x.com/sama/status/2030319489993298349
Announcing NVIDIA Nemotron 3 Super! 💚120B-A12B Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights. Models, tech report, etc. here: https://t.co/CAYpP1iK3i And yes, Ultra is coming!
https://x.com/ctnzr/status/2031762077325406428
Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.
https://x.com/rasbt/status/2032084724743553129
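The "A12B" suffix is the interesting part: only ~12B of the 120B parameters are active per token, and decode compute tracks the active count rather than the total. A toy back-of-envelope with illustrative numbers only, not the published Nemotron recipe:

```python
# Illustrative MoE arithmetic, not the actual Nemotron 3 Super architecture.
total_params  = 120e9  # all experts stored in memory
active_params = 12e9   # shared layers + top-k routed experts touched per token

# A forward pass costs roughly 2 FLOPs per active weight, so per-token compute
# looks like a 12B dense model while capacity looks like a 120B one.
print(f"MoE:   ~{2 * active_params / 1e9:.0f} GFLOPs/token")
print(f"Dense: ~{2 * total_params  / 1e9:.0f} GFLOPs/token at the same size")
```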
We’re excited to be day-0 launch partners for NVIDIA Nemotron 3 Super! You can try it now on Baseten, or read @rapprach’s blog to learn more about the new model: https://x.com/baseten/status/2031775755253026965
🤖 New models: Qwen3.5, COLQwen3, ColModernVBERT, Ring 2.5, Ovis 2.6, Nemotron embed/rerank VL 🎙️ ASR: FunASR, FireRedASR2, Qwen3-ASR realtime streaming 📦 PyTorch 2.10 upgrade (breaking change for env deps) 🔗 Transformers v5 compatibility. Speculative decoding: Nemotron-H MTP, …
https://x.com/vllm_project/status/2030178782259171382
🚀 Three attention paradigms are emerging in modern LLMs: Hybrid (Linear + Full), GQA, and DSA. Two recent models illustrate these design choices well: Qwen3.5 and MiniMax M2.5. Here’s a quick breakdown of their architectures from Zhihu contributor kaiyuan👇 🧠 Qwen3.5 — Hybrid …
https://x.com/ZhihuFrontier/status/2031686944040915152
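Of the three paradigms, GQA is the easiest to show in a few lines: several query heads share each KV head, shrinking the KV cache by the group factor. A minimal sketch with toy head counts, not the actual Qwen3.5 or M2.5 configs:

```python
import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    batch, n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[1]
    # Each KV head serves `group` query heads: expand KV along the head axis.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / d**0.5
    return torch.softmax(scores, dim=-1) @ v

# Toy shapes: 32 query heads sharing 8 KV heads -> 4x smaller KV cache.
q = torch.randn(1, 32, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 32, 16, 64])
```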
🚀 vLLM v0.17.0 is here! 699 commits from 272 contributors (48 new!) This is a big one. Highlights: ⚡ FlashAttention 4 integration 🧠 Qwen3.5 model family with GDN (Gated Delta Networks) 🏗️ Model Runner V2 maturation: Pipeline Parallel, Decode Context Parallel, Eagle3 + CUDA …
https://x.com/vllm_project/status/2030178775212671148
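Nothing in these notes suggests the offline inference API changed, so trying the release should be the usual few lines. A minimal usage sketch, with a placeholder model id standing in for whichever Qwen3.5 checkpoint the GDN support targets:

```python
from vllm import LLM, SamplingParams

# Placeholder model id; substitute a Qwen3.5 checkpoint supported by v0.17.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize gated delta networks in one sentence."], params)
print(outputs[0].outputs[0].text)
```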
RWKV-7 G1e is here (13B/7B/3B/1B). Although Qwen 3.5 is strong, we are improving every month too 🙂 G1f in April. (G1d models all released too).
https://x.com/BlinkDL_AI/status/2031226189654966418