Image created with gemini-3.1-flash-image-preview (prompt by claude-sonnet-4-5). Image prompt: High altitude aerial photograph of a joyful person in freefall wearing a Q-logo t-shirt, throwing colorful parachutes to other tiny figures below, bright blue sky, wide angle shot, the bold text ‘QWEN’ prominently displayed as clean magazine cover typography in the upper portion of the frame, dynamic action photography, crisp daylight, simple clean composition.
Some big exits from Alibaba’s Qwen team today. A unified “Qwen is nothing without its people” message is circulating via staff members, echoing OpenAI’s 2023 board drama. The exodus follows the Qwen3.5-small launch, alongside a ‘unification’ of the brand + restructure. https://x.com/TheRundownAI/status/2028945432227774657?s=20
❓Qwen’s lead steps down — what’s happening inside Alibaba’s AI unit? Alibaba Qwen head Junyang Lin @JustinLin610 announced his departure. According to a report by 晚点LatePost, the trigger may be ongoing organizational restructuring. 🔄 From vertical integration to horizontal… https://x.com/ZhihuFrontier/status/2029117410259993073
Alibaba Qwen’s Tech Lead Junyang Lin, 2 Other Researchers Step Down https://officechai.com/ai/alibaba-qwens-tech-lead-junyang-lin-steps-down/
And the underlying reason seems to be that Alibaba has unified the entire team under the Qwen umbrella, with everything reporting directly to Alibaba’s CEO. That may have made the leadership above Junyang concerned that he could grow beyond their control, even though they rely… https://x.com/Xinyu2ML/status/2028891170592473385
End of an era. Qwen lost its tech lead. When we launched the Qwen 3 Next endpoints on Hyperbolic with Junyang and his team, they were still online at 6am Beijing time! Thank you for pushing open source AI forward. Wishing you the best, @JustinLin610! https://x.com/Yuchenj_UW/status/2028872969217515996
Something is afoot in the land of Qwen https://simonwillison.net/2026/Mar/4/qwen/
The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable. I’ll do my best to keep carrying that torch (not that I’ve reached the level of impact of Qwen by any means). Every bit matters. https://x.com/natolambert/status/2028893211759124890
To be precise: Alibaba-Cloud kicked out Qwen’s tech lead. https://x.com/YouJiacheng/status/2028880908305219729
Update on the Qwen shakeup. Per 36Kr, Alibaba CEO Eddie Wu held an emergency all-hands with the Qwen team this afternoon. He told the team: “I should have known about this sooner.” Alibaba’s official framing: this is expansion, not contraction. But the meeting revealed real… https://x.com/poezhao0605/status/2029151951167078454
Alibaba’s small, open source Qwen3.5-9B beats OpenAI’s gpt-oss-120B and can run on standard laptops | VentureBeat https://venturebeat.com/technology/alibabas-small-open-source-qwen3-5-9b-beats-openais-gpt-oss-120b-and-can-run
@Alibaba_Qwen The Qwen 3.5 small models are available on Ollama. All models support native tool calling, thinking, and multimodal capabilities in Ollama. 9B: ollama run qwen3.5:9b 4B: ollama run qwen3.5:4b 2B: ollama run qwen3.5:2b 0.8B: ollama run qwen3.5:0.8b Model page, including… https://x.com/ollama/status/2028514180936908842
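Since the Ollama posts call out native tool calling, here is a minimal sketch of that flow in Python, assuming `ollama pull qwen3.5:9b` has been run; the weather tool below is hypothetical, not part of the release:

```python
# Minimal tool-calling sketch with the ollama Python client.
import ollama

def get_weather(city: str) -> str:
    """Stand-in tool; a real implementation would call a weather API."""
    return f"Sunny, 22°C in {city}"

response = ollama.chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# The model answers with structured tool calls rather than plain text.
for call in response.message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```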
> external customers get smoother access to Alibaba’s compute than the internal team building its most important model. man, this makes me reevaluate Qwen. For a long time, I thought they’re GPU-rich compared to startup competition. But seems they were in a similar situation… https://x.com/teortaxesTex/status/2029159237729894727
> in 7,692 AI papers from 2025-2026 on HF, Qwen is indisputably the number one open model. > 41% of the papers used Qwen. > In May 2025, when Qwen3 was released, 1 out of every 2 papers used Qwen. > Across the entire year, at least 30% of AI papers produced each month used Qwen. https://x.com/teortaxesTex/status/2029102932604375057
🔥 Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚡️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. 👇 Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: … https://x.com/Alibaba_Qwen/status/2028846103257616477
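For reference, serving GPTQ-Int4 checkpoints in vLLM normally needs no special flags, since the quantization method is read from the model config. The repo id below is a guess at Qwen's naming convention; check the Hugging Face link in the tweet for the actual path:

```python
# Sketch: serving a GPTQ-Int4 checkpoint with vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3.5-9B-GPTQ-Int4")  # hypothetical repo id; quantization auto-detected
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Explain gated linear attention in two sentences."], params)
print(outputs[0].outputs[0].text)
```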
🚀 Introducing the Qwen 3.5 Small Model Series Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B ✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL: • 0.8B / 2B → tiny, fast, … https://x.com/Alibaba_Qwen/status/2028460046510965160
A small Qwen3.5 from-scratch reimplementation for edu purposes: https://t.co/OnupgeE55l (probably the best “small” LLM today for on-device tinkering) https://x.com/rasbt/status/2028961822372425941
Alibaba has expanded its Qwen3.5 model family with 3 new models – the 27B model is a standout, scoring 42 on the Artificial Analysis Intelligence Index and matching open weights models 8-25x its size @Alibaba_Qwen has expanded the Qwen3.5 family with three new models alongside… https://x.com/ArtificialAnlys/status/2027489442697777245
Alibaba shipped four Qwen 3.5 small models with a trick borrowed from their 397B model: Gated DeltaNet hybrid attention. Three layers of linear attention for every one layer of full attention. The linear layers handle routine computation with constant memory use. The full… https://x.com/LiorOnAI/status/2028558859783311382
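For intuition, the interleaving described in that tweet reduces to a simple layer schedule. This is a toy sketch of the 3:1 pattern, not Qwen's actual implementation:

```python
# Three Gated DeltaNet (linear attention) layers per full-attention layer.
def hybrid_layer_schedule(num_layers: int, ratio: int = 3) -> list[str]:
    """Every (ratio+1)-th layer is full attention; the rest are linear."""
    return [
        "full_attention" if (i + 1) % (ratio + 1) == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

print(hybrid_layer_schedule(8))
# ['gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention',
#  'gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention']
```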
I remember when Qwen 1.0 came out (fall 2023, not that long ago!) and we added support to mlx-lm. And they didn’t stop releasing models, every one pushing the frontier of open-weights. @JustinLin610 always reached out to make sure the new models were well supported in MLX. I… https://x.com/awnihannun/status/2028902061384057211
Interesting that Qwen decided to go back to hybrid models for the Qwen 3.5 series. Especially since the Qwen3-2507 update split the hybrid models into Thinking and Instruct versions. https://x.com/nrehiew_/status/2028454952348328192
not sure if it’s a skill issue on my side, but using qwen models on llama.cpp with recommended sampling parameters, i always end up with doom loops at 20% context length, even with higher quants like q8 https://x.com/qtnx_/status/2029246416342618321
Published some notes on the situation at Qwen – they released the Qwen 3.5 family (an outstanding family of open weight models) but now their lead researcher and several others all appear to have resigned within the past 24 hours. https://x.com/simonw/status/2029223704127828386
Qwen 3.5 small dropped on HF: Shocking: 9b *and* 4b outperform even way bigger models like GPT-OSS-120b on several metrics – built for text, image, video, and agent tasks – with 262K native context window (extendable to 1M tokens). – It introduces early-fusion vision-language… https://x.com/kimmonismus/status/2028461032377852000
Qwen3.5-9B is now available on LM Studio. Requires only ~7GB to run locally 🤯 https://x.com/Alibaba_Qwen/status/2028664203872251943
So given the news … do we expect Qwen to change its stance on OSS soon? Apparently having the most popular open models wasn’t enough. https://x.com/code_star/status/2028913595602616391
The Qwen 3.5 small model series is now available ollama run qwen3.5:9b ollama run qwen3.5:4b ollama run qwen3.5:2b ollama run qwen3.5:0.8b All models support native tool calling, thinking, and multimodal capabilities in Ollama. https://x.com/ollama/status/2028510184788926567
Virtually the entire finetuning field after Llama/Mistral fell off, vast swathes of academic research, enterprise finetunes, tons of Chinese VLM/OCR models, VLAs – everything runs on Qwens of various sizes and generations. We live in Junyang’s world now. Godspeed https://x.com/teortaxesTex/status/2028874511509000646
What’s actually nice about Gated DeltaNet modules is that they don’t grow the KV cache size. So with that 3:1 ratio, Qwen3.5 is much more memory friendly than the previous Qwen3 models. https://x.com/rasbt/status/2029233742708130265
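That point is easy to sanity-check with back-of-envelope arithmetic: if only one layer in four keeps a conventional KV cache, the context-scaling memory drops by roughly 4x. All config numbers below are illustrative, not Qwen3.5's actual architecture, and the constant-size DeltaNet state is omitted since it doesn't grow with context:

```python
# Back-of-envelope KV-cache comparison under the 3:1 hybrid assumption.
def kv_cache_gb(attn_layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V tensors per full-attention layer, fp16/bf16 elements
    return 2 * attn_layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

dense = kv_cache_gb(attn_layers=32, kv_heads=8, head_dim=128, ctx_len=262_144)
hybrid = kv_cache_gb(attn_layers=32 // 4, kv_heads=8, head_dim=128, ctx_len=262_144)
print(f"all full attention: {dense:.1f} GB | 3:1 hybrid: {hybrid:.1f} GB")
# all full attention: 34.4 GB | 3:1 hybrid: 8.6 GB
```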
You can now fine-tune Qwen3.5 with our free notebook! 🔥 You just need 5GB VRAM to train Qwen3.5-2B LoRA locally! Unsloth trains Qwen3.5 1.5x faster with 50% less VRAM. GitHub: https://t.co/aZWYAtakBP Guide: https://t.co/7d3BW8Qcjg Qwen3.5-4B Colab: … https://x.com/UnslothAI/status/2028845314506150079
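A minimal Unsloth LoRA setup along the lines of that tweet would look something like the sketch below. The model_name is an assumption (Unsloth usually mirrors releases under its own org); the linked notebook is the authoritative version:

```python
# Sketch: low-VRAM LoRA fine-tuning setup with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-2B",  # hypothetical repo id
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit base weights keep VRAM in the ~5GB range
)

# Attach LoRA adapters; only these low-rank matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# Training then proceeds with a standard TRL SFTTrainer loop.
```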
BullshitBench v2, created by Peter Gostev, is a benchmark that does something refreshingly different: it tests whether AI models can detect and reject nonsensical prompts instead of confidently rolling with them. Only Anthropic’s Claude models and Alibaba’s Qwen 3.5 score… https://x.com/kimmonismus/status/2029230388028358726
🤖 Multiple Core Leaders Exit Alibaba’s Qwen Team Several key members of @Alibaba_Qwen AI team have stepped down, including technical lead @JustinLin610. Other core contributors — including @huybery and @kxli_2000 — have also departed, marking a notable leadership shake-up… https://x.com/MetaEraHK/status/2029031825071587590
Okay Junyang is not Qwen, and Qwen is not Junyang. Behind every great model is a team grinding through data pipelines, training runs, and sleepless launches. But he was the voice, the bridge, the person who made the global AI dev community feel like Qwen was theirs too. That kind… https://x.com/hxiao/status/2028932213228900701
Qwen delivered the best open-source models across sizes and modalities, for both academia and industry. And the response? Replace the excellent leader with non-core people from Google Gemini, driven by DAU metrics. If you judge foundation model teams like consumer apps, don’t… https://x.com/Xinyu2ML/status/2028867420501512580
Qwen3.5-2B running locally on an iPhone 17 Pro is the breakthrough that was needed for local models running on the edge. https://x.com/kimmonismus/status/2028602520302399701
The new Qwen 3.5 by @Alibaba_Qwen running on-device on iPhone 17 Pro. Qwen 3.5 beats models 4 times its size, has strong visual understanding, and can toggle reasoning on or off. The 2B 6-bit model here is running with MLX optimized for Apple Silicon. https://x.com/adrgrondin/status/2028568689709084919
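The iPhone demo presumably uses MLX Swift; the Mac equivalent is a few lines with mlx-lm. The repo id below follows mlx-community naming conventions but is an assumption; check the hub for the actual 2B 6-bit conversion:

```python
# Sketch: running a quantized Qwen 3.5 with mlx-lm on Apple Silicon.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-2B-6bit")  # hypothetical repo id
text = generate(
    model,
    tokenizer,
    prompt="Describe this release in one sentence.",
    max_tokens=64,
)
print(text)
```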
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear. It’s incredible timing to release a fully open model so people can study… https://x.com/natolambert/status/2029595053694628221
7 notable models from this week worth your attention: ▪️ Causal-JEPA ▪️ Kimi K2.5 with K2.5 Agent Swarm ▪️ GLM-5 ▪️ Qwen3.5 ▪️ DreamZero, a World Action Model (WAM) ▪️ Computer-Using World Model (CUWM) ▪️ Gemini 3.1 Pro Find more info and links to the research and models here: https://x.com/TheTuringPost/status/2027056777058291820
Top 10 Open Models: February 2026 in Text Arena. The top 3 labs have not changed since January, but the scores have gotten tighter between them: – @Zai_org’s GLM-5, scoring 1455 – @Alibaba_Qwen’s Qwen-3.5 397B A17B, scoring 1454 – @Kimi_Moonshot’s Kimi-K2.5 Thinking, 1452 The… https://x.com/arena/status/2027511779417592173
Most language models only read forward. Perplexity just open-sourced 4 models that read text in both directions. They used a technique from image generation to retrain Qwen3 so every word can see every other word in a passage. That changes how well a model understands… https://x.com/LiorOnAI/status/2027483180752900129
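The masking change implied there is small but fundamental: a causal LM hides future tokens, while a bidirectional retrofit lets every position attend to the whole passage. A tiny illustration (a sketch of the concept, not Perplexity's actual recipe):

```python
# Causal vs. bidirectional attention masks over a 4-token sequence.
import torch

T = 4
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))  # token i sees tokens <= i
bidirectional_mask = torch.ones(T, T, dtype=torch.bool)       # token i sees everything

print(causal_mask.int())
print(bidirectional_mask.int())
```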