Image created with Flux Pro v1.1 Ultra. Image prompt: Assembly instruction diagram for an industrial shipping container crane, isometric view with modular components, modern industrial style, orange and gray color palette, technical blueprint background, “ALIBABA” prominently displayed as instruction header, technical drafting font, assembly sequence numbers, metric measurements shown

Pretty impressive 7B VLM coming out of Xiaomi 🤓 ViT encoder w/ MLP projector, powered by their 7B text backbone. Compatible w/ the Qwen VL arch, so it works across vLLM, Transformers, SGLang and Llama.cpp. Bonus: it can reason and is MIT licensed 🔥 https://x.com/reach_vb/status/1928360066467439012
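For readers unfamiliar with the stack: "ViT encoder w/ MLP" means the vision transformer's patch embeddings are passed through a small MLP projector to match the text backbone's hidden size, so image tokens can sit alongside text tokens. A minimal numpy sketch — the dimensions and the ReLU nonlinearity are illustrative assumptions, not Xiaomi's actual config (Qwen-VL-style projectors typically use GELU):

```python
import numpy as np

def mlp_projector(patch_embeds, W1, b1, W2, b2):
    """Project ViT patch embeddings (d_vit) into the LM's embedding
    space (d_lm) so they can be interleaved with text tokens."""
    h = np.maximum(patch_embeds @ W1 + b1, 0.0)  # ReLU here for brevity
    return h @ W2 + b2

# Toy dimensions: 16 patches, d_vit=32, hidden=64, d_lm=48.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 32))
W1, b1 = rng.normal(size=(32, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 48)), np.zeros(48)
vision_tokens = mlp_projector(patches, W1, b1, W2, b2)
print(vision_tokens.shape)  # (16, 48): same width as the LM's text embeddings
```

The appeal of this design is that the text backbone is untouched, which is what makes drop-in compatibility with existing Qwen VL inference stacks possible.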

There will also be a DeepSeek R1 0528 Qwen 3 8B, matching Qwen 3 235B Thinking in performance 🤯 Whale COOKED! https://x.com/i/web/status/1928058862923391260

DeepSeek R1 05-28 LiveBench results:
– 8th Overall, ahead of o4-mini, Gemini 2.5 Flash Preview and Qwen3-235B-A22B (its biggest competitors)
– 1st on Data Analysis !!!
– 3rd on Reasoning !!
– 4th on Mathematics !
– 11th on Language
– 20th on Instruction Following
– 23rd on https://x.com/i/web/status/1928173385399308639

DeepSeek R1 Qwen3 8B knows it’s overthinking it 😂 https://x.com/i/web/status/1928119439737729482

The 4-bit DWQ of DSR1 Qwen3 8B is up on HF. Use the command below or use it in @lmstudio: https://x.com/awnihannun/status/1928125690173383098

Gemma 3 abliterated again ✂️✂️ Abliteration removes refusals from the models. This new and improved version targets refusals with more accuracy, based on previous work with Qwen 3. Here’s how to do it https://x.com/i/web/status/1928030013275918464
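The core idea behind abliteration, sketched here from the general technique rather than the linked recipe: estimate a "refusal direction" as the difference of mean residual-stream activations between refusal-triggering and benign prompts, then project that direction out of the model's weight matrices so no layer can write it. A minimal numpy sketch (the random data stands in for real activations):

```python
import numpy as np

def refusal_direction(refusal_acts, benign_acts):
    # Mean difference of activations between the two prompt sets,
    # normalized to a unit vector.
    d = refusal_acts.mean(axis=0) - benign_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    # W' = (I - d d^T) W: remove the component of W's output along d,
    # so this layer can no longer write the refusal direction.
    return W - np.outer(d, d @ W)

rng = np.random.default_rng(0)
refusal_acts = rng.normal(size=(16, 8)) + 2.0  # toy stand-in activations
benign_acts = rng.normal(size=(16, 8))
d = refusal_direction(refusal_acts, benign_acts)
W = rng.normal(size=(8, 8))
W_ablated = ablate_direction(W, d)
print(np.abs(d @ W_ablated).max())  # ~0: outputs have no component along d
```

"Targeting refusals with more accuracy" presumably means a better estimate of that direction (prompt sets, layer choice); the projection step itself is this simple.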

Just FYI, all the reports from our RL experiments have not been on Qwen – they’ve been on Llama (DeepHermes 8B) – so hopefully that gives some additional assurance of the impact RL can have, and that it’s not random god-mode Qwen math improvements from randomness. https://x.com/i/web/status/1928184393035559191

“kicking the qwen randomly makes it work better” like old TVs. I’m not reading any of it at this point. https://x.com/teortaxesTex/status/1927459880341782700

How does an LLM writing out this program (WITHOUT a code interpreter running the output) make things more accurate? Verified on Qwen 3 – a30b (below). Lots of interesting takeaways from the Random Rewards paper. NOT that RL is dead, but honestly far more interesting than that! https://x.com/hrishioa/status/1927974614585725353

– random rewards only work for Qwen models, not for other models
– improvements with random rewards were due to clipping, and disappear once clipping is removed
Conjecture by the authors: “Under clipping, random rewards don’t teach task quality – instead, they trigger a” https://x.com/scaling01/status/1927424801938825294
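For context, the clipping being blamed is the PPO/GRPO clipped surrogate objective. It is asymmetric: once the probability ratio exceeds 1+ε with positive advantage, the objective goes flat (no further push up), while negative-advantage tokens are never capped from below — the kind of mechanism that could plausibly move a policy toward its own high-probability behavior even under zero-mean random rewards. A minimal sketch:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    # Per-token PPO objective: min(r * A, clip(r, 1-eps, 1+eps) * A)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# Positive advantage saturates once the ratio passes 1 + eps...
print(clipped_surrogate(1.5, 1.0))   # 1.2 (capped, zero gradient beyond here)
# ...but negative advantage keeps penalizing no matter how large the ratio:
print(clipped_surrogate(1.5, -1.0))  # -1.5 (never capped)
```

Whether this asymmetry is the whole story of the Qwen-only gains is exactly the authors' conjecture, not settled fact.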

Why are almost all RL experiments done on Qwen models? Kind of interesting, right… https://x.com/i/web/status/1927948317931000277

Worth thinking about how this paper reflects on every other RL paper using Qwen. If Qwen works with any random reward, how do we know whether any of these papers actually does anything? https://x.com/nrehiew_/status/1927424673702121973


Discover more from Ethan B. Holland
