Image created with gemini-2.5-flash-image (prompted via claude-sonnet-4-5). Image prompt: Cinematic shot of an enormous empty warehouse with endless rows of identical cardboard boxes under a single cold fluorescent light, polished concrete floors reflecting light, deep shadows, minimalist composition, cold blue-grey tones, architectural photography style, the word ALIBABA in bold white sans-serif prominently displayed across the image
@awnihannun added batched generation to MLX-LM over 2 months ago. Since then, everybody has been asking for batching in the MLX-LM server. Well, enjoy the first version in the latest MLX-LM release. The following video shows 4 simultaneous requests being served with Qwen3 30B on an M2 Ultra. https://x.com/angeloskath/status/1996364526749639032
> be arcee > look around > realize open-weight frontier MoE is basically a Qwen/DeepSeek monopoly > decide “nah, we’re building our own” > actual end-to-end pretraining > on US soil > introducing Trinity > Nano (6B MoE) and Mini (26B MoE) > open weights, Apache 2.0 > free on https://x.com/TheAhmadOsman/status/1995613231629381935
Our new Qwen3-TTS (version 2025-11-27) is here! 🚀 We’ve leveled up on what matters most: ✨ More Personalities: Over 49 high-quality voices, from cute and playful to wise and stern. Find your perfect match! 🌍 Global Reach: Now speaks 10 languages (zh, en, de, it, pt, es, ja, https://x.com/Alibaba_Qwen/status/1996947806138126547
The latest mlx-lm is out and it has continuous batching with mlx_lm.server! Added by @angeloskath. Check out the video of 4 simultaneous requests running with Qwen3 30B on the same M2 Ultra: https://x.com/awnihannun/status/1996365940343402596
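Since mlx_lm.server speaks the OpenAI-compatible chat API, exercising the new continuous batching from a client is just a matter of firing several requests concurrently. A minimal sketch, assuming the server's default localhost:8080 endpoint; the model id below is illustrative, use whichever model you started the server with:

```python
# Sketch: sending several concurrent chat requests to a local mlx_lm.server.
# URL/port match the server defaults; the model id is a placeholder.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt: str, model: str = "mlx-community/Qwen3-30B-A3B-4bit") -> dict:
    # Standard OpenAI-style chat completion body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def ask(prompt: str) -> str:
    req = request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

def ask_all(prompts: list[str]) -> list[str]:
    # With continuous batching, the server interleaves these on one model
    # copy instead of queueing them one after another.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(ask, prompts))
```

With a server running (e.g. `mlx_lm.server --model <model>`), `ask_all(["..."] * 4)` reproduces the 4-simultaneous-requests setup from the video.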
TIL you can compile quantized models thanks to quanto, although memory blows up a bit on Qwen3-VL https://x.com/mervenoyann/status/1996998362118201850