Image created with gemini-3.1-flash-image-preview, prompted via claude-sonnet-4-5. Image prompt: Wide static shot of a worn communal stable in a concrete Chinese courtyard, half-open door revealing ordinary people of various ages collaboratively grooming a red-brown fire horse, desaturated palette of river-gray and rust, reclaimed wood and metal construction with handwritten technical notes pinned to walls, overcast natural light, documentary realism, large white text overlay reading OPEN SOURCE positioned like Chinese cinema poster title, Jia Zhangke observational style, patient composition, decelerated moment of shared labor.
You can now run Qwen3.5 locally! 💜 Qwen3.5-397B-A17B is an open MoE vision reasoning LLM for agentic coding & chat. It performs on par with Gemini 3 Pro, Claude Opus 4.5 & GPT-5.2. Run the 4-bit quant on a 256GB Mac or in 256GB of RAM. Guide: https://t.co/wjS1lMnbNp GGUF: https://x.com/UnslothAI/status/2023338222601064463
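For reference, a minimal local-run sketch using the llama-cpp-python bindings rather than the llama.cpp CLI the linked guide drives; the GGUF filename below is hypothetical, so substitute whichever Unsloth quant you actually download, and note that Qwen3.5 support requires a recent enough llama.cpp build:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-397B-A17B-Q4_K_M.gguf",  # hypothetical 4-bit quant filename
    n_ctx=8192,        # context window; raise it if you have the memory
    n_gpu_layers=-1,   # offload all layers to Metal/GPU where available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the Qwen3.5 release in one sentence."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```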
Opus 4.6 found 500+ vulnerabilities in open-source code, and we’ve begun reporting them and contributing patches. Quick excerpts from some of them 🧵 https://x.com/trq212/status/2024937919937741290
How efficient is MiniMax M2.5? We benchmarked on 8xH200 TEP8 with @vllm_project. At a reasonable 10-25s TTFT, M2.5 is able to sustain ~2500 tok/s/GPU throughput. For decode, it’s still possible to reach ~20 tok/s/GPU throughput at a strict 20 tok/s/user interactivity with 10K+ … https://x.com/SemiAnalysis_/status/2023418414203646066
MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6-bit quantization! 🔥 Let’s start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB https://x.com/ivanfioravanti/status/2022338870172684655
🎉 Congrats to @Alibaba_Qwen on releasing Qwen3.5 on Chinese New Year’s Eve — day-0 support is ready in vLLM! Qwen3.5 is a multimodal MoE with Gated Delta Networks architecture — 397B total params, only 17B active. What makes it interesting for inference: 🧠 Gated Delta … https://x.com/vllm_project/status/2023341059343061138
🔥 Alibaba’s Qwen 3.5 just dropped — and Zhihu is dissecting it. Here’s a sharp breakdown from Zhihu contributor toyama nao 👇 🏆 Verdict: “The spearhead of the open-source elite.” 📊 Big picture Tongyi Lab’s pattern: new mid-size model leapfrogs old giant. • Last cycle: 80B … https://x.com/ZhihuFrontier/status/2024176484232155236
Qwen https://qwen.ai/blog?id=qwen3.5
Qwen3.5 is Live! Today we release the weights of the first model, Qwen3.5-397B-A17B, which is a native multimodal model supporting both thinking and non-thinking modes. We have strengthened its coding and agentic capabilities to foster productivity for developers and enterprises. Hope you … https://x.com/JustinLin610/status/2023332446713070039
Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap | AINews https://news.smol.ai/issues/25-09-23-alibaba-yunqi
Alibaba’s new Qwen3.5-397B-A17B is the #3 open weights model in the Artificial Analysis Intelligence Index – a significant upgrade from Qwen3-235B-A22B-2507, and achieved with fewer active parameters than leading peers. Qwen3.5-397B-A17B is the first model released by Alibaba … https://x.com/ArtificialAnlys/status/2023794497055060262
Qwen https://qwen.ai/blog?id=qwen3.5#spatial-intelligence
Qwen3.5’s thinking is downright excessive. https://x.com/QuixiAI/status/2023995215690781143
Koyeb is Joining Mistral AI to Build the Future of AI Infrastructure – Koyeb https://www.koyeb.com/blog/koyeb-is-joining-mistral-ai-to-build-the-future-of-ai-infrastructure#serverless-inference-and-agents
One thing I feel not enough people know is that the Codex agent is open source. It also exposes an app-server interface that lets you integrate Codex into your application, including sign-in with ChatGPT. It’s the same server that powers Codex in VS Code, JetBrains, and Xcode. https://x.com/dkundel/status/2024233673764257879?s=20
Labor of love: We’re open-sourcing the runtime we use to run long-horizon agents at Southbridge. Something like this exists at almost every serious AI team I know. We ended up needing to build it because we couldn’t buy it. The problems were simple: – How do we stop throwing … https://x.com/hrishioa/status/2023807677089099914
MiniMax-M2.5 is now open source. Trained with reinforcement learning across hundreds of thousands of complex real-world environments, it delivers SOTA performance in coding, agentic tool use, search, and office workflows. Hugging Face: https://t.co/Wxksq9BB7t GitHub: … https://x.com/MiniMax_AI/status/2022310932693897628
You can now run open-source AI coding agents without paying for API keys 🤯 Cline CLI 2.0 just dropped with free access to MiniMax M2.5. → Runs from your terminal → Parallel agents → Works with any editor Any model you want. 100% Open Source. https://x.com/dr_cintas/status/2022387444189139367
BREAKING 🚨: Cline released Cline CLI 2.0, an open-source AI coding agent powered by Kimi K2.5 and MiniMax M2.5, available for free! `npm install -g cline` on Windows, Mac, and Linux. Open-source strikes back 👀 https://x.com/testingcatalog/status/2022348951459172604
Introducing Cline CLI 2.0: An open-source AI coding agent that runs entirely in your terminal. Parallel agents, headless CI/CD pipelines, ACP support for any editor, and a completely redesigned developer experience. MiniMax M2.5 and Kimi K2.5 are free to use for a limited time. https://x.com/cline/status/2022341254965772367
Anthropic’s hate for open source is so weird. https://x.com/ThePrimeagen/status/2023194211445834132
Taalas runs Llama 3 8B at 16k tokens per second per user. That’s almost an order of magnitude increase even compared to SRAM-based systems like Cerebras. Key idea: each chip is specialized to a given model. The chip is the model. The chat demo is pretty wild: https://x.com/awnihannun/status/2024671348782711153
Cohere Labs Launches Tiny Aya, Making Multilingual AI Accessible https://cohere.com/blog/cohere-labs-tiny-aya
DeepSeek-V3.2 on GB300: Performance Breakthrough | vLLM Blog https://vllm.ai/blog/gb300-deepseek
People who say “ohh, this model is faster, it must be mini” must have forgotten that DeepSeek boasted of 60 tps when V3 came out. V3, of course, was ≈3x larger and had almost 2x more active params than V2. Then they increased the batch size and tps went back down. It’s a weak signal. https://x.com/teortaxesTex/status/2022255213394948360
Today https://t.co/jFknDoasSy joins Hugging Face. Together we will continue to build ggml, make llama.cpp more accessible and empower the open-source community. Our joint mission is to make local AI easy and efficient to use by everyone on their own hardware. https://x.com/ggerganov/status/2024839991482777976
Tiny Aya is out on Hugging Face: a family of massively multilingual small language models. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model. https://x.com/_akhaliq/status/2023771434347044890
Moondream’s inference engine got so fast image decoding became a bottleneck. So we shipped a SIMD image decoding library that’s faster than all the Python options I know of. Plus it’s statically linked so not a pain in the ass to install. https://x.com/vikhyatk/status/2024005498874306984
Georgi’s llama.cpp really kicked off the whole local model thing in my opinion – it made the original Llama usable on personal computers. I wrote about it back in March 2023. https://x.com/simonw/status/2024855027517702345
Large language models are having their Stable Diffusion moment https://simonwillison.net/2023/Mar/11/llama/#llama-cpp
Ollama 0.16.3 is out with Cline and Pi integrations out of the box. Try it with @cline: `ollama launch cline`, or Pi: `ollama launch pi`. https://x.com/ollama/status/2024978932127187375
BREAKING: Llama.cpp joins Hugging Face 🤯 https://x.com/victormustar/status/2024842175532413016
GGML and llama.cpp join HF to ensure the long-term progress of Local AI https://huggingface.co/blog/ggml-joins-hf
Paper: https://t.co/WvJL8tJ2IL Model weights: https://t.co/IQ3b0CjEeF Mistral AI Studio live playground: … https://x.com/GuillaumeLample/status/2024445952812060715
Mistral AI buys Koyeb in first acquisition to back its cloud ambitions | TechCrunch https://techcrunch.com/2026/02/17/mistral-ai-buys-koyeb-in-first-acquisition-to-back-its-cloud-ambitions/
Nice standalone Swift package for real-time streaming transcription with Mistral’s Voxtral Mini 4B in MLX Swift: https://x.com/awnihannun/status/2022322714548338962
Also have a fast lanczos3 resize, but it doesn’t beat pyvips yet. https://x.com/vikhyatk/status/2024008173271863541
I made Moondream’s MoE inference kernel 2.6% faster, without touching any kernel code! https://x.com/vikhyatk/status/2023749843186078144
Cohere Labs just released the best multilingual low-resource language model. It runs on a phone, covers 70+ languages, and excels at languages underrepresented on the internet, like Zulu, Javanese, Yoruba, and others. https://x.com/nickfrosst/status/2023756803717427467
A fully open source mocap system that works with cheap webcams: The FreeMoCap Project. A free-and-open-source, hardware-and-software-agnostic, minimal-cost, research-grade motion capture system and platform for decentralized scientific research, education, and training: https://x.com/IlirAliu_/status/2024198014617702738
🎉 Congrats @MiniMax_AI on M2.5! vLLM has day-0 support — SOTA coding (80.2% SWE-Bench Verified), agentic search (76.3% BrowseComp), trained on 200k+ real-world RL environments. 37% faster than M2.1, matching Opus 4.6 speed. 🚀 ✅ Verified on NVIDIA GPUs. Recipe (Docker & … https://x.com/vllm_project/status/2022311342225678757
Free Models Router – API Pricing & Providers | OpenRouter https://openrouter.ai/openrouter/free
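Since OpenRouter exposes an OpenAI-compatible API, the free router can be called like any other model. A sketch with the openai Python client, assuming the `openrouter/free` slug implied by the page URL; check the linked page for the exact model id:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)
resp = client.chat.completions.create(
    model="openrouter/free",  # slug inferred from the URL above; verify on the page
    messages=[{"role": "user", "content": "Which model am I actually talking to?"}],
)
print(resp.choices[0].message.content)
```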
A less hyperbolic, Chinese set of assessments of MiniMax 2.5. Verdict: ≤Sonnet for coding, but close, and crucially – unlike previous open models – viable for multi-turn work. Not a toy. https://x.com/teortaxesTex/status/2022223441005621556
MiniMax just dropped M2.5, a top-tier open-weight model, already competitive with models like Opus 4.6. The speed at which open-weight models are improving is wild. It’s fast and surprisingly fluent at generating and operating Word, Excel, and PowerPoint files. But the bigger … https://x.com/omarsar0/status/2022384166034190528
Open models trail frontier systems by 6-9 months, but without them AI research loses its engine for exploration. This is what we discussed with @natolambert, one of the clearest voices on open models. Nathan is a research scientist at @allen_ai, author of the RLHF Book, and … https://x.com/TheTuringPost/status/2022055296940769332
Trillion Labs, a Korean AI startup, has launched Tri-21B-think Preview, a small open weights reasoning model that scores 20 on the Artificial Analysis Intelligence Index. Key benchmarking takeaways: ➤ High but not leading intelligence for its small size: Tri-21B-think Preview … https://x.com/ArtificialAnlys/status/2024381202959118807
We are headed towards a future where laptops can run open source models good enough to do most work. It’s early, but undeniably the trajectory. Excited to work with ollama to make this a reality! https://x.com/sdrzn/status/2024986545019912564
Blog about @MiniMax_AI’s Forge RL system. Core takeaways: 1. still CISPO 2. process reward, completion-time reward 3. multi-level prefix cache 4. rollout uses 60% of compute 5. millions of trajectories per day https://t.co/IrKDOoiKAB cc @teortaxesTex https://x.com/YouJiacheng/status/2022339475049947576
The dark side of reinforcement learning: @olive_jy_song, senior researcher at @MiniMax_AI, on RL models that try to hack rewards and why alignment fails in practice. This conversation is an inside look at how Chinese AI labs move fast – testing new models overnight, debugging … https://x.com/TheTuringPost/status/2022961676799398337
🤔 Has MiniMax finally stabilized its path in reasoning and coding? Still a hot review from Zhihu contributor toyama nao, and he calls it “Root downward, grow upward.” 🔥 After the flawed M2.1 (stronger coding, weaker logic), M2.5 fixes the technical issues and restores balance, … https://x.com/ZhihuFrontier/status/2022214461415993817
$1 per hour with 100 tps https://x.com/MiniMax_AI/status/2022379949336957254
It’s been a few days since onboarding @MiniMax_AI’s latest model, M2.5, in standard and Lightning variants. Results are showing on our leaderboard. With over 3K votes, M2.5 Lightning ranks eighth among open models, with Standard following closely behind! Let’s run some prompts: … https://x.com/yupp_ai/status/2024165671136059892
MiniMax M2.5 casually responding at ~50 tok/s with MLX (M3 Ultra). The model was released one hour ago 🥳 https://x.com/pcuenq/status/2022336556326060341
Nice independent look at SWE-bench Verified by @simonw. MiniMax M2.5 showing strong results under the same evaluation setup. Worth a read. https://x.com/MiniMax_AI/status/2024646767325958285
People were saying as early as Oct 2024 that SWE-bench was saturated, when scores were just ~50%. Awesome chat from the MiniMax team that shows otherwise. We’re certainly much, much closer, but there’s evidence that some room remains. Tiny 🧵 https://x.com/jyangballin/status/2022367240293949772
RL often throws away useful signal at intermediate steps, or as @karpathy put it, it’s like “sucking supervision through a straw.” MiniMax M2.5 solves this with per-token process rewards. The result is frontier coding performance at 1/10th or less of the cost of closed source. https://x.com/basetenco/status/2022456010049495213
RL shouldn’t waste signal. M2.5’s per-token process rewards improve signal utilization across reasoning steps, delivering frontier coding performance with dramatically better cost efficiency. Thanks @basetenco for the deep dive and day-0 hosting! https://x.com/MiniMax_AI/status/2023470874708549941
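To make the contrast concrete, here is a toy numpy sketch of the general idea of per-token process rewards versus a single outcome reward in a REINFORCE-style update. It is illustrative only, with made-up numbers, and is not MiniMax’s actual Forge/CISPO implementation:

```python
import numpy as np

# Log-probs of each generated token in one sampled trajectory.
logprobs = np.array([-0.5, -1.2, -0.3, -2.0])

# (a) Outcome-only reward: one scalar at the end, broadcast to every token,
# so every token gets identical credit -- the "straw" in Karpathy's phrase.
outcome_reward = 1.0
loss_outcome = -(outcome_reward * logprobs).sum()

# (b) Per-token process rewards: each step is scored individually (e.g. by a
# process reward model), giving denser credit assignment.
process_rewards = np.array([0.2, -0.1, 0.9, 0.5])  # hypothetical step scores
baseline = process_rewards.mean()  # simple variance-reduction baseline
loss_process = -((process_rewards - baseline) * logprobs).sum()

print(f"outcome-only loss:   {loss_outcome:.3f}")
print(f"process-reward loss: {loss_process:.3f}")
```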
This matches the general feeling on the big Chinese open source models. They have great benchmarks and near-frontier status on some coding, but there is a larger gap with the big closed models than the benchmarks would indicate when it comes to real work and general “smarts”. https://x.com/emollick/status/2024190674166239420
Oof, SWE-rebench is brutal for recent Chinese releases. M2.5 reported 80.2% on SWE-bench Verified against 80.8% for Opus 4.6, but it seriously underperforms here. Qwen3-Coder-Next looks good with 40% and only 80B-A3B parameters. https://x.com/maximelabonne/status/2022401174549512576
🚀 Qwen Coding Plan is now live on Alibaba Cloud Model Studio! ✨ What you get: • 🔥 Latest Qwen3.5-Plus models • 💡 Fixed monthly subscription: from ~$10/mo (Lite) or ~$50/mo (Pro) • 📦 Up to 90K requests/month for AI-powered coding • 🔌 Works with Claude Code, Qwen Code, … https://x.com/Alibaba_Qwen/status/2024136381308805564
Ouch, the pricing on Alibaba just hurts. You can get the larger Kimi-K2.5 and GLM-5 for less. https://x.com/scaling01/status/2023346718377406840
Qwen3.5-397B-A17B SVG results: I have seen better. DeepSeek-V3.2 and GLM-5 both beat it. https://x.com/scaling01/status/2023364296277721300
🚀 Qwen3.5-397B-A17B is here: the first open-weight model in the Qwen3.5 series. 🖼️ Native multimodal. Trained for real-world agents. ✨ Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling. ⚡ 8.6x-19.0x decoding throughput vs Qwen3-Max 🌍 201 … https://x.com/Alibaba_Qwen/status/2023331062433153103
Happy Chinese New Year!! What a week for open-source LLMs: > Qwen-3.5 > GLM-5 > MiniMax-M2.5 Are we just waiting on DeepSeek-V4 now? Also I’m hoping a US lab steps up with a true frontier open-source model. https://x.com/Yuchenj_UW/status/2023453819938763092
Qwen 3.5 goes bankrupt on Vending-Bench 2. https://x.com/andonlabs/status/2023450768406364238
A new repo full of MLX-LM-LoRA examples to train your own LLM on Apple Silicon, fast and efficient at ultra-long context lengths. Fine-tune Qwen3 4B Instruct on 32K context: https://t.co/yGZlR59fHD Train @IBMResearch Granite 350M model with RL-GRPO reasoning: … https://x.com/ActuallyIsaak/status/2022414004623479014
🚀 Qwen3.5-397B-A17B-FP8 weights are now open! It took some time to adapt the inference frameworks, but here we are: ✅ SGLang support is merged 🔄 vLLM PR submitted → https://t.co/rJkuitOBWs Check the model card for example code. vLLM support landing in the next couple of days! https://x.com/Alibaba_Qwen/status/2024161147537232110
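A minimal offline-inference sketch with vLLM’s Python API, assuming a build that already includes the Qwen3.5 support described above; the Hugging Face repo id is inferred from the model name and may differ from the real one:

```python
from vllm import LLM, SamplingParams

# Repo id below is an assumption based on the model name in the announcement.
llm = LLM(model="Qwen/Qwen3.5-397B-A17B-FP8", tensor_parallel_size=8)
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain gated delta networks in two sentences."], params)
print(outputs[0].outputs[0].text)
```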
🚩 Cerebras’s MiniMax-M2 GGUF 2-bit model: https://t.co/udlviJQZqQ Qwen3-Coder-Next INT4 model: … https://x.com/HaihaoShen/status/2022293472796180676
A clarification of Qwen3.5 Plus and 397B: 1. For open source, we follow the tradition of making parameters apparent, so we use a name with the number of total parameters and active params. 2. Qwen3.5-Plus is a hosted API version of the 397B model. As the model natively supports 256K tokens, … https://x.com/JustinLin610/status/2023340126479569140
It’s Qwen 3.5 day today! 🥳 A state-of-the-art 800 GB model that runs _locally_ with MLX using Q4, taking 225 GB of RAM. https://x.com/pcuenq/status/2023369902011121869
Let’s do the KV cache math for Qwen3.5: KV heads: 2, head dimension: 256, gated attention layers: 15, bytes per element (BF16): 2. 2 × 256 × 15 × 2 = 15,360. This is the same for K and V, so we multiply by 2: 30,720 bytes, roughly 31 kB per token of context. Meaning at max … https://x.com/bnjmn_marie/status/2023424404504342608
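The same arithmetic in a few lines of Python, assuming (as the tweet’s math implies) that only the 15 full-attention layers of the hybrid stack hold KV cache:

```python
kv_heads = 2    # KV heads
head_dim = 256  # head dimension
layers   = 15   # gated (full) attention layers in the hybrid stack
bf16     = 2    # bytes per element

k_per_token  = kv_heads * head_dim * layers * bf16  # 15,360 bytes for K
kv_per_token = 2 * k_per_token                      # K and V: 30,720 bytes, ~31 kB

context = 262_144  # 256K-token native context
gib = kv_per_token * context / 2**30
print(f"{kv_per_token} bytes/token -> {gib:.1f} GiB of KV cache at 256K context")
# 30720 bytes/token -> 7.5 GiB of KV cache at 256K context
```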
`ollama run qwen3.5:cloud` Qwen3.5-397B-A17B is the first open-weight model in the series. It’s available on Ollama’s cloud right now! Give it a try. Let’s go! 🚀🚀🚀 https://x.com/ollama/status/2023334181804069099
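The same model tag works from the ollama Python client; a quick sketch, noting that cloud-hosted models may require being signed in to an Ollama account:

```python
import ollama

resp = ollama.chat(
    model="qwen3.5:cloud",  # tag from the announcement above
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(resp["message"]["content"])
```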
Qwen 3.5 Plus is now available on AI Gateway. Thanks @vercel_dev team. 🤝 Use model: ‘alibaba/qwen3.5-plus’ Try it now! https://x.com/Alibaba_Qwen/status/2024029499541909920
Qwen3.5 runs quite well in mlx-lm. Awesome that we have a frontier-level hybrid model. The context gets longer but the inference speed and memory use barely change. Here’s the Q4 generating a space invaders game on an M3 Ultra. Generated 4,120 tokens at 37.6 tok/s. https://x.com/awnihannun/status/2023462412092059679
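For anyone wanting to reproduce this, a minimal mlx-lm sketch; the repo id is hypothetical (substitute whichever 4-bit Qwen3.5 MLX conversion you actually use), and chat-template handling is omitted for brevity:

```python
from mlx_lm import load, generate

# Hypothetical mlx-community repo id -- check Hugging Face for the real one.
model, tokenizer = load("mlx-community/Qwen3.5-397B-A17B-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Write a Space Invaders game in pygame.",
    max_tokens=4096,
    verbose=True,  # streams tokens and prints tok/s stats
)
```

For chat-tuned checkpoints you would normally wrap the prompt with `tokenizer.apply_chat_template` first.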
So speaking of benchmarks, what can be said of the new open Qwen? First, it completely destroys Qwen3-VL-235B ofc, but more surprisingly it outscores Qwen3-Max-thinking. All the while it’s the same model as “Plus”. Plus just has 1M context and some more bells and whistles. https://x.com/teortaxesTex/status/2023331885402009779
The new chonky Qwen 3.5 looks pretty solid, beating their own Qwen3-Max model everywhere, and it is much better at vision benchmarks than Qwen3-235B-A22B-VL. Now what I sadly haven’t seen is anything on reasoning efficiency. https://x.com/scaling01/status/2023343368399704506
Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched | AINews https://news.smol.ai/issues/25-09-05-1t-models
An open-source bipedal robotic system. [📍 GitHub Repo] It’s a complete leg design with 6 DOF per leg, RSU ankle architecture, passive toe joints. Built with off-the-shelf components and compatible with MJF 3D printing. What they’re open-sourcing & sharing: – Full mechanical … https://x.com/IlirAliu_/status/2022233309716123878
Introducing ZUNA, a 380M-parameter BCI foundation model for EEG data, a significant milestone in the development of noninvasive thought-to-text. Fully open source, Apache 2.0. https://x.com/ZyphraAI/status/2024114248020898015