Image created with gemini-3.1-flash-image-preview, prompted by claude-sonnet-4-5. Image prompt: Wide observational shot of a weathered Chinese internet café storefront on a half-demolished street, muted grays and faded teals, overcast daylight, a few people visible through smudged windows at desktop computers, a red fire horse casually tethered to a rusted bike rack outside, large white text overlay reading LOCAL positioned like a Chinese cinema poster title, documentary realism, patient framing, transitional urban decay
You can now run Qwen3.5 locally! 💜 Qwen3.5-397B-A17B is an open MoE vision reasoning LLM for agentic coding & chat. It performs on par with Gemini 3 Pro, Claude Opus 4.5 & GPT-5.2. Run the 4-bit quant on a 256GB Mac or 256GB of RAM. Guide: https://t.co/wjS1lMnbNp GGUF: https://x.com/UnslothAI/status/2023338222601064463
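The 256GB figure checks out with a quick back-of-envelope calculation. A sketch of the arithmetic, assuming roughly 0.5 bytes per parameter at 4-bit and ignoring KV-cache and activation overhead:

```python
# Back-of-envelope memory estimate for a 4-bit quantized 397B model.
# Assumption: ~0.5 bytes/param; real GGUF quants carry some extra
# per-block scale metadata, so actual files run slightly larger.
params = 397e9          # Qwen3.5-397B-A17B total parameter count
bytes_per_param = 0.5   # 4 bits = 0.5 bytes
weights_gb = params * bytes_per_param / 1e9

print(f"~{weights_gb:.0f} GB of weights")  # ~199 GB, fits in 256 GB
```

Only ~17B parameters are active per token (the A17B in the name), but the full 397B of weights still has to sit in memory, which is why the MoE architecture helps speed rather than footprint.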
MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6-bit quantization! 🔥 Let’s start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB https://x.com/ivanfioravanti/status/2022338870172684655
Tiny Aya is out on Hugging Face: a family of massively multilingual small language models. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model. https://x.com/_akhaliq/status/2023771434347044890
GGML and llama.cpp join HF to ensure the long-term progress of Local AI https://huggingface.co/blog/ggml-joins-hf
🤔 Has MiniMax finally stabilized its path in reasoning and coding? A still-hot review from Zhihu contributor toyama nao, who calls it: “Root downward, grow upward.” 🔥 After the flawed M2.1 (stronger coding, weaker logic), M2.5 fixes the technical issues and restores balance. https://x.com/ZhihuFrontier/status/2022214461415993817
$1 per hour with 100 tps https://x.com/MiniMax_AI/status/2022379949336957254
It’s been a few days since onboarding @MiniMax_AI’s latest model, M2.5, in standard and Lightning variants. Results are showing on our leaderboard. With over 3K votes, M2.5 Lightning ranks eighth among open models, with Standard following closely behind! Let’s run some prompts: https://x.com/yupp_ai/status/2024165671136059892
MiniMax M2.5 casually responding at ~50 tok/s with MLX (M3 Ultra). The model was released one hour ago 🥳 https://x.com/pcuenq/status/2022336556326060341
Nice independent look at SWE-bench Verified by @simonw: MiniMax M2.5 showing strong results under the same evaluation setup. Worth a read. https://x.com/MiniMax_AI/status/2024646767325958285
People were saying as early as Oct 2024 that SWE-bench was saturated, when scores were just ~50%. Awesome chart from the MiniMax team that shows otherwise. We’re certainly much, much closer to saturation, but there’s evidence that some room remains. Tiny 🧵 https://x.com/jyangballin/status/2022367240293949772
RL often throws away useful signal at intermediate steps, or as @karpathy put it, it’s like “sucking supervision through a straw.” MiniMax M2.5 addresses this with per-token process rewards. The result is frontier coding performance at roughly one-tenth the cost of closed-source models. https://x.com/basetenco/status/2022456010049495213
RL shouldn’t waste signal. M2.5’s per-token process rewards improve signal utilization across reasoning steps, delivering frontier coding performance with dramatically better cost efficiency. Thanks @basetenco for the deep dive and day-0 hosting! https://x.com/MiniMax_AI/status/2023470874708549941