Image created with gemini-3.1-flash-image-preview and claude-opus-4.7. Image prompt: Using the provided reference image, keep the pure white landscape field, exact vertical type hierarchy, generous margins, and galaxy-punchout Milky Way starfield treatment clipped inside every letterform, but replace ‘HEROES’ with ‘LOCAL’ in the same bold condensed grotesque all-caps, replace ‘ALESSO’ with ‘SOVEREIGN COMPUTE’ in the same light geometric all-caps, and replace ‘TOVE LO’ with ‘GGUF’ in the same condensed grotesque all-caps, keeping ‘(we could be)’ and ‘FEATURING.’ unchanged with identical tracking, weights, and high-contrast galaxy texture.

Kimi K2.6 Tech Blog: Advancing Open-Source Coding
https://www.kimi.com/blog/kimi-k2-6

Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. Acting as…
https://x.com/Kimi_Moonshot/status/2046531057147933137

Check out this video on how to run Gemma 4 locally on an iPhone! It runs completely offline and handles long context, meaning no data plan, no API calls, and no monthly fees required.
https://x.com/googlegemma/status/2045204738720084191

What does it take to run 3, 5, or even 10 concurrent instances of Gemma 4 locally? We’ve open-sourced a demo letting you run multiple models side-by-side on your hardware. Gemma 4 26B A4B easily runs 10+ concurrent requests on a MacBook Pro M4 Max at 18 tokens/sec per request.
https://x.com/googlegemma/status/2046621841146671456
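
For a rough sense of how such a concurrency test works, here is a minimal sketch that fires ten parallel requests at a local OpenAI-compatible endpoint. The server and port are assumptions (llama-server defaults to 8080); the actual open-sourced demo may work differently.

    # Fire 10 concurrent chat requests at a local OpenAI-compatible server.
    for i in $(seq 1 10); do
      curl -s http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d "{\"messages\": [{\"role\": \"user\", \"content\": \"request $i: say hi\"}], \"max_tokens\": 64}" &
    done
    wait  # block until all 10 background requests finish

The number to watch is per-request throughput while everything is in flight, which is what the 18 tokens/sec figure above measures.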

Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We’re calling it…
https://x.com/zan2434/status/2046982383430496444

Partnering with @Kimi_Moonshot to bring Kimi K2.6 to @CloudflareDev Workers AI on day 0. Better for coding and agentic use cases! Try it out now:
https://x.com/michellechen/status/2046297037742997909

🎉 Congrats to the Moonshot team on Kimi K2.6 — day-0 support on vLLM 0.19.1.
• 1T total / 32B active MoE — 384 experts, 8 routed + 1 shared
• MLA attention, 256K context
• Native multimodal: MoonViT vision encoder + video input
• Native INT4 quantization
• Interleaved…
https://x.com/vllm_project/status/2046251287206035759
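
Day-0 support means serving should be a one-liner. A hedged sketch of what that typically looks like with vLLM; the Hugging Face repo id and tensor-parallel degree are illustrative assumptions, with --max-model-len matching the announced 256K context:

    # Hypothetical command: repo id and parallelism settings are assumptions.
    vllm serve moonshotai/Kimi-K2.6 \
      --tensor-parallel-size 8 \
      --max-model-len 262144 \
      --trust-remote-code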

FINALLLY FINALLY it is here.
V4-flash: all the way back to V2 prices, only now with 1M
V4-pro: roughly Kimi/GLM/MiMo competitor
Chat prefix completion and FIM back – thank you! Missed this forever but what can they do?
https://x.com/teortaxesTex/status/2047508587883250112

Kimi 2.6 Thinking seems very good for an open weights model, but many rough edges compared to closed SoTA. The Lem Test resulted in a 74 page thinking trace… and an okay-ish answer. It did an okay TiKZ unicorn, an adequate twigl shader for a neogothic city in the waves, etc.
https://x.com/emollick/status/2046411222354989189

Kimi K2.6 + DFlash: 508 tok/s on 8x MI300X.
5.6x throughput improvement over baseline autoregressive serving: 90 tok/s → 508 tok/s on the same hardware, same model, zero quality loss.
https://x.com/HotAisle/status/2046620289984057634

Kimi K2.6 demonstrates strong long-horizon coding in complex engineering tasks: Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig, a highly niche programming language, it demonstrated…
https://x.com/Kimi_Moonshot/status/2046531052957569211

Kimi K2.6 has landed, and it is live on Baseten! We have baked in multiple inference optimizations so that you can leverage Kimi K2.6 in production right away. To run Kimi K2.6, Baseten uses:
-> The Baseten Inference Stack with advanced optimizations, including KV-aware routing…
https://x.com/baseten/status/2046263526281576573

Kimi K2.6 helped us rewrite kernels; it worked like a charm 🙂
https://x.com/Yulun_Du/status/2046252918526071017

Kimi K2.6 is live on OpenRouter! @Kimi_Moonshot’s new model is a long-horizon coding model built for sustained agentic work. It behaves more like a systems engineer than a chatbot, with the stamina to decompose, execute, and optimize complex tasks. Try it in all your favorite…
https://x.com/OpenRouter/status/2046259590774571199
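
Calling it through OpenRouter is a standard OpenAI-style request; the model slug below is a guess at the obvious naming, so verify it on the model page:

    # Model slug is an assumption; check openrouter.ai for the real one.
    curl https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "moonshotai/kimi-k2.6", "messages": [{"role": "user", "content": "Plan and execute a refactor of this module."}]}'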

Kimi K2.6 is now available in Windsurf! Available for free for the next 2 weeks for Pro, Teams, and Max users.
https://x.com/windsurf/status/2046686574793154996

Kimi K2.6 now in OpenCode — Go included
https://x.com/opencode/status/2046275886396125680

Kimi K2.6 was released 1h ago, and it looks amazing! Here it’s running with MLX (mlx-vlm) on two M3 Ultras (full 1T param VLM) 🔥
https://x.com/pcuenq/status/2046283942689456297
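
For a single machine, mlx-vlm’s generate entry point is the usual starting place; the mlx-community repo name here is hypothetical, and splitting the full 1T model across two M3 Ultras additionally requires MLX’s distributed launcher:

    # Single-machine sketch; the quantized repo name is a made-up example.
    python -m mlx_vlm.generate \
      --model mlx-community/Kimi-K2.6-4bit \
      --prompt "Describe this image." \
      --image photo.jpg \
      --max-tokens 256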

Meet Kimi K2.6: Advancing Open-Source Coding
🔹 Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ Python (86.7), Math Vision w/ Python (93.2)
What’s new:
🔹 Long-horizon coding – 4,000+…
https://x.com/Kimi_Moonshot/status/2046249571882500354

Moonshot AI launches Kimi K2.6 on Kimi Chat and APIs
https://www.testingcatalog.com/moonshot-ai-launches-kimi-k2-6-on-kimi-chat-and-apis/

Qwen3.6-27B can now run locally! 💜 Run on 18GB RAM via Unsloth Dynamic GGUFs. Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks. GGUFs:
https://t.co/ykKgwh2zI9 Guide:
https://x.com/UnslothAI/status/2046959757299487029
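
Running one of these Dynamic GGUFs typically looks like the llama.cpp invocation below; the repo name and quant tag are assumptions patterned on Unsloth’s usual naming, so take the exact values from their guide:

    # Repo and quant tag are assumed; the Q4_K_XL dynamic quant is what
    # fits a 27B model into roughly 18GB of RAM.
    llama-server \
      -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_XL \
      --jinja \
      -c 32768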

Ran Qwen3-8B (8.2B dense, open) on LongCoT-Mini. Vanilla: 0/507. dspy.RLM: 33/507 (6.5%). Same model. Same weights. No fine-tuning. The scaffold is doing 100% of the lifting. Context: leaderboard’s smallest open MoE is GLM-4.7 at 358B total / 32B active params. Qwen3-8B is ~4x…
https://x.com/raw_works/status/2045208764509470742

these questions are silly. Kimi > all other open-source models tho
https://x.com/scaling01/status/2046591683198906542

We’re open-sourcing FlashKDA — our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. Achieves 1.72×-2.22× prefill speedup over the flash-linear-attention baseline on H20, and works as a drop-in backend for flash-linear-attention. Explore on github:
https://x.com/Kimi_Moonshot/status/2046607915424034839

OpenClaw 2026.4.20 🦞
🧠 Kimi K2.6 support + provider-aware /think
💬 BlueBubbles iMessage sends + tapbacks fixed
⏰ Cron state/delivery cleanup
🔐 Gateway pairing + plugin startup hardening
Less haunted. More useful.
https://x.com/openclaw/status/2046686809367708123

Kimi K2.6 wrote an inference engine for Qwen3.5 0.5B in Zig and managed to beat LM Studio’s tokens per second by 20%, running for 12 hours and making 4,000+ tool calls
https://x.com/nrehiew_/status/2046254256194474221

Sharing my current setup for running Qwen3.6 locally in a good agentic configuration (Pi + llama.cpp). Should give you a good overview of how good local agents are today:

    # Start llama.cpp server:
    llama-server \
      -hf unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL \
      --jinja \
https://x.com/victormustar/status/2045068986446958899
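
Once llama-server is up, it exposes an OpenAI-compatible chat endpoint that the agent harness talks to; a quick smoke test, assuming the default port 8080:

    curl -s http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "What tools do you have access to?"}]}'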

VLM Performance: Qwen3.6-27B is natively multimodal, supporting both vision-language thinking and non-thinking modes in a single unified checkpoint — the same as Qwen3.6-35B-A3B. It handles images and video alongside text, enabling multimodal reasoning, document understanding…
https://x.com/Alibaba_Qwen/status/2046939788184547610
