Image created with gemini-2.5-flash-image, with the prompt drafted by claude-sonnet-4-5. Image prompt: A suburban driveway at autumn dusk with a kid in an 80s Halloween costume preparing to launch a sleek, futuristic model rocket with glowing blue panels and visible circuitry, surrounded by carved pumpkins and fallen leaves, with a decorated 80s house in the background featuring orange string lights and paper bats, cinematic 80s film photography style with warm nostalgic lighting.
🎉 Congrats to @Kimi_Moonshot! vLLM Day-0 model support expands! Now supporting Kimi Linear — hybrid linear attention with Kimi Delta Attention (KDA): – RULER 128k context: 84.3 performance with 3.98× speedup – Up to 6× faster decoding & 6.3× faster TPOT (1M tokens) – 75% KV cache reduction 💡 https://x.com/vllm_project/status/1983941708233765149
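For readers who want to try the Day-0 support, here is a minimal sketch using vLLM's offline Python API. The Hugging Face model ID is a placeholder (the exact repo name is not given in the tweet), and a recent vLLM build is assumed:

```python
# Minimal sketch: serving a Kimi Linear checkpoint with vLLM's offline LLM API.
# Assumptions: the model ID below is a placeholder, not confirmed by the thread;
# Day-0 support implies a sufficiently recent vLLM release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-Linear-Instruct",  # placeholder model ID (assumption)
    trust_remote_code=True,                   # new architectures often ship custom modeling code
    max_model_len=131072,                     # matches the 128k RULER context in the announcement
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(
    ["Explain, in two sentences, what hybrid linear attention trades off."],
    params,
)
print(outputs[0].outputs[0].text)
```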
🔥 Inside Kimi Linear: First-Hand Insights. @Kimi_Moonshot just dropped something impressive again. @yzhang_cs from Kimi AI Infra shared an insider's look at the making of Kimi Linear — an architecture designed around hybrid linear attention and optimized for efficiency… https://x.com/ZhihuFrontier/status/1984321210055082207
Kimi just released another "next-gen" model that reduces memory usage by up to 75%, while achieving up to 6.3× higher decoding throughput and outperforming MLA and GDN baselines. https://x.com/scaling01/status/1983926811051384965
✨ At vLLM, we strive for correctness, reliability, and open collaboration — every detail matters. Together with @Kimi_Moonshot, we verified Kimi K2's tool-calling accuracy on vLLM using the latest K2-Vendor-Verifier benchmark. Our debugging uncovered 3 key compatibility… https://x.com/vllm_project/status/1983115488982122929
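For anyone reproducing a similar tool-calling check, vLLM exposes an OpenAI-compatible endpoint, so the verification can be driven with the standard OpenAI client. A minimal sketch, assuming a locally running server; the base URL, model ID, and the example tool are illustrative and not part of the K2-Vendor-Verifier benchmark itself:

```python
# Minimal sketch: probing tool-call triggering against a vLLM OpenAI-compatible server.
# Assumptions: a local vLLM deployment at localhost:8000 serving a K2 checkpoint;
# the tool definition below is illustrative, not taken from K2-Vendor-Verifier.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",  # placeholder model ID (assumption)
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

# A well-behaved deployment should trigger the tool with schema-valid arguments,
# which is roughly what the trigger-similarity and schema-accuracy stats measure.
print(resp.choices[0].message.tool_calls)
```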
Kimi For Coding: Exclusive Add-on to Your VIP Plan! We’ve added Kimi For Coding as a powerful add-on built right on top of your current subscription perks. Extra value, no extra cost. More details 👉 https://x.com/Kimi_Moonshot/status/1984207737673359441
Kimi K2vv updated! We’ve added case-by-case statistics for ToolCall-Trigger Similarity and ToolCall-Schema Accuracy. Feedback is welcome! https://x.com/Kimi_Moonshot/status/1983082003731042637
Kimi K2vv updated! We’ve added case-by-case statistics for ToolCall-Trigger Similarity and ToolCall-Schema Accuracy. The infra team also listed some suggestions for vendors; looks like enforcer is important. https://x.com/crystalsssup/status/1983126339399102756
Kimi Linear Tech Report has dropped! 🚀 https://x.com/Kimi_Moonshot/status/1983937694360322136
My favorite part: > “Scaling Ladder” is a Kimi tradition for scaling models. We start from something small (say, 1B active parameters) and gradually aim to beat the baseline on benchmarks, while also monitoring the corresponding “internals.” Only after clearing each gate at each… https://x.com/eliebakouch/status/1984291165860958614
Thankfully, there's a really nice glossary in the Kimi Delta Attention paper that covers most of the notable variants. https://x.com/nrehiew_/status/1983891931823505518
There is a lot of work behind Kimi Linear. We've rethought efficient and expressive linear attention from the infra side. We even worked out the attention-matrix form first, and then the recurrent form. Don't wait to check out the KDA kernel in the FLA repo. We have much more work to do and to open up. https://x.com/uniartisan/status/1983941443283775780
You are also welcome to share your suggestions and feedback for our Kimi CLI on GitHub. https://x.com/Kimi_Moonshot/status/1984207741037252751
Introducing Kimi CLI Technical Preview & Kimi For Coding! Kimi CLI powers your terminal: – Shell-like UI + shell command execution – Seamless Zsh integration – MCP support – Agent Client Protocol (now compatible with @zeddotdev) More features incoming! https://x.com/Kimi_Moonshot/status/1984207733177090274
Claude, GPT-5, Gemini, and Kimi: "write me a horror story done entirely in the dedications to six books (you can give me the title and author of each book as well)" ChatGPT and Claude did well in different ways. Kimi did the usual (sounds good but the meaning falls apart). https://x.com/emollick/status/1982279778859151783
Many people are confused by Minimax's recent return to full attention – especially since it was the first large-scale pivot toward hybrid linear attention – and by Kimi's later adoption of hybrid linear variants (as well as earlier attempts by Qwen3-Next, or Qwen3.5). I actually… https://x.com/SonglinYang4/status/1984021551914926514



