Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Wide shot of an 80s American suburban cul-de-sac at golden hour dusk with Halloween decorations, featuring diverse groups of trick-or-treaters in costumes that blend traditional cultural dress from different countries with Halloween accessories, walking past ranch houses with carved pumpkins and autumn leaves, warm nostalgic lighting, cinematic composition reminiscent of Amblin Entertainment films.
🎉 Congrats to @Kimi_Moonshot! vLLM Day-0 model expands! Now supporting Kimi Linear — hybrid linear attention with Kimi Delta Attention(KDA): – RULER 128k context: 84.3 perf + 3.98× speedup – Up to 6× faster decoding & 6.3× faster TPOT (1M tokens) – 75% KV cache reduction 💡 https://x.com/vllm_project/status/1983941708233765149
🔥 Inside Kimi Linear: First-Hand Insights @Kimi_Moonshot just dropped something impressive again. @yzhang_cs from Kimi AI Infra, shared an insider’s look at the making of Kimi Linear — an architecture designed around hybrid linear attention and optimized for efficiency × https://x.com/ZhihuFrontier/status/1984321210055082207
Kimi just released another “next-gen” model that reduces memory usage by up to 75%, while achieving up to 6.3× higher decoding throughput and outperforming MLA and GDN baselines https://x.com/scaling01/status/1983926811051384965
A continuing issue in studying LLMs for medicine is the fact that everyone is testing different things with different standards. This (interesting) paper is about agentic systems powered by DeepSeek-V3.2. Other papers look at single LLMs. Tons of different benchmarks. Confusion. https://x.com/emollick/status/1982630126065201636
Smoking gun: Pretty sure Cursor’s new Composer-1 is a fine-tuned Chinese model. As I was building, it switched its inner monologue to Chinese, and I can’t get it back to english. @simonw https://x.com/auchenberg/status/1983901551048470974
✨ At vLLM, we strive for correctness, reliability, and open collaboration — every detail matters. Together with @Kimi_Moonshot, we verified Kimi K2’s tool-calling accuracy on vLLM using the latest K2-Vendor-Verifier benchmark. Our debugging uncovered 3 key compatibility https://x.com/vllm_project/status/1983115488982122929
Kimi For Coding: Exclusive Add-on to Your VIP Plan! We’ve added Kimi For Coding as a powerful add-on built right on top of your current subscription perks. Extra value, no extra cost. More details 👉 https://x.com/Kimi_Moonshot/status/1984207737673359441
Kimi K2vv updated! We’ve added case-by-case statistics for ToolCall-Trigger Similarity and ToolCall-Schema Accuracy. Feedback is welcome! https://x.com/Kimi_Moonshot/status/1983082003731042637
Kimi K2vv updated! We’ve added case-by-case statistics for ToolCall-Trigger Similarity and ToolCall-Schema Accuracy. The infra team also listed some suggestions for vendors; looks like enforcer is important. https://x.com/crystalsssup/status/1983126339399102756
Kimi Linear Tech Report is dropped! 🚀 https://x.com/Kimi_Moonshot/status/1983937694360322136
My favorite part: > “Scaling Ladder” is a Kimi tradition for scaling models. We start from something small (say, 1B active parameters) and gradually aim to beat the baseline on benchmarks, while also monitoring the corresponding “internals.” Only after clearing each gate at each https://x.com/eliebakouch/status/1984291165860958614
Thankfully, theres a really nice glossary in the KIMI Delta Attention paper that covers most of the notable variants https://x.com/nrehiew_/status/1983891931823505518
There are a lot of works behind Kimi Linear. We’ve rethought efficient and expressive linear attention from infra. We even first discovered the attn matrix, and then the recurrent. No wait to check out the kda kernel in the FLA repo. We have much more work to do, to open. https://x.com/uniartisan/status/1983941443283775780
You are also welcome to share your suggestions and feedback for our Kimi CLI on GitHub. > https://x.com/Kimi_Moonshot/status/1984207741037252751
Introducing Kimi CLI Technical Preview & Kimi For Coding! Kimi CLI powers your terminal: – Shell-like UI + shell command execution – Seamless Zsh integration – MCP support – Agent Client Protocol (now compatible with @zeddotdev) More features incoming! https://x.com/Kimi_Moonshot/status/1984207733177090274
Claude, GPT-5, Gemini, and Kimi: “write me a horror story done entirely in the dedications to six books (you can give me the title and author of each book as well)” ChatGPT and Claude did well in different way. Kimi did the usual (sounds good but meaning falls apart). https://x.com/emollick/status/1982279778859151783
Many people are confused by Minimax’s recent return to full attention – especially since it was the first large-scale pivot toward hybrid linear attention – and by Kimi’s later adoption of hybrid linear variants (as well as earlier attempts by Qwen3-Next, or Qwen3.5). I actually https://x.com/SonglinYang4/status/1984021551914926514
This week we officially opened our first Asia-Pacific office in Tokyo. To mark this milestone, our CEO Dario Amodei met with Prime Minister Takaichi and Digital Minister Matsumoto to discuss how Claude can support Japan’s digital transformation and AI ecosystem. https://x.com/AnthropicAI/status/1983541657162432647
Another ByteDance Seed banger? They introduce the looped language models (LoopLMs) Ouro 1.4B and 2.6B trained on 7.7T tokens, that match evaluation results of larger 4B and 8B models respectively. “Ouro” 1.4B is a standard decoder-only Transformer with 24 layers (upcycled Ouro https://x.com/scaling01/status/1984286236438094307
(this is not valid) DeepSeek is the new king now. It’s gaining 125% in just 9 days, making more than GPT-5 and Gemini 2.5 Pro lost combined. DeepSeek is just a side project of a hedge fund, confirmed. https://x.com/Yuchenj_UW/status/1982658436182712750
Excited to partner with @RelianceJio to bring the best of Google AI to India. Eligible Jio users will receive our AI Pro plan at no extra cost for 18 months, including Gemini 2.5 Pro, 2TB of storage + our latest AI creation tools. Can’t wait to see what we’ll build together! https://x.com/sundarpichai/status/1983922303424471541
To scale data-constrained LLMs, repeating & denoising objectives can help. Another solution: Add multilingual data. But what languages help & how much? Below a snapshot for this at 2B scale, e.g., Chinese can hurt English while Indonesian may help. https://x.com/Muennighoff/status/1983243353341997536
Here’s my view of U.S. vs China competition https://x.com/adcock_brett/status/1983640903345696811
ollama run qwen3-vl Ollama’s engine now supports all the Qwen 3 VL models locally. 2B to 235B parameter sizes. The smaller models work exceptionally well for their size. The latest version of Ollama v0.12.7 is needed! Give it a try! 👇👇👇 https://x.com/ollama/status/1983683646864126155
open-source OCR models are super cheap to run and privacy first 🤝 BUT there’s a ton of new models out there: DeepSeek-OCR, Nanonets, PaddleOCR, how do you pick them? 🤯 don’t worry though, @huggingface got you covered! 🫡🧶 https://x.com/mervenoyann/status/1980685830411931885
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer A new paper from Weizmann Institute of Science, getting reconstructions that are not complete nonsense from ONLY 15 min of data. We previously demonstrated SOTA for 1 hr of data with MindEye2. This https://x.com/iScienceLuvr/status/1984195725253804449
Today we’re excited to add gpt-oss and DeepSeek model families to Tinker – one of our top community requests. With Tinker, you can train a 671B parameter model on your laptop in just a few lines of code. No GPU rentals. No CUDA. No cluster setup. Just train. https://x.com/dchaplot/status/1983055956352348614
OpenAI offers free ChatGPT Go for one year to all users in India | TechCrunch https://techcrunch.com/2025/10/27/openai-offers-free-chatgpt-go-for-one-year-to-all-users-in-india/
PewDiePie in 2025: – built a 10×4090 rig – runs Llama 70B, gpt-oss-120B & Qwen 245B locally via vLLM – built a custom web UI (chat, RAG, search, TTS) – ran protein-folding simulations for charity – created an AI “council”, a swarm of 64 models – now fine-tuning his own model https://x.com/Yuchenj_UW/status/1984309989134254493
A great deep-dive on On-Policy Distillation — an efficient way to post-train smaller LLMs with dense, on-policy feedback. Excited to see Qwen featured in the experiments, showcasing strong math-reasoning gains and continual-learning recovery. Excellent work by @thinkymachines 👏 https://x.com/Alibaba_Qwen/status/1983053298447069275
Always visualize. We’ve caught so many interesting patterns / phenomenon In particular in either multimodal or MoE modeling by inspecting / debugging visually. @kilian_maciej ran this exercise of the Qwen3 MoE series and some very interesting patterns emerged https://x.com/AkshatS07/status/1982629716495663521
IBM Granite team released Granite 4 Nano models 1B variant outperforms Qwen3-1.7B with fewer params on a mix of tasks from math to coding 👏 https://x.com/mervenoyann/status/1983192115577503974
Qwen 3 Max Thinking has released Should also be up on VB shortly! https://x.com/legit_api/status/1984284268412191216
Qwen3-VL models are now live in LM Studio! 🎉🚀 A powerful collection of vision-language models. Happy Halloween! 🎃👻 https://x.com/lmstudio/status/1984330903880155154
We dive deep into every part of the stack that we didn’t build. Here’s a deep-dive into (potentially undisclosed?) MoE depth wise up-cycling that Qwen3 does. I have a lot of Qwen folks that follow me, would anyone like to clarify :) https://x.com/ArmenAgha/status/1982613142321746130
Very nice blog post from Thinky (@_kevinlu et al) about on-policy distillation for LLMs — we published this idea back in 2023 and it is *publicly* known to be successfully applied to Gemma 2 & 3, and Qwen3-Thinking (and probably many closed frontier models)! The idea behind https://x.com/agarwl_/status/1982880080482140372
📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer: 🌍Are scaling laws different by https://x.com/ShayneRedford/status/1983170949865173069




