Image created with gemini-2.5-flash-image, with the prompt written by claude-sonnet-4-5. Image prompt: A massive transparent crystalline command sphere floating in deep space above Earth, its internal glowing circuits and data pathways fully visible through glass walls, small spacecraft docking at open ports, cinematic science fiction lighting with cool blues and subtle neon highlights, Ender’s Game inspired orbital structure, dramatic rim lighting emphasizing the sphere’s see-through architecture against the black void
GPT-5, Claude, Kimi, and Gemini: “I can travel back in time to any time before 1500 and change only one thing, what is the single thing you would change, nothing obvious.” https://x.com/emollick/status/1987355374928769395
Perceptron’s platform is here, built for Physical AI. Developers can now use Isaac-0.1 or Qwen3VL 235B via the Perceptron API (fast, reliable multimodal intelligence) and the Python SDK (simple, grounded prompting for vision + language). Build apps that see and understand the world. https://x.com/perceptroninc/status/1988713482460750290
Every day, over 1,500 terabytes of open models and datasets are downloaded and uploaded between @huggingface and @googlecloud by millions of AI builders. We suspect it generates over a billion dollars of cloud spend annually already. So we’re excited to announce today a new https://x.com/ClementDelangue/status/1989000335247983049
🤗 @huggingface we’re announcing a closer partnership with @googlecloud to make open model development easier across the Hugging Face ecosystem and Google Cloud! – Deep Learning Containers (DLCs) for streamlined deployment and training – DLCs available via Vertex AI, Cloud Run, https://x.com/alvarobartt/status/1988970441357094984
Most models: think → tool call → think → tool call. K2 Thinking keeps tool calls inside the reasoning trace so multi-step workflows don’t drift. We’ll show how Moonshot post-trained for agentic tool calling and demo complex workflows running in one model call. https://x.com/togethercompute/status/1988009780149878904
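The pattern described above, feeding tool results back into one continuous trace until the model emits a final answer rather than returning control to the caller between calls, can be sketched with a stubbed model. Everything below (the `scripted_model`, the tool names, the tuple-based trace format) is a hypothetical illustration, not Moonshot’s actual API:

```python
# Sketch of an interleaved tool-calling loop: the trace accumulates
# tool calls and results until the model decides to answer.

def scripted_model(trace):
    # Stand-in for the LLM: issues two tool calls, then answers.
    calls = sum(1 for step in trace if step[0] == "tool_result")
    if calls == 0:
        return ("tool_call", "search", "kimi k2 thinking benchmarks")
    if calls == 1:
        return ("tool_call", "fetch", "benchmark table")
    return ("answer", "done after %d tool calls" % calls)

TOOLS = {
    "search": lambda q: f"results for {q}",
    "fetch": lambda q: f"content of {q}",
}

def run_agent(model, prompt, max_steps=10):
    trace = [("user", prompt)]
    for _ in range(max_steps):
        step = model(trace)
        if step[0] == "answer":
            trace.append(step)
            return step[1], trace
        _, tool, arg = step
        trace.append(step)                          # tool call stays in the trace
        trace.append(("tool_result", TOOLS[tool](arg)))  # result goes right back in
    raise RuntimeError("max steps exceeded")
```

Because results re-enter the same trace, a 200-call workflow is just 200 iterations of this loop inside one logical model run.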
It turns out that Kimi K2 Thinking is also a beast at deep research. It can run 200-300 tool requests for impressive multi-agent capabilities. Would you like to see a code example of it? https://x.com/omarsar0/status/1987912692099682399
Kimi K2 Thinking is impressive. So I built a multi-agent deep researcher, Kimi Deep Researcher. It generates long research reports on any topic, powered by subagents (web searcher, analyzer, and synthesizer). It can do 100s of tool calls per session. Repo soon! https://x.com/omarsar0/status/1988974710592516454
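The subagent decomposition named in the tweet (web searcher → analyzer → synthesizer) can be sketched as a simple pipeline. The stub functions below are hypothetical placeholders; in the real system each stage would be a Kimi K2 Thinking call with its own tools and could issue hundreds of tool calls:

```python
# Minimal sketch of a multi-agent deep researcher pipeline.

def web_searcher(topic):
    # Would issue real search tool calls; here it fakes three sources.
    return [f"source {i} on {topic}" for i in range(1, 4)]

def analyzer(documents):
    # Would extract claims with the model; here it tags each document.
    return [f"key point from {doc}" for doc in documents]

def synthesizer(topic, points):
    # Would write the long-form report; here it joins bullet points.
    return f"Report on {topic}:\n" + "\n".join(f"- {p}" for p in points)

def deep_research(topic):
    docs = web_searcher(topic)
    points = analyzer(docs)
    return synthesizer(topic, points)
```

The design point is that each subagent has a narrow contract (sources in, claims out), which keeps long sessions from drifting even as the total tool-call count grows.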
These are pretty impressive benchmarks from a Chinese open weights model. Especially big is the agentic capability, which has generally lagged in the open weights models. It will be interesting to see independent confirmation soon; I found K2 a solid, but kind of weird, model to use. https://x.com/emollick/status/1986452925418270871
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200–300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built https://x.com/Kimi_Moonshot/status/1986449512538513505
🚀We’re going live with @Kimi_Moonshot on Nov 19 for a technical deep dive on Kimi K2 Thinking Learn about the 1T parameter MoE that allows your AI agent to make 300 tool calls in one run. Register: https://x.com/togethercompute/status/1988009777247510564
from Kimi AMA: – K3 will likely use KDA or some other hybrid attention mechanism – Kimi-K2 will get vision https://x.com/scaling01/status/1987916859400659011
I wonder if part of what makes Kimi K2 Thinking impressive is that it produces a lot more thinking tokens for even minor & non-technical queries than any model I have used. This is the thinking trace for “write me a really good sentence about cheese” it is 1,595 tokens long! https://x.com/emollick/status/1987286609713107261
Try Kimi-K2-Thinking now on Together AI https://x.com/togethercompute/status/1988011880443470217
I’m sorry, Kimi bros. The problem is and was 100% the OpenRouter API, and it’s starting to piss me off that long reasoning always breaks. Just use the Kimi API for now and not OpenRouter if you have requests that take a lot of reasoning tokens. Simpler requests work fine with https://x.com/scaling01/status/1987938809628291168
Since testing Kimi-K2 Thinking I have become very wary of providers on OpenRouter; I might switch to original provider APIs only. They need to do quality testing for every model and provider. https://x.com/scaling01/status/1988399213563236810
Kimi K2 Thinking passes the Lem Test the first time, very few models have done so Just like Kimi K2, however, this remains a very weird & interesting model in a way that is hard to benchmark. Its writing is often very good but sometimes doesn’t hold up under close investigation https://x.com/emollick/status/1986552301922738651
Thanks everyone for testing Kimi K2 Thinking and sharing benchmark results! We’ve noticed that benchmark outcomes can vary across providers. Some third-party endpoints show substantial accuracy drops (e.g., 20+ pp), which has negatively affected scores on reasoning-heavy tasks https://x.com/Kimi_Moonshot/status/1987892275092025635
Kimi AMA on K2 Thinking: 1. $4.6M training cost is not an official number 2. Trained on H800s (nerfed H100s) 3. KDA (Kimi Delta Attention) hybrids with NoPE MLA perform better than full MLA with RoPE 4. Muon scales well to 1T parameters. “there are tens of optimizers and https://x.com/Yuchenj_UW/status/1987940704929395187
Test out Kimi K2 Thinking vs. all the frontier models for yourself at: https://x.com/arena/status/1987947224173781185
Testing Kimi K-2 has reminded me of how insane it is that firms picking AIs are treating them as fungible based on benchmarks. Kimi, Grok, Claude, and every other model have strengths, quirks, and weaknesses that can make a big difference in aggregate. Develop your own benchmarks! https://x.com/emollick/status/1986604851770360213
In our new Expert and Occupational leaderboards: The previous, non-thinking Kimi K2 is ranked #7 for Hard Prompts, particularly excelling in the ‘Legal & Government’ category under the ‘Occupational’ leaderboard, while falling behind in ‘Instruction Following’. Kimi K2 Thinking https://x.com/arena/status/1987947222299013630
k2 vision is happening. this is not a drill. https://x.com/code_star/status/1987917177417289794
Whenever people ask me, “Is Muon optimizer just hype?” I need to show them this. Muon isn’t just verified and used in Kimi; other frontier labs like OpenAI are using it and its variants. It’s also in PyTorch stable now! https://x.com/Yuchenj_UW/status/1987955443420065816
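Muon’s core trick is replacing the raw momentum update with an approximately orthogonalized one, computed with a few Newton-Schulz iterations. Below is a minimal NumPy sketch using the quintic coefficients from Keller Jordan’s public Muon write-up; it is an illustration of the idea, not the PyTorch implementation:

```python
import numpy as np

def newton_schulz(G, steps=5):
    # Quintic Newton-Schulz iteration: pushes all singular values of G
    # toward 1, i.e. approximately orthogonalizes the update direction.
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the Muon write-up
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius norm >= spectral norm, so ||X||_2 <= 1
    transpose = X.shape[0] > X.shape[1]
    if transpose:
        X = X.T  # iterate on the short side for cheaper matmuls
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transpose else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    # Standard momentum accumulation, then orthogonalize the update.
    momentum = beta * momentum + grad
    update = newton_schulz(momentum)
    return param - lr * update, momentum
```

The iteration only needs matmuls (no SVD), which is why it maps well onto GPU training loops for 2D weight matrices.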
Latest LisanBench results for Kimi-K2 Thinking: it is the best open-source model and 7th best model overall, right between GPT-5 and GPT-5-Mini. Raw scores and Glicko-2 ratings (a better indicator of relative strength); Kimi-K2 Thinking managed to set new high scores https://x.com/scaling01/status/1987952884927934966
🚨 Leaderboard Update! Kimi K2 Thinking by @Kimi_Moonshot has landed on the Text leaderboard as the #2 open source model (MIT modified), tied for #7 overall. These are real-world results. With only a six-point difference from @Zai_org’s GLM 4.6, the competition is tight. Kimi https://x.com/arena/status/1987947219224526902
China’s DeepSeek makes rare public comment, calls for AI ‘whistle-blower’ on job losses | South China Morning Post https://www.scmp.com/tech/big-tech/article/3332086/chinas-deepseek-makes-rare-public-comment-calls-ai-whistle-blower-job-losses
How cool is this! @Siemens is breaking new ground with an open-source-first, self-contained LLM platform, optimized by @vllm_project. Learn how they deployed their sustainable AI stack with flexibility, full control, and cost savings at scale: https://x.com/NVIDIAAIDev/status/1987944094883037559
Join us LIVE at MCP’s first Birthday kickoff at 10 am PT today!🎂 Don’t miss out on details about the celebration from the co-hosts, @Gradio and @AnthropicAI. 🔥 We’ve also got an exciting lineup of speakers from @Huggingface, @OpenAI, @GoogleDeepMind, @modal, @blaxelAI, https://x.com/Gradio/status/1989315723336749412
🚀 Qwen DeepResearch 2511 is LIVE! 🚀 We’ve just dropped a major upgrade, making your research deeper, faster, and smarter! 🔗: https://x.com/Alibaba_Qwen/status/1989026687611461705
🚀 Qwen Code v0.2.1 is here! We shipped 8 versions (v0.1.0 → v0.2.1) in just 17 days with major improvements. What’s New: 🌐 Free Web Search: Support for multiple providers. Qwen OAuth users get 2000 free searches per day! 🎯 Smarter Code Editing: New fuzzy matching pipeline https://x.com/Alibaba_Qwen/status/1989368317011009901
QwenEdit-2509 Photo2Anime: LoRA transforms photos into anime; delivers better results than prompting for “anime” without it. https://x.com/wildmindai/status/1988309389259010112
Qwen Image Edit Light Restoration app Easily remove shadows and relight in seconds. Here’s how + 4 wild examples:👇 https://x.com/minchoi/status/1988008926797787208
We’re releasing a large update to 📄FinePDFs! – 350B+ highly educational tokens in 69 languages, with incredible performance 🚀 – 69 edu classifiers, powered by ModernBERT and mmBERT – 300k+ EDU annotations for each of the 69 languages from Qwen3-235B https://x.com/HKydlicek/status/1988328336469459449