Image created with Flux Pro v1.1 Ultra. Image prompt: OpenSource, seaside picnic table, sticker‑covered laptop with tasteful open‑source iconography, gentle flare, photorealistic, editorial, minimal, landscape, vacation, no text overlays
📊 @Kimi_Moonshot’s K2-0905 on @GroqInc scored 7th overall at 94% on Roo Code evals, the 1st open-source model to break the 90+ barrier. It’s also the fastest and cheapest in the top 10, while holding its own on accuracy. View the full leaderboard: https://x.com/roo_code/status/1965098976677658630
It feels like the coding agent frontier is now open-weights: GLM 4.5 costs only $3/month and is on par with Sonnet. Kimi K2.1 Turbo is 3x the speed and 7x cheaper vs Opus 4.1, but just as good. Kimi K2.1 feels clean; the best model for me. GPT-5 is only good for complicated specs, too slow. https://x.com/Tim_Dettmers/status/1965021602267217972
Kimi K2 0905 upgrade: Substantial improvement in agentic capabilities, modest change in overall intelligence Key takeaways: ➤ Intelligence increased +2 pts in our Artificial Analysis Intelligence Index ➤ Agentic capabilities substantially improved as shown by our two new https://x.com/ArtificialAnlys/status/1965010554499788841
🚨 Leaderboard Disrupted! Two new models have entered the Top 10 Text leaderboard: 🔸#6 Qwen3-max-preview (Proprietary) by @Alibaba_Qwen 🔸#8 Kimi-K2-0905-preview (Modified MIT) by @Kimi_Moonshot tied with 7 others. Note that this puts Kimi-K2-0905-preview in a tight race for https://x.com/arena/status/1965115050273976703
AI agents can finally talk to your frontend! The AG-UI Protocol bridges the critical gap between AI agents and frontend apps, making human-agent collaboration seamless. MCP: Agents to tools A2A: Agents to agents AG-UI: Agents to users 100% open-source. https://x.com/akshay_pachaar/status/1963945302991450272
🤗 Use Hugging Face Inference Providers with GitHub Copilot Chat in VS Code https://huggingface.co/docs/inference-providers/en/guides/vscode
MBZUAI and G42 Launch K2 Think: A Leading Open-Source System for Advanced AI Reasoning https://www.prnewswire.com/news-releases/mbzuai-and-g42-launch-k2-think-a-leading-open-source-system-for-advanced-ai-reasoning-302551074.html
⚡️ Efficient weight updates for RL at trillion-parameter scale 💡 Best practice from Kimi @Kimi_Moonshot. vLLM is proud to collaborate with checkpoint-engine: • Broadcast weight sync for 1T params in ~20s across 1000s of GPUs • Dynamic P2P updates for elastic clusters https://x.com/vllm_project/status/1965824120920342916
Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅ https://x.com/Kimi_Moonshot/status/1965785427530629243
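The "in-place" part is the key idea: new weights are written into the serving engine's existing parameter buffers instead of reloading the model. A toy, hypothetical numpy sketch of that mechanism (the real checkpoint-engine streams chunks to GPU tensors over broadcast or P2P across thousands of workers; nothing here is its actual API):

```python
import numpy as np

def inplace_update(live: dict, new: dict, chunk: int = 4) -> None:
    """Write new parameter values into the existing buffers, chunk by
    chunk, so tensors keep their memory addresses and the serving
    engine never has to reallocate or restart."""
    for name, src in new.items():
        dst = live[name].reshape(-1)   # view into the live buffer
        flat = src.reshape(-1)
        assert dst.size == flat.size
        for i in range(0, dst.size, chunk):
            dst[i:i + chunk] = flat[i:i + chunk]

# toy usage: the buffer object survives the update untouched
live = {"w": np.zeros(8, dtype=np.float32)}
buf = live["w"]
inplace_update(live, {"w": np.arange(8, dtype=np.float32)})
```

Because `reshape(-1)` on a contiguous array returns a view, every write lands in the original buffer; that is what lets an RL trainer push fresh weights under a running inference engine.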
Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video: 1. DeepSeek V3/R1 2. OLMo 2 3. Gemma 3 4. Mistral Small 3.1 5. Llama 4 6. Qwen3 7. SmolLM3 8. Kimi 2 9. GPT-OSS 10. Grok 2.5 11. GLM-4.5 https://x.com/rasbt/status/1965798055141429523
Stop building AI agents that ignore your instructions. This Python framework guarantees LLM Agents follow your rules in production. Every single time. 100% open-source. https://x.com/Saboo_Shubham_/status/1963428564398932074
@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://x.com/basetenco/status/1966224960223158768
DeepSeek V3.1 dynamic @UnslothAI quants on Aider Polyglot benchmarks are here! 1. 3-bit thinking gets 75.6% vs 76.1% un-quantized 2. Leaving attn_k_b in 8-bit gets +2% accuracy vs 4-bit 3. Dynamic quants beat other similar imatrix quants 4. AMA r/LocalLlama today 10AM PST! https://x.com/danielhanchen/status/1965800675105017980
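The bit-width sensitivity in point 2 is easy to reproduce with plain round-to-nearest quantization. This is a simplification for illustration only (Unsloth's dynamic quants use importance matrices and per-layer bit assignment, not this), but it shows why keeping a sensitive tensor like attn_k_b at 8-bit instead of 4-bit preserves accuracy:

```python
import numpy as np

def fake_quant(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantize + dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)
err4 = np.abs(w - fake_quant(w, 4)).mean()   # 4-bit reconstruction error
err8 = np.abs(w - fake_quant(w, 8)).mean()   # 8-bit reconstruction error
```

With only 15 levels per side, the 4-bit grid is roughly 16x coarser than the 8-bit one, so its mean reconstruction error is correspondingly larger.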
Evals are a scam. And we’re being gaslit into believing they aren’t. New post just dropped (🧵). https://x.com/AlexReibman/status/1964116243847565526
Evals are a scam. This is what Twitter was fighting about all weekend, and honestly? Both sides are missing the point. The real problem isn’t whether evals work or don’t work. It’s that everyone uses “evals” to mean 6 different things and then acts shocked when the conversation https://x.com/bnicholehopkins/status/1965130607790264452
We’re open-sourcing everything. All the projects built during the Nano Banana Hackathon are now released as open-source. Since AI Studio supports remixing, you can easily adapt and reuse them. They may not have been huge projects, but it was a great opportunity to truly https://x.com/arrakis_ai/status/1965001417716072877
BOOM! Starting today you can use open source frontier LLMs in @code with HF Inference Providers! 🔥 Use your inference credits on SoTA llms like GLM 4.5, Qwen3 Coder, DeepSeek 3.1 and more All of it packaged in one simple extension – try it out today 🤗 https://x.com/reach_vb/status/1966185427582497171
ERNIE-4.5-21B-A3B-Thinking https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking
New on the Together Fine-Tuning Platform: ✅ Train 100B+ models (DeepSeek, Qwen, gpt-oss) ✅ Long-context fine-tuning (up to 131k tokens) ✅ @huggingface Hub integration ✅ Advanced DPO options Details in the 🧵 https://x.com/togethercompute/status/1965845309273346557
PaddleOCRv5 is now on @huggingface Hub with Apache-2.0 license 🔥 despite being tiny (70M) the benchmarks look insane, outperforms latest models! it supports 40 languages, deploy it anywhere 🤯 demo and model on the next one ⤵️ https://x.com/mervenoyann/status/1966097461640126704
Starting today, you can use Hugging Face Inference Providers directly in GitHub Copilot Chat on @code! 🔥 which means you can access frontier open-source LLMs like Qwen3-Coder, gpt-oss and GLM-4.5 directly in VS Code, powered by our world-class inference partners – https://x.com/hanouticelina/status/1966201072390701298
Out of over 2 million open models, EmbeddingGemma is the top trending model on Hugging Face https://x.com/osanseviero/status/1965774422834622648
Introducing LlamaIndex Classify: Rules-Based Document Classification Made Simple Learn how to automatically classify your documents with LlamaIndex’s newest beta feature! In this quick demo, Laurie walks through the Classify service – a powerful tool for preprocessing documents https://x.com/llama_index/status/1963263366086172719
Mixtral of experts | Mistral AI https://mistral.ai/news/mixtral-of-experts
ASML, Mistral AI enter strategic partnership https://www.asml.com/en/news/press-releases/2025/asml-mistral-ai-enter-strategic-partnership
Mistral raises 1.7B€, partners with ASML | Hacker News https://news.ycombinator.com/item?id=45178041
4B OCR with Apache-2.0 license outperforming Mistral OCR 🔥 Tencent released POINTS-Reader, a new model first trained on Qwen2.5VL annotations and then self-trained on real data. On many benchmarks it performs better than Qwen2.5VL and Mistral OCR! https://x.com/mervenoyann/status/1966176133894098944
fan-favorite vision LM Florence-2 is now officially supported in @huggingface transformers 🤗 find all the models in florence-community org 🫡 https://x.com/mervenoyann/status/1966122522723725420
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion https://huggingface.co/tencent/POINTS-Reader
The release of OpenAI’s GPT-OSS through transformers came with a lot of new upgrades that are now part of the toolkit. Today, we release a blog post covering all of the updates in depth. A thread on what we cover. [1/N] https://x.com/ariG23498/status/1966111451481043402
We shipped an OSS ‘vibe coding platform’ (like @v0) built with @vercel AI SDK, Gateway and Sandbox. We worked with @openai to tune the GPT-5 agent loop. It can write/read files, run commands, install packages, autofix errors… Demo oneshotting a multiplayer Pong in Go ↓ https://x.com/rauchg/status/1964857952722133231
You DO NOT want to miss this – All the tricks and optimisations used to make gpt-oss blazingly fast, all of it – in a blogpost (with benchmarks)! 🔥 We cover details ranging from MXFP4 quantisation to pre-built kernels, Tensor/Expert Parallelism, Continuous Batching and much https://x.com/reach_vb/status/1966134598682767507
At Thinking Machines, our work includes collaborating with the broader research community. Today we are excited to share that we are building a vLLM team at @thinkymachines to advance open-source vLLM and serve frontier models. If you are interested, please DM me or @barret_zoph! https://x.com/woosuk_k/status/1966245455815487703
Interesting example of how open-weights models provide opportunities for innovation. Salesforce builds a strong deep research agent from OpenAI’s small open-source model. Though open-model development depends on the goodwill of OpenAI, Mistral & a few Chinese firms. https://x.com/emollick/status/1965735119307817245
@reach_vb @Alibaba_Qwen ❤️ We ship as fast as we can. We optimized the models’ speed, serve in bf16, should be fast! https://x.com/Yuchenj_UW/status/1966201249721888800
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here! 🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!) 🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed & https://x.com/Alibaba_Qwen/status/1966197643904000262
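The "80B params, only 3B active" arithmetic comes from sparse MoE routing: each token is sent through only its top-k experts, so compute scales with k, not with the total expert count. A toy numpy sketch of the routing step (illustrative only; Qwen3-Next's real layers add Gated DeltaNet, gated attention, and a much larger expert pool):

```python
import numpy as np

def moe_layer(x, router_w, experts, k=2):
    """Sparse MoE layer: route each token to its top-k experts.
    Per-token compute is k expert matmuls, however many experts exist."""
    logits = x @ router_w                        # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                     # softmax over chosen experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ experts[e])    # only k experts actually run
    return out

rng = np.random.default_rng(0)
n_experts, d = 8, 16
x = rng.standard_normal((4, d))
router_w = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))
y = moe_layer(x, router_w, experts)
```

With k=2 of 8 experts here, three quarters of the expert weights sit idle for any given token; scale the same ratio up and you get 80B stored but ~3B touched per token.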
From research paper to live website in minutes 🚀 Upload your paper, let Qwen Chat turn it into a webpage, and deploy instantly. Try it now: https://x.com/Alibaba_Qwen/status/1964870508421480524
I vibe coded a visual PDF search app with ColQwen2. This is how it works: – Store PDF files as images in a @weaviate_io vector database – Embed images and text with a multimodal late-interaction model (ColQwen2) – Generate token-wise (and summed) similarity maps to highlight https://x.com/helloiamleonie/status/1964997028875743637
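The late-interaction scoring behind those similarity maps is small enough to sketch. A simplified version, assuming pre-computed, L2-normalized embeddings for query tokens and page patches (the real pipeline gets these from ColQwen2 and stores them in Weaviate):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, page_patches: np.ndarray):
    """ColBERT-style late interaction: the full token-vs-patch similarity
    grid gives the per-token heatmaps; max over patches, summed over
    query tokens, gives the page's retrieval score."""
    sims = query_tokens @ page_patches.T   # (n_query_tokens, n_patches)
    score = sims.max(axis=1).sum()         # MaxSim score for this page
    return sims, score

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 128))     # 5 query-token embeddings
p = rng.standard_normal((196, 128))   # 14x14 grid of page-patch embeddings
q /= np.linalg.norm(q, axis=1, keepdims=True)
p /= np.linalg.norm(p, axis=1, keepdims=True)
sims, score = maxsim(q, p)
```

Each row of `sims`, reshaped to the patch grid, is exactly the token-wise heatmap the demo overlays on the PDF page; summing rows gives the combined map.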
Learn more about Qwen3-max-preview here: https://x.com/arena/status/1965124408097517853
Qwen3-Next (thinking & non-thinking) are now live in BF16 at Hyperbolic! Qwen3-Next is a huge efficiency leap: – 80B MoE with just 3B active params – 10x cheaper to train vs Qwen3-32B – 10x inference throughput for >32K tokens Proud to be a launch partner with @Alibaba_Qwen – https://x.com/Yuchenj_UW/status/1966199037973200955
Qwen3-Next, or rather a preview of our next generation (3.5?), is out! This time we try to be bold, but actually we have been doing experiments on hybrid models and linear attention for about a year. We believe that our solution should be at least a stable and solid solution to https://x.com/JustinLin610/status/1966199996728156167
Qwen3-Next: Towards Ultimate Training & Inference Efficiency https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
Welcome Qwen3-Next! You can run it efficiently on vLLM with accelerated kernels and native memory management for hybrid models. https://x.com/vllm_project/status/1966224816777928960
🎙️ Meet Qwen3-ASR — the all-in-one speech recognition model! ✅ High-accuracy EN/CN + 9 more languages: ar, de, en, es, fr, it, ja, ko, pt, ru, zh ✅ Auto language detection ✅ Songs? Raps? Voice with BGM? No problem. <8% WER ✅ Works in noise, low quality, far-field ✅ Custom https://x.com/Alibaba_Qwen/status/1965068737297707261
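For reference, the WER figure quoted above is word-level edit distance divided by reference length. A minimal implementation of the metric (standard definition, not Qwen's evaluation harness):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) on the
    word sequences, divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

So "<8% WER" means fewer than 8 word-level errors per 100 reference words, even with background music or far-field audio.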
inpainting is not dead! @instantx_ai brought it back to life! 🪔🕯 Qwen Image Inpainting ControlNet allows for the most precise, targeted & high quality edits that ever happened to inpainting Official Model & Demo on @huggingface 🤗 https://x.com/multimodalart/status/1966190381340692748
Fine-tune Any LLM from the Hugging Face Hub with Together AI https://huggingface.co/blog/togethercomputer/together-ft