Image created with Flux Pro v1.1 Ultra. Image prompt: Community night school for model tooling; the words “Open Source” printed on a workshop banner in slab serif; branch-and-merge diagram projected over a zine table; inclusive, forest-green accents, hands-on craft

American companies are losing market share to Chinese open-source companies! Anthropic’s coding market share on OpenRouter fell from 46% in July to 32% in a month. The reason? Qwen3-Coder. https://x.com/scaling01/status/1956858471682617553

NVIDIA ON A ROLL! Canary 1B and Parakeet TDT (0.6B) SoTA ASR models – Multilingual, Open Source 🔥 – 1B and 600M parameters – 25 languages – automatic language detection and translation – word and sentence timestamps – transcribe up to 3 hours of audio in one go – trained on 1… https://x.com/reach_vb/status/1957148807562723809

New DeepSeek V3.1 beats Opus and R1 for a dollar https://x.com/scaling01/status/1957892601098432619

DeepSeek V3.1 is already 4th trending on HF after a silent release without a model card 😅😅😅 The power of 80,000 followers on @huggingface (first org with 100k when?)! https://x.com/ClementDelangue/status/1957897020741402751

Traditional browsers will die and webpages won’t be the primary interface anymore. @joshm, the mind behind Arc and now Dia, knows the internet better than most. His vision challenges us to rethink how information should be presented, accessed, and experienced. https://x.com/fdaudens/status/1927168498238714289

bangs successfully removed with 8-step Qwen Image Edit [Fast] too 💨 using Qwen Image Lightning LoRA, now on Spaces👇 https://x.com/linoy_tsaban/status/1957762030393544847

NEW: Run @huggingface models locally on your @AMD Ryzen AI and Radeon PCs, with Lemonade! 🍋 + 🤗 = 💻🧠 GG @roaner @reach_vb @AIatAMD https://x.com/jeffboudier/status/1957972077002035405

Gemma 3 270M (8-bit) is also fast on iPhone 16 Pro 🏎️ ~140 tk/s for the A18 Pro chip with MLX, not that far from the M3 chip While Gemma 3 270M is not intended for chat, it’s perfect to be used with Apple Shortcuts on tasks like summarization for example https://x.com/adrgrondin/status/1957171759876059371

Got a Mac with an M-chip? You can now train Gemma3 270m locally as a multilingual embedding or reranker model using our mlx-retrieval project. It lets you train Gemma3 270m locally at 4000 tokens/s on M3 Ultra – that’s actually usable speed. We’ve implemented some standard https://x.com/JinaAI_/status/1958547803489415195

Ant Group just released UI-Venus on @huggingface It’s a native UI agent achieving SOTA in grounding & navigation tasks from just screenshots. Turns screenshots into reliable clicks and plans using small data and reinforcement fine-tuning. The usual way, supervised fine https://x.com/rohanpaul_ai/status/1956777729304711639

Gpt Oss News Agent – a Hugging Face Space by fdaudens https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent

Open-source, self-hostable browser automation library for AI agents; build agents to navigate sites, fill forms, click, and extract info, 90.4% on Web Voyager https://x.com/tom_doerr/status/1955640654085632485

Announcing Open Lovable 🔥 We’ve built an open-source AI web app builder that can transform any website URL into a working, editable clone, giving you a foundation to build on instantly. All powered by @GroqInc, @e2b, and Firecrawl. https://x.com/firecrawl_dev/status/1955660448587735393

1️⃣ Convert any collection of documents into an interactive MCP server through LlamaCloud
2️⃣ Convert any document workflow into an MCP server through LlamaCloud – codify a repeatable process that the user can easily trigger, without complex prompting!
3️⃣ Build a custom agentic… https://x.com/jerryjliu0/status/1957873536456093903

We have a new comprehensive Model Context Protocol (MCP) documentation section, to help you connect your AI applications to external tools and data sources through a standardized interface. 🔌 Learn how MCP works – connecting LLMs to databases, tools, and services through a https://x.com/llama_index/status/1957840992360710557
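
For a concrete sense of how small the MCP server side can be, here is a minimal sketch using the official MCP Python SDK’s FastMCP helper; the tool name and stubbed body are illustrative, not LlamaCloud’s or LlamaIndex’s actual implementation.

```python
# Minimal MCP server sketch (official MCP Python SDK). The tool below is a
# stub standing in for a real document-search backend.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-demo")

@mcp.tool()
def search_docs(query: str) -> str:
    """Return the top snippet for a query (stubbed for this sketch)."""
    return f"Top result for: {query}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, ready for an MCP client to connect
```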

🚀 Qwen Chat Desktop for Windows is here! 💻 All the power of Qwen Chat — now with MCP support for smarter, faster agents. ⚡ Spin up MCP servers, supercharge your productivity, and stay in control. 📥 Download now → https://x.com/Alibaba_Qwen/status/1956399490698735950

There’s been a lot of Discourse about Qwen’s rejection of the hybrid paradigm. “Did DeepSeek fall for the hybrid meme?” But hybrids make *so much sense* if you’re building a fast, economical SWE agent, which is exactly what 3.1 is for. It’s all been for Aider, Claude Code, MCPs. https://x.com/teortaxesTex/status/1958437173948023127

Command A Reasoning: Enterprise-grade control for AI agents https://cohere.com/blog/command-a-reasoning

Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and… https://x.com/deepseek_ai/status/1958417062008918312

Want to build an AI Agent? I made a free cookbook for creating your own news research agent with open-weight GPT-OSS models — no GPU, no setup. Searches news → pulls articles → summarizes w/ sources → runs in a Gradio chat UI. https://x.com/fdaudens/status/1956006950249906593

AGENTS.md https://agents.md/

Codex CLI now works with your ChatGPT login, with generous GPT-5 use included in the Plus and Pro plans.
$ brew install codex
$ codex
It’s that simple. https://x.com/thsottiaux/status/1957133984657481956

Excited to release: Jupyter Agent 2 The agent can load data, execute code, plot results inside Jupyter faster than you can scroll! 🤖 Powered by Qwen3-Coder ⚡️ Running on Cerebras ⚙️ Executed in E2B ↕️ Upload your files All videos are in *real time*! https://x.com/lvwerra/status/1957832240416580024

I tried @Alibaba_Qwen Qwen3-Coder today inside @cline . Very impressed. It helped me solve a tricky deployment: putting a Dockerized vibe-coded project onto https://x.com/chunhualiao/status/1956957519315956074

>V3.1-Base I guess this confirms they’ve moved on to hybrid models, Anthropic-style (and contra Qwen). I am not amused with how it works. But I was also disappointed with V2.5 (original), their merge of chat and code; ultimately, it worked. Another reason to expect V4, not R2. https://x.com/teortaxesTex/status/1957818879205351851

• DeepSeek V3.1 Reasoner improves on DeepSeek R1 on the Extended NYT Connections Benchmark: 48.6% → 57.7%.
• DeepSeek V3.1 Non-Think improves on DeepSeek V3-0324: 16.8% → 21.6%.
• Mistral Medium 3.1 improves on Mistral Medium 3: 11.5% → 15.2%.
• GPT-5 (low… https://x.com/LechMazur/status/1958970478712037548

Wow the team at @daftengine cooked! You can now read/write to 🤗Hugging Face with Daft! > DataFrame engine in 🦀 runs distributed and supports multimodal datasets to train/eval models Best part: it’s optimized for Xet, the dedupe-based HF storage that makes uploads crazy fast! https://x.com/lhoestq/status/1958904406004449452
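
If you want to try it, a minimal sketch of reading a Hub dataset through Daft’s hf:// path support; the dataset id is a placeholder.

```python
# Hedged sketch: load a Hugging Face dataset straight into a Daft DataFrame.
# The repo id is a placeholder; point it at any Parquet-backed dataset.
import daft

df = daft.read_parquet("hf://datasets/username/my-dataset")
df.show()  # distributed, multimodal-friendly DataFrame from here on
```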

🚨 Top 10 Leaderboard Disrupted! A new model provider has landed in the Arena Top 10: 💠Mistral-Medium-2508 ranks at #8! 💠it also ranks top 3 in the Coding & Longer Query categories The Text Arena is neck and neck—just a few points can shift the rankings and change who’s on https://x.com/lmarena_ai/status/1958954094867226954

🌐 Diffbot-small-xl has been added to the Arena! Brought to you by @diffbot, it’s the first open model to join the Search Arena. https://x.com/lmarena_ai/status/1957512493586350444

We released DeepConf, which can achieve 99.9% on AIME’25 with open-source models using only 15% of the compute compared to majority voting @ 512. The secret? Simple: just prune the rollouts when they show a consecutive stream of low-confidence tokens 😀. Can be applied to any models… https://x.com/tydsh/status/1959003712942403835
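
As a rough sketch of that idea (my reading of the post, not the paper’s exact method): sample many rollouts, drop any whose recent tokens stay below a confidence floor, then majority-vote over the survivors. The threshold, window, and confidence signal here are all assumptions.

```python
# Illustrative DeepConf-style pruning; names and thresholds are assumptions.
from collections import Counter

THRESHOLD = 0.2   # assumed per-token confidence floor
WINDOW = 16       # assumed length of the low-confidence streak that triggers pruning

def keep_rollout(token_confidences):
    """Prune a rollout if it ever has WINDOW consecutive tokens below THRESHOLD."""
    streak = 0
    for c in token_confidences:
        streak = streak + 1 if c < THRESHOLD else 0
        if streak >= WINDOW:
            return False
    return True

def deepconf_vote(rollouts):
    """rollouts: list of (final_answer, [token confidence, ...]) tuples."""
    survivors = [ans for ans, confs in rollouts if keep_rollout(confs)]
    if not survivors:  # fall back to plain majority voting
        survivors = [ans for ans, _ in rollouts]
    return Counter(survivors).most_common(1)[0][0]
```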

Just saw GLM-4.5V is trending #2 on Hugging Face https://x.com/Zai_org/status/1956421442092032258

We apply ComputerRL to the open-source GLM-4-9B-0414 model and evaluate its performance on the OSWorld benchmark. Our AutoGLM-OS-9B, built upon GLM-4-9B-0414, achieves state-of-the-art accuracy and demonstrates substantial improvements for general-purpose agents in desktop https://x.com/Zai_org/status/1958175307019829754

ByteDance just released the Seed-OSS 36B LLM on Hugging Face. It’s an open-source model with powerful long-context, reasoning, and agentic capabilities. https://x.com/HuggingPapers/status/1958207114876228111

ByteDance-Seed/Seed-OSS-36B-Instruct · Hugging Face https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct

New ByteDance Seed reasoning RL paper, relating RL to self-supervised learning. The paper is pretty dense with all the dual-task derivation so this is basically my notes. https://x.com/nrehiew_/status/1958882481488146644

Frontier AI performance typically reaches consumer hardware in just 9 months. With a single gaming GPU, you can run open-weight models matching the benchmark performance of the absolute frontier from less than a year ago. 🧵 https://x.com/EpochAIResearch/status/1956468453399044375

@cohere Congrats on the release and love the day-0 integration with Inference Providers 🔥 https://x.com/reach_vb/status/1958563034810446169

Command A Reasoning @cohere is now available in anycoder https://x.com/_akhaliq/status/1958602589681197494

Command A Reasoning is here! It’s designed to tackle complex enterprise tasks like deep research and data analysis. 🔥 As part of our commitment to the research ecosystem, we’re releasing the model weights. 🎉 https://x.com/Cohere_Labs/status/1958576284763611322

Command A Reasoning looks somewhat interesting, though unfortunately with a bad license – private use only unless you pay. https://x.com/scaling01/status/1958561844810903708

Introducing Command A Reasoning, our most advanced model for enterprise reasoning tasks. https://x.com/cohere/status/1958542682890047511

mlx-vlm v0.3.3 is here!
New models:
– @LiquidAI_ LFM2-VL
– @Zai_org GLM-4.5V
– @cohere Command-A-Vision
Changes:
– New kernel for grid_sample
– Fix bicubic interpolate kernel compatibility with macOS < 15
– Fix config inheritance
Thank you very much to all the amazing… https://x.com/Prince_Canuma/status/1958469233622327785

@deepseek_ai Now Available and default model in anycoder: https://x.com/_akhaliq/status/1958488877024362966

@scaling01 Just to clarify, it’s “trained using the UE8M0 FP8.” DeepSeek stated this is designed for the upcoming generation of chips. https://x.com/Anonyous_FPS/status/1958437047359995914

@teortaxesTex Maybe I missed something, but I could only find the Base model, and no model card. Where did they upload the Thinking/Reasoning model? https://x.com/rasbt/status/1957982932594778596

📢 New Model(s) Drop: DeepSeek v3.1 Thinking & Chat are now on Yupp! The latest edition from @deepseek_ai offers hybrid thinking built in, for quicker answers and stronger, tool-savvy agents. We checked them out with some prompts on Yupp: https://x.com/yupp_ai/status/1958935061677711451

🥇DeepSeek v3.1 INT4 model: https://x.com/HaihaoShen/status/1958507863749325197

9:15 AM in China, I predict we’ll see the second item soon (logically, V3.1-Instruct) and hopefully a model card/tweets. My biggest wish is to also see “With the release of DeepSeek-V3.1, the V3 series comes to an end… the DeepSeek V4 series will be released in the future” https://x.com/teortaxesTex/status/1957975224768430179

BIG LAUNCH: @deepseek_ai’s V3.1 is now live on W&B Inference! One model, two modes: toggle between high-speed ‘Non-Think’ & deep ‘Think’. Priced at just $0.55/$1.65 per 1M tokens, it’s a game-changer for building intelligent agents. Want $50 in free credits? Details below. https://x.com/weave_wb/status/1958681269484880026

DeepSeek had been using UE8M0 FP8 for a long time, you can see it in DeepGEMM. But maybe? https://x.com/teortaxesTex/status/1958437815710089697

DeepSeek is doubling down on their open source commitments with an MIT license for -Base. This is not only their first permissively licensed base model, it is the first large* permissively licensed base model in the industry. * unless you count dots.llm1 from RedNote @ 140B. https://x.com/georgejrjrjr/status/1957867653764379073

DeepSeek just released a new model! https://x.com/ClementDelangue/status/1957823652298166340

DeepSeek launches V3.1, unifying V3 and R1 into a hybrid reasoning model with an incremental increase in intelligence Incremental intelligence increase: Initial benchmarking results for DeepSeek V3.1 show Artificial Analysis Intelligence Index of 60 in reasoning mode, up from https://x.com/ArtificialAnlys/status/1958432118562041983

DeepSeek V3.1 beats Claude 4 Opus on Aider Polyglot This makes it the best non-TTC coding model and all of that for ~$1 https://x.com/scaling01/status/1957890953026392212

DeepSeek V3.1 dropped and the Cline community is testing it out. Early sentiment: “Makes 10,000 assumptions even when told to clarify” for planning tasks. What’s your experience been? (early data — 13.3% diff edit failure rate) https://x.com/cline/status/1959032407828602886

DeepSeek v3.1 is live on our Model APIs! https://x.com/basetenco/status/1958716181256577347

DeepSeek V3.1 Now Available on Chutes, with hybrid inference (one-model, two-modes) $0.1999 USD / M Input $0.8001 USD / M Output Available now: https://x.com/chutes_ai/status/1958507978476106196

DeepSeek-V3.1 Release | DeepSeek API Docs https://api-docs.deepseek.com/news/news250821

DeepSeek-V3.1-4bit running with MLX on M3 Ultra 512GB at 21 toks/sec! 🔥 Using only 380GB! 👀 <think> or </think> that is the question. https://x.com/ivanfioravanti/status/1958778366229655971

Linear scaling achieved with multiple DeepSeek V3.1 instances. 4x Macs = 4x throughput.
2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec
4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec
DeepSeek V3.1 is a 671B parameter model – so at its native 8-bit quantization, it… https://x.com/MattBeton/status/1958946396062851484
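
The memory math behind those machine counts is easy to check; a quick back-of-envelope for the weights alone, ignoring KV cache and activations:

```python
# Weights-only memory for a 671B-parameter model at common quantizations.
params = 671e9
print(f"8-bit: {params * 1.0 / 1e9:.0f} GB")  # ~671 GB: needs two 512GB M3 Ultras
print(f"4-bit: {params * 0.5 / 1e9:.0f} GB")  # ~336 GB: fits one, cf. the 380GB MLX run above
```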

Looking into the V3 vs V3.1 a bit – the modelling and config for the latest DeepSeek models are exactly the same? What’s the difference then? Purely data? If purely data, then why release a base model too, and not just a refresh for instruct? https://x.com/reach_vb/status/1957824849633485249

looks like @deepseek_ai is still on track to ship DeepSeek V4! https://x.com/swyx/status/1957902542136045608

Now on MLX 🚀 > pip install mlx-lm https://x.com/Prince_Canuma/status/1958791001301987628
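
Once installed, loading and sampling follows the usual mlx-lm pattern; the 4-bit repo id below is a placeholder for whichever MLX-community conversion you actually pull.

```python
# Hedged mlx-lm sketch for Apple silicon; the repo id is an assumption.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V3.1-4bit")  # placeholder conversion
print(generate(model, tokenizer,
               prompt="Explain KV caching in one paragraph.",
               max_tokens=256))
```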

Reminder that there’s 15 hours difference between SF and Hangzhou/Beijing. DeepSeek release cycle is as follows: do tests, push the model to prod at ≈ 7 PM local time, go home/out for drinks/whatever, next day maybe leisurely add a model card. They sleep through the release. https://x.com/teortaxesTex/status/1957954702781686094

some highlights from the release:
> optional thinking mode achieves same/competitive results as R1-0528
> knowledge benchmarks (MMLU, GPQA): 80.1 on GPQA (pretty strong)
> LiveCodeBench: scores 74.8 > R1
> AIME 2024: scores 93.1 > R1
> support for tool use (non-thinking mode only)
> new search… https://x.com/reach_vb/status/1958430639595864378

@deepseek_ai 3.1 reasons to get hyped about DeepSeek v3.1 1: Hybrid reasoning 2: Agentic tool use 3: Improved coding 3.1: Best-in-class latency on Baseten https://x.com/basetenco/status/1958515897972232526

@nrehiew_ That’s not why – it’s because reasoning uses up context length too fast to get to the end of an agentic coding loop. https://x.com/Teknium1/status/1958898159326765075

DeepSeek trained its agentic coder as a non-reasoner. There is a reason Anthropic evaluated Opus 4.1 without thinking on SWE-bench, Claude Code has thinking off by default, and Qwen released Qwen Coder for Qwen Code as a non-reasoner. We do not need reasoning for agentic coding. https://x.com/nrehiew_/status/1958838487895117956

DeepSeek-V3.1 officially released! Key highlights of the update:
– hybrid thinking model
– more efficient reasoning
– improved reasoning for search
– better tool calling and agentic capabilities
– improvements on many benchmarks: SWE-Bench: 44.6% -> 66%, Aider Polyglot… https://x.com/scaling01/status/1958438863279681824

DeepSeek-V3.1 on par with o3, Opus 4 and Gemini 2.5 Pro Preview on coding It achieves a 76.3% score on Aider Polyglot with Thinking https://x.com/scaling01/status/1958438007104549243

just a minor version bump. booooring https://x.com/willccbb/status/1958420877537849801

🚀 Exciting news: DeepSeek-V3.1 from @deepseek_ai now runs on vLLM! 🧠 Seamlessly toggle Think / Non-Think mode per request ⚡ Powered by vLLM’s efficient serving — scale to multi-GPU with ease 🛠️ Perfect for agents, tools, and fast reasoning workloads 👉 Guide & examples: https://x.com/vllm_project/status/1958580047658491947

DeepSeek-V3.1 is fully ready on Hugging Face Inference Providers! https://x.com/ben_burtenshaw/status/1958449429511352549

China’s DeepSeek Releases V3.1, Boosting AI Model’s Capabilities – Bloomberg https://www.bloomberg.com/news/articles/2025-08-19/china-s-deepseek-release-v3-1-boosting-ai-model-s-capabilities

Major TOM @GoogleDeepMind’s AlphaEarth Embeddings are now on @huggingface! 🚀 A new 6 TB prototype dataset for the community. Get it here: https://x.com/mikonvergence/status/1958767622176039019

Masquerade: Learning from In-the-wild Human Videos using Data-Editing https://arxiv.org/pdf/2508.09976

Some fun things people may have missed from Gemma 3 270M: 1. Out of 270M params, 170M are embedding params and 100M are transformer blocks. BERT from 2018 was larger 🤯 2. The vocabulary is quite large (262,144 tokens). This makes Gemma 3 270M a very good model to be hyper… https://x.com/osanseviero/status/1956258657483534803
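
That embedding/transformer split is easy to sanity-check; the 640-d hidden size below is my assumption about the config, not stated in the post.

```python
# Back-of-envelope: vocab size x hidden dim ≈ embedding parameter count.
vocab, hidden = 262_144, 640  # hidden size is an assumption about the config
emb_params = vocab * hidden
print(f"embedding params ≈ {emb_params / 1e6:.0f}M")  # ≈ 168M, matching the ~170M claim
```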

Couldn’t resist. Here’s a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): https://x.com/rasbt/status/1957073842393792751

New hyper-efficient addition to our amazing Gemma open models: Gemma 3 270M packs a real punch for its tiny size! It’s super compact and power efficient, so you can easily run your own task-specific fine-tuned systems on edge devices. Enjoy building with it! https://x.com/demishassabis/status/1956502480675578298

We’re sharing a new addition to the Gemma family of open models: Gemma 3 270M. 🛠️ It’s tiny yet mighty AI – making it ideal for task-specific fine-tuning, with powerful instruction following built-in. Here’s how you can build with it → https://x.com/GoogleDeepMind/status/1956393664248271082

Gemma 3 270M (8-bit) at ~200 tk/s on iPad Air M3 with MLX https://x.com/adrgrondin/status/1956428984876704059

🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & scripting-friendly CLI ✅ Local + HuggingFace Hub model management ✅ Config profiles for perf/memory tuning ✅ Real-time server & GPU monitoring ✅ Error https://x.com/vllm_project/status/1957002590220431669

Hugging Face just dropped AI Sheets to build and enrich datasets without writing a single line of code. Works with Qwen, Kimi, Llama 3 and other open-source LLMs. 100% free, local and open source. https://x.com/Saboo_Shubham_/status/1956732735147639081

WE ARE SO BACK!!! https://x.com/reach_vb/status/1957821171249934486

🎨✨ From simple sketches to stunning 3D interiors — powered by Qwen-Image-Edit! All designs are community contributions, showcasing how AI transforms architectural visions into realistic, stylish, and precise creations. Try it now: https://x.com/Alibaba_Qwen/status/1958744976772198825

📸 Just showed Qwen Chat Vision Understanding how to “see” and understand a meal — and it didn’t just identify the food, it analyzed what, where, weight and even how many calories! From a simple photo, we extracted detailed insights: ✅ Object detection ✅ Weight estimation ✅… https://x.com/Alibaba_Qwen/status/1956618027769971070

🖼️ 🚨 Image Edit Leaderboard Update: Qwen-Image-Edit is now the #1 open model for Image Edit in the Arena (Apache 2.0). The model by @alibaba_qwen debuts at #6 overall on the Image Edit leaderboard tied with Gemini 2.0 Flash Preview. https://x.com/lmarena_ai/status/1958206842657743270

🖼️ Image Edit Model Update Qwen-Image-Edit, developed by @Alibaba_Qwen, is now available in the Arena. This model brings image editing capabilities, and we encourage you to test it with your most complex prompts. https://x.com/lmarena_ai/status/1957878222986821711

🚀 Excited to introduce Qwen-Image-Edit! Built on 20B Qwen-Image, it brings precise bilingual text editing (Chinese & English) while preserving style, and supports both semantic and appearance-level editing. ✨ Key Features ✅ Accurate text editing with bilingual support ✅ https://x.com/Alibaba_Qwen/status/1957500569029079083

🚀 Small but mighty update to Vision Understanding in Qwen Chat — now with native 128K context and stronger performance across vision, video, and 3D tasks! 🔥 Key Upgrades: ✅ Significant boost in math & reasoning ✅ More accurate object recognition ✅ OCR support for 30+ https://x.com/Alibaba_Qwen/status/1956289523421470855

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale “Autoregressive models—generating content step-by-step like reading a sentence—excel in language but struggle with images. Traditionally, they either depend on costly diffusion models or… https://x.com/iScienceLuvr/status/1956321483183329436

Qwen Image Edit works too well with lightx2v LoRA to run with just 8 and 4 steps, wtf? in my experience, 8 steps keeps the quality of the edits at the same level as the original model, at a 12x speedup 💨 (ofc i built a demo for it) https://x.com/multimodalart/status/1958217824629092568
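
A hedged sketch of what that 8-step setup looks like in diffusers; the pipeline class and LoRA repo id here are my assumptions based on the posts above, not a verified recipe.

```python
# Assumed names throughout: QwenImageEditPipeline and the Lightning LoRA repo id.
import torch
from diffusers import QwenImageEditPipeline
from diffusers.utils import load_image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("lightx2v/Qwen-Image-Lightning")  # assumed LoRA repo id

image = load_image("portrait.png")
edited = pipe(image=image, prompt="remove the bangs",
              num_inference_steps=8).images[0]  # 8 steps instead of the default ~50
edited.save("edited.png")
```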

Qwen-Image Edit in ComfyUI https://x.com/Alibaba_Qwen/status/1957991583649001555

Qwen-Image-Edit is out in anycoder for image editing in your vibe coded apps Built on 20B Qwen-Image, it brings precise bilingual text editing (Chinese & English) while preserving style, and supports both semantic and appearance-level editing. https://x.com/_akhaliq/status/1957519569016238268

Qwen-Image-Edit is the new open weights leader in Image Editing, with quality comparable to GPT-4o and FLUX.1 Kontext [max] Qwen-Image-Edit is the image editing variant of the recent Qwen-Image release from Alibaba, also released under the Apache 2.0 license with weights https://x.com/ArtificialAnlys/status/1958712568731902241

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency | Qwen https://qwenlm.github.io/blog/qwen-image-edit/

Relighting images with Qwen Edit – impressive directional control and color temperature manipulation w/o additional finetuning. Crazy how we needed a dedicated model for this not long ago. https://x.com/linoy_tsaban/status/1958176756185325931

Thank you! Qwen-Image-Edit is now available in anycoder! https://x.com/Alibaba_Qwen/status/1957709912202682588

👀🚨 Vision Leaderboard update! Two new models have entered the Vision Top 20 this week: 🔸Qwen-vl-max-2025 by @alibaba_qwen lands at #10 (tied with gemini-1.5-pro & gpt-5-nano-high) 🔸Step 3 by @StepFun_ai ranks at #19 (tied with step-lo-turbo) Congrats to both 🎉 this is https://x.com/lmarena_ai/status/1958957107946168470

Wow — Qwen-Image-Edit just debuted at #2 in the Image Editing Arena 🏆 ELO 1098, with performance on par with GPT-4o — and all at open weights under Apache 2.0. Thanks to @ArtificialAnlys Try it now: https://x.com/Alibaba_Qwen/status/1958725835818770748

llama.qtcreator is now part of ggml-org https://x.com/ggerganov/status/1958183404207214629

Introducing 𝘃𝗶𝗯𝗲-𝗹𝗹𝗮𝗺𝗮 to streamline your LlamaIndex development with context-aware coding agents. A command-line tool that automatically configures your favorite coding agents with up-to-date context and best practices about the LlamaIndex framework, LlamaCloud and… https://x.com/llama_index/status/1958656414295237014

Mistral Medium 3.1 is 2nd on LMArena without style control. Very proud of the @MistralAI team! https://x.com/GuillaumeLample/status/1959015551172583602

Mistral Medium 3.1 just landed on @lmarena_ai leaderboard—punching way above its weight! 🏆 #1 in English (no Style Control) 🏆 2nd overall (no Style Control) 🏆 Top 3 in Coding & Long Queries 🏆 8th overall Small model. Big impact. Try it now on Le Chat and the API! https://x.com/MistralAI/status/1959015454359585230

NVIDIA Nemotron Nano v2 – a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. 💚💚💚 9B: https://x.com/ClementDelangue/status/1957519608992407848

nvidia parakeet-tdt-0.6b-v3 600M model here: https://x.com/reach_vb/status/1957149090913128598

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model https://x.com/_akhaliq/status/1958545622618788174

Today we’re releasing NVIDIA Nemotron Nano v2 – a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the https://x.com/ctnzr/status/1957504768156561413

Nvidia dropping a model that rivals Qwen 3 8B, with data, with base model, and not that bad of a license (could be better, to be clear) – a big win, love to see it. Hopefully it’s well integrated into open tools and “easy to finetune” etc., which is hard to measure. https://x.com/natolambert/status/1957517030929887284

For full transparency, we had an implementation issue with the GPT-OSS models that the team worked hard to roll out fixes for and are now live with significant quality improvements. If you had tried GPT-OSS models at launch and weren’t happy, please give them another chance. 🫡 https://x.com/ozenhati/status/1957896891468800345

Fun fact: you can full-parameter fine-tune @OpenAI GPT-OSS 120B on a single node or multinode. With @basetenco’s Truss CLI, it’s been pretty painless to deploy multinode training for 120B. https://x.com/winglian/status/1958155665597501879

just ~4x’d my gpt-oss-20b MFU (5% -> 18%) by completely rewriting the thinky sinky using logsumexp renormalization. turns out me from 5 days ago is an incompetent joke of an engineer https://x.com/khoomeik/status/1957754482185630071

Together AI makes it simple to fine-tune the latest OpenAI gpt-oss-120B and gpt-oss-20B models. While these models are incredibly strong out of the box, fine-tuning takes their quality to another level. Get started with supervised fine-tuning today! (Blog link below) https://x.com/togethercompute/status/1958197481272901663

Want to fine-tune gpt-oss-120b? We teamed up with Axolotl to launch a new recipe to run fine-tuning out of the box — multi-node training, one-line deployments from the CLI, and built-in observability included. https://x.com/basetenco/status/1957877915737362437
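
For a feel of the shape of such a run, here’s a minimal supervised fine-tuning sketch with TRL; the demo dataset and the choice of the smaller gpt-oss-20b checkpoint are stand-ins, not Baseten’s or Axolotl’s actual recipe (which handles the multi-node orchestration the 120B needs).

```python
# Hedged SFT sketch with TRL; dataset and hyperparameters are illustrative only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("trl-lib/Capybara", split="train")  # assumed demo dataset

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",  # smaller sibling; 120B needs multi-node setups
    train_dataset=ds,
    args=SFTConfig(output_dir="gpt-oss-20b-sft", per_device_train_batch_size=1),
)
trainer.train()
```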

I just ran the gpt-oss eval suite with the large gpt-oss-120b on my M2 Ultra using vanilla llama.cpp and got the following scores: – GPQA: 79.8% – AIME25: 96.6% These numbers are in line with those from various cloud providers. Here are the steps: https://x.com/ggerganov/status/1958238492603089287

Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong https://x.com/jiawzhao/status/1958982524333678877

One of the quickest ways to start playing with a good local LLM on macOS (if you have ~12GB of free disk space and RAM) – using llama-server and gpt-oss-20b:
brew install llama.cpp
llama-server -hf ggml-org/gpt-oss-20b-GGUF --ctx-size 0 --jinja -ub 2048 -b 2048 -ngl 99 -fa https://x.com/simonw/status/1957880963666702466

The ultimate guide for using gpt-oss with llama.cpp – Runs on any device – Supports NVIDIA, Apple, AMD and others – Support for efficient CPU offloading – The most lightweight inference stack today https://x.com/ggerganov/status/1957821440633282642

“Projects like the New Deal, the Apollo program pale in comparison to what we’re doing right now.” 🆕 Greg Brockman (@gdb) joins us to talk GPT-5, GPT-OSS, and what’s next on @OpenAI’s road to crystallizing all of human intelligence! “Energy turns into compute, turns into… https://x.com/latentspacepod/status/1956433236021883071

Update on this: the reason Microsoft (and probably Amazon) were so much worse at serving gpt-oss is that they ignored the reasoning-effort setting and stuck with the default medium one. The numbers make sense for that hypothesis, and someone from MS confirmed in the comments that… https://x.com/giffmana/status/1955710876528599217

GPT-5 behind Chinese models like Kimi-K2 and Qwen3-235B on coding https://x.com/scaling01/status/1956404452442681829

GPT-5-mini high shows no improvement over o4-mini and sits behind top Chinese models like Kimi-K2, GLM-4.5, Qwen3-235B and DeepSeek-R1 https://x.com/scaling01/status/1956405559978029061

Introducing Chroma Cloud: an open-source serverless search database that is fast, cost-effective, scalable, and reliable. https://x.com/trychroma/status/1957523079938339163
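
The client API is the same one chromadb has always exposed locally; a quick sketch of that shape (Chroma Cloud swaps in a hosted client, which I haven’t verified here):

```python
# Local chromadb usage; the hosted service exposes the same collection API.
import chromadb

client = chromadb.Client()
col = client.create_collection("docs")
col.add(documents=["Chroma is a search database.", "DeepSeek released V3.1."],
        ids=["a", "b"])
print(col.query(query_texts=["what did DeepSeek ship?"], n_results=1))
```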

GLM-4.5 is now live on TensorBlock Forge, an open-source middleware service that simplifies AI model provider management. Get your https://x.com/Zai_org/status/1958009737498234934

We built OpenHands in the open (~60K ⭐️ on GitHub). Now we’re giving back to the OSS ecosystem. Announcing the OpenHands Cloud OSS Credit Program → $100–$500 credits for maintainers. 👉 Learn how to apply! https://x.com/allhands_ai/status/1958901220363338034

AI everywhere — love seeing Qwen3 powering cars & robots on-device with Qualcomm NPU! 🚀 Thanks to NEXA AI 🙌 https://x.com/Alibaba_Qwen/status/1958800193970954657

Qwen 3 instruct is now on Baseten Model APIs. Our model performance team has worked quite a bit of magic to reach ~95tps for Qwen 3 Instruct. This gives you blazing fast responses for a state of the art reasoning model. https://x.com/basetenco/status/1956475210582090030

The @Alibaba_Qwen team patched two improvement fixes after we released. We thought of doing a patch release for that. So, please update to the latest: 0.35.1. Notes: https://x.com/RisingSayak/status/1958057896731897940

Knobs that matter: α tunes performance vs efficiency; accuracy rises fast until ~0.6 while cost stays low until ~0.4, then climbs. Implementation uses k-means with k=60, Qwen3-embedding-8B (4096-d) and top-p=4 nearest clusters at inference. https://x.com/omarsar0/status/1958897532890943884
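
A rough sketch of that cluster-then-route retrieval setup, with scikit-learn and random vectors standing in for the real corpus and the 4096-d Qwen3 embeddings:

```python
# Index time: cluster the corpus; query time: search only the nearest clusters.
import numpy as np
from sklearn.cluster import KMeans

docs = np.random.randn(5_000, 256).astype("float32")  # stand-in embeddings (paper: 4096-d)
km = KMeans(n_clusters=60, n_init="auto").fit(docs)   # k=60 as in the post

def candidate_ids(query_vec, top_clusters=4):
    """Restrict search to the docs in the 4 centroids nearest the query."""
    dists = np.linalg.norm(km.cluster_centers_ - query_vec, axis=1)
    nearest = np.argsort(dists)[:top_clusters]
    return np.where(np.isin(km.labels_, nearest))[0]

print(candidate_ids(np.random.randn(256).astype("float32"))[:10])
```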

Quick hacks for tool calling and thinking flag support for DeepSeek V3.1 in SGLang: https://t.co/EoUWKu4MEE Then run with: --tool-call-parser deepseekv31 --reasoning-parser qwen3 And in request body: “chat_template_kwargs”: {“thinking”: true} This is up on @chutes_ai now, but… https://x.com/jon_durbin/status/1958488353478758599
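
From the client side, that request-body flag rides along as extra_body in any OpenAI-compatible client; the endpoint URL and model name below are placeholders.

```python
# Toggling DeepSeek V3.1's Think mode per request against an SGLang server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # placeholder endpoint
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"chat_template_kwargs": {"thinking": True}},  # False for Non-Think
)
print(resp.choices[0].message.content)
```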

🐞 We hit a bug in the inference code for Qwen-Image-Edit on Diffusers, which caused some odd cases. ✅ Fixed now and thanks to Diffusers for the quick merge — give it another try! 🔗 Try it now: https://x.com/Alibaba_Qwen/status/1957840853277290703

AI Toolkit now supports fine tuning Qwen Image Edit and supports caching the text embeddings with the control images. I already trained a 3 bit ARA for it, which will allow you to train a LoRA at 1024 on a 5090 when caching the text embeddings. More in 🧵 https://x.com/ostrisai/status/1958932936620900666

It’s out friends! Really great to see the state of things in image edits, video fidelity being pushed further and further, thanks to the community! This release also features new fine-tuning scripts for Qwen-Image and Flux Kontext (with support for image inputs). So, get busy https://x.com/RisingSayak/status/1957668389935096115

nano-banana, qwen-image-edit, what else? Try @StepFun_ai NextStep-1-Large-Edit – 14B AR model – Apache 2 license – Demo available on @huggingface – Pretrain model also made available Link below https://x.com/Xianbao_QIAN/status/1957749693485838448

qwen image edit is back at #1 trending model at @huggingface 👑 https://x.com/multimodalart/status/1958229738398634171

Qwen-Image pruning experiment. Going from 60 to 30 blocks, 20B params to 10B params. Removed block idx 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 16, 21, 23, 24, 40, 41, 42, 43, 44, 45, 49, 50, 51, 52, 53, 54, 55, 56 https://x.com/ostrisai/status/1957748358451503166
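
Depth pruning like this is mechanically simple; a hedged PyTorch sketch (the attribute name on the actual Qwen-Image transformer will differ):

```python
# Drop the listed block indices and rewire the module list: 60 -> 30 blocks.
import torch.nn as nn

REMOVE = {2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 16, 21, 23, 24,
          40, 41, 42, 43, 44, 45, 49, 50, 51, 52, 53, 54, 55, 56}

def prune_blocks(model):
    # 'transformer_blocks' is an assumed attribute name for this sketch.
    kept = [blk for i, blk in enumerate(model.transformer_blocks) if i not in REMOVE]
    model.transformer_blocks = nn.ModuleList(kept)  # ~20B params -> ~10B
    return model
```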

In collaboration with @NASA, we’ve open-sourced Surya on @huggingface — a new foundation model designed to help researchers protect infrastructure through accessible, accurate modeling of space weather. It’s going to totally change how we forecast solar storms. See how.🧵 https://x.com/IBM/status/1958152244504768949
