Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Cinematic wide shot of the Emerald City gates from Wicked constructed from transparent green wireframe code and glowing open source documentation segments, yellow brick road made of interlocking puzzle pieces, dramatic moody lighting with object segmentation overlays highlighting each architectural element in different colors, movie poster composition with Open Source title overlay
Let’s goooo! We’ve just launched a new Computer Use Agent (CUA) powered by open models, @huggingface smolagents and @E2B for secure computer sandboxing! We’re building something different. Open. Transparent. Yours. Check this out https://x.com/amir_mahla/status/1991166551945355295
We introduce Olmo 3, a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. https://www.datocms-assets.com/64837/1763662397-1763646865-olmo_3_technical_report-1.pdf
BOOM! New fact-checking benchmark from @ArtificialAnlys. Great insights + a @huggingface dataset for model evaluation. Added to lighteval and tested on top models of HF inference providers 🔥 MODEL AND JUDGE RESPONSES IN THREAD 👇 https://x.com/nathanhabib1011/status/1991165652783222982
Gemini 3 Pro Preview has comparable speeds to Gemini 2.5 Pro, with 128 output tokens per second. This places it ahead of other frontier models including GPT-5.1 (high), Kimi K2 Thinking and Grok 4 https://x.com/ArtificialAnlys/status/1990813128226189811
The Artificial Analysis leaderboard shows Gemini 3 at 73%, GPT-5.1 at 70%, and Kimi at 67% – minor differences. On our leaderboard, Gemini is 47%, GPT-5.1 is 38%, and Kimi is 27% – Gemini 3 is substantially more capable on hard benchmarks. https://x.com/hendrycks/status/1991188104804208736
We estimate that Kimi K2 Thinking has a 50%-time-horizon of around 54 minutes (95% confidence interval of 25 to 100 minutes) on our agentic SWE tasks. Note that we conducted this evaluation through a third-party inference provider, which reduces our confidence in this estimate. https://x.com/METR_Evals/status/1991658241932292537
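METR’s number comes from a simple statistical idea: fit a logistic curve of task success probability against (log) human task duration, then solve for the duration at which success probability crosses 50%. Here is a minimal sketch of that calculation with made-up data standing in for the real eval results:

```python
# Minimal sketch (not METR's code): estimate a 50%-time-horizon by fitting
# a logistic model of task success against log(human task duration).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: human-minutes per task and whether the model solved it.
task_minutes = np.array([2, 5, 10, 20, 45, 90, 180, 360])
solved = np.array([1, 1, 1, 1, 1, 0, 0, 0])

X = np.log(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, solved)

# p(success) = 0.5 where the logit is zero: log(t50) = -intercept / coef.
t50 = np.exp(-clf.intercept_[0] / clf.coef_[0][0])
print(f"Estimated 50%-time-horizon: {t50:.0f} minutes")
```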
Kimi K2 Thinking is impressive. So I built a multi-agent deep researcher, Kimi Deep Researcher. It generates long research reports on any topic, powered by subagents (web searcher, analyzer, and synthesizer). It can do 100s of tool calls per session. Repo soon! https://x.com/omarsar0/status/1988974710592516454
🤗 Kimi-k2-Thinking has reached top performance on the latest IMO-level reasoning benchmark, AMO-Bench from Meituan Longcat! https://x.com/Kimi_Moonshot/status/1991139250566545886
Kimi-K2 Thinking gets the same score on METR as Claude 3.7 Sonnet. As I was saying, open source is 9 months behind frontier labs on agentic, long-context reasoning tasks. It’s still an improvement, and open-source models seem to be on their own exponential, but I heavily suspect https://x.com/scaling01/status/1991665386513748172
NVIDIA just released Nemotron Parse on Hugging Face. A new vision model that goes beyond traditional OCR to understand complex document layouts. It extracts text, tables, and other elements with spatial grounding, turning unstructured documents into actionable data. https://x.com/HuggingPapers/status/1991108589235372286
NVIDIA Apollo Unveiled as Open Model Family for Scientific Simulation | NVIDIA Blog https://blogs.nvidia.com/blog/apollo-open-models/
Open-source research agents have been lagging behind proprietary systems like OpenAI’s Deep Research. The gap has been frustrating for developers who want powerful, deep research agents without vendor lock-in. I’ve been building my own called Kimi Deep Researcher. Similarly, https://x.com/omarsar0/status/1990794651608219727
Open-weight models are around 8 months behind closed frontier models. The doubling time is below the stated 7 months; I estimate it to be closer to 6.5 months. From the limited data on open-weight models, it seems like progress is happening at a similar pace. https://x.com/scaling01/status/1991684839821423073
We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents: 1. The best 32B base model. 2. The best 7B Western thinking & instruct models. 3. The first 32B (or larger) fully open reasoning model. This is a big https://x.com/natolambert/status/1991508141687861479
Olmo models are always a highlight due to them being fully transparent and their nice, detailed technical reports. I am sure I’ll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks. In the meantime, here’s the side-by-side https://x.com/rasbt/status/1991656199394050380
The OlmoRL infrastructure was 4x faster than Olmo 2 and made it much cheaper to run experiments. Some of the changes: 1. continuous batching 2. in-flight updates 3. active sampling 4. many many improvements to our multi-threading code https://x.com/finbarrtimbers/status/1991546419875115460
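Continuous batching is conceptually the biggest of those wins: instead of waiting for the slowest generation in a batch, finished sequences are evicted immediately and queued prompts take their slots. A toy sketch of the scheduling loop (illustrative only, not the OlmoRL code):

```python
# Toy sketch of continuous batching: finished sequences leave the batch
# immediately and waiting prompts take their slots, so the GPU never
# idles on stragglers.
from collections import deque
import random

MAX_BATCH = 4
waiting = deque(f"prompt-{i}" for i in range(10))
active = {}  # prompt -> decode steps remaining

step = 0
while waiting or active:
    # Admit new prompts into free slots (the "continuous" part).
    while waiting and len(active) < MAX_BATCH:
        active[waiting.popleft()] = random.randint(1, 5)
    # One decode step for every active sequence.
    for p in list(active):
        active[p] -= 1
        if active[p] == 0:
            del active[p]  # evict now instead of waiting for the batch
    step += 1
print(f"Finished 10 prompts in {step} decode steps with batch size {MAX_BATCH}")
```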
AllenAI deserves so much more visibility and credit than what they are getting today. Another banger release today with Olmo-3, fully open-source with all code, models in Apache 2.0, and associated training details. https://x.com/ClementDelangue/status/1991609311920026027
Amazing work – congrats to the Olmo team! Look forward to the day when open-source is the default. https://x.com/percyliang/status/1991545594482159619
Because Olmo 3 is fully open, we decontaminate our evals from our pretraining and midtraining data. @StellaLisy proves this with spurious rewards: RL trained on a random reward signal can’t improve on the evals, unlike some previous setups https://x.com/mnoukhov/status/1991576437246292434
supervision-0.27.0 lets you parse and visualize Qwen3-VL object detection results. Prompt: “person between albert and marie.” To answer this, Qwen3 needs prior visual knowledge of Marie Curie and Albert Einstein, and it needs to understand reference terms like “between.” https://x.com/skalskip92/status/1990433442434031737
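For reference, Qwen-VL-style models emit detections as JSON with pixel-space boxes, and turning that into supervision objects takes only a few lines. A minimal sketch, assuming the common bbox_2d/label output format (supervision 0.27.0 also ships its own parser for this):

```python
# Minimal sketch: convert Qwen-VL-style JSON detections into a
# supervision Detections object for downstream annotation.
import json
import numpy as np
import supervision as sv

raw = '[{"bbox_2d": [220, 80, 380, 420], "label": "person"}]'
objects = json.loads(raw)

detections = sv.Detections(
    xyxy=np.array([o["bbox_2d"] for o in objects], dtype=float),
)
labels = [o["label"] for o in objects]
# annotated = sv.BoxAnnotator().annotate(image.copy(), detections)
print(detections.xyxy, labels)
```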
Today we are announcing cline-bench, an open source benchmark of real world agentic coding tasks that we are building together with the community. cline-bench turns difficult Cline tasks from open source repos into containerized RL environments, with real repo snapshots, real… https://x.com/cline/status/1991612268220752130
Aide is an open-source AI native code editor built on top of the agentic framework. It’s SOTA at 43% on swebench-lite and has all the features you expect out of Cursor/Copilot, with complete data privacy and plug-and-play LLM integration. https://x.com/ycombinator/status/1854237314651980257
DeepSeek is back with a new open source repository: “A parallel load balancer that optimizes workload distributions for MoE models.” https://x.com/scaling01/status/1991067602467131704
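The tweet gives no algorithmic detail, but the underlying problem is a classic one: place experts, weighted by expected token traffic, across GPUs so that no device becomes a hotspot. A conceptual greedy sketch, not DeepSeek’s actual implementation:

```python
# Conceptual sketch of MoE expert load balancing: greedily place the
# hottest experts on the least-loaded GPUs so expected token traffic
# is spread evenly (longest-processing-time heuristic).
import heapq

expert_load = {f"expert_{i}": load for i, load in
               enumerate([900, 700, 650, 400, 300, 250, 120, 80])}
NUM_GPUS = 4

# Min-heap of (current_load, gpu_id, assigned_experts).
gpus = [(0, g, []) for g in range(NUM_GPUS)]
heapq.heapify(gpus)

for expert, load in sorted(expert_load.items(), key=lambda kv: -kv[1]):
    total, g, assigned = heapq.heappop(gpus)
    assigned.append(expert)
    heapq.heappush(gpus, (total + load, g, assigned))

for total, g, assigned in sorted(gpus, key=lambda t: t[1]):
    print(f"GPU {g}: load={total}, experts={assigned}")
```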
PRO for PROs: Nano Banana PRO is available at no cost for @huggingface PRO subscribers on Spaces. Go bananas 🍌 https://x.com/multimodalart/status/1991549140627775511
The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix https://huggingface.co/blog/codelion/optimal-dataset-mixing
“We’re in an LLM bubble,” Hugging Face CEO says–but not an AI one – Ars Technica https://arstechnica.com/ai/2025/11/were-in-an-llm-bubble-hugging-face-ceo-says-but-not-an-ai-one/
HunyuanVideo 1.5 is out on Hugging Face: 8.3B parameters, deployable on consumer GPUs with only 14GB VRAM. HD Cinematic Quality: natively generates 5-10 second 480p/720p HD videos, with super-resolution support for 1080p cinematic quality https://x.com/_akhaliq/status/1991724463462011328
Anycoder now has a slick new UI. It is still my best place to try new models as it offers a ton of options, and one-click deployment to Hugging Face Spaces. 🫡 https://x.com/pandeyparul/status/1991726081288859966
Perplexity Pro and Max subscribers now have access to Kimi-K2 Thinking and Gemini 3 Pro. https://x.com/perplexity_ai/status/1991614227950498236
🚨Leaderboard Update New model provider in the Arena: @DeepCogito has released Cogito v2.1 (MIT licensed) 🔹Top 10 Open Source Model for WebDev, rank #10 🔹Ties at rank #18 overall for WebDev This puts Cogito v2.1 on par with community favorites like Qwen 3 Coder Plus & Kimi K2 https://x.com/arena/status/1991211903331496351
As promised, Kimi K2 and Gemini 3 Pro are available for all Perplexity Pro and Max users. Grok 4.1 will be available soon. https://x.com/AravSrinivas/status/1991619527638151665
Z.ai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai – YouTube https://www.youtube.com/watch?v=m6MF1OR_9kM
I don’t understand why there’s an expectation that open weights models will keep up with closed ones. Costs will continue to go up without a clear way to get revenue (traditional open source revenue strategies won’t work), more capable systems will face government pressure, etc. https://x.com/emollick/status/1989189674867511723
🍐@trypearai (YC F24) is an open source AI code editor with a curated inventory of the best AI tools, natively integrated for effortless AI-powered coding. They’re building a flexible framework for the AI coding tech stack under a unified UX: https://x.com/ycombinator/status/1856441845880107408
We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built upon DiT architecture, it redefines the open-source SOTA for accessibility and performance.🚀🚀🚀 HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence https://x.com/TencentHunyuan/status/1991721236855156984
I love Qwen3-VL, but for some reason the 2B model blows up 80GB of VRAM on simple SFT (NF4 QLoRA with flash attention installed) https://x.com/mervenoyann/status/1990172603437175147
10,000,000 users creating with Qwen Chat — and we’re just getting started. From here, let’s begin — https://x.com/Alibaba_Qwen/status/1990322403994657091
This weekend I evaluated the latest Qwen3-VL models for semantic object detection and built a HF Space to compare Qwen3/2.5/2 side by side. 👉 https://x.com/darius_morawiec/status/1990225022766719335
Today, we are officially open-sourcing a set of high-quality speculator models on the @huggingface Hub. Our first release includes Llamas, Qwens, and gpt-oss. In practice, you can expect 1.5-2.5× speedups on average, with some workloads seeing more than 4× improvements! https://x.com/_EldarKurtic/status/1991160711838359895
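Speculator models plug into standard assisted generation: the small draft model proposes tokens and the large target model verifies them in parallel, so output quality is unchanged and only latency drops. A minimal sketch using transformers’ assistant_model hook, with placeholder model names rather than the released speculator checkpoints:

```python
# Minimal sketch of speculative (assisted) decoding in transformers.
# The model IDs below are placeholders, not the released speculators;
# draft and target must share a tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder target
draft_id = "meta-llama/Llama-3.2-1B-Instruct"   # placeholder speculator

tok = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("Explain speculative decoding in one sentence.",
             return_tensors="pt").to(target.device)
# The draft proposes tokens; the target verifies them in parallel,
# which is where the reported 1.5-2.5x speedups come from.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```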
Ulysses Sequence Parallelism integration from Arctic Long Sequence Training has been merged into @huggingface Accelerate. So now you can choose from context and sequence parallelism solutions to deal with very long seqlens. https://x.com/StasBekman/status/1991561577007611907
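The Ulysses trick, in shape terms: each rank starts with a shard of the sequence and all attention heads, and an all-to-all swaps the sharded dimension so each rank sees the full sequence for a subset of heads. A shape-only sketch of that exchange (not the Accelerate API):

```python
# Shape-only sketch of Ulysses-style sequence parallelism across P ranks.
import torch

P, seq_len, n_heads, head_dim = 4, 8192, 16, 64
assert seq_len % P == 0 and n_heads % P == 0

# Before all-to-all: each rank holds a sequence shard, all heads.
local_q = torch.randn(seq_len // P, n_heads, head_dim)
print("pre  all-to-all:", tuple(local_q.shape))            # (2048, 16, 64)

# After all-to-all: full sequence, a shard of the heads, so standard
# attention kernels see the whole context on every rank.
print("post all-to-all:", (seq_len, n_heads // P, head_dim))  # (8192, 4, 64)
```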
New open-weight LLM by Deep Cogito! Locally (671B): ollama run cogito-2.1 Ollama’s Cloud: ollama run cogito-2.1:671b-cloud https://x.com/ollama/status/1991212450755060020
LlamaExtract now has a PER_TABLE_ROW extraction target. Instead of extracting once per doc or per page, you can now apply your schema to each row in a table or each item in a bulleted list, getting back an array of structured JSON objects. For example, here’s me extracting the https://x.com/tuanacelik/status/1990804124590616582
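The pattern generalizes beyond LlamaExtract: validate one schema per table row and return the results as an array. A conceptual illustration with pydantic, not the LlamaExtract API itself:

```python
# Conceptual illustration of a per-table-row extraction target: the same
# schema is applied to each row, yielding an array of structured objects
# instead of one object per document.
from pydantic import BaseModel

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

table = [
    {"description": "GPU hours", "quantity": 120, "unit_price": 2.5},
    {"description": "Storage (TB-month)", "quantity": 4, "unit_price": 20.0},
]

# One schema application per row -> a JSON array of validated objects.
items = [LineItem(**row) for row in table]
print([item.model_dump() for item in items])
```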




