Image created with GPT Image 1. Image prompt: Draped in an open patchwork cape of translucent fabrics bearing shared code, protest slogans, and silk-screened Git commits, a barefoot Black hacker-artist walks a runway of glass circuit boards and candlelight — photographed with a vintage Leica in slow exposure to honor the collaborative spirit of freedom and flair.
DeepSeek released Prover-V2, an open-source AI combining informal math reasoning with theorem proving. With 671B params, the model solves 88.9% of problems on MiniF2F. It does a ‘cold-start’ to break down proofs into subgoals before formal verification. https://x.com/adcock_brett/status/1919060364655800684
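The "subgoal" decomposition described above means the prover first sketches intermediate lemmas informally, then closes each one formally. As a hedged illustration (this toy theorem and its subgoals are invented for exposition, not taken from the DeepSeek paper), a miniF2F-style Lean proof might be structured like this:

```lean
import Mathlib

-- Toy goal: for any natural n, n + n = 2 * n.
-- A subgoal-style proof: state an intermediate `have`, then conclude.
theorem double_eq (n : ℕ) : n + n = 2 * n := by
  -- Subgoal 1: rewrite 2 * n as n + n using the library lemma two_mul
  have h : 2 * n = n + n := two_mul n
  -- Subgoal 2: conclude by symmetry of the equation
  exact h.symm
```

In Prover-V2's pipeline, each `have`-style subgoal would be proposed by informal chain-of-thought and then verified by the Lean kernel before the pieces are assembled.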
We just released DeepSeek-Prover V2. – Solves nearly 90% of miniF2F problems – Significantly improves the SoTA performance on the PutnamBench – Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version Github: https://x.com/zhs05232838/status/1917600755936018715
Hugging Face releases a free Operator-like agentic AI tool | TechCrunch https://techcrunch.com/2025/05/06/hugging-face-releases-a-free-operator-like-agentic-ai-tool/
OpenAI CPO, Kevin Weil: We’re preparing to release an open-weight model soon, built on democratic values. It won’t be the frontier model and will be one generation behind on purpose to avoid accelerating rivals like China. https://x.com/slow_developer/status/1919393597385810272
The @huggingface LLM course has new videos! 📽️ We’ve added videos on the latest topics. Join the course and check them out! https://x.com/ben_burtenshaw/status/1919761119322804723
The Meta Llama org just crossed 40,000 followers on Hugging Face. Grateful for all their impact on the field, sharing the Llama weights openly and much more! We need more of this from all the other big tech companies to make AI more open, collaborative and beneficial to all! https://x.com/ClementDelangue/status/1918038543772897739
For the first time, in March, ChatGPT got into the top 10 sources of traffic to Hugging Face. It might reach the top 5 in a few months if growth continues like this. https://x.com/ClementDelangue/status/1918070591300776222
very grateful to all the developers who spent time with us telling us what they wanted from an open-weights model. the feedback was useful and unexpected, but all doable. i think we will ship something extraordinary! https://x.com/sama/status/1918737082895446465
The Phi-4-reasoning tech report is a real tour de force in both rigour and pragmatism. The main lessons for me are: > Most gains come from careful SFT, with RL the 🍒 on top > Filter the data for the most “teachable” prompts, i.e. not too easy for the model you want to tune. https://x.com/_lewtun/status/1917947747195298086
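The "teachable prompts" idea above — keep prompts that are neither trivially easy nor hopeless for the base model — can be sketched as a pass-rate band filter. The thresholds and data layout here are illustrative assumptions, not taken from the Phi-4 report:

```python
def filter_teachable(scored_prompts, min_pass=0.1, max_pass=0.7):
    """Keep prompts whose measured base-model pass rate falls in a
    'teachable' band: not already solved, not out of reach.
    `scored_prompts` is a list of (prompt, pass_rate) pairs (toy format)."""
    return [p for p, rate in scored_prompts if min_pass <= rate <= max_pass]

# Toy pool of prompts scored by base-model pass rate over repeated sampling.
pool = [("easy arithmetic", 0.95), ("medium proof", 0.4), ("open problem", 0.0)]
print(filter_teachable(pool))  # keeps only the mid-difficulty prompt
```

The band boundaries would in practice be tuned to the specific model being distilled into.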
GitHub 👨‍🔧: Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI → Offers an open-source AI search engine using LLMs for refined, sourced answers → Supports local LLMs (like Llama3 via Ollama) and hosted models (OpenAI, Groq, https://x.com/rohanpaul_ai/status/1919340042906210486
Qwen3 benchmark results: 235B is a BEAST, placing 3rd overall and with the best generalization among all tested models. All of the Qwen3 models have very low (or zero) percentages of invalid moves, which means good instruction following. 235B MoE > 32B > 14B > 30B MoE > 8B https://x.com/scaling01/status/1918031153312731536
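The "invalid moves" metric above is simply the share of model moves that break the game's rules at the turn they were played; a minimal sketch (the move notation and the legality sets are stand-ins, not the benchmark's actual harness):

```python
def invalid_move_rate(moves, legal_moves):
    """Fraction of model moves not in the legal set for that turn.
    `moves` and `legal_moves` are parallel lists (toy format)."""
    bad = sum(1 for m, legal in zip(moves, legal_moves) if m not in legal)
    return bad / len(moves)

# Toy example: 1 of 4 moves is illegal at its turn -> 25% invalid-move rate.
print(invalid_move_rate(["e4", "d5", "zz9", "Nf3"],
                        [{"e4"}, {"d5"}, {"c4"}, {"Nf3"}]))  # 0.25
```

A rate near zero on such a metric indicates the model reliably stays within the move format and rules it was instructed to follow.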
China’s Alibaba just released Qwen 3 with support for MCP and 119 languages. It matches the performance of DeepSeek-R1, OpenAI o1, o3-mini, and Grok-3. Plus, AI Agents with Qwen3 can now think deeper with hybrid reasoning modes. https://x.com/Saboo_Shubham_/status/1916972515077066922
Episode 167: Overnight Agent We share the results of our first overnight agent run. We fed DeepSeek R1 a summary of the new @Cloudflare agents SDK and asked it to think every 15 minutes about the entire conversation history and reflect on new ideas that extend the ideas https://x.com/OpenAgentsInc/status/1901964880594313542
ADK FTW! Learn how to go from >prompt_to agent in this Agent Development Kit (ADK) demo. Our new #OpenSource framework simplifies the process of building agents and sophisticated multi-agent systems while maintaining precise control over agent behavior ↓ https://x.com/GoogleCloudTech/status/1912583522696520083
PyTorch: The Open Language of AI – PyTorch https://pytorch.org/blog/pytorch-the-open-language-of-ai/
JetBrains just open-sourced Mellum, its 4B-param coding AI. First announced last year, the model is being touted as a ‘focal’ product designed specifically for code completion rather than chasing multiple capabilities. Available under Apache 2.0. https://x.com/rowancheung/status/1917844373032599668
We’re launching Computer Use in smolagents! 🥳 -> As vision models become more capable, they become able to power complex agentic workflows. Especially Qwen-VL models, which support built-in grounding, i.e. the ability to locate any element in an image by its coordinates, thus to https://x.com/AymericRoucher/status/1919783847597670780
Is Qwen3-235B the new budget-friendly coding champ in Cline? Early user feedback is rolling in — it’s promising, but not perfect. Here’s what we’re hearing from the Cline community: 🧵 https://x.com/cline/status/1917708041857949983
Alibaba’s Qwen team released Qwen3 family with 2 MoE models and 6 dense models —Models range from 600M to 235B params —Flagship version rivals OpenAI o1 & DeepSeek-R1 —Hybrid “thinking” in all —Boosted coding + agent performance —119 languages supported https://x.com/adcock_brett/status/1919060402417119375
Thousands of MCP servers are now available in LlamaIndex.TS in just a single line of code! Just call mcp() to connect to any MCP server. See an example: https://x.com/llama_index/status/1915162222831231223
Just launched on Hugging Face: ACE-Step v1-3.5B — ultra-fast, open-source music generation model! 🎵 Key features: > 4 mins of music in 20s (15× faster than LLMs) > Diffusion + compressed audio + linear transformer > Wide style/genre support with structure control > Tasks: https://x.com/Tu7uruu/status/1919748788903621048
Llama-Nemotron: Efficient Reasoning Models NVIDIA introduces the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. “As of April 2025, https://x.com/iScienceLuvr/status/1919234521171693844
Introducing Mistral Medium 3: our new multimodal model offering SOTA performance at 8X lower cost. – A new class of models that balances performance, cost, and deployability. – High performance in coding and function-calling. – Full enterprise capabilities, including hybrid or https://x.com/MistralAI/status/1920119463430500541
Modular’s 25.3 release is a big step: beyond adding a ton of features, it is now also open and free to use both on CPUs and on NVIDIA. We aim to be the most open GenAI platform out there – surpassing existing tech with Apache2 GPU kernels, not just serving infra. Check it out!🚀 https://x.com/clattner_llvm/status/1919808113773027621
Cisco’s Foundation AI just dropped Foundation-Sec-8B on Hugging Face A cybersecurity-focused LLM built on Llama 3.1 that matches Llama 3.1-70B & GPT-4o-mini on certain security tasks! https://x.com/HuggingPapers/status/1919127655531557374
People expect the economic impact of a high-tariff regime to look like a downward step function, like in the image on the left (not expecting 145% tariffs on China — but let’s say 30%). In reality, it will be more like the image on the right. 1. The early response will be https://x.com/fchollet/status/1918258519624790273
Sharing new open source protection tools and advancements in AI privacy and security https://ai.meta.com/blog/ai-defenders-program-llama-protection-tools/
I’m reviewing the wild card applications today if you’re excited to join the @huggingface team! https://x.com/ClementDelangue/status/1919844548722794681
🤗 Mellum is now open source on @huggingface! It’s a focal model that is small, efficient, and made for one thing: code completion. ⚙️ Trained from scratch by JetBrains. 🌱 First in a growing family of dev-focused LLMs. 🔗 https://x.com/jetbrains/status/1917559863854457175
Nvidia dropped Llama-Nemotron on Hugging Face Efficient Reasoning Models https://x.com/_akhaliq/status/1919324939934453928
Meet Solo Tech, one of the 10 international recipients of the second Llama Impact Grants. Solo Tech uses Llama to offer offline, multilingual AI support for underserved rural communities with limited internet access. This grant will help them to equip 50 rural centers with AI https://x.com/AIatMeta/status/1917727629601616030
PyTorch Day France Featured Sessions: A Defining Moment for Open Source AI – PyTorch https://pytorch.org/blog/pt-day-france-featured-sessions/
Meta hosted its first LlamaCon developers conference and made a ton of announcements, including: —Llama API free preview —ChatGPT-like Meta AI app with “Discover” feed —Llama Guard 4 (12B), LlamaFirewall, and Prompt Guard —Collabs with Groq and Cerebras https://x.com/adcock_brett/status/1919060231771877793
👏🏻 Excited to see Qwen3-235B-A22B’s impressive performance on LiveCodeBench! This positions Qwen3 as the top open model for competitive-level code generation, matching the performance of o4-mini (low). https://x.com/huybery/status/1919418019517776024
Pretty fucking incredible week so far: > Qwen3 – MoE (235B, 30B) + Dense (32, 14, 8, 4, 0.6B) > Xiaomi – MiMo 7B dense > Kyutai – Helium 2B dense > DeepSeek – Prover V2 671B MoE > Qwen2.5 Omni 3B > Microsoft – Phi4 14B Reasoning, Mini (3.8B) & Plus > JetBrains – Mellum 4B Dense https://x.com/reach_vb/status/1917938596465750476
So, Microsoft did what OpenAI was afraid others would do, and released it under MIT for everyone? 😅 (this is in reference to Phi, Microsoft’s small model) https://x.com/_philschmid/status/1918217295928664474
Ming-Lite-Uni just dropped on Hugging Face Advancements in Unified Architecture for Natural Multimodal Interaction https://x.com/_akhaliq/status/1919677117337395359
you can easily fine-tune, quantize, and play with the SOTA vision LM InternVL3 now 🔥 we have recently merged InternVL3 into @huggingface transformers and released converted checkpoints 🤗 find the model collection and a notebook to get started in the next one ⤵️ https://x.com/mervenoyann/status/1918340027219603683
Introducing ERNIE X1 Turbo & ERNIE 4.5 Turbo! Building on the success of ERNIE X1 and 4.5, the upgraded ERNIE X1 Turbo and 4.5 Turbo deliver results faster and cheaper. Both models stand out for their multimodal capabilities, strong reasoning and low costs. For X1 Turbo, input https://x.com/Baidu_Inc/status/1915603080336597310
RADIO – a nvidia Collection https://huggingface.co/collections/nvidia/radio-669f77f1dd6b153f007dd1c6
NVIDIA just open sourced Open Code Reasoning models – 32B, 14B AND 7B – APACHE 2.0 licensed 🔥 > Beats O3 mini & O1 (low) on LiveCodeBench 😍 Backed by the OCR dataset, the models are 30% more token-efficient than other equivalent reasoning models Works with llama.cpp, vLLM, https://x.com/reach_vb/status/1920223688919486496
The community votes are in for Qwen3-235B-A22B 🥁 The latest open-source Qwen3 is now on the Arena Top 10 🏆 Congrats to @alibaba_qwen on this achievement! 👏 Highlights: 💠 For Chat: Qwen3-235B-A22B ranks #10, tied with o1 💠 Strong in Coding at #4 and Math #1 💠 For WebDev: https://x.com/lmarena_ai/status/1919448953042706759
Supabase created the ChatGPT of databases. You can build and launch databases, create charts, see visuals of your DB, generate sample data, and more. 100% open source. https://x.com/LiorOnAI/status/1919830786653741366
DeepSeek quietly released Prover-V2, an open-source AI combining informal math reasoning with theorem proving —671B params —Solves 88.9% of problems on MiniF2F —Does ‘cold-start’ to break down complex proofs into subgoals before formal verification https://x.com/rowancheung/status/1917844254648324388
Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models | TechCrunch https://techcrunch.com/2025/04/28/alibaba-unveils-qwen-3-a-family-of-hybrid-ai-reasoning-models/
@Alibaba_Qwen Do you plan to make a Qwen 3 Coder in the future with FIM capabilities similar to Qwen 2.5 Coder? https://x.com/ggerganov/status/1918373399891513571
Ace Studio dropped ACE-Step v1-3.5B, an ultra-fast, open-source music generation model It can generate 4 minutes of music in 20s (15× faster than LLMs) with support for several genres and structure control https://x.com/rowancheung/status/1920018927670685914
30% cost savings and improved operational efficiency are on the menu @HelloFresh.🍴 Leveraging @Snowflake to get a unified view of their data and real-time analytics has enabled HelloFresh to get better insight into their customer journey, and optimize supply chain operations to https://x.com/RamaswmySridhar/status/1917982790282559946
Command A, our state-of-the-art generative model, is now the highest-scoring generalist LLM on the Bird Bench leaderboard for SQL! It outperforms other systems that rely on extensive scaffolding to tackle these SQL benchmarks, and instead delivers these results out-of-the-box, https://x.com/cohere/status/1918386633772286278
🚨This week’s top AI/ML research papers: – DeepSeek-Prover-V2 – The Leaderboard Illusion – Phi-4-reasoning Technical Report – Mem0 – X-Fusion – Softpick – RL for Reasoning in LLMs with One Training Example – ReasonIR – RL for LLM Reasoning Under Memory Constraints – https://x.com/TheAITimeline/status/1919155696655843474
“How can we make sure this open-source revolution is sustainable? One way to achieve this is to go back to the 3Rs (Reduce, Reuse, Recycle), fundamental principles in environmental conservation that can also be applied to AI.” @sashamtl and Régis https://x.com/fdaudens/status/1920196231579459809
I noticed an alarming change within myself because of LLM usage: I have become lazy about reading. These days, if I ask @grok a question and its answer is long, I get annoyed and tell it to give me a TL;DR. Sometimes, I even become impatient enough to tell it to answer https://x.com/hyhieu226/status/1919068971845976113
YAYYY! MSFT released Phi 4 Reasoning & Reasoning plus on Hugging Face🔥 Architecture: > Dense decoder-only Transformer > 14B params > 32k context (extendable to 64k) Training: > SFT + RL on 16B tokens (8.3B unique) > 32 H100-80G GPUs for 2.5 days Benchmarks: > AIME 2025: https://x.com/reach_vb/status/1917852036369916081
Phi models are frustrating. I guess MSFT internal tests are also very impressive, but they lack some way to make sure it’s generally robust. https://x.com/teortaxesTex/status/1918389360439013535
Is this the Smol Models Festival? – Phi-4 just dropped a new reasoning model – Qwen2.5 Omni now has a 3B version – And OLMo-2 1B version https://x.com/fdaudens/status/1917961029675347973
Medium is the new large. | Mistral AI https://mistral.ai/news/mistral-medium-3
Qwen 3 235B now on @togethercompute API! Qwen 3 is a reasoning model that has a non-reasoning instruct mode with allowance for setting a thinking budget. It’s efficient ($0.20/M input & $0.60/M output on our throughput optimized endpoint) and fantastic on a variety of https://x.com/vipulved/status/1917777842466889873
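The "thinking budget" mentioned above caps how many reasoning tokens the model may emit before it must switch to its final answer. A toy sketch of the budgeting logic — the token stream representation and the `</think>` marker are illustrative assumptions, not Together's actual API:

```python
def apply_thinking_budget(reasoning_tokens, budget):
    """Pass through at most `budget` reasoning tokens, then force an
    end-of-thinking marker so generation moves on to the final answer.
    `reasoning_tokens` is the model's proposed reasoning stream (toy form)."""
    kept = reasoning_tokens[:budget]
    if len(reasoning_tokens) > budget:
        kept.append("</think>")  # hypothetical marker cutting reasoning short
    return kept

stream = ["step1", "step2", "step3", "step4"]
print(apply_thinking_budget(stream, 2))  # ['step1', 'step2', '</think>']
```

In a real serving stack, the equivalent logic lives in the sampler: once the budget is exhausted, the end-of-thinking token is forced so the remaining context is spent on the answer.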
So now that, apparently, “Grok” controls the X algorithm in ways that haven’t been explained, are external links still being penalized? Does anyone actually know? https://x.com/emollick/status/1920216850006241501
Does Grok actually have access to a sentiment tool or is this hallucinated? https://x.com/emollick/status/1918819008616218639
Today we’re introducing our latest open source video model—and it’s a big one. This release sets a new bar for speed, quality, and control. It’s faster than anything in its class, packed with new features, and ready to run on your own hardware. Let’s break it down 👇 1/7 https://x.com/LTXStudio/status/1919751150888239374
Lighttricks released LTXV-13B, an open-source AI for video generation —Multiscale rendering for creating smoother content, 30x faster —Can run on consumer GPUs —Includes camera motion control, keyframe editing, and multi-shot sequencing https://x.com/rowancheung/status/1920018860733775918
A ton of impactful models and datasets in open AI this past week, let’s summarize the best 🤩 links to all of them are in the next one ⤵️ 💬 @Alibaba_Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B https://x.com/mervenoyann/status/1919784802099540446
We will release the quantized models of Qwen3 to you in the following days. Today we release the AWQ and GGUFs of Qwen3-14B and Qwen3-32B, which enables using the models with limited GPU memory. Qwen3-32B-AWQ: https://x.com/Alibaba_Qwen/status/1918353505074725363
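A rough way to see why the AWQ and GGUF releases above "enable using the models with limited GPU memory": weight storage scales with parameter count times bits per weight. A back-of-the-envelope sketch, ignoring activations and KV cache:

```python
def weight_memory_gib(params_billions, bits_per_weight):
    """Approximate weight storage in GiB: params * bits / 8 bytes each."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Qwen3-32B: ~60 GiB at fp16 vs ~15 GiB at 4-bit (AWQ-style) quantization,
# which is the difference between multi-GPU and a single large consumer card.
print(round(weight_memory_gib(32, 16)))  # 60
print(round(weight_memory_gib(32, 4)))   # 15
```

Real deployments also need headroom for the KV cache and activations, so the usable context length shrinks before the weights alone fill the card.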
Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general https://x.com/Alibaba_Qwen/status/1916962087676612998
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in @huggingface transformers 🔥 D-FINE is the SOTA real-time object detector that runs on a T4 (free Colab) 🤩 Keep reading for the paper explainer, notebooks & demo 👀 https://x.com/mervenoyann/status/1919431751689998348