Locally Run: AI News Week Ending 05/09/2025

May 9, 2025

Image created with GPT Image 1. Image prompt: Wearing a bespoke trench lined with quilted newspaper clippings from the Black press, a hometown hero strides confidently through a brick alley painted with murals of Harriet Tubman and digital graffiti; a hat tipped low, shot with documentary intimacy on 35mm — celebrating the dignity of place and the power of community-sourced data.

Microsoft dropped three reasoning-focused Phi-4 models —Flagship Phi-4-reasoning (14B) beats OpenAI’s o1-mini —Smaller 3.8B param Phi-4-mini-reasoning matches 7B models on math —Suited for small devices like phones —Open-source with permissive licenses https://x.com/adcock_brett/status/1919060284078997565

How can smaller LLMs achieve strong reasoning? By combining data curation with supervised fine-tuning (SFT) and targeted reinforcement learning (RL). Microsoft released their first open reasoning/thinking models with Phi-4-reasoning distilled from OpenAI o3-mini. Implementation https://x.com/_philschmid/status/1918216082231320632

New release from Apple ML research: – Code and models for FastVLM – MLX implementation + on-device (iPhone) demo app https://x.com/awnihannun/status/1919986192449200511

The Phi-4-reasoning tech report is a real tour de force in both rigour and pragmatism. The main lessons for me are: > Most gains come from careful SFT, with RL the 🍒 on top > Filter the data for the most “teachable” prompts, i.e. not too easy for the model you want to tune. https://x.com/_lewtun/status/1917947747195298086
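
The "teachable prompts" idea above can be illustrated with a small filter. This is only a minimal sketch, not the procedure from the tech report: it assumes you have already sampled several answers from the model you want to tune and graded each one as correct or not. The names `pass_rate`, `filter_teachable`, `max_pass_rate`, and the example prompt labels are all hypothetical.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of graded attempts that were correct (0.0 if there are none)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def filter_teachable(
    graded: Dict[str, List[bool]], max_pass_rate: float = 0.5
) -> List[str]:
    """Keep prompts the model does not already solve too easily.

    `graded` maps each prompt to the pass/fail results of several sampled
    attempts by the model to be tuned. The 0.5 threshold is an assumption
    made for illustration, not a value taken from the report.
    """
    return [
        prompt
        for prompt, outcomes in graded.items()
        if pass_rate(outcomes) <= max_pass_rate
    ]


# Hypothetical graded attempts: four samples per prompt.
graded_attempts = {
    "easy-arithmetic": [True, True, True, True],          # pass rate 1.0
    "medium-number-theory": [True, False, False, True],   # pass rate 0.5
    "hard-proof": [False, False, False, False],           # pass rate 0.0
}

teachable = filter_teachable(graded_attempts)
print(teachable)
```

With these made-up numbers the always-solved prompt is dropped and the other two are kept. A real pipeline would likely also cap difficulty from the other side or check answer quality, which this sketch leaves out.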

Microsoft releases Phi-4-reasoning – 14B param SFT of Phi-4 on demonstrations from o3-mini – Phi-4-reasoning-plus is RL-trained – outperforms DeepSeek-R1-Distill-Llama-70B model, approaches the performance levels of full DeepSeek-R1 model https://x.com/iScienceLuvr/status/1917742817914544355

We’ve been cooking… a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast. https://x.com/DimitrisPapail/status/1917731614899028190

🤗 Mellum is now open source on @huggingface! It’s a focal model that is small, efficient, and made for one thing: code completion. ⚙️ Trained from scratch by JetBrains. 🌱 First in a growing family of dev-focused LLMs. 🔗 https://x.com/jetbrains/status/1917559863854457175

NEW: Xiaomi releases MiMo-7B, a new language model for reasoning tasks. MiMo-7B is explicitly designed for advanced reasoning across math and code. Don’t think I’ve seen too many good, SMALL reasoning models. Here is my quick recap: https://x.com/omarsar0/status/1917582720341008814

Xiaomi released MiMo-7B, a new language model optimized for reasoning tasks —Beats OpenAI’s o1-mini and Alibaba’s QwQ-32B-preview on multiple benchmarks —Trained on 25T tokens using multi-token prediction —Fully open-source under MIT License https://x.com/rowancheung/status/1917844350463098893

👏🏻 Excited to see Qwen3-235B-A22B’s impressive performance on LiveCodeBench! This positions Qwen3 as the top open model for competitive-level code generation, matching the performance of o4-mini (low). https://x.com/huybery/status/1919418019517776024

Pretty fucking incredible week so far: > Qwen3 – MoE (235B, 30B) + Dense (32, 14, 8, 4, 0.6B) > Xiaomi – MiMo 7B dense > Kyutai – Helium 2B dense > DeepSeek – Prover V2 671B MoE > Qwen2.5 Omni 3B > Microsoft – Phi4 14B Reasoning, Mini (3.8B) & Plus > JetBrains – Mellum 4B Dense https://x.com/reach_vb/status/1917938596465750476

So, Microsoft did what OpenAI was afraid of others to do and they released it under MIT for everyone? 😅 (this is in reference to Phi, Microsoft’s small model) https://x.com/_philschmid/status/1918217295928664474

YAYYY! MSFT released Phi 4 Reasoning & Reasoning plus on Hugging Face🔥 Architecture: > Dense decoder-only Transformer > 14B params > 32k context (extendable to 64k) Training: > SFT + RL on 16B tokens (8.3B unique) > 32 H100-80G GPUs for 2.5 days Benchmarks: > AIME 2025: https://x.com/reach_vb/status/1917852036369916081
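
For a sense of scale, the training figures quoted above can be turned into a rough compute budget. This is back-of-the-envelope arithmetic on the numbers in the post only. Reading the ratio of total to unique tokens as an average number of passes over the data is my assumption, not something the post states.

```python
# Figures as quoted in the post above.
num_gpus = 32            # H100-80G GPUs
training_days = 2.5
total_tokens = 16e9      # tokens seen during training
unique_tokens = 8.3e9    # unique tokens in the training data

# GPU-hours = GPUs x days x 24 hours per day.
gpu_hours = num_gpus * training_days * 24

# Assumption: total / unique approximates how many times, on average,
# each unique token was seen during training.
avg_passes = total_tokens / unique_tokens

print(f"GPU-hours: {gpu_hours:.0f}")
print(f"Average passes over unique tokens: {avg_passes:.2f}")
```

That works out to 1,920 GPU-hours and a bit under two passes over the unique tokens.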

Phi models are frustrating. I guess MSFT internal tests are also very impressive, but they lack some way to make sure it’s generally robust. https://x.com/teortaxesTex/status/1918389360439013535

Is this the Smol Models Festival? – Phi-4 just dropped a new reasoning model – Qwen2.5 Omni now has a 3B version – And OLMo-2 1B version https://x.com/fdaudens/status/1917961029675347973

Lightricks released LTXV-13B, an open-source AI model for video generation – Multiscale rendering for creating smoother content, 30x faster – Can run on consumer GPUs – Includes camera motion control, keyframe editing, and multi-shot sequencing https://x.com/rowancheung/status/1920018860733775918