Ethan B. Holland

Over 55,600 manually organized AI links and counting

Locally Run: AI News Week Ending 09/19/2025

September 19, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A sunlit small-town courthouse interior with classical wooden benches and tall windows, a ceremonial gavel resting beside a glowing tablet on an oak table in the foreground, warm golden light suggesting both tradition and quiet technological transformation, painted in a dignified realist style reminiscent of Norman Rockwell civic scenes.

🔥 Genspark AI Browser now available on Windows and Mac! 💻 On-Device Free AI – The world’s first browser letting you choose from 169 AI models to run entirely on-device. No internet required, completely private, lightning fast, and totally free! Beyond traditional browsers – https://x.com/genspark_ai/status/1966109976944062893

Happy to land this data-efficient model! Our team is dedicated to building cutting-edge, efficient reasoning models. We are excited to release MobileLLM-R1, a series of sub-billion parameter reasoning models. Collaborating w/ @zechunliu, Changsheng Zhao et al. https://x.com/erniecyc/status/1966511167053910509

We have released small-scale reasoning models MobileLLM-R1 (0.14B, 0.35B, 0.95B) that are trained from scratch with just 4.2T pre-training tokens (10% of Qwen3), while its reasoning performance is on-par with Qwen3-0.6B. Thanks the three core contributors for their great work! https://x.com/tydsh/status/1967476530826854674

Meta MobileLLM-R1-140M, which can run 100% locally in your browser, no server inference required. Vibe coded a chat app powered by transformers.js in anycoder. https://x.com/_akhaliq/status/1967460621802438731

Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training, https://x.com/zechunliu/status/1966560134739751083

Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the https://x.com/_akhaliq/status/1966498058822103330

Finetune DeepSeek 🐳 with two Mac Studios + MLX 🚀 We use pipeline parallelism to split the full 671GB model across two devices connected by a single TB5 cable. LoRA reduces the number of parameters to train from 671 billion down to 37 million, reducing the memory overhead from https://x.com/MattBeton/status/1968739407260742069
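
The parameter reduction quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 671 billion and 37 million figures come from the post, while the 4096 x 4096 layer and rank 8 are hypothetical values chosen to show how LoRA's adapter size is counted. They are not the configuration used in the linked experiment.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA freezes the original weight W and learns a low-rank update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Figures quoted in the post.
TOTAL_PARAMS = 671_000_000_000
TRAINABLE_PARAMS = 37_000_000

trainable_fraction = TRAINABLE_PARAMS / TOTAL_PARAMS
print(f"Trainable share of all parameters: {trainable_fraction:.4%}")

# Hypothetical single layer (assumption, for illustration only).
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, rank=8)
print(f"Full layer: {full:,} params, rank-8 adapter: {adapter:,} params")
print(f"Reduction for this layer: {full // adapter}x")
```

Because only the adapter weights receive gradients and optimizer state, the training-time memory for those quantities scales with the small number rather than the large one, which is the effect the post describes.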

One of the quietest but wildest shifts in AI is how small models are becoming absolute beast https://x.com/Thom_Wolf/status/1966889089162244463

Our lightweight open-source eval library "lighteval" now ships with 7,000+ (!!) benchmarks baked in. Running it locally is literally a one-liner: >> lighteval vllm "model_name=gpt2" "leaderboard|truthfulqa:mc|0" (there is also a Python API for in/post-training evals ofc) https://x.com/Thom_Wolf/status/1967926861889163304

Beautiful open smol MoE for vision language tasks. Weights are here if you want to try: https://x.com/eliebakouch/status/1968809452640825650

First test of MLX batch generation PR on Mac Studio M3 Ultra 512GB with Qwen3-1.7B (4K ctx, 64 tokens) 🔥 Batch generation = WOW. bf16 vs 4bit (avg of 3 runs): batch of 1 → 127 vs 237 t/s; 5 → 365 vs 515 t/s; 10 → 556 vs 625 t/s; 15 → 672 vs 617 t/s. MLX vllm not a dream anymore! https://x.com/ivanfioravanti/status/1966903782400545196
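
To make the batch numbers above easier to compare, here is a small sketch that computes how aggregate throughput scales with batch size and what each individual sequence gets. The throughput values are copied from the post; the helper names `scaling` and `per_sequence` are made up for this illustration.

```python
# Aggregate tokens per second quoted in the post, keyed by batch size.
throughput = {
    "bf16": {1: 127, 5: 365, 10: 556, 15: 672},
    "4bit": {1: 237, 5: 515, 10: 625, 15: 617},
}


def scaling(results: dict[int, int]) -> dict[int, float]:
    """Aggregate throughput relative to the batch-of-1 baseline."""
    base = results[1]
    return {batch: round(tps / base, 2) for batch, tps in results.items()}


def per_sequence(results: dict[int, int]) -> dict[int, float]:
    """Tokens per second each sequence in the batch receives."""
    return {batch: round(tps / batch, 1) for batch, tps in results.items()}


for precision, results in throughput.items():
    print(precision, "scaling vs batch 1:", scaling(results))
    print(precision, "per-sequence t/s:  ", per_sequence(results))
```

On these figures, bf16 keeps gaining total throughput up to a batch of 15, while 4bit levels off between batches of 10 and 15. In both cases each individual sequence gets slower as the batch grows.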

LM Studio now supports Qwen3-Next with MLX on Mac! 🧵 https://x.com/lmstudio/status/1967985102845366280

Woah, 66 tok/s on a Macbook M4 Max 64GB with qwen3-next-80b-a3b-instruct-mlx@4bit, which uses about 41GB. Amazing job to the folks working on MLX, aware of at least these guys: @ivanfioravanti @ActuallyIsaak @awnihannun https://x.com/rwojo/status/1967767157250592899
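
As a rough cross-check of the "about 41GB" figure, the sketch below estimates weight storage from parameter count and bits per weight. It assumes a nominal 80 billion parameters (taken from the model name) and ignores quantization scales, the KV cache, and activations, so it is a lower-bound estimate rather than a measurement. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 80e9  # assumption: the "80B" in the model name

for bits in (4, 8, 16):
    gb = weight_memory_gb(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

The 4-bit estimate is in the same ballpark as the usage reported in the post, which is why this model fits on a 64GB machine at 4-bit. By the same estimate, the 8-bit and bf16 variants would not fit.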

Check out the actual speed (not yet the final version) of Qwen3-Next-80B-A3B-Instruct on Apple MLX! 🔥 4-bit: 67 TPS; 8-bit: 58 TPS; bf16: 48 TPS. Movie at normal speed, only waiting times removed. @awnihannun and @ActuallyIsaak did it and I bet there is still room for improvement 💪 https://x.com/ivanfioravanti/status/1966866942461177925