Image created with gemini-3.1-flash-image-preview, with the prompt written by claude-sonnet-4-5. Image prompt: A desktop computer tower falling joyfully through bright blue sky, happy face on front display, colorful cables streaming behind like ribbons, word LOCAL in bold clean typography in clouds, aerial photography style, high altitude, vibrant daylight, simple clean composition with vast sky and distant ground far below.

Alibaba’s small, open source Qwen3.5-9B beats OpenAI’s gpt-oss-120B and can run on standard laptops | VentureBeat https://venturebeat.com/technology/alibabas-small-open-source-qwen3-5-9b-beats-openais-gpt-oss-120b-and-can-run

“A 24-billion-parameter model just ran on a laptop and picked the right tool in under half a second. The real story is that tool-calling agents finally became fast enough to feel like software. Liquid built LFM2-24B-A2B using a hybrid architecture that mixes convolution blocks…” https://x.com/LiorOnAI/status/2029623603294310819

“> 385ms average tool selection. > 67 tools across 13 MCP servers. > 14.5GB memory footprint. > Zero network calls. LocalCowork is an AI agent that runs on a MacBook. Open source. 🧵” https://x.com/liquidai/status/2029586519389086198

“Apple Silicon just leveled up for local LLM dev. @vllm_project is now supported in Docker Model Runner on macOS, so you can run MLX models on an M-series Mac with your existing OpenAI-compatible API and Docker workflow. Update to Docker Desktop 4.62+ and get started. Read more:” https://x.com/Docker/status/2028470592899354929
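
If you already talk to OpenAI-compatible servers from code, a local Docker Model Runner endpoint slots into the same client. Here is a minimal sketch, assuming a local endpoint at http://localhost:12434/engines/v1 and a hypothetical model tag; check Docker Desktop's Model Runner settings for the actual host port and model name.

```python
# Minimal sketch: point the standard OpenAI client at a local
# OpenAI-compatible endpoint exposed by Docker Model Runner.
# The base_url and model name below are assumptions -- check your
# Docker Desktop settings for the actual host port and model tag.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint
    api_key="not-needed",  # local server, no real key required
)

response = client.chat.completions.create(
    model="ai/qwen3.5",  # hypothetical model tag pulled via Docker Model Runner
    messages=[{"role": "user", "content": "Summarize why local inference matters."}],
)
print(response.choices[0].message.content)
```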

“Qwen3.5 2B running locally on an iPhone 17 Pro is the breakthrough that was needed for local models running on the edge.” https://x.com/kimmonismus/status/2028602520302399701

“The new Qwen 3.5 by @Alibaba_Qwen running on-device on iPhone 17 Pro. Qwen 3.5 beats models 4 times its size, has strong visual understanding, and can toggle reasoning on or off. The 2B 6-bit model here is running with MLX optimized for Apple Silicon.” https://x.com/adrgrondin/status/2028568689709084919
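
For reference, running a small MLX-converted checkpoint on a Mac with mlx-lm takes only a few lines. A minimal sketch, assuming the mlx-lm package is installed and using a hypothetical repo id for a 6-bit Qwen conversion; substitute whatever MLX checkpoint you actually have locally.

```python
# Minimal sketch of running a small quantized model with mlx-lm on
# Apple Silicon. The checkpoint name is an assumption -- swap in the
# MLX-converted Qwen 3.5 repo you actually want to run.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-2B-6bit")  # assumed repo id
prompt = "Explain what on-device inference means in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=100, verbose=True)
print(text)
```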

“Here’s some code for an experiment that doesn’t work so well: https://t.co/GcQ3gQ6uML
– Basically you chat with a model running locally
– Every now and then you /sleep the model to transition short-term memory to long-term memory
– The /sleep command runs the same model to…” https://x.com/awnihannun/status/2029693579006988531
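
The linked code isn't reproduced here, but the idea as described maps to a short loop: keep recent turns in a short-term buffer, and on /sleep have the same model fold them into a persistent summary. A minimal sketch of that idea, assuming a local OpenAI-compatible server; the base_url, model id, and prompts are placeholders, not the repo's actual implementation.

```python
# Sketch of the /sleep idea described above, not the code in the linked
# repo: chat with a local model, and on /sleep use the same model to
# consolidate recent turns into a persistent long-term summary.
# base_url and MODEL are assumptions for a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
MODEL = "local-model"  # hypothetical model id served locally

long_term_memory = ""  # persistent summary carried across /sleep cycles
short_term = []        # recent chat turns since the last /sleep


def complete(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content


def sleep():
    """Run the same model over recent turns to update long-term memory."""
    global long_term_memory, short_term
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in short_term)
    long_term_memory = complete([
        {"role": "system", "content": "Merge the prior summary and the new "
                                      "transcript into a concise long-term memory."},
        {"role": "user", "content": f"Prior summary:\n{long_term_memory}\n\n"
                                    f"New transcript:\n{transcript}"},
    ])
    short_term = []  # clear short-term memory after consolidation


while True:
    user = input("> ")
    if user == "/sleep":
        sleep()
        continue
    short_term.append({"role": "user", "content": user})
    reply = complete(
        [{"role": "system", "content": f"Long-term memory:\n{long_term_memory}"}]
        + short_term
    )
    short_term.append({"role": "assistant", "content": reply})
    print(reply)
```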
