Image created with gemini-2.5-flash-image, with the prompt drafted by claude-sonnet-4-5. Image prompt: Photorealistic architectural photograph of the six iconic Ionic limestone columns on the University of Missouri quad, topped with a classical stone entablature with the word LOCAL carved in centered Roman serif letters, late afternoon golden hour light, warm beige limestone texture, red brick buildings and green lawn in background, wide landscape composition, crisp realism with long soft shadows, clear blue sky.
EdgeTAM, a real-time segmentation tracker from Meta, is now in @huggingface transformers under the Apache-2.0 license 🔥 It runs over 22x faster than SAM2, hitting 16 FPS on an iPhone 15 Pro Max with no quantization, and it supports single, multiple, and refined point prompts as well as bounding-box prompts. https://x.com/mervenoyann/status/1986785795424788812
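A minimal sketch of point-prompted inference through the new transformers integration. The checkpoint id and the Auto classes are assumptions on my part, as is the SAM-style pre/post-processing; check the EdgeTAM model card on the Hub for the canonical calls.

```python
# Hedged sketch: single-point prompting with EdgeTAM via transformers.
# The checkpoint id and the processing calls follow the SAM-family
# convention and are assumptions -- see the model card for real usage.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

checkpoint = "facebook/EdgeTAM"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

image = Image.open("frame.jpg")
# One positive click (x, y) on the object to segment.
inputs = processor(images=image, input_points=[[[450, 600]]], return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Resize predicted masks back to the original image resolution.
masks = processor.post_process_masks(
    outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
)
```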
We wanted to share more information about Gemma in AI Studio: First, to clarify the distinction between our AI products. Our Gemma models are a family of open models built specifically for the developer and research community. They are not meant for factual assistance or for… https://x.com/NewsFromGoogle/status/1984412221531885853
For an AI assistant to be truly personal, it’s important to keep as much data as possible locally on the user’s device, and not on Perplexity servers. Account credentials, such as passwords and credit card information, are also stored locally on the user’s device. https://x.com/perplexity_ai/status/1985376891763925064
Big update for Claude Desktop and Cursor users! Now you can connect all your AI apps via a common memory layer in a minute. I used the Graphiti MCP server, which runs 100% locally, to share context across AI apps like Claude Desktop and Cursor without losing it. (setup below) https://x.com/_avichawla/status/1985958015452020788
Your Mac is about to run inference like a datacenter. Coming soon to MLX-Swift: continuous batching, the fastest way to handle multiple inference streams locally. It starts with regular inference and seamlessly upgrades to batched mode when new requests arrive. The best of both worlds. https://x.com/ronaldmannak/status/1985693207003275729
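To make the idea concrete, here is a toy continuous-batching loop in Python. This is a conceptual sketch of the scheduling pattern, not the MLX-Swift API: new requests join the active batch between decode steps, and finished sequences leave immediately.

```python
# Conceptual sketch of continuous batching: a decode loop that admits
# new requests between steps instead of waiting for the current batch
# to drain. Not the MLX-Swift API -- just the scheduling idea.
from collections import deque

def decode_step(seq: dict) -> bool:
    """Placeholder for decoding one token; returns True when the sequence is done."""
    seq["generated"] += 1
    return seq["generated"] >= seq["max_tokens"]

def serve(incoming: deque, max_batch: int = 8) -> None:
    active: list[dict] = []
    while incoming or active:
        # Admit waiting requests up to the batch limit -- this is the
        # "seamless upgrade" from single-stream to batched decoding.
        while incoming and len(active) < max_batch:
            active.append(incoming.popleft())
        # One decode step across the whole batch; finished sequences
        # exit immediately, freeing slots for new arrivals.
        active = [s for s in active if not decode_step(s)]

requests = deque({"generated": 0, "max_tokens": n} for n in (3, 5, 2))
serve(requests)
```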
When you run AI on your device, it is more efficient, less Big Brother, and free! So it’s very cool to see the new llama.cpp UI, a ChatGPT-like app that runs fully on your laptop without needing Wi-Fi or sending any data to an external API. It supports 150,000+ GGUF models… https://x.com/ClementDelangue/status/1985748187634717026
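For readers who want the same fully local GGUF inference from a script rather than the UI, here is a sketch using llama-cpp-python, the Python bindings around the llama.cpp engine. The model filename is an assumption; any local GGUF file works.

```python
# Hedged sketch: fully local GGUF inference with llama-cpp-python.
# The model filename below is an example -- point it at any GGUF
# checkpoint you have downloaded. No network access is needed.
from llama_cpp import Llama

llm = Llama(model_path="models/qwen2.5-3b-instruct-q4_k_m.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize RoPE in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```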
Wow, excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving for everyone 🥰 https://x.com/vllm_project/status/1985241134663405956
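Serving a model locally with vLLM's offline API takes only a few lines; the model id below is just an example, not what PewDiePie ran.

```python
# Minimal local generation with vLLM's offline (non-server) API.
# Any Hugging Face causal LM that fits in memory works here.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # example model id
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Why run models locally?"], params)
print(outputs[0].outputs[0].text)
```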
How much RAM do you need to run tiny models? Jamba Reasoning 3B runs in just 2.25 GiB, the lightest among small models like Qwen (@Alibaba_Cloud), Llama (@Meta), Granite (@IBM), and Gemma (@GoogleDeepMind). 👉 Try Jamba Reasoning 3B yourself: https://x.com/AI21Labs/status/1986439953539076169
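As a rough sanity check on that figure: if the weights are held at roughly 4 bits per parameter, 3B parameters is about 3e9 × 0.5 bytes ≈ 1.4 GiB, leaving under a gibibyte for runtime buffers and KV cache. That is my back-of-envelope estimate, not AI21's published breakdown. Below is a sketch of loading a small model in 4-bit with transformers and bitsandbytes; the Jamba checkpoint id is an assumption, so check AI21's Hub page.

```python
# Hedged sketch: loading a ~3B model in 4-bit to approach the ~2 GiB
# range. The checkpoint id is an assumption -- verify on the Hub.
# Requires the bitsandbytes package and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "ai21labs/AI21-Jamba-Reasoning-3B"  # assumed checkpoint id
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quant, device_map="auto"
)
```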
Visualizations from the Smol Playbook pretraining section, explaining MLA, RoPE, chunked attention, multi-step schedulers and more. A thread: https://x.com/LoubnaBenAllal1/status/1986110843600117760
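As a companion to those visualizations, here is a minimal RoPE sketch (the split-half variant used in LLaMA-style implementations): each channel pair is rotated by an angle that grows with position and shrinks with channel index.

```python
# Minimal rotary position embedding (RoPE), split-half variant.
# Each (x1, x2) channel pair is rotated by angle pos * base^(-i/half),
# so nearby positions get nearby rotations and dot products encode
# relative position.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (seq_len, dim) with dim even; returns rotated embeddings."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(16, 64)
print(rope(q).shape)  # torch.Size([16, 64])
```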
This is the letter that caused Gemma to be pulled from AI Studio.
Thread by @AndrewCurran_ on Thread Reader App: https://threadreaderapp.com/thread/1984995011482755085.html
I’m joining some kinda fruit company to work on MLX full-time! It is open source + local AI + full stack ML, and most importantly has a great team that I learn a lot from. https://x.com/zcbenz/status/1985560798543167739
Most industrial robots can move with precision; very few can adapt when the job changes. @mbodiai, a New York startup, is building an embodied-AI layer that lets anyone teach robots new tasks just by talking to them. Instead of long reprogramming cycles, operators use… https://x.com/IlirAliu_/status/1986143123333140488
The robot’s walking controller is powered entirely by a neural network running on-device via an embedded GPU. https://x.com/adcock_brett/status/1984623436120178709
MotionStream, a real-time video generation model with interactive motion controls, runs on a single NVIDIA H100 GPU (29 FPS, 0.4 s latency). https://x.com/_akhaliq/status/1986054085766750630