Locally Run: AI News Week Ending 07/18/2025

MedGemma is a really interesting model: very small, multimodal, open, and it does quite well on out-of-distribution medical tasks compared to much larger models. Would love to see more work on how to improve and deploy this sort of LLM to support medical professionals https://x.com/emollick/status/1943142004537393456

We cracked the code to develop apps powered by local LLMs. We created LEAP based on our experience making end-to-end apps for customers. It’s model-agnostic, developer-friendly, and available for iOS and Android today. Let’s build the future of edge AI 🚀 https://x.com/maximelabonne/status/1945110321938514335

“Compiled a Rust-based ColBERT model into WebAssembly (WASM) using pylate-rs”: interaction so late it happens at the client side! https://x.com/lateinteraction/status/1944941744782512389
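The joke rests on ColBERT’s “late interaction”: queries and documents are encoded into per-token embeddings independently, and relevance is only computed at the very end by a MaxSim sum (for each query token, the best similarity against any document token). The sketch below shows just that scoring step in plain Python, with toy 2-d vectors standing in for real model embeddings. It does not use pylate-rs or its API; the function names `maxsim` and `rank` are hypothetical.

```python
import math


def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token embedding, take the
    highest similarity against any document token embedding, then sum."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


def rank(query_vecs, docs):
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy per-token embeddings (hypothetical, not real model output).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "b": [[1.0, 0.0], [1.0, 0.1]],  # matches only the first query token well
}
ranking = rank(query, docs)
```

Because document embeddings can be computed ahead of time, only this cheap comparison has to happen at query time; with the encoder itself compiled to WASM, as in the tweet, the query encoding can run in the browser as well.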

Small chat models often skip step-by-step thinking, so their answers feel shallow. Cache steering addresses this by adding a pre-computed tweak to the model’s key-value (KV) cache before it starts generating. Transformers store the keys and values of past tokens so that new queries can attend back to them quickly. The tweak… https://x.com/rohanpaul_ai/status/1944741419794522115
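The mechanism can be illustrated with a toy, single-head attention step in plain Python. This is a sketch under stated assumptions, not the method from the linked thread: it assumes the steering offsets are a mean difference between cached vectors from contrasting examples (e.g. with and without step-by-step reasoning) and that they are added once, to the last cached position only. Real models would do this per layer and per head, and the names `mean_difference` and `steer_cache` are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Single-head scaled dot-product attention over cached keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def mean_difference(pos, neg):
    """Assumed recipe: steering vector = mean of (positive - negative) cached vectors."""
    n = len(pos)
    return [sum(p[i] - q[i] for p, q in zip(pos, neg)) / n for i in range(len(pos[0]))]


def steer_cache(keys, values, key_delta, value_delta, strength=1.0):
    """Return a steered copy of the cache; assumption: only the last position is shifted."""
    keys = [list(k) for k in keys]
    values = [list(v) for v in values]
    keys[-1] = [k + strength * d for k, d in zip(keys[-1], key_delta)]
    values[-1] = [v + strength * d for v, d in zip(values[-1], value_delta)]
    return keys, values


# Toy cache for a two-token prompt (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

# Hypothetical contrastive caches used to derive the offsets.
dk = mean_difference([[0.0, 2.0]], [[0.0, 0.0]])
dv = mean_difference([[0.0, 3.0]], [[0.0, 1.0]])

steered_k, steered_v = steer_cache(keys, values, dk, dv)
before = attend(query, keys, values)
after = attend(query, steered_k, steered_v)
```

The appeal of this design is that the intervention happens once, on the stored cache, rather than at every generated token: every later query that attends back over the prompt sees the shifted keys and values, while the model weights stay untouched.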