Image created with Flux Pro v1.1 Ultra. Image prompt: Apple, apple-shaped mosaic assembled entirely from small bananas, subtle leaf accent, photorealistic, editorial, minimal, high detail, 3:2 landscape
Apple released FastVLM, so I tried vibe coding a video captioning AI app with it. It took 5 prompts to get a working app in anycoder, which I deployed on Hugging Face. FastVLM is 85x faster and 3.4x smaller than comparably sized VLMs, and the deployed app works 100% locally in your browser, powered by Transformers.js and WebGPU. https://x.com/_akhaliq/status/1962018549674684890
Apple open-sourcing artefacts on HF is a special kind of joy! https://x.com/reach_vb/status/1961481909181075961
🚨 Apple just released FastVLM on Hugging Face – 0.5B, 1.5B and 7B real-time VLMs with WebGPU support 🤯 > 85x faster and 3.4x smaller than comparably sized VLMs > 7.9x faster TTFT for larger models > vision encoder designed to output fewer tokens and reduce encoding time for high-resolution images https://x.com/reach_vb/status/1961471154197053769
And FastVLM was released by Apple today! 🚀 All about on-device use. Model sizes: 0.5B, 1.5B, 7B. Available in MLX and Core ML. Vision encoder designed to output fewer tokens and reduce encoding time, which means much faster time-to-first-token. https://x.com/pcuenq/status/1961464859465269757
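Since the MLX checkpoints are called out above, here is a minimal sketch of captioning a single frame with FastVLM via the mlx-vlm package on Apple Silicon. The repo id "apple/FastVLM-0.5B" and the exact load/apply_chat_template/generate signatures are assumptions patterned on mlx-vlm's usual helpers, not something confirmed in the posts above.

```python
# Hedged sketch: one-frame captioning with FastVLM through mlx-vlm on Apple Silicon.
# Assumptions (not confirmed by the posts above): the checkpoint id
# "apple/FastVLM-0.5B" and these mlx-vlm helper signatures.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("apple/FastVLM-0.5B")  # assumed HF repo id

# Wrap the question in the model's chat format, declaring one attached image
prompt = apply_chat_template(
    processor, model.config, "Describe this frame in one sentence.", num_images=1
)
caption = generate(model, processor, prompt, ["frame.jpg"], max_tokens=64)
print(caption)
```

For video you would sample frames and loop this call; the low time-to-first-token is what makes per-frame captioning feel live.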
Holy crap! That is some fast video captioning — all happening locally in your browser 🤯 This is the aptly named FastVLM by Apple; available on HF: https://x.com/bilawalsidhu/status/1962545148136444380
NEW: Apple releases FastVLM and MobileCLIP2 on Hugging Face! 🤗 The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications! 🤯 It can even do live video captioning 100% locally in your browser (zero install). Huge for accessibility! https://x.com/xenovacom/status/1961454543503344036
If you think Apple is not doing much in AI, you’re getting blindsided by the chatbot hype and not paying enough attention! They just released FastVLM and MobileCLIP2 on Hugging Face. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision language model (VLM) applications! It can even do live video captioning 100% locally in your browser 🤯🤯🤯 https://x.com/ClementDelangue/status/1962526559115358645
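MobileCLIP2 is the retrieval half of that release. The posts above don't show usage, but Apple's ml-mobileclip repo documents create_model_and_transforms and get_tokenizer helpers for the v1 models; assuming a MobileCLIP2 checkpoint loads the same way (the model name and weights path below are placeholders), zero-shot image/text matching looks roughly like this:

```python
# Hedged sketch: zero-shot image/text matching in the MobileCLIP style.
# Facts: create_model_and_transforms and get_tokenizer are the documented
# helpers in Apple's ml-mobileclip repo for the v1 models.
# Assumptions: the MobileCLIP2 checkpoints load the same way; the model name
# and weights path here are placeholders.
import torch
from PIL import Image
import mobileclip

model, _, preprocess = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")

image = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)
text = tokenizer(["an apple", "a banana", "a phone"])

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    # Normalize, then take cosine similarity as softmax logits
    img_f /= img_f.norm(dim=-1, keepdim=True)
    txt_f /= txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)

print(probs)  # relative match scores for the three captions
```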
pip install -U mlx https://x.com/awnihannun/status/1961484829037330612
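That one-liner is just the upgrade command. As a quick smoke test that the new build works, here is a snippet using MLX's documented basics (lazy evaluation, mx.eval to materialize on the default GPU device):

```python
# Quick post-upgrade smoke test for MLX on Apple Silicon.
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b      # lazy: this only builds the compute graph
mx.eval(c)     # forces evaluation on the default device
print(mx.default_device(), c.shape, c.dtype)
```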
Apple’s rumored AI search tool for Siri could rely on Google | The Verge https://www.theverge.com/news/770712/apple-ai-search-tool-siri-google-gemini
ollama-style CLI for running MLX models on Apple Silicon https://x.com/tom_doerr/status/1961309536406392877
The most powerful thing about using the Vision Pro for social VR is that you know the person on the other end is *exactly* who they say they are and look the way they do, because Apple uses an iris scan (Optic ID) to authenticate and ensure only you can create & drive your 3D avatar. https://x.com/bilawalsidhu/status/1962198920085594568
Introducing ChromaSwift – in beta! Build search and retrieval into your iOS apps – Includes on-device persistence – Packaged with on-device MLX embedding inference https://x.com/trychroma/status/1962917927382122857
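ChromaSwift's Swift API isn't shown in the post, but to give a feel for the persist-embed-query pattern it brings on-device, here is the equivalent flow in Chroma's existing Python client (the store path, collection name, and documents are made up for illustration):

```python
# The store-and-search pattern ChromaSwift advertises, sketched with Chroma's
# Python client. Path, collection name, and documents are illustrative only.
import chromadb

client = chromadb.PersistentClient(path="./chroma_store")  # on-disk persistence
col = client.get_or_create_collection("notes")

# Documents are embedded with the collection's default embedding function
col.add(
    ids=["n1", "n2"],
    documents=["FastVLM does real-time captioning", "MLX runs on Apple Silicon"],
)

# Nearest-neighbor search over the stored embeddings
hits = col.query(query_texts=["on-device vision models"], n_results=1)
print(hits["documents"])
```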
Apple loses 4 top AI researchers as key robotics lead heads to Meta, others join OpenAI and Anthropic – India Today https://www.indiatoday.in/technology/news/story/apple-loses-4-top-ai-researchers-as-key-robotics-lead-heads-to-meta-others-join-openai-and-anthropic-2781133-2025-09-03
GPT-4o level intelligence running on your phone! MiniCPM-V 4.5 delivers enterprise-grade AI performance in just 8B parameters, outperforming models like GPT-4o and Gemini-2.0 Pro on vision and language tasks. – 30+ language support – Runs smoothly on iPhone/iPad – 100% open-source! https://x.com/akshay_pachaar/status/1962132670126981459
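The MiniCPM-V post claims on-device performance but shows no code. A hedged sketch with Hugging Face transformers follows; the repo id "openbmb/MiniCPM-V-4_5" and the .chat() interface are assumptions carried over from earlier MiniCPM-V releases rather than anything confirmed above.

```python
# Hedged sketch: single-image chat with MiniCPM-V via transformers.
# Assumptions: the repo id "openbmb/MiniCPM-V-4_5" and the .chat() interface,
# both patterned on earlier MiniCPM-V releases (trust_remote_code required).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

repo = "openbmb/MiniCPM-V-4_5"  # assumed repo id
model = AutoModel.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
# Images are passed inline in the message content alongside the question
msgs = [{"role": "user", "content": [image, "Describe this image briefly."]}]
print(model.chat(msgs=msgs, tokenizer=tokenizer))
```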




