Image created with Flux Pro v1.1 Ultra. Image prompt: HuggingFace, smiling face icon composed from curved small bananas forming eyes and mouth, friendly arrangement, photorealistic, editorial, minimal, high detail, 3:2 landscape
Apple open sourcing artefacts on HF is a special kind of joy! https://x.com/reach_vb/status/1961481909181075961
🚨 Apple just released FastVLM on Hugging Face – 0.5B, 1.5B and 7B real-time VLMs with WebGPU support 🤯 > 85x faster and 3.4x smaller than comparably sized VLMs > 7.9x faster TTFT for larger models > designed to output fewer output tokens and reduce encoding time for high-resolution images https://x.com/reach_vb/status/1961471154197053769
And FastVLM was released by Apple today! 🚀 All about on-device use. Model sizes: 0.5B, 1.5B, 7B. Available in MLX and Core ML. Vision encoder designed to output fewer tokens and reduce encoding time, which means much faster time-to-first-token. https://x.com/pcuenq/status/1961464859465269757
Holy crap! That is some fast video captioning — all happening locally in your browser 🤯 This is the aptly named FastVLM by Apple; available on HF: https://x.com/bilawalsidhu/status/1962545148136444380
NEW: Apple releases FastVLM and MobileCLIP2 on Hugging Face! 🤗 The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications! 🤯 It can even do live video captioning 100% locally in your browser (zero install). Huge for accessibility! https://x.com/xenovacom/status/1961454543503344036
Autonomous News Agent A LangGraph-powered AI agent that autonomously curates news briefings, extracts facts, and summarizes content with integrated human feedback and dynamic tool selection. https://x.com/LangChainAI/status/1962213801249710230
Hugging Face team just released an agent dataset. Training on it drastically improves the ability to execute code and analyze data. 📈 They use E2B sandboxes to simulate a real code execution environment. Check it out: https://x.com/e2b/status/1962945170736849262
Jina Code Embeddings: SOTA Code Retrieval at 0.5B and 1.5B https://x.com/JinaAI_/status/1963637141675843791
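Code retrieval with embedding models like these boils down to ranking snippets by cosine similarity between a query embedding and the snippet embeddings. A minimal, dependency-free sketch of that ranking step, with toy 4-dim vectors standing in for the Jina models' outputs (the vectors and dimensions here are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, docs, k=2):
    """Indices of the k snippets most similar to the query embedding."""
    order = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
    return order[:k]

query = [1.0, 0.0, 1.0, 0.0]
docs = [
    [1.0, 0.1, 0.9, 0.0],  # close to the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to it
    [0.5, 0.5, 0.5, 0.5],
]
print(top_k(query, docs))  # -> [0, 2]
```

With the real 0.5B/1.5B models you would swap the toy lists for actual embedding outputs; the ranking logic stays the same.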
vibe coding app: https://x.com/_akhaliq/status/1962920607684730977
🥳Seed-OSS INT4 model: https://x.com/HaihaoShen/status/1962652473862299667
meituan-longcat/LongCat-Flash-Chat · Hugging Face https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
xAI may be one of the single biggest contributors to open-source inference just by serving everything with SGLang https://x.com/casper_hansen_/status/1961752869478031810
You can now use flash-attention 3 through 🤗 `kernels`, skipping its long build times entirely 🔥 https://x.com/RisingSayak/status/1963225732668182856
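A fused kernel like flash-attention 3 computes the same scaled dot-product attention as a naive implementation, just without materializing the full score matrix. A pure-Python sketch of that reference math for a single query vector (shapes and names are illustrative, not the `kernels` API):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, ks, vs):
    """Reference scaled dot-product attention for one query vector.

    q: [d], ks/vs: [n][d]. This is the math a fused flash-attention
    kernel computes, tiled and with an online softmax."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(d)]

q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0]]
out = attention(q, ks, vs)
print(out)
```

The fused kernel returns (numerically close to) the same output; the speedup comes from memory access patterns, not different math.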
We just added OpenAI Codex CLI formal support in Hugging Face MCP Server – go play with it now!! 🔥 https://x.com/reach_vb/status/1963599978909008321
ZeroGPU on 🤗 HF Spaces enables anyone to build delightful ML demos, benefitting from powerful compute. But, due to its serverless nature, it is hard to optimize these demos. That CHANGES today 🪖 Use AoT compilation to melt our ZeroGPU servers 🔥 Details ⬇️ https://x.com/RisingSayak/status/1962844485118996545
ZeroGPU on Hugging Face enables anyone to build and deploy AI apps, dynamically allocating and releasing NVIDIA H200 GPUs as needed. But due to its serverless nature, these apps are hard to optimize. Now, use AoT compilation to melt ZeroGPU servers on Hugging Face for vibe coding. https://x.com/_akhaliq/status/1962920105186115621
🚀 LongCat-Flash-Chat Launches! ▫️ 560B Total Params | 18.6B-31.3B Dynamic Activation ▫️ Trained on 20T Tokens | 100+ tokens/sec Inference ▫️ High Performance: TerminalBench 39.5 | τ²-Bench 67.7 🔗 Model: https://x.com/Meituan_LongCat/status/1961827385667690965
Try Hunyuan-MT-7B and Hunyuan-MT-Chimera via @huggingface and @gradio! This model is specialized for translation 🤗 https://x.com/SOSOHAJALAB/status/1962790133054480600
🌐 Our first open model has landed on the Search leaderboard! Diffbot-small-xl by @diffbot debuts at #9 (Apache 2.0) We look forward to more models with search capabilities contributing to ecosystem progress! https://x.com/lmarena_ai/status/1961526740754616545
❤️ Thanks to @mervenoyann & @huggingface, MiniCPM-V 4.5 is officially live on Hugging Face Spaces. Come check it out! https://x.com/OpenBMB/status/1963623940028563910
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation, presented in an accompanying paper. Check out the model here: https://x.com/reach_vb/status/1961414145938485477
Play with the demo here: https://x.com/reach_vb/status/1961471503267979699
Try it here: https://x.com/_akhaliq/status/1962644559868883310
FineVision is out! A massive open-source dataset by @huggingface for training Vision-Language Models: – 17.3M images – 24.3M samples – 88.9M turns – 9.5B answer tokens This is the inaugural article using our new scientific publishing template! https://x.com/thibaudfrere/status/1963627540544647177
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources! > 20% improvement across 10 benchmarks > 17M unique images > 10B answer tokens > New capabilities: GUI navigation, pointing, counting FineVision 10x’s open-source VLMs. https://x.com/andimarafioti/status/1963610118165000479
Today, we are releasing FineVision, a huge open-source dataset for training state-of-the-art Vision-Language Models: > 17.3M images > 24.3M samples > 88.9M turns > 9.5B answer tokens Here are my favourite findings: https://x.com/lusxvr/status/1963609337546293448
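The release numbers are quoted as samples, turns, and answer tokens, which implies a conversation-style schema pairing images with multi-turn Q&A. A sketch of how those counts relate, over a made-up sample layout (FineVision's real schema and field names may differ):

```python
# Hypothetical sample layout: the actual FineVision schema may differ.
sample = {
    "images": ["img_0.jpg"],
    "texts": [
        {"user": "What fruit is shown?", "assistant": "A banana."},
        {"user": "How many are there?", "assistant": "Three bananas."},
    ],
}

def dataset_stats(samples):
    """Tally the kinds of numbers quoted in the release: samples, turns,
    and answer tokens (whitespace-split here as a stand-in for a real
    tokenizer)."""
    n_samples = len(samples)
    n_turns = sum(len(s["texts"]) for s in samples)
    n_answer_tokens = sum(
        len(t["assistant"].split()) for s in samples for t in s["texts"]
    )
    return n_samples, n_turns, n_answer_tokens

print(dataset_stats([sample]))  # -> (1, 2, 4)
```

Run over the full dataset, the same tallies would reproduce the 24.3M samples / 88.9M turns / 9.5B answer tokens headline figures (modulo the real tokenizer).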
vLLM now supports Kwai Keye-VL-1.5! With sharper video 📹 & image 🖼️ comprehension, stronger reasoning, and an extended 128K context length, this model unlocks richer conversations and more complex tasks than ever before. https://x.com/vllm_project/status/1962509793345859666
we present R-4B, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. https://x.com/mervenoyann/status/1962917670786937135
best small vision LM with reasoning has dropped on @huggingface 🔥 Tencent dropped R-4B, small vision LM that claims sota with Apache 2.0 license 💗 the model enables different thinking options and transformers support through custom code! https://x.com/mervenoyann/status/1962917635932229797
All of the details warrant a blog post for the community. So, we authored one 🤗 Check out all the details in this post: https://x.com/RisingSayak/status/1962844506094723429