Ethan B. Holland

Over 54,400 manually organized AI links and counting

Images: AI News Week Ending 10/10/2025

October 10, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Create a 16:9 cinematic split-screen poster. LEFT SIDE (40% width): – A desk covered with printed contact sheets and individual images in different styles—portraits, landscapes, abstract art—arranged neatly with a pair of scissors and tape. – The background is a turquoise / teal abstract field made of stylized blue rods or data fibers, evoking a generative image engine behind the scenes. – Use soft, even studio lighting. No glowing art, no neon. RIGHT SIDE (60% width): – A green-toned abstract aerial forest canopy texture, suggesting creativity rooted in the natural world. – Two clean rounded rectangles stacked vertically near the center-right. – The TOP rectangle contains the text: “Images”. – The BOTTOM rectangle contains the text: “2025/10/10”. – Clean sans-serif font, dark green or charcoal. OVERALL STYLE: – Curated, tactile, and human. – No additional text or branding. – Maintain the turquoise/forest split-screen.

AI Mode in Google Search updates: Visual exploration and discovery https://blog.google/products/search/search-ai-updates-september-2025/

I think people are still unprepared for a world where you cannot trust any video content, despite years of warning. Even when Google & OpenAI include watermarks, those can be easily removed, and open weights AI video models without guardrails are coming. https://x.com/emollick/status/1976004133296685165

This seems like a pretty big finding: If you train an AI model on enough video, it seems to gain the ability to reason about images in ways it was never trained to do, including solving mazes & puzzles. The bigger the model, the better it does at these out-of-distribution tasks. https://x.com/emollick/status/1974096724445503827

🚨 Passive video is dead. Welcome to real-time AI video. With Synthesia 3.0, your videos don’t just play — they engage, respond, and act. Add your content → and create fully interactive AI-powered experiences in minutes. ✅ Video Agents ✅ Realistic Avatars ✅ Express Voice https://x.com/lax97981/status/1974742019420696588

[2509.17803] Effect of Appearance and Animation Realism on the Perception of Emotionally Expressive Virtual Humans https://arxiv.org/abs/2509.17803

WAN 2.2 Animate does some cool stuff with lighting and flame behavior🔥 Check out the workflow and tutorial below 👇 https://x.com/heyglif/status/1976259706214592747

felixtaubner/cap4d: Official repository for the paper “CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models” https://github.com/felixtaubner/cap4d

[2509.17748] “I don’t like my avatar”: Investigating Human Digital Doubles https://arxiv.org/abs/2509.17748

HuMo https://phantom-video.github.io/HuMo/

A collaborative approach to image generation https://research.google/blog/a-collaborative-approach-to-image-generation/

Good news for developers: Gemini 2.5 Flash Image 🍌 is now stable and ready for scaled production, in addition to coming with a new aspect ratio setting + the ability to specify image only output! The reception to this model has been truly wild to see so far, much more to come! https://x.com/OfficialLoganK/status/1973790388722061394

“Nano Banana” is now a marketing case study 🍌 It was named at 2am by a tired PM who just needed something for the model registry. Here’s my chat w/ DeepMind researchers Benigno Uria & Yuqing Du where they reveal the full story. https://x.com/bilawalsidhu/status/1974894680656822350

🎆 Can we achieve high compression rate for images in autoencoders without compromising quality and decoding speed? ⚡️ We introduce SSDD (Single-Step Diffusion Decoder), achieving improvements on both fronts, setting new state-of-the-art on image reconstruction. 👇 1/N https://x.com/webalorn/status/1975555815294791719

Qwen Image Edit 2509 is the new leading open weights image editing model, ranking #3 overall in the Artificial Analysis Image Editing Arena and introducing multi-image editing capabilities! The latest release from Alibaba Qwen trails only Gemini 2.5 Flash (Nano-Banana) and https://x.com/ArtificialAnlys/status/1975993986314813889

Introducing Qwen3-VL Cookbooks! 🧑‍🍳 A curated collection of notebooks showcasing the power of Qwen3-VL—via both local deployment and API—across diverse multimodal use cases: ✅ Thinking with Images ✅ Computer-Use Agent ✅ Multimodal Coding ✅ Omni Recognition ✅ Advanced https://x.com/Alibaba_Qwen/status/1976479304814145877

🚀 Day 0 Support — Qwen3-VL-30B-A3B-Instruct on NexaSDK We’re excited to announce Day 0 support for Qwen3-VL-30B-A3B-Instruct, a breakthrough in multimodal intelligence, now running natively on NexaSDK. We’ve added full support for the MLX Engine on @Apple Silicon GPUs, https://x.com/nexa_ai/status/1974562612164886659

4/5 The same efficiency gains apply on mobile. Running at 16K context lengths on an iPhone 16 Pro, Jamba outputs nearly 16 tokens/second, outpacing token outputs from Llama 3.2 3B, Qwen 3 1.7B, and Phi-4 Mini. Jamba is the only one that can handle up to 64K. https://x.com/AI21Labs/status/1975917063278567919

Alibaba has released Qwen3 Omni and Qwen3 Omni Realtime – two natively end-to-end “omni”-modal models that process text, images, audio, and video in a single unified architecture. Artificial Analysis benchmarking shows competitive Speech to Speech performance, as well as https://x.com/ArtificialAnlys/status/1975904190061834602

Most popular local models in Cline are qwen3-coder & GLM-4.5-Air (guide on how to use them is linked below) https://x.com/cline/status/1976101061753700400

Qwen3-VL secured 2nd place in the vision leaderboard and became the first open-source model to rank first in both the pure text and visual leaderboards. https://x.com/Alibaba_Qwen/status/1975360868092420345

More generally: if all of your experiments are “RL on math with Qwen”, I’m not interested in any outlandish claims you want to make. Qwen’s base models have been (appropriately) aggressively mid-trained for math for a long time. Stop drawing conclusions purely from this. https://x.com/lateinteraction/status/1976761442842849598

Qwen3-30B-A3B-Instruct-2507-4bit generation on MLX: 473 tokens per sec on M3 Ultra! 🚀 https://x.com/ivanfioravanti/status/1976153645658898453

Thank you @ArtificialAnlys ! 🙏 Qwen Image Edit 2509 ranks #3 overall and leads all open-weight models — enabling multi-image editing with precise control. Try it now: https://x.com/Alibaba_Qwen/status/1976119224339955803

Intelligence performance: The Qwen3 Omni 30B reasoning variant achieves an Artificial Analysis Intelligence Index score of 40, surpassing similarly-sized models like Qwen3 30B, but still trailing Alibaba’s flagship LLM, Qwen3 235B 2507, which scored 57. The Qwen3 Omni 30B https://x.com/ArtificialAnlys/status/1975904195426537596

Z.ai’s updated GLM 4.6 (Reasoning) is one of the most intelligent open weights models, with near DeepSeek V3.1 (Reasoning) and Qwen3 235B 2507 (Reasoning) level intelligence 🧠 Key intelligence benchmarking takeaways: ➤ Reasoning Model Performance: GLM 4.6 (Reasoning) scores 56 https://x.com/ArtificialAnlys/status/1975425594679496979

HF demo: https://x.com/Alibaba_Qwen/status/1974290412602040532

The state of LLMs is messy: Some AI features (like vision) lag others (like tool use) while others have blind spots (imagegen and clocks). And the expensive “heavy thinking” models are now very far ahead of all the other AIs that most people use. None of this is well-documented. https://x.com/emollick/status/1974937573413007867

Made by @Grok Imagine with no prompt! Grok just read the text in the image and figured it out. https://x.com/elonmusk/status/1976146944398590385