Ethan B. Holland

Over 54,400 manually organized AI links and counting

Audio: AI News Week Ending 05/23/2025

May 23, 2025

Image created with Ideogram 3.0. Image prompt: Lower-East-Side street-corner photograph reminiscent of a late-80s album cover: weathered red-brick tenement with exterior fire-escapes, canvas awning shading racks of vintage clothes; above the awning, a hand-painted board reads ‘Audio SPORTSWEAR’; a hanging blade sign in cursive script reads ‘Audio Boutique’; vintage boomboxes blast old-school beats from a sidewalk table; warm golden-hour light, subtle 35mm film grain, muted yet punchy color palette, gritty NYC vibe.

Last year, we introduced Project Astra: a research prototype exploring capabilities for a universal AI assistant. 🤝 We’ve been making it even better with improved voice output, memory and computer control – so it can be more personalized and proactive. Take a look ↓ #GoogleIO https://x.com/GoogleDeepMind/status/1924883244459425797

Check out Veo 3 🔥🔥🔥 sound on 🔊 https://x.com/_tim_brooks/status/1924895946967810234

From capturing real-world physics – like the noise and movement of water, or the look and sound of walking in snow – to lip syncing, Veo 3 is great at understanding what you want. You can tell a short story in your prompt, and the model gives you back a clip that brings it to https://x.com/GoogleDeepMind/status/1924893531300077675

Say goodbye to the silent era of video generation: Introducing Veo 3 — with native audio generation. 🗣️ Quality is up from Veo 2, and now you can add dialogue between characters, sound effects and background noise. Veo 3 is available now in the @GeminiApp for Google AI Ultra https://x.com/Google/status/1924893837295546851

Veo 3 is available today for Ultra subscribers in the United States in the @GeminiApp. Find out more about where you can use it ↓ https://x.com/GoogleDeepMind/status/1924893533787332996

Veo 3, our SOTA video generation model, has native audio generation and is absolutely mindblowing. For filmmakers + creatives, we’re combining the best of Veo, Imagen and Gemini into a new filmmaking tool called Flow. Ready today for Google AI Pro and Ultra plan subscribers. https://x.com/sundarpichai/status/1924909490081825195

Veo 3: “a big broadway musical about garlic bread, with elaborate costumes and a sondheim-like vibe” https://x.com/emollick/status/1925065546082484418

Veo 3: “a scene from an unnerving 1970s childrens show with live action puppets and Lovecraftian overtones singing a song” https://x.com/emollick/status/1925047195738218505

Video, meet audio. 🎥🤝🔊 With Veo 3, our new state-of-the-art generative video model, you can add soundtracks to clips you make. Create talking characters, include sound effects, and more while developing videos in a range of cinematic styles. 🧵 https://x.com/GoogleDeepMind/status/1924893528062140417

AI learns how vision and sound are connected, without human intervention | MIT News | Massachusetts Institute of Technology https://news.mit.edu/2025/ai-learns-how-vision-and-sound-are-connected-without-human-intervention-0522

How Apple Intelligence and Siri AI Went So Wrong – Bloomberg https://www.bloomberg.com/news/features/2025-05-18/how-apple-intelligence-and-siri-ai-went-so-wrong?embedded-checkout=true

Omni-R1 Do You Really Need Audio to Fine-Tune Your Audio LLM? https://x.com/_akhaliq/status/1923003554131775788

We’ve also been working on upgrades to Project Astra, including more natural voice output with native audio, improved memory, & computer control. Over time we’ll bring these new capabilities to Gemini Live & new experiences in Search, Live API for devs, and new form factors like https://x.com/Google/status/1924883459253649494

A different twist on privacy concerns. With AI powered always on devices, you are not just being recorded secretly, those recordings are now also more valuable as AI can go through the audio and turn it into useful data for the recording party. Another place policy will be needed https://x.com/emollick/status/1923760092584816855

Stability AI and Arm Collaborate to Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation — Stability AI https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small-enabling-real-world-deployment-for-on-device-audio-control

Stability AI open-sourced Stable Audio Open Small, a text-to-audio AI —341M-parameters —Generates 11s of audio, including drum loops, foley, riffs, and textures —Optimized for Arm-based consumer devices https://x.com/adcock_brett/status/1924133939376996539

Today we’re open-sourcing Stable Audio Open Small, a 341M-parameter text-to-audio model optimized to run entirely on @Arm CPUs. This means 99% of smartphones can now generate music-production samples in seconds, right on-device with no internet required. Built for fast, https://x.com/StabilityAI/status/1922675163411497094

Elton John brands government ‘absolute losers’ over AI copyright plans https://www.bbc.com/news/articles/c8jg0348yvxo

We recently expanded access to Music AI Sandbox 🎵 our suite of experimental AI tools for professional musicians powered by our latest model Lyria 2. 🎧 This medley made with artists will transport you through different genres, sounds, and moods. Sound on ↓ https://x.com/GoogleDeepMind/status/1924894499899146677

Google Meet is getting real-time speech translation | TechCrunch https://techcrunch.com/2025/05/20/google-meet-is-getting-real-time-speech-translation/

Announcing Veo 3, Imagen 4, and Lyria 2 on Vertex AI | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/announcing-veo-3-imagen-4-and-lyria-2-on-vertex-ai

I really want to try out Veo3. Really really bad. But the reality is, I will probably only run a dozen or so tests through it and move on, so I cannot justify a subscription of this amount. I have visited this screen a dozen or so times the past few days. https://x.com/ostrisai/status/1925917357731410313

It’s official — Veo 3 and Imagen 4 is here and available starting today. > Veo 3 is not only higher quality video with support for subject and style references, but it can *natively* generate audio (sound effects, music AND dialogue!) > Imagen 4 similarly now crushes it at https://x.com/bilawalsidhu/status/1924897257629089855

New Google AI Ultra subscription tier will give you access to Gemini 2.5 Pro Deep Think, Veo 3 and Project Mariner https://x.com/scaling01/status/1924891236109799838

non-human intelligence comes in peace the dialogue, lip movement, environmental audio — all perfectly synced — all from one prompt what should i prompt with google veo 3 next? https://x.com/bilawalsidhu/status/1924931758556082677

This matches what I am seeing, the model is a huge leap in video creation and is good at direction following, but the two most common failure modes are that it adds nonsense “subtitles” to videos and that some videos lack sound. Wouldn’t be a big deal but credits are not returned https://x.com/emollick/status/1925305547651190787

We’re excited to shape the future of Flow with AI filmmakers like @HenryDaubrez who used it to create Electric Pink: a short video exploring a pink-haired superhero crafting his dream adventure using his childhood inspirations. 📽️✨↓ https://x.com/GoogleDeepMind/status/1924896549248594225

Gemini Diffusion: diffusion-based LLM, much faster than autoregressive LLMs Gemini 2.5 Pro Deep Think: doubles o3’s score on 2025 USAMO math competition Imagen 4: can spell Veo 3: native audio generation, characters can speak Google is back. Artificial Pichai Intelligence. https://x.com/Yuchenj_UW/status/1924896740068753825