Ethan B. Holland

Over 52,500 manually organized AI links and counting

Audio: AI News Week Ending 04/17/2026

April 17, 2026

Image created with gemini-3.1-flash-image-preview with claude-opus-4.7. Image prompt: Using the provided reference image, preserve the marigold-orange backdrop, the seated woman’s purple-and-white windbreaker and closed-eyes smile, the tattooed singer’s red beanie and layered red vest, and the exact intimate two-shot lighting and composition, but replace only the black handheld microphone with a glossy black vinyl LP record held edge-on to his lips in the same hand position, its grooves and center label catching the warm key light with photographic realism. After generating the image, overlay the text “Audio” in the upper-left corner of the frame in large, bold, all-caps ITC Avant Garde Gothic Pro Medium (or a near-identical geometric sans-serif if unavailable), pure white (#FFFFFF), with no date, subtitle, drop shadow, or outline. The text should be substantial in scale — taking up a meaningful portion of the upper-left area — with comfortable margin from the top and left edges, set against the negative space of the orange backdrop so it does not overlap or obscure the singer, the seated woman, or the replaced object.

Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet. With new Audio Tags, you can easily direct vocal style, delivery, and pace through text commands. 🧵
https://x.com/GoogleDeepMind/status/2044447030353752349

Gemini 3.1 Flash TTS: New text-to-speech AI model
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/

Google’s new Gemini 3.1 Flash TTS ranks #2 on the Artificial Analysis Speech Arena Leaderboard, ahead of ElevenLabs’ Eleven v3 and only behind Inworld TTS 1.5 Max. Gemini 3.1 Flash TTS represents a significant step forward for Google from previous TTS models, with notably
https://x.com/ArtificialAnlys/status/2044450045190418673

Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!
https://x.com/OfficialLoganK/status/2044447596010435054

Our most expressive and steerable TTS model yet! Designed to give builders granular control over AI-generated speech, Gemini 3.1 Flash TTS is really fun to play with! Available in preview today – for devs via the Gemini API & @GoogleAIStudio + for enterprises on Vertex AI
https://x.com/demishassabis/status/2044599020690010217

An experimental voice pipeline for the Agents SDK enables real-time voice interactions over WebSockets. Developers can now build agents with continuous STT and TTS in just ~30 lines of server-side code.
https://x.com/Cloudflare/status/2044423032265957872

You can now add voice to your agent using Agents SDK: https://t.co/bb29zIHvEt Voice is just another input: you can use the same WebSocket connection your Durable Object uses to transmit audio. So much fun working with @threepointone on this
https://x.com/korinne_dev/status/2044441427736936510