Ethan B. Holland

Over 51,300 manually organized AI links and counting

Audio: AI News Week Ending 09/12/2025

September 12, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Audio, poolside table with cold drink, glass refraction casting clean sine‑wave patterns on the surface, warm afternoon glow, photorealistic, editorial, minimal, landscape, vacation, no text overlays

Gemma 3n now available in the Play Store for on-device, internet-free use with speech, text and image input! Open local AI Assistants are coming to everyone! 🤯 – New on-device speech-to-text and speech-to-translated-text. – Process audio batch inference for clips up to 30 seconds. https://x.com/_philschmid/status/1965742109157188031

You asked, we shipped! Scripted mode just dropped for audio generation in Copilot Labs (c/o our new MAI-Voice-1 model). Scripted mode: reads your input verbatim. Emotive: riffs a bit for max drama. Story: performs multiple voices/characters. Try out all 3 ➡️ https://x.com/mustafasuleyman/status/1965825187393511565

Stability AI Introduces Stable Audio 2.5, the First Audio Model Built for Enterprise Sound Production at Scale — Stability AI https://stability.ai/news/stability-ai-introduces-stable-audio-25-the-first-audio-model-built-for-enterprise-sound-production-at-scale

Today we’re launching Voice Remixing in alpha. Reimagine any aspect of your own voice or your designed voices to create new characters: • Change the gender • Make voices sound older or younger • Try a new accent. Perfect for creative storytelling and precise Agent design. https://x.com/elevenlabsio/status/1965806127897264300

🗣️ Evals now support native audio inputs and audio graders. Evaluate model audio responses, with no text transcription needed. Get started in the Cookbook guide: https://x.com/OpenAIDevs/status/1965923707085533368

OpenAI referenced Artificial Analysis’ Big Bench Audio benchmark in their recent GPT-Realtime release, where they secured the #1 position with a score of 83%. Benchmark context: Big Bench Audio is the first dedicated dataset for evaluating reasoning performance of speech models. https://x.com/ArtificialAnlys/status/1966116575851028970

🎙️ Meet Qwen3-ASR — the all-in-one speech recognition model! ✅ High-accuracy EN/CN + 9 more languages: ar, de, en, es, fr, it, ja, ko, pt, ru, zh ✅ Auto language detection ✅ Songs? Raps? Voice with BGM? No problem. <8% WER ✅ Works in noise, low quality, far-field ✅ Custom https://x.com/Alibaba_Qwen/status/1965068737297707261