Ethan B. Holland

Over 56,100 manually organized AI links and counting

Audio: AI News Week Ending 05/08/2026

May 8, 2026

Image created with gemini-3.1-flash-image-preview with claude-opus-4.7. Image prompt: Using the provided reference images, keep the authentic Sonoran Desert trail scene with rocky singletrack, saguaros, volcanic rock, and the wide bright valley vista, and keep the brown wooden post sign with its weathered ranger-style typography and directional arrows — but change the header to bold all-caps ‘AUDIO’ with fictional trail entries like ‘→ Whisper Wash 1.2’, ‘← Waveform Ridge 0.8’, ‘→ Podcast Pass 3.4’, replace the WP3 medallion with a small speaker-cone icon, and place a weathered brass tuning fork planted upright in the dirt beside the post with a tiny cactus wren perched on the sign listening intently. Maintain photorealistic Sonoran documentary lighting, warm midday sun, and the exact sign construction and palette of the reference.

🎙️ Voice AI only feels natural when conversation keeps pace with speech. Here’s how we rebuilt our WebRTC stack with a thin relay and stateful transceiver to keep real-time media fast for ChatGPT voice, the Realtime API, and more.
https://x.com/OpenAIDevs/status/2051453905343828350

🚀 GPT-Realtime-2 just landed in Genspark. Our Call for Me Agent now runs on it. Genspark Realtime Voice is upgrading next. What Realtime 2 brings: Sharper reasoning. Tighter instruction following. +26% effective conversation rate. Far fewer dropped calls.
https://x.com/genspark_ai/status/2052524670088556557

Advancing voice intelligence with new models in the API | OpenAI
https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

Building voice applications with GPT-Realtime-2? Our new prompting guide covers how to tune reasoning effort, use preambles, design tool behavior, handle unclear audio, capture exact entities, and maintain state in longer sessions.
https://x.com/OpenAIDevs/status/2052530378184032560

Dubbing for live events… in real time? 😮 Here’s OpenAI’s new GPT-Realtime-Translate model in action in Vimeo. Those translations are happening completely live. No pre-loaded captions. Live dubbing is one of the many features we’re exploring this year… (Hopefully) more
https://x.com/Vimeo/status/2052442588201029684

GPT-Realtime-2 pricing remains steady at $1.15 per hour of audio input and $4.61 per hour of audio output.
https://x.com/ArtificialAnlys/status/2052486478501204415
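
As a rough illustration of what those hourly rates imply, here is a minimal Python sketch that estimates the audio cost of one call. It assumes cost scales linearly with audio duration and ignores text tokens, caching, and any other charges. The function name `estimate_call_cost` and the 6-minute/4-minute split are hypothetical, chosen only for the example.

```python
# Hourly rates as quoted in the post above (USD per hour of audio).
INPUT_USD_PER_HOUR = 1.15
OUTPUT_USD_PER_HOUR = 4.61


def estimate_call_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate audio cost in USD, assuming simple linear per-hour billing.

    This is an assumption for illustration, not a statement of how
    any provider actually meters or rounds usage.
    """
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical 10-minute call: caller speaks 6 minutes, model speaks 4.
    cost = estimate_call_cost(input_minutes=6, output_minutes=4)
    print(f"Estimated audio cost: ${cost:.2f}")
```

Under these assumptions the hypothetical call comes to roughly $0.42, and most of that is output audio, since the quoted output rate is about four times the input rate.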

gpt-realtime-2 shows a 15pp improvement (vs 1.5) on Big Bench Audio, and is now close to saturation.
https://x.com/juberti/status/2052507302092296252

GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper are available in the Realtime API today.
https://x.com/OpenAIDevs/status/2052440968763515223

GPT-Realtime-2: Building a Live Translator
https://x.com/RayFernando1337/status/2052479718495318143

GPT-Realtime-Whisper brings low-latency streaming transcription to the Realtime API. Use it when your app needs to understand speech continuously while the interaction is still unfolding.
https://x.com/OpenAIDevs/status/2052440957258489859

Guess who’s back, back again. Whisper, but now with realtime streaming. Check out the new gpt-realtime-whisper transcription model in my https://t.co/b2UTuSxhOI demo.
https://x.com/juberti/status/2052478775523512356

have been excited for realtime voice-to-voice translation as an AI application since we started OpenAI. extremely cool to see it now available in the API for anyone to build with:
https://x.com/gdb/status/2052480998668206262

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API
https://x.com/OpenAI/status/2052438194625593804

New ChatGPT Voice mode pretty much confirmed. And I’m really excited for it.
https://x.com/kimmonismus/status/2051571219040735423

New Voice Model from OpenAI in the API: gpt-realtime-2. Here’s a quick demo I built
https://x.com/diegocabezas01/status/2052492653082681485

OpenAI has released GPT-Realtime-2, achieving 96.6% in our Speech Reasoning benchmark, Big Bench Audio, and #1 in our Conversational Dynamics benchmark Released today, GPT-Realtime-2 is OpenAI’s new flagship native Speech to Speech model, introducing adjustable reasoning effort
https://x.com/ArtificialAnlys/status/2052486470469140777

OpenAI shipped a new speech-to-speech model today: gpt-realtime-2 This is the first speech-to-speech model good enough to use in my voice agents that do “real work.” Or real play, for that matter. Here’s gpt-realtime-2 as the brain of the ship AI in Gradient Bang. The
https://x.com/kwindla/status/2052521318688739811

Our new voice models are now available in the Realtime API: 🎙️ GPT-Realtime-2: Build production-ready voice agents that can think harder, take action, handle interruptions, and keep conversations flowing. 🎙️ GPT-Realtime-Translate: Translate while streaming across more than 70
https://x.com/OpenAI/status/2052438196454379986

people are really starting to use voice to interact with AI, especially when they have a lot of context to dump. GPT-Realtime-2 comes to the API today; it is a pretty big step forward. (we are working on improvements to voice in chat.)
https://x.com/sama/status/2052462271667028211

pretty excited for voice models to get great. it’s interesting to watch how people are already starting to change the way they interface with AI
https://x.com/sama/status/2051464865634742334

Saw this and thought “yes! ChatGPT voice mode is going to stop acting like a two-year-model” but that upgrade hasn’t shipped just yet
https://x.com/simonw/status/2052439091577496054

Taking talking shop to a whole new level. We just shipped Glean’s real-time voice capability, powered by @OpenAI’s newest speech model GPT-Realtime-2. Grounded in the context across your org, it feels like a real AI coworker and can keep up with how work gets finished. In
https://x.com/glean/status/2052440702169108990

Updated my hello-realtime demo to use the new gpt-realtime-2 model (now with reasoning). Check it out at https://t.co/td6Cx2EOPO, or call 425-800-0042!
https://x.com/juberti/status/2052469176821002676

Using @OpenAI gpt-realtime-2 to get a glimpse of future voice-first experiences. A market dashboard you don’t click through. You direct it. Say, “Focus on Apple,” and the whole interface changes. Ask, “How did it do over the last 30 days?” and the chart updates. Say, “Go
https://x.com/levinstanley/status/2052506605044842672

Voice agents are getting more capable. Here’s what’s new: • GPT-Realtime-2 for voice agents that reason and take action • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages • GPT-Realtime-Whisper, making transcription even faster
https://x.com/OpenAIDevs/status/2052440907933474954

Voice agents are so back!! Today we’re launching 3 new realtime audio models in the API: 🎙️ GPT-Realtime-2 GPT-5-class reasoning for voice agents that can use tools, recover from interruptions, and carry longer conversations with 128K context 🌍 GPT-Realtime-Translate Live
https://x.com/reach_vb/status/2052438371058737280

Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo
https://x.com/OpenAIDevs/status/2026014334787461508

We know you’re eager for voice updates in ChatGPT. Stay tuned, we’re cooking.
https://x.com/OpenAI/status/2052438197695877316

Congrats to @OpenAI for taking the top spot on our Audio MultiChallenge S2S leaderboard with the release of GPT‑Realtime‑2 🥇 GPT-Realtime-2 more than doubles GPT-Realtime-1.5 on instruction retention, rising from 36.7% to 70.8% APR, and also stands out on voice editing,
https://x.com/ScaleAILabs/status/2052451341071683732

Custom Voices and Voice Library | xAI
https://x.ai/news/grok-custom-voices

Try Grok Voice for your customer support
https://x.com/elonmusk/status/2052530063913189879

Two voices. One human. One AI. Can you guess the AI clone? 👇 Voice cloning, rich with natural emotion, is now live on the Grok Voice API.
https://x.com/xai/status/2051438210065322244

i think audio is honestly a bit like VR: everyone keeps getting excited about it but it doesn’t fully stick as an interface. tool use in realtime, reasoning while speaking, live translations are massive steps to getting audio interfaces to take off
https://x.com/willdepue/status/2052493097586823353

Save Your Personal Podcast to Spotify and Listen Anywhere — Spotify
https://newsroom.spotify.com/2026-05-07/personal-podcasts-launch/

Voice is a lot more natural for humans, and over time AI will shape itself to make the best use of our limited bandwidth to produce the highest value to us
https://x.com/BorisMPower/status/2052471142921994332

who can help? maybe @GradiumAI @kyutai_labs @ElevenLabs?
https://x.com/ClementDelangue/status/2052385809655828907

Google Flow Music announces partnerships with Believe
https://blog.google/innovation-and-ai/models-and-research/google-labs/believe-flow-music-partnership/

Voice Cloning is now live via the xAI API! Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more.
https://x.com/xai/status/2050355373052223585