Ethan B. Holland

Over 56,100 manually organized AI links and counting

Audio: AI News Week Ending 06/13/2025

June 13, 2025

Image created with OpenAI GPT-Image-1. Image prompt: 1966 Kodachrome photo-look, thin white frame, forest-green title band in upper left with stacked yellow/white serif text reading “AUDIO”, hand-held candid shot with slight motion blur on animals scene featuring a retro tape deck resting on a hay bale; gentle film grain, overcast daylight

Apple (AAPL) Targets Spring 2026 for Release of Delayed Siri AI Upgrade – Bloomberg https://www.bloomberg.com/news/articles/2025-06-12/apple-targets-spring-2026-for-release-of-delayed-siri-ai-upgrade?srnd=undefined&sref=9hGJlFio&embedded-checkout=true

Apple doesn’t report benchmarks for their AIs, reporting on an ill-documented head-to-head evaluation. But even by their standards, Apple’s latest on-device models are mostly worse than the open Gemma 3-4B from Google or Qwen 3-4B. And their server LLM is similar to Llama 4 Scout. https://x.com/emollick/status/1932420903515590997

ChatGPT voice is getting really good — https://x.com/gdb/status/1931456650336141752

Wow, new expressive voice in @ChatGPTapp doesn’t just talk, it performs. Feels less like an AI and more like a human friend. Nice work @OpenAI team. 🎤🎶🚀 https://x.com/shaunralston/status/1931361225046405233

Haven’t tried the updated Advanced Voice that was recently launched to all paid users in ChatGPT? Then take a listen below. Prompt: Wish me an awkward happy birthday. https://x.com/OpenAI/status/1932166285447856130

The new ChatGPT Advanced Voice Mode is super interesting – lots of deliberate use of disfluencies (nervous laughs, ums & ahs) and vocal changes make it feel much more human than the previous version. Really shows the possibilities from multimodal voice vs most AI’s text-to-speech. https://x.com/emollick/status/1931557886947205629

I finally built PodPixel using @Replit 🎉 A web app that transcribes podcasts & pulls out all links/resources with context. Just use the search or drop a URL, and find those links. Try it yourself. https://x.com/designworkplan/status/1928756748153659509

Apple Intelligence gets even more powerful with new capabilities across Apple devices – Apple https://www.apple.com/newsroom/2025/06/apple-intelligence-gets-even-more-powerful-with-new-capabilities-across-apple-devices/

Introducing Eleven v3 (alpha) – the most expressive Text to Speech model ever. Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers]. Now in public alpha and 80% off in June. https://x.com/elevenlabsio/status/1930689774278570003

World first: Breakthrough AI powered Brain-Computer Interface Enables Real-Time Speech for ALS Patient → A 45-year-old man with ALS can now produce expressive speech and melody using a brain-computer interface (BCI) that translates brain signals into audio in 10 milliseconds. https://x.com/rohanpaul_ai/status/1933094038816858372

Voice cloning is now trivially easy with open source tools, while live avatar videos of real people are easy with proprietary tools & a variety of open source tools are getting there. Very limited time to adjust legal & financial safeguards to new ways of authenticating people. https://x.com/emollick/status/1931364236304830675

We just launched our biggest update yet. Meet Higgsfield Speak — the fastest way to make motion-driven talking videos. Pick a style, choose an avatar, type a script. We do the rest — cinematic motion, voice, emotion. Comment Speak to get the full guide + promo code in the DM. https://x.com/higgsfield_ai/status/1930686472845455417

How do you evaluate Voice Agents? This is the talk for you with @kwindla. He provides code/GitHub repo & does some fun demos in this talk (links in yt description) https://x.com/HamelHusain/status/1932204210994704625

Fireflies.ai | AI Teammate to transcribe, summarize, analyze meetings, Real time AI note taker https://fireflies.ai/

Retellio | Turn Call Recordings into Actionable Insights https://www.retellio.com/

🆕 Four updates to building agents with OpenAI: Agents SDK in TypeScript, a new RealtimeAgent feature for voice agents, Traces support for the Realtime API, and improvements to our speech-to-speech model. https://x.com/OpenAIDevs/status/1929950012160790876

The rate of advancements is so high that most people don’t realize how far we advanced from Siri in voice interfaces. Give it a try! https://x.com/BorisMPower/status/1931732885415010763

apple really put “the last in the group chat to get the joke” in an ad about apple intelligence https://x.com/swyx/status/1932137205268688983

Apple WWDC 2025 > What users wanted: Siri that actually works > What users got: “You’ll immediately notice how the playback controls refract the environment. Sidebars and toolbars reflect the depth of your workspace and offer a subtle hint of the content.” I wanted more. But https://x.com/bilawalsidhu/status/1932168211963007179

Apple’s “spatial scenes” remind me of Facebook 3d photos from 2018. Take any photo and use AI to give it real depth and parallax. Glad Apple is starting to think beyond stereo photo/video for the Vision Pro; 6dof media needs to be a first class citizen. https://x.com/bilawalsidhu/status/1932286185285750791

Apple’s Visual Intelligence was showcased with a familiar demo for anyone following recent developer conferences: More ways to buy stuff, more swiftly, powered by AI. https://x.com/TechCrunch/status/1932147112164069608

I am a graphics programmer, and here’s my feedback on Apple’s Liquid Glass beta. The idea is cool, but it’s difficult to work with from a UX perspective. Let’s start with the main problems: 1 – Low Contrast: It’s clearly not readable, but there are many different ways to fix it. https://x.com/XorDev/status/1932429551256101328

Interesting to see Apple double down on conventional UIs while ignoring AI when the goal of the big AI firms is to make it so that you just talk to AI to get whatever you want done, without touching a UI. https://x.com/emollick/status/1932225668487463374

lmfao Apple models sound so 2010ish https://x.com/cto_junior/status/1932128352036605962

New iOS feels like a junior designer discovered the gradient tool, and are now using it EVERYWHERE. I’ve been there, that was me once. https://x.com/dzhng/status/1932135452569714863

RT @fkasummer: apple is about to have their windows vista moment https://x.com/zacharynado/status/1932259455368102098

Updates to Apple’s On-Device and Server Foundation Language Models – Apple Machine Learning Research https://machinelearning.apple.com/research/apple-foundation-models-2025-updates

What could happen at Apple’s WWDC 2025? See latest rumors https://www.usatoday.com/story/tech/2025/06/04/apple-wwdc-2025-rumors/84017268007/

Windows Vista walked so iOS 26 could run. https://x.com/skirano/status/1932145646963704199

WWDC: Apple opens its AI to developers but keeps its broader ambitions modest | Reuters https://www.reuters.com/business/wwdc-apple-faces-ai-regulatory-challenges-it-woos-software-developers-2025-06-09/

Take the reins — Reorder your track. Remix the vibe. Split it into stems. Upload full songs. HERE’S WHAT’S NEW: 🎼 Upgraded Song Editor: Make any edit you can imagine. Reorder, rewrite, and remake your track section by section—right from the waveform. 🧬 Stem Extraction: https://x.com/SunoMusic/status/1930007866116636735

The new voice model from ElevenLabs is interesting. I put it against one of the hardest pieces for reading aloud – the final verse of Eliot’s Wasteland, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get, but this was good. https://x.com/emollick/status/1931198391154786565

Reimagining TTS with LLM-Powered Audio Generation | Bland AI https://www.bland.ai/blogs/new-tts-announcement

RT @freddy_alfonso_: 🚨 NotebookLM Dethroned?! 🚨 Meet vui: The new open-source dialogue generation model. 💪100M Params, 40k hours audio!… https://x.com/_akhaliq/status/1932149790747525396

🎙️ After serving millions of users through our text-to-speech platform, one need kept coming up: fine-grained AI speech editing – the ability to modify existing speech. Today, we’re open-sourcing PlayDiffusion, a diffusion-based inpainting model built for that exact purpose. https://x.com/PlayAIOfficial/status/1929558863319330822

Yambda-5B: an open recommender system dataset from Yandex. It offers open access to large-scale, anonymized Yandex music streaming data, including features like the is_organic flag and Global Temporal Split (GTS). Why is it special? Yambda-5B provides: ▪️ 4.79 billion https://x.com/TheTuringPost/status/1932091557127274993

Also illustrates how far Siri has fallen behind. The gap between it and ChatGPT Advanced Voice Mode is vast (as is the gap between Siri and Gemini Voice, which is not quite as advanced as ChatGPT). The usual fast follower approach may fail as people come to trust “their” chatbot. https://x.com/emollick/status/1931914916341944456