Audio: AI News Week Ending 12/19/2025

December 19, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of a child aged 6-8 wearing oversized vintage headphones with eyes gently closed, sitting on plush rug in warm-lit bedroom, surrounded by panoramic arc of TV screens displaying colorful audio waveforms and spectrograms in amber and cyan, vinyl records and cassette tapes scattered among newspapers on floor, small vintage radio on shelf, shallow depth of field, soft focus, warm peach and cream tones contrasting with cool blue screen glow, tender intimate lighting, large bold text reading AUDIO at top of frame.

Fastweb + Vodafone (Swisscom Group), one of Europe’s leading telecom providers, is building Super TOBi, which brings agentic customer service to massive scale. Using LangSmith, they are: 🔹Achieving 90% response correctness and 82% resolution rates across ~9.5M customers https://x.com/LangChain/status/2001321491703443877

GROK JUST TURNED VOICE AI INTO A REAL PRODUCT, FAST, AND EVERYWHERE xAI just opened Grok Voice to developers, and this isn’t some early experiment dressed up as a launch. It’s the same system already running inside millions of Teslas, now exposed through an API that actually https://x.com/MarioNawfal/status/2001472484869329288

Grok Voice Agent API | xAI https://x.ai/news/grok-voice-agent-api

Today, we’re excited to launch the Grok Voice Agent API, empowering developers to build voice agents that speak dozens of languages, call tools, and search realtime data. https://x.com/xai/status/2001385958147752255

Took less than an hour for Grok Voice Agent by @xai to be ported to Reachy Mini thanks to @atariorbit! https://x.com/ClementDelangue/status/2001410494528213481

Gemini 2.5 Native Audio upgrade, plus text-to-speech model updates https://blog.google/products-and-platforms/products/gemini/gemini-audio-model-updates/

Listen up 🔊 We’ve made some updates to our Gemini Audio models and capabilities: — Gemini’s live speech-to-speech translation capability is rolling out in a beta experience to the Google Translate app, bringing you real-time audio translation that captures the nuance of human https://x.com/GoogleAI/status/1999560839679082507

Today we are rolling out an updated Gemini Native Audio model, built with 🎙️: – higher precision function calling – better realtime instruction following – smoother and more cohesive conversational abilities Available to developers in the Gemini API right now! https://x.com/OfficialLoganK/status/1999586764382523521

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to https://x.com/AIatMeta/status/2000980784425931067

SAM Audio https://ai.meta.com/samaudio/

sam-audio – a facebook Collection https://huggingface.co/collections/facebook/sam-audio

Apple Music is coming to ChatGPT, OpenAI announces https://x.com/9to5mac/status/2001014465689469051

Apple Music is coming to ChatGPT, OpenAI announces – 9to5Mac https://9to5mac.com/2025/12/16/apple-music-is-coming-to-chatgpt-openai-announces/

Photoshop is now inside ChatGPT. Just prompt what you want and get slider-level control to dial in the perfect look. Intelligently select content and apply effects — without opening Photoshop. You’re the conductor. Photoshop is the orchestra. For me, this one’s personal — I’ve https://x.com/bilawalsidhu/status/1999594990868267227

Adobe launches free ChatGPT-integrated apps for Photoshop, Acrobat, and Express on desktop, the web, and iOS, after OpenAI added app integrations in October (@zombie_wretch / The Verge) https://x.com/Techmeme/status/1998741032091996348

Edit with Photoshop in ChatGPT | Adobe Blog https://blog.adobe.com/en/publish/2025/12/10/edit-photoshop-chatgpt

xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5 https://x.com/ArtificialAnlys/status/2001388724987527353

🎥 Kling 2.6 Motion Control Feature Is Now Live! To celebrate the launch of Kling 2.6 Motion Control Feature, we’re kicking off a new contest – and the prizes are one post away from you! 🔥 Show us your creative power with Kling 2.6 Motion Control Feature – The Kling 2.6 Motion https://x.com/Kling_ai/status/2001891240359632965

🎥 Kling 2.6 Voice Control Feature Is Now Live! To celebrate the launch of Kling 2.6 Voice Control Feature, we’re kicking off a new contest – and the prizes are one post away from you! 🔥 Show us your creative power with Kling 2.6 Voice Control Feature – Use your signature voices https://x.com/Kling_ai/status/2001198609115628029

🚀 Motion Control, Leveled Up Newly upgraded Motion Control is now live in Kling VIDEO 2.6! Experience precise, full control over every action & expression ✅ Full-Body Motions — Body movements captured in stunning detail ✅ Fast & Complex Actions — From martial arts to https://x.com/Kling_ai/status/2001306445262823431

🚨 Kling O1 Video Standard is here on fal! 🎬 Same powerful editing model, 720P mode ✨ Start & end frame control for precision 🎯 3-10 second range for flexible videos 💰 Faster generation, lower cost https://x.com/fal/status/2000590369545744599

🚨Video Leaderboard Updates Kling 2.6 Pro by @kling_AI and the new Kandinsky 5.0 open models by @kandinskylab have now landed on the Video Arena leaderboard. Kling 2.6 Pro delivers a major 16-point jump over Kling-2.5-turbo-1080p. While Kandinsky 5.0 enters strong, taking the https://x.com/arena/status/1999530939886768205

A new prompt unlock? Multiple gliding rack focus through a cyberpunk nightclub, yes the characters in close up are prompted, prompt share in later post. Not keyframes. Created in @Kling_ai 2.6 Image to video. 🔊🔊🎧 https://x.com/StevieMac03/status/2002001196383391813

Do you want to create ultra-dynamic action animations with @Kling_ai 2.6? 🎬⚡️ After testing many prompts, I’ve noticed what works best. And here’s the key. 👉 What usually gives the best results is starting the prompt with “High-speed anime battle.” Other combinations that https://x.com/Artedeingenio/status/2001960379610767835

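Kling 2.6 “MotionControl” tested on a dance video: full-body steps and weight shifts look natural, and hair tracking is excellent too. Dance and action clips like this seem to be the better fit and are where it shows its strengths ✨ https://x.com/genel_ai/status/2001532885673873677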

Oh my… Kling just dropped the next era of motion control. Kling VIDEO 2.6 can copy any action with perfect lip-sync, lifelike motion and expressive gesture. It outperforms Wan 2.2-Animate, Act-Two and DreamActor 1.5 across all metrics. More examples below. https://x.com/AngryTomtweets/status/2001569619375698199

Quick test of Kling 2.6 Motion Control Shall I keep going? 😭 https://x.com/blizaine/status/2001849003819098168

Your frames. Your timing. Kling VIDEO O1 now supports Start & End Frames generation with freely selectable durations from 3-10s, giving you smoother transitions and more control over pacing. From fast, high-impact moments to fully immersive cinematic shots–your story moves the https://x.com/Kling_ai/status/2000581619556421673

Music tools usually live on flat screens. This is an Apple Vision Pro app that turns music production into a modular, spatial system. Sounds are blocks. Arrangements become structures. The composition exists as a physical object in the room, not a timeline on a screen. https://x.com/IlirAliu_/status/2000491725597384854

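The V2V era begins! 📢 From the era of directing with text to the era of directing with motion. I tried Kling AI’s motion control feature. It was already available in version 1.6, and it can now be used with the latest model, 2.6. https://x.com/seiiiiiiiiiiru/status/2001502678116110430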

Introducing Real-time Transcription with Speakers! – Step change in accuracy, surpassing top cloud APIs – Faster than real-time on Mac and iPhone – Still under 3 watts when all features are enabled Available in Argmax SDK 2.0 for early access! Benchmarks and details in comments. https://x.com/argmax/status/2001296557556040028

Realtime speech to speech translation powered by Gemini, available in Google Translate now, coming to developers early next year : ) https://x.com/OfficialLoganK/status/1999994009452962073

Meta just released sam-audio https://x.com/_akhaliq/status/2001000836017844296

🆕 New audio model snapshots are now live in the Realtime API with improvements to reliability, lower error rates, and fewer hallucinations: – gpt-4o-mini-transcribe-2025-12-15: 89% reduction in hallucinations compared to whisper-1 – gpt-4o-mini-tts-2025-12-15: 35% fewer word https://x.com/OpenAIDevs/status/2000678814628958502

I’m satisfied with GPT-5.2’s long-context capability. Up to now, I’ve always used Gemini to summarize podcasts, but I can now switch this use case over to ChatGPT. What I like is that, with the same prompt, it produces summaries with richer detail compared to Gemini. (That https://x.com/Hangsiin/status/2000738988378968224

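On motion control in Kling 2.6 (@Kling_ai): the biggest appeal of v2v is getting a performance that AI cannot reproduce on its own. As a real example, I embarrassed myself acting it out, so please take a look. You cannot get this kind of movement from a prompt. https://x.com/onofumi_AI/status/2001840428250022087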