Audio: AI News Week Ending 01/30/2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Animation cel style: A muscular blue-skinned genie emerging from a brass oil lamp, conducting gesture with one arm raised, glowing cyan musical notes and sound waveforms swirling around him like magical smoke, vintage studio microphone in foreground, rich purple and teal magical wisps, warm golden lamp, Disney-quality hand-drawn aesthetic with bold outlines and volumetric effects, clean gradient background, horizontal composition with space for title text across top.

Apple acquires Israeli startup Q.ai https://www.cnbc.com/2026/01/29/apple-acquires-israeli-startup-qai-.html

Apple buys Israeli startup Q.ai as the AI race heats up | TechCrunch https://techcrunch.com/2026/01/29/apple-buys-israeli-startup-q-ai-as-the-ai-race-heats-up/

Google tests voice cloning on AI Studio powered by Gemini https://www.testingcatalog.com/google-tests-voice-cloning-ahead-of-gemini-3-flash-native-audio-release/

Thrilled to share our new Grok Imagine release 🚀 It is the highest quality, fastest, and most cost-effective video generation model yet. Comes with 720P, video editing and better audio! We listened closely to your feedback and moved fast. Just six months ago, we had almost… https://x.com/EthanHe_42/status/2016749123198673099

🎉 Congrats @Alibaba_Qwen on the Qwen3-ASR release — vLLM has day-0 support. 52 languages, 2000x throughput on the 0.6B model, singing voice recognition, and SOTA accuracy on the 1.7B. Serve it now in vLLM! 🚀 Learn more: https://x.com/vllm_project/status/2016865238323515412

Qwen3-ASR and Qwen3-ForcedAligner are now open source — production-ready speech models designed for messy, real-world audio, with competitive performance and strong robustness. ● 52 languages & dialects with auto language ID (30 languages + 22 dialects/accents) ● Robust in… https://x.com/Alibaba_Qwen/status/2016858705917075645

Qwen3-ASR is out🚀 https://t.co/pVnuuNPMEL ✨ 0.6B & 1.7B – Apache2.0 ✨ 30 languages + 22 Chinese dialects, plus English accents across regions ✨ Single model for language ID + ASR (no extra pipeline stitching) ✨ Qwen3-ForcedAligner-0.6B, a strong forced aligner… https://x.com/AdinaYakup/status/2016865634559152162

Qwen3-ASR is the first open-source LLM-based ASR in the industry with native streaming support. Demo: https://t.co/y2X1slCMcs vLLM Example: https://x.com/Alibaba_Qwen/status/2016900512478875991

Big thanks to vLLM for providing Day 0 support for Qwen3-ASR. https://x.com/Alibaba_Qwen/status/2016905051395260838