Ethan B. Holland

Over 54,900 manually organized AI links and counting

Audio: AI News Week Ending 03/14/2025

March 14, 2025

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice https://magicinfinite.github.io/

Podcasting platform Podcastle launches a text-to-speech model with more than 450 AI voices | TechCrunch https://techcrunch.com/2025/03/03/podcasting-platform-podcastle-launches-a-text-to-speech-model-with-more-than-450-ai-voices/

“YuE: Scaling Open Foundation Models for Long-Form Music Generation “We tackle the task of long-form music generation—particularly the challenging lyrics-to-song problem—by introducing YuE (乐), a family of open foundation models based on the LLaMA2 architecture. Specifically, https://x.com/iScienceLuvr/status/1899717628912157104

“In the last 7 minutes I have built a pdf -> text converter app using @Replit . Now I plan to add @elevenlabsio integration so I can listen to my books. I think I can be unstoppable with @bubble + @Replit stack 🤩 https://x.com/bestbubbledev/status/1896947792712741081

“Copilot Voice speaks over 40 languages now! Whether you want to chat in Japanese or translate Italian, it has you covered. And Zurich team, don’t judge me while I try to learn “hi, how are you” in German. https://x.com/mustafasuleyman/status/1890091173517357460

“Hedra unveiled Character-3, a new “omnimodal model” It can reason jointly across image, text, and audio to create high-quality video generations featuring characters, dynamic backgrounds, emotional control, and more You can use it in Hedra Studio! https://x.com/rowancheung/status/1899003669674066318

“Introducing #Ray2 Flash—3x faster, 3x cheaper new model. Flash brings Ray’s frontier production-ready Text-to-Video, Image-to-Video, audio, and control capabilities with high quality and speed to all subscribers—so you can create more, faster, and without limits. Available now. https://x.com/LumaLabsAI/status/1898056614684381218

“Cartesia announced $64M in funding and a new Sonic 2.0 state space model The model generates realistic AI voice across 16 languages and also includes an ultra-fast ‘Turbo’ mode In blind tests, 1.5x as many people preferred Sonic over other providers https://x.com/rowancheung/status/1899713320006852788

“INTERNS: we have a lot of summer internship job listings up on the Figure careers site: > Motor Test > FPGA Design > Full-Stack SWE > Embedded Software > Mechanical Engineer > Audio Signal Processing It’s hard work, but rewarding!” https://x.com/adcock_brett/status/1899484728157417610

“ElevenLabs announced a massive discount on its SOTA Scribe speech-to-text model, which excels across 99 languages —45% price-cut on its API —Free via the company’s UI —Offers valid until April 9, 2025 https://x.com/rowancheung/status/1899351001850614057

“📢 Cartesia’s Sonic-2 voice AI model is now available through Together API. This SOTA model delivers industry-leading 40ms latency and high-fidelity voice synthesis. Create seamless multimodal experiences with chat, image, audio, code and embeddings on the Together Platform. https://x.com/togethercompute/status/1899498102836380106

“in 5 lines and a free google colab, you too can have real life like voice synthesis! ⚡ https://x.com/reach_vb/status/1900306104724447478

“Hedra Studio and Character-3 is here. A new generation of AI-native video creation. At its core is Character-3, the first omnimodal model in production, built to jointly reason across image, text, and audio for more intelligent video generation. This goes beyond multimodal—it’s https://x.com/hedra_labs/status/1897699010632466469

“Introducing Scribe — the most accurate Speech to Text model. It has the highest accuracy on benchmarks, outperforming previous state-of-the-art models such as Gemini 2.0 and OpenAI Whisper v3. It’s now the leading model for English, Spanish, Italian, and many more. With support https://x.com/elevenlabsio/status/1894821477230485570

“You can now install the Perplexity App from the Windows and Microsoft App Store. We will be shipping voice-to-voice mode in a few days.” https://x.com/AravSrinivas/status/1900371155753853427

“A new study found that people trust humanoid robots significantly more than a nonhumanoid robot to care for objects, personal information, and vulnerable agents like children or pets. Even a faceless humanoid with a robotic voice was trusted more than the nonhumanoid robot. https://x.com/TheHumanoidHub/status/1898099364503208022

“Llama models were used to develop India’s first open source audio language model — Shuka v1 ➡️ https://x.com/AIatMeta/status/1900271556917682405

“We’ve raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We’ll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below https://x.com/cartesia_ai/status/1899479695537676624

“Luma Labs released Ray2 Flash, a new version of its top-tier video generation model —3x faster than Ray2 —3x more affordable —Text-to-Video —Image-to-Video —Audio with advanced control options https://x.com/rowancheung/status/1899003692004548907

“HOLY SHITT, Sesame Labs just dropped CSM (Conversational Speech Model) – Apache 2.0 licensed! 💥 > Trained on 1 MILLION hours of data 🤯 > Contextually aware, emotionally intelligent speech > Voice cloning & watermarking > Ultra fast, real-time synthesis > Based on llama https://x.com/reach_vb/status/1900304515376799915

“Introducing the Perplexity Windows app. Access voice dictation, keyboard shortcuts, and the latest models with the official Perplexity desktop app for PC. https://x.com/perplexity_ai/status/1899498357154107499

“OMG did another open source song gen model just drop? 🧑‍🎤 🎵DiffRhythm is “the first latent diffusion-based song generation model capable of synthesizing complete songs of up to 4m45s in only 10(!!) seconds” 🤯🤯 https://x.com/linoy_tsaban/status/1896857873608827196