Grok 2 (beta) A hotel concierge holds an iPhone toward the camera. The phone screen displays the word “Multimodal”. Behind the desk on the wall are several large TVs. Landscape aspect ratio.
“RecLoRA tackles personalization in LLMs for recommendation systems. Incorporates a Personalized LoRA module that maintains independent LoRAs for different users and a Long-Short Modality Retriever that retrieves different history lengths for different modalities, significantly
“Building a Multimodal Recipe Recommender with @qdrant_engine, @llama_index, and Gemini 🧑‍🍳 This is a strikingly easy-to-follow tutorial by Benito Martin on building a multimodal RAG pipeline that can ingest a playlist of YouTube videos (including video descriptions and the video
Sonova’s AI hearing aids offer crystal-clear speech in noisy places
“Our new study is out today in the New England Journal of Medicine! We demonstrate a speech neuroprosthesis that decodes the attempted speech of a man with ALS into text with 97.5% accuracy, enabling him to communicate with his family, friends, and colleagues in his own home. 1/9
[2408.04388v1] MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models
“Language Model Can Listen While Speaking Overview: This paper introduces the listening-while-speaking language model (LSLM) for real-time, interactive speech dialogue. LSLM supports simultaneous listening and speaking, allowing interruptions during responses. It combines a TTS
“Designing a website will never be the same again. Take a look at this quick video I recorded. I generated a complete landing page in Figma in less than 60 seconds. This is a game-changer for people like me, who don’t know much about design but want to create something decent with
OpenAI
ChatGPT unexpectedly began speaking in a user’s cloned voice during testing | Ars Technica
Meet Gemini Live: a new way to have more natural conversations with Gemini (Google DeepMind) : r/singularity
“We’re introducing Gemini Live, a more natural way to interact with Gemini. You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call. Available to Gemini Advanced subscribers. #MadeByGoogle
“Google’s Gemini Live AI voice assistant is coming to Pixel Pro earbuds. Just like Joaquin Phoenix’s earbud in ‘Her.’
“Meet Gemini Live: a new way to have more natural conversations with Gemini. 💬 💡 Brainstorm ideas ❓ Interrupt to ask questions ⏸️ Pause a chat and come back to it Now rolling out in English to Gemini Advanced subscribers on @Android phones →
Using Gemini Live was faster than Google, but also more awkward – The Verge
Segmentation
ClickAttention: Click Region Similarity Guided Interactive Segmentation




