a realistic baroque oil painting in an ornate gilded frame, depicting a deconstructed scene with a variety of image and audio tools, a small, elegant nameplate at the bottom of the frame is engraved with the title: “Multimodal” –chaos 20 –ar 4:3 –style raw –personalize t9u6ckr –v 6.1
“French startup Kyutai just introduced Moshi, an open-sourced ‘real-time’ AI voice assistant. It’s capable of responding to a range of emotions and styles in a similar fashion to OpenAI’s Voice Mode
“SenseTime also revealed SenseNova 5o, a real-time multimodal model capable of processing audio, text, image, and video. Here’s a video of a live demonstration of SenseTime 5o in action (it’s incredibly similar to the GPT-4o demo)
“🚨 Chinese AI company SenseTime just revealed SenseNova 5.5, an AI model that claims to beat GPT-4o across key metrics. Plus, big developments from Apple, YouTube, KLING, Neuralink, and Google DeepMind. Here’s everything going on in AI right now:” / X
“At the World Artificial Intelligence Conference (WAIC) in Shanghai this weekend, SenseTime unveiled SenseNova 5.5. The company claims the model outperforms GPT-4o in 5 out of 8 key metrics. While I’d take it with a grain of salt, China’s AI startups are showing major progress
“Vision language models can see. We introduce a new benchmark named Avocado360, and evaluate four arbitrarily selected VLMs on this benchmark. We show, for the first time ever, that VLMs can determine whether or not a given image contains an avocado.” / X
Deploying ML for Voice Safety
OpenAI
“Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍 Demo (+ source code) 👇
“Game-changer alert: Navigate your video by clicking transcribed words with Whisper Timestamped! 🚀 Key features: – Multilingual transcription (35+ languages) – Click any word to jump to that moment in the video – Works with audio & video files – 100% browser-based for total
Segmentation
ConceptExpress: Unsupervised Concept Extraction (UCE): We focus on the unsupervised problem of extracting multiple concepts from a single image. Given an image that contains multiple concepts, we aim to harness a frozen pretrained diffusion model to automatically learn the conceptual tokens. Using the learned conceptual tokens, we can regenerate the extracted concepts with high quality.
Researchers leverage shadows to model 3D scenes, including objects blocked from view | MIT News | Massachusetts Institute of Technology
“Cool demo by @mervenoyann for real-time object tracking with RT-DETR https://twitter.com/fdaudens/status/1811029049638011000

Be Sure To Read This Week’s Main Post:
This week’s executive overview and top links are here:
AI News #41: Week Ending 07/12/2024 with Executive Summary and Top 58 Links
The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I write an accessible overview so that laypeople can feel conversant with the week’s AI developments. I include a curated list of the week’s must-click links, giving everyone a hands-on way to explore the most intriguing updates in artificial intelligence across categories including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.
- Agents/Copilots
- Amazon
- Apple
- Artificial General Intelligence (AGI)
- Augmented and Virtual Reality (AR/VR)
- Autonomous Vehicles
- AI Audio
- Business and Enterprise AI
- Chips and Hardware
- Consumer Products
- Education
- Ethics/Legal Security
- Images/Photos
- International AI News
- Locally Run AI Models
- Mobile
- Meta
- Microsoft
- OpenAI
- Open Source
- Podcasts/YouTube
- Publishing and News
- Retrieval-Augmented Generation (RAG) News
- Robots and Embodiment
- Safe Intelligence, Inc.
- Science and Medicine
- Video
- Vision/Multimodality
- X/Twitter/Grok
- Tech and Development
Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive/
- Andrej Karpathy: https://x.com/karpathy
- Brett Adcock: https://x.com/adcock_brett
- Florent Daudens: https://x.com/fdaudens
- Ate-a-Pi: https://x.com/8teAPi
- Francesco Marconi: https://x.com/fpmarconi
- Charlie Beckett: https://x.com/CharlieBeckett
For previous issues, please visit the archives!

Thanks for reading!




