a realistic baroque oil painting in an ornate gilded frame, depicting a deconstructed scene with a variety of image and audio tools, a small, elegant nameplate at the bottom of the frame is engraved with the title: “Multimodal” --chaos 20 --ar 4:3 --style raw --personalize t9u6ckr --v 6.1

“French startup Kyutai just introduced Moshi, an open-sourced ‘real-time’ AI voice assistant. It’s capable of responding to a range of emotions and styles in a similar fashion to OpenAI’s Voice Mode 

“SenseTime also revealed SenseNova 5o, a real-time multimodal model capable of processing audio, text, image, and video. Here’s a video of a live demonstration of SenseTime 5o in action (it’s incredibly similar to the GPT-4o demo) 

“🚨 Chinese AI company SenseTime just revealed SenseNova 5.5, an AI model that claims to beat GPT-4o across key metrics. Plus, big developments from Apple, YouTube, KLING, Neuralink, and Google DeepMind. Here’s everything going on in AI right now:” / X

“At the World Artificial Intelligence Conference (WAIC) in Shanghai this weekend, SenseTime unveiled SenseNova 5.5. The company claims the model outperforms GPT-4o in 5 out of 8 key metrics. While I’d take it with a grain of salt, China’s AI startups are showing major progress 

“Vision language models can see. We introduce a new benchmark named Avocado360, and evaluate four arbitrarily selected VLMs on this benchmark. We show, for the first time ever, that VLMs can determine whether or not a given image contains an avocado.” / X

Deploying ML for Voice Safety

OpenAI

“Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍 Demo (+ source code) 👇 

“Game-changer alert: Navigate your video by clicking transcribed words with Whisper Timestamped! 🚀 Key features: – Multilingual transcription (35+ languages) – Click any word to jump to that moment in the video – Works with audio & video files – 100% browser-based for total 

Segmentation

ConceptExpress: Unsupervised Concept Extraction (UCE): We focus on the unsupervised problem of extracting multiple concepts from a single image. Given an image that contains multiple concepts, we aim to harness a frozen pretrained diffusion model to automatically learn the conceptual tokens. Using the learned conceptual tokens, we can regenerate the extracted concepts with high quality.

Researchers leverage shadows to model 3D scenes, including objects blocked from view | MIT News | Massachusetts Institute of Technology

“Cool demo by @mervenoyann for real-time object tracking with RT-DETR https://twitter.com/fdaudens/status/1811029049638011000

Heads up! You’ve scrolled to the end of this category. There may have been just one or two links (above), so go back up and double check to be sure you didn’t quickly scroll down past it.

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #41: Week Ending 07/12/2024 with Executive Summary and Top 58 Links 

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of the week’s must-click links, giving everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.

For previous issues, please visit the archives!

Thanks for reading!


Discover more from Ethan B. Holland
