Zuckerberg touts Meta’s latest video vision AI with Nvidia CEO Jensen Huang | TechCrunch
AI can see what’s on your screen by reading HDMI electromagnetic radiation | TechSpot – https://www.techspot.com/news/104015-ai-can-see-what-screen-reading-hdmi-electromagnetic.html
OpenAI invests in a webcam company turned AI startup. – The Verge
“OpenAI is leading a $60 million investment in Opal, which sells $300 professional webcams and plans to develop other types of devices powered by OpenAI’s AI models. w/ @steph_palazzolo
“Want a robot to assist you in the kitchen **without any instructions** simply by watching you?🤖🏠 🚀 Presenting our recent paper on action anticipation from short video context for human-robot collaboration, accepted at Robotics and Automation Letters (RA-L).
“Although there is probably too much AI hype these days, I am excited about my Ray-Ban smart glasses for many reasons (e.g., listening to music, live streaming, image capture, etc.). The “killer app” is that these glasses are now powered by Meta’s Llama AI model! 1/3
“Idefics3-Llama is out! 💥 It’s a multimodal model based on Llama 3.1 that accepts an arbitrary number of interleaved images with text with a huge context window (10k tokens!) 😍 Link to demo and model in the next one 😏
“This is clever. A diffusion model picks up features in common across datasets and we can use that to find subtle visual patterns. It identifies geographies like a geo-guesser (utility poles, bollards), decades by eyeglasses shape & fashion trends, and shows promise for medicine
“MiniCPM V 2.6 is out! 🤩 A VLM marrying SigLIP 400M 🤝🏻 Qwen2-7B 💪🏻 Outperforms proprietary models on OpenCompass benchmarks and video benchmarks 🎬 Accepts multiple images, videos, can do in-context learning I will unpack it with details once 2.6 technical report is out 😊
Segmentation
“Our SAM 2 pod with @nikhilaravi is out! Fun SAM1 quote from guest cohost @josephofiowa: “I recently pulled statistics from the usage of SAM in @RoboFlow over the course of the last year. And users have labeled about 49 million images using SAM on the hosted side of the RoboFlow
“SAM 2 from Meta FAIR is the first unified model for real-time, promptable object segmentation in images & videos. Using the model in our web-based demo you can segment, track and apply effects to objects in video in just a few clicks. Try SAM 2 ➡️
“Grounded SAM 2: Ground and Track Anything in Videos
“Computer vision + Journalism + #Olympics = 😍 – The @nytimes used computer vision to detect the positions of the athletes on photos taken every 100ms – Speeds were then computed by combining their positions and the timestamp of each photograph – Manual verification and
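The NYT pipeline described in that tweet boils down to simple kinematics: detect each athlete’s position in photos taken every 100 ms, then divide the distance between successive detections by the elapsed time. A minimal sketch of that speed calculation (the helper name and the sample coordinates are illustrative, not from the NYT’s actual code):

```python
import math

def speeds_from_positions(positions, timestamps):
    """Estimate speeds (m/s) from successive (x, y) positions in meters
    and capture timestamps in seconds, as with photos taken every 100 ms."""
    speeds = []
    for (x0, y0), (x1, y1), t0, t1 in zip(
        positions, positions[1:], timestamps, timestamps[1:]
    ):
        dist = math.hypot(x1 - x0, y1 - y0)  # straight-line distance between detections
        speeds.append(dist / (t1 - t0))      # speed = distance / elapsed time
    return speeds

# Illustrative: a sprinter detected every 100 ms, moving about 1 m per frame
positions = [(0.0, 0.0), (1.0, 0.0), (2.01, 0.0)]
timestamps = [0.0, 0.1, 0.2]
print(speeds_from_positions(positions, timestamps))
```

In practice the hard part is the detection step (mapping pixel positions in each photo to real-world track coordinates); once positions and timestamps are aligned, the speed math is just this.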
![Flux[dev]: A surreal machine with letters and text going into it and music notes coming out. The machine is operated by a futuristic robot, sleek smooth humanoid design. Smooth, glossy black faceplate with no visible facial features, high-tech, minimalist appearance. The robot's body is matte black or dark gray, with articulated joints and mechanical parts that resemble those of a human, including fingers. In the background, "Multimodal" is written in fog.](https://ethanbholland.com/wp-content/uploads/2024/09/multimodal-2.png)



