Flux[dev]: A surreal machine with letters and text going into it and music notes coming out. The machine is operated by a futuristic robot, sleek smooth humanoid design. Smooth, glossy black faceplate with no visible facial features, high-tech, minimalist appearance. The robot’s body is matte black or dark gray, with articulated joints and mechanical parts that resemble those of a human, including fingers. In the background, “Multimodal” is written in fog.

Zuckerberg touts Meta’s latest video vision AI with Nvidia CEO Jensen Huang | TechCrunch

AI can see what’s on your screen by reading HDMI electromagnetic radiation | TechSpot – https://www.techspot.com/news/104015-ai-can-see-what-screen-reading-hdmi-electromagnetic.html

OpenAI invests in a webcam company turned AI startup. – The Verge

“OpenAI is leading a $60 million investment in Opal, which sells $300 professional webcams and plans to develop other types of devices powered by OpenAI’s AI models. w/ @steph_palazzolo

“Want a robot to assist you in the kitchen **without any instructions** simply by watching you?🤖🏠 🚀 Presenting our recent paper on action anticipation from short video context for human-robot collaboration, accepted at Robotics and Automation Letters (RA-L).

“Although there is probably too much AI hype these days, I am excited about my Ray-Ban smart glasses for many reasons (e.g., listening to music, live streaming, image capture etc.). The “killer app” is that these glasses are now powered by Meta’s Llama AI model! 1/3

“Idefics3-Llama is out! 💥 It’s a multimodal model based on Llama 3.1 that accepts an arbitrary number of interleaved images with text and has a huge context window (10k tokens!) 😍 Link to demo and model in the next one 😏
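
If you want to try interleaved image + text prompting yourself, here is a minimal sketch using the transformers AutoProcessor / AutoModelForVision2Seq interface. The model id `HuggingFaceM4/Idefics3-8B-Llama3` and the chat-template format are assumptions based on the usual Idefics-style usage; check the model card and make sure your transformers version includes Idefics3 support before running.

```python
# Hedged sketch: querying Idefics3-Llama with an image plus text via transformers.
# Model id and prompt format are assumptions; verify against the model card.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # any local image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```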

“This is clever. A diffusion model picks up features in common across datasets and we can use that to find subtle visual patterns. It identifies geographies like a geo-guesser (utility poles, bollards), decades by eyeglass shape & fashion trends, and shows promise for medicine

“MiniCPM V 2.6 is out! 🤩 A VLM marrying SigLIP 400M 🤝🏻 Qwen2-7B 💪🏻 It outperforms proprietary models on OpenCompass and video benchmarks 🎬 It accepts multiple images and videos and can do in-context learning. I will unpack the details once the 2.6 technical report is out 😊
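
A minimal chat sketch, assuming the `openbmb/MiniCPM-V-2_6` checkpoint and the remote-code `chat()` interface its model card describes; argument names here are taken from that usage pattern and should be double-checked against the released card.

```python
# Hedged sketch: single-image chat with MiniCPM-V 2.6 via its trust_remote_code interface.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"  # assumed checkpoint name
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True,
    attn_implementation="sdpa", torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "What is in the image?"]}]

# chat() is the model's custom generation helper exposed via remote code.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```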

Segmentation

“Our SAM 2 pod with @nikhilaravi is out! Fun SAM1 quote from guest cohost @josephofiowa: “I recently pulled statistics from the usage of SAM in @RoboFlow over the course of the last year. And users have labeled about 49 million images using SAM on the hosted side of the RoboFlow

“SAM 2 from Meta FAIR is the first unified model for real-time, promptable object segmentation in images & videos. Using the model in our web-based demo you can segment, track and apply effects to objects in video in just a few clicks. Try SAM 2 ➡️
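
Beyond the web demo, single-image prompting is a few lines of code. A minimal sketch, assuming the segment-anything-2 repository is installed and the large checkpoint downloaded per its README; the click coordinates are placeholders.

```python
# Hedged sketch: prompting SAM 2 on one image with a single foreground click.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Paths follow the repository's README; adjust to wherever you put the checkpoint.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# One positive click (x, y) as the prompt; label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)
```

The same predictor also accepts box prompts, and the video predictor in the repo propagates prompts across frames for tracking.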

“Grounded SAM 2: Ground and Track Anything in Videos 

“Computer vision + Journalism + #Olympics = 😍 – The @nytimes used computer vision to detect the positions of the athletes in photos taken every 100 ms – Speeds were then computed by combining their positions and the timestamp of each photograph – Manual verification and
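
The speed computation described in the thread reduces to displacement over elapsed time between consecutive photos. A minimal sketch with made-up positions (already mapped to metres on the track) and timestamps 100 ms apart:

```python
# Hedged sketch of the speed step: distance between consecutive detections
# divided by the time between photos. All numbers below are illustrative.
from math import hypot

# (timestamp_seconds, x_metres, y_metres) for one athlete, one sample per photo
track = [(0.0, 0.00, 0.00), (0.1, 0.95, 0.02), (0.2, 1.92, 0.05), (0.3, 2.90, 0.09)]

speeds = []
for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
    distance = hypot(x1 - x0, y1 - y0)   # metres moved between consecutive photos
    speeds.append(distance / (t1 - t0))  # metres per second

print([round(v, 2) for v in speeds])  # -> [9.5, 9.7, 9.81]
```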
