Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Cinematic shot of person wearing modern VR headset in darkened room, with the glowing Emerald City from Wicked appearing as layered holographic segments before them, each building and spire outlined with bright green object segmentation boundaries, moody theatrical lighting, split composition showing both real and virtual worlds, dramatic depth of field.

For AI to be able to help humans in the physical world, we need systems that can understand and simulate the universe. To exponentially accelerate Luma’s path to Multimodal AGI we are building a 2GW compute cluster with Humain and we have raised a $900M Series C. I am incredibly https://x.com/gravicle/status/1991202746871988680

Meta just dropped SAM 3D, but more interestingly, they basically cracked the 3D data bottleneck that’s been holding the field back for years. Manually creating or scanning 3D ground truth for the messy real world is nearly impossible at scale. But what if you just have https://x.com/bilawalsidhu/status/1991237143898017854

Introducing SAM 3D: Powerful 3D Reconstruction for Physical World Images https://ai.meta.com/blog/sam-3d/

SAM 3D enables accurate 3D reconstruction from a single image, supporting real-world applications in editing, robotics, and interactive scene generation. Matt, a SAM 3D researcher, explains how the two-model design makes this possible for both people and complex environments. https://x.com/AIatMeta/status/1991605451809513685

Introducing SAM 3D, the newest addition to the SAM collection, bringing common sense 3D understanding of everyday images. SAM 3D includes two models: 🛋️ SAM 3D Objects for object and scene reconstruction 🧑‍🤝‍🧑 SAM 3D Body for human pose and shape estimation Both models achieve https://x.com/AIatMeta/status/1991184188402237877

We’re sharing model checkpoints, an evaluation benchmark, human body training data, and inference code with the community to support creative applications in fields like robotics, interactive media, science, sports medicine, and beyond. 🔗 SAM 3D Body: https://x.com/AIatMeta/status/1991184190323212661

Meta AI Demos https://aidemos.meta.com/segment-anything

Introducing Meta Segment Anything Model 3 and Segment Anything Playground https://ai.meta.com/blog/segment-anything-model-3/

SAM-3 is out on @huggingface! A big upgrade from SAM-2, and Meta finally added support for text prompts. Here I tried it out on @hazardeden10’s magical goal against @Arsenal using the text prompt “Chelsea player”. Works pretty well! https://x.com/NielsRogge/status/1991213874687758799
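If you want to try the text-prompt workflow yourself, here is a rough sketch of what it looks like. The model-facing names (Sam3Processor, Sam3Model, the facebook/sam3 checkpoint id) are assumptions I have not verified against the released integration, so they are left commented out; the mask-overlay helper around them is plain numpy/PIL and runs as-is with a placeholder mask. Swap in the real output once you have confirmed the actual class names on the Hugging Face model card.

```python
# Hypothetical sketch of text-prompted segmentation with SAM 3.
# The commented-out model names and checkpoint id are assumptions,
# not verified against the released Hugging Face integration.
import numpy as np
from PIL import Image

def overlay_masks(image: Image.Image, masks: list[np.ndarray], alpha: float = 0.5) -> Image.Image:
    """Tint every pixel covered by any mask green; masks are HxW booleans."""
    frame = np.array(image.convert("RGB")).astype(np.float32)
    covered = np.zeros(frame.shape[:2], dtype=bool)
    for m in masks:
        covered |= m.astype(bool)
    green = np.array([0.0, 255.0, 0.0])
    frame[covered] = (1 - alpha) * frame[covered] + alpha * green
    return Image.fromarray(frame.astype(np.uint8))

# --- model call (names assumed, check the model card) ---
# from transformers import Sam3Processor, Sam3Model        # hypothetical names
# processor = Sam3Processor.from_pretrained("facebook/sam3")
# model = Sam3Model.from_pretrained("facebook/sam3")
# inputs = processor(images=image, text="Chelsea player", return_tensors="pt")
# masks = ...  # per-instance masks for the prompted concept

image = Image.open("match_frame.jpg")
masks = [np.zeros((image.height, image.width), dtype=bool)]  # placeholder output
overlay_masks(image, masks).save("match_frame_segmented.png")
```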

Collecting a high quality dataset with 4M unique phrases and 52M corresponding object masks helped SAM 3 achieve 2x the performance of baseline models. Kate, a researcher on SAM 3, explains how the data engine made this leap possible. 🔗 Read the SAM 3 research paper: https://x.com/AIatMeta/status/1991640180185317644

SAM3 video tracking is so good. Yesterday: collect data, train a custom object detector, use a tracker to estimate object motion (days). Today: track anything with a text prompt (seconds). https://x.com/skalskip92/status/1991232397686219032

We’ve partnered with @Roboflow to enable people to annotate data, fine-tune, and deploy SAM 3 for their particular needs. Try it here: https://x.com/AIatMeta/status/1991191530367799379

SAM 3 tackles a challenging problem in vision: unifying a model architecture for detection and tracking. Christoph, a researcher on SAM 3, shares how the team made it possible. 🔗 Read the SAM 3 research paper: https://x.com/AIatMeta/status/1991538570402934980

SAM 3 is an open-source model. You can use the models commercially. You can modify or fine-tune them. You keep ownership of your modifications. You do not need to release your source code. https://x.com/skalskip92/status/1991626755782877234

Today we are releasing & open-sourcing Segment Anything 3 (SAM 3). It is a state-of-the-art model for image & video segmentation, and builds upon the work of SAM & SAM 2. SAM3 will also power features in Edits, Meta AI, & Facebook Marketplace soon. https://x.com/alexandr_wang/status/1991198465628459494

Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: https://x.com/AIatMeta/status/1991178519557046380

We’re releasing the following ⤵️ video segmentation demo with visual/concept prompting ⏯️ https://x.com/mervenoyann/status/1991182168161136684

AI can now create AND explore 3D worlds. World models and agentic AI are on a collision course. World Labs is making world-building effortless. Google DeepMind’s SIMA-2 is making agency inside those worlds possible. Together, they hint at a new paradigm–AI that both creates https://x.com/bilawalsidhu/status/1990994808626950579

Google DeepMind has introduced SIMA 2, a reasoning, conversational AI agent for 3D worlds including games and generative world-model scenes. – Handles complex goals, explains steps, supports multilingual/emojis for collaborative play. – Adapts to real-time generated 3D worlds https://x.com/TheHumanoidHub/status/1989424462085960082

Google DeepMind’s SIMA 1 vs SIMA 2. The bitter lesson continues to be bittersweet https://x.com/bilawalsidhu/status/1989001120849735898

NVIDIA researchers present SONIC, a generalist humanoid controller: It scales motion tracking on a single policy to achieve natural, robust whole-body movement. The scalable foundation avoids manual reward engineering and features a universal token space and kinematic planner to https://x.com/TheHumanoidHub/status/1989409669983736306

If you work with robotics, AV, or 3D vision, this update will save you months of engineering. Most models need complex engineering to get reliable 3D geometry. This one does it with a plain transformer. Depth Anything 3 is the new model from @BytedanceTalk that predicts stable, https://x.com/IlirAliu_/status/1989622721366446190

Depth Anything 3 proves most 3D vision research has been overengineering the problem. Vanilla DINOv2 transformer + depth-ray pairs crushes SOTA by 44% on pose, 25% on geometry. One approach for SOTA monocular depth, multi-view geometry, pose estimation, and novel view synthesis https://x.com/bilawalsidhu/status/1989444908357488832

ByteDance-Seed/Depth-Anything-3: Depth Anything 3 https://github.com/ByteDance-Seed/Depth-Anything-3

Depth Anything 3 is here! It’s a beefy one! https://x.com/Almorgand/status/1989370456131215514

After a year of team work, we’re thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 https://x.com/bingyikang/status/1989358267668336841
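To ground what this looks like in practice, here is a minimal monocular-depth sketch using the Hugging Face depth-estimation pipeline. The pipeline task and the Depth Anything V2 checkpoint shown are real and run today; whether DA3 ships the same transformers integration, and under which checkpoint id, is an assumption to verify against the ByteDance-Seed/Depth-Anything-3 repo linked above.

```python
# Minimal monocular depth sketch. The pipeline call and the V2 checkpoint
# below are real; whether Depth Anything 3 exposes the same pipeline, and
# under which checkpoint id, is an assumption to check against the repo.
from PIL import Image
from transformers import pipeline

pipe = pipeline(task="depth-estimation",
                model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("room.jpg")
result = pipe(image)

result["depth"].save("room_depth.png")   # PIL depth map, near = bright
print(result["predicted_depth"].shape)   # raw depth tensor
```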

Damn. DeepMind’s generalist AI agent SIMA 2 evolved from basic instruction-following to actual reasoning companion. Uses vision and keyboard/mouse like a human player, works across dozens of games without touching game code. The robotics angle is obvious – if you can generalize https://x.com/bilawalsidhu/status/1988986033669828985

Researchers are exploring Marble’s generative 3D worlds as a way to rapidly produce simulation-ready environments for robotics without manual scene construction. https://x.com/theworldlabs/status/1991918801714332137

⚡ FlashWorld: High-quality 3D Scene Generation within Seconds TL;DR: “FlashWorld enables fast (5~10 seconds on a single GPU) and high-quality 3D scene generation across diverse scenes, from a single image or text prompt.” https://x.com/Almorgand/status/1988977382003470675

Hand-controlled hologram boids: Most people have seen hologram tricks. Very few know you can build one that reacts to your hand for about 100 dollars. This demo shows a hand-controlled boids simulation: a small flock of digital particles that moves based on your gestures. No https://x.com/IlirAliu_/status/1989259566740054065
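For anyone curious what the boids half of that demo involves, the algorithm itself is tiny: each particle steers by separation, alignment, and cohesion, plus (here) a pull toward a tracked hand position. This is a generic toy sketch with made-up parameter values, not code from the linked project.

```python
# Toy 2D boids update steered by a hand position (e.g. from hand tracking).
# Parameter values are illustrative only.
import numpy as np

def boids_step(pos, vel, hand, dt=0.05,
               sep_r=0.5, sep_w=0.1, align_w=0.05, coh_w=0.01, hand_w=0.02,
               max_speed=2.0):
    """pos, vel: (N, 2) arrays; hand: (2,) attraction point."""
    diff = pos[:, None, :] - pos[None, :, :]           # pairwise offsets
    dist = np.linalg.norm(diff, axis=-1) + 1e-9
    near = dist < sep_r

    separation = (diff / dist[..., None] * near[..., None]).sum(axis=1)
    alignment = vel.mean(axis=0) - vel                  # steer toward mean heading
    cohesion = pos.mean(axis=0) - pos                   # steer toward flock centre
    to_hand = hand - pos                                # steer toward the hand

    vel = vel + sep_w * separation + align_w * alignment \
              + coh_w * cohesion + hand_w * to_hand
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    vel = np.where(speed > max_speed, vel / speed * max_speed, vel)
    return pos + vel * dt, vel

rng = np.random.default_rng(0)
pos, vel = rng.uniform(-1, 1, (60, 2)), rng.normal(0, 0.1, (60, 2))
for _ in range(200):                                    # pretend the hand sits at the origin
    pos, vel = boids_step(pos, vel, hand=np.array([0.0, 0.0]))
```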

What was a complex, hacky pipeline in 2023 for taking indoor 3D scans and reskinning them with different types of decor is now just a few clicks in 2025. World Labs’ Marble has collapsed a lot of the complexity involved in generating and editing 3D worlds: https://x.com/bilawalsidhu/status/1988958359412961743

Simulating the Visual World with Artificial Intelligence: A Roadmap https://world-model-roadmap.github.io/

Gaussian splatting training at 200 iterations or Space Telescope capture? https://x.com/Almorgand/status/1988919187381653849

OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication https://humanaigc.github.io/omnitalker/

We’re advancing on-device AI with ExecuTorch, now deployed across devices including Meta Quest 3, Ray-Ban Meta, Oakley Meta Vanguard and Meta Ray-Ban Display. By eliminating conversion steps and supporting pre-deployment validation in PyTorch, ExecuTorch accelerates the path https://x.com/AIatMeta/status/1991901746579509542
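For context on what “eliminating conversion steps” means in practice, the ExecuTorch export path stays inside PyTorch: capture a graph with torch.export, lower it to the Edge dialect, and serialize a .pte file for the on-device runtime. The sketch below follows that flow as I understand it from the docs; exact module paths can shift between ExecuTorch releases, so verify against the current documentation before relying on it.

```python
# Rough sketch of the standard ExecuTorch export flow as I understand it
# (torch.export -> to_edge -> .pte); treat module paths as version-dependent.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x) * 2

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

exported = torch.export.export(model, example_inputs)   # capture a graph in PyTorch
edge = to_edge(exported)                                 # lower to the Edge dialect
et_program = edge.to_executorch()                        # build the ExecuTorch program

with open("tiny_model.pte", "wb") as f:                  # binary consumed by the
    f.write(et_program.buffer)                           # on-device runtime
```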

Pretty cool hack to blend between different video feeds to give you the feeling of free-viewpoint video, a.k.a. a god’s-eye view. TL;DR: 36 cameras deployed at basketball & badminton venues for China’s National Games, letting viewers drag around on their phones for different angles https://x.com/bilawalsidhu/status/1989362893243154501
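The core trick is simpler than it sounds: with enough synchronised cameras arranged around the venue, “dragging the view” is mostly a weighted cross-fade between the two nearest feeds. A toy sketch under that assumption (ignoring the warping and stabilisation a production system would add):

```python
# Minimal free-viewpoint sketch: cross-fade between the two nearest of N
# ring cameras as a function of a continuous view position. Assumes frames
# are already time-synchronised HxWx3 uint8 arrays; real systems also warp
# and stabilise the frames, which is omitted here.
import numpy as np

def free_viewpoint_frame(frames: list[np.ndarray], view_pos: float) -> np.ndarray:
    """frames: one frame per camera, ordered around the venue.
    view_pos: continuous camera index in [0, len(frames) - 1]."""
    lo = int(np.floor(view_pos))
    hi = min(lo + 1, len(frames) - 1)
    t = view_pos - lo                                   # blend weight toward `hi`
    blended = (1 - t) * frames[lo].astype(np.float32) + t * frames[hi].astype(np.float32)
    return blended.astype(np.uint8)

# e.g. 36 synchronised cameras; dragging maps touch position -> view_pos
cams = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(36)]
frame = free_viewpoint_frame(cams, view_pos=17.3)       # mostly camera 17, a bit of 18
```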

Robot tiles that let you walk forever in VR. Researchers at the University of Tsukuba built a tile that moves under your feet as you walk. It uses sensors to read your gait and predict where your next step will land. ✅ Each tile slides into place before your foot touches down. https://x.com/IlirAliu_/status/1989407343591800921

Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. – Ultra long-horizon tasks – Zero-shot generalization – Advanced dexterity 🧵-> https://x.com/tonyzzhao/status/1991204839578300813
