People are asking what’s the difference between Falcon Perception and SAM3, so here’s my opinion. SAM3:
https://t.co/KVRbuHm8H1 Falcon Perception:
https://t.co/QDgMlOBvDH First, SAM3 does “promptable concept segmentation”: simple noun phrases (like “yellow bus”, “red apple”) +
https://x.com/dahou_yasser/status/2041474094252933195
Today we’re releasing WildDet3D–an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵
https://x.com/allen_ai/status/2041545111151022094
I noticed there wasn’t anything like this out there, so I wrote a tiny visual blog for those wanting to introduce themselves to Dynamic Gaussian Splatting and its current methods 🖼️ Feel free to check it out; these are some of the visuals taken from it https://t.co/6W2qx2yI1K
https://x.com/pabloadaw/status/2041650303804555278
We’re excited to be rolling out two model updates today! Marble 1.1: Improves lighting and contrast, with a major reduction in visual artifacts. Marble 1.1-Plus: Our new model built for scale. Create larger, more complex environments than ever before.
https://x.com/theworldlabs/status/2041554646561677701
I showed you SAM 3 all week. This is a 0.6B model that outperforms it. Falcon Perception. Type “detect the plane” and it segments every plane in the frame. Pixel-accurate masks from natural language. Fighter jets. Fire. Crowds. All on a MacBook via MLX. No cloud.
https://x.com/MaziyarPanahi/status/2040776481673281936
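The entry above describes the interface shape these models share: a natural-language prompt in, one pixel mask per matching instance out. A minimal toy sketch of that flow is below; the class name, method signature, and brightness-threshold stand-in logic are all hypothetical illustrations, not the Falcon Perception or SAM 3 API.

```python
import numpy as np

class ToyConceptSegmenter:
    """Hypothetical text-promptable segmenter: prompt in, instance masks out.

    The real models run a vision-language network; here a brightness
    threshold stands in so the prompt -> masks contract is runnable.
    """

    def segment(self, image: np.ndarray, prompt: str) -> list[np.ndarray]:
        """Return one boolean mask per detected instance of `prompt`."""
        # Stand-in logic: treat every bright region as one instance.
        mask = image.mean(axis=-1) > 128
        return [mask] if mask.any() else []

# 4x4 RGB frame with a single bright 2x2 "object".
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = 255
masks = ToyConceptSegmenter().segment(image, "detect the plane")
print(len(masks))  # prints 1
```

The point of the contract is that downstream code only sees a list of per-instance boolean masks, so swapping in a real model changes nothing but the `segment` body.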
i built this 4d reconstruction of iran’s chokehold on the world’s oil. almost nothing is getting through the strait of hormuz. you can clearly see the path ships took before/after the blockade, and can even detect & track “dark vessels” getting through. 0:00 – clearly see the
https://x.com/bilawalsidhu/status/2040627050864574759
I turned on ship tracking in God’s Eye View and watched the Strait of Hormuz go dark. Ship crossings went from hundreds a day to a handful. Some days, none. You can see the new route clearly — Iran’s tightly run “toll booth.” Built dark vessel detection to track ships going
https://x.com/bilawalsidhu/status/2040198848644083958
That time when DARPA turned the whole atmosphere into a sensor
https://x.com/bilawalsidhu/status/2041696646124171647
This is the most important step if you want to join the radiance field club
https://x.com/Almorgand/status/2041844305468284945
Falcon Perception is unbelievable! Look at the demo video!
https://x.com/ivanfioravanti/status/2040886300971004270
JRM: Joint Reconstruction Model for Multiple Objects without Alignment. TL;DR: jointly reconstructs objects from unaligned observations using a 3D flow-matching model, removing the need for explicit alignment
https://x.com/Almorgand/status/2040048419993985103
Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models. TL;DR: injects vision-language knowledge into diffusion-based 3D generation to make unseen regions controllable and semantically consistent
https://x.com/Almorgand/status/2040420958532514067
PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models. TL;DR: diffusion pipeline for scalable generation of photorealistic human data with 3D annotations
https://x.com/Almorgand/status/2040096997470843366
StereoVGGT: A Training-Free Visual Geometry Transformer for Stereo Vision. TL;DR: adapts a pretrained 3D-aware transformer to stereo vision with a training-free pipeline, achieving SOTA performance on KITTI
https://x.com/Almorgand/status/2041569246883332385
Subtle hand held camera shake will convince people a 3d game render is real life footage. No generative ai required.
https://x.com/bilawalsidhu/status/2041643400433201384
PoseDreamer: Scalable Photorealistic Human Data Generation with Diffusion Models
https://prosperolo.github.io/posedreamer/
We always need more visuals! Check out this one on dynamic Gaussian splatting
https://x.com/Almorgand/status/2041773431524302968
Robots can now reconstruct 3D scenes in real time from a single RGB camera. [📍 Project page + paper] No depth sensor. No retraining. 30 FPS. Researchers at Imperial College London introduced KV-Tracker, a training-free method that makes heavy models like π³ and Depth
https://x.com/IlirAliu_/status/2041062366025031787
Researchers just taught a robot to play tennis. From just clips of a few amateur players performing basic forehands, backhands, and shuffles… …a robot learned one of the fastest, most coordinated physical skills there is. Insane!
https://x.com/rowancheung/status/2040085788256506190
Robotics pre-training *from scratch* has been a heretical idea for the last two years. That “there’s no internet of robotics data” has led to two prevailing conclusions: 1) we need to use pretrained model backbones and 2) we need to scale robotics data. The first conclusion in
https://x.com/xiao_ted/status/2041547335935853025
Avatar V: Scaling Video-Reference Avatar Generation
https://www.heygen.com/research/avatar-v-model




