Image created with OpenAI GPT-Image-1. Image prompt: over-the-top 1990s pro-wrestling promo poster, virtual grid arena featuring “AR-V.R. Vortex” wearing neon visor and pixelated gauntlets; hologram crowd effects, grainy print texture, vivid neon titles

Capturing reality is a damn near superpower. Pretty cool to see how much Veo 3 understands 3D mapping concepts — including geometry types, terrain maps, camera poses, detections, trajectories, etc. https://x.com/bilawalsidhu/status/1947002004275904537

Train and deploy robot skills in the cloud… with just one click. Gr00t‑n1.5 is now live on Phosphobot, making training and inference simpler than ever. Example prompt: “Grab food and place into bowl.” Tips for better results ✅ Record longer episodes (~30–40s) ✅ Target… https://x.com/IlirAliu_/status/1947721603082817884

CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
https://clift-nvs.github.io/

Data-efficient and Accurate Vision Models from Synthetic Data
https://microsoft.github.io/DAViD/

DAViD: Data-efficient and Accurate Vision Models from Synthetic Data. TL;DR: Trained only on high-quality, human-centric synthetic data that is diverse in poses, environments, lighting, and appearance, and not tailored to any specific evaluation set. https://x.com/Almorgand/status/1947669634800398607

Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars https://research.nvidia.com/labs/dair/dream-lift-animate/

How soon until AI can continuously fuse together all sensor data into a persistent 4D model of reality? https://x.com/bilawalsidhu/status/1947474834973131158

Huge. Take any image (real or synthetic) and turn it into a multi-part 3D object using @Scenario_gg. https://x.com/bilawalsidhu/status/1947673321014735099

The value of generating multi-part 3D meshes cannot be overstated — much easier to rig and animate things without a ton of manual work. Loving what Scenario has been doing, especially since going back to their 3D roots! https://x.com/bilawalsidhu/status/1946364106606256281

VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction https://microsoft.github.io/VoluMe/

Robots learning in simulation, and then mastering the real world. [📍 Bookmark GitHub Repositories] Today’s video shows a full Sim2Real pipeline with the OMY robot. ✅ Compact 6-DoF manipulator built for AI and robotics research ✅ Trained in Isaac Sim, validated in Gazebo… https://x.com/IlirAliu_/status/1946504663349481768

Robots won’t master the real world by training in The Matrix. Real data (not shortcuts like simulation or proxy datasets) is the key to building foundation models that TRULY generalize. @svlevine calls these shortcuts “sporks”… clever but flawed stand-ins that try to… https://x.com/IlirAliu_/status/1947199867799122306

Diffusion video models but now – **realtime**! Simple video filters are real-time but can only do basic re-coloring and styles. Video diffusion models (Veo and friends) are magic, but they take many seconds/minutes to generate. MirageLSD is real-time magic. Unlike simple video… https://x.com/karpathy/status/1945979830740435186

RT @DecartAI: Introducing MirageLSD: The First Live-Stream Diffusion (LSD) AI Model. Input any video stream, from a camera or video chat to… https://x.com/_akhaliq/status/1945966720734155079

machine readable model of the world https://x.com/bilawalsidhu/status/1946251146558890445

Now just imagine the chaos of 2500 AI NPCs in Unity 😂 https://x.com/bilawalsidhu/status/1946588729235091697

Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors https://dubbingforeveryone.github.io/

CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering. TL;DR: 3D scenes as compressed light-field tokens; adaptive neural rendering with configurable compute budgets; flexible tradeoffs between rendering speed, representation storage, and visual quality. https://x.com/Almorgand/status/1945840420128256066
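The configurable-compute-budget idea is neat enough to sketch: if a scene is stored as a pool of tokens, the renderer can consume only the top-k of them per frame, trading quality for speed at inference time. This is a rough illustration, not CLiFT's actual API; `select_tokens` and the importance scores are made up for the example.

```python
import numpy as np

def select_tokens(tokens, scores, budget):
    """Keep the `budget` highest-scoring tokens (hypothetical helper).

    tokens: (N, D) array of compressed scene tokens
    scores: (N,) importance scores, e.g. from a learned ranker
    budget: how many tokens the renderer may consume this frame
    """
    order = np.argsort(scores)[::-1][:budget]  # indices of top-k scores
    return tokens[order]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1024, 64))
scores = rng.random(1024)

# Small budget -> fast, lower fidelity; large budget -> slower, higher fidelity.
fast = select_tokens(tokens, scores, budget=128)
quality = select_tokens(tokens, scores, budget=768)
print(fast.shape, quality.shape)  # (128, 64) (768, 64)
```

The point of the design is that one stored representation serves every budget, so the speed/quality knob can be turned per frame without re-encoding the scene.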

Creative upscaling ftw! Turn high altitude aerial photogrammetry into low altitude drone imagery. Such a cool way to frame the shots you want in Google Earth and use a FLUX Kontext LoRA to take it all the way. https://x.com/bilawalsidhu/status/1948035585618427953

Can a simple bird’s-eye view make LiDAR robots navigate more accurately than ever before? [📍 Bookmark the paper & code for later] EV-LIO(LC) is a new LiDAR-Inertial Odometry framework that blends Bird’s Eye View (BEV) image representations with point cloud registration and loop closure… https://x.com/IlirAliu_/status/1948077653468029386
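The BEV representation itself is simple to picture: flatten the LiDAR point cloud onto a top-down grid and treat it as an image, which image-style matching and loop-closure methods can then operate on. A minimal sketch, assuming a made-up window extent and grid resolution (not the paper's parameters):

```python
import numpy as np

def points_to_bev(points, extent=50.0, resolution=0.25):
    """Rasterize an (N, 3) point cloud into a top-down occupancy image.

    extent: half-width of the square BEV window in meters (assumed value)
    resolution: meters per pixel (assumed value)
    """
    size = int(2 * extent / resolution)
    bev = np.zeros((size, size), dtype=np.float32)
    # Keep points inside the window, then map x/y to pixel indices.
    mask = (np.abs(points[:, 0]) < extent) & (np.abs(points[:, 1]) < extent)
    xy = points[mask, :2]
    ij = ((xy + extent) / resolution).astype(int)
    bev[ij[:, 1], ij[:, 0]] = 1.0  # mark occupied cells
    return bev

cloud = np.random.uniform(-40.0, 40.0, size=(5000, 3))
image = points_to_bev(cloud)
print(image.shape)  # (400, 400)
```

Real pipelines typically store richer per-cell statistics (max height, intensity, density) rather than plain occupancy, but the projection step is the same idea.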

Mirage https://mirage.decart.ai/queue?gameId=camera

🎞️ MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second. TL;DR: Feed-forward framework that jointly reconstructs appearance, geometry, and motion for 4D scene perception from monocular videos in one second. https://x.com/Almorgand/status/1946251119325274153

HairCUP https://bjkim95.github.io/haircup/

1,700 h training • 1,800 tests • 47,000 sims. One model… hundreds of tasks: and it’s faster, with less data! [📍 Bookmark the paper] @ToyotaResearch’s Large Behavior Models (LBMs) bring multitask dexterous manipulation closer to reality. Trained on over 1,700 hours of… https://x.com/IlirAliu_/status/1946259863564394900

Discover more from Ethan B. Holland
