Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Sophisticated damask wrapping paper pattern featuring layered Victorian architectural elements in deep navy and burgundy with translucent metallic gold AR overlay effects creating elegant double-exposure, ornate frames and classical details repeating seamlessly, AR/VR monogram woven into decorative cartouches, premium embossed paper texture, museum-quality gift wrap design in traditional holiday colors with subtle holographic shimmer suggesting digital augmentation.

The dawn of a world simulator https://odyssey.ml/the-dawn-of-a-world-simulator

Visual sim2real: zero-shot deploy to the real world, with zero real data. Trained entirely in Isaac Lab. https://x.com/DrJimFan/status/2003879976173818298

Apple dropped 2D-to-3D conversion just in time for the entire Epstein catalog to be converted into 3D Gaussian splats 💀 https://x.com/bilawalsidhu/status/2002809948229783664

DARPA turned the Earth’s atmosphere into a planetary-scale sensor with Project AtmoSense. By modeling atmospheric waves across 6 orders of magnitude in 3D, they cracked an “impossible” physics problem. During tests in New Mexico, the system was sensitive enough to detect SpaceX… https://x.com/bilawalsidhu/status/2002056681669447811

I love 3D maps like this because they’re inherently an abstraction of reality. Unlike 3D scans, the goal isn’t to create a 1:1 mirror world. The goal is to distill a place down to its stylized visual essence, so you can recognize it effortlessly. https://x.com/bilawalsidhu/status/2003337650535825443

MapAnything got an upgrade with stronger weights! Cheers @Nik__V__ and the whole team behind the project! https://x.com/Almorgand/status/2002119543574110602

AI can help explain complex topics easily by throwing together a simulation. As Eric says later in the thread, a newer paper argues that this pattern is actually collider bias (the authors disagree). What is collider bias? Gemini one-shots an explanation: https://x.com/emollick/status/2002820278733386227
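
To make the concept concrete, here is a minimal simulation of collider bias (my own sketch, not from the thread): two independent traits become spuriously correlated once you condition on an outcome they both cause.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
talent = rng.normal(size=n)      # two independent causes
looks = rng.normal(size=n)
famous = (talent + looks) > 1.5  # "fame" is a collider: both causes feed it

# Unconditioned, the traits are uncorrelated (~0.0)...
print(np.corrcoef(talent, looks)[0, 1])
# ...but among the famous, a strong spurious negative correlation appears
print(np.corrcoef(talent[famous], looks[famous])[0, 1])
```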

Tencent Hunyuan (腾讯混元) https://hunyuan.tencent.com/motion?tabIndex=0

Soul https://zhangzjn.github.io/projects/Soul/

STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits https://foivospar.github.io/STARCaster/

“Animate Any Character in Any World” TL;DR: users provide a 3DGS scene along with a 3D or multi-view character, enabling interactive control of the character’s behaviors and active exploration of the environment through natural language commands. https://x.com/Almorgand/status/2003518454280687885

“FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision” TL;DR: a transformer-based 3D portrait animation model with learnable data-source tokens (so-called bias sinks), enabling unified training across monocular and multi-view datasets. https://x.com/Almorgand/status/2003153695765336468
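
A minimal sketch of the data-source-token idea, assuming the tokens are simply learnable embeddings prepended per dataset (the actual FlexAvatar architecture may differ; names and shapes here are my assumptions):

```python
import torch
import torch.nn as nn

class SourceTokenEncoder(nn.Module):
    """One learnable token per data source lets a single transformer absorb
    dataset-specific bias while training on mixed monocular/multi-view data."""
    def __init__(self, dim=64, num_sources=2, depth=2, heads=4):
        super().__init__()
        self.source_tokens = nn.Parameter(torch.randn(num_sources, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x, source_id):
        # x: (batch, tokens, dim); source_id picks monocular vs multi-view
        tok = self.source_tokens[source_id].expand(x.shape[0], 1, -1)
        return self.encoder(torch.cat([tok, x], dim=1))

feats = torch.randn(2, 16, 64)
out = SourceTokenEncoder()(feats, source_id=0)   # (2, 17, 64)
```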

“PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos” TL;DR: canonical frame selection; image-to-3D (static 3D Gaussians); randomly sampled camera poses fine-tune a lightweight image-to-pose estimator; the pose estimator then drives optimization of a deformable 3D object model. https://x.com/Almorgand/status/2001695549259747415

There’s something so magical about turning 2D videos into 4D reconstructions. Every video becomes a spatio-temporal portal back in time, one that you can revisit from any angle. Research like D4RT is turning science fiction into reality, and it’s getting fast enough to run in… https://x.com/bilawalsidhu/status/2003698903838003685

“Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures” TL;DR: multi-view character consistency; 3D camera control in video diffusion models; characters trained via 4DGS, with lighting variability obtained from a video relighting model. https://x.com/Almorgand/status/2002069630622507504

Researchers proposed Sample-Efficient Modality Integration (SEMI), which plugs any pretrained encoder (image, audio, video, sensors, graphs) into an LLM using one projector plus LoRA adapters generated from a handful of paired examples. Trained on data-rich domains, SEMI… https://x.com/DeepLearningAI/status/2003593131132916204
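
The two building blocks named in that summary, a shared projector and LoRA-style low-rank adapters, look roughly like this. This is an illustrative sketch; the adapter-generating mechanism and all names here are my assumptions, not the paper’s code.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Shared linear projector: encoder features -> LLM embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)                        # (batch, tokens, llm_dim)

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A @ self.B      # delta starts at zero

# e.g. features from a frozen audio encoder become LLM input tokens
audio_feats = torch.randn(1, 20, 512)
tokens = Projector(512, 4096)(audio_feats)             # (1, 20, 4096)
```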

3D artists are discovering the power of Nano Banana Pro: UV-unwrapped textures in one prompt. https://x.com/bilawalsidhu/status/2002757896782934308

Did a project with a Meta Quest 3 where I scanned my studio. It’s pretty rad: you can walk around, look at stuff, and get close. I specifically did not clean the place before we scanned it. Works on your phone and on the headset. https://x.com/Casey/status/2002051616455975186

A great visual positioning system makes augmented reality feel like magic… this tech helps your phone (or robot) figure out precisely where it is in 3D space. Here’s MultiSet AI nailing it in real-time, at night, and on-device too. This new reveal shader animation shows just… https://x.com/bilawalsidhu/status/2001858275738890348

Most robot foundation models still learn physics the hard way: from robot data only. This paper takes a different path. mimic-video uses large-scale internet video to learn motion and physical dynamics first, then maps that into robot actions. • Policies are grounded in… https://x.com/IlirAliu_/status/2003100997065802044

“Generative Refocusing: Flexible Defocus Control from a Single Image” TL;DR: a two-step process; DeblurNet recovers all-in-focus images from various inputs, then BokehNet adds controllable bokeh; semi-supervised training. https://x.com/Almorgand/status/2003140933223919815
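
The described control flow reduces to two composed networks. Here is a runnable skeleton with identity stubs standing in for the real DeblurNet/BokehNet (purely illustrative, not the authors’ implementation):

```python
import torch

def generative_refocus(image, deblur_net, bokeh_net, aperture, focus_depth):
    all_in_focus = deblur_net(image)                       # step 1: undo defocus
    return bokeh_net(all_in_focus, aperture, focus_depth)  # step 2: re-render bokeh

img = torch.rand(1, 3, 256, 256)
out = generative_refocus(img,
                         deblur_net=lambda x: x,           # stub
                         bokeh_net=lambda x, ap, f: x,     # stub
                         aperture=2.8, focus_depth=1.0)
```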
