Image created with gemini-3.1-flash-image-preview; prompt written by claude-sonnet-4-5. Image prompt: A classic 35mm film camera freefalling through a crisp blue sky in bright daylight, film reels unspooling behind it like ribbons, shot from a wide angle aerial perspective with the ground visible far below, the word VIDEO in bold vintage movie title typography prominently displayed across the top, clean composition, dynamic action photography, joyful and optimistic mood despite the freefall.

Introducing Cinematic Video Overviews, the next evolution of the NotebookLM Studio. Unlike standard templates, these are powered by a novel combination of our most advanced models to create bespoke, immersive videos from your sources. Rolling out now for Ultra users in English! https://x.com/NotebookLM/status/2029240601334436080

[2603.02049] WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories https://arxiv.org/abs/2603.02049

3D object tracking is so much easier these days: grab your video and use Meta's SAM 3 to segment an object, then SAM 3D to turn it into a 3D model, then use GeoTracker in Blender to track it across the video, and you're ready to rip and make some VFX. https://x.com/bilawalsidhu/status/2027465634578153825

Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections. TL;DR: uses mirror reflections as auxiliary virtual views to generate stereo cues and improve 3D reconstruction from a single image. https://x.com/Almorgand/status/2027053866487804261
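The geometric core of treating a mirror as a virtual view is standard: a planar mirror with unit normal n and offset d reflects the real camera center across the plane, giving a second viewpoint for free. The blurb doesn't specify Reflect3r's actual formulation, so the following is a minimal sketch of that textbook reflection geometry, with all variable names my own.

```python
import numpy as np

def reflect_point(p, n, d):
    """Reflect a 3D point p across the plane n.x + d = 0 (n is a unit normal)."""
    return p - 2.0 * (n @ p + d) * n

def householder(n):
    """3x3 mirror matrix I - 2 n n^T for a unit normal n (orthogonal, det = -1)."""
    return np.eye(3) - 2.0 * np.outer(n, n)

# Real camera at the origin, vertical mirror plane x = 1 (n = (1,0,0), d = -1).
n = np.array([1.0, 0.0, 0.0])
d = -1.0
cam = np.array([0.0, 0.0, 0.0])

# The mirrored camera center sits behind the mirror and can serve as the
# second view of a stereo pair.
virtual_cam = reflect_point(cam, n, d)  # -> [2., 0., 0.]
```

Reflecting twice returns the original point, which is a quick sanity check that the plane parameters are consistent.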

RefVFX: Tuning-free Visual Effect Transfer across Videos. TL;DR: a feed-forward model that learns to transfer complex temporal video effects (lighting, transformations, dynamics) from a reference video to a target video/image without tuning or prompts. https://x.com/Almorgand/status/2028536745814278624

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction. TL;DR: a test-time-training layer compresses long image contexts into fast weights and enables linear-complexity autoregressive 3D reconstruction with explicit outputs like Gaussian splats. https://x.com/Almorgand/status/2027440718008995995
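The "compress long context into fast weights" idea can be sketched generically: a small weight matrix is updated by an online gradient step of a self-supervised loss for each incoming token, so memory stays constant and cost stays linear in sequence length. This is a toy illustration of that mechanism, not tttLRM's actual architecture; the key/value choice and learning rate below are my own simplifications.

```python
import numpy as np

def ttt_linear(tokens, lr=0.1):
    """Minimal test-time-training layer (a sketch, not tttLRM itself).

    A per-sequence "fast weight" matrix W absorbs the context: for each token,
    W takes one gradient step on the reconstruction loss 0.5 * ||W k - v||^2,
    so an arbitrarily long context is compressed into a fixed-size W.
    """
    dim = tokens.shape[1]
    W = np.zeros((dim, dim))             # fast weights, reset per sequence
    outputs = []
    for x in tokens:                     # single linear scan over the context
        k, v = x, x                      # toy choice: key = value = the token
        grad = np.outer(W @ k - v, k)    # gradient of 0.5 * ||W k - v||^2 w.r.t. W
        W = W - lr * grad                # the "training" in test-time training
        outputs.append(W @ x)            # read out with the updated weights
    return np.stack(outputs), W
```

On a repeated token, the reconstruction error shrinks geometrically, so the fast weights quickly learn to reproduce the context, which is the behavior the paper scales up to long image sequences.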

UniLight: A Unified Representation for Lighting. TL;DR: learns a shared latent space for lighting from text, images, irradiance maps, and environment maps to enable cross-modal retrieval and lighting control. https://x.com/Almorgand/status/2029241617110962452

Video world models today have a very limited context length. Mode Seeking meets Mean Seeking (MMM) unlocks long-context, persistent video world models through a unified representation. 1/8 🧵 https://x.com/GordonWetzstein/status/2029054374459376026

π³: Permutation-Equivariant Visual Geometry Learning. TL;DR: breaks reference-view reliance by learning permutation-equivariant camera poses and dense point maps from unordered images for robust 3D reconstruction. https://x.com/Almorgand/status/2027430884782072233
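Permutation equivariance means that reordering the input views just reorders the outputs the same way, so no view is privileged as a reference. A standard way to get this property (shared per-element weights plus symmetric pooling) can be shown in a few lines; this is a generic sketch of the property π³ builds on, not the paper's actual network.

```python
import numpy as np

def equivariant_layer(X, W_self, W_agg):
    """Permutation-equivariant layer over an unordered set of view features X (n, d).

    Each row is combined with a symmetric (order-independent) aggregate of all
    rows using shared weights, so permuting the input views permutes the
    outputs identically: f(X[perm]) == f(X)[perm].
    """
    agg = X.mean(axis=0, keepdims=True)        # symmetric pooling over views
    return np.tanh(X @ W_self + agg @ W_agg)   # same weights applied to every view

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                    # 5 unordered views, 4-dim features
W_self = rng.normal(size=(4, 4))
W_agg = rng.normal(size=(4, 4))

perm = rng.permutation(5)
out = equivariant_layer(X, W_self, W_agg)
out_perm = equivariant_layer(X[perm], W_self, W_agg)
```

Because the mean is invariant to row order and the per-row transform is shared, the equivariance check `out_perm == out[perm]` holds exactly.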

We present a research preview of Self-Flow: a scalable approach for training multi-modal generative models. Multi-modal generation requires end-to-end learning across modalities: image, video, audio, text – without being limited by external models for representation learning. https://x.com/bfl_ml/status/2029212134023020667
