Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Using the provided reference image, keep the exact crate construction with horizontal dark reddish-brown wooden slats, iron hardware, weathered paint texture, and three-panel layout with hand-painted black stencil lettering, but replace the address text with ‘AR/VR’ in the same loose brushstroke style. Place the crate on a weathered wooden dock with still water visible behind, early spring light raking across the wood grain. Rest a pair of vintage 1950s stereoscope viewers on top of the crate, their metal and bakelite bodies catching soft light, lenses facing slightly toward camera, suggesting new ways of seeing after a long journey.
AI at Meta on X: “Today we’re introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people.”
https://x.com/AIatMeta/status/2037153756346016207
AI at Meta on X: “Without any retraining, TRIBE v2 can reliably predict the brain responses of individuals it has never seen before, achieving a nearly 2-3x improvement over previous methods for both movies and audiobooks. We’re releasing the model, codebase, paper, and demo to help researchers…”
https://x.com/AIatMeta/status/2037153758455750717
TRIBE v2
https://aidemos.atmeta.com/tribev2/
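For intuition about what a “trimodal brain encoder” involves, here is a minimal PyTorch sketch: project pretrained video, audio, and text features into a shared space, fuse them over the stimulus timeline, and regress per-voxel fMRI responses. All names, dimensions, and the shared readout below are illustrative assumptions, not Meta’s released architecture.

```python
# Minimal sketch of a trimodal fMRI encoding model. All names, dimensions,
# and the architecture below are illustrative assumptions, not Meta's code.
import torch
import torch.nn as nn

class TrimodalBrainEncoder(nn.Module):
    def __init__(self, d_video=768, d_audio=512, d_text=768,
                 d_model=256, n_voxels=1000):
        super().__init__()
        # Project each modality's pretrained features into a shared space.
        self.proj = nn.ModuleDict({
            "video": nn.Linear(d_video, d_model),
            "audio": nn.Linear(d_audio, d_model),
            "text": nn.Linear(d_text, d_model),
        })
        # Fuse the combined features over the stimulus timeline.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # A readout shared across subjects is the sort of design that lets an
        # encoder generalize to individuals it has never seen.
        self.readout = nn.Linear(d_model, n_voxels)

    def forward(self, video, audio, text):
        # Each input: (batch, time, d_modality), time-aligned to the fMRI TRs.
        x = (self.proj["video"](video) + self.proj["audio"](audio)
             + self.proj["text"](text))
        return self.readout(self.fusion(x))  # (batch, time, n_voxels)

model = TrimodalBrainEncoder()
v = torch.randn(2, 10, 768)  # stand-ins for pretrained stimulus embeddings
a = torch.randn(2, 10, 512)
t = torch.randn(2, 10, 768)
print(model(v, a, t).shape)  # torch.Size([2, 10, 1000])
```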
Ai2 just released MolmoPoint GUI on Hugging Face: a specialized VLM for GUI automation that points using grounding tokens instead of coordinates, reaching 61.1 on ScreenSpotPro.
https://x.com/HuggingPapers/status/2036101402477404284
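The “grounding tokens instead of coordinates” idea is worth unpacking: rather than regressing (x, y) pixel values with a separate head, the model emits a discrete location token from a fixed grid vocabulary, which a language model can predict like any other word. A toy sketch follows; the 32x32 grid and the mapping are my assumptions, not Ai2’s actual tokenization.

```python
# Toy illustration of pointing with grounding tokens: the model's vocabulary
# contains one token per cell of a fixed grid over the image, so "pointing"
# is just next-token prediction. Grid size and mapping are my assumptions.
GRID = 32  # 32x32 grid of location tokens

def token_to_point(token_id: int, img_w: int, img_h: int) -> tuple:
    """Decode a grounding-token id (0..GRID*GRID-1) to pixel coordinates."""
    row, col = divmod(token_id, GRID)
    return ((col + 0.5) / GRID * img_w, (row + 0.5) / GRID * img_h)

def point_to_token(x: float, y: float, img_w: int, img_h: int) -> int:
    """Inverse mapping, used to build training targets from click positions."""
    col = min(int(x / img_w * GRID), GRID - 1)
    row = min(int(y / img_h * GRID), GRID - 1)
    return row * GRID + col

# A click at (810, 96) on a 1920x1080 screenshot becomes a single vocabulary
# token -- no separate coordinate-regression head needed.
tok = point_to_token(810, 96, 1920, 1080)
print(tok, token_to_point(tok, 1920, 1080))  # 77 (810.0, 84.375)
```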
Versatile Editing of Video Content, Actions, and Dynamics without Training TL;DR: Enables temporally consistent editing of dynamic scenes while preserving motion and avoiding frame-to-frame artifacts.
https://dynaedit.github.io/
https://x.com/Almorgand/status/2035058325830701509
360 camera rigs are amazing for capturing 3D Gaussian splats like this. The virtual camera render with the shaky, handheld feel really sells it. Now to bring these static scenes to life with dynamic entities.
https://x.com/bilawalsidhu/status/2035339640710783117
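A toy sketch of how that handheld feel can be faked for a virtual camera: take a smooth dolly path through the static splat scene and layer low-frequency hand drift on top. The function and parameters below are illustrative assumptions; the splat rendering itself is omitted.

```python
# Illustrative sketch: fake the handheld feel by layering low-frequency hand
# drift onto a smooth virtual camera path through the static splat scene.
# Function and parameters are assumptions; splat rendering itself is omitted.
import numpy as np

def handheld_path(n_frames=240, fps=30, drift_hz=1.5, amp=0.01, seed=0):
    """Return (n_frames, 3) camera positions: smooth dolly + hand drift."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) / fps
    # Base motion: a steady dolly forward along +z at 0.5 m/s.
    base = np.stack([np.zeros_like(t), np.zeros_like(t), 0.5 * t], axis=1)
    # Low-frequency sinusoidal sway, randomly phased per axis.
    phases = rng.uniform(0, 2 * np.pi, 3)
    sway = amp * np.sin(2 * np.pi * drift_hz * t[:, None] + phases)
    # A normalized random walk adds the irregular, non-periodic drift.
    walk = np.cumsum(rng.standard_normal((n_frames, 3)), axis=0)
    walk = 0.2 * amp * (walk - walk.mean(axis=0)) / (walk.std(axis=0) + 1e-8)
    return base + sway + walk

poses = handheld_path()
print(poses.shape)  # (240, 3) -- feed these to your splat renderer's camera
```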
Your WiFi Can See You. Here’s How.
https://x.com/bilawalsidhu/status/2036240745019986256
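The core trick behind WiFi sensing is that a moving body perturbs multipath propagation, so the channel state information (CSI) reported per subcarrier fluctuates when someone moves. A self-contained toy with synthetic data follows; real systems read CSI from the NIC and use far more sophisticated models than a variance threshold.

```python
# Toy illustration of the idea behind WiFi sensing: a moving body perturbs
# multipath propagation, so per-subcarrier CSI amplitude fluctuates when
# someone moves. Synthetic data only; real systems read CSI from the NIC.
import numpy as np

rng = np.random.default_rng(0)
n, subcarriers, win = 2000, 30, 100
t = np.arange(n)

# Static channel: near-constant amplitude with small measurement noise.
csi = 1.0 + 0.01 * rng.standard_normal((n, subcarriers))
# Motion (samples 800-1200): slow fading oscillation plus extra noise.
moving = (t >= 800) & (t < 1200)
csi[moving] += 0.2 * np.sin(2 * np.pi * 0.01 * t[moving])[:, None]
csi[moving] += 0.05 * rng.standard_normal((moving.sum(), subcarriers))

# Motion score: sliding-window amplitude variance, averaged over subcarriers.
score = np.array([csi[i:i + win].var(axis=0).mean() for i in range(n - win)])
detected = score > 5 * np.median(score)
print("motion flagged around windows", detected.nonzero()[0][[0, -1]])
```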
Pokémon Go players captured 30 billion images and built one of the most detailed 3D maps in the world. Niantic just licensed it to train delivery robots. I actually sat down with their CTO, co-creator of Google Earth: https://t.co/PYcEKQb0Fc What he described makes the picture a…
https://x.com/bilawalsidhu/status/2035420415208931425
AHOY: Audio-Visual Object Localization in Egocentric Videos TL;DR: Aligns audio and vision to localize sounding objects in first-person videos using cross-modal learning in dynamic, real-world environments.
https://x.com/Almorgand/status/2034671876757246322
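At its simplest, cross-modal localization of this kind reduces to a similarity map: embed the audio clip and every visual patch into a shared space (learned jointly in the real model, e.g. with a contrastive objective) and read the cosine similarities as a heatmap over the frame. A toy sketch with assumed shapes, not AHOY’s actual architecture:

```python
# Minimal sketch of audio-visual localization as a similarity map: embed the
# audio clip and each visual patch in a shared space (jointly trained in the
# real model) and read cosine similarity as a heatmap. Shapes are assumptions.
import torch
import torch.nn.functional as F

def localize(audio_emb: torch.Tensor, patch_embs: torch.Tensor) -> torch.Tensor:
    """audio_emb: (d,); patch_embs: (H, W, d) -> (H, W) similarity heatmap."""
    a = F.normalize(audio_emb, dim=-1)
    p = F.normalize(patch_embs, dim=-1)
    return torch.einsum("hwd,d->hw", p, a)

# Toy check: make patch (3, 5) share the audio embedding's direction, so the
# heatmap should peak there.
torch.manual_seed(0)
d = 64
audio = torch.randn(d)
patches = torch.randn(8, 8, d)
patches[3, 5] = audio + 0.1 * torch.randn(d)
heat = localize(audio, patches)
peak = heat.argmax().item()
print(peak // 8, peak % 8)  # 3 5
```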
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing TL;DR: Removes objects and their visual effects (shadows, reflections) in videos via a joint removal-insertion framework with strong temporal consistency.
https://x.com/Almorgand/status/2036812865600987513
We’re really gonna have personalized AI videos — generated just-in-time — moments before you actually scroll to them on your feed. Plug in an algo like TikTok’s that can reverse engineer your soul, and good lord… you’ve got Mountain Dew straight to the vein.
https://x.com/bilawalsidhu/status/2034485077917253824