SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views. TL;DR: a feed-forward framework for 3DGS from sparse, unposed views; it predicts Gaussians and camera poses jointly, enforces geometry via a reprojection loss, and achieves SOTA novel view synthesis even in extreme settings. https://x.com/Almorgand/status/1970910944948781195
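A minimal sketch of the reprojection constraint the TL;DR names, assuming a standard pinhole setup: reproject predicted Gaussian centers through the predicted pose and penalize the pixel error against the source coordinates. All names below are illustrative stand-ins, not the authors' code.

```python
import torch

def reproject(points_world, K, T_w2c):
    """Project Nx3 world points with intrinsics K (3x3) and a 4x4 world-to-camera pose."""
    n = points_world.shape[0]
    homo = torch.cat([points_world, torch.ones(n, 1)], dim=1)     # Nx4 homogeneous
    cam = (T_w2c @ homo.T).T[:, :3]                                # Nx3 camera coords
    uv = (K @ cam.T).T                                             # Nx3 projective
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)                  # Nx2 pixels

def reprojection_loss(gaussian_centers, pixels_src, K, T_pred):
    """Penalize distance between reprojected Gaussian centers and their source pixels."""
    return (reproject(gaussian_centers, K, T_pred) - pixels_src).abs().mean()

# Toy usage: random tensors stand in for the network's predicted centers and pose.
centers = torch.randn(1024, 3) + torch.tensor([0.0, 0.0, 3.0])    # in front of camera
K = torch.tensor([[500.0, 0.0, 128.0],
                  [0.0, 500.0, 128.0],
                  [0.0, 0.0, 1.0]])
T_pred = torch.eye(4)
pixels_src = reproject(centers, K, T_pred) + 0.5 * torch.randn(1024, 2)
print(float(reprojection_loss(centers, pixels_src, K, T_pred)))
```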
Nvidia just released Lyra on Hugging Face: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation. TL;DR: feed-forward 3D and 4D scene generation from a single image/video, trained with synthetic data generated by a camera-controlled video diffusion model. https://x.com/_akhaliq/status/1970949464606245139
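A toy sketch of the self-distillation recipe as the TL;DR describes it: a camera-controlled video diffusion model acts as the synthetic-data teacher, and a feed-forward reconstructor learns to match its novel views. Every class and tensor here is a stand-in, not Lyra's actual API.

```python
import torch
import torch.nn as nn

class ToyCameraControlledDiffusion:
    """Stand-in teacher: returns fake novel views for a camera trajectory."""
    def generate(self, image, cameras):
        # pretend each camera yields a view correlated with the input image
        return [image + 0.01 * c * torch.ones_like(image) for c in cameras]

class ToyReconstructor(nn.Module):
    """Stand-in student: one forward pass from an image to renderable scene state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def render(self, image, camera):
        return self.net(image) + 0.01 * camera * torch.ones_like(image)

teacher, student = ToyCameraControlledDiffusion(), ToyReconstructor()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

image = torch.rand(1, 3, 64, 64)
cameras = [0.0, 1.0, 2.0]                        # toy 1-D camera "trajectory"
views = teacher.generate(image, cameras)         # pseudo ground truth from the teacher

loss = sum(nn.functional.mse_loss(student.render(image, c), v)
           for c, v in zip(cameras, views))
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```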
3D Gaussian splatting is fucking cool. And now you can capture real-world spaces just by walking around in your Meta Quest headset. The real-time feedback ensures you don’t miss a spot. Apple needs to get on this asap: https://x.com/bilawalsidhu/status/1968522141273329847
feeling really bad for the Meta OS team https://x.com/nearcyan/status/1968473003592990847
Happy to see a failed live demo 100/100 times rather than a BS scripted demo. Making new technology is hard. Having to demo it live takes balls. Big props to Meta for giving it a shot 👏 https://x.com/mrdbourke/status/1968506328613347797
I unironically think this is good for the Meta team. They managed to (1) prove that their live demos are generally not fake, and (2) lower expectations for Meta products, so the next time they deliver a banger it will look like a massive improvement. https://x.com/cloneofsimo/status/1968484339416453344
Meta AI’s live demo failed for the entire minute 😢 https://x.com/nearcyan/status/1968468841786126476
Meta just unveiled AI glasses with a built-in display, controlled by a band that reads muscle signals. I sat down with Mark Zuckerberg to cover how these glasses could replace your phone, plus superintelligence, the metaverse, and more. 0:00 Intro · 1:07 Meta’s new glasses revealed. https://x.com/rowancheung/status/1968476034518630607
Meta Ray-Ban Display AI shades with an on-screen display look sick. 2% light leakage, so people won’t see the display. Gesture control with an EMG wristband. Hitting US shelves by Sep 30. https://x.com/minchoi/status/1968744103157313799
The Meta Ray-Bans thing is very cool regardless of the live demo failures. https://x.com/aidangomez/status/1968609969848164641
This is what Meta Hyperscape can do with a few minutes of capture off a $400 Quest 3. Some of the cleanest splats I’ve seen. Meanwhile, Apple is releasing a dozen canned environments for the $3,500 Vision Pro like it’s a big deal. https://x.com/bilawalsidhu/status/1970830926549766296
wow, a live demo of silently writing a message with the Meta Neural Band on the Meta Ray-Ban Display, pretty cool https://x.com/iScienceLuvr/status/1968471538350583993
Meta scrapping Unity to build their own game engine (Horizon Engine) is really interesting. I doubt it has much to do with the Unity tax; more likely it’s to let them vertically integrate all their own layers of ~SOTA AI, starting with Gaussian splatting. https://x.com/nearcyan/status/1968475789021852075
OpenAI might also be developing AI glasses, a voice recorder, and a pin | The Verge https://www.theverge.com/news/781854/openai-chatgpt-hardware-rumors-smart-speaker-glasses-pin
Skild AI’s omni-bodied brain, trained on 100,000 diverse simulated robots for 1000 years, enables remarkable real-world adaptability. In-context adaptation allows the brain to discern the robot form and adapt to extreme changes like chopped limbs or walking on stilts. https://x.com/TheHumanoidHub/status/1970981739200909811
Multi-camera shot generation will be a button in every video editor https://x.com/bilawalsidhu/status/1970018366124618077
This is Ray3. The world’s first reasoning video model, and the first to generate studio-grade HDR. Now with an all-new Draft Mode for rapid iteration in creative workflows, and state of the art physics and consistency. Available now for free in Dream Machine. https://x.com/LumaLabsAI/status/1968684330034606372
Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo 2 → Veo 3: edits, memory, symmetry, mazes, analogies https://x.com/arankomatsuzaki/status/1971042970800701809
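A toy illustration of the chain-of-frames analogy (mine, not the paper's): intermediate frames play the role that intermediate tokens play in chain-of-thought, so a solver that emits one grid snapshot per search step is the explicit version of what a video model does implicitly when it renders a maze being solved on screen.

```python
from collections import deque

def solve_with_frames(grid, start, goal):
    """BFS over a 0/1 grid; yields one rendered 'frame' after each expansion."""
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        frame = [["#" if c else "." for c in row] for row in grid]
        for sx, sy in seen:
            frame[sx][sy] = "*"                      # visualize the search so far
        yield "\n".join("".join(r) for r in frame)   # one reasoning step = one frame
        if (x, y) == goal:
            return
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) \
               and not grid[nx][ny] and (nx, ny) not in seen:
                seen.add((nx, ny)); queue.append((nx, ny))

maze = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
for frame in solve_with_frames(maze, (0, 0), (0, 2)):
    print(frame, end="\n\n")
```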
What used to take hours in After Effects now takes just ONE prompt. Nano Banana, Seedream 4, Wan 2.2, Runway Aleph et al are pioneering instruction-based editing — collapsing complex VFX pipelines into a single, implicit step. Here’s everything you need to know in 10 mins: https://x.com/bilawalsidhu/status/1970915228536947026
Marble from World Labs is so close to magic. You can imagine a space, generate it and then walk around it in your VR Headset. This was recorded with my Quest 3 and the potential is mind-blowing. https://x.com/TomLikesRobots/status/1970430493033464175
The Hyperscape Capture on Quest 3 is as impressive as it looks in the demo, although maybe a bit of blur with very fast head movements? I’ve just downloaded the software (thanks US VPN!) and have had a look around Gordon Ramsay’s kitchen. Very cool. I’ll be capturing my own when https://x.com/TomLikesRobots/status/1968647034589585686
Whether it’s AR glasses or AI wearables, I want Jarvis, not A Clockwork Orange. My hope is that ambient computing actually takes off, so it’s there when you need it and disappears when you don’t. Versus opening up another surface area to hijack our attention and perpetually https://x.com/bilawalsidhu/status/1969274326798254107
First, the internet crawled so that AI can run. Now, AR glasses will crawl so that robotics can run. https://x.com/bilawalsidhu/status/1968585113706332544
I found a ‘real’ recording (rare, because it’s difficult to capture with a camera). One thing I underestimated: you can do the gestures behind your back, under your covers lying in bed, etc. (this is rarely done in demos). Very cool. https://x.com/nearcyan/status/1968581348706189726
the bracelet is ON, let’s go https://x.com/nearcyan/status/1968467271694549111
The chorus of “who actually needs this?” for smart glasses is the same script we heard when smartwatches were pointless and smart speakers were creepy. Now they’re two of the most ubiquitous and massive categories in consumer tech. https://x.com/bilawalsidhu/status/1968655341278597477
The most reluctant product endorsement in tech journalism history https://x.com/bilawalsidhu/status/1968543441899389395
what do you guys think ppl will do with this? https://x.com/nearcyan/status/1968502999854235864
VisualMimic enables humanoid loco-manipulation via sim-to-real. Using egocentric vision, a low-level keypoint tracker and a high-level policy achieve zero-shot transfer for tasks like box pushing, generalizing across locations and lighting conditions. https://x.com/TheHumanoidHub/status/1970945814840410494
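One plausible reading of that two-level split, sketched below; the module shapes and dimensions are my assumptions, not VisualMimic's code.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Egocentric image -> desired 3-D body keypoint targets."""
    def __init__(self, num_keypoints=8):
        super().__init__()
        self.k = num_keypoints
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_keypoints * 3),
        )
    def forward(self, image):
        return self.encoder(image).view(-1, self.k, 3)

class LowLevelTracker(nn.Module):
    """Keypoint targets + proprioception -> joint position targets."""
    def __init__(self, num_joints=29, proprio_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(8 * 3 + proprio_dim, 256), nn.ReLU(),   # 8 keypoints x 3-D
            nn.Linear(256, num_joints),
        )
    def forward(self, keypoints, proprio):
        return self.mlp(torch.cat([keypoints.flatten(1), proprio], dim=1))

high, low = HighLevelPolicy(), LowLevelTracker()
image = torch.rand(1, 3, 128, 128)     # egocentric RGB
proprio = torch.rand(1, 64)            # joint angles/velocities, base state, ...
action = low(high(image), proprio)     # high level sets goals, low level tracks them
print(action.shape)                    # torch.Size([1, 29])
```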
Visualizing global weather patterns over a 24 hour day https://x.com/bilawalsidhu/status/1970167825479463134
Introducing: Hyperscape Capture 📷 Last year we showed the world’s highest quality Gaussian Splatting, and the first time GS was viewable in VR. Now, capture your own Hyperscapes, directly from your Quest headset in only 5 minutes of walking around. https://x.com/JonathonLuiten/status/1968474776793403734
Meshcapade can now pull apart both 3D camera tracking + human pose estimation data. Effectively like Wonder Dynamics / Autodesk Flow Studio at this point. Useful AI model for 3D artists and as an input into video-to-video workflows. https://x.com/bilawalsidhu/status/1969458783480135711
We’re now moving beyond models that react to single instructions and creating systems that can truly tackle problems in a general way – on the path towards solving AGI in the physical world. Developers can now use Gemini Robotics-ER 1.5 via the Gemini API in @GoogleAIStudio. https://x.com/GoogleDeepMind/status/1971243970953879643
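For reference, a hedged sketch of calling such a model through the google-genai Python SDK; the SDK calls are standard, but the exact model id string is my assumption and should be checked against the model list in AI Studio.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("workbench.png", "rb") as f:          # an image of the robot's workspace
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",     # assumed id; verify in AI Studio
    contents=[
        image,
        "Return the 2D point (normalized y,x) of each graspable object "
        "in this image as JSON.",
    ],
)
print(response.text)
```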
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://x.com/syhw/status/1970960837721653409
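The release frames code generation as reasoning about what execution does to program state. As a toy illustration of that supervision signal (mine, not Meta's pipeline), here is a tracer that records the per-line interpreter state transitions a code world model would learn to predict.

```python
import sys

def trace_states(fn, *args):
    """Record (line_number, locals) before each executed line of fn."""
    states = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            states.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return states

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

for lineno, state in trace_states(gcd, 48, 18):
    print(lineno, state)   # a world model of code predicts exactly these transitions
```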
The guided 3D scanning experience in Meta’s new Hyperscape app is world class. You capture in two stages: get the lay of the land first, then get up close for macro-level detail. Watch the real-time feedback in action below: https://x.com/bilawalsidhu/status/1969799126109020469
Microsoft introduces Latent Zoning Network (LZN) A unified principle for generative modeling, representation learning, and classification. LZN uses a shared Gaussian latent space and modular encoders/decoders to tackle all three core ML problems at once! https://x.com/HuggingPapers/status/1970218823140687885
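A toy of that claim under my reading of the abstract, not Microsoft's code: paired encoders map images and labels into one shared (roughly Gaussian) latent space, a decoder closes the loop, and the same space then serves generation, representation, and classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

image_enc = nn.Linear(784, 32)     # image -> shared latent
label_enc = nn.Embedding(10, 32)   # class id -> shared latent "zone"
image_dec = nn.Linear(32, 784)     # shared latent -> image

x = torch.rand(16, 784)
y = torch.randint(0, 10, (16,))

z_img, z_lab = image_enc(x), label_enc(y)
align = F.mse_loss(z_img, z_lab)                 # pull paired encodings together
recon = F.mse_loss(image_dec(z_img), x)          # decoder closes the loop
prior = z_img.pow(2).mean()                      # keep the latent roughly Gaussian
loss = recon + align + 0.1 * prior

# Classification = nearest label zone; generation = decode a Gaussian sample.
logits = -torch.cdist(z_img, label_enc.weight)   # distance to each class zone
sample = image_dec(torch.randn(1, 32))
print(logits.shape, sample.shape)
```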
Most humanoid trackers break when reality pushes back: uneven terrain, external forces, or sudden changes throw them off. ❗️Not this one. ✅ AnyTracker: a two-stage RL framework that tracks diverse humanoid motions while adapting online to real-world disturbances. https://x.com/IlirAliu_/status/1969458971321782664
Why can robots do backflips but still struggle to open a drawer??? [📍 Link to project] Precise grasping and whole-body coordination make it harder than acrobatics. DreamControl takes a step toward solving this. It combines diffusion models and reinforcement learning to teach https://x.com/IlirAliu_/status/1970539603368042823
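A sketch of the stated recipe under my assumptions, not DreamControl's code: a generative prior proposes a kinematic reference for the hard part (reach, grasp, whole-body pose), and the RL policy is rewarded for tracking it under physics. The "diffusion prior" below is a stub that just smooths noise.

```python
import torch
import torch.nn.functional as F

def stub_diffusion_prior(horizon=50, dof=29):
    """Stand-in for a motion diffusion model: a smoothed random trajectory."""
    noise = torch.randn(1, dof, horizon)              # (batch, channels, time)
    kernel = torch.ones(dof, 1, 5) / 5                # depthwise smoothing kernel
    smooth = F.conv1d(noise, kernel, padding=2, groups=dof)
    return smooth[0].T                                # (horizon, dof)

def tracking_reward(joint_pos, reference_t):
    """RL reward term: stay close to the prior's proposed pose at this timestep."""
    return -torch.norm(joint_pos - reference_t)

reference = stub_diffusion_prior()                    # kinematic reference motion
sim_pose = reference[0] + 0.1 * torch.randn(29)       # pretend simulator state
print(float(tracking_reward(sim_pose, reference[0])))
```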
Ran a world-model robot-learning reading group alongside @djkesu1 this Thursday, covering V-JEPA and V-JEPA 2 by @AIatMeta. Here are some of my thoughts: JEPA is an interesting idea and borrows many ideas from the representation learning literature (a combination of SimCLR, https://x.com/stevengongg/status/1969387819920736396
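For readers new to the acronym, a compact sketch of the JEPA recipe under discussion (my toy, not Meta's code): predict the latent of masked content from visible context against an EMA target encoder, with the loss in latent space rather than pixels.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(16, 32)                  # toy patch encoder
predictor = nn.Linear(32, 32)                # predicts masked-patch latents
target = copy.deepcopy(encoder)              # EMA copy, not trained by gradient
for p in target.parameters():
    p.requires_grad_(False)

patches = torch.rand(8, 16)                  # 8 toy "patches" of one image
visible, masked = patches[:6], patches[6:]

ctx = encoder(visible).mean(0, keepdim=True)           # context representation
pred = predictor(ctx).expand(2, -1)                    # predictions for masked
with torch.no_grad():
    tgt = target(masked)                               # latent targets
loss = F.mse_loss(pred, tgt)                           # loss in latent space
loss.backward()

# EMA update of the target encoder (momentum 0.99)
with torch.no_grad():
    for p_t, p_o in zip(target.parameters(), encoder.parameters()):
        p_t.mul_(0.99).add_(0.01 * p_o)
```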
Towards a Physics Foundation Model. Proposes GPhyT (General Physics Transformer), a large transformer trained on 1.8 TB of simulation data across fluid flows, shock waves, heat transfer, and multiphase dynamics. Here are a few key notes: https://x.com/omarsar0/status/1968681177189077366
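A hedged sketch of the general shape such a model takes, per my reading of the summary rather than the paper's code: patchify a history of 2-D field snapshots into tokens, run a transformer, and regress the next snapshot. One model can span PDE families because the dynamics must be inferred in-context from the history.

```python
import torch
import torch.nn as nn

class ToyPhysicsTransformer(nn.Module):
    def __init__(self, patch=8, dim=64):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, patch * patch)

    def forward(self, fields):                             # (B, T, H, W) history
        B, T, H, W = fields.shape
        p = self.patch
        tokens = (fields.unfold(2, p, p).unfold(3, p, p)   # patchify each frame
                        .reshape(B, T * (H // p) * (W // p), p * p))
        out = self.backbone(self.embed(tokens))
        n_last = (H // p) * (W // p)                       # last frame's positions
        pred = self.head(out[:, -n_last:])                 # next-step patches
        return pred.reshape(B, H // p, W // p, p, p)

model = ToyPhysicsTransformer()
history = torch.rand(2, 4, 32, 32)    # batch of 4-frame field histories
print(model(history).shape)           # torch.Size([2, 4, 4, 8, 8])
```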




