SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views. TL;DR: a feed-forward framework for 3DGS from sparse, unposed views; it jointly predicts Gaussians and camera poses, enforces geometry via a reprojection loss, and achieves SOTA novel view synthesis even in extreme settings. https://x.com/Almorgand/status/1970910944948781195
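"Enforces geometry via reprojection" generally means projecting predicted 3D structure through the predicted camera and penalizing the pixel-space error against observed correspondences. A minimal numpy sketch of such a consistency term, assuming a hypothetical pinhole camera setup (this is not the paper's actual loss):

```python
import numpy as np

def reprojection_loss(points_3d, points_2d, K, R, t):
    """Mean squared reprojection error: project 3D points with the
    estimated pose (R, t) and intrinsics K, compare to observed pixels."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    proj = cam @ K.T                   # apply pinhole intrinsics
    pix = proj[:, :2] / proj[:, 2:3]   # perspective divide -> pixel coords
    return np.mean(np.sum((pix - points_2d) ** 2, axis=1))

# Sanity check: with the true pose, the error is exactly zero.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.zeros(3)
pts3d = np.array([[0.1, -0.2, 2.0], [0.3, 0.1, 3.0]])
proj = pts3d @ K.T
pts2d = proj[:, :2] / proj[:, 2:3]
print(reprojection_loss(pts3d, pts2d, K, R, t))  # prints 0.0
```

Minimizing a term like this with respect to both the 3D points and (R, t) is what lets such methods supervise pose without ground-truth cameras.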

Nvidia just released Lyra on Hugging Face: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation. TL;DR: feed-forward 3D and 4D scene generation from a single image/video, trained on synthetic data generated by a camera-controlled video diffusion model. https://x.com/_akhaliq/status/1970949464606245139

3d gaussian splatting is fucking cool. And now you can capture real world spaces just by walking around in your meta quest headset. The real-time feedback ensures you don’t miss a spot. Apple needs to get on this asap: https://x.com/bilawalsidhu/status/1968522141273329847

feeling really bad for the Meta OS team https://x.com/nearcyan/status/1968473003592990847

Happy to see a failed live demo 100/100 times rather than a BS scripted demo. Making new technology is hard. Having to demo it live takes balls. Big props to Meta for giving it a shot 👏 https://x.com/mrdbourke/status/1968506328613347797

I unironically think this is good for the Meta team. They managed to: * prove that generally their live demos are not faked * lower expectations for Meta products, so next time they deliver a banger it will look like a massive improvement https://x.com/cloneofsimo/status/1968484339416453344

Meta AI’s live demo failed for the entire minute 😢 https://x.com/nearcyan/status/1968468841786126476

Meta just unveiled AI glasses with a built-in display, controlled by a band that reads muscle signals. I sat down with Mark Zuckerberg to cover how these glasses could replace your phone, superintelligence, the metaverse, and more. 0:00 Intro 1:07 Meta’s new glasses revealed https://x.com/rowancheung/status/1968476034518630607

Meta Ray-Ban Display AI shades with an on-screen display look sick: 2% light leakage, so people won't see the display; gesture control with an EMG wristband; hitting US shelves by Sep 30. https://x.com/minchoi/status/1968744103157313799

The Meta Raybans thing is very cool regardless of live demo failures https://x.com/aidangomez/status/1968609969848164641

This is what meta hyperscape can do with a few minutes capture off a $400 quest 3. Some of the cleanest splats I’ve seen. Meanwhile apple releasing a dozen canned environments for the $3500 vision pro like it’s a big deal. https://x.com/bilawalsidhu/status/1970830926549766296

wow, a live demo of silently writing a message with Meta neural band on the Meta Ray-Ban Display, pretty cool https://x.com/iScienceLuvr/status/1968471538350583993

Meta scrapping Unity to build their own game engine (Horizon Engine) is really interesting. I suspect it has less to do with the Unity tax and more with letting them vertically integrate with their own layers of ~SOTA AI, starting with gaussian splatting. https://x.com/nearcyan/status/1968475789021852075

OpenAI might also be developing AI glasses, a voice recorder, and a pin | The Verge https://www.theverge.com/news/781854/openai-chatgpt-hardware-rumors-smart-speaker-glasses-pin

Skild AI’s omni-bodied brain, trained on 100,000 diverse simulated robots for 1000 years, enables remarkable real-world adaptability. In-context adaptation allows the brain to discern the robot form and adapt to extreme changes like chopped limbs or walking on stilts. https://x.com/TheHumanoidHub/status/1970981739200909811

Multi-camera shot generation will be a button in every video editor https://x.com/bilawalsidhu/status/1970018366124618077

This is Ray3. The world’s first reasoning video model, and the first to generate studio-grade HDR. Now with an all-new Draft Mode for rapid iteration in creative workflows, and state of the art physics and consistency. Available now for free in Dream Machine. https://x.com/LumaLabsAI/status/1968684330034606372

Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo2 → Veo3: edits, memory, symmetry, mazes, analogies https://x.com/arankomatsuzaki/status/1971042970800701809

What used to take hours in After Effects now takes just ONE prompt. Nano Banana, Seedream 4, Wan 2.2, Runway Aleph et al are pioneering instruction-based editing — collapsing complex VFX pipelines into a single, implicit step. Here’s everything you need to know in 10 mins: https://x.com/bilawalsidhu/status/1970915228536947026

Marble from World Labs is so close to magic. You can imagine a space, generate it and then walk around it in your VR Headset. This was recorded with my Quest 3 and the potential is mind-blowing. https://x.com/TomLikesRobots/status/1970430493033464175

The Hyperscape Capture on Quest 3 is as impressive as it looks in the demo, although maybe a bit of blur with very fast head movements? I’ve just downloaded the software (thanks US VPN!) and have had a look around Gordon Ramsay’s kitchen. Very cool. I’ll be capturing my own when… https://x.com/TomLikesRobots/status/1968647034589585686

Whether it’s AR glasses or AI wearables, I want Jarvis, not Clockwork Orange. My hope is that ambient computing actually takes off – so it’s there when you need it, and it disappears when you don’t. Versus opening up another surface area to hijack our attention and perpetually… https://x.com/bilawalsidhu/status/1969274326798254107

First, the internet crawled so that AI can run. Now, AR glasses will crawl so that robotics can run. https://x.com/bilawalsidhu/status/1968585113706332544

i found a ‘real’ recording (rare because difficult to capture with a camera). one thing i underestimated was realizing you can do the gestures behind your back, under your covers lying in bed, etc. (as this is rarely done in a demo). very cool https://x.com/nearcyan/status/1968581348706189726

the bracelet is ON lets go https://x.com/nearcyan/status/1968467271694549111

The chorus of “who actually needs this?” for smart glasses is the same script we heard when smartwatches were pointless and smart speakers were creepy. Now they’re two of the most ubiquitous and massive categories in consumer tech. https://x.com/bilawalsidhu/status/1968655341278597477

The most reluctant product endorsement in tech journalism history https://x.com/bilawalsidhu/status/1968543441899389395

what do you guys think ppl will do with this? https://x.com/nearcyan/status/1968502999854235864

VisualMimic enables humanoid loco-manipulation via sim-to-real. Using egocentric vision, a low-level keypoint tracker and high-level policy achieve zero-shot transfer for tasks like box pushing, generalizing across locations and lighting conditions. https://x.com/TheHumanoidHub/status/1970945814840410494

Visualizing global weather patterns over a 24 hour day https://x.com/bilawalsidhu/status/1970167825479463134

Introducing: Hyperscape Capture 📷 Last year we showed the world’s highest quality Gaussian Splatting, and the first time GS was viewable in VR. Now, capture your own Hyperscapes, directly from your Quest headset in only 5 minutes of walking around. https://x.com/JonathonLuiten/status/1968474776793403734

Meshcapade can now pull apart both 3D camera tracking + human pose estimation data. Effectively like Wonder Dynamics / Autodesk Flow Studio at this point. Useful AI model for 3D artists and as an input into video-to-video workflows. https://x.com/bilawalsidhu/status/1969458783480135711

We’re now moving beyond models that react to single instructions and creating systems that can truly tackle problems in a general way – on the path towards solving AGI in the physical world. Developers can now use Gemini Robotics-ER 1.5 via the Gemini API in @GoogleAIStudio. https://x.com/GoogleDeepMind/status/1971243970953879643

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://x.com/syhw/status/1970960837721653409

The guided 3d scanning experience in meta’s new hyperscape app is world class. You capture in 2 stages — get a lay of the land first, then get up close for macro level detail. Watch the real-time feedback in action below: https://x.com/bilawalsidhu/status/1969799126109020469

Microsoft introduces Latent Zoning Network (LZN) A unified principle for generative modeling, representation learning, and classification. LZN uses a shared Gaussian latent space and modular encoders/decoders to tackle all three core ML problems at once! https://x.com/HuggingPapers/status/1970218823140687885

Most humanoid trackers break when reality pushes back: uneven terrain, external forces, or sudden changes throw them off. ❗️Not this one. ✅ AnyTracker: a two-stage RL framework that tracks diverse humanoid motions while adapting online to real-world disturbances. https://x.com/IlirAliu_/status/1969458971321782664

Why can robots do backflips but still struggle to open a drawer??? Precise grasping and whole-body coordination make it harder than acrobatics. DreamControl takes a step toward solving this. It combines diffusion models and reinforcement learning to teach… https://x.com/IlirAliu_/status/1970539603368042823

Ran a world model robot learning reading group alongside @djkesu1 this Thursday, covering V-JEPA and V-JEPA 2 by @AIatMeta. Here are some of my thoughts: JEPA is an interesting idea and borrows many ideas from the representation learning literature (a combination of SimCLR,… https://x.com/stevengongg/status/1969387819920736396

Towards a Physics Foundation Model Proposes GPhyT (General Physics Transformer), a large transformer trained on 1.8 TB of simulation data across fluid flows, shock waves, heat transfer, and multiphase dynamics. Here are a few key notes: https://x.com/omarsar0/status/1968681177189077366
