Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: 1980s NORAD war room with silhouetted operator wearing chunky retro VR headset at console, multiple CRT monitors displaying glitching wireframe virtual environments with fragmenting geometric grids, massive bold red sans-serif text reading AR/VR on dark wall, flickering amber and blue screen glow, high contrast cinematic lighting, WarGames movie aesthetic
An interactive world model developed by NVIDIA in collaboration with academic partners. – DreamDojo turns egocentric human video data into physical intelligence. – Human data is more scalable than robotics data but lacks action labels. – To solve this, a dedicated action model
https://x.com/TheHumanoidHub/status/2025368793321799909
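The "dedicated action model" idea above amounts to an inverse dynamics model: recover the missing action labels from raw video by predicting what action took frame t to frame t+1. A minimal toy sketch of that labeling loop, where the "IDM" just reads the action off a pixel shift (real pipelines learn this mapping on a small robot dataset, then stamp labels onto the large human-video corpus; the centroid trick is purely illustrative):

```python
import numpy as np

def centroid(frame: np.ndarray) -> np.ndarray:
    """Mean (x, y) position of the nonzero pixels in a frame."""
    ys, xs = np.nonzero(frame)
    return np.array([xs.mean(), ys.mean()])

def pseudo_label_actions(frames: np.ndarray) -> list:
    """Toy inverse dynamics model: label each frame transition with the
    pixel-space displacement of the tracked object. Illustrative only;
    it stands in for a learned frame-pair -> action predictor."""
    return [centroid(frames[t + 1]) - centroid(frames[t])
            for t in range(len(frames) - 1)]

# Two frames of a 4x4 blob moving 3 px right and 1 px down.
f0 = np.zeros((64, 64)); f0[10:14, 20:24] = 1.0
f1 = np.zeros((64, 64)); f1[11:15, 23:27] = 1.0
labels = pseudo_label_actions(np.stack([f0, f1]))  # -> [array([3., 1.])]
```

Once every transition in the video corpus carries a pseudo-action like this, the data can be trained on as if it were teleoperated robot data.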
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It’s Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is
https://x.com/DrJimFan/status/2024895359236051274
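The core interface of a "Simulation 2.0" world model like the one described above is: current observation plus motor command in, predicted next frame out, with no mesh or physics engine in between. A toy stand-in for that interface (real systems use a learned video generator; here the "model" just shifts pixels by the commanded velocity to show the rollout loop):

```python
import numpy as np

def world_model_step(frame: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Toy action-conditioned world model: predict the next frame from
    the current frame and a motor-control vector. Illustrative only --
    translation by the commanded (dx, dy) stands in for learned dynamics."""
    dx, dy = int(round(action[0])), int(round(action[1]))
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))

# Roll out a short imagined trajectory from a single frame.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
frame[30:34, 30:34] = 255            # a white square "object"
actions = [np.array([2.0, 0.0])] * 5  # command: move right 2 px per step
for a in actions:
    frame = world_model_step(frame, a)
```

The point of the interface is that a policy can be evaluated or trained entirely inside the model's imagined rollouts, with no hand-authored dynamics anywhere.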
NVIDIA has open-sourced SONIC, a humanoid behavior foundation model that gives robots a core set of motor skills learned from large-scale human motion data. https://x.com/TheHumanoidHub/status/2024935738362765677
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control https://nvlabs.github.io/GEAR-SONIC/
We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: getting generalist humanoids to do real work. To progress toward this goal, we developed SONIC ( https://x.com/yukez/status/2024639427788857707
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We
https://x.com/DrJimFan/status/2026709304984875202
What can half of GPT-1 do? We trained a 42M transformer called SONIC to control the body of a humanoid robot. It takes a remarkable amount of subconscious processing for us humans to squat, turn, crawl, sprint. SONIC captures this "System 1" – the fast, reactive whole-body
https://x.com/DrJimFan/status/2026350142652383587
The trifecta: Vision, world models, and robotics. OpenAI is getting physical on the AI battlefield!
https://x.com/TheHumanoidHub/status/2026361659695194537
Having an agentic VLM shade & render your 3D scene is the ultimate counterexample to the "pixels is all you need" crowd. Real-time video is powerful – it's even a new medium. But explicit 3D is still very useful. Also this donut makes me hungry.
https://x.com/bilawalsidhu/status/2026184423004160185
Between Gemini 3.1 and Claude 4.6 it’s honestly wild what you can build. This feels like Google Earth and Palantir had a baby. Made this with all the geospatial bells and whistles — real time plane & satellite tracking, real traffic cams in Austin, and even got a traffic system
https://x.com/bilawalsidhu/status/2024672151949766950
Universal Beta Splatting. TL;DR: a unified radiance field model that uses N-D Beta kernels to capture spatial, angular, and temporal effects for richer, real-time scene rendering.
https://x.com/Almorgand/status/2026329946688200780
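The appeal of a Beta kernel over the usual Gaussian primitive is that one parameter pair controls both the spread and the asymmetry of the falloff, and its support is compact. A minimal 1-D sketch of the kernel shape (parameter values are illustrative, not taken from the paper):

```python
def beta_kernel(t: float, alpha: float, beta: float) -> float:
    """Unnormalized Beta kernel on (0, 1): t^(a-1) * (1-t)^(b-1).
    With alpha == beta it is a symmetric, Gaussian-like bump; skewed
    parameters shift the peak off-center, giving asymmetric falloff
    along that axis. Illustrative sketch, not the paper's formulation."""
    if not 0.0 < t < 1.0:
        return 0.0  # compact support, unlike a Gaussian's infinite tails
    return t ** (alpha - 1.0) * (1.0 - t) ** (beta - 1.0)

# Symmetric parameters peak at the center of the support;
# skewed parameters (alpha=2, beta=5) peak near t = 0.2.
sym_peak = beta_kernel(0.5, 4.0, 4.0)
skew_peak = beta_kernel(0.2, 2.0, 5.0)
```

Taking a product of such 1-D kernels over spatial, angular, and temporal coordinates is one plausible reading of the "N-D Beta kernel" idea: each dimension gets its own learnable shape.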
world modeling is never about rendering pixels. rendering is local. world state is global. as soon as more than one agent exists, the only thing that truly matters is the shared representation beneath individual views. that shared representation is what scales into collective
https://x.com/sainingxie/status/2027115356318474661
Dang. My WorldView project is blowing up and is trending on X. I guess ppl really like monitoring the situation. Inbound is a little nuts — got hedge funds and OSINT folks ready to contribute; keep the feature requests coming! Been fun to put my geospatial 3D roots to work.
https://x.com/bilawalsidhu/status/2024953470806102510
Explore any world. Tell any story. All in one place. Kling 3.0 is now available in both Runway Workflows and Tool Mode. Discover all of the new models and capabilities available right inside of Runway at the link below. Morningstar. Generated with AI. Made by @ceremonial_flux
https://x.com/runwayml/status/2025977383208051018
Generated Reality Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control paper: https://x.com/_akhaliq/status/2025944948453847352
Introducing Solaris: the first multiplayer world model exploration effort in Minecraft. We’ve built a scalable data collection engine, a multiplayer video diffusion model architecture, and a multi-view consistency evaluation benchmark. [1/9]
https://x.com/georgysavva/status/2027119472096518358
Marble is a generative AI platform and multimodal world model developed by World Labs, the spatial intelligence company founded by AI pioneer Fei-Fei Li. It allows users to create high-fidelity, persistent, navigable 3D worlds from simple inputs like: – Text prompts – Single or
https://x.com/TheHumanoidHub/status/2024935236057137640
My site hit #25 in rising tech publications. I’m mapping the frontier of creation & computing. Written + video deep dives on generative media, spatial intelligence and world models. Check it out https://x.com/bilawalsidhu/status/2026108063632216492
Physical Intelligence’s π0.6 models in real-world use cases Weave (left): Autonomous laundry folding Ultra (right): E-commerce packaging The models are built on a Vision-Language-Action (VLA) framework.
https://x.com/TheHumanoidHub/status/2026455516034306150
What is the self-evolution trilemma? In an ideal world, an AI system where agents learn only from each other would have 3 properties: – Continuous self-evolution – Isolation, meaning running in a closed loop without outside interference – Stable safety alignment (safety invariance)
https://x.com/TheTuringPost/status/2024621675866935495
I like big splats and I cannot lie. SparkJS getting LOD support is gonna be a big unlock for massive 3D scenes — whether captured or generated
https://x.com/bilawalsidhu/status/2025349799315145063
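The usual LOD rule for splat renderers is to pick detail from a cluster's projected screen-space footprint: halve the splat count each time the footprint halves. A hypothetical sketch of that rule under a pinhole camera model (this is not SparkJS's actual API; `select_lod` and its parameters are invented for illustration):

```python
import math

def select_lod(splat_radius_world: float, distance: float,
               focal_px: float, levels: int = 8,
               threshold_px: float = 1.0) -> int:
    """Hypothetical LOD selector: project a cluster's bounding radius to
    screen space (pinhole model: r_px = f * r / d), then grant one extra
    detail level per doubling of the footprint. Sub-pixel clusters get
    the coarsest level, 0."""
    radius_px = focal_px * splat_radius_world / distance
    if radius_px <= threshold_px:
        return 0  # cluster is sub-pixel: render coarsest proxy
    return min(levels, int(math.log2(radius_px / threshold_px)) + 1)

# A 1 m cluster 10 m away under a 1000 px focal length spans ~100 px.
near_lod = select_lod(1.0, 10.0, 1000.0)    # detailed level
far_lod = select_lod(1.0, 2000.0, 1000.0)   # ~0.5 px footprint -> level 0
```

The log2 step is what makes this scale to massive scenes: the work per cluster stays roughly proportional to its on-screen size rather than its splat count.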
Selfi: Self-Improving Reconstruction Engine via 3D Geometric Feature Alignment. TL;DR: a self-improving 3D reconstruction pipeline refining geometric consistency by aligning features from a pretrained vision backbone, boosting NVS and pose estimation from unposed images.
https://x.com/Almorgand/status/2026718518977200229
SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting. TL;DR: a geometry-aware SR strategy for 3DGS that sharpens detail only where needed, yielding high-fidelity, multi-view consistent reconstructions.
https://x.com/Almorgand/status/2025966112383406314
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting. TL;DR: a unified 3D Gaussian splatting model that reconstructs high-quality scene geometry and camera poses from unposed/uncalibrated images in a single forward pass.
https://x.com/Almorgand/status/2024901904405348410
3D printed tires Pure 3D-printed lattice magic in TPU that flexes, absorbs insane weight, and keeps rolling. No air or suspension needed. @robotsailor is building next-level robot wheels… fully custom, printed fast on Bambu Lab printers. If you don’t follow him… you are
https://x.com/IlirAliu_/status/2024407630735716474
Not a preplanned motion sequence. A robot deciding mid-jump what to do next. [📍 paper + demo] Researchers just showed a humanoid doing real parkour using only onboard perception. No motion script, no fixed obstacle layout. The system is called Perceptive Humanoid Parkour
https://x.com/IlirAliu_/status/2024560405335495052