Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Flat cartoon illustration of a cute coral-red lobster mascot wearing an oversized white VR headset, centered on dark charcoal background, white speech bubble with ‘AR/VR’ text above in Helvetica font, minimal cyan wireframe grid elements floating in background, kawaii mascot style, clean geometric shapes, high contrast, web interface aesthetic

Introducing DreamZero 🤖🌎 from @nvidia > A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots > The key? Jointly predicting video & actions in the same diffusion forward pass Project Page: https://x.com/jang_yoel/status/2019083437265867057
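The joint forward pass is the interesting architectural bit: rather than bolting an action head onto a frozen video model, one diffusion denoiser refines video latents and action tokens in the same token sequence, so the predicted actions are conditioned on the same imagined future they have to execute. A minimal sketch of that idea, assuming a transformer denoiser; all names, shapes, and layer choices below are illustrative assumptions, not DreamZero's actual architecture:

```python
import torch
import torch.nn as nn

# Illustrative sketch of "jointly predicting video & actions in the same
# diffusion forward pass". Everything here is an assumption for exposition,
# not DreamZero's actual code or API.
class JointVideoActionDenoiser(nn.Module):
    def __init__(self, latent_dim=512, action_dim=7, n_heads=8, n_layers=4):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, latent_dim)  # lift actions into the latent space
        self.action_head = nn.Linear(latent_dim, action_dim)  # read denoised actions back out
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.time_emb = nn.Embedding(1000, latent_dim)        # diffusion timestep embedding

    def forward(self, noisy_video, noisy_actions, t):
        # noisy_video:   (B, T, latent_dim) noised future-frame latents
        # noisy_actions: (B, T, action_dim) noised action chunk
        a = self.action_proj(noisy_actions)
        x = torch.cat([noisy_video, a], dim=1)        # one shared token sequence
        x = x + self.time_emb(t)[:, None, :]          # same timestep for both modalities
        x = self.backbone(x)                          # a single forward pass denoises both
        T = noisy_video.shape[1]
        return x[:, :T], self.action_head(x[:, T:])   # denoised video latents + actions

# Usage with dummy tensors: one call yields both predictions.
model = JointVideoActionDenoiser()
video, actions = torch.randn(2, 16, 512), torch.randn(2, 16, 7)
t = torch.randint(0, 1000, (2,))
video_pred, action_pred = model(video, actions, t)
```

In this framing, training noises both streams and denoises them jointly, and at inference the action output of the sampling loop doubles as the policy, which is presumably what lets the video prior transfer zero-shot to control.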

New milestone: we trained a robot foundation model on a world model backbone, and enabled zero-shot, open-world prompting capability for new verbs, nouns, and environments. If the world model can “dream” the right future in pixels, then the robot can execute well in motors. We… https://x.com/DrJimFan/status/2019112603637920237

Genie is out of the bottle. Google is rolling out access to Project Genie: – Design your world and character using text and visual prompts. – The Genie 3 world model generates the environment in real-time as you move through it. Only available for Google AI Ultra subscribers in… https://x.com/TheHumanoidHub/status/2016987944809353260

getting meta over here… prompted genie 3 to generate a zoom call and i can take control of the cursor and take it off the screen into the world. lmao. https://x.com/bilawalsidhu/status/2017346682116084079

Giving the world’s first photograph, the View from the Window at Le Gras (c. 1826), to Genie 3. https://x.com/emollick/status/2018494862178316725

Google Genie just let me walk through 1900s San Francisco. I gave it one black-and-white photo. It gave me back a city — explorable from the sky or the street. This is the closest thing we have to a time machine. https://x.com/bilawalsidhu/status/2017045841836405035

I tested Google’s world model Genie 3… Then DeepMind told me everything 00:00 – Intro & Authoring Workflow 00:27 – Genie 3 Playtesting & Demos 05:33 – Interview w/ Google DeepMind (Genie 3 co-lead @jparkerholder and Sr. PM Diego Rivas) 06:54 – Wildest emergent behaviors… https://x.com/bilawalsidhu/status/2018487746508018051

Much debate over Genie vs 3D engines. You can have both – the control of 3D scene graphs + the creativity of generative AI. Wrote this in 2024 breaking down the vision. The models are almost there. Now just imagine if Unreal / Unity productized this. https://x.com/bilawalsidhu/status/2018119240612536587

One of the wildest emergent capabilities of Genie 3 is that maps actually work. As I walk around the forest, the GPS display updates its heading in real time. Remember. There is no game engine here. This is an AI hallucinating a working navigational instrument purely from next… https://x.com/bilawalsidhu/status/2017252036719657193

Today is the day. Google DeepMind just shipped playable reality: https://t.co/ct43xo4G43 I went hands-on with their Genie 3 world model that spawns interactive, 3D simulations from simple text. We’ve moved past watching videos; we’re now stepping *inside* them. Stick around to… https://x.com/bilawalsidhu/status/2016925493552206113

Took an old photo of a WWI battlecruiser, gave it to Genie 3, and prompted it to let me play as a torpedo boat at the Battle of Jutland. Considering this is a research preview, astonishing how fast this has come. An AI dynamically generating the world with no game engine… https://x.com/emollick/status/2018198584508760108

“The model transfers Genie 3’s vast world knowledge into precise camera and 3D lidar data unique to Waymo’s hardware.” A key reason and example of why world models are so important. https://x.com/kimmonismus/status/2019809839804010962

@Waymo The model transfers Genie 3’s vast world knowledge into precise camera and 3D lidar data unique to Waymo’s hardware. Engineers can prompt “what if” scenarios – like extreme weather or reckless drivers – to stress-test the system. https://x.com/GoogleDeepMind/status/2019809201812545835

Excited to share how Waymo is using Genie to simulate rare scenarios for autonomous driving evaluation, such as extreme weather, reckless driving by other drivers, and long-tailed road inhabitants 🐘. We are just scratching the surface of world simulation applications. https://x.com/shlomifruchter/status/2019820532485808329

Gemini+Genie 3 are helping @Waymo simulate long tail scenarios to make driving safer. https://x.com/JeffDean/status/2019824614139162804

genie 3 is insane. flying a drone over a city then hopping into a fighter jet to chase down a next-gen test vehicle. as one does when you have a proto holodeck at your disposal. https://x.com/bilawalsidhu/status/2017410842338460121

Project Genie: Create and Explore Worlds – YouTube https://www.youtube.com/watch?v=Ow0W3WlJxRY

Super cool use case of Genie 3 simulations! https://x.com/demishassabis/status/2019827916385972517

We’re excited to introduce the Waymo World Model–a frontier generative model for large-scale, hyper-realistic autonomous driving simulation built on @GoogleDeepMind’s Genie 3. By simulating the “impossible”, we proactively prepare the Waymo Driver for some of the most rare and… https://x.com/Waymo/status/2019804616746029508

📢 New paper from GEAR team @NVIDIARobotics We released DreamZero, a World Action Model that turns video world models into zero-shot robot policies. Built on a pretrained video diffusion backbone, it jointly predicts future video frames and actions. 🌐 https://x.com/yukez/status/2019096072690553112

Introducing NVIDIA Cosmos Policy for Advanced Robot Control https://huggingface.co/blog/nvidia/cosmos-policy-for-robot-control

DreamZero: World Action Models are Zero-shot Policies
https://dreamzero0.github.io/

Jim Fan on X: “The Second Pre-training Paradigm”
https://x.com/DrJimFan/status/2018754323141054786

Website: https://t.co/2YwjQs3JMC Robot execution demos across various verbs, nouns, and environments: https://t.co/loUZXZODcR The model is open-source! https://x.com/DrJimFan/status/2019112605315637451

CHORD: Choreographing a World of Dynamic Objects. TL;DR: universal pipeline that distills dynamic motion from video models to animate static 3D assets into coherent 4D scenes of multi-object interactions. https://x.com/Almorgand/status/2016938916377436454

linkedin just made me a “top voice” which means i’m now legally required to post about mindset and gratitude. excited to share my morning routine with all of you (it involves gaussian splats). anyway back to making videos about world models in my basement https://x.com/bilawalsidhu/status/2016707019097424137

TL;DR Make AI a shape rotator, not a wordcel https://x.com/bilawalsidhu/status/2019066945090339215

First fully open Action Reasoning Model (ARM); can ‘think’ in 3D & turn your instructions into real-world actions: [📍 Bookmark for later] A model that reasons in space, time, and motion. It breaks down your command into three steps: > Grounds the scene with depth-aware… https://x.com/IlirAliu_/status/2017162941884162379

Waypoint-1.1 is live, and we’re kicking off weekly updates. This release crosses an important line from impressive short rollouts to local, real-time world models that are coherent, controllable, and playable. New model. Better prompting. Smoother rollouts. https://x.com/overworld_ai/status/2019109415023178208

Planning is one of the most exciting uses of world models, but existing planners struggle on long horizons. Introducing GRASP: a fast gradient-based planner for world models that outperforms prior methods on long-horizon tasks. Two key ideas: 1. jointly optimize actions and… https://x.com/_amirbar/status/2019903658792497482

tl;dr New planner for world models! GRASP: gradient-based, stochastic, parallelized. Long-range planning for world models has always been an issue. 0th order methods like CEM/MPPI dominate, but have degrading performance at longer contexts or higher-dimensional actions. We… https://x.com/michaelpsenka/status/2019870377032503595
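The contrast with CEM/MPPI is worth spelling out: zeroth-order methods only evaluate the world model on sampled action sequences, while a first-order planner differentiates the rollout cost back into the actions. Here is a generic sketch of that family, stochastic (parallel candidates) and gradient-based; `world_model`, `cost_fn`, and every hyperparameter are assumptions for illustration, not taken from the GRASP paper:

```python
import torch

def gradient_plan(world_model, cost_fn, init_state,
                  horizon=32, iters=100, lr=0.05, n_candidates=16):
    """Generic first-order planner: differentiate a rollout cost through a
    differentiable world model. Illustrative sketch, not GRASP's algorithm.
    Assumes world_model(state, action) -> next_state is differentiable, has
    an action_dim attribute, init_state has shape (state_dim,), and
    cost_fn(states) returns a per-candidate cost of shape (n_candidates,)."""
    actions = torch.zeros(n_candidates, horizon, world_model.action_dim,
                          requires_grad=True)                  # parallel candidates
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        state = init_state.unsqueeze(0).expand(n_candidates, -1)
        cost = torch.zeros(n_candidates)
        for t in range(horizon):
            state = world_model(state, actions[:, t])  # differentiable rollout step
            cost = cost + cost_fn(state)               # accumulate per-candidate cost
        opt.zero_grad()
        cost.mean().backward()  # unlike CEM/MPPI, gradients flow through the rollout
        opt.step()
    return actions[cost.argmin()].detach()  # best candidate after optimization
```

Because the gradient gives a per-dimension improvement direction at every step, this kind of planner can scale more gracefully to long horizons and high-dimensional action spaces than sampling-and-reranking, which is the regime where the thread says zeroth-order methods degrade.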

Robbyant has announced LingBot-VLA: an open-source Vision-Language-Action model – Pretrained on ~20k hours of real-world dual-arm robot data – Strong generalization across 9 embodiments – Improves consistently with more data – Claims outperformance over π₀.₅, GR00T N1.6 &… https://x.com/TheHumanoidHub/status/2017337216054575513

World Model meets robot policy! Robbyant’s LingBot-VA: unifies video world modeling and robotic policy learning. – A single model generates both future video and the actions to make it real. – Long-term memory enables long-horizon tasks. – Claims significant outperformance over… https://x.com/TheHumanoidHub/status/2017638555741552672

self-driving <as a 2D robot with a low-dim action space that focused mostly on avoidance rather than interaction> will reach real-world impact faster than anything else. the really cool part is that the world model isn’t just about videos; it’s about modeling continuous… https://x.com/sainingxie/status/2019841784990351381

Accelerating Creation, Powered by Roblox’s Cube Foundation Model | Roblox https://about.roblox.com/newsroom/2026/02/accelerating-creation-powered-roblox-cube-foundation-model

one side tangent from the @yitayml pod I am still thinking about is how people still underestimate the potential of World Models based on moving around in pretty 3D worlds. @ylecun and @jacob_d_kahn showed you can have world models in text and code. currently editing a BANGER… https://x.com/swyx/status/2019605135689937405

Tired of teleoperation? One human video → 1,000s of robot demos. (📍 GitHub) Scaling Robot Data Without Dynamics Simulation or Robot Hardware Real2Render2Real (R2R2R) is a new way to scale robot data without physics simulation or hardware. You take a phone scan + a single… https://x.com/IlirAliu_/status/2017884655869976975

AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose https://arxiv.org/pdf/2601.16429

EditYourself https://edit-yourself.github.io/

JOSH: Joint Optimization for 4D Human-Scene Reconstruction in the Wild. TL;DR: a unified pipeline that jointly optimizes human motion, scene geometry, and camera pose from monocular video, improving accuracy in in-the-wild reconstructions. https://x.com/Almorgand/status/2017259738761740341

Playing as Godot, finally arriving. Just as Beckett intended, thanks to AI. https://x.com/emollick/status/2018213227503534572
