Image created with gemini-2.5-flash-image, with the prompt drafted by claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of a child aged 6-8 in cozy bedroom, VR headset pushed up on forehead, sitting on plush rug facing panoramic arc of TV screens displaying overlapping virtual environments and wireframe worlds, warm domestic lighting with cool blue screen glow, cardboard VR viewers and AR device scattered on floor, subtle translucent holographic overlays floating in air between child and screens, shallow depth of field, soft focus, large bold text reading AR/VR at top of frame, tender yet subtly uncanny atmosphere
The shift from text to more dynamic AI experiences https://fidjisimo.substack.com/p/more-dynamic-ai-experiences
Generalist robots need a generalist evaluator. But how do you test safety without breaking things? 💥 🌎 Introducing our new work from @GoogleDeepMind: Evaluating Gemini Robotics Policies in a Veo World Simulator https://x.com/Majumdar_Ani/status/1999525259276423569
vibe coding games is actually a lot of fun. can’t wait to share something cool soon. https://x.com/bilawalsidhu/status/1998961420457881654
Ad: Pretty cool to vibe code games using YouTube Playables Builder. One of my top VFX/360 videos is now a retro shooter game – stock up on burgers for your intergalactic overlords while dodging a horde of farmers who really want their cows back. https://x.com/bilawalsidhu/status/2001025884778848611
Music tools usually live on flat screens. This is an Apple Vision Pro app that turns music production into a modular, spatial system. Sounds are blocks. Arrangements become structures. The composition exists as a physical object in the room, not a timeline on a screen. https://x.com/IlirAliu_/status/2000491725597384854
Another banger paper from Apple. View synthesis from a single image is impressive, but most methods are extremely slow. The default approach to high-quality novel view synthesis uses diffusion models: iterative denoising produces compelling results, but latency can stretch into seconds per view. https://x.com/omarsar0/status/2000989377883988311
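Why iterative denoising is slow, in a nutshell: every generated view needs tens of sequential network evaluations, so latency scales linearly with step count, while a feed-forward model pays for only one pass. A minimal sketch; the 100 ms per-step cost and the function names are made up for illustration, not taken from the paper:

```python
import time

def denoise_step(latent):
    """Stand-in for one diffusion network forward pass (assume ~100 ms)."""
    time.sleep(0.1)  # placeholder for real GPU latency
    return latent    # a real step would remove a bit of noise

def diffusion_view_synthesis(latent, num_steps=50):
    """Iterative denoising: total latency grows linearly with num_steps."""
    for _ in range(num_steps):
        latent = denoise_step(latent)
    return latent

def single_pass_view_synthesis(latent):
    """Feed-forward synthesis: one network evaluation, so roughly
    num_steps times faster than the loop above."""
    return denoise_step(latent)

start = time.time()
diffusion_view_synthesis(latent=None, num_steps=50)
print(f"50-step diffusion: {time.time() - start:.1f}s")  # ~5.0s

start = time.time()
single_pass_view_synthesis(latent=None)
print(f"single pass: {time.time() - start:.1f}s")        # ~0.1s
```

That step-count arithmetic is the whole story behind "less than a second" headlines: cutting 50 evaluations down to one buys you the speedup before any kernel-level optimization.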
Apple just released Sharp: Sharp Monocular View Synthesis in Less Than a Second https://x.com/_akhaliq/status/2000587447680340257
Apple presents One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation https://x.com/_akhaliq/status/1999516539351883823
🚀 Announcing Echo — our new frontier model for 3D world generation. Echo turns a simple text prompt or image into a fully explorable, 3D-consistent world. Instead of disconnected views, the result is a single, coherent spatial representation you can move through freely. https://x.com/SpAItial_AI/status/2000600875388027051
🚨 TRELLIS.2 is now live on fal! 🎯 Image-to-3D model producing up to 1536³ PBR textured assets 🎨 Handles arbitrary topology with rich PBR textures (Base Color, Metallic, Roughness, Alpha) ⚡ 16× spatial compression for efficient, scalable, high-fidelity asset generation https://x.com/fal/status/2001414174371373346
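For context, fal serves hosted models through its Python client, so trying an image-to-3D endpoint is a short script. The sketch below uses the real `fal_client.subscribe` API, but the endpoint id `fal-ai/trellis-2` and the argument names are assumptions; check the model page on fal for the actual schema:

```python
import fal_client  # pip install fal-client; requires FAL_KEY in the environment

# Hypothetical endpoint id and argument schema; verify both against
# the TRELLIS.2 model page on fal before relying on this.
result = fal_client.subscribe(
    "fal-ai/trellis-2",
    arguments={"image_url": "https://example.com/input.png"},
)
print(result)  # expected: URLs for the generated PBR-textured 3D asset
```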
I thought 4D would be solved in 2026; well, we’re getting much closer at the end of 2025! https://x.com/Almorgand/status/1999542068205523133
The V2V era begins! 📢 We’re moving from directing with text to directing with motion. I tried Kling AI’s motion control feature. It was previously available in version 1.6, and it can now be used with the latest 2.6 model. https://x.com/seiiiiiiiiiiru/status/2001502678116110430
EgoX is really cool – generating immersive first-person view video from any third-person footage. If you can do this vision task well, you get endless egocentric training data for robotics. https://x.com/bilawalsidhu/status/2000642584763335055
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling https://x.com/_akhaliq/status/2001286164469227555
3D printed differential robot arm wrist. [👇 GitHub Link – Open Source ] High demand, so they open sourced it early. A differential mechanism released as a test fixture for future robots. Built around their Spectral micro BLDC driver. Parts, STL files, and example code are on GitHub. https://x.com/IlirAliu_/status/2001362344459292952
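For the curious: a differential wrist mixes two motor rotations into two output axes. With the standard idealization, driving both motors the same way pitches the wrist and counter-rotating them rolls it, so pitch and roll are the half-sum and half-difference of the motor angles. A minimal sketch of that relation; the gear ratio, sign conventions, and function names are assumptions and will differ per build:

```python
def differential_wrist_fk(theta_left, theta_right, ratio=1.0):
    """Forward kinematics of an idealized differential wrist.

    Equal motor rotation pitches the wrist; opposite rotation rolls it.
    `ratio` is an assumed bevel-gear ratio.
    """
    pitch = ratio * (theta_left + theta_right) / 2.0
    roll = ratio * (theta_left - theta_right) / 2.0
    return pitch, roll

def differential_wrist_ik(pitch, roll, ratio=1.0):
    """Inverse: motor angles needed to reach a desired pitch/roll."""
    theta_left = (pitch + roll) / ratio
    theta_right = (pitch - roll) / ratio
    return theta_left, theta_right

# Example: pure roll requires the motors to counter-rotate.
print(differential_wrist_ik(pitch=0.0, roll=0.5))  # (0.5, -0.5)
```

The appeal of the mechanism is mechanical, not mathematical: both motors share every load, and neither has to ride on the moving wrist.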
Gaussian See, Gaussian Do: 3D Semantic Motion Transfer. TL;DR: extracts the semantic motion from a multi-view source video and applies it to a static target shape in a way that is semantically meaningful. https://x.com/Almorgand/status/2001345313999852018
StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space. TL;DR: a diffusion-based method that works purely through viewpoint conditioning, without explicit depth or warping; a canonical rectified space and the conditioning guide the generator to infer correspondences. (1/3) https://x.com/Almorgand/status/2000602000866619569
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos. TL;DR: three learnable modules plus a lightweight IK stage, starting with a Reference Prompt Encoder that distills per-joint queries from the asset’s skeleton, mesh, and rendered image set. (1/4) https://x.com/Almorgand/status/1999530563607122271
Holy fuck guys we’re not “pushing hard” for or replacing concept artists with AI. We have a team of 72 artists, of which 23 are concept artists, and we are hiring more. The art they create is original and I’m very proud of what they do. I was asked explicitly about concept art. https://x.com/LarAtLarian/status/2001011042642505833
Updates to Meta AI Glasses: Conversation Focus, Spotify Integration, and More https://about.fb.com/news/2025/12/updates-to-meta-ai-glasses-conversation-focus-spotify-integration/
Robots learning from human videos used to be a hard research problem. It turns out scale changes that. A new result from @physical_int shows an emergent property of large VLAs like π0.5: as pre-training scales, the model naturally aligns human egocentric video and robot data. https://x.com/IlirAliu_/status/2001216734850646410
Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time. TL;DR: a self-attention encoder transforms the input video into a latent Global Scene Representation; the decoder can then query the 3D position P of any given 2D point (u, v) from the source timestep at a target timestep. (1/2) https://x.com/Almorgand/status/1999138551972221358
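Reading that TL;DR as an interface: the video is encoded once into a single latent, and any number of point queries are answered against it afterward. A shape-level stub of that encode-once/query-many pattern; the class name, latent size, and random outputs are placeholders, not the paper’s actual model:

```python
import numpy as np

class D4RTSketch:
    """Shape-level stub of an encode-once / query-many reconstructor."""

    def encode(self, video: np.ndarray) -> np.ndarray:
        """Map a (T, H, W, 3) video to a latent Global Scene
        Representation (random placeholder of assumed size 256)."""
        self.latent = np.random.randn(256)
        return self.latent

    def query(self, u: float, v: float, t_src: int, t_tgt: int) -> np.ndarray:
        """Return the 3D position P of pixel (u, v) observed at t_src,
        evaluated at time t_tgt (random placeholder output)."""
        return np.random.randn(3)

model = D4RTSketch()
model.encode(np.zeros((16, 256, 256, 3)))
print(model.query(u=0.5, v=0.5, t_src=0, t_tgt=8))
```

The efficiency claim lives in that split: the expensive encoder runs once per video, while each query is a cheap decoder call.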
🚀🚀🚀Introducing HY World 1.5 (WorldPlay)! We have now open-sourced the most systematic, comprehensive real-time world model framework in the industry. In HY World 1.5, we develop WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling https://x.com/TencentHunyuan/status/2001170499133653006