Image created with Flux Pro v1.1 Ultra. Image prompt: ARVR, mixed-reality headset constructed from small bananas with strap printed in banana pattern, faint holographic grid, photorealistic, editorial, minimal, high detail, 3:2 landscape

🏓🤖 Our humanoid robot can now rally over 100 consecutive shots against a human in real table tennis — fully autonomous, sub-second reaction, human-like strikes. https://x.com/ZhiSu22/status/1961244573658673222

HITTER https://humanoid-table-tennis.github.io/

Humanoid robots playing table tennis fully autonomously. The ‘HITTER’ system combines a model-based planner with a reinforcement learning (RL) whole-body controller. It is fully autonomous but relies on an external sensing system: a 9-camera OptiTrack motion-capture setup. https://x.com/TheHumanoidHub/status/1961338417628979237

USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning https://bytedance.github.io/USO/

Many thanks to @_akhaliq for sharing! 🌟 USO is open-sourced and lets you combine any subject with any style in any scenario! 🚀 Give it a try in our demo. 👇👇👇 🥰 code… https://x.com/fenfenfenfenfan/status/1961464402550690007

If you think Apple is not doing much in AI, you’re getting blindsided by the chatbot hype and not paying enough attention! They just released FastVLM and MobileCLIP2 on Hugging Face. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision-language model (VLM) applications! They can even do live video captioning 100% locally in your browser 🤯🤯🤯 https://x.com/ClementDelangue/status/1962526559115358645

🗺️ ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling. TL;DR: high-fidelity 3D humans across a wide range of poses, capturing both skeletal structure and surface details by separating the internal skeleton from the external surface. (1/3) https://x.com/Almorgand/status/1962581481055797586

We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception — inspired by how LLMs unified NLP. 📝 https://x.com/miran_heo/status/1962649613590302776

Jensen on NVIDIA Q2 Earnings Call: “Our new robotics computing platform, Thor, is now available. Thor delivers an order of magnitude greater AI performance and energy efficiency than NVIDIA’s AGX Orin. It runs the latest generative and reasoning AI models at the edge in real…” https://x.com/TheHumanoidHub/status/1961342309209100670

Nvidia launched Jetson AGX Thor, a $3,499 chip for real-time physical AI. It uses a 2,560-core Blackwell GPU, 96 fifth-generation Tensor cores, and 128GB of memory to deliver up to 2,070 FP4 teraflops of AI compute. https://x.com/adcock_brett/status/1962184408246415687

NVIDIA’s Jetson AGX Thor, a $3,499 ‘robot brain,’ is now available. Powered by a Blackwell GPU with 128GB of memory, it delivers up to 2,070 FP4 teraflops at 130W. Early adopters include Boston Dynamics, Agility, and Figure, pushing humanoid robotics into a new era. 🤖✨ https://x.com/StarSnap_1/status/1960153258389053561
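A quick back-of-the-envelope check of those headline numbers — this is just a sketch using the peak figures quoted in the blurbs above (2,070 FP4 teraflops in a 130W envelope), not a measured benchmark:

```python
# Back-of-the-envelope efficiency estimate for Jetson AGX Thor,
# using the peak figures quoted above (not sustained throughput).
peak_fp4_tflops = 2070   # claimed peak FP4 compute, in teraflops
power_watts = 130        # stated power envelope

tflops_per_watt = peak_fp4_tflops / power_watts
print(f"{tflops_per_watt:.1f} FP4 TFLOPS per watt")  # ~15.9
```

Real workloads will land well below this peak, but the ratio gives a sense of why a 130W module is viable for on-robot compute.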

Nano Banana + Veo 3 https://x.com/dev_valladares/status/1961621010144247858

How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4 https://x.com/GordonWetzstein/status/1963583050744250879

AI co-pilot boosts noninvasive brain-computer interface by interpreting user intent | EurekAlert! https://www.eurekalert.org/news-releases/1096148

Smart ring rivalry heats up: Ultrahuman sues Oura over patent claims https://www.msn.com/en-us/money/other/smart-ring-rivalry-heats-up-ultrahuman-sues-oura-over-patent-claims/ar-AA1L2RLo?apiversion=v2&noservercache=1&domshim=1&renderwebcomponents=1&wcseo=1&batchservertelemetry=1&noservertelemetry=1

90% success rate in unseen environments. No new data, no fine-tuning. Autonomously. Most robots need retraining to work in new places. What if they didn’t? Robot Utility Models (RUMs) learn once and work anywhere… zero-shot. A team from NYU and Hello Robot built a set of… https://x.com/IlirAliu_/status/1961692920836215229

A robot that sees the terrain and predicts its own future… up to 5 seconds ahead? This is real. ❗️Best Systems Paper finalist at #RSS2025. The team introduces a perceptive Forward Dynamics Model that helps legged robots safely navigate rough, complex environments: no manual… https://x.com/IlirAliu_/status/1962569938805141861

The most powerful thing about using the Vision Pro for social VR is that you know the person on the other end is *exactly* who they say they are and look the way they do because Apple uses a retina scan to authenticate and ensure only you can create & drive your 3D avatar. https://x.com/bilawalsidhu/status/1962198920085594568

Me after my trip to India https://x.com/bilawalsidhu/status/1962845777056956639

People ask how I get such clean 3D scans with a DSLR — and I must admit it’s a bit of a dark art. But this new $5K PortalCam changes everything. LiDAR precision + SLAM speed + 3D Gaussian Splat fidelity in a device anyone can use. The use cases are wild. Let me show you 🧵 https://x.com/bilawalsidhu/status/1963337887027707987

Training a Whole-Body Control Foundation Model — new work from my team at @agilityrobotics. A neural network for controlling our humanoid robots that is robust to disturbances, can handle heavy objects, and is a powerful platform for learning new whole-body skills. Learn more: https://x.com/chris_j_paxton/status/1961211488653267227

90’s SGI computers were a vibe and a half too. Reality Engine systems were like $250-750K and I wanted one so bad as a kid! These beautiful beasts powered VFX in everything from Jurassic Park and Terminator 2 to The Matrix and Lord of the Rings. https://x.com/bilawalsidhu/status/1962755877481349170

Pixie: Physics from Pixels. TL;DR: NeRF/GS with physics; a neural network maps pretrained visual features (e.g., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulations. https://x.com/Almorgand/status/1961076683093524561

Lipsync Studio https://higgsfield.ai/create/speech

Computer vision (understanding reality) 🤝 Computer graphics (generating reality) https://x.com/bilawalsidhu/status/1962517172267384853

HunyuanWorld-Voyager is here and fully open-source! The world’s first ultra-long-range world model with native 3D reconstruction, redefining AI-driven spatial intelligence for VR, gaming, and simulations. ✅Direct 3D Output: Exports point cloud videos to 3D formats without tools https://x.com/TencentHunyuan/status/1962741518797836708

Entire startups have raised more venture capital on the backs of Adobe video edits than actual products. Insane if you think about it. After Effects might be the most valuable VC fundraising tool ever invented. https://x.com/bilawalsidhu/status/1962915517326332086

The utter disrespect for CGI & VFX continues to baffle me. Do these Hollywood heavyweights make these remarks because it’s an easy PR win? Or do they genuinely think modern cinema and TV would be anywhere close to where it is without CGI-assisted storytelling? https://x.com/bilawalsidhu/status/1962583444158062641

Discover more from Ethan B. Holland