Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Cylindrical space station command center with multiple converging holographic data streams–scrolling blue text, green audio waveforms, amber image fragments, flickering video feeds–all merging into a single glowing unified sphere at the center, cinematic sci-fi lighting with deep space visible through transparent walls, Ender’s Game inspired tactical aesthetic, cool color palette with neon highlights, dramatic isolation and scale
I genuinely think we’re on the cusp of a new type of creation engine. Feels less like prompting and more like puppeteering reality itself. MotionStream is a taste of what’s to come: https://x.com/bilawalsidhu/status/1986877076839014462
MotionStream: Real-Time Video Generation with Interactive Motion Controls
https://joonghyuk.com/motionstream-web/
Wildminder on X: “MotionStream: Real-time, interactive video generation with mouse-based motion control; runs at 29 FPS with 0.4s latency on one H100; uses point tracks to control object/camera motion and enables real-time video editing. https://t.co/fFi9iB9ty7 https://t.co/zKb9u3bj9g” / X
https://x.com/wildmindai/status/1985828041566941576
Marble: A Multimodal World Model | World Labs https://www.worldlabs.ai/blog/marble-world-model
A robot could learn a task just by watching a generated video? PhysWorld connects video generation with real-world robot learning. It turns visual imagination into physical skill. ✅ Takes one image and a task prompt ✅ Generates a video showing how to complete the task ✅ https://x.com/IlirAliu_/status/1988678189527273831
[2511.07416] Robot Learning from a Physical World Model https://arxiv.org/abs/2511.07416
Fei-Fei Li’s World Labs speeds up the world model race with Marble, its first commercial product | TechCrunch https://techcrunch.com/2025/11/12/fei-fei-lis-world-labs-speeds-up-the-world-model-race-with-marble-its-first-commercial-product/
From Words to Worlds: Spatial Intelligence is AI’s Next Frontier https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
.@drfeifei started her new blog We believe this will be one of the most interesting reads about Spatial Intelligence. She writes, that Spatial Intelligence depends on world models built on 3 core principles: – They must be generative – able to create coherent, https://x.com/TheTuringPost/status/1988727531353305524
Perceptron’s platform is here — built for Physical AI Developers can now use Isaac-0.1 or Qwen3VL 235B via: Perceptron API — fast, reliable multimodal intelligence Python SDK — simple, grounded prompting for vision + language Build apps that see and understand the world. https://x.com/perceptroninc/status/1988713482460750290
NotebookLM adds Deep Research, Docx, Sheets and more https://blog.google/technology/google-labs/notebooklm-deep-research-file-types/
Gemini having SOTA satellite data understanding was not on my 2025 bingo card, yet here we are 🙂 https://x.com/OfficialLoganK/status/1986978962589790536
Google is now using Gemini to cross-reference ~250M places with Street View imagery to identify visible landmarks for turn-by-turn nav. Think iconic buildings, gas stations and restaurants. So instead of “turn right in 500 feet” you get “turn right after the Thai Siam https://x.com/bilawalsidhu/status/1986525085398941974
Introducing Meta Omnilingual Automatic Speech Recognition (ASR), a suite of models providing ASR capabilities for over 1,600 languages, including 500 low-coverage languages never before served by any ASR system. While most ASR systems focus on a limited set of languages that are https://x.com/AIatMeta/status/1987946571439444361
Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/
Tavus! Pretty big step towards a real life Jarvis – a multimodal ai assistant w/ a personality to boot. They’re intentionally blurring the lines between a tool and a companion. This is what Siri should’ve been by now. Cool to see it actually happening: https://x.com/bilawalsidhu/status/1988671232099926465
Excited to share our latest @SophontAI release 🥳 “How to Train a State-of-the-Art Pathology Foundation Model with $1.6k” We present OpenMidnight, our first pathology foundation model! It has SOTA perf. despite being only trained on 12k whole slide images w/ $1.6k compute! https://x.com/iScienceLuvr/status/1989390268316221861
We’ve been integrating Isaac across the industry and have realized developers are missing a single platform for Physical AI – prompt engineering, deployment, and integration. Today we are excited to release Perceptron’s Platform – supporting our API – supporting chat https://x.com/AkshatS07/status/1988713765152649711
Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini | VentureBeat https://venturebeat.com/ai/baidu-just-dropped-an-open-source-multimodal-ai-that-it-claims-beats-gpt-5
Robots are great at following instructions. But what happens when those instructions fail? Most Vision-Language-Action models freeze or repeat the same mistake. A new approach called FailSafe shows how robots can detect and fix their own failures. The method uses a companion https://x.com/IlirAliu_/status/1986353266322538634
Super original work! What if you could match non-identical objects? (just accepted to WACV’26 congrats!) https://x.com/Almorgand/status/1988240870986953120
k2 vision is happening. this is not a drill. https://x.com/code_star/status/1987917177417289794
Samsung Vision AI Companion: Bringing Conversational AI to Households Worldwide – Samsung Global Newsroom https://news.samsung.com/global/samsung-vision-ai-companion-bringing-conversational-ai-to-households-worldwide
Here comes ERNIE 5.0 — our latest natively omni-modal foundational model. It excels in omni-modal understanding, creative writing, instruction following, and more. We will continue investing in and developing more cutting-edge models to push the boundaries of intelligence. https://x.com/Baidu_Inc/status/1988820837898829918?s=20
Fei-Fei Li dropping bars on why spatial intelligence matters: “LLMs have begun to transform how we access and work with abstract knowledge. Yet they remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.” https://x.com/bilawalsidhu/status/1987903363095343437
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on https://x.com/drfeifei/status/1987891210699379091
“Real-time voice AI isn’t easy. It needs sub-second latency, natural turn-taking, and conversational quality all at once. Here’s how @DecagonAI and Modal built a real-time inference system using: • Supervised fine-tuning and reinforcement learning • Speculative decoding with https://t.co/sbdulC7HRR” / X
https://x.com/modal/status/1989016021919740149
Most trackers lose sight of an object once it changes shape… [👇Code & Dataset] an apple turns into slices, a caterpillar into a butterfly, and the model just gives up. Researchers at Cornell built a new system called Track Any State that does something different: it follows https://x.com/IlirAliu_/status/1988319369160781978
📣 Announcing MUSI: 1st Multimodal Spatial Intelligence Workshop @ICCVConference! 🎙️All-star keynotes: @sainingxie, @ManlingLi_, @RanjayKrishna, @yuewang314, and @QianqianWang5 – plus a panel on the future of the field! 🗓 Oct 20, 1pm-5:30pm HST 🔗 https://x.com/songyoupeng/status/1975811164765643058




