Image created with Flux Pro v1.1 Ultra. Image prompt: CU Boulder brand style — CU Gold & Black, Helvetica Neue, Flatirons, Tuscan-vernacular sandstone + red-tile roofs; media lab edit suite, night ambient light, over-shoulder monitor view, sandstone texture band; integrate the category “Video” via Overlay: video-editing timeline UI on screen titled “VIDEO”; natural light, clean professional inspiring tone, crisp focus, subtle grain, editorial composition

Matrix-Game 2.0 — The FIRST open-source, real-time, long-sequence interactive world model. Last week, DeepMind’s Genie 3 shook the AI world with real-time interactive world models. But… it wasn’t open-sourced. Today, Matrix-Game 2.0 changed the game. 🚀 25 FPS. Minutes-long… https://x.com/Skywork_ai/status/1955237399912648842

Damn it worked! Genie 3 world -> inpaint UI -> 4x Topaz AI upscale -> train a 3D Gaussian splat. You can step inside a painting of Socrates from 1787. Better than any image-to-3D model I’ve seen. I think Google has stumbled upon the killer app for VR — the literal holodeck. https://x.com/bilawalsidhu/status/1954229425199034753
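
The last step of that pipeline (upscaled frames -> Gaussian splat) is the only part that needs real tooling. Here is a minimal sketch of what it could look like, assuming a nerfstudio-style workflow; the tweet doesn’t say which splat trainer was actually used, so the commands and folder names are illustrative, not the author’s setup.

```python
# Hypothetical sketch: train a 3D Gaussian splat from frames that have already
# been exported from Genie 3, inpainted, and 4x-upscaled. Assumes nerfstudio's
# ns-process-data / ns-train CLI is installed; paths are placeholders.
import subprocess
from pathlib import Path

frames_dir = Path("genie3_frames_upscaled")  # placeholder: folder of upscaled frames
workspace = Path("splat_workspace")

# Recover camera poses for the frames (runs COLMAP under the hood).
subprocess.run(
    ["ns-process-data", "images",
     "--data", str(frames_dir), "--output-dir", str(workspace)],
    check=True,
)

# Fit a Gaussian-splatting model to the posed frames.
subprocess.run(["ns-train", "splatfacto", "--data", str(workspace)], check=True)
```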

Lmao. We got an open-source Genie ONE WEEK after Google’s announcement. Meanwhile, Odyssey has a launch around the corner too. The future is generated, not rendered. https://x.com/bilawalsidhu/status/1955342603324453305

RT @altryne: This Genie-3 video is mind-boggling, especially this edited-out part: the airplane collides with the sphere, bounces off, the… https://x.com/_rockt/status/1955025996547232170

OpenAI gpt-oss 120B orchestrates a full video using Hugging Face Spaces! 🤯 All of it in one SINGLE prompt: create an image of a Labrador and use it to generate a simple video of it. 🛠️ Tools used: 1. Flux.1 Krea Dev by @bfl_ml 2. LTX Fast by @Lightricks. That’s it, gpt-oss… https://x.com/reach_vb/status/1955678303395696821
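
As a rough idea of what that Space-to-Space chaining looks like when done by hand, here is a sketch using gradio_client: one call generates the still, the next animates it. The Space ids, endpoint names, argument order, and return shapes below are assumptions for illustration, not the exact Spaces the gpt-oss demo called.

```python
# Hypothetical two-step chain: text -> image -> video via Hugging Face Spaces.
from gradio_client import Client, handle_file

# 1) Text-to-image with a FLUX.1 Krea Dev Space (Space id / endpoint assumed).
flux = Client("black-forest-labs/FLUX.1-Krea-dev")
result = flux.predict("a yellow Labrador sitting in a sunny garden", api_name="/infer")
image_path = result[0] if isinstance(result, (list, tuple)) else result

# 2) Image-to-video with an LTX Space (Space id / endpoint assumed).
ltx = Client("Lightricks/ltx-video-distilled")
video = ltx.predict(handle_file(image_path),
                    "the Labrador wags its tail and looks at the camera",
                    api_name="/generate")
print(video)
```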

Tired: painting to video. Wired: painting to worlds. This is the closest glimpse we’ve seen of a real-life holodeck. https://x.com/bilawalsidhu/status/1953959597301235943

Here are two ways to create this effect. Option 1: motion track to analyze the camera movement and spatial positioning throughout the shot, capture HDRs of the lighting environment to accurately recreate the illumination conditions, and create a detailed 3D model of the action… https://x.com/c_valenzuelab/status/1955687077825183952

Natural conversation includes interruptions and talking over people, which is hard for an LLM to model as a single autoregressive sequence. I’m sure you can get pretty far by creating a text sequence with movie-script-like breaks mid-sentence, but it seems like the real solution… https://x.com/ID_AA_Carmack/status/1954930438322954532
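
A toy illustration of the “movie-script with mid-sentence breaks” idea Carmack mentions: encode overlapping speech as one autoregressive text sequence, with an explicit marker wherever a speaker gets cut off. The tags below are invented for illustration, not a standard.

```python
# Invented encoding: each turn is (speaker, text, was_interrupted); the "--" and
# [interrupted] markers let one linear text sequence represent talked-over speech.
turns = [
    ("A", "So the way I see it, the real problem is lat--", True),
    ("B", "Latency, right, that's what I keep saying.", False),
    ("A", "--tency, exactly.", False),
]

script = "\n".join(
    f"{speaker}: {text}" + (" [interrupted]" if cut else "")
    for speaker, text, cut in turns
)
print(script)
```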

Introducing Higgsfield Draw-to-Video. RIP Prompts. Turn your sketch into an absolute cinema. Works with all our video models: MiniMax, Veo 3 & Seedance Pro. This is possible ONLY in Higgsfield. Retweet to unlock the full capacity of the best video models in your DMs. https://x.com/higgsfield_ai/status/1955742643704750571

Tencent presents Yan: Foundational Interactive Video Generation. It has been only two months since our release of Self-Forcing, and there are already two world foundation models built on top of it. Chinese teams are building at the speed of light! https://x.com/xunhuang1995/status/1955645976917811411

Runway Aleph can precisely replace, retexture or entirely reimagine specific parts of a video, making it possible to rapidly ideate and iterate new concepts with existing footage. All you need to do is tell Aleph what you want. https://x.com/runwayml/status/1955615613583519917

Matrix-3D does a smart thing: generate a full 360° panorama first, then extend temporally with camera control, then lift it to 3D. It sidesteps multi-view consistency hell and gets you the largest explorable volumes we’ve seen from text-to-3D. And it’s OSS: https://x.com/bilawalsidhu/status/1955646231713337502

The elevators in this hotel are in dire need of maintenance. I really expected more from a 5 star in SF. https://x.com/bilawalsidhu/status/1955267837276099072

Macro-scale 3D reconstructions are so cool. https://x.com/bilawalsidhu/status/1953655989057527865

Genie 3 can basically do one-shot / single-image 3D reconstruction. It turns a 2D painting into an explorable 3D world, and holy crap the fidelity is nuts. No NeRF, no 3D mesh, and it blows any image-to-3D tech I’ve seen out of the water. https://x.com/bilawalsidhu/status/1954166512475906217

Genie 3 is great, but IMO and IOI gold alone don’t impress me one bit. It’s just RL-maxxing. I want to see real-world results. Show me that these results transfer to any other useful task. https://x.com/scaling01/status/1955052735918670246

LightSwitch: Multi-view Relighting with Material-guided Diffusion TL;DR: material-relighting diffusion framework; relights an arbitrary number of input images to a target lighting condition while incorporating cues from inferred intrinsic properties; (1/2) https://x.com/Almorgand/status/1955655723985309967

There have been a lot of crazy many-camera rigs created for the purpose of capturing full spatial video. I recall a conversation at Meta that was basically “we are going to lean in as hard as possible on classic geometric computer vision before looking at machine learning… https://x.com/ID_AA_Carmack/status/1955302165653926058

How to make videos of any length with Grok Imagine. We will make this much easier over the coming weeks and months. https://x.com/elonmusk/status/1955710887094050994

Hailuo 2 Pro is ranked the best video model (without audio)! https://x.com/Hailuo_AI/status/1955453164645429350

“No Pose at All: Self-Supervised Pose-Free 3DGS from Sparse Views” TL;DR: 3DGS with no poses during training or inference; a shared feature-extraction backbone; simultaneous prediction of 3D Gaussian primitives + camera poses in a canonical space from unposed images (one feed-forward step). https://x.com/Almorgand/status/1953480959573037419
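
For a sense of the shape of that idea, here is a rough, hypothetical PyTorch skeleton: one shared backbone over unposed views, with two heads that jointly predict per-pixel Gaussian parameters and a per-view camera pose in a canonical frame, all in a single feed-forward pass. Layer sizes and parameterizations are illustrative, not the paper’s.

```python
# Hypothetical skeleton of a pose-free, feed-forward 3DGS predictor.
import torch
import torch.nn as nn

class PoseFreeSplatNet(nn.Module):
    def __init__(self, feat_dim: int = 64, gaussian_dim: int = 14):
        super().__init__()
        # Shared feature-extraction backbone applied to every input view.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Head 1: per-pixel Gaussian primitives (center, scale, rotation, opacity, color).
        self.gaussian_head = nn.Conv2d(feat_dim, gaussian_dim, 1)
        # Head 2: one camera pose per view in the canonical space (3D translation + 6D rotation).
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 9)
        )

    def forward(self, views: torch.Tensor):
        # views: (batch, num_views, 3, H, W), with no camera poses provided.
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w))
        gaussians = self.gaussian_head(feats).reshape(b, v, -1, h, w)
        poses = self.pose_head(feats).reshape(b, v, 9)
        return gaussians, poses

# Example: two sparse, unposed views of one scene.
gaussians, poses = PoseFreeSplatNet()(torch.randn(1, 2, 3, 64, 64))
```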

🚀 We are thrilled to open-source Hunyuan-GameCraft, a high-dynamic interactive game video generation framework built on HunyuanVideo. It generates playable and physically realistic videos from a single scene image and user action signals, empowering creators and developers to… https://x.com/TencentHunyuan/status/1955839140173631656

A single, generalist control policy that solves diverse tasks at test time, no fine-tuning needed: researchers from UC Berkeley, Stanford, and the RAI Institute have unveiled DiffuseCLoC, a guided diffusion framework for physics-based character look-ahead control. Instead of… https://x.com/IlirAliu_/status/1954084945552404675

Can you edit videos using AI? With @shawnshenjx’s @memoriesai API, it’s definitely possible, and it even helped me win a prize 🦄 Here’s a quick v0 I built over the weekend to make shorts from the clips off my smart glasses; looks promising #buildinpublic https://x.com/KenjiPhang/status/1952533836438589920
