Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of child in warm pastel bedroom viewing arc of TV screens displaying video thumbnails and paused frames, scattered polaroids and film strips on plush rug, small toy camera on tripod, shallow depth of field with soft side lighting, bold text VIDEO at top, cozy uncanny atmosphere with blue-white screen glow contrasting warm peach walls.

Molmo 2 | Complex video question answering – YouTube https://www.youtube.com/watch?v=Ej3Hb3kRiac

Molmo 2 | Counting objects and actions – YouTube https://www.youtube.com/watch?v=fvYfPTTTZ_w

Molmo 2 | Video Tracking – YouTube https://www.youtube.com/watch?v=uot140v_h08

Molmo 2: State-of-the-art video understanding, pointing, and tracking | Ai2 https://allenai.org/blog/molmo2

Photoshop is now inside ChatGPT. Just prompt what you want and get slider-level control to dial in the perfect look. Intelligently select content and apply effects — without opening Photoshop. You’re the conductor. Photoshop is the orchestra. For me, this one’s personal — I’ve… https://x.com/bilawalsidhu/status/1999594990868267227

The shift from text to more dynamic AI experiences https://fidjisimo.substack.com/p/more-dynamic-ai-experiences

Adobe launches free ChatGPT-integrated apps for Photoshop, Acrobat, and Express on desktop, the web, and iOS, after OpenAI added app integrations in October (@zombie_wretch / The Verge) https://x.com/Techmeme/status/1998741032091996348

Edit with Photoshop in ChatGPT | Adobe Blog https://blog.adobe.com/en/publish/2025/12/10/edit-photoshop-chatgpt

Runway unveiled three GWM-1 models that generate video frame-by-frame so scenes stay consistent as the camera moves and can react instantly to user inputs. GWM Worlds makes navigable scenes, GWM Robotics simulates robot viewpoints for planning/data, and GWM Avatars creates… https://x.com/DeepLearningAI/status/2001834874487861352

Explore every world. Tell any story. With Runway Gen-4.5, available now. https://x.com/runwayml/status/2001655929796751371

Gen-4.5 is now available for all Runway plans. Runway Gen-4.5 is the world’s top-rated video model, offering unprecedented visual fidelity and creative control. Make something you couldn’t make before. https://x.com/runwayml/status/1999481621326729530

Kling cooked so hard with this new Motion Control. Don’t like the way your character moves? Take control of it yourself. With this, motion capture from home is activated. It’s fast, it’s cheap, it’s easy, it works! @Kling_ai https://x.com/WuxiaRocks/status/2001517467852771467

Runway Gen-4.5 allows you to generate with unprecedented physical accuracy and visual precision. Meaning vehicles move with realistic weight, momentum and force you can feel in your videos. Available now for all paid plans. https://x.com/runwayml/status/2001352437186334875

Runway Gen-4.5 has unparalleled cinematic realism, controllability and expressiveness. We have rolled out full access, now available for all paid plans. https://x.com/runwayml/status/2000930545782677645

🎥 Kling 2.6 Motion Control Feature Is Now Live! To celebrate the launch of Kling 2.6 Motion Control Feature, we’re kicking off a new contest – and the prizes are one post away from you! 🔥 Show us your creative power with Kling 2.6 Motion Control Feature – The Kling 2.6 Motion… https://x.com/Kling_ai/status/2001891240359632965

🎥 Kling 2.6 Voice Control Feature Is Now Live! To celebrate the launch of Kling 2.6 Voice Control Feature, we’re kicking off a new contest – and the prizes are one post away from you! 🔥 Show us your creative power with Kling 2.6 Voice Control Feature – Use your signature voices… https://x.com/Kling_ai/status/2001198609115628029

🚀 Motion Control, Leveled Up. Newly upgraded Motion Control is now live in Kling VIDEO 2.6! Experience precise, full control over every action & expression ✅ Full-Body Motions — Body movements captured in stunning detail ✅ Fast & Complex Actions — From martial arts to… https://x.com/Kling_ai/status/2001306445262823431

🚨 Kling O1 Video Standard is here on fal! 🎬 Same powerful editing model, 720P mode ✨ Start & end frame control for precision 🎯 3-10 second range for flexible videos 💰 Faster generation, lower cost https://x.com/fal/status/2000590369545744599
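For anyone who wants to drive the start/end-frame control from code, here is a minimal sketch against fal's Python client. The endpoint id, argument names, and response shape below are assumptions for illustration, not fal's documented schema; check the model page on fal for the real contract.

```python
# Hypothetical sketch of calling Kling O1 Standard on fal with start/end
# frame control. Endpoint id, argument names, and response shape are
# assumptions, not fal's documented schema.
import fal_client

result = fal_client.subscribe(
    "fal-ai/kling-video/o1/standard",  # assumed endpoint id
    arguments={
        "prompt": "slow dolly-in on a rain-soaked neon street",
        "start_image_url": "https://example.com/first_frame.png",
        "end_image_url": "https://example.com/last_frame.png",
        "duration": 5,          # seconds, within the announced 3-10s range
        "resolution": "720p",   # per the announced 720P mode
    },
)
print(result["video"]["url"])   # assumed response shape
```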

🚨 Video Leaderboard Updates: Kling 2.6 Pro by @kling_AI and the new Kandinsky 5.0 open models by @kandinskylab have now landed on the Video Arena leaderboard. Kling 2.6 Pro delivers a major 16-point jump over Kling-2.5-turbo-1080p. While Kandinsky 5.0 enters strong, taking the… https://x.com/arena/status/1999530939886768205

A new prompt unlock? Multiple gliding rack-focus shots through a cyberpunk nightclub. Yes, the characters in close-up are prompted; prompt shared in a later post. Not keyframes. Created in @Kling_ai 2.6 Image to Video. 🔊🔊🎧 https://x.com/StevieMac03/status/2002001196383391813

Do you want to create ultra-dynamic action animations with @Kling_ai 2.6? 🎬⚡️ After testing many prompts, I’ve noticed what works best. And here’s the key. 👉 What usually gives the best results is starting the prompt with “High-speed anime battle.” Other combinations that… https://x.com/Artedeingenio/status/2001960379610767835

Kling 2.6 “Motion Control” tested on dance videos: full-body steps and weight shifts look natural, and hair tracking performs impressively. My impression is that dance and action content like this is where the feature plays to its strengths. ✨ https://x.com/genel_ai/status/2001532885673873677

Oh my… Kling just dropped the next era of motion control. Kling VIDEO 2.6 can copy any action with perfect lip-sync, lifelike motion and expressive gesture. It outperforms Wan 2.2-Animate, Act-Two and DreamActor 1.5 across all metrics. More examples below. https://x.com/AngryTomtweets/status/2001569619375698199

Quick test of Kling 2.6 Motion Control Shall I keep going? 😭 https://x.com/blizaine/status/2001849003819098168

Your frames. Your timing. Kling VIDEO O1 now supports Start & End Frames generation with freely selectable durations from 3-10s, giving you smoother transitions and more control over pacing. From fast, high-impact moments to fully immersive cinematic shots, your story moves the… https://x.com/Kling_ai/status/2000581619556421673

Ad: Pretty cool to vibe code games using YouTube Playables Builder. One of my top VFX/360 videos is now a retro shooter game – stock up on burgers for your intergalactic overlords while dodging a horde of farmers who really want their cows back. https://x.com/bilawalsidhu/status/2001025884778848611

Another banger paper from Apple. View synthesis from a single image is impressive. But most methods are extremely slow. The default approach to high-quality novel view synthesis uses diffusion models. Iterative denoising produces compelling results, but latency can stretch into… https://x.com/omarsar0/status/2000989377883988311

Apple just released Sharp: Sharp Monocular View Synthesis in Less Than a Second https://x.com/_akhaliq/status/2000587447680340257

Apple presents One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation https://x.com/_akhaliq/status/1999516539351883823

EgoX is really cool – generating immersive first-person view video from any third-person footage. If you can do this vision task well, you get endless egocentric training data for robotics. https://x.com/bilawalsidhu/status/2000642584763335055

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling https://x.com/_akhaliq/status/2001286164469227555

“Gaussian See, Gaussian Do: 3D Semantic Motion Transfer” TL;DR: extracts the semantic motion from a multi-view source video and applies it to a static target shape in a way that is semantically meaningful. https://x.com/Almorgand/status/2001345313999852018

“StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space” TL;DR: diffusion-based stereo synthesis through viewpoint conditioning, without explicit depth or warping; a canonical rectified space and the conditioning guide the generator to infer correspondences. (1/3) https://x.com/Almorgand/status/2000602000866619569

“MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos” TL;DR: three learnable modules plus a lightweight IK stage: a Reference Prompt Encoder that distills per-joint queries from the asset’s skeleton, mesh, and rendered image set;… (1/4) https://x.com/Almorgand/status/1999530563607122271

Word of the Year 2025 | Slop | Merriam-Webster https://www.merriam-webster.com/wordplay/word-of-the-year

MiniMax (Hailuo) Video Team Has Open Sourced VTP (Visual Tokenizer Pre-training)! VTP is a scalable pre-training framework for visual tokenizers, built for next-gen generative models. It challenges the conventional belief in Latent Diffusion Models that scaling the stage-1… https://x.com/MiniMax_AI/status/2000935213506171197

Holy fuck guys we’re not “pushing hard” for or replacing concept artists with AI. We have a team of 72 artists of which 23 are concept artists and we are hiring more. The art they create is original and I’m very proud of what they do. I was asked explicitly about concept art. https://x.com/LarAtLarian/status/2001011042642505833

Introducing Wan2.6 – A native multimodal model that turns your ideas into breathtaking videos and images! · Starring: Cast characters from reference videos into new scenes. Support human or human-like figures, enabling complex multi-person and human-object interactions with… https://x.com/Alibaba_Wan/status/2000930078037827972?s=20

Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥–and leads many open models on challenging industry video benchmarks. 🧵 https://x.com/allen_ai/status/2000962068774588536

Molmo 2 sets new SOTA in image and video tasks among open models 🔥 > comes in 3 sizes, based on SigLIP2 + Qwen3 > separate 4B model for video pointing/counting (SOTA!) > 💗 Apache 2.0 licensed 💗 > image + video datasets are out as well! https://x.com/mervenoyann/status/2000965892230815756
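Molmo 1 shipped as trust_remote_code checkpoints on Hugging Face, so a Molmo 2 loading sketch presumably looks similar. Everything below (the repo id and the processor call) is an assumption following Molmo 1's published pattern; consult the allenai model cards for actual usage.

```python
# Hypothetical Molmo 2 loading sketch, modeled on Molmo 1's published
# usage. The repo id and the processor.process() call are assumptions.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

REPO = "allenai/Molmo-2-7B"  # assumed repo id; check the allenai org on HF

processor = AutoProcessor.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(REPO, trust_remote_code=True)

image = Image.open(requests.get("https://example.com/frame.jpg", stream=True).raw)
# Molmo 1-style processing call; Molmo 2's video API may differ.
inputs = processor.process(images=[image], text="Point to every cat.")
```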

Multimodal serving pain: vision encoder work can stall text prefill/decode and make tail latency jittery. We built Encoder Disaggregation (EPD) in vLLM: run the encoder as a separate scalable service, pipeline it with prefill/decode, and reuse image embeddings via caching. This… https://x.com/vllm_project/status/2000535421642502335
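The idea generalizes beyond vLLM: hash each image, reuse cached vision embeddings, and run the encoder concurrently with text-side work so vision no longer stalls prefill. A toy illustration of that pattern follows; it is a conceptual sketch, not vLLM's actual EPD implementation or API.

```python
# Toy sketch of encoder disaggregation: a content-addressed embedding
# cache plus an encoder task that runs concurrently with text prefill.
import asyncio
import hashlib

EMBED_CACHE: dict[str, list[float]] = {}

async def encode_image(image_bytes: bytes) -> list[float]:
    key = hashlib.sha256(image_bytes).hexdigest()
    if key in EMBED_CACHE:             # cache hit: skip the encoder entirely
        return EMBED_CACHE[key]
    await asyncio.sleep(0.05)          # stand-in for a remote encoder-service call
    EMBED_CACHE[key] = [0.0] * 1024    # stand-in embedding
    return EMBED_CACHE[key]

async def handle_request(image_bytes: bytes, prompt: str) -> None:
    # Encoding runs as its own task, so text prefill is not blocked behind it.
    embed_task = asyncio.create_task(encode_image(image_bytes))
    prefill_tokens = prompt.split()    # stand-in for tokenization/prefill work
    embedding = await embed_task       # join only when decode needs vision tokens
    print(f"prefilled {len(prefill_tokens)} tokens; embedding dim {len(embedding)}")

asyncio.run(handle_request(b"...", "Describe the image."))
```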

TurboDiffusion: Accelerating Video Diffusion Models by 100-205 Times https://x.com/_akhaliq/status/2001342606450774299

Vision Bridge Transformer (ViBT) – a large-scale model based on Brownian Bridge Models for conditional generation. It’s a new kind of model that learns direct data-to-data trajectories for fast, high-quality image/video editing and stylization. We scale ViBT to 1.3B and 20B… https://x.com/TheTuringPost/status/2000313966648844447
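The "data-to-data trajectories" framing comes from the Brownian bridge itself: a process pinned to a source sample at t=0 and a target sample at t=1, with noise variance proportional to t(1-t) so it vanishes at both endpoints. A minimal sketch of that forward process, using an illustrative parameterization rather than necessarily the paper's:

```python
# Minimal Brownian-bridge forward process: interpolate source -> target
# with endpoint-pinned noise. Illustrative parameterization only.
import numpy as np

def brownian_bridge_sample(x0: np.ndarray, x1: np.ndarray, t: float,
                           sigma: float = 1.0) -> np.ndarray:
    """Sample x_t on the bridge between data points x0 (t=0) and x1 (t=1)."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))  # zero noise at both endpoints
    return mean + std * np.random.randn(*x0.shape)

src, tgt = np.zeros(4), np.ones(4)
print(brownian_bridge_sample(src, tgt, t=0.5))
```

A model trained to reverse this process travels directly between data distributions (e.g. unedited frame to edited frame) instead of starting from pure noise, which is the intuition behind the speed claim.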

Robots learning from human videos used to be a hard research problem. It turns out scale changes that. A new result from @physical_int shows an emergent property of large VLAs like π0.5. As pre-training scales, the model naturally aligns human egocentric video and robot data… https://x.com/IlirAliu_/status/2001216734850646410

“Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time” TL;DR: a self-attention encoder transforms the input video into a latent Global Scene Representation; the decoder can query the 3D position P of any given 2D point (u, v) from the source timestep at a target timestep. (1/2) https://x.com/Almorgand/status/1999138551972221358
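In interface terms, the appeal is paying the encoder cost once per video and then issuing cheap point queries against the latent. A hypothetical sketch of that query pattern, with every name, shape, and internal invented for illustration:

```python
# Hypothetical D4RT-style interface: encode a video once, then query the
# 3D position of any 2D pixel from a source frame at a target time.
# All names, shapes, and internals here are illustrative stand-ins.
import numpy as np

class D4RTSketch:
    def encode(self, video: np.ndarray) -> np.ndarray:
        """(T, H, W, 3) video -> latent Global Scene Representation."""
        return video.mean(axis=(1, 2))   # stand-in for the self-attention encoder

    def query(self, gsr: np.ndarray, u: float, v: float,
              t_src: int, t_tgt: int) -> np.ndarray:
        """3D position P of pixel (u, v) from frame t_src, at time t_tgt."""
        return np.zeros(3)               # stand-in for the decoder

model = D4RTSketch()
gsr = model.encode(np.random.rand(8, 64, 64, 3))
print(model.query(gsr, u=0.5, v=0.5, t_src=0, t_tgt=7))
```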

🎉 Amazing work from @Winterice10 and the team on fast video generation ⚡️! We’re excited about the upcoming collaboration to integrate TurboDiffusion into vLLM-Omni. Check it out! https://x.com/vllm_project/status/2000720345872130413

TurboDiffusion: 100-205× faster video generation on a single RTX 5090 🚀 Only takes 1.8s to generate a high-quality 5-second video. The key to both high speed and high quality? 😍 SageAttention + Sparse-Linear Attention (SLA) + rCM. Github:… https://x.com/Jintao_Zhang_/status/2000709961370767771
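A headline multiplier like 100-205× plausibly decomposes multiplicatively: step-count reduction from few-step distillation (rCM) compounds with per-step attention speedups (SageAttention, SLA). The factor values below are illustrative assumptions, not the project's reported breakdown, and the arithmetic ignores non-attention compute:

```python
# Back-of-envelope sketch of how stacked accelerations multiply.
# Factor values are illustrative assumptions, not reported numbers.
step_reduction = 50 / 4      # e.g. few-step consistency sampling: 50 steps -> 4
attention_speedup = 3.0      # e.g. quantized attention kernels (SageAttention)
sparsity_speedup = 4.0       # e.g. sparse-linear attention (SLA)

total = step_reduction * attention_speedup * sparsity_speedup
print(f"combined speedup ~ {total:.0f}x")  # 12.5 * 3 * 4 = 150x, inside 100-205x
```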

Update: I still remain obsessed with these long sequence shots from Gen-4.5. https://x.com/c_valenzuelab/status/2002047619640799504

🚀🚀🚀 Introducing HY World 1.5 (WorldPlay)! We have now open-sourced the most systemized, comprehensive real-time world model framework in the industry. In HY World 1.5, we develop WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling… https://x.com/TencentHunyuan/status/2001170499133653006

On Kling 2.6 (@Kling_ai) Motion Control: the biggest appeal of v2v is getting a character to give a performance AI can’t reproduce on its own. As a real example, I swallowed my pride and recreated one myself, so please take a look. Movement like this just isn’t possible with prompts. https://x.com/onofumi_AI/status/2001840428250022087

Introducing Veo Robotics! In this work, we show that an action-conditioned video model can be used as a general robot simulator for evaluation, safety, etc. https://x.com/SeanKirmani/status/1999528692448657687
