Images: AI News Week Ending 12/19/2025

Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of a 7-year-old child from side angle sitting on plush cream rug in warm-lit bedroom, surrounded by scattered crayon drawings and art books, facing panoramic arc of glowing TV screens displaying surreal AI-generated imagery and dreamlike portraits, with child’s own drawings pinned to walls alongside printed AI art, small easel with crayons visible, warm peach and lavender tones contrasting with blue-white screen glow, shallow depth of field, tender intimate lighting, bold text ‘IMAGES’ at top of frame.

BREAKING: OpenAI releases “GPT-Image-1.5” (ChatGPT Images) & It instantly takes the #1 Spot on LMArena, beating Google’s Nano Banana Pro. : r/singularity https://www.reddit.com/r/singularity/comments/1po98xo/breaking_openai_releases_gptimage15_chatgpt/

FLUX.2 [max] – Top-Tier Quality Image Generation | Black Forest Labs https://bfl.ai/models/flux-2-max

FLUX.2 [max] is here Our highest quality model to date. * Grounded generation – searches the web for real-time context. * Up to 10 reference images. Products, characters, styles stay consistent. * #2 on @ArtificialAnlys in text-to-image and image editing. https://x.com/bfl_ml/status/2000945755125899427

Molmo 2 | Complex video question answering – YouTube https://www.youtube.com/watch?v=Ej3Hb3kRiac

Molmo 2 | Counting objects and actions – YouTube https://www.youtube.com/watch?v=fvYfPTTTZ_w

Molmo 2 | Video Tracking – YouTube https://www.youtube.com/watch?v=uot140v_h08

Molmo 2: State-of-the-art video understanding, pointing, and tracking | Ai2 https://allenai.org/blog/molmo2

Images 1.5 launches today in ChatGPT and the API! Much better images in tons of ways, faster, and new editing capability. https://x.com/sama/status/2000997906078388332

GPT Image 1.5 achieves both #1 in Text to Image and Image Editing in the Artificial Analysis Image Arena, surpassing Nano Banana Pro GPT Image 1.5 is OpenAI’s newest flagship image generation model, demonstrating improved image quality and prompt fidelity relative to earlier https://x.com/ArtificialAnlys/status/2001016199094948185

GPT Image 1.5 is now available in the API: ✏️ More precise image editing and preservation of logos & faces 🎯 Better instruction following and adherence to prompts 🔤 Improved text rendering, particularly for denser and smaller text Learn more in docs: https://x.com/OpenAIDevs/status/2000992413402456485

Grace Li (@grx_xce): “This is the biggest jump in Image Arena that we’ve seen since Nano Banana. GPT-Image-1.5 has taken #1 on Image Arena with a significant lead. Huge congratulations to the team at @OpenAI for this achievement!” https://xcancel.com/grx_xce/status/2000993261914350070?s=20

Introducing ChatGPT Images, powered by our flagship new image generation model. – Stronger instruction following – Precise editing – Detail preservation – 4x faster than before Rolling out today in ChatGPT for all users, and in the API as GPT Image 1.5. https://x.com/OpenAI/status/2000990989629161873

The Image Arena is buzzing 👀 @OpenAI’s GPT-image-1.5 is live and already shaking up the leaderboard. Watch it in action below, then try your own prompt and share what you create 👇🎨 https://x.com/arena/status/2001014708254773549

The new ChatGPT Images is here | OpenAI https://openai.com/index/new-chatgpt-images-is-here/

Photoshop is now inside ChatGPT. Just prompt what you want and get slider-level control to dial in the perfect look. Intelligently select content and apply effects — without opening Photoshop. You’re the conductor. Photoshop is the orchestra. For me, this one’s personal — I’ve https://x.com/bilawalsidhu/status/1999594990868267227

The shift from text to more dynamic AI experiences https://fidjisimo.substack.com/p/more-dynamic-ai-experiences

Adobe launches free ChatGPT-integrated apps for Photoshop, Acrobat, and Express on desktop, the web, and iOS, after OpenAI added app integrations in October (@zombie_wretch / The Verge) https://x.com/Techmeme/status/1998741032091996348

Edit with Photoshop in ChatGPT | Adobe Blog https://blog.adobe.com/en/publish/2025/12/10/edit-photoshop-chatgpt

🎨 Qwen-Image-Layered is LIVE — native image decomposition, fully open-sourced! ✨ Why it stands out ✅ Photoshop-grade layering Physically isolated RGBA layers with true native editability ✅ Prompt-controlled structure Explicitly specify 3-10 layers — from coarse layouts to https://x.com/Alibaba_Qwen/status/2002034611229229388

🚨 Qwen Image Layered is live on fal! ✨ Photoshop-grade layering – Native Decomposition 👑 Physically isolated RGBA layers with true native editability 🎨 Explicitly specify layers, from coarse layouts to fine-grained details https://x.com/fal/status/2002055913390195137

Another banger paper from Apple. View synthesis from a single image is impressive. But most methods are extremely slow. The default approach to high-quality novel view synthesis uses diffusion models. Iterative denoising produces compelling results, but latency can stretch into https://x.com/omarsar0/status/2000989377883988311

Apple just released Sharp: Sharp Monocular View Synthesis in Less Than a Second https://x.com/_akhaliq/status/2000587447680340257

Apple presents One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation https://x.com/_akhaliq/status/1999516539351883823

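The dawn of the V2V era! 📢 From the era of instructing with text to the era of instructing with motion. I tried Kling AI’s motion control feature. This feature was already available in version 1.6, but starting with this release it can be used with the latest model, 2.6. https://x.com/seiiiiiiiiiiru/status/2001502678116110430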

EgoX is really cool – generating immersive first-person view video from any third-person footage. If you can do this vision task well, you get endless egocentric training data for robotics. https://x.com/bilawalsidhu/status/2000642584763335055

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling https://x.com/_akhaliq/status/2001286164469227555

“StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space” TLDR: diffusion-based; through viewpoint conditioning, w/o explicit depth or warping. canonical rectified space and conditioning guide generator to infer correspondences (1/3) https://x.com/Almorgand/status/2000602000866619569

Word of the Year 2025 | Slop | Merriam-Webster https://www.merriam-webster.com/wordplay/word-of-the-year

few understand that the image on the left has a lower resolution by like 10^21 times https://x.com/scaling01/status/2001226337546101146

🖼️🚨 Image Leaderboard Update Competition in the Arena continues to drive leaderboard movement, with Flux-2-Max making a competitive debut. 🔹 #3 on Text-to-Image (1167) 🔹 #7 on Image Edit (1247) The Text-to-Image leaderboard tightens as Flux-2-Max slots ahead of https://x.com/arena/status/2000947088738431408

🖼️🚨 Image Leaderboard Update Competition in the Arena continues to shake up the leaderboards. Flux-2-Dev lands on the board with solid early results. 🔹 #7 on Text-to-Image (1149) 🔹 #8 on Image Edit (1240) Margins remain slim on the Text-to-Image leaderboard, where https://x.com/arena/status/1999560495867793881

🚨 FLUX.2 [max] live on fal! ✨ Black Forest Labs’ top-tier: quality + edit consistency 🎯 Better than FLUX.2 [pro], easier prompting 🎨 Consistent edits: characters, objects, styles, backgrounds 💡 Most creative FLUX model: same prompt, varied outputs that still follow https://x.com/fal/status/2000945229977829784

🚀 The GeoAI QGIS Plugin is here 🔥 You can run Moondream vision-language models, object detection, image segmentation (SAM 3), and even train your own geospatial segmentation model end-to-end. Website: https://x.com/giswqs/status/1999536028282179721

Also a very fun way to use it to easily get fun images in ChatGPT: https://x.com/sama/status/2000998310082195496

The new GPT Image 1.5 is ~ Nano Banana Pro level in my tests. Prompt: Combine the two men (Sam & Ilya) and the dog in a 2000s film camera-style photo of them looking bored at a kids birthday party. left: OpenAI Image 1.5 right: Google Nano Banana Pro (looks more like Ilya, so I https://x.com/Yuchenj_UW/status/2000997359036326290

GPT Image 1.5’s IQ is far behind Nano Banana Pro. It fails the math problem here (left: GPT, right: 🍌), also other math/physics/maze problems. Nano Banana Pro is a multimodal built on Gemini 3 Pro. I suspect GPT Image 1.5 is still stuck on the older GPT-4o architecture. https://x.com/Yuchenj_UW/status/2001023040763920870

GPT-5.2 below Opus 4.5 and Gemini 3 Pro on LiveBench https://x.com/scaling01/status/1999323401421488319

GPT-5.2 scores 152 on the Epoch Capabilities Index (ECI), our tool for aggregating benchmark scores. This puts it second only to Gemini 3 Pro. 🧵 with individual scores. https://x.com/EpochAIResearch/status/1999548496198926728

GPT-5.2 xhigh doing better than Gemini 3 Pro on MRCR long context eval https://x.com/scaling01/status/1999327512401527107

Autoregressive generation can be seen as a special case of block diffusion where the block size is just one token. @PKU1898 and @huaweitechnolgy presented a gradual way for this autoregressive (AR) → block-diffusion transition: To make it work, they: – Use an attention pattern https://x.com/TheTuringPost/status/2001697220387913818
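The block-size-one claim can be checked on the attention mask alone. A minimal sketch, assuming the common block-causal pattern (bidirectional attention inside a block, causal attention across blocks); `block_causal_mask` and `causal_mask` are hypothetical helpers, and this is not the specific attention pattern from the PKU/Huawei work, whose description is cut off in the post. It covers only the masking side of the equivalence, not the denoising objective or sampling loop.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Positions attend bidirectionally inside their own block and causally
    to every earlier block (a block-causal pattern).
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def causal_mask(seq_len: int) -> list[list[bool]]:
    """Standard autoregressive mask: position i sees positions 0..i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# With block_size=1 every token is its own block, so the two masks coincide.
print(block_causal_mask(6, 1) == causal_mask(6))  # True

# With block_size=2, tokens in the same block can see each other.
for row in block_causal_mask(4, 2):
    print([int(x) for x in row])
```

Larger blocks let several tokens be denoised together with full context inside the block; shrinking the block to one token removes that freedom and leaves plain next-token prediction.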

MiniMax (Hailuo) Video Team Has Open Sourced VTP (Visual Tokenizer Pre-training)! VTP is a scalable pre-training framework for visual tokenizers, built for next-gen generative models. It challenges the conventional belief in Latent Diffusion Models that scaling the stage-1 https://x.com/MiniMax_AI/status/2000935213506171197

Holy fuck guys we’re not “pushing hard” for or replacing concept artists with AI. We have a team of 72 artists of which 23 are concept artists and we are hiring more. The art they create is original and I’m very proud of what they do. I was asked explicitly about concept art https://x.com/LarAtLarian/status/2001011042642505833

Introducing Wan2.6 – A native multimodal model that turns your ideas into breathtaking videos and images! · Starring: Cast characters from reference videos into new scenes. Support human or human-like figures, enabling complex multi-person and human-object interactions with https://x.com/Alibaba_Wan/status/2000930078037827972?s=20

Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥–and leads many open models on challenging industry video benchmarks. 🧵 https://x.com/allen_ai/status/2000962068774588536

Molmo 2 sets new sota in image and video tasks in open models 🔥 > comes in 3 sizes, based on SigLIP2 + Qwen3 > separate 4B model for video pointing/counting (sota!) > 💗 Apache 2.0 licensed 💗 > image + video datasets are out as well! https://x.com/mervenoyann/status/2000965892230815756

Multimodal serving pain: vision encoder work can stall text prefill/decode and make tail latency jittery. We built Encoder Disaggregation (EPD) in vLLM: run the encoder as a separate scalable service, pipeline it with prefill/decode, and reuse image embeddings via caching. This https://x.com/vllm_project/status/2000535421642502335
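The “reuse image embeddings via caching” part of the vLLM post can be illustrated with a content-addressed cache in front of the encoder. A minimal sketch, assuming embeddings are keyed by a SHA-256 hash of the raw image bytes and evicted LRU-style; `EmbeddingCache` and `fake_encoder` are hypothetical names, not vLLM’s implementation or API, and the sketch does not model running the encoder as a separate service or pipelining it with prefill/decode.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical LRU cache for vision-encoder outputs keyed by image content."""

    def __init__(self, encode_fn: Callable[[bytes], List[float]], capacity: int = 1024):
        self.encode_fn = encode_fn  # stand-in for a (possibly remote) encoder
        self.capacity = capacity
        self._store = OrderedDict()  # content hash -> embedding, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes) -> List[float]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        embedding = self.encode_fn(image_bytes)  # encoder runs only on a miss
        self._store[key] = embedding
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used entry
        return embedding


encoder_calls = []


def fake_encoder(image_bytes: bytes) -> List[float]:
    """Hypothetical stand-in for a vision encoder."""
    encoder_calls.append(image_bytes)
    return [float(len(image_bytes))]


cache = EmbeddingCache(fake_encoder, capacity=2)
cache.get(b"same image")
cache.get(b"same image")  # served from the cache; encoder not called again
print(cache.hits, cache.misses, len(encoder_calls))  # 1 1 1
```

Keying on content rather than on a request ID means the same image sent in different conversations still hits the cache.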

TurboDiffusion: Accelerating Video Diffusion Models by 100-205 Times https://x.com/_akhaliq/status/2001342606450774299

Vision Bridge Transformer (ViBT) – a large-scale model based on Brownian Bridge Models for conditional generation. It’s a new kind of model that learns direct data-to-data trajectories for fast, high-quality image/video editing and stylization. We scale ViBT to 1.3B and 20B https://x.com/TheTuringPost/status/2000313966648844447

🚨BREAKING: Image Arena Shakeup @OpenAI’s gpt-image-1.5 and chatgpt-image-latest are now available in the Arena. 🥇gpt-image-1.5 is #1 in Text-to-Image (1264) 🥇chatgpt-image-latest is #1 on Image Edit (1409) 🔹gpt-image-1.5 #4 in Image Edit (1395) gpt-image-1.5 holds a https://x.com/arena/status/2001008010399994026
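Arena scores are easier to read as head-to-head preference rates. A minimal sketch, assuming the Arena’s scores sit on the conventional Elo scale (base 10, 400-point spread); `expected_win_rate` is a hypothetical helper, not an Arena API. It uses the two Image Edit scores from the post above.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B, assuming an
    Elo-style scale (base 10, 400-point spread)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Image Edit: chatgpt-image-latest (1409) vs gpt-image-1.5 (1395).
p = expected_win_rate(1409, 1395)
print(f"{p:.3f}")  # 0.520
```

Under that assumption, the 14-point gap between the two OpenAI entries on Image Edit corresponds to roughly a 52/48 split in pairwise votes, so the two are close.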

ChatGPT Images, a new model and product experience for imagegen: https://x.com/gdb/status/2001035840596869228

Had early access to the new ChatGPT Image 1.5. It is good, especially at single images, but I don’t think it works as well for complex slides/graphics/information as Nano Banana Pro (though GPT-5.2 Thinking seems to do better than GPT-5.2 instant, so maybe more planning helps?) https://x.com/emollick/status/2000994111541928045

I tried something fun that worked better with ChatGPT Image Generator 1.5 than Nano Banana Pro: “Point-and-Click adventure game me. You are the parser, make images as the output and take in commands. Make the world super interesting. Keep track of inventory, state, etc.” https://x.com/emollick/status/2001047645784002822

Our new imagegen model and creation tool in ChatGPT are live! Having a blast with it. More here: https://x.com/fidjissimo/status/2000990080840949955

Compute enabled our first image generation launch (and a +32% jump in WAU over the following weeks) as well as our latest image generation launch yesterday. We have a lot more coming… and need a lot more compute. https://x.com/OpenAI/status/2001336514786017417

@OpenAI first 2 images: chatgpt new image model last 2 images: nano banana pro i used the same prompt (down below) it’s so over for openAI https://x.com/fumonzi/status/2000993574150922351

ITS LIVE photoshop-grade layering physically isolated RGBA layers with native editability 🤯 https://x.com/linoy_tsaban/status/2002038877511377393

🚨 New Open Models @Zai_org’s GLM-4.6V and GLM-4.6V-Flash are now available in the Arena. The latest open-source releases add native function calling, larger context windows, and improved coding and reasoning, marking the next step in the GLM vision model lineup. Try them https://x.com/arena/status/2000610761371267350