Image created with gemini-3.1-flash-image-preview and claude-opus-4.7. Image prompt: High-end product photograph of a tall layered ice cream parfait whose stacked strata each resemble a different scenic photograph — sky-blue cream with whipped cloud peaks, strawberry-sauce sunset, chocolate mountain ridge, vanilla portrait swirl on top — wrapped in a crisp white paper sleeve with bold red retro lettering reading ‘IMAGES’, a small red plastic spoon beside it engraved ’75 — Milford, DE’, soft directional studio light, shallow depth of field, glossy macro detail, landscape composition.

Google prepares credit system for Gemini and new image tools
https://www.testingcatalog.com/google-prepares-credit-system-for-gemini-and-new-image-tools-2/

"a gallery of shoes, where each shoe is under a painting & is styled matched to that painting: Starry Night, The Bathers, The Girl with the Pearl Earring, The Bayeux Tapestry, Klint’s Grupp Svanen nr 17, Kandinsky’s Swinging, The Garden of Earthly Delights" "now the full
https://x.com/emollick/status/2047162748513984570

This is a useful image for thinking about the curve we are on and what likely comes next in an intuitively understandable way.
https://x.com/emollick/status/2048126759648862571

@NVIDIA Nemotron 3 Nano Omni is now on Together AI. Enterprise multimodal AI — video, audio, image, documents & text — optimized for speed and scale. ✅ ~3B active params, 9x higher throughput ✅ Fully managed, zero infra headache ✅ Secure, zero-trust architecture Build
https://x.com/togethercompute/status/2049160446708711883

Excited to support @NVIDIA Nemotron 3 Nano Omni, now available on Fireworks. It’s the first open model that handles vision, audio, video, and text in a single inference loop. Built for multimodal sub-agents at scale, with 9× higher throughput than Qwen3 30B. 256K context. Now
https://x.com/FireworksAI_HQ/status/2049159136802398546

Introducing @NVIDIA Nemotron 3 Nano Omni. NVIDIA Nemotron 3 Nano Omni is an open multimodal foundation model that unifies audio, images, text, and video into a single context window. It powers subagents for use cases like computer-use agent, document intelligence, and video and
https://x.com/baseten/status/2049160818575749300

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents
https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇
https://x.com/NVIDIAAI/status/2049159441870717428

NVIDIA Nemotron 3 Nano Omni is now live on fal, available at launch. A single model for multimodal agents: 🔁 text, image, video, audio in one loop 🧠 1 context reasoning across complex workflows ⚡️ ~9× higher throughput with fewer inference hops Built for real-world agent
https://x.com/fal/status/2049160999442198632

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning.
https://x.com/OpenRouter/status/2049164366218772526
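Since the model is listed on OpenRouter, it should be reachable through OpenRouter's OpenAI-compatible chat endpoint. Below is a minimal sketch of assembling a multimodal request (text + image in, text out); the model slug `nvidia/nemotron-3-nano-omni` is an assumption — verify the exact identifier on openrouter.ai before use.

```python
# Sketch: multimodal request to an OpenRouter-hosted model.
# Assumptions: the model slug below is hypothetical; the endpoint and
# message shape follow OpenRouter's OpenAI-compatible chat API.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "nvidia/nemotron-3-nano-omni"  # hypothetical slug — check openrouter.ai


def build_payload(text: str, image_url: str) -> dict:
    """One user turn mixing text and an image (multimodal in -> text out)."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }


def send(payload: dict, api_key: str) -> bytes:
    """POST the payload with a bearer token and return the raw response."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


payload = build_payload("Describe this chart.", "https://example.com/chart.png")
```

With a 256K context window, the same message list could carry long documents or many video frames in one turn, which is the "single inference loop" the announcements emphasize.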

NVIDIA releases Nemotron-3-Nano-Omni, a new 30B open multimodal MoE model. Nemotron-3-Nano-Omni-30B-A3B is the strongest omni model for its size and supports audio, video, image and text. Run on ~25GB RAM. GGUF:
https://t.co/t4COCqVrLS Guide:
https://x.com/UnslothAI/status/2049161390150365344

GPT-5.5 + GPT-Image-2 is becoming one of the best combos for building apps! @dkundel breaks down why it works so well. We built those learnings into the Build Web Apps plugin, so Codex can handle the design-to-app loop for you. 👌
https://x.com/romainhuet/status/2049597180474970179

Age-worn, damaged images can now turn into 4K with just one prompt using ChatGPT, and it’s free 💸 Made on ChatGPT with GPT Image 2. Prompt: 👇🏻
https://x.com/doctorwasif/status/2048014890028486904

ChatGPT Images 2.0 (Pro) creates a photo of a Rubik’s Cube–albeit in a very simple state–resting on a mirror. This is a surprisingly hard task. All other variants I’ve seen posted on here have invalid color states, and adding even a second move to this prompt will make it fail.
https://x.com/goodside/status/2047728776520298646

ChatGPT Images 2.0 can now generate really cool UI for your apps/games with TRANSPARENCY! Previously my biggest concern with the new images model was that it could not add transparency – but the ChatGPT team listened and brought this back 😀 Experimenting more, stay tuned!
https://x.com/anulagarwal/status/2048661392472096960?s=20

ChatGPT Images 2.0 explains “Tenet” in a simple way!
https://x.com/umesh_ai/status/2048050643001282571

GPT 2 is totally insane… 🙀⚡️ I asked for a prehistoric predator and it built an entire museum around it. This is not just an image. It feels like discovering history.🤯 Prompt Drop ⤵️
https://x.com/Preda2005/status/2047556362755018960

GPT Image 2 is also great for summarizing books or scientific essays through highly visual, detailed infographics. Here I asked it for an infographic on On the Origin of Species by Charles Darwin.
https://x.com/Artedeingenio/status/2047773399447929039

GPT Image 2 on ChatGPT Prompt Create a visually rich infographic about an endangered animal. Start by finding one online, research its habitat, diet, and unique traits. Present information through annotated visuals and structured callouts, not generic sections. Style it like a
https://x.com/harboriis/status/2047704250327920716

I was curious how much the new ChatGPT image model would vary in its outputs given the same detailed prompt to make a math explainer infographic. The result: quite a bit! If it’s something important to you, try generating it a couple times, even if the first one looks great.
https://x.com/doodlestein/status/2048428001281388961

This horse-riding astronaut is a milestone on AI’s long road towards understanding | MIT Technology Review
https://www.technologyreview.com/2022/04/06/1049061/dalle-openai-gpt3-ai-agi-multimodal-image-generation/

Yeah okay, Lego bros, brodettes and brotheys are cooked with this one. GPT 2 Image can create full Lego sets! With actual Bricklink IDs so you can order the parts and build it. Whole new business opportunity here for the taking.
https://x.com/dennisonbertram/status/2048413815675539816?s=46

You can try this: Turn any photo into a beautiful woodcut/linocut style, GPT Image-2 does a great job with details, expressions. Perfect thing for the profile picture or the family photo. Or why not a gift? Try for yourself, full prompt below ⤵️
https://x.com/LinusEkenstam/status/2047945401387397317

i just asked @heyglif use GPT Image 2 and Seedance 2.0 to create Elegant but chaotic Grandma wearing a pearl necklace over her yoga outfit is trying tree pose on the shiny silver hood of a vintage white 1980s Rolls-Royce Silver Shadow parked outside a fancy country club. Her
https://x.com/awesome_visuals/status/2047609881104953658

GPT Image 2 x Seedance 2.0 x Magnific It’s crazy how you can turn a shower thought into a realistic cinematic clip! The workflow I used is below: ⬇️
https://x.com/_OAK200/status/2047616640448078167

"FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation" TL;DR: generates high-quality animatable 3D Gaussian head avatars from few images using a feed-forward transformer and lightweight deformation network
https://x.com/Almorgand/status/2047339475345281341

"Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction" TL;DR: uses vision transformers to predict per-pixel geometry and fit a 3D morphable face model from a single image, achieving strong accuracy across poses and expressions
https://x.com/Almorgand/status/2048785011587858685

Wow! Vision banana confirms that image generators are great generalist vision learners. Pretraining alone gets you zero shot segmentation, depth, normals etc – while beating out specialist models. If you teach a model to draw – you also teach it to see!
https://x.com/bilawalsidhu/status/2047332162404323465

Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page:
https://t.co/GQgRi6mWwC (1/5)
https://x.com/songyoupeng/status/2047312019976785944

Canonical and NVIDIA are collaborating to make NVIDIA Nemotron™ 3 Nano Omni easier to deploy on Ubuntu. With Canonical inference snaps, teams can go from setup to a working runtime in a single command – no complex integration required. Less time spent on infrastructure, more
https://x.com/Canonical/status/2049159988174602712

[2604.20329] Image Generators are Generalist Vision Learners
https://arxiv.org/abs/2604.20329

Our 2.0 image model is so good at making screens and vision mocks. Something about AI generated images of digital surfaces feels very “right” to me. Internally, I’ve started seeing tons of product ideas shared and brought to life via image generation rather than prototyping —
https://x.com/TheRohanVarma/status/2048985585000563009

Images 2.0 really got over some important qualitative threshold for me that I didn’t know existed.
https://x.com/sama/status/2047349336263012771

Kind of cool how we get a major step up in AI video every time image models improve. Insane work here with GPT Image 2 -> Seedance 2. Can’t wait until we can each stream our own game like this 👀
https://x.com/venturetwins/status/2047820435543437630

This is amazing. OpenAI is incredibly responsive to feedback. I noticed the 360° image trend after GPT Image 2, commented about it to @JustinBleuel on April 22, and we got a new feature five days later from @adele__li. Now let’s see what virtual worlds people build and share!
https://x.com/_simonsmith/status/2049118133495947445

We are committed to continually improving the GPT Image 2 model! I am actively fixing various issues from the community feedback. Just reply or DM me your GPT conversation! Features like 2K or 4K images are already available via the experimental API. Hope you enjoy the model!
https://x.com/BoyuanChen0/status/2047738501647728937
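The tweet above mentions 2K and 4K output via an experimental API. As a hypothetical sketch of what such a call's parameters might look like: the model name, size strings, and tier list below are all assumptions, not documented values — check the current API reference before relying on any of them.

```python
# Hypothetical sketch: request parameters for a high-resolution image
# generation call. The model slug "gpt-image-2" and the allowed size
# tiers are assumptions for illustration, not confirmed API values.

def build_image_request(prompt: str, size: str = "4096x4096") -> dict:
    """Assemble parameters for an images/generations-style request."""
    allowed = {"1024x1024", "2048x2048", "4096x4096"}  # assumed 1K/2K/4K tiers
    if size not in allowed:
        raise ValueError(f"unsupported size: {size}")
    return {
        "model": "gpt-image-2",  # hypothetical slug
        "prompt": prompt,
        "size": size,
        "n": 1,
    }


params = build_image_request("restore this age-worn photo", "2048x2048")
```

Validating the size client-side keeps a typo from burning an expensive generation call; the dict then goes to whatever HTTP client or SDK you use.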

Microsoft presents "TRELLIS.2": an open-source, 4B-parameter image-to-3D model producing up to 1536³ PBR-textured assets. Built on native 3D VAEs with 16× spatial compression, delivering efficient, scalable, high-fidelity asset generation. Ngl, pretty cool!
https://x.com/kimmonismus/status/2049099376476459372

World-R1 | Reinforcing 3D Constraints for Text-to-Video Generation
https://microsoft.github.io/World-R1/

Can a horse ride an astronaut? – by James McCammon
https://www.96layers.ai/p/can-a-horse-ride-an-astronaut
