Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Elegant repeating damask pattern combining vintage camera aperture blades, lens glass elements, and delicate RGB pixel grids arranged like botanical wallpaper in deep burgundy, antique gold, and cream tones with subtle embossed texture, the word IMAGES woven as decorative monogram, sophisticated gift wrap design in the style of Liberty of London textiles and Victorian photographic ephemera.

The dawn of a world simulator https://odyssey.ml/the-dawn-of-a-world-simulator

AI transcription from handwriting is now better than human level, and a very cheap model is as good as people. There are now massive troves of old documents that could be made available for research that would have been impossible or prohibitive to transcribe before. https://x.com/emollick/status/2001676059864080577

Tencent Hunyuan (腾讯混元) https://hunyuan.tencent.com/motion?tabIndex=0

Soul https://zhangzjn.github.io/projects/Soul/

STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits https://foivospar.github.io/STARCaster/

Animate Any Character in Any World. TL;DR: given a user-provided 3DGS scene along with a 3D or multi-view character, enables interactive control of the character’s behaviors and active exploration of the environment through natural language commands. https://x.com/Almorgand/status/2003518454280687885

FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision. TL;DR: transformer-based 3D portrait animation model with learnable data source tokens, so-called bias sinks, which enables unified training across monocular and multi-view datasets. https://x.com/Almorgand/status/2003153695765336468

PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos. TL;DR: canonical frame selection; image to 3D (static 3D Gaussian); set of randomly sampled camera poses to fine-tune a lightweight image2pose estimator; camera pose estimator to optimize a deformable 3D object model. https://x.com/Almorgand/status/2001695549259747415

There’s something so magical about turning 2d videos into 4d reconstructions. Every video becomes a spatio-temporal portal back in time – one that you can revisit from any angle. Research like d4rt is turning science fiction into reality; and it’s getting fast enough to run in https://x.com/bilawalsidhu/status/2003698903838003685

Virtually Being: Customizing Camera-Controllable Video Diffusion Models with MultiView Performance Captures. TL;DR: multiview character consistency; 3D camera control in video diffusion models; character trained via 4DGS; lighting variability obtained with a video relighting model. https://x.com/Almorgand/status/2002069630622507504

Researchers proposed Sample-Efficient Modality Integration (SEMI), which plugs any pretrained encoder (image, audio, video, sensors, graphs) into an LLM using one projector plus LoRA adapters generated from a handful of paired examples. Trained on data-rich domains, SEMI https://x.com/DeepLearningAI/status/2003593131132916204

I think everyone, even the most cynical & informed among us, is going to fall for at least one AI-faked story, photo, or post this coming year & likely many more. (You will also likely believe a real thing was AI) This has bad implications (but wouldn’t blame those taken in) https://x.com/emollick/status/2001669491629977943

3d artists are discovering the power of nano banana pro: uv unwrapped textures in one prompt https://x.com/bilawalsidhu/status/2002757896782934308

I wrote about how the jagged abilities of AI lead to bottlenecks in what AI can do… … but those bottlenecks focus the efforts of AI labs leading to breakthroughs that unlock new areas of work, like how Nano Banana Pro unexpectedly makes good PowerPoint. https://x.com/emollick/status/2002435094276157674

Nano Banana Pro is so important not because it is a really good image generator, but because a really good image generator unexpectedly unlocks a lot of new AI abilities, like the fact that AI can now research & generate compelling slides. On bottlenecks: https://x.com/emollick/status/2003121163506229656

Midjourney continues to be the AI image generator with the strongest opinions of what the human role in imagegen should be, focusing much more on tools for guidance, curation & creating variation among many options, rather than trying to nail instruction following from text https://x.com/emollick/status/2002971587130069255

wtf?? a distilled model that increases the quality? 🤯 @fal made an 8-step turbo version of FLUX.2 [dev] that ranked higher in quality on the artificial arena leaderboard 🏎️💨 I’ve just created a demo for it on @huggingface for you to play on your holidays 🤗 https://x.com/multimodalart/status/2005752030669987989

🎁 New year’s gift to the community We’re open-sourcing FLUX.2 [dev] Turbo, our in house distilled version of FLUX.2 🚀 🏆️ #1 ELO open-source image model (on Artificial Analysis arena) ⚡️ Sub-second generation 🧪 Custom variant of DMD2 distillation for max quality https://x.com/fal/status/2005690257979707496

🥳 Qwen-Image-Edit-2511 on 🍞 TostUI 🧪 Big thanks to @camenduru ! https://x.com/Alibaba_Qwen/status/2003753784527507781

Qwen-Image-Edit-2511-Lightning https://x.com/_akhaliq/status/2003601664675316051

You can now train LoRAs for @Alibaba_Qwen Qwen Image Edit 2511 with AI Toolkit. I also trained a 3bit Accuracy Recovery Adapter for it that will allow you to finetune it at 3bit with <24GB of VRAM. https://x.com/ostrisai/status/2003808898189611491

Vibe coding, but for robotics. Fully generated. The setup: • Designed with Nano Banana Pro • Built using Gemini 3 • Generated from high-level intent, not low-level code This robot arm simulation can stack cubes and build walls. No traditional programming. No hand-written https://x.com/IlirAliu_/status/2001593025881944387

Generative Refocusing: Flexible Defocus Control from a Single Image. TL;DR: two-step process; DeblurNet to recover all-in-focus images from various inputs + BokehNet for controllable bokeh; semi-supervised training. https://x.com/Almorgand/status/2003140933223919815
