Image created with OpenAI GPT-Image-1. Image prompt: rich crimson, bright ivory, deep navy Independence-Day palette, vibrant, celebratory, wholesome, authentic, photorealistic red-white-blue kite festival against cobalt sky scene featuring a camera drone capturing fireworks overhead; natural lighting, subtle film grain, high detail

The race for the LLM “cognitive core” – a few-billion-parameter model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.
Its features are slowly crystallizing:

– Natively multimodal text/vision/audio at both input and output.
– Matryoshka-style architecture allowing a dial of capability up and down at test time.
– Reasoning, also with a dial (System 2).
– Aggressively tool-using.
– On-device finetuning LoRA slots for test-time training, personalization and customization.
– Delegates and double-checks just the right parts with the oracles in the cloud when internet is available.
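The “Matryoshka-style” dial in the list above can be pictured with a toy sketch (assumptions: this is not any real model’s architecture, just the nesting idea – representations trained so that a prefix of each vector is itself a usable, lower-fidelity representation, so capability dials down by truncating and up by keeping more):

```python
import math
import random

# Toy "Matryoshka" embedding: a 256-dim vector whose prefixes (first 64,
# 128, ... dims) are themselves valid, cheaper embeddings. Hypothetical
# illustration only; real Matryoshka training enforces this property.
random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(256)]

def embed(dim):
    """Truncate to the first `dim` dims and renormalize: the capability dial."""
    v = full[:dim]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

cheap, rich = embed(64), embed(256)
print(len(cheap), len(rich))  # 64 256
```

At test time, a device under load would serve `embed(64)`; a plugged-in device could afford `embed(256)` – same weights, one dial.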

It doesn’t know that William the Conqueror’s reign ended on September 9, 1087, but it vaguely recognizes the name and can look up the date. It can’t recite the SHA-256 of the empty string as e3b0c442…, but it can calculate it quickly should you really want it.
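The hash example is exactly the kind of thing a tool-using core delegates rather than memorizes – one call to a standard library instead of a stored fact:

```python
import hashlib

# SHA-256 of the empty byte string: computed on demand, not recited from memory.
digest = hashlib.sha256(b"").hexdigest()
print(digest)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```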

What LLM personal computing lacks in broad world knowledge and top-tier problem-solving capability it will make up for in super-low interaction latency (especially as multimodal matures), direct/private access to data and state, offline continuity, and sovereignty (“not your weights, not your brain”) – i.e., many of the same reasons we like, use, and buy personal computers instead of having thin clients access a cloud via remote desktop or the like. https://x.com/karpathy/status/1938626382248149433

Mandelbrot in x86 assembly by Claude https://simonwillison.net/2025/Jul/2/mandelbrot-in-x86-assembly-by-claude/

Runway now has its sights on the video game industry with its new generative AI platform https://www.engadget.com/ai/runway-now-has-its-sights-on-the-video-game-industry-with-its-new-generative-ai-platform-192350294.html

🚨 NEW LABS EXPERIMENT 🚨 Introducing Doppl, a new mobile app that lets you upload a photo or screenshot of an outfit and then creates a video of you wearing the clothes to help you find your ✨aesthetic ✨ Available on iOS and Android in the US to users 18+, download the https://x.com/GoogleLabs/status/1938284886277951916

Try on looks and discover your style with Doppl
https://blog.google/technology/google-labs/doppl/

Earlier this month, we launched the Image Edit Arena. Today, the Image Edit Leaderboard 🏆 goes LIVE, powered by more models and all your community votes. 🏆 In 1st place: GPT-Image-1 by @OpenAI 💠 2nd-4th: Flux 1 Kontext Max, Pro & Dev by @bfl_ml 💠 5th: Gemini 2.0 Flash https://x.com/lmarena_ai/status/1940795298449924220

Meet Higgsfield Soul. Our new high-aesthetic photo model. 50+ curated presets, fashion-grade realism. This will make you throw away your iPhone. Retweet this post to get a full guide in your DMs. Wild examples below: https://x.com/higgsfield_ai/status/1937931727084917097

Meet Qwen-VLo, your AI creative engine: • Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals • On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands • Global-Ready: Generate images in multiple languages • Progressive https://x.com/Alibaba_Qwen/status/1938604105909600466

Introducing Higgsfield Soul Inpaint.
Same Soul-style high aesthetic, now with pixel-perfect control.
Inpaint anything you want: clothes, hair, objects, and keep the Soul. https://x.com/higgsfield_ai/status/1940835284104761454

Blockbench MCP is here! 🔥 Let’s create a sci-fi sniper with animations in under 3 minutes using AI. Get ready for HYTOPIA -the future is now. https://x.com/PhaxyHytopian/status/1936293530101575756

One of the most useful MCP tools out there: ultra-precise background removal (use it directly from your chat) with this MCP: https://not-lain-background-removal.hf.space/gradio_api/mcp/sse https://x.com/abidlabs/status/1939778684388614303

Introducing @SuperDesignDev The first open-source Design Agent that lives inside Cursor, Windsurf, or any IDE: 👉 Instantly spin up 10 different designs in parallel 👉 Explore, fork, and iterate 10x faster 👉 Create new or iterate on your existing UI — all fully local https://x.com/jasonzhou1993/status/1937838762320626109

SnapMoGen: Human Motion Generation from Expressive Texts https://snap-research.github.io/SnapMoGen/

The study asks if models like GPT‑4o truly understand images. It finds they juggle many jobs yet still trail task‑focused vision tools. Past tests could not match chat models with pixel specialists fairly. The authors turn every benchmark into quick yes‑no image checks any API https://x.com/rohanpaul_ai/status/1941086082679951554

Flux Kontext just dropped on @huggingface and it’s 🔥 https://x.com/fdaudens/status/1938325622725530040

FLUX.1 Kontext just landed on the Hub https://x.com/fdaudens/status/1938258297381130565

Nano is a depth-aware atmospheric haze plugin that uses ML depth estimation to add physically accurate fog and light scattering to your footage. Works *best* on log footage with visible light sources – it analyzes scene highlights then creates airlight (atmospheric scatter) and https://x.com/bilawalsidhu/status/1938421841753772434

Flux1.Kontext runs on your laptop with MFLUX + MLX: https://x.com/awnihannun/status/1938947706350903401

Autoregressive image generation models lack robust token-level watermarking, as re-tokenizing generated images significantly alters the token sequence due to poor reverse cycle-consistency. This paper from @AIatMeta adapts LLM watermarking for images, enabling reliable, robust https://x.com/rohanpaul_ai/status/1938840284860948608

visual reasoning is now in @huggingface transformers 🔥 GLM-4.1V-Thinking is just released and merged into transformers, we gave it a vibe test run 🤠 it’s very good, comes with 64k context length and MIT license 😍 it supports 4k image tokens and any aspect ratio as well! https://x.com/mervenoyann/status/1940358096552902675

Chatbots are a limitation, and AI image tools, especially Midjourney, are much further along in developing new UX/UI approaches to working with AI that better take advantage of key AI strengths (it can create many variations) and the key human roles of vision, curation, and selection. https://x.com/emollick/status/1938983583323992161

Strange cities. (I find working with Midjourney video to be really interesting, the ability to develop weird styles especially) https://x.com/emollick/status/1940163669276729380

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution. TL;DR: depth estimation on streaming video (2044×1148) at 24 FPS; with careful modifications of pretrained single-image depth models, these capabilities are enabled with relatively little data and training. https://x.com/Almorgand/status/1939724839004037617

SynMotion https://lucaria-academy.github.io/SynMotion/
