Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the deep midnight navy car hood, shallow depth-of-field sky background, chrome pedestal base, dramatic upward angle, and automotive advertisement lighting exactly as shown. Replace only the Mercedes star with a single chrome hood ornament in the form of an elegant rectangular window frame with empty center, rendered in polished metal at realistic ornament scale, mounted on the same pedestal base. Add bold white sans-serif display text reading AR/VR across the upper portion of the image in clean automotive advertisement style.

MolmoPoint: Better pointing architecture for vision-language models | Ai2 https://allenai.org/blog/molmopoint

Spent some time today playing with MolmoPoint. It’s pretty crazy that we can use VLMs for multi-object tracking now. Instead of spelling out coordinates as text, it points by directly selecting parts of its own visual features. Prompt: “Track blue players.”
https://x.com/skalskip92/status/2034606226902827228
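
For intuition, here is a rough sketch (purely illustrative, not MolmoPoint’s actual code) of the difference between decoding coordinates as text and pointing by scoring the model’s own visual patch features; every name, shape, and interface below is an assumption.

```python
# Illustrative sketch only: pointing by scoring the model's own visual
# patch features, contrasted with decoding coordinates as text.
# Shapes and names are assumptions, not MolmoPoint's real interfaces.
import torch

def point_from_patch_features(patch_feats, query, grid_hw, image_hw):
    """Pick the image location whose patch feature best matches the query.

    patch_feats: (num_patches, dim) visual features from the ViT encoder
    query:       (dim,) embedding of the prompt, e.g. "blue player"
    grid_hw:     (rows, cols) of the patch grid
    image_hw:    (height, width) of the input image in pixels
    """
    scores = patch_feats @ query                # similarity per patch
    idx = int(scores.argmax())                  # best-matching patch
    rows, cols = grid_hw
    r, c = divmod(idx, cols)
    h, w = image_hw
    # Return the patch center in pixel coordinates.
    return ((c + 0.5) * w / cols, (r + 0.5) * h / rows)

# Toy usage: a 24x24 patch grid over a 672x672 image.
feats = torch.randn(24 * 24, 512)
query = torch.randn(512)
x, y = point_from_patch_features(feats, query, (24, 24), (672, 672))
print(f"point at ({x:.0f}, {y:.0f})")
# A text-coordinate model would instead have to *generate* a string like
# "(412, 233)" token by token and hope it parses back to valid pixels.
```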

People are undoubtedly a little alarmed at having unwittingly helped build a 3D map of the world for Niantic by contributing 30 billion crowdsourced images. I interviewed Niantic’s CTO Brian McClendon about exactly this in a TED interview last year — he’s also the guy who…
https://x.com/bilawalsidhu/status/2033350363982471182

Announcing NVIDIA DLSS 5, an AI-powered breakthrough in visual fidelity for games, coming this fall. DLSS 5 infuses pixels with photorealistic lighting and materials, bridging the gap between rendering and reality. Learn More →
https://x.com/NVIDIAGeForce/status/2033617732147810782

DLSS 5 is completely mind blowing. The neural rendering model with photoreal lighting and materials is a generational step up in visual fidelity. Gaming with DLSS 5 feels like future tech, but it’s possible now. It is truly incredible. 🤯
https://x.com/GeForce_JacobF/status/2033615891045454112

DLSS 5 might be the moment where the anti AI pendulum starts swinging back. Many in the 3D community who were against generative AI are now pushing back on the “everything is AI slop” crowd. The pendulum swung too far and they can feel it. Nice to see the rebalancing.
https://x.com/bilawalsidhu/status/2034281398052274666

Here’s everything we know about Nvidia’s “greatest leap in graphics since real-time ray tracing.” You can see Digital Foundry’s jaw drop in this reaction after they just saw DLSS 5.0: – Will ship in Fall of 2026! – Demo ran 4K on two 5090s but is already running on a single GPU in…
https://x.com/Grummz/status/2033641075806769382

A breakthrough in real-time video generation. As a research preview developed with @NVIDIA and shared at @NVIDIAGTC this week, we trained a new real-time video model running on Vera Rubin. HD videos generate instantly, with time-to-first-frame under 100ms. Unlocking an entirely…
https://x.com/runwayml/status/2034284298769985914#m

NVIDIA GTC 2026 Keynote: Everything That Happened in 12 Minutes – YouTube https://www.youtube.com/watch?v=X2i_8O75_Os

Schibsted open sources AI tool that turns news articles into videos | Schibsted https://schibsted.com/news/schibsted-open-sources-ai-tool-that-turns-news-articles-into-videos/

DoorDash’s New Paid Tasks Turn Couriers Into AI and Robot Trainers – Bloomberg https://www.bloomberg.com/news/articles/2026-03-19/doordash-s-new-paid-tasks-turn-couriers-into-ai-and-robot-trainers

Someone used Suno AI to generate a Japanese metal band called Neon Oni. Fake member bios, AI-generated music videos, “Based in Tokyo” on Spotify. 80,000+ monthly listeners. Fans had it in their Spotify Wrapped top 5. Merch was selling. Then, community sleuths exposed it. Traced…
https://x.com/TheRundownAI/status/2033568236227244451?s=20

Probably the most current look at Palantir’s Maven Smart System software. Here’s the DoW’s Chief AI Officer showing how it works:
https://x.com/bilawalsidhu/status/2032432668105712093

Tracking unregistered dark ships is notoriously difficult and expensive. But a new automated system uses existing underwater internet cables to passively detect them. Here’s the breakdown:
https://x.com/yohaniddawela/status/2031705951552647195

LiTo: Joint Geometry and Appearance Modeling for Image-to-3D Generation. TL;DR: Generates high-fidelity 3D objects from a single image by jointly modeling geometry + view-dependent appearance (lighting, reflections) in a unified latent space.
https://x.com/Almorgand/status/2033987312451731904
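
If you’re wondering what “jointly modeling geometry and view-dependent appearance in a unified latent space” can look like in practice, here is a tiny, generic sketch: one shared latent decoded into both an SDF value and a view-conditioned color. It is illustrative only and not LiTo’s actual architecture; all module names and sizes are made up.

```python
# Illustrative only: one shared latent decoded into both geometry (an SDF
# value per 3D point) and view-dependent appearance (color per point and
# viewing direction). This is a generic pattern, not LiTo's actual model.
import torch
import torch.nn as nn

class JointGeometryAppearance(nn.Module):
    def __init__(self, latent_dim=256, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.sdf_head = nn.Linear(hidden, 1)           # signed distance
        self.rgb_head = nn.Sequential(                 # color given view dir
            nn.Linear(hidden + 3, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, z, xyz, view_dir):
        # z: (B, latent_dim) shared latent; xyz, view_dir: (B, 3)
        h = self.trunk(torch.cat([z, xyz], dim=-1))
        sdf = self.sdf_head(h)
        rgb = torch.sigmoid(self.rgb_head(torch.cat([h, view_dir], dim=-1)))
        return sdf, rgb                                # geometry + appearance

# Toy usage on a batch of 4 query points.
model = JointGeometryAppearance()
z = torch.randn(4, 256); xyz = torch.randn(4, 3); d = torch.randn(4, 3)
sdf, rgb = model(z, xyz, d)
print(sdf.shape, rgb.shape)  # torch.Size([4, 1]) torch.Size([4, 3])
```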

Google Maps 3D basemap and navigation experience just became a lot more immersive 😍
https://x.com/bilawalsidhu/status/2032122828992962704

Introducing our biggest upgrade to @googlemaps since the original launch, featuring Ask Gemini (with personalization), Immersive Navigation, and much more!! 🗺️
https://x.com/OfficialLoganK/status/2032101245763149908

It’s all over when Google realizes the treasure trove that is Street View + aerial data and launches a version of Genie grounded in the real world…
https://x.com/bilawalsidhu/status/2033954619181654114

DVD: Dynamic Video Depth. TL;DR: Recovers temporally consistent depth from monocular videos using diffusion priors + geometric constraints, handling dynamic scenes and motion robustly.
https://x.com/Almorgand/status/2034349445601538057
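
As a mental model (not DVD’s actual method), the “prior + geometric constraint” recipe often boils down to optimizing per-frame depths to stay close to a single-frame estimate while agreeing with neighboring frames after warping. The sketch below stubs out both the warp and the diffusion prior; every name is a placeholder.

```python
# Minimal, made-up sketch of combining a per-frame depth prior with a
# temporal-consistency constraint across video frames. Not DVD's actual
# method; the warp function is stubbed out for brevity.
import torch
import torch.nn.functional as F

def refine_video_depth(prior_depths, warp_fn, steps=200, lam=0.5, lr=1e-2):
    """prior_depths: (T, H, W) depths from a single-frame estimator.
    warp_fn(d, t): warps frame t+1's depth into frame t's view (e.g. via flow)."""
    depths = prior_depths.clone().requires_grad_(True)
    opt = torch.optim.Adam([depths], lr=lr)
    for _ in range(steps):
        # Stay close to the per-frame estimate (stands in for the diffusion prior).
        data_term = F.mse_loss(depths, prior_depths)
        # Ask consecutive frames to agree after warping (geometric constraint).
        warped_next = torch.stack([warp_fn(depths[t + 1], t)
                                   for t in range(depths.shape[0] - 1)])
        temporal_term = F.mse_loss(depths[:-1], warped_next)
        loss = data_term + lam * temporal_term
        opt.zero_grad(); loss.backward(); opt.step()
    return depths.detach()

# Toy usage with an identity "warp" (a real system would warp via optical flow).
prior = torch.rand(8, 64, 64)
refined = refine_video_depth(prior, warp_fn=lambda d, t: d, steps=10)
print(refined.shape)  # torch.Size([8, 64, 64])
```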

And 2.3 years later we have DLSS on steroids
https://x.com/bilawalsidhu/status/2033752195095535801

DLSS 5 casually solved the fancy coat of paint part of this vision
https://x.com/bilawalsidhu/status/2034131183353643289

DLSS 6 mode on, about to take greyboxed 3D assets to final render. AI video-to-video foreshadowed this; many said it could never happen in real time. Yet here we are.
https://x.com/bilawalsidhu/status/2033898489952841763

So proud of DLSS5: Fully generative neural rendering, in real-time, in real games. Mind-blowing realism. A whole new generation of real-time graphics. A decade of continuous research and development. Coming soon to PCs everywhere. 💚
https://x.com/ctnzr/status/2033613807105544666

NVIDIA thanks all its partners: the message? There is no way around NVIDIA. NVIDIA is the center of the revolution.
https://x.com/kimmonismus/status/2033615181415387610

Straight from NVIDIA GTC: Jensen Huang just unveiled a new vision for AI infrastructure. For the first time, Rubin GPUs + Groq LPUs are paired: 35× higher inference throughput and 10× more revenue from trillion-parameter models. Architecture & why it’s needed…
https://x.com/TheTuringPost/status/2033700480975520097

Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
https://x.com/karpathy/status/2034321875506196585

AI assistants are mapping our inner world. Spatial intelligence is mapping the outer one. Layer them together and this is what you see:
https://x.com/bilawalsidhu/status/2033225076116529543

Big moment. You can cross the uncanny valley in video games by using real-time video-to-video AI. You get the best of coherence & control from classical 3D engines, then use generative AI to take it all the way.
https://x.com/bilawalsidhu/status/2033627865300816326

One of the most prescient scenes in movie history. Radio frequency (RF) is the next big modality for spatial intelligence.
https://x.com/bilawalsidhu/status/2033009623817416955

When your 3D scan is like replaying a dream
https://x.com/bilawalsidhu/status/2033257427773095936

At this nerdiest of all nerdy sessions 💞, Jeff Dean said he doesn’t think we’re running out of data. “I think there’s still an enormous amount of data in the world that we haven’t really used yet for training these models. We train on some video data, for example, but there’s a…
https://x.com/TheTuringPost/status/2034411360302567803

Learning from robot data? Standard. Direct Video-Action Models (DVA) is different: treat robot control as video generation, then translate the generated video into actions. Built by @rhoda_ai_, the system pre-trains causal video models from scratch and can run complex…
https://x.com/IlirAliu_/status/2032742738853048413

Video generation might be a much better backbone for robot learning than image-text models. DiT4DiT couples a video Diffusion Transformer with an action Diffusion Transformer, letting robot policies learn directly from spatiotemporal video dynamics instead of static visual…
https://x.com/IlirAliu_/status/2032380216962691114
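
Both of these threads describe the same basic pattern: imagine the task as a video, then read actions off consecutive frames. Here is a back-of-the-napkin sketch of that pipeline; it is not DVA’s or DiT4DiT’s actual code, and every module, name, and dimension is a placeholder.

```python
# Back-of-the-napkin sketch of the "video as robot policy backbone" pattern:
# predict a video of the task being done, then translate consecutive frames
# into low-level actions with an inverse-dynamics model. Everything here is
# a stand-in, not DVA's or DiT4DiT's actual architecture.
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    """Maps a pair of (current, next) frame features to the action between them."""
    def __init__(self, frame_dim=512, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * frame_dim, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim))

    def forward(self, frame_t, frame_t1):
        return self.net(torch.cat([frame_t, frame_t1], dim=-1))

def rollout_actions(video_model, inv_dyn, obs, horizon=16):
    """video_model(obs, horizon) -> (horizon, frame_dim) predicted frame features."""
    frames = video_model(obs, horizon)            # "imagine" the task as video
    actions = [inv_dyn(frames[t], frames[t + 1])  # read actions off frame pairs
               for t in range(horizon - 1)]
    return torch.stack(actions)

# Toy usage with a random "video model" standing in for a diffusion transformer.
fake_video_model = lambda obs, horizon: torch.randn(horizon, 512)
actions = rollout_actions(fake_video_model, InverseDynamics(), obs=None)
print(actions.shape)  # torch.Size([15, 7])
```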
