Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Cinematic wide shot of an opulent Emerald City ballroom with towering green columns, where luminous ribbons of video thumbnails and data streams dance in mid-air, each ribbon outlined with glowing object segmentation masks revealing TikTok and algorithm symbols, moody dramatic lighting with green and gold accents, the category name BYTEDANCE appears as an ornate movie title overlay

If you work with robotics, AV, or 3D vision, this update will save you months of engineering. Most models need complex engineering to get reliable 3D geometry; this one does it with a plain transformer. Depth Anything 3 is the new model from @BytedanceTalk that predicts stable, … https://x.com/IlirAliu_/status/1989622721366446190

Depth Anything 3 proves most 3D vision research has been overengineering the problem. Vanilla DINOv2 transformer + depth-ray pairs crushes SOTA by 44% on pose and 25% on geometry. One approach for SOTA monocular depth, multi-view geometry, pose estimation, and novel view synthesis. https://x.com/bilawalsidhu/status/1989444908357488832
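The "depth-ray pairs" in the quoted claim refer to the classic pinhole relation: every pixel defines a camera ray, and a per-pixel depth scales that ray into a 3D point. As an illustrative sketch (not DA3's actual code; the intrinsics here are made up), lifting a depth map to camera-space geometry looks like this:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map to 3D camera-space points via pinhole rays."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Per-pixel ray directions, normalized so z = 1.
    x = (u - cx) / fx
    y = (v - cy) / fy
    rays = np.stack([x, y, np.ones_like(x)], axis=-1)  # shape (h, w, 3)
    # Scale each ray by its predicted depth to get a 3D point.
    return rays * depth[..., None]

# Toy example: a flat plane 2 m away, 4x4 image, centered principal point.
depth = np.full((4, 4), 2.0)
pts = unproject_depth(depth, fx=4.0, fy=4.0, cx=1.5, cy=1.5)
print(pts.shape)   # (4, 4, 3)
print(pts[0, 0])   # corner pixel: [-0.75 -0.75  2.  ]
```

Predicting depths and rays jointly, rather than a full pose-plus-geometry pipeline, is what lets a single plain transformer cover monocular, multi-view, and video inputs.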

ByteDance-Seed/Depth-Anything-3: Depth Anything 3 https://github.com/ByteDance-Seed/Depth-Anything-3

Depth Anything 3 is here! It’s a beefy one! https://x.com/Almorgand/status/1989370456131215514

After a year of teamwork, we’re thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 … https://x.com/bingyikang/status/1989358267668336841

Discover more from Ethan B. Holland