Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Elegant seamless wrapping paper pattern in deep plum and antique gold showing musical staff notes flowing into calligraphic text which transforms into vintage photograph fragments which become expressive brushstrokes, all connected by Victorian ornamental flourishes and Art Nouveau curves, the word MULTIMODALITY integrated as damask-style monogram typography, sophisticated textile design with embossed texture quality, Liberty of London aesthetic, museum-quality decorative arts.

The dawn of a world simulator https://odyssey.ml/the-dawn-of-a-world-simulator

Visual sim2real: zero-shot deploy to the real world, with zero real data. Trained entirely in Isaac Lab. https://x.com/DrJimFan/status/2003879976173818298

Google tests 30-minute Lecture Audio Overviews on NotebookLM https://www.testingcatalog.com/exclusive-google-tests-30-minute-audio-lectures-on-notebooklm/

MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks – MiniMax News https://www.minimax.io/news/minimax-m21

Shoutout @theo for today’s live stream using M2.1! 🙌 Loved watching a real dev put it through its paces in real time! Key Takeaways: “M2.1 is really good at long-horizon tasks and could generate surprisingly good results” Best part? It’s 1/10 the price of Opus. Ready to be surprised? https://x.com/MiniMax_AI/status/2003673337671602378

AI transcription from handwriting is now better than human level, and a very cheap model is as good as people. There are now massive troves of old documents that could be made available for research that would have been impossible or prohibitive to transcribe before. https://x.com/emollick/status/2001676059864080577
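
As a rough illustration of how cheap this has become, here is a minimal sketch of sending a scanned page to a vision-capable chat model through an OpenAI-style API. The model id and file name are placeholders of mine, not anything from the linked thread:

```python
# Minimal handwriting-transcription sketch with a vision-capable chat model.
# Assumptions: an OpenAI API key is configured, "gpt-4o-mini" stands in for
# "a very cheap model", and page.jpg is a scan of the handwritten document.
import base64
from openai import OpenAI

client = OpenAI()

with open("page.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder for any cheap vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcribe this handwritten page verbatim. "
                     "Mark unreadable words as [illegible]."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```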

A Redditor fed his MRI into ChatGPT and it appears to have correctly identified the cause of his sciatic leg pain. This could be a watershed moment for AI. https://x.com/reddit_lies/status/2003512194672025826

MAI-UI: Real-World Centric Foundation GUI Agents https://tongyi-mai.github.io/MAI-UI-blog/

Tencent Hunyuan https://hunyuan.tencent.com/motion?tabIndex=0

Soul https://zhangzjn.github.io/projects/Soul/

STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits https://foivospar.github.io/STARCaster/

Onboard camera from the F.03 humanoid robot https://x.com/adcock_brett/status/2001708268469833835

Google Workspace Updates: Transform sources into structured Data Tables in NotebookLM https://workspaceupdates.googleblog.com/2025/12/transform-sources-structured-data-tables-notebooklm.html

@vllm_project @HotAisle I got MiniMax-M2.1 running on my mi300x with sglang (I had to patch it). FP8: 55 TPS, bf16: 71 TPS. Still showing the same behavior where FP8 perf is worse, rather than better. So across both vLLM and sglang, the implementation of FP8 on mi300x is slower than bf16. https://x.com/QuixiAI/status/2005724765928210655

@vllm_project @HotAisle MiniMax-M2.1 on vLLM (FP8) gets 42 TPS. I’ll next try converting it to bf16 and see how fast it is. https://x.com/QuixiAI/status/2005481942712811695

@vllm_project @HotAisle MiniMax-M2.1-bf16 gets 55.7 TPS. https://x.com/QuixiAI/status/2005502089653547174

@vllm_project @HotAisle The patch is here: https://x.com/QuixiAI/status/2005746928399827407
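
For context on the TPS numbers in this thread: both vLLM and sglang serve an OpenAI-compatible endpoint, so a rough decode-throughput probe can be run the same way against either. A minimal sketch, with the server address and model id assumed:

```python
# Rough decode-throughput (TPS) probe against an OpenAI-compatible server,
# e.g. vLLM (`vllm serve ...`) or sglang (`python -m sglang.launch_server ...`).
# Assumptions: server at localhost:8000, model id matches the launch command.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",  # placeholder model id
    messages=[{"role": "user", "content": "Write a long essay about GPUs."}],
    max_tokens=1024,
    stream=True,
)

chunks = 0
first_token_at = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # exclude time-to-first-token
        chunks += 1  # one streamed chunk is roughly one token on most servers

assert first_token_at is not None, "no content streamed back"
elapsed = time.perf_counter() - first_token_at
print(f"~{chunks / elapsed:.1f} tokens/s decode")
```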

🚨Code Arena Update Minimax-M2.1 debuts at #1 open model on WebDev leaderboard and lands #6 overall with a 1445 score, tying with GLM-4.7. These scores come from Code Arena, where models build websites, apps, and games from a single prompt. Congrats @MiniMax__AI on this latest… https://x.com/arena/status/2005779347182084585

Excited to have builders experience M2.1 on Kilo! Let’s code smarter. https://x.com/MiniMax_AI/status/2003606223191703708

Have you checked out the new open model MiniMax M2.1 by @MiniMax__AI in the Code Arena? 👀 The next generation of live coding evals happens in the Code Arena. Built to test how models plan, scaffold, debug, and build real web apps step-by-step. Remember, your votes drive the… https://x.com/arena/status/2003585316029104383

MiniMax M2.1 is live in Kilo – by Ari – Kilo Blog https://blog.kilo.ai/p/minimax-m21

MiniMax M2.1 is live on Chutes 🪂 We also ran the official Provider Verifier, results: • 100% query success • 82.83% tool calling • 95.12% tool accuracy (4 edge cases) • 100% response quality • 100% language following. Try now: https://x.com/chutes_ai/status/2005539785923072424

Seeing @dhh share his experience was both humbling and deeply encouraging! Knowing that MiniMax M2.1 can meaningfully contribute on codebases as large and complex as Ruby on Rails is a strong signal we’re moving in the right direction. It’s still far from perfect and we have to… https://x.com/MiniMax_AI/status/2005536770226811014

Super excited to see Minimax 2.1 posting good results on our SWE-bench Verified, SWE-bench Multilingual and SciCode benchmarks! https://x.com/OfirPress/status/2003625671042732329

This one’s for builders, not demos. MiniMax M2.1 is now live for 30M devs on @blackboxai — appreciate the team 🙌 Let’s go further! https://x.com/MiniMax_AI/status/2003926396335460447

Want to use MiniMax M2.1 in Roo Code? It’s here for you! https://x.com/MiniMax_AI/status/2003611728320561528

We’ve evaluated @MiniMax__AI’s M2.1 on our Vals Index. Among open-weight models, it ranks #2 behind GLM 4.7, but with lower latency and cost than GLM 4.7. https://x.com/ValsAI/status/2003646964664287667

AI needs to stop memorizing everything in model weights and start “seeing” like a human That’s what it’ll take to move beyond chatbots and into the physical world, believes @shawnshenjx, founder of @memories_ai and former Meta Reality Labs researcher In this interview, we talk… https://x.com/TheTuringPost/status/2003530423708831788

A robot that sees the terrain and predicts its own future… up to 5 seconds ahead? This is real. ❗️Best Systems Paper finalist at #RSS2025 The team introduces a perceptive Forward Dynamics Model that helps legged robots safely navigate rough, complex environments: no manual… https://x.com/IlirAliu_/status/2002092349615120757
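
The core idea of a forward dynamics model is simple to sketch. The toy code below is a generic illustration, not the paper’s perceptive FDM: a learned step function lets the robot roll its own state forward several seconds before committing to a plan.

```python
# Toy forward-dynamics rollout (illustrative only, not the RSS 2025 system):
# given a learned step function f(state, action) -> next_state, the robot can
# simulate itself seconds into the future before executing anything.
import numpy as np

def learned_dynamics(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Stand-in for a trained network; here a dummy linear model."""
    return state + 0.02 * action  # 0.02 s per step, as an example

def rollout(state, policy, horizon_s=5.0, dt=0.02):
    """Predict the robot's own trajectory `horizon_s` seconds ahead."""
    states = [state]
    for _ in range(int(horizon_s / dt)):
        action = policy(states[-1])
        states.append(learned_dynamics(states[-1], action))
    return np.stack(states)

traj = rollout(np.zeros(3), policy=lambda s: np.ones(3))
print(traj.shape)  # (251, 3): 5 s of predicted future at 50 Hz
```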

Your robot moves fast… but objects slide off the tray? This system hears the sliding and learns how to stop it: Researchers at CMU developed a new method that uses sound to model real-world friction in motion planning. It enables time-optimized, high-speed transport without… https://x.com/IlirAliu_/status/2003179502545854521

A great visual positioning system makes augmented reality feel like magic… this tech helps your phone (or robot) figure out precisely where it is in 3D space. Here’s MultiSet AI nailing it in real-time, at night, and on-device too. This new reveal shader animation shows just… https://x.com/bilawalsidhu/status/2001858275738890348

Most robot foundation models still learn physics the hard way. From robot data only. This paper takes a different path. mimic-video uses large-scale internet video to learn motion and physical dynamics first, then maps that into robot actions. • Policies are grounded in… https://x.com/IlirAliu_/status/2003100997065802044

Strong new open 32B VLM from Korea, with good English and very strong Korean benchmark scores! The Artificial Analysis score is in part due to a very high tau-bench score, but the other benchmarks are good as well, and vision understanding also seems strong! Now the fun part: while… https://x.com/eliebakouch/status/2005549508063559876

Survey on vision encoders in VLMs. The encoder side is weirdly understudied: everyone’s busy scaling the LM while recycling the same 400M CLIP from 2021. We looked at 70+ models and found that training methodology beats scale. A well-trained 400M encoder outperforms a 6B one. https://x.com/JinaAI_/status/2005646823201951849
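
The “recycled CLIP” pattern the survey critiques is easy to picture: a frozen ~400M-parameter CLIP ViT produces patch features, and a small trained projector maps them into the language model’s embedding space. A minimal sketch, with dimensions illustrative rather than taken from the survey:

```python
# Standard VLM wiring: frozen CLIP vision encoder + linear projector feeding
# visual tokens into an LM's embedding space. Sizes are examples only.
import torch
from torch import nn
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder.requires_grad_(False)  # the commonly recycled, frozen ~400M encoder

lm_hidden = 4096  # example LM width; depends on the language model used
projector = nn.Linear(encoder.config.hidden_size, lm_hidden)  # the trained part

image = Image.new("RGB", (224, 224))  # stand-in for a real image
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    patches = encoder(pixel_values=pixels).last_hidden_state  # (1, 257, 1024)
visual_tokens = projector(patches)  # (1, 257, 4096), prepended to text tokens
print(visual_tokens.shape)
```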
