Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: 1980s NORAD war room with a silhouetted operator facing a wall of glowing CRT monitors displaying conflicting streams of audio waveforms, corrupted video static, fragmented text readouts, and biometric graphs all glitching simultaneously, dark cinematic lighting with amber and blue screen glow, massive bold red sans-serif text MULTIMODALITY across the top, retro vector graphics, high contrast, foreboding techno-thriller atmosphere.
OpenAI Plans to Price Smart Speaker at $200 to $300, as AI Device Team Takes Shape — The Information https://www.theinformation.com/articles/inside-openai-team-developing-ai-devices
An interactive world model developed by NVIDIA in collaboration with academic partners. – DreamDojo turns egocentric human video data into physical intelligence. – Human data is more scalable than robotics data but lacks action labels. – To solve this, a dedicated action model
https://x.com/TheHumanoidHub/status/2025368793321799909
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It’s Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is
https://x.com/DrJimFan/status/2024895359236051274
With enhanced reasoning, Nano Banana 2 can carry out complex requests, capturing the specific nuances of your idea, just as you imagined it. 🧠
https://x.com/GoogleDeepMind/status/2027051581300969755
NVIDIA has open-sourced SONIC, a humanoid behavior foundation model that gives robots a core set of motor skills learned from large-scale human motion data. https://x.com/TheHumanoidHub/status/2024935738362765677
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control https://nvlabs.github.io/GEAR-SONIC/
We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work. To progress toward this goal, we developed SONIC ( https://x.com/yukez/status/2024639427788857707
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We
https://x.com/DrJimFan/status/2026709304984875202
What can half of GPT-1 do? We trained a 42M transformer called SONIC to control the body of a humanoid robot. It takes a remarkable amount of subconscious processing for us humans to squat, turn, crawl, sprint. SONIC captures this "System 1" – the fast, reactive whole-body
https://x.com/DrJimFan/status/2026350142652383587
world modeling is never about rendering pixels. rendering is local. world state is global. as soon as more than one agent exists, the only thing that truly matters is the shared representation beneath individual views. that shared representation is what scales into collective
https://x.com/sainingxie/status/2027115356318474661
Dang. My WorldView project is blowing up and is trending on X. I guess ppl really like monitoring the situation. Inbound is a little nuts — got hedge funds and OSINT folks ready to contribute; keep the feature requests coming! Been fun to put my geospatial 3D roots to work.
https://x.com/bilawalsidhu/status/2024953470806102510
Explore any world. Tell any story. All in one place. Kling 3.0 is now available in both Runway Workflows and Tool Mode. Discover all of the new models and capabilities available right inside of Runway at the link below. Morningstar Generated with AI. Made by @ceremonial_flux
https://x.com/runwayml/status/2025977383208051018
Generated Reality Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control paper: https://x.com/_akhaliq/status/2025944948453847352
Introducing Solaris: the first multiplayer world model exploration effort in Minecraft. We’ve built a scalable data collection engine, a multiplayer video diffusion model architecture, and a multi-view consistency evaluation benchmark. [1/9]
https://x.com/georgysavva/status/2027119472096518358
Marble is a generative AI platform and multimodal world model developed by World Labs, the spatial intelligence company founded by AI pioneer Fei-Fei Li. It allows users to create high-fidelity, persistent, navigable 3D worlds from simple inputs like: – Text prompts – Single or
https://x.com/TheHumanoidHub/status/2024935236057137640
My site hit #25 in rising tech publications. I’m mapping the frontier of creation & computing. Written + video deep dives on generative media, spatial intelligence and world models. Check it out https://x.com/bilawalsidhu/status/2026108063632216492
Physical Intelligence’s π0.6 models in real-world use cases. Weave (left): autonomous laundry folding. Ultra (right): e-commerce packaging. The models are built on a Vision-Language-Action (VLA) framework.
https://x.com/TheHumanoidHub/status/2026455516034306150
What is the self-evolution trilemma? In an ideal world, an AI system where agents learn only from each other would have 3 properties: – Continuous self-evolution – Isolation, meaning running in a closed loop, without outside interference – Stable safety alignment (safety invariance)
https://x.com/TheTuringPost/status/2024621675866935495
How come this is so difficult? I thought OCR would be one of the first things to be benchmaxxed, given that synthetic data is so cheap.
https://x.com/gabriberton/status/2026335831632626156
OlmOCR-Bench by @allen_ai is now a @huggingface Benchmark dataset 🏆 Add your model to this benchmark by adding a yaml file to your model repo 🤝 Find benchmark and docs on the next one ➡️
https://x.com/mervenoyann/status/2025908932691017983
Ran the same OCR models on 68 pages of historic newspaper. Every model hallucinated or looped. DeepSeek-OCR-2, LightOnOCR-2, GLM-OCR – all melt down on dense newspaper columns. You can try yourself using this @huggingface dataset: https://x.com/vanstriendaniel/status/2025930991387164919
The Qwen 3.5 Medium Models are in the Arena! 3.5-27B, 3.5-35B-A3B and 3.5-122B-A10B are ready for you in the Text, Vision and Code Arena! Let’s see how they stack up with less compute. Bring your toughest prompts and don’t forget to vote.
https://x.com/arena/status/2026716550812807181
✨ Run it now with SGLang! Chong!
https://x.com/Alibaba_Qwen/status/2026348924433477775
📊 With all the Qwen-3.5 scores out for Text, Code and Vision, let’s compare the evolution of Qwen-3.5 (397B-A17B) vs Qwen-3.0 (235B-A22B). This is a +24 rank jump in Text. Especially where Qwen-3.5 gains the most: Text: – Overall (+24: #19 vs #43) – English (+25: #21 vs #46) –
https://x.com/arena/status/2026404630297719100
🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment! Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face:
https://x.com/Alibaba_Qwen/status/2026682179305275758
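The post defers to the model card for example code; as a rough sketch, serving one of the FP8 checkpoints through vLLM's offline API might look like the following. The repo id and sampling settings are assumptions for illustration, not taken from the announcement.

```python
# Minimal vLLM sketch for an FP8 Qwen3.5 checkpoint.
# The repo id "Qwen/Qwen3.5-35B-A3B-FP8" is an assumption; check the actual
# model card referenced in the post for the real name and official example code.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3.5-35B-A3B-FP8")  # FP8 weights are read from the checkpoint's quantization config
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

SGLang deployment follows the same pattern through its own server/launch commands; the model card is the authoritative source for both.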
🚩Qwen3.5 INT4 model is now available! https://t.co/rY5GrT3b60 @Alibaba_Qwen @JustinLin610
https://x.com/HaihaoShen/status/2026208062009426209
A big jump in intelligence-per-watt today: "Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507"
https://x.com/awnihannun/status/2026353100144218569
Huge thanks to the @vllm_project for the Day-0 support on the Qwen3.5 Medium Series 🚀
https://x.com/Alibaba_Qwen/status/2026496673179181292
Minimax M2.5 GGUFs (from Q4 down to Q1) perform poorly overall. None of them come close to the original model. That’s very different from my Qwen3.5 GGUF evaluations, where even TQ1_0 held up well enough. Lessons: – Models aren’t equally robust, even under otherwise very good
https://x.com/bnjmn_marie/status/2027043753484021810
Qwen 3.5 family is here!
> vision built-in, and can outperform previous VL models
> designed to be more efficient
> expanded support for more languages
35B (fits on 24GB+ systems): ollama run qwen3.5:35b
122B: ollama run qwen3.5:122b
397B (cloud only): ollama run
https://x.com/ollama/status/2026598944177009147
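Beyond the CLI commands above, a locally pulled model can also be called programmatically. A minimal sketch using the official ollama Python client, assuming the qwen3.5:35b tag from the post has already been pulled:

```python
# Sketch: chat with a locally pulled Qwen 3.5 model through Ollama's Python client.
# Assumes `ollama pull qwen3.5:35b` (the tag quoted in the post) has completed.
import ollama

response = ollama.chat(
    model="qwen3.5:35b",
    messages=[{"role": "user", "content": "Summarize the Qwen 3.5 release in two sentences."}],
)
print(response["message"]["content"])
```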
Qwen3.5-35B-A3B is now in Jan 🔥
https://x.com/Alibaba_Qwen/status/2026660582221558190
Qwen3.5-35B-A3B is now live in LM Studio 🚀
https://x.com/Alibaba_Qwen/status/2026496880285462962
Taken at face value, this is… somewhat catastrophic for MoEs, as @YouJiacheng notes. By rights, a 397B-A17B ought to have a higher "power level" than a dense 27B. Also a big W for Qwen’s integrity and HLE eval quality, I guess. 397B is certainly better at memorization.
https://x.com/teortaxesTex/status/2026690994029072512
The conclusion should not be about MoE vs dense, but that you can "benchmaxx" (not always a bad thing btw) HLE with tools no matter the model size. The difference between Qwen3.5-35B-A3B and Qwen3.5-397B-A17B is only 1 point.
https://x.com/eliebakouch/status/2026727151978840105
The new Qwen3.5 Medium models are ready to run 🔥 GGUF support is here! Big thanks to @UnslothAI for making it happen so quickly 🚀
https://x.com/Alibaba_Qwen/status/2026497723944546395
The Qwen3.5 series maintains near-lossless accuracy under 4-bit weight and KV cache quantization. In terms of long-context efficiency: Qwen3.5-27B supports 800K+ context length; Qwen3.5-35B-A3B exceeds 1M context on consumer-grade GPUs with 32GB VRAM; Qwen3.5-122B-A10B supports
https://x.com/Alibaba_Qwen/status/2026502059479179602
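For the 4-bit weight quantization claim, one generic way to try it locally is on-the-fly NF4 loading with transformers + bitsandbytes. This is a sketch of that generic recipe, not the Qwen team's own setup, and the repo id below is an assumption:

```python
# Sketch: load a Qwen3.5 checkpoint with 4-bit weight quantization (bitsandbytes NF4).
# Generic recipe, not the official Qwen configuration; the repo id is assumed for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3.5-27B"  # assumed name; substitute the real checkpoint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Long-context efficiency matters because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```

KV-cache quantization, the other half of the claim, is a separate switch in inference stacks and is not covered by this sketch.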
Why do benchmarks like Peter’s "Bullshit Benchmark" or my ShizoBench matter so much, and what do strawberries have to do with it? I was very skeptical of the performance of Qwen3.5-27B on the ArtificialAnalysis leaderboard. So I’m testing the model myself a bit. Naturally I tried the
https://x.com/scaling01/status/2027110908775002312
Qwen3.5-397B-A17B is currently the #1 trending model on Hugging Face. 🏆 This flagship open-weight model is designed for high-performance inference and complex reasoning. 🚀 Try it now on Hugging Face: https://x.com/Ali_TongyiLab/status/2026211680653611174
Not a preplanned motion sequence. A robot deciding mid-jump what to do next. [📍 paper + demo] Researchers just showed a humanoid doing real parkour using only onboard perception. No motion script, no fixed obstacle layout. The system is called Perceptive Humanoid Parkour
https://x.com/IlirAliu_/status/2024560405335495052