Image created with gemini-3.1-flash-image-preview; prompt written with claude-sonnet-4-5. Image prompt: Muted documentary photograph of a small concrete translation booth beside a gray industrial river in contemporary China, interior shows worker facing multiple old monitors displaying text, images, and audio waveforms, a red-brown horse stands in shallow water visible through back window, overcast flat daylight, desaturated teal and concrete-gray palette, observational realism, large white text overlay reading MULTIMODALITY in upper frame, Jia Zhangke cinematic composition, decelerated stillness, postindustrial riverbank setting.
AI pioneer Fei-Fei Li’s World Labs raises $1 billion in funding https://finance.yahoo.com/news/ai-pioneer-fei-fei-lis-192214332.html
Gemini 3.1 Pro WebDev Arena results: 6th place, behind Opus 4.5/4.6 and GPT-5.2-high. https://x.com/scaling01/status/2024522048312054142
Multimodal function calling is now available in the Gemini Interactions API: build agents that can see and process images natively. 🖼️ Tools return actual images, not text descriptions 👁️ Gemini 3 natively processes returned images 🛠️ Function results support mixed text and… https://x.com/_philschmid/status/2022349886318928158
Update regarding Gemini 3.1 Pro: Ranked #1 among all Gemini models released to date. Ranked #1 among all models I have tested so far (GPT-5.2 high 165.9 vs Gemini 3.1 Pro 166.6). However, please note that my testing has limitations due to budget constraints: I have not… https://x.com/Hangsiin/status/2024605310913216614
Introducing Lyria 3, our latest and most advanced music model, available in the Gemini App starting today : ) Go from idea, image, or video to music in seconds! https://x.com/OfficialLoganK/status/2024153948488118513
Meet Lyria 3, our latest music generation model from @GoogleDeepMind. 🎶 Now, you can create custom music tracks in the @GeminiApp — just by describing an idea or uploading an image or video. https://x.com/Google/status/2024154379838705920
We just launched Lyria 3! Our most advanced AI music model in the @GeminiApp 🎵 – Generates 30-second tracks from text or image prompts. – Supports custom lyrics, vocals, and cover art. – Supports 8 languages including English, Japanese, and Korean. – All outputs watermarked with… https://x.com/_philschmid/status/2024154542061805988
Use Lyria 3 to create music tracks in the Gemini app https://blog.google/innovation-and-ai/products/gemini-app/lyria-3/
🎉 Congrats to @Alibaba_Qwen on releasing Qwen3.5 on Chinese New Year’s Eve — day-0 support is ready in vLLM! Qwen3.5 is a multimodal MoE with Gated Delta Networks architecture — 397B total params, only 17B active. What makes it interesting for inference: 🧠 Gated Delta… https://x.com/vllm_project/status/2023341059343061138
🔥 Alibaba’s Qwen 3.5 just dropped — and Zhihu is dissecting it. Here’s a sharp breakdown from Zhihu contributor toyama nao 👇 🏆 Verdict: “The spearhead of the open-source elite.” 📊 Big picture: Tongyi Lab’s pattern is that each new mid-size model leapfrogs the old giant. • Last cycle: 80B… https://x.com/ZhihuFrontier/status/2024176484232155236
Qwen https://qwen.ai/blog?id=qwen3.5
Qwen3.5 is live! Today we open-weight the first model, Qwen3.5-397B-A17B, which is a native multimodal model supporting both thinking and non-thinking modes. We have strengthened its coding and agentic capabilities to foster productivity for developers and enterprises. Hope you… https://x.com/JustinLin610/status/2023332446713070039
Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap | AINews https://news.smol.ai/issues/25-09-23-alibaba-yunqi
Alibaba’s new Qwen3.5-397B-A17B is the #3 open weights model in the Artificial Analysis Intelligence Index – a significant upgrade from Qwen3-235B-A22B-2507, achieved with fewer active parameters than leading peers. Qwen3.5-397B-A17B is the first model released by Alibaba… https://x.com/ArtificialAnlys/status/2023794497055060262
Qwen https://qwen.ai/blog?id=qwen3.5#spatial-intelligence
Qwen3.5’s thinking is downright excessive. https://x.com/QuixiAI/status/2023995215690781143
Pixels are all you need! Just kidding 🙂 Whether or not explicit 3D representations survive the bitter lesson, one thing is pretty clear — vision & robotics, perceiving & acting are on a glorious collision course. https://x.com/bilawalsidhu/status/2023902733632208938
📊 Let’s dive deeper into Gemini 3.1 Pro gains. It ranks 13 points above Gemini 3 Pro overall. We see the largest rank gains for @GoogleDeepMind’s latest model in the following categories: Text: ▪️Coding (+5) ▪️Math (+4) ▪️Expert (+3) ▪️Instruction Following (+3) ▪️Multi-Turn… https://x.com/arena/status/2024588456463389040
Check out the skills for the Gemini API! More soon! https://x.com/osanseviero/status/2022259577232785866
Context Arena Update: Added @Google’s Gemini 3.1 Pro Preview to the MRCR leaderboards (2-, 4-, 8-needle)! Meant to send this out earlier today. Thanks to @GoogleDeepMind and others over there for early access! Thinking budget barely matters on simpler retrieval – 2-needle AUC… https://x.com/DillonUzar/status/2024655613293215855
Gemini 3.1 Pro has landed! Amazing performance and capabilities across the board. Beyond SOTA, the best part is all the things that evals can’t measure. E.g. SVG generation has gotten so much better (see 🧵) https://x.com/OriolVinyalsML/status/2024519605570720185
Gemini 3.1 Pro in 1st place on the Artificial Analysis Leaderboard https://x.com/scaling01/status/2024517196727099847
Gemini 3.1 Pro is rolling out now in the @GeminiApp, and exclusively to Google AI Pro and Ultra users in @NotebookLM. Developers can access it in preview via the API in @GoogleAIStudio. Find out more → https://x.com/GoogleDeepMind/status/2024516471720743295
Gemini 3.1 Pro’s GDPval scores are concerning https://x.com/scaling01/status/2024515061163704336
Gemini Deep Think 3 is the world’s most capable model by many measures, with huge amounts of progress on reasoning benchmarks and more. Available right now via the Gemini App for Ultra subscribers, and in the API soon : ) https://x.com/OfficialLoganK/status/2021996626144080015
Good news: Google AI Studio and the Gemini API are now live in Moldova, Andorra, San Marino, and Vatican City! 🌍 https://x.com/OfficialLoganK/status/2022688445957820610
Google is back on the intelligence-cost frontier with Gemini 3.1 Pro https://x.com/scaling01/status/2024519007018373202
Google tests NotebookLM integration for Opal workflows https://www.testingcatalog.com/google-test-notebooklm-integration-for-opal-workflows/
I would expect only a few models to make progress with this rather simple harness: GPT-5.2-xhigh, Opus 4.5/4.6, and Gemini 3.1 Pro. Other models will have a very hard time. https://x.com/scaling01/status/2024661145286557872
Last week we upgraded Gemini 3 Deep Think. Today, we’re shipping the core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro. A noticeably smarter, more capable baseline for your hardest challenges. Available now: https://x.com/NoamShazeer/status/2024519946764734574
Multimodal Function Calling with Gemini 3 and Interactions API https://www.philschmid.de/interactions-multimodal-fc
My vibe is unchanged: Gemini 3.1 is a previous-gen model. It naively lives in a context-universe engineered by the God-User. Opus is a friend-type AI. It sits with you in a KFC. 5.2 sees a vast expanse of thought. Below there’s a given context. A user makes some noise, perhaps. https://x.com/teortaxesTex/status/2024574416747671556
Saw the Gemini 3.1 announcement, got super excited. Tried Google Antigravity… not available. Tried Gemini CLI… not available. Tried Gemini Code Assist… not available. @OfficialLoganK put AI Studio in an Electron shell and just launch it. You will deliver these faster. https://x.com/matvelloso/status/2024548414198091922
Today we’re releasing a preview of Gemini 3.1 Pro and making it available to our users and developers. Very excited to bring the upgraded core we used in Deep Think to everyone. Learn more about Gemini 3.1 Pro: https://x.com/koraykv/status/2024517699595124902
We just made paying for the Gemini API 10x easier : ) You can now upgrade to a paid Gemini API account without leaving AI Studio, track your usage, filter spend by model, and much more to come! https://x.com/OfficialLoganK/status/2022409335465480346
We made a skill for the Gemini API! https://x.com/OfficialLoganK/status/2022123808296251451
Here are some useful prompting tips to get the most out of our new music generation model in Gemini, Lyria 3 ↓ https://x.com/GeminiApp/status/2024167107538407783
Introducing Lyria 3, our new music generation model in Gemini that lets you turn any idea, photo, or video into a high-fidelity track with custom lyrics. From funny jingles to lo-fi beats, you can create custom 30-second soundtracks for any moment. See how it works. 🧵 https://x.com/GeminiApp/status/2024152863967240529
Nice standalone Swift package for real-time streaming transcription with Mistral’s Voxtral Mini 4B in MLX Swift: https://x.com/awnihannun/status/2022322714548338962
For all my lifters: a computer vision app to measure back curvature during deadlifts! Main technical highlights: — RF-DETR (Roboflow) to segment the person (great performance out of the box with no additional training!) — YOLO11n (Ultralytics) for bounding box prediction around the… https://x.com/IlirAliu_/status/2023482861815570738
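The curvature measurement described above can be sketched with plain geometry once a pose model has produced spine landmarks. This is a minimal illustration, not the app’s actual code: the three landmarks (shoulder, mid-back, hip) and their coordinates are assumed for the example.

```python
import math

def back_curvature_deg(shoulder, mid, hip) -> float:
    """Angle at the mid-back landmark in degrees; 180° = perfectly flat spine,
    smaller angles = more rounding. Points are (x, y) image coordinates."""
    v1 = (shoulder[0] - mid[0], shoulder[1] - mid[1])
    v2 = (hip[0] - mid[0], hip[1] - mid[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

flat = back_curvature_deg((0, 0), (0, 50), (0, 100))      # collinear points
rounded = back_curvature_deg((0, 0), (20, 50), (0, 100))  # bowed mid-back
print(round(flat, 1))     # → 180.0
print(round(rounded, 1))  # → 136.4
```

In a real pipeline you would feed this from the pose model’s keypoints each frame and flag reps where the angle dips below a chosen threshold.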
1/ People often think better multilingual models must come at the cost of English performance. Not true. The constraint isn’t capacity, it’s data quality, and we can fix it. Today @datologyAI shares ÜberWeb: a year of multilingual curation lessons, scaled to 20T+ tokens. https://x.com/RicardoMonti9/status/2024136992779559055
Cohere Labs just released the best multilingual low-resource language model. It runs on a phone, covers 70+ languages, and excels at languages underrepresented on the internet, like Zulu, Javanese, and Yoruba. https://x.com/nickfrosst/status/2023756803717427467
A fully open-source mocap system that works with cheap webcams: the FreeMoCap Project. A free-and-open-source, hardware-and-software-agnostic, minimal-cost, research-grade motion capture system and platform for decentralized scientific research, education, and training: https://x.com/IlirAliu_/status/2024198014617702738
🚀 Qwen3.5-397B-A17B is here: the first open-weight model in the Qwen3.5 series. 🖼️ Native multimodal. Trained for real-world agents. ✨ Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling. ⚡ 8.6x–19.0x decoding throughput vs Qwen3-Max 🌍 201… https://x.com/Alibaba_Qwen/status/2023331062433153103
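A quick back-of-envelope on why the A17B (active parameters) number matters: a sparse MoE routes each token through only a fraction of its weights, so decode compute tracks active rather than total parameters. The ~2 FLOPs-per-parameter-per-token estimate below is a standard rule of thumb, not a published Qwen figure.

```python
# "397B total, 17B active": only the routed experts run per token.
total_b, active_b = 397, 17

active_pct = active_b / total_b * 100
decode_gflops = 2 * active_b  # rule of thumb: ~2 FLOPs per active param per token

print(round(active_pct, 1))  # → 4.3  (% of weights active per token)
print(decode_gflops)         # → 34   (approx. GFLOPs per decoded token)
```

The same estimate for a dense 397B model would be ~794 GFLOPs per token, which is where the large throughput gains over dense peers come from.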
Happy Chinese New Year!! What a week for open-source LLMs: > Qwen-3.5 > GLM-5 > MiniMax-M2.5 Are we just waiting on DeepSeek-V4 now? Also, I’m hoping a US lab steps up with a true frontier open-source model. https://x.com/Yuchenj_UW/status/2023453819938763092
Qwen 3.5 goes bankrupt on Vending-Bench 2 https://x.com/andonlabs/status/2023450768406364238
A new repo full of MLX-LM-LoRA examples to train your own LLM for Apple Silicon, fast and efficient on ultra-long context lengths: Fine-tune Qwen3 4B Instruct on 32K context: https://t.co/yGZlR59fHD Train @IBMResearch Granite 350M model on RL-GRPO Reasoning: https://x.com/ActuallyIsaak/status/2022414004623479014
🚀 Qwen3.5-397B-A17B-FP8 weights are now open! It took some time to adapt the inference frameworks, but here we are: ✅ SGLang support is merged 🔄 vLLM PR submitted → https://t.co/rJkuitOBWs Check the model card for example code. vLLM support landing in the next couple of days! https://x.com/Alibaba_Qwen/status/2024161147537232110
🚩 Cerebras’s MiniMax-M2 GGUF 2-bit model: https://t.co/udlviJQZqQ Qwen3-Coder-Next INT4 model: https://x.com/HaihaoShen/status/2022293472796180676
A clarification of Qwen3.5 Plus and 397B: 1. For open source, we follow the tradition of making parameters apparent, so we use names with the total and active parameter counts. 2. Qwen3.5-Plus is a hosted API version of 397B. As the model natively supports 256K tokens,… https://x.com/JustinLin610/status/2023340126479569140
It’s Qwen 3.5 day today! 🥳 State-of-the-art 800 GB model. Runs _locally_ with MLX using Q4, taking 225 GB of RAM. https://x.com/pcuenq/status/2023369902011121869
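The two memory numbers in that post are easy to sanity-check: 397B parameters at BF16 (2 bytes each) is ~794 GB (the "800 GB model"), and at ~4.5 effective bits per weight it lands near the 225 GB reported. The 4.5-bit figure assumes MLX-style group quantization (4-bit weights plus a 16-bit scale and bias per group of 64); treat that overhead model as an assumption, not a spec.

```python
def quantized_gb(params_b: float, bits: int = 4, group: int = 64) -> float:
    """Approximate weight memory in GB for group-quantized weights:
    `bits` per weight plus a 16-bit scale and 16-bit bias per `group`."""
    bits_per_weight = bits + 2 * 16 / group  # e.g. 4 + 0.5 = 4.5 bits
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(round(397 * 2, 1))            # → 794.0  (BF16 GB, the "800 GB model")
print(round(quantized_gb(397), 1))  # → 223.3  (Q4 GB, near the 225 GB reported)
```

The small remaining gap to 225 GB is plausibly KV cache and runtime buffers.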
Let’s do the KV cache math for Qwen3.5: – KV heads: 2 – Head dimension: 256 – Gated attention layers: 15 – Bytes per element (BF16): 2. 2 × 256 × 15 × 2 = 15,360 bytes. This is the same for K and V, so we multiply by 2: 30,720 bytes, roughly 31 KB per token of context. Meaning at max… https://x.com/bnjmn_marie/status/2023424404504342608
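That arithmetic as a small script, using only the numbers quoted in the post (plus the 256K native context mentioned elsewhere in this issue). The linear-attention layers keep constant-size state, which is why only the 15 full-attention layers count:

```python
def kv_bytes_per_token(kv_heads: int, head_dim: int,
                       attn_layers: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache one context token occupies (K and V combined)."""
    per_tensor = kv_heads * head_dim * attn_layers * bytes_per_elem
    return 2 * per_tensor  # once for K, once for V

per_token = kv_bytes_per_token(kv_heads=2, head_dim=256,
                               attn_layers=15, bytes_per_elem=2)
print(per_token)                          # → 30720  (~31 KB per token)
print(per_token * 262_144 / 2**30)        # → 7.5    (GiB at the full 256K context)
```

A 7.5 GiB cache for a quarter-million tokens of context is tiny by dense-attention standards, which is the point of the hybrid design.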
ollama run qwen3.5:cloud — Qwen3.5-397B-A17B is the first open-weight model in the series. It’s available on Ollama’s cloud right now! Give it a try. Let’s go! 🚀🚀🚀 https://x.com/ollama/status/2023334181804069099
Qwen 3.5 Plus is now available on AI Gateway. Thanks, @vercel_dev team. 🤝 Use model: ‘alibaba/qwen3.5-plus’ Try it now! https://x.com/Alibaba_Qwen/status/2024029499541909920
Qwen3.5 runs quite well in mlx-lm. Awesome that we have a frontier-level hybrid model. The context gets longer but the inference speed and memory use barely change. Here’s the Q4 generating a Space Invaders game on an M3 Ultra. Generated 4,120 tokens at 37.6 tok/s. https://x.com/awnihannun/status/2023462412092059679
So, speaking of benchmarks, what can be said of the new open Qwen? First, it completely destroys Qwen3-VL-235B of course, but more surprisingly it outscores Qwen3-Max-thinking. All the while it’s the same model as “Plus”. Plus just has 1M context and some more bells and whistles. https://x.com/teortaxesTex/status/2023331885402009779
The new chonky Qwen 3.5 looks pretty solid, beating their own Qwen3-Max model everywhere, and it is much better at vision benchmarks than Qwen3-235B-A22B-VL. What I sadly haven’t seen is anything on reasoning efficiency. https://x.com/scaling01/status/2023343368399704506
Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched | AINews https://news.smol.ai/issues/25-09-05-1t-models
🎙️ Grace Brown is the founder and CEO of Andromeda, building social companion robots designed for aged care and healthcare environments. In this episode, Grace shares a founder journey that started long before a company existed. She had been building robots since her teenage… https://x.com/IlirAliu_/status/2022023650908545066
🤖 @UnitreeRobotics robots performed martial arts at China’s Spring Festival Gala — sparking major debate on Zhihu. From a technical lens, Zhihu contributor 也说 analyzed what actually matters 👇 1️⃣ Sync with autonomy: In the opening backflip, the robots jumped together — but landed… https://x.com/ZhihuFrontier/status/2023794225616502932
A swarm of 48 Unitree G1s https://x.com/TheHumanoidHub/status/2024210654232719825
Bioinspired robot: fly – roll – walk – crawl. The Multi-Modal Mobility Morphobot (M4), a revolutionary robot inspired by nature’s most adaptable creatures. > Capable of multiple forms of movement: flying, rolling, crawling, and more. > Features adaptive appendages that function as… https://x.com/IlirAliu_/status/2023672009683501443
China 🇨🇳 has released a video of shooting drills involving Terminator-like robots. The line between reality & fiction is becoming increasingly blurred. Via @Sizhe_bitcat. https://x.com/DefenseTrends/status/2023963202711658522
Don’t sleep on the exponential trajectory of humanoid robots! https://x.com/TheHumanoidHub/status/2023433561056141514
If you’re not certain: *this* is AI slop, for once. How to tell: it’s not from Unitree or any credible account, this model of Unitree doesn’t exist, these hands don’t exist, what’s going on with the… ah, forget it. It’s all slop capitalizing on the Gala. https://x.com/teortaxesTex/status/2024001310865924599
New episode: @benedektasi is leading Allonic, a Hungarian robotics startup leveraging textile technology to create humanoid hands that mimic human anatomy. We discuss their 3D braiding process, engineering complexities, and Benedek’s vision for the future of robotics. 0:52 https://x.com/TheHumanoidHub/status/2023826942232064274
Perceptive Humanoid Parkour (PHP) introduces a modular framework that enables the Unitree G1 humanoid to perform long-horizon, vision-based parkour. – It chains retargeted human motion clips into diverse, long-horizon kinematic reference trajectories. – RL expert policies learn… https://x.com/TheHumanoidHub/status/2023902198997151799
Physical AI won’t just be limited to controlling robots and spatial computing; it will usher in a new era of machine design. You will be able to “vibe design” a robot – or a part of a robot – or a machine that manufactures robot parts – using high-level specifications such as… https://x.com/TheHumanoidHub/status/2022767369832272286
Really worth a read! As someone who has worked on 3D vision for almost 10 years, Vincent’s blog post speaks out so vividly what I have been thinking for a long time. We should always ask what the real problems are. Thus, I believe CV will be obsolete, not just for robot learning. https://x.com/songyoupeng/status/2023570268426563870
Robots building robots… • Unitree dropped a video of their G1 humanoid assembling robot legs in their own Shenzhen factory using the UnifoLM embodied AI model. Yes, robots building robots at actual production speed. • The tasks are real bimanual manipulation, picking parts and… https://x.com/IlirAliu_/status/2024045245248295365
Robots helping other robots so none of them breaks. Instead of every robot needing its own battery, sensors, and connection, they share with their neighbors. If one robot “goes blind” or runs out of power, the others support it and the whole group keeps moving. More robots →… https://x.com/IlirAliu_/status/2022958102904029684
Same task, same scene. Different robot. Everything breaks… LAP trains the model to express motion in plain language, for example “move forward 3 cm” or “tilt 15 degrees”, instead of predicting joint torques or tokens. Because language carries structure and meaning, the policy… https://x.com/IlirAliu_/status/2024136907492380750
Shipbuilding normally needs shipyards. This boat came out of a microfactory, printed by robots. A full-scale vessel made inside an AI-driven facility, not a dockyard. Forget molds, heavy tooling, and offshore fabrication: just a digital file → robotic print → finished… https://x.com/IlirAliu_/status/2023110856892899752
The speed at which (especially Chinese) robots are developing is absolutely insane. https://x.com/kimmonismus/status/2023388741595799687
The supersonic tsunami of AI and robotics is upon us. https://x.com/TheHumanoidHub/status/2023534428245684417
TRON 2 by LimX Dynamics features a clever design that can take three different forms: bipedal, wheeled, or a dual-arm bot. Two modular units can be combined to make a humanoid or a quadruped. Three units can be combined into a centaur. https://x.com/TheHumanoidHub/status/2023281142619906215
Unitree humanoid robots rocked the Spring Festival Gala in Beijing today with a dynamic routine of parkour, Drunken Fist, and nunchaku to celebrate Chinese New Year! https://x.com/TheHumanoidHub/status/2023428892934160775
We’ve moved from the stage of “everything in China is a Potemkin village” to “oh fuck, ugh, maybe a few outlier companies are the real deal but the rest is fake”. I regret to inform you, it’s entire sectors of the economy where very little is fake. They’re just that good at robotics. https://x.com/teortaxesTex/status/2023518524451549598
What stands out most is that it’s happening live on stage with multiple robots; not a cherry-picked demo, no tricky edits. Impressive robustness in whole-body coordination. https://x.com/TheHumanoidHub/status/2023438703990055125
While Unitree sells a lot of dumb humanoid robots to developers and researchers, they’re also working on their own AI models. Robots making robots: Unitree’s UnifoLM-X1-0 AI model in action, performing assembly tasks in Unitree’s own factory. https://x.com/TheHumanoidHub/status/2022413189191995700
A whole-body control framework that reduces end-to-end teleoperation latency to 50 ms. [📍 it’s open source] Teleoperating a humanoid usually feels like driving through lag; this brings it close to real time. A new control framework called ExtremControl cuts latency to about 50… https://x.com/IlirAliu_/status/2023406890005254451
X-Humanoid (Beijing Innovation Center of Humanoid Robotics) has officially launched Embodied Tien Kung 3.0, a general-purpose humanoid platform designed to lower technical barriers for developers and researchers. – The company claims it is the first full-size humanoid to… https://x.com/TheHumanoidHub/status/2024080494368006345
“Will the robotics industry be generating trillions of dollars of revenue?” “YES.” Dario Amodei says breakthroughs in robotics could emerge in several ways, such as through continual learning or generalization. Once achieved, these models will revolutionize both robot… https://x.com/TheHumanoidHub/status/2022409229223780533
Figure’s 7th-gen humanoid hand has higher degrees of freedom than the previous gen. Features include spreading the fingers apart (abduction) and bringing them together (adduction). The rotating thumb can touch the tips of each finger, an important range of motion for pinching… https://x.com/TheHumanoidHub/status/2022379295453409771
I’ve been waiting 3 years to show you this. We just launched our 3rd-gen humanoid, but we’re already on our 7th-gen hand. Our team has quietly worked for years to approach parity with a human hand. Excited to share a sneak peek of some of the best engineering I’ve ever seen. https://x.com/adcock_brett/status/2022353637964751221
An open-source bipedal robotic system. [📍 GitHub repo] It’s a complete leg design with 6 DOF per leg, RSU ankle architecture, and passive toe joints. Built with off-the-shelf components and compatible with MJF 3D printing. What they’re open-sourcing and sharing: – Full mechanical… https://x.com/IlirAliu_/status/2022233309716123878
Multilingual data finally moves from “data collection” to “data curation”. ÜberWeb breaches the compute-performance frontier for multilingual data. We spell out all our learnings from this year-long effort at training multilingual models at the 20-trillion-token data scale. https://x.com/pratyushmaini/status/2024157352862376280
Re-OCR’d the complete 1771 Encyclopaedia Britannica (2,724 pages) with a single command on @huggingface Jobs. – 0.9B model (GLM-OCR) – ~$0.002/page – ~$5 total on an L4 GPU. Before (old Tesseract OCR) → after: https://x.com/vanstriendaniel/status/2024445900102258846
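The cost claim above checks out with simple arithmetic: 2,724 pages at ~$0.002/page lands right on the "~$5 total" figure.

```python
pages = 2_724            # full 1771 Encyclopaedia Britannica
cost_per_page = 0.002    # ~$0.002/page with the 0.9B GLM-OCR model on an L4
print(f"${pages * cost_per_page:.2f}")  # → $5.45
```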




