Multimodal: AI News Week Ending 05/01/2026

Image created with gemini-3.1-flash-image-preview with claude-opus-4.7. Image prompt: High-end product photograph of a classic banana split boat holding four distinct soft-serve swirls — vanilla, chocolate, strawberry, and blue cotton-candy — each topped with a different garnish (crushed waffle cone, rainbow sprinkles, a chocolate musical note, a cinnamon stick), sauces softly merging in the center, the paper liner printed with bold hero typography reading ‘MULTIMODALITY’ in vintage Dairy Queen lettering. Soft directional studio light, shallow depth of field, glossy macro detail, a small napkin beneath stamped ‘Est. 1951 — Milford, DE’, landscape composition, nostalgic 1950s Americana counter aesthetic.

DeepSeek-V4 Pricing gives you glimpses into the future Imagine in one year using a Mythos level model that can basically code everything for $4/million tokens
https://x.com/scaling01/status/2047707820552831028

You can now run DeepSeek4-Flash on 256GB Mac. Next up speed 🚀 PR:
https://x.com/Prince_Canuma/status/2047685898163147125

.@deepseek_ai v4 Pro’s checkpoint is both in FP4 and FP8, depending on the layer. This means that the entire model can fit on a single NVIDIA 8xB200 node without trouble. @vllm_project: “”Checkpoint is FP4+FP8 mixed: MoE expert weights are stored in FP4 while the remaining
https://x.com/LambdaAPI/status/2047654086263320965

Thoughts after reading the DeepSeek V4 paper: – NVIDIA really is something else. Remember how back in 2024 people were bashing Blackwell as overspec’d and dismissing FP4 as just marketing? Turns out it was all groundwork for the next generation of models. Maybe NVIDIA’s moat is
https://x.com/jukan05/status/2047861732702662741

Gemini now can create documents, and it is a nice start, but not up to the frontier yet, as you can see from my “”LBO of Hogwarts”” test. PowerPoints are substantially worse than NotebookLM, spreadsheets are primitive, still no thinking trace, it doesn’t think hard enough, either.
https://x.com/emollick/status/2049605470546022826

You can now ask Gemini to create Docs, Sheets, Slides, PDFs, and more directly in your chat. No more copying, pasting, or reformatting, just prompt and download. Available globally for all @GeminiApp users.
https://x.com/sundarpichai/status/2049519281600373159

You can now generate a variety of downloadable files, including PDFs, @GoogleWorkspace files, Microsoft Word & Excel, and more directly in your chats with Gemini. Tell Gemini what content to create and the file format you want when you prompt without having to upload a template.
https://x.com/GeminiApp/status/2049519416698683514

A completely local agent that lives right inside your browser. Powered by Gemma 4 E2B and WebGPU, it uses native tool calling to: 🔍 Search browsing history 📄 Read and summarize pages 🔗 Manage tabs 100% local. No servers needed!
https://x.com/googlegemma/status/2048805789788413984

Here is how to run a coding agent fully locally on your machine with @googlegemma and Pi. – Gemma 4 26B A4B activates 4B parameters per token. – Pi provides four tools: read, write, edit, and bash. – LM Studio runs a server at localhost:1234 by default. – Pi runs YOLO by
https://x.com/_philschmid/status/2048719354905108623

Learn how to run a local coding agent! Use: – Pi agent – Gemma 4 26B – Serving engine of choice: e.g. LM Studio
https://x.com/googlegemma/status/2049163687639007451

@NVIDIA Nemotron 3 Nano Omni is now on Together AI. Enterprise multimodal AI — video, audio, image, documents & text — optimized for speed and scale. ✅ ~3B active params, 9x higher throughput ✅ Fully managed, zero infra headache ✅ Secure, zero-trust architecture Build
https://x.com/togethercompute/status/2049160446708711883

Excited to support @NVIDIA Nemotron 3 Nano Omni, now available on Fireworks. It’s the first open model that handles vision, audio, video, and text in a single inference loop. Built for multimodal sub-agents at scale, with 9× higher throughput than Qwen3 30B. 256K context. Now
https://x.com/FireworksAI_HQ/status/2049159136802398546

Introducing @NVIDIA Nemotron 3 Nano Omni. NVIDIA Nemotron 3 Nano Omni is an open multimodal foundation model that unifies audio, images, text, and video into a single context window. It powers subagents for use cases like computer-use agent, document intelligence, and video and
https://x.com/baseten/status/2049160818575749300

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents
https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇
https://x.com/NVIDIAAI/status/2049159441870717428

NVIDIA Nemotron 3 Nano Omni is now live on fal, available at launch. A single model for multimodal agents: 🔁 text, image, video, audio in one loop 🧠 1 context reasoning across complex workflows ⚡️ ~9× higher throughput with fewer inference hops Built for real-world agent
https://x.com/fal/status/2049160999442198632

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning.
https://x.com/OpenRouter/status/2049164366218772526

NVIDIA releases Nemotron-3-Nano-Omni, a new 30B open multimodal MoE model. Nemotron-3-Nano-Omni-30B-A3B is the strongest omni model for its size and supports audio, video, image and text. Run on ~25GB RAM. GGUF:
https://t.co/t4COCqVrLS Guide:
https://x.com/UnslothAI/status/2049161390150365344

🎉 Day-0 vLLM support for the MiMo-V2.5 series! Congrats to @XiaomiMiMo on the open-source release of the MiMo-V2.5 and MiMo-V2.5-Pro. Highlights from the flagship MiMo-V2.5-Pro, an agent-oriented model focused on long-horizon tool use and frontier coding: – Long-horizon task
https://x.com/vllm_project/status/2048825703244972375

Just dropped two open-source models: MiMo-V2.5-Pro (Code Agent, 1T total) and MiMo-V2.5 (Multimodal Agent, 310B total). Oh and one more thing — we’re giving devs & creators 100T tokens on us. Go build something cool 🛠️ 🎁 100T Free Token Grant for Builders
https://x.com/_LuoFuli/status/2048851054662762618

MiMo-V2.5-Pro | Xiaomi
https://mimo.xiaomi.com/mimo-v2-5-pro

Xiaomi MiMo-V2.5 is now officially open-sourced！ MIT License, supporting commercial deployment, continued training, and fine-tuning – no additional authorization required. Two models, both supporting a 1M-token context window : • MiMo-V2.5-Pro: built for complex agent and
https://x.com/XiaomiMiMo/status/2048821516079661561

Xiaomi MiMo-V2.5 Series: Pushing Open-Source Agents Forward 🔸 MiMo-V2.5-Pro, our strongest model yet. A major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks, now matching frontier models like Claude Opus 4.6 and
https://x.com/XiaomiMiMo/status/2046988157888209365?s=20

@NousResearch absolutely crushing the 0-day support! Deepseek-v4-pro is live in the Nous Portal 😍 If you want a real personal agent/assistant/quant/researcher/artist/coworker, Hermes Agent continues to deliver!
https://x.com/mr_r0b0t/status/2047673600900010044

🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis. On @digitalocean’s Serverless Inference, powered by vLLM on NVIDIA HGX B300: 🥇 AA #1 output speed for DeepSeek V3.2 (230 tok/s, 0.96s TTFT) and Qwen 3.5 397B 🔧 MiniMax-M2.5: 23% TPOT gain
https://x.com/vllm_project/status/2049503979898274163

📊 Day 0 performance is here: DeepSeek-V4-Pro running on NVIDIA Blackwell Ultra. Using @vllm_project’s Day 0 recipe, we’ve captured the initial performance Pareto for DeepSeek’s flagship 1M long-context model. This curve highlights the baseline for balancing AI factory
https://x.com/NVIDIAAI/status/2047823093578518758

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params.
https://x.com/deepseek_ai/status/2047516922263285776?s=20

🚨 DeepSeek V4 Pro just dropped 75% OFF API pricing + permanent cache price cut to 1/10! 🔥 4/26 Update: Cache permanently 90% cheaper Offer ends May 5, 2026 Insights from Zhihu contributor 普杰 💡 Key Insights: • DeepSeek doesn’t do loss-leader promotions → ¥3 in / ¥6 out ≥
https://x.com/ZhihuFrontier/status/2049027925920637077

8x VLLM CUDA MOAT ALERT: InferenceX has added @deepseek_ai V4 Pro for @vllm_project for day 3 performance across B200, B300, H200, GB200 disagg. We are seeing that B300 is up to 8x faster than H200. The team is working on benchmarking vLLM 0.20 which has the new DeepGEMM MegaMoE
https://x.com/SemiAnalysis_/status/2048957715955765284

Also, deepseek v4 is available as well
https://x.com/Teknium/status/2047798102091067677

And now a new DeepSeek model, and appears to be fully open weights. Good benchmarks, but with open models, that isn’t always as meaningful. Should be live soon to actually try.
https://x.com/emollick/status/2047516272062058890

Another reason I’m watching Delton closely is that the company works closely with Huawei. As DeepSeek’s comments suggest, Huawei’s 950 is expected to enter heavy mass production starting in the second half of this year, right?
https://x.com/jukan05/status/2047823601462812932

Anyone got DeepSeek-V4-Flash running on a Mac yet? 512GB or 256GB or 128GB or smaller?
https://x.com/simonw/status/2047844236142497850

Compressed Sparse Attention. A Faithful Implementation of CSA from the DeepSeek-V4 paper.
https://x.com/arjunkocher/status/2049066844925936041

DeepSeek cuts V4-Pro prices by 75%
https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent

DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index @deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture
https://x.com/ArtificialAnlys/status/2047735160544841953

DeepSeek removed it’s “Thinking with Visual Primitives” repo. here a paper link if anyone needs to read it.
https://x.com/arjunkocher/status/2049875566678118898

DeepSeek said Pro pricing could fall sharply once Huawei Ascend 950 supernodes are deployed at scale in the second half of the year””
https://x.com/scaling01/status/2047760776769720360

DeepSeek staff has deleted the repo and all mentions of the vision paper. What the hell happened? People who got Vision enabled on web: do you still have it?
https://x.com/teortaxesTex/status/2049880056420298995

DeepSeek themselves estimate the gap to be 3-6 months I think it’s on the higher end of that range
https://x.com/scaling01/status/2047626000091971811

DeepSeek trains vision capabilities into their v4 Flash model by having the model directly output bounding boxes and point coordinates of an image during reasoning. This is DeepSeek’s Computer Use Agent.
https://x.com/nrehiew_/status/2049840778491662623

DeepSeek v4 earmarks the next era of open weight models and is one of the landmark papers for open weight model training. Thread and notes below 🙂
https://x.com/nrehiew_/status/2047665987730993363

DeepSeek V4 just launched on Huawei hardware, and the numbers tell a story the headlines are hiding. • Huawei’s Ascend 910C delivers roughly 60% of the inference power of an Nvidia H100. • Production is capped at 750,000 units this year; Nvidia ships that many in a single
https://x.com/PalwinderCFA/status/2047614823102619974

DeepSeek V4 MLX Quants now on MLX community HF repo, Made possible by @LambdaAPI and @TheZachMueller ❤️ Without a GPU cluster it would take me a week to upload the quants… Model collection 👇🏽
https://x.com/Prince_Canuma/status/2047847095466385899

DeepSeek V4 Open Source + vLLM Support LIVE 🚀 | Technical Breakdown 🧠 Core Insight DeepSeek V4 is built to solve 1M-token long-context inference — the biggest pain point for LLMs today. ⚠️ 2 Key Long-Context Challenges • KV Cache Explosion: KV cache grows linearly with
https://x.com/ZhihuFrontier/status/2047664976215839021

DeepSeek writing quality (at least in Chinese) is good because they’ve been obsessing about data for the entire history of the company (tbh “”clean data”” is an obvious instinct for algo traders too, but I think this is more about Wenfeng’s purism) and have such job listings
https://x.com/teortaxesTex/status/2047614729145745623

DeepSeek_V4.pdf · deepseek-ai/DeepSeek-V4-Pro at main

Click to access DeepSeek_V4.pdf

DeepSeek-V4 is a full-stack redesign of LLMs around long context + efficiency Here are some of the changes: – Hybrid attention: Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) for long-context efficiency – 1M-token context becomes ~3-10× cheaper in memory
https://x.com/TheTuringPost/status/2048566818118545887

DeepSeek-V4 uses our Hash routing approach developed back in 2021 — see screenshot of their tech report! (Looks like a great model, congrats!) Bonus note: our same blogpost (& paper) back in 2021 also introduced ‘looped transformers’, but we called that staircase & ladder (see
https://x.com/jaseweston/status/2047690308217926055

DeepSeekv4 Pro 1.6T is supported on InferenceX on Day 0! We have already gotten H200 vLLM working and working on @vllm_project & @sgl_project MI355, B200, B300, GB200/300 disaggregated DeepSeekv4 day 0 performance benchmarking too to track the progress of improvement. Thank you
https://x.com/SemiAnalysis_/status/2047726025748930687

Early DeepSeek v4 impressions not great.
https://x.com/mbusigin/status/2047707082007220393

Here’s DeepSeek v4 Pro. Added to the playable gallery as well.
https://x.com/emollick/status/2047527060713664754

I get the impression many Chinese hate Huawei irrationally and suspect it of a conspiracy to deprive DeepSeek of based American chips
https://x.com/teortaxesTex/status/2047631470664020211

I hear similarly it’s not unique to Mythos/5.5 ofc, frontier models have been dealing with >100T for a while, as far as I know. We see even the open source models get close to 50T. A 100T DeepSeek V4 is just V4 + 2 more epochs, 3e25 FLOPs. still below Llama 405B level
https://x.com/teortaxesTex/status/2049830477167526255

I hope the upgrade to DeepSeek v4 will make the bot comments on here more bearable.
https://x.com/emollick/status/2047519187287846937

I’m still confused by some of the decisions done in deepseek v4 Main confusion is why the huge focus on reducing KV cache size when with something like HiSparse u can offload most of ur kv cache (making ur decode compute bound) This also is compensated with a huge 128 heads and
https://x.com/Grad62304977/status/2048785005216723072

interesting that deepseek’s also joined the path of not allowing sampler control on their api. i wonder why and how long this has been there
https://x.com/stochasticchasm/status/2047717161070989499

Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance. AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows.
https://x.com/togethercompute/status/2047743446522224987

its so messed up that deepseek trained on deepseek reasoning traces. has chinese distillation gone too far?
https://x.com/kalomaze/status/2047762970931827125

Jensen was making a good point, but now it’s too late. DeepSeek is fully committed to ditching CUDA. The rest of the Chinese xiaoren ecosystem can be swayed by Hoppers; Wenfeng believes too much in long-termism. After V4, non-CUDA hardware is guaranteed to live and prosper.
https://x.com/teortaxesTex/status/2049185408785998217

Let’s dive deeper into the difference between DeepSeek V4 Pro & V4 Flash by @DeepSeek_AI. – Both support 1M token context and V4 Flash Thinking shifts the price Pareto frontier. V4 Pro ranks ~30 places higher than the V4 Flash variants, but costs 12x more at launch pricing.
https://x.com/arena/status/2047774037204742255

Let’s see DeepSeek are all nice folks and China’s national heroes, Xi is personally a man of integrity, and they’re not starting wars. American society firebombs Sam Altman, Ant is a weird sex cult, and elected US leader is a murderous monke. Why should compute decide this?
https://x.com/teortaxesTex/status/2047645676234846459

looks like the ~Opus 4.5 estimate for DeepSeek-V4 holds for now, at least on SimpleBench
https://x.com/scaling01/status/2047682465624445015

My first two TiKZ Sparks unicorns from DeepSeek v4. (Expert mode, from the DeepSeek site, which is supposed to be v4 Pro according to the release)
https://x.com/emollick/status/2047523193481547929

My quick paper summary: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) Two new compressed attention mechanisms for long context manifold hyper connections Muon training 32T tokens FP4 Quantization-Aware
https://x.com/iscienceluvr/status/2047514399393579235?s=46

not even DeepSeek has any appetite for doing this again this is evolution tier architecture they’ll refactor it when they get some time
https://x.com/teortaxesTex/status/2047648219081974034

somewhere in france, still awake at sunrise, adding exclamations to their first read of the deepseek technical report, “one of the best i’ve ever read”
https://x.com/morqon/status/2047643246923325833

Surprisingly a lot of info about the data and process (which is unlike some other deepseek papers). On first read, it sounded like they only cared about specific tasks rather than a general multimodal model. On second thought however, I realized these “”visual primitives”” and
https://x.com/nrehiew_/status/2049840802562740311

TEHRAN, April 29, 2026 — Less than a week after the release of @deepseek_ai DeepSeek v4 Pro, the cracked team at @vllm_project and @inferact has achieved considerable improvement on GB200 (Dynamo+vLLM). This is largely due to the release of vLLM 0.20.0, which comes with MegaMoE
https://x.com/SemiAnalysis_/status/2049578313111216271

Thank you @NVIDIAAI for highlighting vLLM’s day 0 @deepseek_ai support and enhancing the open source inference ecosystem!
https://x.com/vllm_project/status/2047843293447500069

The strongest open-source agentic model is live on Baseten! DeepSeek V4 is a preview of two powerful MoE models: V4-Pro (1.6T params) and V4-Flash (284B params) with 1M context and SOTA open-source performance. This represents a significant jump from V3.2 (which had a 128k
https://x.com/baseten/status/2047779549644243146

This is great – @deepseek_ai V4 supports prefill! 😀 Most other providers have been dropping support for this critically important capability, so wonderful to see at least one company stepping up.
https://x.com/jeremyphoward/status/2049098509530583199

Unless I’m doing it wrong, Kimi K2.6 in Hermes is like 7x slower than DeepSeek V4, not to mention V4-Flash lmao but it can sometimes fix bugs that not even Pro can resolve. it also has some harsh words for them:
https://x.com/teortaxesTex/status/2048820805258059837

vLLM support for DeepSeek V4 base models is on the way! The V4 release includes 4 models: base/instruct × flash/pro. Initial support covers the instruct versions. To extend support to the base models, we worked with @deepseek_ai to add an expert_dtype field in the config, making
https://x.com/vllm_project/status/2048769886483329525

vLLM v0.20.0 is here! 752 commits from 320 contributors (123 new). 🎉 Highlights: DeepSeek V4, Hunyuan v3 preview support, CUDA 13 / PyTorch 2.11 / Transformers v5 baseline, FA4 as default MLA prefill, TurboQuant 2-bit KV (4× capacity), vLLM IR foundation. Thread 👇
https://x.com/vllm_project/status/2048918629144805619

Canonical and NVIDIA are collaborating to make NVIDIA Nemotron™ 3 Nano Omni easier to deploy on Ubuntu. With Canonical inference snaps, teams can go from setup to a working runtime in a single command – no complex integration required. Less time spent on infrastructure, more
https://x.com/Canonical/status/2049159988174602712

By the way, this phrasing – «the model weights will be integrated into our foundation model» – is very interesting. They’re not saying they’ll release V4-Flash-Vision. I think it suggests they’ll OPD it into V4s. Perhaps it was post-trained concurrently with the main line.
https://x.com/teortaxesTex/status/2049871869847765212

If you are at ICLR, don’t miss the chance to try out Vision Banana yourself with @NithishKannen!
https://x.com/songyoupeng/status/2047462610283721116

MathNet – a new interesting global multimodal benchmark from @MIT for mathematical reasoning and retrieval It’s a dataset of 30,676 Olympiad-level problems from 47 countries, 17 languages, and 143 competitions over 4 decades, with expert solutions. It defines 3 tasks: – problem
https://x.com/TheTuringPost/status/2049155956135841862

[2604.26752] GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
https://arxiv.org/abs/2604.26752

Our 2.0 image model is so good at making screens and vision mocks. Something about AI generated images of digital surfaces feels very “right” to me. Internally, I’ve started seeing tons of product ideas shared and brought to life via image generation rather than prototyping —
https://x.com/TheRohanVarma/status/2048985585000563009

Mistral releases Mistral Medium 3.5, a new vision reasoning model. 🔥 Mistral-Medium-3.5-128B offers highly competitive performance for models 6x its size. Run locally on ~64GB RAM. Guide:
https://t.co/ztAVzgJECr GGUFs:
https://x.com/UnslothAI/status/2049511248623256017

MiMo-V2.5 Pro by @XiaomiMiMo is the #11 model (#3 among open) in Code Arena: Frontend WebDev and has shifted the Pareto frontier with $1 input / $3 output per MToken.
https://x.com/arena/status/2049582973926949116

SGLang and vLLM support for the MiMo-V2.5 series is here. 🙌 Huge thanks to SGLang project from @lmsysorg and @vllm_project for moving fast and helping developers get started with MiMo-V2.5 on day zero.
https://x.com/XiaomiMiMo/status/2048821520798302409

xiaomi mimo v2.5 eval card, pro is 1T total 42B active, omni (video/image/audio) is 310B total 15B active, both have 1M context support they train in FP8, 27T tokens for pro and 48T for the smaller variant. interleaved SWA with an aggressive 6:1 ratio and 128 window size, still
https://x.com/eliebakouch/status/2048845602633433258

Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Moonshot’s Kimi K2.6 – the current top open weights model. MiMo V2.5 Pro’s weights are expected to be released soon, which would make MiMo V2.5 Pro the first equal open weights model
https://x.com/ArtificialAnlys/status/2047799218828665093?s=20

SMASH enables the first outdoor humanoid ping-pong system on the Unitree G1 using pure egocentric vision alone. The team built a unified perception pipeline (Adaptive EKF + physics-based predictor + planner). They first validated and tuned the full stack using clean data from an
https://x.com/TheHumanoidHub/status/2048837852067398090

Efficient Video Intelligence in 2026 – Vikas Chandra – AI Research @ Meta
https://v-chandra.github.io/efficient-video-intelligence/