Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic on worn mustard-yellow cotton fabric, deep red ink illustration of an oversized Swiss Army knife with folded-out beach tools including sand shovel, camera lens, seashell speaker, and sunglasses all deployed simultaneously, bold arcing text reading MULTIMODALITY in retro novelty shirt typography, simple cartoon outlines, slightly imperfect printed texture with minor fabric stains, humorous local beach shop charm
Claude builds interactive visuals right in your conversation | Claude https://claude.com/blog/claude-builds-visuals
Claude can now build interactive charts and diagrams, directly in the chat. Available today in beta on all plans, including free. Try it out: https://x.com/claudeai/status/2032124273587077133
Claude’s new interactive chart is crazy… the UI is so good
https://x.com/crystalsssup/status/2032334906517536969
Sweet! You can now generate interactive charts and diagrams with Claude (directly in the chat). I was building something like this yesterday with MCPs. My orchestrator now generates and iterates on nano banana images, excalidraw diagrams, remotion clips, and soon interactive
https://x.com/omarsar0/status/2032127096361804058
gemini embedding 2 brings text, images, audio, video, and docs into a single vector space, enabling search across all your media at once, finding semantic matches regardless of the data format see it in action with our multimodal search demo ⬇️
https://x.com/GoogleAIStudio/status/2032145393967038583
Gemini Embedding 2: Our first natively multimodal embedding model https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/
Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
https://x.com/OfficialLoganK/status/2031411916489298156
What if one embedding model could understand text, images, video, audio, and PDFs all at once? Excited to share Gemini Embedding 2, our first fully multimodal embedding model. 🖼️ 5 modalities in a single unified embedding space 🌍 Supports up to 8,192 input tokens, 100+ languages
https://x.com/_philschmid/status/2031412260162138428
@GoogleWorkspace @googledocs @googledrive While we don’t have favorites, the evolution of Gemini in Google Sheets might be our most impressive yet. Gemini in Google Sheets has set a new state of the art, achieving a 70.48% success rate on the full SpreadsheetBench dataset. This performance not only exceeds
https://x.com/GoogleAI/status/2031356545552847091
Introducing the new Gemini-powered Docs, Sheets, Slides, and Drive experience featuring AI Overviews, fully editable AI-made slides, and new grounding sources to make writing docs context aware 📃 Available today to G1 Pro and Ultra users : )
https://x.com/OfficialLoganK/status/2031374503599567113
New Gemini updates to make @GoogleWorkspace more personal, helpful and collaborative: choose your sources and create a Doc draft in seconds, build complex Sheets 9X faster, or generate on-brand Slide layouts with a simple prompt. Plus, Drive now generates summarized answers right
https://x.com/sundarpichai/status/2031380361696129261
Write, create and get things done faster in Docs, Sheets, Slides and Drive with these new Gemini features for Google AI Ultra and Pro subscribers 🧵
https://x.com/Google/status/2031359339236143301
The Maps driving experience is also evolving with Immersive Navigation, featuring clearer visuals and intuitive guidance. You’ll be able to see the buildings, overpasses and terrain around you in a vivid 3D view, made possible with help from Gemini models. You’ll also be able
https://x.com/Google/status/2032079598683332742
Breast cancer is one of the most common cancers in the world, and in the U.K. it affects 1 in 8 women. We partnered with Imperial College London and the NHS to see if AI can strengthen early detection efforts. The result: Our experimental research AI system identified 25% of the
https://x.com/Google/status/2031734020979998795
98ms time to first token (faster than human visual reaction time), built for agentic workflows. 65% faster throughput compared to leading 8B models. Reka Edge is a 7B VLM built for latency-sensitive apps: real-time video analysis, agentic workflows, on-device deployment
https://x.com/RekaAILabs/status/2032132996422082619
Also, regarding this model’s vision capabilities, I’ve been using a very difficult dataset from an OCR project I worked on a few months ago as my benchmark whenever a new model is released. It consists of scanned files in the form of very long Excel-style tables written in
https://x.com/Hangsiin/status/2030882409819086923
filesystem + code sandbox combo eats another modality. remember when o3 destroyed at geoguessr? gemini agentic vision will find location on any street photo you take faster than Liam Neeson can get back his daughter
https://x.com/swyx/status/2017097813520449761
First thoughts on Gemini Embedding 2 prices: 🫠 – Text pricing is higher than the competition’s. You should probably not use this model for text-only embeddings because of the pricing (more below). Use it only if you are doing multimodal retrieval. – $0.00079 per video frame. So
https://x.com/neural_avb/status/2031648857625395321
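For a rough sense of what that per-frame price implies, here is a back-of-the-envelope sketch; the price is from the tweet above, while the 1-frame-per-second sampling rate is an assumption, not a published figure:

```python
# Rough cost model for the per-frame video pricing quoted above.
# PRICE_PER_FRAME_USD is from the tweet; FRAMES_PER_SECOND is an assumption.
PRICE_PER_FRAME_USD = 0.00079
FRAMES_PER_SECOND = 1  # hypothetical sampling rate

def video_embedding_cost(duration_seconds: float) -> float:
    """Estimated USD cost to embed one video clip."""
    return duration_seconds * FRAMES_PER_SECOND * PRICE_PER_FRAME_USD

# A 10-minute clip at 1 fps is 600 frames -> about $0.47.
print(f"${video_embedding_cost(600):.2f}")
```

At those numbers an hour of video runs to roughly $2.84, which is why the tweet steers text-only workloads toward cheaper models.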
Gemini Embedding 2 is out! 📹Embeddings for text/images/video/audio/PDFs 🪆Matryoshka embeddings: you can use smaller embedding sizes while retaining high quality and reducing storage costs 🤗Integrated with your favorite developer tools such as LlamaIndex, Weaviate, and Qdrant
https://x.com/osanseviero/status/2031691784074477766
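The Matryoshka property means the leading dimensions of a vector form a usable embedding on their own. A minimal sketch of the trick; the 3072 full and 256 truncated dimensionalities here are hypothetical:

```python
import numpy as np

def truncate_matryoshka(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize, so cosine
    similarity on the shorter vectors still behaves as expected."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

full = np.random.randn(3072)            # hypothetical full dimensionality
full /= np.linalg.norm(full)
small = truncate_matryoshka(full, 256)  # 12x less storage per vector
```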
Google launches new multimodal Gemini Embedding 2 model https://www.testingcatalog.com/google-launches-new-multimodal-gemini-embedding-2-model/
Introducing Replit Animation. Vibecode your next viral video in minutes, powered by Gemini 3.1 Pro. (This video was 100% made in Replit Animation)
https://x.com/Replit/status/2024578806208745637?s=20
Start building with Gemini Embedding 2, our most capable and first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and in Vertex AI.
https://x.com/googleaidevs/status/2031421430718415051
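A minimal sketch of calling the model through the google-genai Python SDK. The model id `gemini-embedding-2` is a placeholder guess, and this shows only a text input; the announcement implies images, video, audio, and PDFs flow through the same endpoint:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

result = client.models.embed_content(
    model="gemini-embedding-2",  # placeholder id -- check the model list
    contents="a surfer carrying a longboard at sunset",
)
print(len(result.embeddings[0].values))  # embedding dimensionality
```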
𝗧𝗲𝘅𝘁. 𝗜𝗺𝗮𝗴𝗲𝘀. 𝗩𝗶𝗱𝗲𝗼. 𝗔𝘂𝗱𝗶𝗼. 𝗣𝗗𝗙𝘀. One embedding model. One unified space. @googleaidevs just released 𝗚𝗲𝗺𝗶𝗻𝗶 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝟮, their first fully multimodal embedding model – and it’s now available in @weaviate_io. The model maps text, images,
https://x.com/victorialslocum/status/2032141700412686592
The era of juggling 5 different embedding models is over. Google just unified text, images, video, audio, and PDFs into one vector space. 𝗢𝗻𝗲 𝗺𝗼𝗱𝗲𝗹, 𝗺𝘂𝗹𝘁𝗶𝗽𝗹𝗲 𝗺𝗼𝗱𝗮𝗹𝗶𝘁𝗶𝗲𝘀: Text, images, video, audio, and PDFs all mapped into a single unified vector
https://x.com/weaviate_io/status/2032139558968852849
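What “one vector space” buys you in practice: a text query vector can rank items of any modality with a single similarity function. A toy sketch with random stand-in vectors:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for vectors all produced by one multimodal model.
# Because everything lives in one space, a text query ranks items of
# any modality with the same similarity function -- no per-modality
# models, no cross-modal alignment layer.
corpus = {
    "beach.jpg":  np.random.randn(768),
    "waves.mp3":  np.random.randn(768),
    "report.pdf": np.random.randn(768),
}
query = np.random.randn(768)  # embedding of a text query

ranked = sorted(corpus, key=lambda k: cosine(query, corpus[k]), reverse=True)
print(ranked)
```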
The Gemini Embedding 2 baseline here is... 2 days old. Was just being celebrated and is now outperformed by a median of 14% and up to 91 points. If I didn’t kind of know how powerful scaling ColBERTs and ColPalis can be compared to a single-vector model, I’d be in disbelief!
https://x.com/lateinteraction/status/2032162162836164697
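For context on why late-interaction models can beat a single-vector baseline by that much: ColBERT-style scoring keeps one vector per token and compares at the token level (MaxSim) rather than pooling everything into one vector first. A minimal sketch:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token, take the
    max similarity over all document tokens, then sum over the query."""
    sims = query_tokens @ doc_tokens.T  # (n_query, n_doc)
    return float(sims.max(axis=1).sum())

q = np.random.randn(8, 128)    # 8 query token vectors
d = np.random.randn(200, 128)  # 200 document token vectors

multi_vector_score = maxsim(q, d)
single_vector_score = float(q.mean(0) @ d.mean(0))  # crude pooled baseline
```

Pooling discards token-level signal before scoring; MaxSim defers the comparison until scoring time, which is where the headroom over single-vector models comes from.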
Google shares Gemini updates to Docs, Sheets, Slides and Drive https://blog.google/products-and-platforms/products/workspace/gemini-workspace-updates-march-2026/
Gemma-4 imminent
https://x.com/scaling01/status/2030986695181836466
Holy, Gemma 4 will be 120B parameters in total, with 15B active
https://x.com/kimmonismus/status/2031001097993642009
Nice: Gemma 4 has already leaked. Curious what else we will see.
https://x.com/kimmonismus/status/2031116062272688467
Selectively reducing eval awareness and murder in Gemma 3 27B via steering — LessWrong https://www.lesswrong.com/posts/QfM6SHyBPveDtHAma/selectively-reducing-eval-awareness-and-murder-in-gemma-3
🤖 From this week’s issue: Microsoft releases Phi-4-reasoning-vision-15B, a compact open-weight multimodal model that rivals much larger models on math, science, and computer-use tasks while requiring a fraction of the training compute.
https://x.com/dl_weekly/status/2031415180383437304
New research from Microsoft. Phi-4-reasoning-vision-15B is a 15-billion parameter multimodal reasoning model that combines visual understanding with structured reasoning capabilities. As I have been saying, not every agent task needs a frontier model. Phi-4-reasoning-vision
https://x.com/omarsar0/status/2029926242640912429
NEW: Microsoft releases Phi-4-reasoning-vision-15B, a 15B parameter multimodal reasoning model.
https://x.com/dair_ai/status/2029927938259308905
Having a sense of focus for any startup is incredibly energizing. We are currently laser-focused on building the world’s best document OCR. Besides pure focus, I believe a business needs to constantly reinvent itself to stay relevant in this world of agentic AI. If you are
https://x.com/jerryjliu0/status/2031171466574889344
I’m so excited to introduce this! We’ve worked on a million different moving parts to produce this. I’m fairly confident it’s the best multimodal model that exists, period — and it’s not too shabby at pushing back the LIMITs of retrieval either…
https://x.com/bclavie/status/2032128055104380980
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos… You can now get the best retrieval performance on your data, no matter its format.
https://x.com/mixedbreadai/status/2032127466081567106
Meet Reka Edge – Our next-generation vision language model for physical AI. Uses 3x fewer input tokens and achieves 65% faster throughput compared to leading 8B models. Image understanding, video analysis, object detection, and tool use. Built for Action. Fast enough for
https://x.com/RekaAILabs/status/2031781818349834628