International: AI News Week Ending 12/05/2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Minimalist photograph of empty United Nations assembly hall from podium view, endless rows of vacant delegate seats with translation headsets in cold blue-grey light, single dramatic spotlight on empty speaker’s podium, marble and brass details, vast architectural space emphasizing absence and isolation, pristine untouched perfection, cinematic composition with deep shadows, white text overlay reading INTERNATIONAL in bold sans-serif

Seedream 4.5 https://seed.bytedance.com/en/seedream4_5

🚨BREAKING: Text Leaderboard Update: A new open source model has landed on the leaderboard! Mistral-Large-3 lands at #6 among open models and #28 overall on the Text leaderboard. Mistral 3 is the next generation of Mistral AI models and their most capable model family to date. https://x.com/arena/status/1995877395510051253

Introducing Mistral 3 | Mistral AI https://mistral.ai/news/mistral-3

Introducing Mistral Code | Mistral AI https://mistral.ai/news/mistral-code

Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵 https://x.com/MistralAI/status/1995872766177018340

Magistral | Mistral AI https://mistral.ai/news/magistral

Mistral Small 3 | Mistral AI https://mistral.ai/news/mistral-small-3

Mistral Small 3.1 | Mistral AI https://mistral.ai/news/mistral-small-3-1

Voxtral | Mistral AI https://mistral.ai/news/voxtral

Aerial intelligence with prompts. Moondream segments pools, tennis courts, and even solar panels, with pixel-perfect accuracy. https://x.com/moondreamai/status/1997058204589871395

Moondream’s new segmentation just dropped. Prompt: “dirty laundry items on the bed.” Moondream: pixel-perfect + actually understands the scene. SAM 3: grabs the floor. https://x.com/moondreamai/status/1996001944838832501

Open-Vocabulary Image Segmentation | Moondream https://moondream.ai/skills/segment

🚨New Models in the Arena! 🐳DeepSeek V3.2: a new family of reasoning-first, agent-oriented models from @deepseek_ai are now live in the Arena. Standard, Thinking, and Speciale are all in the Text Arena, waiting for your toughest prompts! Get your votes in: we’ll see how they https://x.com/arena/status/1995564824718442620

bytedance’s async ulysses attention is deceptively simple to understand and when you have a faster all-to-all kernel than NCCL the communication can be very well overlapped with computation https://x.com/maharshii/status/1996280889962365380

Presenting Hermes 4.3 on ByteDance Seed 36B, the latest update to our flagship Hermes series of models. This model offers roughly equivalent performance to Hermes 4 70B at half the model size, and was post-trained entirely on the Psyche network secured by @Solana. https://x.com/NousResearch/status/1996311677009121367

We just released our first Hermes model trained entirely with Distro on Psyche, Hermes 4.3 on ByteDance Seed 36B! Outcome was actually better than the centralized comparison run, and even brought it to top spot on the RefusalBench leaderboard, weights on HF! Check it out: https://x.com/Teknium/status/1996330606595391780

🚨🖼️ Image Leaderboard Update Seedream 4.5 by Bytedance has officially entered the Arena on both the Image Edit and Text-to-Image leaderboards. Here is where it landed: 🔹 #3 on Image Edit (score: 1338) 🔹 #7 on Text-to-Image (score: 1146) This update delivers a 27-pt increase https://x.com/arena/status/1996641968005566876

@awnihannun added batched generation to MLX-LM >2 months ago. Everybody, since, has been asking for batching in the MLX-LM server. Well, enjoy the first version in the latest MLX-LM release. The following video is serving 4 consecutive requests for Qwen3 30B on an M2 Ultra. https://x.com/angeloskath/status/1996364526749639032

🚀 @deepseek_ai just dropped two official models — V3.2 & V3.2-Speciale, and Chinese tech circles are buzzing. What do they really achieve? Zhihu contributor toyama nao breaks it down, closely aligning with DeepSeek’s own published scores👇 DeepSeek has already shaken China’s AI https://x.com/ZhihuFrontier/status/1995689116999311455

🚀 Day 0 Deepseek v3.2 launch on @FireworksAI_HQ ! Congrat @deepseek_ai team on releasing another SOTA model! Continuing our promise, you can access DSV3.2 now on our platform. We heavily focus on quality first. A ton of perf optimization will come shortly. Below are the https://x.com/lqiao/status/1995915147714723974

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech https://x.com/deepseek_ai/status/1995452641430651132

🚀 vLLM now offers an optimized inference recipe for DeepSeek-V3.2. ⚙️ Startup details Run vLLM with DeepSeek-specific components: --tokenizer-mode deepseek_v32 \ --tool-call-parser deepseek_v32 🧰 Usage tips Enable thinking mode in vLLM: – https://x.com/vllm_project/status/1996760535908642986

DeepSeek V3.2 is the #2 most intelligent open weights model and also ranks ahead of Grok 4 and Claude Sonnet 4.5 (Thinking) – it takes DeepSeek Sparse Attention out of ‘experimental’ status and couples it with a material boost to intelligence @deepseek_ai V3.2 scores 66 on the https://x.com/ArtificialAnlys/status/1996110256628539409

deepseek-ai/DeepSeek-V3.2 · Hugging Face https://huggingface.co/deepseek-ai/DeepSeek-V3.2

Game over https://x.com/Yuchenj_UW/status/1995523554679673180

If you need an adrenaline rush to wake up from your post-Thanksgiving stupor… we got you. @deepseek_ai V3.2 dropped this week and is now available on Baseten. It’s so smart your mother will ask why you can’t be more like DeepSeek. V3.2 is currently on par with GPT-5 all whilst https://x.com/basetenco/status/1996623218040254793

Incredible writeup! Some notable 💎s: Deepseek reduced attention complexity from quadratic to ~linear through warm-starting (w/ separate init + opt dynamics) and adapting the change over ~1T tokens. They also use separate attention modes for disaggregated prefill vs decode (is https://x.com/suchenzang/status/1995535496421015741

Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs https://api-docs.deepseek.com/news/news250929

Link to DeepSeek’s technical paper: https://x.com/ArtificialAnlys/status/1996110267353325748

LisanBench results for DeepSeek-V3.2 DeepSeek-V3.2 and V3.2 Speciale are affordable frontier models* *the caveat is that they are pretty slow at ~30-40tks/s and produce by far the longest reasoning chains at 20k and 47k average output tokens (incl. reasoning) – which results in https://x.com/scaling01/status/1995895894219100462

New Model(s) Drop: DeepSeek V3.2 is now live on Yupp! From @deepseek_ai, these are open-source models with enhanced math, coding and logic capabilities – offered in Chat, Thinking and Speciale versions. Let’s see how they perform: https://x.com/yupp_ai/status/1995538168146526274

Speciale is the first DeepSeek model of all time that gets my dumb bilingual joke about God of War. V3.2 flails and invents cringe fake etymology, just like R1. 4o could do this already. The knowledge gap is wide and deep indeed. Still. Frontier at last. https://x.com/teortaxesTex/status/1995527632578834829

While reviewing the results again, I noticed a misjudged part in the score for the deepseek v3.2 Speciale model and corrected it. The revised result is a very impressive 8.81, which is top tier and achieved a perfect 10 across all quantitative metrics. However, as I mentioned https://x.com/Hangsiin/status/1995899545339990042

🚨BREAKING: Text Leaderboard Update 🐳 Deepseek-v3.2 enters the leaderboard at #38, and Deepseek-v3.2-thinking lands at #41. For comparison, previous versions ranked higher: 🔹 v3.2 ranks #38 (-5 pts v3.1 and -14 pts v3.2-exp) 🔹 v3.2-thinking ranks #41 (-7 pts vs v3.1-thinking https://x.com/arena/status/1996707563208167881

Compare how DeepSeek V3.2 performs relative to models you are using or considering at: https://x.com/ArtificialAnlys/status/1996110266065715249

DeepSeek’s new DeepSeekMath-V2 hits gold-medal performance on IMO and Putnam. It’s the first open model that can check its own proofs, fix mistakes, and improve itself. DeepSeekMath-V2 uses two “minds” in one model: ▪️ A verifier – Reads a proof and points out issues. – https://x.com/TheTuringPost/status/1994926897248288813

very smart choices by @stochasticchasm and the arcee team. in terms of arch, this is pretty much the perfect setup if you’re a bit constrained by compute/time and can’t do 100s of ablations hybrid nope, gated attention, norms to stabilize everything, muon, deepseek routing this https://x.com/eliebakouch/status/1995600008603697346

EU to Open Bidding for AI Gigafactories in Early 2026 – WSJ https://www.wsj.com/tech/ai/eu-to-open-bidding-for-ai-gigafactories-in-early-2026-809b7570?st=k7U8kH&reflink=desktopwebshare_permalink

@swyx yeah all public info – the 3K cluster is in the main mistral 3 blog post today, and the 18K cluster was announced by nvidia earlier this year https://x.com/AnjneyMidha/status/1996000762904936755

And Mistral Large 3, a frontier class open source MoE. https://x.com/MistralAI/status/1995872771516354828

🎉 Congratulations to the Mistral team on launching the Mistral 3 family! We’re proud to share that @MistralAI, @NVIDIAAIDev, @RedHat_AI, and vLLM worked closely together to deliver full Day-0 support for the entire Mistral 3 lineup. This collaboration enabled: • NVFP4 https://x.com/vllm_project/status/1995890057224618154

Europe still has one frontier model maker that can generally keep pace with Chinese open weights models, though no reasoner for Mistral 3 yet means they are behind the curve of actual performance – DeepSeek r1 got 71.5% on GPQA Diamond (& 1-shot, not 5-shot) back in January. https://x.com/emollick/status/1996068920596594932

I want to especially thank @MistralAI for releasing the base models for Mistral 3. Fewer companies are sharing base models and this opens many use cases from custom instruct to non-instruct cases https://x.com/QuixiAI/status/1996272948378804326

Meet the Ministral 3 models from @MistralAI! – 3B, 8B, and 14B models – Instruct, reasoning, and base variants – Supports tool use and vision input – Open-weights, Apache 2.0 licensed https://x.com/lmstudio/status/1995908228526604451

Mistral 3 is now available on Ollama v0.13.1 (currently in pre-release on GitHub). 14B: ollama run ministral-3:14b 8B: ollama run ministral-3:8b 3B: ollama run ministral-3:3b Please update to the latest Ollama. https://x.com/ollama/status/1995885696360566885

Mistral releases Ministral 3, their new reasoning and instruct models! 🔥 Ministral 3 comes in 3B, 8B, and 14B with vision support and best-in-class performance. Run the 14B models locally with 24GB RAM. Guide + Notebook: https://x.com/UnslothAI/status/1995874975631503479

NEW: @MistralAI released a fantastic family of multimodal models, Ministral 3. You can fine-tune them for free on Colab using TRL ⚡️, supporting both SFT and GRPO https://x.com/SergioPaniego/status/1996257877871509896

NEW: @MistralAI releases Mistral 3, a family of multimodal models, including three state-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! 🤗 Surprisingly, the 3B is small enough to run 100% locally in your browser on WebGPU! 🤯 https://x.com/xenovacom/status/1995879338583945635

Run Mistral Large 3 on Ollama’s cloud: ollama run mistral-large-3:675b-cloud https://x.com/ollama/status/1996682858933768691

Super nice to see Mistral Large 3 as the #1 OSS model for coding on lmarena 🥳😎🙌 And the spoiler alert! 👀👀 https://x.com/sophiamyang/status/1996587296666128398

Support for running Mistral Large 3 locally will be available in Ollama soon. https://x.com/ollama/status/1996683156817416667

The Bert-Nebulon Alpha Stealth model is live now as @MistralAI’s new Mistral Large 3! Try the full release now on OpenRouter: https://x.com/OpenRouterAI/status/1995904288560988617

The world’s best small models–Ministral 3 (14B, 8B, 3B), each released with base, instruct and reasoning versions. https://x.com/MistralAI/status/1995872768601325836

Mistral Large 3 debuts as the #1 open source coding model on the @arena leaderboard. We’d love for you to try it! More on coding in a few days… 👀 https://x.com/MistralAI/status/1996580307336638951

Mistral AI raises 1.7B€ to accelerate technological progress with AI | Mistral AI https://mistral.ai/news/mistral-ai-raises-1-7-b-to-accelerate-technological-progress-with-ai

🧊 Off-policy RL for LLMs is hard. Dr. GRPO collapses at 10 steps off-policy. TBA doesn’t. @Kimi_Moonshot K2’s approach is robust too – both independently landed on the same key ingredients 🤝 We ablate RL recipe ingredients + show the 2 small changes giving off-policy https://x.com/bartoldson/status/1996769053420265959

You can now integrate Kimi CLI into JetBrains via the ACP. For details, check out the Kimi CLI GitHub repo: https://x.com/Kimi_Moonshot/status/1996953835080966390

> be arcee > look around > realize open-weight frontier MoE is basically a Qwen/DeepSeek monopoly > decide “nah, we’re building our own” > actual end-to-end pretraining > on US soil > introducing Trinity > Nano (6B MoE) and Mini (26B MoE) > open weights, Apache 2.0 > free on https://x.com/TheAhmadOsman/status/1995613231629381935

Our new Qwen3-TTS (version 2025-11-27) is here! 🚀 We’ve leveled up on what matters most: ✨ More Personalities: Over 49 high-quality voices, from cute and playful to wise and stern. Find your perfect match! 🌍 Global Reach: Now speaks 10 languages (zh, en, de, it, pt, es, ja, https://x.com/Alibaba_Qwen/status/1996947806138126547

The latest mlx-lm is out and it has continuous batching with mlx_lm.server! Added by @angeloskath Check-out the video of 4 simultaneous requests running with Qwen3 30B on the same M2 Ultra: https://x.com/awnihannun/status/1996365940343402596

TIL you can compile quantized models thanks to quanto although memory blows up a bit on Qwen3-VL https://x.com/mervenoyann/status/1996998362118201850