International: AI News Week Ending 05/09/2025

May 9, 2025

Image created with GPT Image 1. Image prompt: In a diplomatic robe composed of dozens of woven flags melted into ankara and tartan panels, a proud Afro-Caribbean envoy strides across a granite runway of LED timezones; holding a briefcase-cane stamped with embassies and motherboards, nodding to Pan-African futurism — all captured on 35mm with slow shutter blur.

Sam Patterson https://sampatt.com/blog/2025-04-28-can-o3-beat-a-geoguessr-master

Anduril to acquire Ireland’s Klas to bolster AI warfare systems | Reuters https://www.reuters.com/business/aerospace-defense/anduril-acquire-irelands-klas-bolster-ai-warfare-systems-2025-05-05/

DeepSeek released Prover-V2, an open-source AI combining informal math reasoning with theorem proving. With 671B params, the model solves 88.9% of problems on MiniF2F. It does a ‘cold-start’ to break down proofs into subgoals before formal verification. https://x.com/adcock_brett/status/1919060364655800684

We just released DeepSeek-Prover V2. – Solves nearly 90% of miniF2F problems – Significantly improves the SoTA performance on the PutnamBench – Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version Github: https://x.com/zhs05232838/status/1917600755936018715

China’s Deep Robotics launched Lynx M20, a rugged version of its robo-dog. It can run through rough terrain and extreme temperatures. It is specialized for tasks like power inspection, emergency response, and logistics. https://x.com/adcock_brett/status/1919060538379677767

Baidu is making big moves with MCP! At #BaiduCreate2025, they showed how MCP (Model Context Protocol) is already transforming how developers connect large models with real-world services. Here’s the breakdown: https://x.com/heyDhavall/status/1916087124865991098

UAE Rolls Out AI for Schoolkids in New Push for Sector Forefront – Bloomberg https://www.bloomberg.com/news/articles/2025-05-04/uae-rolls-out-ai-for-schoolkids-in-new-push-for-sector-forefront?embedded-checkout=true

India panel to review copyright law amid legal challenges to OpenAI | Reuters https://www.reuters.com/sustainability/boards-policy-regulation/india-panel-review-copyright-law-amid-legal-challenges-openai-2025-05-06/

Introducing OpenAI for Countries | OpenAI https://openai.com/global-affairs/openai-for-countries/

Qwen3 benchmark results: 235B is a BEAST, placing 3rd in the overall and with the best generalization among all tested models. All of the Qwen3 models have very low or perfect percentages of invalid moves, which means good instruction following. 235B MoE > 32B > 14B > 30B MoE > 8B https://x.com/scaling01/status/1918031153312731536

2/ China’s Alibaba just released Qwen 3 with support for MCP and 119 languages. It matches the performance of DeepSeek-R1, OpenAI o1, o3-mini, and Grok-3. @Saboo_Shubham_ Plus, AI Agents with Qwen3 can now think deeper with hybrid reasoning modes. https://x.com/AtomSilverman/status/1918424770749874668

China’s Alibaba just released Qwen 3 with support for MCP and 119 languages. It matches the performance of DeepSeek-R1, OpenAI o1, o3-mini, and Grok-3. Plus, AI Agents with Qwen3 can now think deeper with hybrid reasoning modes. https://x.com/Saboo_Shubham_/status/1916972515077066922

Episode 167: Overnight Agent. We share the results of our first overnight agent run. We fed DeepSeek R1 a summary of the new @Cloudflare agents SDK and asked it to think every 15 minutes about the entire conversation history and reflect on new ideas that extend the ideas https://x.com/OpenAgentsInc/status/1901964880594313542

2/ Cloudflare Agent SDK Summary @OpenAgentsInc fed DeepSeek R1 a summary of the new Cloudflare agents SDK and asked it to think every 15 minutes about the entire conversation history and reflect on new ideas that extend the ideas further. https://x.com/AtomSilverman/status/1918047663800631794

We’re launching Computer Use in smolagents! 🥳 -> As vision models become more capable, they become able to power complex agentic workflows. Especially Qwen-VL models, that support built-in grounding, i.e. ability to locate any element in an image by its coordinates, thus to https://x.com/AymericRoucher/status/1919783847597670780

Is Qwen3-235B the new budget-friendly coding champ in Cline? Early user feedback is rolling in: it’s promising, but not perfect. Here’s what we’re hearing from the Cline community: 🧵 https://x.com/cline/status/1917708041857949983

Alibaba’s Qwen team released Qwen3 family with 2 MoE models and 6 dense models —Models range from 600M to 235B params —Flagship version rivals OpenAI o1 & DeepSeek-R1 —Hybrid “thinking” in all —Boosted coding + agent performance —119 languages supported https://x.com/adcock_brett/status/1919060402417119375

3/ The Next Generation of Payments Is Here! @Razorpay becomes India’s first payment gateway with an Official MCP (Model Context Protocol) Server, designed for an AI-first world. https://x.com/AtomSilverman/status/1920168570954137795

The Next Generation of Payments Is Here! Today, Razorpay becomes India’s first payment gateway with an Official MCP (Model Context Protocol) Server, designed for an AI-first world. Here is how it works 👇 https://x.com/Razorpay/status/1916117747718672593

Baidu debuted Turbo versions of ERNIE 4.5 and X1, with faster speed and lower cost. 4.5 Turbo comes at 11c and 44c per million I/O tokens (0.2% of GPT-4.5). X1 Turbo tops DeepSeek R1 & V3, coming at 14c/55c per million I/O tokens. https://x.com/adcock_brett/status/1919060425770942619

Introducing Mistral Medium 3: our new multimodal model offering SOTA performance at 8X lower cost. – A new class of models that balances performance, cost, and deployability. – High performance in coding and function-calling. – Full enterprise capabilities, including hybrid or https://x.com/MistralAI/status/1920119463430500541

Huawei aims to take on Nvidia’s H100 with new AI chip | TechCrunch https://techcrunch.com/2025/04/28/huawei-aims-to-take-on-nvidias-h100-with-new-ai-chip/

NVIDIA’s CEO, Jensen Huang, shared his thoughts on the AI race with China, saying: —The country has great capability and is ‘right behind’ the U.S —Huawei has everything to advance AI (in terms of infra) —It’s a long-term, infinite competition https://x.com/rowancheung/status/1917844219571454027

Exclusive: US lawmaker targets Nvidia chip smuggling to China with new bill | Reuters https://www.reuters.com/world/us/us-lawmaker-targets-nvidia-chip-smuggling-china-with-new-bill-2025-05-05/

People expect the economic impact of a high-tariff regime to look like a downward step function, like in the image on the left (not expecting 145% tariffs on China — but let’s say 30%). In reality, it will be more like the image on the right. 1. The early response will be https://x.com/fchollet/status/1918258519624790273

Agility CEO Peggy Johnson expressed concern over Chinese military robots and said the US needs to keep pace technologically. She’s open to discussions with the Department of Defense to ensure the military has access to the best technology available. https://x.com/TheHumanoidHub/status/1918432599703454180

Amplify Initiative: Localized data for globalized AI https://research.google/blog/amplify-initiative-localized-data-for-globalized-ai/

How Google is preparing Asia Pacific’s workforce for the AI future https://blog.google/around-the-globe/google-asia/ai-opportunity-fund-asia-pacific/

Meet Solo Tech, one of the 10 international recipients of the second Llama Impact Grants. Solo Tech uses Llama to offer offline, multilingual AI support for underserved rural communities with limited internet access. This grant will help them to equip 50 rural centers with AI https://x.com/AIatMeta/status/1917727629601616030

PyTorch Day France Featured Sessions: A Defining Moment for Open Source AI – PyTorch https://pytorch.org/blog/pt-day-france-featured-sessions/

👏🏻 Excited to see Qwen3-235B-A22B’s impressive performance on LiveCodeBench! This positions Qwen3 as the top open model for competitive-level code generation, matching the performance of o4-mini (low). https://x.com/huybery/status/1919418019517776024

Pretty fucking incredible week so far: > Qwen3 – MoE (235B, 30B) + Dense (32, 14, 8, 4, 0.6B) > Xiaomi – MiMo 7B dense > Kyutai – Helium 2B dense > DeepSeek – Prover V2 671B MoE > Qwen2.5 Omni 3B > Microsoft – Phi4 14B Reasoning, Mini (3.8B) & Plus > JetBrains – Mellum 4B Dense https://x.com/reach_vb/status/1917938596465750476

The community votes are in for Qwen3-235B-A22B 🥁 The latest open-source Qwen3 is now on the Arena Top 10 🏆 Congrats to @alibaba_qwen on this achievement! 👏 Highlights: 💠 For Chat: Qwen3-235B-A22B ranks #10, tied with o1 💠 Strong in Coding at #4 and Math #1 💠 For WebDev: https://x.com/lmarena_ai/status/1919448953042706759

DeepSeek quietly released Prover-V2, an open-source AI combining informal math reasoning with theorem proving —671B params —Solves 88.9% of problems on MiniF2F —Does ‘cold-start’ to break down complex proofs into subgoals before formal verification https://x.com/rowancheung/status/1917844254648324388

Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models | TechCrunch https://techcrunch.com/2025/04/28/alibaba-unveils-qwen-3-a-family-of-hybrid-ai-reasoning-models/

@Alibaba_Qwen Do you plan to make a Qwen 3 Coder in the future with FIM capabilities similar to Qwen 2.5 Coder? https://x.com/ggerganov/status/1918373399891513571

🚨This week’s top AI/ML research papers: – DeepSeek-Prover-V2 – The Leaderboard Illusion – Phi-4-reasoning Technical Report – Mem0 – X-Fusion – Softpick – RL for Reasoning in LLMs with One Training Example – ReasonIR – RL for LLM Reasoning Under Memory Constraints – https://x.com/TheAITimeline/status/1919155696655843474

Is this the Smol Models Festival? – Phi-4 just dropped a new reasoning model – Qwen2.5 Omni now has a 3B version – And OLMo-2 1B version https://x.com/fdaudens/status/1917961029675347973

Medium is the new large. | Mistral AI https://mistral.ai/news/mistral-medium-3

Qwen 3 235B now on @togethercompute API! Qwen 3 is a reasoning model that has a non-reasoning instruct mode with allowance for setting a thinking budget. It’s efficient ($0.20/M input & $0.60/M output on our throughput optimized endpoint) and fantastic on a variety of https://x.com/vipulved/status/1917777842466889873
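
For a sense of scale, the per-million-token prices quoted in the post above can be turned into a per-request estimate. This is a rough sketch only: the two prices come from the post, while the function name and the example token counts are hypothetical, and actual billing may differ.

```python
# Prices quoted in the post above, in USD per million tokens.
INPUT_USD_PER_M = 0.20
OUTPUT_USD_PER_M = 0.60


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the quoted prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: 12,000 prompt tokens and 3,000 generated tokens.
cost = request_cost_usd(12_000, 3_000)
print(f"Estimated cost: ${cost:.4f}")
```

At these prices, output tokens cost three times as much as input tokens, so a long generation weighs more heavily on the estimate than a long prompt.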

A ton of impactful models and datasets in open AI past week, let’s summarize the best 🤩 link to all are on the next one ⤵️ 💬 @Alibaba_Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, any-to-any model in 3B https://x.com/mervenoyann/status/1919784802099540446

We will release the quantized models of Qwen3 to you in the following days. Today we release the AWQ and GGUFs of Qwen3-14B and Qwen3-32B, which enables using the models with limited GPU memory. Qwen3-32B-AWQ: https://x.com/Alibaba_Qwen/status/1918353505074725363

Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general https://x.com/Alibaba_Qwen/status/1916962087676612998

⟪Humanoid⟫ is working with major UK retailers to pilot their humanoid robots in real-world scenarios. The company intends to begin mass production within five years. https://x.com/TheHumanoidHub/status/1918005939594240471

How to train an AI model without falling into GDPR pitfalls? – Lexology https://www.lexology.com/library/detail.aspx?g=b58d1d9a-4803-468e-90ff-160d26139c5e