Image created with Flux Pro v1.1 Ultra. Image prompt: Assembly instruction diagram for a desktop globe with removable continent sections, vintage educational style, ocean blue and earth tone colors, atlas-style background, “INTERNATIONAL” in classic map font, latitude/longitude markings, political boundary indicators

“great to work with the UAE on our first international stargate! appreciate the governments working together to make this happen. sheikh tahnoon has been a great supporter of openai, a true believer in AGI, and a dear personal friend.” https://x.com/sama/status/1926006829592543235

ByteDance’s BAGEL 14B MoE (7B active) multimodal model with image generation (open source, Apache license) is just incredible. A unified multimodal model rivalling GPT-4o and Gemini 2.0, with 7B active params (14B total), 40K context, 88% GenEval and 85% understanding. https://x.com/rohanpaul_ai/status/1927705853580509607

deepseek-ai/DeepSeek-R1-0528 · Hugging Face https://huggingface.co/deepseek-ai/DeepSeek-R1-0528

DeepSeek is aiming for the king: o3 and Gemini 2.5 Pro https://x.com/i/web/status/1928067335014793526

On GPQA Diamond, a set of PhD-level multiple-choice science questions, DeepSeek-R1-0528 scores 76% (±2%), outperforming the previous R1’s 72% (±3%). This is generally competitive with other frontier models, but below Gemini 2.5 Pro’s 84% (±3%). https://x.com/EpochAIResearch/status/1928489527204589680

GOP sneaks decade-long AI regulation ban into spending bill – Ars Technica https://arstechnica.com/ai/2025/05/gop-sneaks-decade-long-ai-regulation-ban-into-spending-bill/

NEW: Mistral AI announces Agents API – code execution – web search – MCP tools – persistent memory – agentic orchestration capabilities Cool to see that Mistral AI has joined the growing number of agent frameworks. More below: https://x.com/omarsar0/status/1927366520985800849

OFFICIAL BENCHMARKS OUT – we have a new open-source frontier model approaching o3 and Gemini 2.5 Pro 🔥🔥🔥 https://x.com/i/web/status/1928054949247693219

Meta shuffles AI, AGI teams to compete with OpenAI, ByteDance, Google https://www.axios.com/2025/05/27/meta-ai-restructure-2025-agi-llama

Exclusive: Musk’s DOGE expanding his Grok AI in US government, raising conflict concerns | Reuters https://www.reuters.com/sustainability/boards-policy-regulation/musks-doge-expanding-his-grok-ai-us-government-raising-conflict-concerns-2025-05-23/

UAE becomes the first country globally to provide free ChatGPT Plus access to all residents and citizens. UAE partners with OpenAI to offer free ChatGPT Plus access nationwide, as part of the Stargate UAE initiative to build the world’s largest AI supercomputing cluster, backed… https://x.com/rohanpaul_ai/status/1926935591918182482

Build AI agents with the Mistral Agents API | Mistral AI https://mistral.ai/news/agents-api

Introducing Agents API: your go-to tool for building tailored agents to solve complex real-world problems! https://x.com/MistralAI/status/1927364741162307702

Mistral Agents | Hacker News https://news.ycombinator.com/item?id=41184559

On SWE-bench Verified, a benchmark of real-world software engineering tasks, DeepSeek-R1-0528 scores 33% (±2%), competitive with some other strong models but well short of Claude 4. Performance can vary with scaffold; we use a standard scaffold based on SWE-agent. https://x.com/EpochAIResearch/status/1928489533886058934

Inference providers aren’t sleeping on the switch. https://x.com/fdaudens/status/1927834963509961041

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently https://x.com/i/web/status/1928071179115581671

Capgemini and SAP partner with Mistral to deploy AI for sensitive sectors | Reuters https://www.reuters.com/business/capgemini-sap-partner-with-mistral-deploy-ai-sensitive-sectors-2025-05-26/

“I am alarmed by the proposed cuts to U.S. funding for basic research, and the impact this would have for U.S. competitiveness in AI and other areas. Funding research that is openly shared benefits the whole world, but the nation it benefits most is the one where the research is…” https://x.com/i/web/status/1928099650269237359

ByteDance Seed introduces: Emerging Properties in Unified Multimodal Pretraining. “In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions…” https://x.com/iScienceLuvr/status/1925162040534208758

Exclusive: Nvidia to launch cheaper Blackwell AI chip for China after US export curbs, sources say | Reuters https://www.reuters.com/world/china/nvidia-launch-cheaper-blackwell-ai-chip-china-after-us-export-curbs-sources-say-2025-05-24/

Pretty impressive 7B VLM coming out of Xiaomi 🤓 ViT encoder w/ MLP and powered by their 7B Text backbone Compatible w/ Qwen VL arch so works across vLLM, Transformers, SGLang and Llama.cpp Bonus: it can reason and is MIT licensed 🔥 https://x.com/reach_vb/status/1928360066467439012

0528 looks at the big picture… The sycophancy is really too much https://x.com/teortaxesTex/status/1927895061452210456

DeepSeek-R1-0528 just dropped on Hugging Face https://x.com/_akhaliq/status/1927790819001389210

DeepSeek: DeepSeek V3 0324 – Provider Status | OpenRouter https://openrouter.ai/deepseek/deepseek-chat-v3-0324/providers?sort=latency

In case you didn’t catch this – if you make this one simple change to the chat template, you can switch reasoning on and off in @deepseek_ai https://x.com/i/web/status/1927892447809454455
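The tweet above doesn’t show the template change itself. A minimal sketch of one widely used approach for R1-style models: prefill an empty think block in the assistant turn so the model skips straight to the answer. The turn tokens below are illustrative of the R1 chat format, not copied from the tweet.

```python
# Hypothetical sketch: toggling reasoning via the chat template, assuming an
# R1-style format where the assistant turn normally opens with a <think> block.
def build_prompt(user_msg: str, reasoning: bool = True) -> str:
    prompt = f"<|User|>{user_msg}<|Assistant|>"
    if not reasoning:
        # Prefilling an empty think block makes the model treat its
        # reasoning phase as already finished.
        prompt += "<think>\n\n</think>\n\n"
    return prompt

print(build_prompt("What is 2+2?", reasoning=False))
```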

R1-0528 is out!🎉 https://x.com/i/web/status/1928084342732939642

Plus 13.8% on Aider Polyglot. It’s almost as if DeepSeek explicitly stated, in the conclusion of the R1 paper in January, that they would overhaul RL for coding. https://x.com/i/web/status/1927940397872947599

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: https://x.com/i/web/status/1928061589107900779

DeepSeek dropped DeepSeek R1 v2 this morning! We at Hyperbolic Labs now serve DeepSeek-R1-0528, the first inference provider serving this model on @huggingface. My vibe check: It seems to be the only model that consistently answers “what is 9.9 - 9.11?” correctly. 🐋 To whale: https://x.com/Yuchenj_UW/status/1927828675837513793
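Why that question is a good vibe check: models often compare “9.9” and “9.11” like version strings or dates, where 11 > 9. As decimals, 9.9 = 9.90, so the difference is positive:

```python
from decimal import Decimal

# Exact decimal arithmetic avoids any float-representation quibbles.
answer = Decimal("9.9") - Decimal("9.11")
print(answer)  # 0.79
```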

DeepSeek has maintained its status as amongst AI labs leading in frontier AI intelligence https://x.com/i/web/status/1928071183276159117

DeepSeek has released DeepSeek-R1-0528, an updated version of DeepSeek-R1. How does the new model stack up in benchmarks? We ran our own evaluations on a suite of math, science, and coding benchmarks. Full results in thread! https://x.com/EpochAIResearch/status/1928489524616630483

DeepSeek’s R1 update consolidates the lead of 🇨🇳 Chinese AI Labs in open weights intelligence https://x.com/i/web/status/1928226455424528519

Happy to share 💭 Mixture of Thoughts 💭 A curated, general reasoning dataset that trims down over 1M samples from public datasets to ~350k through an extensive set of ablations 🧑‍🍳 Models trained on this mix match or exceed the performance of DeepSeek’s distilled models — not… https://x.com/_lewtun/status/1927043160275923158

Hey guys! We noticed some of you sharing screenshots and links to our DeepSeek-V3-0526 article on @UnslothAI. The link was hidden and wasn’t meant to be shared publicly or taken as fact, but it seems a few of you were scraping through the site and uncovered it early! 😅 The… https://x.com/danielhanchen/status/1926966742519091327

Now imagine DeepSeek R2 The 150IQ Tsinghua grads are about to outsmart the 120IQ MIT midwits https://x.com/i/web/status/1928073407943319588

Ollama can now think! 🤔🤔🤔 For thinking models, and especially useful for very thoughtful models like DeepSeek-R1-0528, Ollama can separate the thoughts from the response. Thinking can also be disabled, which is useful for getting a direct response. This works across… https://x.com/ollama/status/1928543644090249565
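The tweet doesn’t include an invocation. A sketch of what the request looks like, assuming a local Ollama server and the `think` field that shipped with this feature; the model tag is illustrative:

```shell
# Ask for a direct answer with thinking disabled; set "think": true to get
# the model's reasoning back in a separate field of the response message.
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "What is 9.9 - 9.11?"}],
  "think": false,
  "stream": false
}'
```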

Live in Cline: DeepSeek-R1-0528 It’s showing significant benchmark gains, now matching OpenAI o3 in reasoning tasks (key for Plan mode). We’re excited to observe how these improvements impact real-world coding performance in Cline. https://x.com/i/web/status/1928140455923044636

There will also be a DeepSeek R1 0528 Qwen3 8B, matching Qwen3 235B Thinking in performance 🤯 Whale COOKED! https://x.com/i/web/status/1928058862923391260

DeepSeek R1 05-28 LiveBench results: – 8th overall, ahead of o4-mini, Gemini 2.5 Flash Preview and Qwen3-235B-A22B (its biggest competitors) – 1st on Data Analysis !!! – 3rd on Reasoning !! – 4th on Mathematics ! – 11th on Language – 20th on Instruction Following – 23rd on… https://x.com/i/web/status/1928173385399308639

DeepSeek MLA is afaik the first attention variant that can hit the compute-bound regime during inference decode, thanks to high arithmetic intensity (~256). If you’re paying $30k for an H100 and only max out the memory bandwidth and not the FLOPS during inference, you’re leaving $20k on the table. https://x.com/tri_dao/status/1928170652516725027
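A back-of-envelope roofline check of that claim. The H100 SXM spec numbers below are rounded vendor figures, not from the tweet; the ~256 FLOP/byte intensity for MLA is from the tweet, and the ~1 FLOP/byte for classic multi-head attention decode reflects reloading the whole KV cache per generated token.

```python
# Roofline: a kernel is compute-bound once its arithmetic intensity exceeds
# the hardware ridge point (peak FLOP/s divided by peak bytes/s).
peak_flops = 989e12           # H100 SXM dense BF16, FLOP/s (rounded)
peak_bw = 3.35e12             # H100 SXM HBM bandwidth, bytes/s (rounded)
ridge = peak_flops / peak_bw  # ~295 FLOP/byte

mla_intensity = 256           # per the tweet
mha_intensity = 1             # O(1) FLOP/byte for classic attention decode

print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"MLA achievable: {min(1, mla_intensity / ridge):.0%} of peak FLOPS")
print(f"MHA achievable: {min(1, mha_intensity / ridge):.1%} of peak FLOPS")
```

So MLA decode sits just under the ridge and can use most of the chip’s FLOPS, while classic attention decode is stuck at a fraction of a percent.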

We made dynamic 1-bit quants for DeepSeek-R1-0528 – 74% smaller, 713 GB down to 185 GB. Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE layers to RAM, allowing the non-MoE layers to fit in <24 GB VRAM at 16K context! The rest sits in RAM & disk. Quants here: https://x.com/danielhanchen/status/1928278088951157116
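A sketch of how that incantation slots into a llama.cpp invocation; the model filename is illustrative, while the tensor-override regex is the one from the tweet:

```shell
# The -ot/--override-tensor regex pins the MoE expert tensors to CPU RAM;
# -ngl 99 offloads the remaining (non-MoE) layers to the GPU; -c sets context.
./llama-cli -m DeepSeek-R1-0528-UD-IQ1_S.gguf \
  --override-tensor ".ffn_.*_exps.=CPU" \
  -ngl 99 -c 16384
```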

DeepSeek R1 Qwen3 8B knows it’s overthinking it 😂 https://x.com/i/web/status/1928119439737729482

The 4-bit DWQ of DSR1 Qwen3 8B is up on HF. Use the command below or use it in @lmstudio: https://x.com/awnihannun/status/1928125690173383098

Meta understood that copying DeepSeek piecemeal is not working, and decided to copy the org structure, creating an internal AGI division. A cruel rhyme from the Russian school program comes to mind: “And you, my friends, no matter your positions, / Will never be musicians!” https://x.com/i/web/status/1927944123358581182

Canada now has a minister of artificial intelligence. What will he do? | CBC News https://www.cbc.ca/news/politics/artificial-intelligence-evan-solomon-1.7536218

China approaches AI “like electricity, not nuclear weapons” vs the US (per The Economist). Key difference: 🇺🇸 Focus on building models 🇨🇳 Focus on practical applications Worth a read. https://x.com/fdaudens/status/1927020700302184634

Chinese scientists develop AI model to predict stellar flares – Chinadaily.com.cn https://www.chinadaily.com.cn/a/202505/28/WS68366271a310a04af22c1e46.html

Gemma 3 abliterated again ✂️✂️ Abliteration removes refusals from the models. This new and improved version targets refusals with more accuracy, based on previous work with Qwen 3. Here’s how to do it: https://x.com/i/web/status/1928030013275918464
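For readers new to the technique: abliteration estimates a “refusal direction” in activation space (typically a difference of means between refusing and complying prompts) and orthogonalizes the model’s weights against it. A toy NumPy sketch on synthetic data, not the linked recipe:

```python
import numpy as np

# Synthetic activations: "refusal" examples are shifted along dimension 0.
rng = np.random.default_rng(0)
d = 64
refuse_acts = rng.normal(size=(100, d)) + 2.0 * np.eye(d)[0]
comply_acts = rng.normal(size=(100, d))

# Difference-of-means refusal direction, normalized to unit length.
v = refuse_acts.mean(0) - comply_acts.mean(0)
v /= np.linalg.norm(v)

# Orthogonalize a weight matrix that writes to the residual stream:
# W_abl = (I - v v^T) W removes the component along v from every output.
W = rng.normal(size=(d, d))
W_abl = W - np.outer(v, v) @ W

x = rng.normal(size=d)
print(abs(v @ (W_abl @ x)))  # ~0: the layer can no longer write along v
```

In a real model this projection is applied to every matrix writing into the residual stream, which is why the model loses the ability to express the refusal feature at all rather than just being prompted around it.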

Just a few minutes later & the updated R1 is already available on some of our inference partners. All on the model page – beautiful! https://x.com/ClementDelangue/status/1927825872221774281

A new recipe for training multimodal models 👉 Mixed together various data types: text next to images, video frames after captions, then webpages, etc. This way the model learns to connect what it reads with what it sees. ByteDance proposed and implemented this idea in their… https://x.com/TheTuringPost/status/1927123359969468420

English-centric LLMs struggle with other languages, especially low-resource ones, when using standard fine-tuning. Cross-Lingual Optimization (CLO) efficiently transfers English-focused LLMs to target languages. It uses translated data to improve target-language skill while… https://x.com/rohanpaul_ai/status/1926585158238343518

Here’s what happens when a national institution builds its own digital intelligence: France’s Ministry of Culture just released 17K+ real users testing 30+ chatbots in French. Raw, diverse, and a goldmine for studying LLMs in the wild. https://x.com/fdaudens/status/1925909858433241411

Just FYI, all the reports from our RL experiments have not been on Qwen, they’ve been on Llama (DeepHermes 8B) – so hopefully that gives some additional assurance on the impact RL can have, and that it’s not random god-mode Qwen math improvements from randomness. https://x.com/i/web/status/1928184393035559191

There is a nice documentation for this release. You can see below the things that are supported. Persistent state across conversations, image generation, handoff capabilities, structured outputs, document understanding, citations, and more. https://x.com/omarsar0/status/1927367265789387087

Introducing Stargate UAE | OpenAI https://openai.com/index/introducing-stargate-uae/

Free AI for all? UAE becomes first to offer ChatGPT Plus to every resident and citizen – The Arabian Stories News https://www.thearabianstories.com/2025/05/25/free-ai-for-all-uae-becomes-first-to-offer-chatgpt-plus-to-every-resident-and-citizen/

Musk-Altman AI rivalry complicating Trump’s dealmaking in Middle East https://www.cnbc.com/2025/05/29/musk-altman-ai-rivalry-complicating-trumps-dealmaking-in-middle-east.html

OpenAI to set up shop in South Korea https://www.techinasia.com/news/openai-to-set-up-shop-in-south-korea

“kicking the qwen randomly makes it work better” like old TVs. I’m not reading any of it at this point. https://x.com/teortaxesTex/status/1927459880341782700

How does an LLM writing out this program (WITHOUT a code interpreter running the output) make things more accurate? Verified on Qwen 3 – a30b (below) Lots of interesting takeaways from the Random Rewards paper. NOT that RL is dead, but honestly far more interesting than that! https://x.com/hrishioa/status/1927974614585725353

Random rewards only work for Qwen models, not for other models; the improvements with random rewards were due to clipping, and disappear once clipping is removed. Conjecture by the authors: “Under clipping, random rewards don’t teach task quality – instead, they trigger a…” https://x.com/scaling01/status/1927424801938825294
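To see the mechanism being discussed, here is a minimal sketch of the PPO/GRPO-style clipped surrogate. With a zero-mean random advantage the unclipped term averages to zero, but clipping treats ratios outside [1−ε, 1+ε] asymmetrically, which is the kind of effect the authors conjecture nudges the model toward its own prior rather than toward task quality:

```python
# Standard clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped * advantage)

# A ratio far above the clip range has its positive advantage truncated...
print(clipped_surrogate(2.0, +1.0))  # 1.2, not 2.0
# ...while the same ratio with a negative advantage is penalized in full.
print(clipped_surrogate(2.0, -1.0))  # -2.0
```

The asymmetry means random positive and negative rewards do not cancel exactly once the policy has moved, so the gradient is biased even when the reward carries no task signal.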

Why are almost all RL experiments done on Qwen models? Kind of interesting, right… https://x.com/i/web/status/1927948317931000277

Worth thinking about how this paper reflects on every other RL paper using Qwen. If Qwen works with any random reward, how do we know if any of these papers actually does anything? https://x.com/nrehiew_/status/1927424673702121973

It’s interesting how the major LLM API vendors are converging on the following features: – Code execution: Python in a sandbox – Web search – like Anthropic, Mistral seems to use Brave – Document library aka hosted RAG – Image generation (FLUX for Mistral) – Model Context Protocol… https://x.com/simonw/status/1927378768873550310

Agent Connectors You can connect tools like web search and code execution to the agents. Other built-in tools include image generation and a document library (accessing documents from Mistral Cloud) for building agentic RAG systems. https://x.com/omarsar0/status/1927369763023396900

“Mecha Combat Arena” will be broadcast live on Chinese national TV channels on May 25th at 8:30 PM China time. https://x.com/TheHumanoidHub/status/1925586473077940407


Discover more from Ethan B. Holland
