AGI: AI News Week Ending 03/28/2025

“🚨 Gemini 2.5 Pro Exp dropped and it’s now #1 across SEAL leaderboards: 🥇 Humanity’s Last Exam 🥇 VISTA (multimodal) 🥇 (tie) Tool Use 🥇 (tie) MultiChallenge (multi-turn) 🥉 (tie) Enigma (puzzles) Congrats to @demishassabis @sundarpichai & team! 🔗 https://x.com/alexandr_wang/status/1904590438469951873

“Gemini 2.5 Pro Experimental on Livebench 🤯🥇 https://x.com/OfficialLoganK/status/1904925675892728179

“Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the @lmarena_ai leaderboard by a drastic 40 ELO margin. Only a handful of https://x.com/OriolVinyalsML/status/1904583691566727361

“Gemini 2.5 Pro is an awesome state-of-the-art model, no.1 on LMArena by a whopping +39 ELO points, with significant improvements across the board in multimodal reasoning, coding & STEM. You can try it out now in AI Studio https://x.com/demishassabis/status/1904587103805006218

“Today we are launching 2.5 Pro! I think it’s the best model in the world. State-of-the-art reasoning and great vibes (+39 ELO gap on lmsys!) 2.5 Pro improves in coding, stem, multimodal, instruction following, and lots more. Available in AI Studio & the Gemini App! https://x.com/jack_w_rae/status/1904583894458110218

“Introducing Gemini 2.5 Pro, the world’s most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc) Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon! https://x.com/OfficialLoganK/status/1904580368432586975

Gemini 2.5: Our newest Gemini model with thinking https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/

“Gemini 2.5 now does a much better job than most AIs writing plots & characters, thanks in part to its ability to “think” through details in advance. A fun test: “play a game of Fiasco (think a Coen Brothers movie, but as an RPG), make it interesting in terms of plot and prose” https://x.com/emollick/status/1904656593083396541

Google is rolling out Gemini’s real-time AI video features | The Verge https://www.theverge.com/news/634480/google-gemini-live-video-screen-sharing-astra-features-rolling-out

“Deep Dive Video: Complex image editing used to take hours — now Google’s Gemini 2.0 turns advanced ComfyUI & Photoshop workflows into simple text prompts. Here’s exactly how to try it (completely free). Chapters: 00:00 Conversational Editing with Google’s Multimodal AI 00:53 https://x.com/bilawalsidhu/status/1903172149764034605

“Google’s new Gemini 2.5 Pro Experimental takes the #1 position across a range of our evaluations that we have run independently Gemini 2.5 Pro is a reasoning model, it ‘thinks’ before answering questions. Google has released it as an experimental API in AI Studio only, and has https://x.com/ArtificialAnlys/status/1904923020604641471

“BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard – the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename “nebula”🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer https://x.com/lmarena_ai/status/1904581128746656099

Tracing the thoughts of a large language model \ Anthropic https://www.anthropic.com/research/tracing-thoughts-language-model

“Anthropic continues to do some of the best AI interpretability research out there” https://x.com/iScienceLuvr/status/1905354818451112352

Announcing ARC-AGI-2 and ARC Prize 2025 https://arcprize.org/blog/announcing-arc-agi-2-and-arc-prize-2025

“In addition to the ARC-AGI-2 release, we’re launching the ARC Prize 2025 competition, with a $700,000 grand prize for getting to 85%, as well as many other progress prizes. It will be live on Kaggle this week. We’re also reopening our public leaderboard for continuous benchmark” https://x.com/fchollet/status/1904266438959084003

“The key lesson of ARC-AGI is that if you spend the time and effort building and promoting a solid benchmark that has not been saturated, you can steer the efforts of billion dollar training runs in your direction Surprised more attempts are made to develop these sorts of tests” https://x.com/emollick/status/1904409850450174352

“As someone who has spent a lot of time thinking and building in AI education, and sees huge potential, I have been shown this headline a lot I am sure Alpha School is doing interesting things, but there is no deployed AI tutor yet that drives up test scores like this implies. https://x.com/emollick/status/1904193352045596745

AI ‘tutor’ boosts Texas private school test scores to top 2% nationally | Fox News https://www.foxnews.com/media/texas-private-schools-use-ai-tutor-rockets-student-test-scores-top-2-country

“When working with LLMs I am used to starting “New Conversation” for each request. But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still choose to use a Memory tool to write things down in between” https://x.com/karpathy/status/1902737525900525657

“Some similarities between our brains & LLMs: “The study revealed a remarkable alignment between the neural activity in the human brain’s speech areas and the model’s speech embeddings & between the neural activity in the brain’s language area and the model’s language embeddings.”” https://x.com/emollick/status/1903500731899944995

“This is a more significant paper than people seem to realize. You can give the AI a novel picture of a location and it can, with reasonable accuracy, tell you where it was taken even if it hasn’t “seen” that picture before This is a finding with a lot of real-world implications” https://x.com/emollick/status/1903135115334594871

A quote from Greg Kamradt: “Today we’re excited to launch ARC-AGI-2 to challenge the new frontier. ARC-AGI-2 is even harder for AI (in particular, AI reasoning systems), while maintaining the same relative ease for humans.” https://simonwillison.net/2025/Mar/25/greg-kamradt/

“ARC Prize 2025 is now live on Kaggle! $700k Grand Prize and $125k in progress prize.” https://x.com/fchollet/status/1904945818605650027

“Since o1, and especially since DeepSeek-R1, improved “reasoning” has basically become the standard for new LLMs. Speaking of which, Gemini 2.5 Pro just came out as the latest reasoning model offering, which ends up at the top of most benchmarks (notably Humanity’s Last Exam). https://x.com/rasbt/status/1904940955192418555

“2.5 Pro will come to Advanced users in the @GeminiApp. 💬 Simply select it in the model dropdown on desktop and mobile. It will also soon be available on @GoogleCloud’s #VertexAI platform. Developers and enterprises can start experimenting now in @Google AI Studio → https://x.com/GoogleDeepMind/status/1904581166755123463

“Gemini models, both 2.5 Pro and 2.0 Flash, have the fastest output speed compared to leading models https://x.com/ArtificialAnlys/status/1904923026820653435

“Gemini 2.5 Pro is live in the WebDev Arena! You can even see Gemini’s progress against its past. ⚔️ ✏️ “Animate layered triangles or polygons rotating in opposite directions with changing hues, creating a mesmerizing kaleidoscope effect”. https://x.com/lmarena_ai/status/1905002138172112971

“Gemini 2.5 Pro is a very good model, seems like a real step forward, in both metrics and practical use. I think because it is labelled 2.5 and was sort of quietly rolled out, people may miss how big a jump it is, but discussions are making me think others are feeling similarly.” https://x.com/emollick/status/1904756949284983200

“Introducing Gemini 2.5 Pro Experimental. The 2.5 series marks a significant evolution: Gemini models are now fundamentally thinking models. This means the model reasons before responding, to maximize accuracy — and it’s our best Gemini model yet. Blog -” https://x.com/NoamShazeer/status/1904581813215125787

“Built on our state-of-the-art Gemma models, TxGemma can understand and predict the properties of small molecules, chemicals, proteins and more. This could help scientists identify promising targets faster, predict clinical trial outcomes, and reduce overall costs. https://x.com/GoogleDeepMind/status/1905274926665208043

“Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks – meaning it can handle complex problems and give more accurate responses. Try it now → https://x.com/GoogleDeepMind/status/1904579660740256022

“Gemini 2.5 is a very impressive model so far. For example, it is only the third LLM (after Grok 3 and o3-mini-high) to be able to pull off “create a visually interesting shader that can run in twigl-dot-app make it like the ocean in a storm,” and I think the best so far. https://x.com/emollick/status/1904700257822540076

“Gemini 2.5 Pro is the most skilled model in the world, please let us know if you have any issues using it! ♊️♊️♊️” https://x.com/zacharynado/status/1904641052096754156

“Gemini 2.5 Pro Released and it’s beating every other model across the board🤯 💡Includes reasoning with Chain of Thought 💻Top 1 on Aider code editing leaderboard 🆓The model is free for everyone to use https://x.com/casper_hansen_/status/1904590489128440163

“Today we’re releasing an experimental version of Gemini 2.5 Pro. 💡2.5 Pro shows strong reasoning and improved code capabilities, with state-of-the-art performance across a range of benchmarks. 📈It’s topped @lmarena_ai’s leaderboard by a huge margin https://x.com/Google/status/1904581629017735261

“Google is winning by so much. We have always been the best in AI. 🔥 Gemini 2.5 pro is the best model in the world 👇” https://x.com/YiTayML/status/1904598794278494272

“Google JUST dropped Gemini 2.5 Pro their “most intelligent AI model yet”. This new “thinking model” tops LMArena by a significant margin and crushes DeepSeek-R1, Grok 3 and Claude 3.7 in math, science and coding benchmarks. Ships with 1M token context (2M coming), can process https://x.com/bilawalsidhu/status/1904579366232695128

“📈 Arena Trend Update (Oct ’24 – Mar ’25) The past month saw a tight race at the top between @xAI and @OpenAI — and this week, a new shift! 😮 @GoogleDeepMind released Gemini 2.5 Pro and it pushed the Arena scoreboard to new highs 📈 #aitrends animation credit: Peter Gostev https://x.com/lmarena_ai/status/1905308013663281176

“Interestingly, if you look at almost every investment decision by venture capital, they don’t really believe in AGI either, or else can’t really imagine what AGI would mean if they do believe in it.” https://x.com/emollick/status/1903902120819999082

[2501.16496] Open Problems in Mechanistic Interpretability https://arxiv.org/abs/2501.16496

Meme humor – “OpenAI has reached AGI” https://x.com/scaling01/status/1904694932909990153

“AI models are trained, not directly programmed, so we don’t understand how they do most of the things they do. Our new interpretability methods allow us to trace the steps in their “thinking”. Read the blog post: https://x.com/AnthropicAI/status/1905303838417973669

“OpenAI hiring a ‘Mechanical Architect, Robotics’ to develop the end-to-end mechanical architecture of robotic systems. The Robotics team is “focused on unlocking general-purpose robotics and pushing toward AGI-level intelligence in dynamic, real-world settings.” https://x.com/TheHumanoidHub/status/1904425238819209338

“Today, we’re releasing ARC-AGI-2. It’s an AI benchmark designed to measure general fluid intelligence, not memorized skills – a set of never-seen-before tasks that humans find easy, but current AI struggles with. It keeps the same format as ARC-AGI-1, while significantly https://x.com/fchollet/status/1904265979192086882

“We’re recruiting researchers to work with us on AI interpretability. We’d be interested to see your application for the role of Research Scientist ( https://x.com/AnthropicAI/status/1905303862883365370

“the dialectics of decentralization: all of the world’s data for one model one model available on every computer DeepSeek will build AGI and we must help them. https://x.com/teortaxesTex/status/1904851047270559935

“🧨 @ylecun dropping hard truth about actual AI capabilities: “The idea that we’re going to have a country of genius in the data center, that’s complete BS.” Scaling up LLMs will have a big impact, but it won’t create the next Einstein. They’re retrievers, not creators. https://x.com/fdaudens/status/1902791678395859249