Ethan B. Holland

Over 56,100 manually organized AI links and counting

Science and Medicine: AI News Week Ending 05/16/2025

May 16, 2025

Image created with GPT Image 1. Image prompt: rose cluster centre with thin color-code strip footer, PCL muted floral palette, minimalist graphic design inspired by New Order’s ‘Power, Corruption & Lies’, metaphor for microscope circuits illuminating data, flat color, subtle texture, 1980s Saville typography style

AlphaEvolve: A coding agent for scientific and algorithmic discovery https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

New HealthBench eval! Very excited we (@OpenAI) are investing in AI for health, a defining use case for AGI. Favorite plot is how the performance-cost frontier has improved over time. Congrats @rahularoradfs @thekaransinghal & team! Follow them for more exciting work to come https://x.com/_jasonwei/status/1922002699240775994

We also applied AlphaEvolve to over 50 open problems in analysis ✍️, geometry 📐, combinatorics ➕ and number theory 🔂, including the kissing number problem. 🔵 In 75% of cases, it rediscovered the best solution known so far. 🔵 In 20% of cases, it improved upon the previously https://x.com/GoogleDeepMind/status/1922669334142271645

Discovering novel algorithms with AlphaTensor – Google DeepMind https://deepmind.google/discover/blog/discovering-novel-algorithms-with-alphatensor/

In September 2024, physicians working with AI did better at the HealthBench doctor benchmark than either AI or physicians alone. With the release of o3 and GPT-4.1, AI answers are no longer improved on by physicians. Also, error rates appear to be dropping for newer AI models. https://x.com/emollick/status/1922145507461197934

code search has been a major use case for deep research — excited to launch our Github integration so it can now directly search your repos https://x.com/isafulf/status/1920572177335669140

We applied AlphaEvolve to a fundamental problem in computer science: discovering algorithms for matrix multiplication. It managed to identify multiple new algorithms. This significantly advances our previous model AlphaTensor, which AlphaEvolve outperforms using its better and https://x.com/GoogleDeepMind/status/1922669331336384515
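For readers wondering what "discovering algorithms for matrix multiplication" means in practice, here is a minimal sketch of the classic textbook example: Strassen's 1969 scheme, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the usual 8. This is not one of AlphaEvolve's or AlphaTensor's algorithms; it only illustrates the kind of object such searches look for. The function names are illustrative.

```python
def naive_2x2(a, b):
    """Schoolbook product of two 2x2 matrices: 8 scalar multiplications."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def strassen_2x2(a, b):
    """Strassen's scheme for 2x2 matrices: 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products; each line performs exactly one multiplication.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(strassen_2x2(a, b) == naive_2x2(a, b))
```

Saving one multiplication looks minor, but applied recursively to matrix blocks it lowers the asymptotic cost below cubic, which is why schemes with fewer multiplications for small fixed sizes are worth searching for.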

FDA Announces Completion of First AI-Assisted Scientific Review Pilot and Aggressive Agency-Wide AI Rollout Timeline | FDA https://www.fda.gov/news-events/press-announcements/fda-announces-completion-first-ai-assisted-scientific-review-pilot-and-aggressive-agency-wide-ai

Ex-Google CEO Eric Schmidt-backed FutureHouse dropped five ‘AI Scientist’ agents: —Crow for general research —Falcon for deep literature reviews —Owl for identifying previous research —Phoenix for chemistry workflows —Finch for discovery in biology https://x.com/adcock_brett/status/1921597086002287090

Announcing the newest releases from Meta FAIR. We’re releasing new groundbreaking models, benchmarks, and datasets that will transform the way researchers approach molecular property prediction, language processing, and neuroscience. 1️⃣ Open Molecules 2025 (OMol25): A dataset https://x.com/AIatMeta/status/1922690879279808572

Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. https://x.com/OpenAI/status/1921983050138718531

Introducing Collaborative Reasoner: a framework to improve collaborative reasoning in language models. Collaborative Reasoner paves the way for developing social agents that can partner with humans and other agents. Read the research paper and download the code. https://x.com/AIatMeta/status/1921978043998077011

One of the great ironies of AI writing is that the only people who can detect it with accuracy are people who use AI for writing a lot (at least if you take a majority vote among five such people). Non-users are no better than chance, and AI detectors are also less accurate. https://x.com/emollick/status/1920588718949159355
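A rough sketch of why a majority vote among five raters can be more accurate than any single rater. It assumes each rater is independently correct with the same probability p, which is a simplification not taken from the study; the function name and the example values of p are illustrative only.

```python
from math import comb


def majority_vote_accuracy(p, n=5):
    """Probability that a strict majority of n independent raters is correct.

    Assumes every rater is right with the same probability p and that
    raters err independently (a simplifying assumption, not a study result).
    """
    needed = n // 2 + 1  # 3 of 5 for the default panel size
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical individual accuracies, chosen only for illustration.
    for p in (0.5, 0.7, 0.9):
        print(f"individual accuracy {p:.2f}, panel of five: {majority_vote_accuracy(p):.3f}")
```

Under these assumptions, raters at chance level stay at chance when pooled, while raters who are individually better than chance become more accurate as a panel.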

This feels hard to describe! Our Research team is cooking. @GoogleDeepMind AlphaEvolve is an evolutionary coding agent using an ensemble of Gemini 2.0 Flash & Pro to discover and optimize algorithms that solve complex problems in mathematics and computing. Compared to other SWE https://x.com/_philschmid/status/1922913381746352188

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across @Google. 🧵 https://x.com/GoogleDeepMind/status/1922669321559347498

We’re excited to keep developing AlphaEvolve. This system and its general approach has potential to impact material sciences, drug discovery, sustainability and wider technological and business applications. Find out more ↓ https://x.com/GoogleDeepMind/status/1922669336101065183

We’ve just released HealthBench — a new eval for AI systems for health. Developed with 262 physicians who have practiced in 60 countries. https://x.com/gdb/status/1921987974356443595

II-Medical – Intelligent Internet https://ii.inc/web/blog/post/ii-medical

Mathematical discoveries from program search with large language models | Nature https://www.nature.com/articles/s41586-023-06924-6

Congrats to the AlphaEvolve, Gemini and Science teams!! Read more about it here: https://x.com/demishassabis/status/1922855470374572051

Wild breakthrough on Math after 56 years… [Exclusive] – YouTube https://www.youtube.com/watch?v=vC9nAosXrJw

OpenAI introduces HealthBench, a new open-source LLM benchmark for health! Across frontier models, o3 is the best performing model with a score of 60%, followed by Grok 3 (54%) and Gemini 2.5 Pro (52%). A deeper dive: HealthBench consists of 5,000 synthetically generated https://x.com/iScienceLuvr/status/1922013874687246756

Introducing HealthBench | OpenAI https://openai.com/index/healthbench/

Releasing the OpenAI to Z Challenge — using o3/o4 mini and GPT 4.1 models to discover previously unknown archaeological sites: https://x.com/gdb/status/1923105670464782516

Announcing the OpenAI to Z Challenge: use OpenAI o3, o4-mini, or GPT-4.1 to find previously unknown archaeological sites in the Amazon. Use #OpenAItoZ to share your progress. https://x.com/OpenAIDevs/status/1923062948060168542

OpenAI to Z Challenge | OpenAI https://openai.com/openai-to-z-challenge/

FutureHouse releases AI tools it claims can accelerate science | TechCrunch https://techcrunch.com/2025/05/01/futurehouse-releases-ai-tools-it-claims-can-accelerate-science/

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other “zero” models in math & coding domains. 🧵 1/ https://x.com/_AndrewZhao/status/1919920459748909288

Deep Research can now connect to your organization’s Sharepoint: https://x.com/gdb/status/1922315410600312932

Mass General Brigham’s researchers introduced FaceAge, an AI tool that can estimate cancer survival outcomes with facial photos. The AI estimates biological age from photos, helping teams guide their treatment levels accordingly https://x.com/rowancheung/status/1922201339318206495

Did you know your face can reveal your biological age? @MGBResearchNews has developed FaceAge, an #AI algorithm that predicts biological age survival outcomes for patients with cancer using a single photo. Patients with cancer appeared five years older than their actual age, and https://x.com/MassGenBrigham/status/1920607240865698080

Current radiology report models lack expert-like structured reasoning. They fail to link visual findings to precise anatomical locations, hindering clinical trust. BoxMed-RL solves this with a two-phase framework. It first instills radiologist-like thinking and visual https://x.com/rohanpaul_ai/status/1921511349978632479

After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque. So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS https://x.com/NeelNanda5/status/1921928364790833651

The @vercel Chat SDK now features stream resumption. This makes AI conversations resilient to network hiccups and reloading or sharing a chat mid-generation. This is especially valuable for long responses (e.g.: Deep Research). No proprietary APIs, no sticky load balancing, just https://x.com/rauchg/status/1921168985900372081