OpenAI: AI News Week Ending 05/16/2025

May 16, 2025

Image created with GPT Image 1. Image prompt: offset cyan grid on midnight-blue field, Movement cyan/blue palette, minimalist graphic design inspired by New Order’s ‘Movement’, metaphor for frontier research lightning bolt, flat color, subtle texture, 1980s Saville typography style

New HealthBench eval! Very excited we (@OpenAI) are investing in AI for health, a defining use case for AGI. Favorite plot is how the performance-cost frontier has improved over time. Congrats @rahularoradfs @thekaransinghal & team! Follow them for more exciting work to come https://x.com/_jasonwei/status/1922002699240775994

In September 2024, physicians working with AI did better on the HealthBench doctor benchmark than either AI or physicians alone. With the release of o3 and GPT-4.1, AI answers are no longer improved on by physicians. Also, error rates appear to be dropping for newer AI models. https://x.com/emollick/status/1922145507461197934

Having an API-only model breaks the flow of how I see a lot of AI adoption in organizations, which is that people develop and experiment through the chat interface and then build products based on that. OpenAI’s success comes from ChatGPT users. Weird to have one model they can’t access. https://x.com/emollick/status/1922340589317578990

Netflix started testing its long-rumored generative AI search capabilities. The feature, powered by OpenAI’s models, will allow users to search content by describing their mood or what they are looking for in natural language. https://x.com/rowancheung/status/1920384611890127216

code search has been a major use case for deep research; excited to launch our GitHub integration so it can now directly search your repos https://x.com/isafulf/status/1920572177335669140

Not bad from GPT-4.1: “create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future.” First go, no errors. https://x.com/emollick/status/1922749136996114771

o3 is shockingly good at this stuff, nailing the line between parody & nostalgia: “create a screengrab from a [vaguely creepy 1970s children’s television/1980s action movie/1990s sitcom/2000s Cartoon Network] show that never existed and include the close captioned text” https://x.com/emollick/status/1921011458239820275

what ilya saw https://x.com/andrew_n_carr/status/1922031056225439852

Welcome @fidjissimo! Fidji has been an amazing friend and colleague, with unique insights and advice on OpenAI. I’m super excited to work with her to deliver AGI that benefits all of humanity. https://x.com/gdb/status/1920344903466529193

A common question is “can an AI make money?” This benchmark, where AIs run a simulated vending machine over time, suggests yes, with an important caveat. On average, Claude 3.5 & o3-mini beat a human, but they are high in variance, and fail at random times for complex reasons. https://x.com/emollick/status/1921048218353197470

The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis | Humanities and Social Sciences Communications https://www.nature.com/articles/s41599-025-04787-y

“The meta-analysis results of this study confirm the positive impacts of ChatGPT on learning performance, learning perception, and higher-order thinking.” Plenty of caveats, but a meta-analysis of all 51 experimental papers suggests ChatGPT helps learning when used appropriately. https://x.com/emollick/status/1921226900871037188

Safety evaluations hub | OpenAI https://openai.com/safety/evaluations-hub/

You can now export your deep research reports as well-formatted PDFs—complete with tables, images, linked citations, and sources. Just click the share icon and select ‘Download as PDF.’ It works for both new and past reports. https://x.com/OpenAI/status/1921998278628901322

Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take https://x.com/OpenAIDevs/status/1920531856426143825

Coded up by GPT-4.1, rolling out today in ChatGPT. https://x.com/OpenAIDevs/status/1922709921772036164

Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. https://x.com/OpenAI/status/1921983050138718531

Microsoft dropped Pages, a new Copilot feature that allows users to collaborate with the AI assistant on answers. It works pretty much like ChatGPT Canvas, but doesn’t seem to have coding capabilities. https://x.com/rowancheung/status/1921815704900112546

You can now connect GitHub repos to deep research in ChatGPT. 🐙 Ask a question and the deep research agent will read and search the repo’s source code and PRs, returning a detailed report with citations. Hit deep research → GitHub to get started. https://x.com/OpenAIDevs/status/1920556386083102844

We’ve added support for the Responses API in the Evals API and dashboard. 🧭 https://x.com/OpenAIDevs/status/1923048126002102530

Benchmarking ARI: 76% Win Rate Over OpenAI Deep Research, According to OpenAI’s Model | You.com https://you.com/articles/o3-mini-judges-ari-enterprise-winner-over-openai-deep-research

SoftBank Stargate Venture With OpenAI Snags on Tariff Fears – Bloomberg https://www.bloomberg.com/news/articles/2025-05-12/softbank-stargate-venture-with-openai-hits-snags-on-tariff-fears?embedded-checkout=true

Plus, Pro, & Team users will be able to access GPT-4.1 via the “more models” dropdown in the model picker. Enterprise & Edu users will get access in the coming weeks. We’re also introducing GPT-4.1 mini, replacing GPT-4o mini, in ChatGPT for all users. https://x.com/OpenAI/status/1922707556402618533

By popular request: ✨GPT 4.1✨ is now available in ChatGPT for Plus/Pro/Teams subscribers (and soon to Enterprise/Edu). We built it for developers, so it’s very good at coding and instruction following. Give it a try! Also, ✨GPT 4.1 mini✨ is replacing GPT 4o mini everywhere. https://x.com/kevinweil/status/1922732062345142306

OpenAI negotiates with Microsoft for new funding and future IPO, FT reports | Reuters https://www.reuters.com/business/openai-negotiates-with-microsoft-unlock-new-funding-future-ipo-ft-reports-2025-05-11/

I don’t think we’ve fully appreciated how wild natively multimodal image generation is with GPT-4o and Gemini. This was one prompt. It used to be a whole ass ComfyUI workflow, with a variable hit rate; now it just works. Legit the closest thing to a “graphic designer as an https://x.com/bilawalsidhu/status/1920277002935755135

“o3, create a diagram explaining why these are indeed very good tools” https://x.com/emollick/status/1921433141123670175

o3 now one-shots extremely New Yorker-y original New Yorker cartoons, and sometimes the punchlines are actually kind of okay. And most of them involve therapists or cellphones or both, which feels very on-brand. https://x.com/emollick/status/1920700991298572682

Building, launching, and scaling ChatGPT Images https://simonwillison.net/2025/May/13/launching-chatgpt-images/

Has OpenAI made any announcements about why GPT-4o image generation infuses everything with a sepia tone? https://x.com/emollick/status/1920344525823701140

Just days after launching GitHub integration, OpenAI has enhanced ChatGPT Deep Research with PDF export support. This allows users to export their reports as well-formatted PDFs, complete with tables, images, linked citations, and sources. https://x.com/rowancheung/status/1922201261962678414

Building, launching, and scaling ChatGPT Images https://newsletter.pragmaticengineer.com/p/chatgpt-images

Microsoft and OpenAI may be renegotiating their partnership | TechCrunch https://techcrunch.com/2025/05/11/microsoft-and-openai-may-be-renegotiating-their-partnership/

In talking to smart people in the real world, it is clear how much the AI labs have undermined use with naming & unclear instructions: “Why is 4o sometimes worse than o3, the number is higher?” “If I want to give it data, should I use a PDF or Excel?” “What does Canvas mean?” https://x.com/emollick/status/1922355618918048172

So excited to work with @fidjissimo – she’s a product visionary! https://x.com/markchen90/status/1920353685156016488

.@fidjissimo is an amazing leader and this is a huge get. https://x.com/saranormous/status/1920352615839211881

gpt-4.1 landing in chatgpt today!! we were initially planning on keeping this model api only but you all wanted it in chatgpt 🙂 happy coding! https://x.com/michpokrass/status/1922716587468984689

OpenAI uses FastAPI to serve ChatGPT and you’re complaining about Python and how FastAPI cannot support your 10 user CRUD app https://x.com/nrehiew_/status/1922668335960924579

You can now use @OpenAI’s Deep Research with GitHub repos. It takes a while, but the results are very impressive! https://x.com/ericciarla/status/1922058961081049576

OpenAI Expands Leadership with Fidji Simo | OpenAI https://openai.com/index/leadership-expansion-with-fidji-simo/

OpenAI also launched a GitHub connector for ChatGPT. The feature will allow users to connect their repos and use ChatGPT’s Deep Research to read and search source code and PRs, creating a detailed report with citations. https://x.com/adcock_brett/status/1921596972735111576

Can AI developers continue scaling up reasoning models like o3? @justjoshinyou13 reviews the available evidence in this week’s Gradient Update, and it appears that the rapid scaling of reasoning training, like the jump from o1 to o3, will likely slow down in a year or so. https://x.com/EpochAIResearch/status/1920932361136447740

GPT-4.1 has been a surprise hit, now available in ChatGPT: https://x.com/gdb/status/1922727473164227001

OpenAI quietly released their GPT-4.1 Prompting Guide. It’s a must read if you’re using agents or LLMs. https://x.com/LiorOnAI/status/1922306849795101044

‘AI models are capable of novel research’: OpenAI’s chief scientist on what to expect https://www.nature.com/articles/d41586-025-01485-2

OpenAI just dropped a GitHub connector for ChatGPT’s Deep Research. Now you can plug into GitHub repos to search code, scan PRs, and auto-generate detailed, citation-backed reports, all inside ChatGPT. Dev workflows just got smarter. https://x.com/rowancheung/status/1921815660214034672

Here’s a handy guide on how to get started, using an example of comparing gpt-4.1-mini with gpt-4o-mini on stored responses. https://x.com/OpenAIDevs/status/1923048127826722849

Reinforcement fine-tuning now available for o4-mini: https://x.com/gdb/status/1920708742477119585

This is a huge upgrade for ALL ChatGPT free users! GPT-4.1-mini replaces GPT-4o mini. It’s honestly much, much better. https://x.com/scaling01/status/1922715792849674568

Super excited to ship Reinforcement Fine-Tuning (RFT) on o4-mini today 🎉 Our aim is to make RL as flexible & accessible as we can. Here’s a bit on what we built and why we’re pumped to let you customize our frontier reasoning models. https://x.com/john__allard/status/1920585315405676943

By popular request, GPT-4.1 will be available directly in ChatGPT starting today. GPT-4.1 is a specialized model that excels at coding tasks & instruction following. Because it’s faster, it’s a great alternative to OpenAI o3 & o4-mini for everyday coding needs. https://x.com/OpenAI/status/1922707554745909391

i built a sports card pricing tool using @Replit + @OpenAI the kicker? i went from idea to prototype in 2 hours 🤯 learn my exact process and how i built it in < 12 minutes you can just code things https://x.com/billyjhowell/status/1917026804348465317

Super excited to work with @fidjissimo even more closely. Welcome to @OpenAI! It’s fun and wild and inspiring. https://x.com/kevinweil/status/1920348319856943114

Famously, GPT-4o makes up citations to papers (though error rates appear far lower for citations generated by Deep Research models). How often does it do that? This clever large-scale study gives us a clear picture. The AI is also biased towards shorter titles & famous papers. https://x.com/emollick/status/1920319164993933511

OpenAI introduces HealthBench, a new open-source LLM benchmark for health! Across frontier models, o3 is the best performing model with a score of 60%, followed by Grok 3 (54%) and Gemini 2.5 Pro (52%) A deeper dive: HealthBench consists of 5,000 synthetically generated https://x.com/iScienceLuvr/status/1922013874687246756

Introducing HealthBench | OpenAI https://openai.com/index/healthbench/

ChatGPT for learning: https://x.com/gdb/status/1921259212170055919

What’s the carbon footprint of using ChatGPT? https://www.sustainabilitybynumbers.com/p/carbon-footprint-chatgpt

OpenAI’s Stargate project reportedly struggling to get off the ground, thanks to tariffs | TechCrunch https://techcrunch.com/2025/05/12/openais-stargate-project-reportedly-struggling-to-get-off-the-ground-thanks-to-tariffs/

Releasing the OpenAI to Z Challenge, using o3/o4 mini and GPT 4.1 models to discover previously unknown archaeological sites: https://x.com/gdb/status/1923105670464782516

Announcing the OpenAI to Z Challenge: use OpenAI o3, o4-mini, or GPT-4.1 to find previously unknown archaeological sites in the Amazon. Use #OpenAItoZ to share your progress. https://x.com/OpenAIDevs/status/1923062948060168542

OpenAI to Z Challenge | OpenAI https://openai.com/openai-to-z-challenge/

Good debate on idea generation and AI: 1) Experimental paper finds that using the old GPT-3.5 helps people generate better ideas 2) Response paper finds that the AI’s ideas are all quite similar to each other 3) Response to that argues that it may not matter as results are good https://x.com/emollick/status/1922717797848613068

OpenAI for programming a robot: https://x.com/gdb/status/1921963245071475107

Deep Research can now connect to your organization’s SharePoint: https://x.com/gdb/status/1922315410600312932