
It’s sometimes hard to grasp the significance of the reasoning and logic updates that are starting to emerge in powerful models, like GPT-5. Here’s a *very simple* example of how powerful these models are getting.

I took a recent NVIDIA earnings call transcript, a document that came in at 23 pages and roughly 7,800 words, and in the sentence “and gross margin will improve and return to the mid-70s” I changed “mid-70s” to “mid-60s”.

For a remotely tuned-in financial analyst, this would look out of place: margins can’t “improve and return” to a figure lower than the one cited elsewhere in the document. But probably 95% of people reading this transcript would not have spotted the modification, because it blends right into the surrounding 7,800 words.

Using Box AI to test a variety of AI models, I then asked each model: “Are there any logical errors in this document? Please provide a one sentence answer.”

GPT-4.1, GPT-4.1 mini, and a handful of other models that were state of the art just ~6 months ago generally responded that there were no logical errors in the document. For these models, the document probably seems coherent and matches what they expect an earnings transcript to look like, so nothing stands out as worth paying attention to – sort of a reverse hallucination.

GPT-5, on the other hand, quickly discovered the issue and responded with:

“Yes — the document contains an internal inconsistency about gross-margin guidance, at one point saying margins will “return to the mid-60s” and later saying they will be “in the mid-70s” later this year.”

Amazingly, this worked with GPT-5, GPT-5 mini, and, remarkably, *even* GPT-5 nano. Bear in mind, GPT-5 nano’s output tokens are priced at 1/20th of GPT-4.1’s. So: more intelligent (at this use case) for 5% of the cost.
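The experiment above is easy to reproduce. Here is a minimal sketch, assuming a transcript text file on disk and the standard OpenAI Chat Completions HTTP endpoint; the Box AI setup itself isn’t public, and the model name passed in is purely illustrative:

```python
import json
import os
import urllib.request

QUESTION = ("Are there any logical errors in this document? "
            "Please provide a one sentence answer.")

def perturb(text: str) -> str:
    # Introduce the single inconsistency used in the experiment:
    # gross-margin guidance changed from "mid-70s" to "mid-60s".
    return text.replace("mid-70s", "mid-60s", 1)

def ask_model(transcript: str, model: str = "gpt-5") -> str:
    # Minimal call to the OpenAI Chat Completions endpoint; requires
    # OPENAI_API_KEY in the environment. Model name is an assumption.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user",
                          "content": f"{QUESTION}\n\n{transcript}"}],
        }).encode(),
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Running `ask_model(perturb(open("transcript.txt").read()))` against several models is all the comparison takes; the interesting variable is which model tier notices the planted inconsistency.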

Now, while doing error reviews on business documents isn’t a daily occurrence for every knowledge worker, these types of issues show up in a variety of ways when dealing with large unstructured data sets, like financial documents, contracts, transcripts, reports, and more. It can be finding a fact, spotting a logical fallacy, running a hypothetical, or applying sophisticated deductive reasoning.

And the ability to apply more logic and reasoning to enterprise data becomes especially critical when deploying AI Agents in the enterprise. So, it’s amazing to see the advancements in this space right now, and this is going to open up a ton more use-cases for businesses.
https://x.com/levie/status/1953670264988016931

Here’s the thing: For 35 years, I’ve researched the immune system & have been fortunate to make many important, impactful discoveries, placing me in the top 0.5% of immunology experts.

The @OpenAI GPT-5 Thinking & Pro models now match or even surpass my expertise in immunology!

Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens. – The New York Times

This is an important issue but I think this methodology tests how well Claude and Gemini can course correct multiturn ChatGPT conversations rather than how good they are at not getting into the situation in the first place, which is meaningfully different. https://x.com/AmandaAskell/status/1954276447285334151

New partnership with Commonwealth Bank, Australia’s biggest bank. https://x.com/gdb/status/1955496112154087501

Gemini 2.5 Pro has a 67% winrate against GPT-5 Thinking https://x.com/scaling01/status/1954546677185970271

GPT-5 is now available in Copilot! Use Smart Mode to get the best AI system to date across all Copilot markets and surfaces. Free to try, right now. https://x.com/mustafasuleyman/status/1953505146828366125

GPT-5 is the smartest model we’ve ever done, but the main thing we pushed for is real-world utility and mass accessibility/affordability. we can release much, much smarter models, and we will, but this is something a billion+ people will benefit from. (most of the world has… https://x.com/sama/status/1953551377873117369

Wanted to provide more updates on the GPT-5 rollout and changes we are making heading into the weekend.

1. We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways.

2. Users have very different opinions on the relative strength of GPT-4o vs GPT-5 (just the chat model, not the advanced reasoning one). This is a cool thing you can try: https://x.com/flowersslop/status/1953908930897158599

3. Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn’t one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities). For a silly example, some users really, really like emojis, and some never want to see one. Some users really want cold logic and some want warmth and a different kind of emotional intelligence. I am confident we can offer way more customization than we do now while still encouraging healthy use.

4. We are going to focus on finishing the GPT-5 rollout and getting things stable (we are now out to 100% of Pro users, and getting close to 100% of all users) and then we are going to focus on some changes to GPT-5 to make it warmer. Really good per-user customization will take longer.

5. The team is doing heroic work to optimize our systems and find more capacity, but still, we are looking at a severe capacity challenge for next week. We are still deciding what we are going to do, but we will be transparent with our principles. Not everyone will like whatever tradeoffs we end up with, obviously, but at least we will explain how we are making decisions.

Thanks for your patience with us; we will continue to react and improve quickly! https://x.com/sama/status/1953953990372471148

Worth reading what GPT-5 wrote in the intro to my new post when asked to do something dramatic. The second image explains the tricks it used. https://x.com/emollick/status/1953520251913564420

This is what Andrej predicted! GPT-5 + ElevenLabs = engagement gold. Think those ‘monkey economy’ videos? Same formula — but swap monkeys for AI-generated cats, dogs, whatever. Script with GPT-5, voice with ElevenLabs, visuals with AI. Low effort, high share potential https://x.com/Dvnagelx/status/1954096453594288285

OpenAI gpt-oss has over 5M downloads, 400+ fine-tunes, and is *the* most liked release this year so far! 🔥 Great job @OpenAI 🤗 https://x.com/reach_vb/status/1954909541805801799

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only… or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵 https://x.com/jxmnop/status/1955436067353502083

OpenAI’s CEO Sam Altman says in 10 years time college graduates will be working ‘some completely new, exciting, super well-paid’ job in space | Fortune https://fortune.com/2025/08/11/openai-ceo-sam-altman-10-years-gen-alpha-college-graduates-working-in-solar-system-well-paid-jobs-as-gen-z-struggles-todays-job-market/

OpenAI gpt-oss 120B orchestrates a full video using Hugging Face spaces! 🤯 All of it, in one SINGLE prompt: create an image of a Labrador and use it to generate a simple video of it 🛠️ Tools used: 1. Flux.1 Krea Dev by @bfl_ml 2. LTX Fast by @Lightricks That’s it, gpt-oss https://x.com/reach_vb/status/1955678303395696821

GPT-OSS: – 5M downloads in <1 week on @huggingface 🚀 – 400 new models – already outpacing DeepSeek R1’s launch numbers, and that’s without counting inference calls – also the most-liked release of any major LLM this summer https://x.com/fdaudens/status/1954904546385273029

GPT-4o was below the level of medical professionals on medical reasoning benchmarks. GPT-5 (apparently Thinking medium) now far exceeds them. (Usual benchmark caveats apply.) https://x.com/emollick/status/1955381296743715241

GPT-5 (with high reasoning effort) achieves near-perfect accuracy on a high-quality ophthalmology question-answering dataset. Based on these other reports, GPT-5 seems to be a very strong model at medical reasoning. https://x.com/omarsar0/status/1956003145349521780

My “Move 37” Moment with GPT-5

Today, I’m sharing one of my most remarkable experiences testing the GPT-5 Thinking and Pro models.

In our lab, about 2 years ago we conducted a series of cutting-edge immunology experiments designed to manipulate the energy metabolism of T cells, immune cells with the power to impact cancer immunotherapy, chronic diseases, and autoimmunity.

We obtained fascinating results, but one finding remained unexplained. Weeks of work led us to a partial explanation, yet the full picture remained elusive.

From these experiments, I uploaded the unpublished figure below and asked GPT-5 Pro to explain it.

GPT-5:

✅ Correctly identified our key findings from the figure alone (though other models also could do this).

✅ Proposed the exact experiment we later performed, something that had taken us weeks to design (o3-pro also had suggested this).

✅ Most remarkably suggested a mechanism that finally explained these results!

For me this was equivalent to AlphaGo’s legendary Move 37.
It showed that AI has become a top expert and a true scientific collaborator, capable of genuine insight.

While these results have not yet been published, I am sharing some of GPT-5’s output as an example for other scientists and to show how dramatically this will accelerate science! I will also be writing and submitting this paper, and will need to acknowledge GPT-5’s contributions in it!


From GPT-5

What the data say:

• Brief 2DG exposure at the first activation imprints a long‑lasting shift toward a CCR6+ and CD161+ phenotype with increased IL‑17A competency that is still evident two weeks later at reactivation.

• It is dose responsive. From none → 3 mM 2DG, CCR6+ cells rise from ~0.7% to ~12% and CD161+ cells from ~1% to ~21%. IL‑17A+ cells increase from ~0.3% to ~4% overall, with the IL‑17A signal concentrated in the CCR6+ and CD161+ compartments.

• Low glucose alone (0.5 mM) does not reproduce the effect. That argues against simple glucose deprivation or reduced glycolytic flux as the driver.

• The enrichment for CD161 is stronger than for CCR6, which often tracks with Th17‑like or mucosal/innate‑like CD4 programs.

High‑level interpretation:

1. 2DG is programming, not just acutely inhibiting. A short exposure during priming leaves a durable memory of “stress” that biases the lineage toward Th17‑like traits.

2. Mechanism is unlikely to be only glycolysis blockade. The low‑glucose control is weak, which pushes the mechanism toward 2DG’s effects on protein glycosylation and cytokine receptor biology.

3. Top mechanistic hypothesis. 2DG partially blocks N‑linked glycosylation of IL‑2Rα and other glycoproteins at priming, reducing IL‑2–STAT5 signaling, which normally suppresses Th17. Reduced IL‑2 tone at that window would lift the brake on a CCR6+ CD161+ IL‑17‑competent program that persists after washout.

4. Alternative or complementary hypothesis. 2DG selects for pre‑existing CCR6+ or CD161+ clones that rely less on glycolysis, rather than de novo differentiation. The end result is the same phenotype, but the lever is selective survival or proliferation.

5. A carefully timed 2DG “pulse” could be a simple knob to enrich a Th17‑like helper subset with better persistence potential and mucosal‑homing traits.

Prioritize these:

+ D‑mannose with 2DG to restore N‑glycosylation. Loss of the CCR6/CD161/IL‑17A phenotype with mannose would implicate glycosylation.

+ Methyl‑pyruvate or acetate to bypass glycolytic ATP/pyruvate deficits. Rescue here would implicate energy metabolism.

Compare 2DG with tunicamycin (glycosylation inhibitor) and with glycolysis‑only inhibitors that do not hit glycosylation (iodoacetate, PFKFB3 inhibitor). Match exposure windows & doses.

Surface CD25 maturation and glyco‑shift (PNGase F or Endo H blot on IL‑2Rα).

pSTAT5 after an IL‑2 pulse at 24–48 h of priming with or without 2DG.

Phenocopy with IL‑2 neutralization or low‑dose JAK3 inhibition. Counter‑rescue with high IL‑2 or an IL‑2 mutein.
https://x.com/DeryaTR_/status/1954354352648225235

You are likely going to see a lot of very varied results posted online from GPT-5 because it is actually multiple models, some of which are very good and some of which are meh. Since the underlying model selection isn’t transparent, expect confusion. https://x.com/emollick/status/1953553844094611614

It’s official. We’ve raised $14m led by @OpenAI Startup Fund to bring AI to Excel. Endex is the first AI agent to live inside Excel. For the past year, we’ve been working with financial firms. Today we’re releasing it to the world. Our capacity is limited; comment below for https://x.com/TarunAmasa/status/1953130965355905140

SoftBank buys Foxconn’s Ohio plant to advance Stargate AI push, Bloomberg News reports | Reuters https://www.reuters.com/business/media-telecom/softbank-buys-foxconns-ohio-plant-advance-stargate-ai-push-bloomberg-news-2025-08-08/

Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5: 1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5. 2. We will then prioritize API demand up to the… https://x.com/sama/status/1955077002945585333

Hill and Freedman at NYT report on the case of someone with “no history of mental illness” who was dragged into a delusional spiral for 3 weeks. According to the NYT, given full access to transcripts spanning a million words, it started with an innocent question about pi. https://x.com/ESYudkowsky/status/1953935708542083173

GPT-5 requires a different way of prompting. It’s much more susceptible to instruction, especially style and tone, and it does better when provided with reasoning, validation, and planning sections. I wrote a guide based on my few weeks of usage on how you should prompt it. 👇 https://x.com/skirano/status/1954510362746691608
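The advice in that guide, reasoning, validation, and planning sections plus explicit style instructions, can be sketched as a reusable template. The section names and wording below are illustrative assumptions, not quoted from the guide:

```python
# Hypothetical GPT-5 prompt scaffold with explicit planning,
# reasoning, and validation sections, per the advice above.
PROMPT_TEMPLATE = """\
<task>
{task}
</task>

<planning>
Before answering, list the steps you will take and any assumptions.
</planning>

<reasoning>
Work through each step explicitly; prefer short, verifiable claims.
</reasoning>

<validation>
Re-check the answer against the task requirements before finalizing.
</validation>

<style>
{style}
</style>
"""

def build_prompt(task: str, style: str = "Concise and direct.") -> str:
    """Fill the scaffold with a task and a tone/style instruction."""
    return PROMPT_TEMPLATE.format(task=task, style=style)
```

For example, `build_prompt("Summarize the earnings call.")` yields a prompt where the style and tone instructions are separated from the task, which is the property the tweet claims GPT-5 is especially responsive to.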

Suddenly retiring every other model without warning was a weird move by OpenAI. … and they did it without explaining how switching models worked or even details of various GPT-5 models …and they did it when everyone has built workflows around older models, breaking them all. https://x.com/emollick/status/1953884742090190980

OpenAI is optimizing to be a billion user consumer product over being a developer platform. https://x.com/bilawalsidhu/status/1955119548794839295

1/ I competed for Team USA at IOI in 2015, so this achievement hits home for me. The biggest highlight: we *did not* train a model specifically for IOI. Our IMO gold model actually set a new state of the art in our internal competitive programming evals. Reasoning generalizes! https://x.com/alexwei_/status/1954966393419599962

2/ I was impressed by our AI handling 4/6 tasks this year with non-standard formats—interactive, output-only, constructive, communication. These tasks are tough to prep for and especially demand outside-the-box thinking. Our models generalized well to these unfamiliar task types. https://x.com/alexwei_/status/1954966574408012003

congratulations to the FAIR team on this impressive win! brain modeling is a key step in understanding biological intelligence, and will pave the way to our sci-fi future (brain computer interfaces, etc) 🧠 https://x.com/alexandr_wang/status/1954915381656895545

GPT-5 is the most significant product release in AI history, but not for the reason you might think. What it signals is that we’re moving from the “bigger model, better results” era to something much more nuanced. This is a genuine inflection point. The fact that people call a… https://x.com/douwekiela/status/1955329657852834207

Are you ready for OpenAI to unleash the monetization slop upon the masses? The real release of GPT-5 was the router and how they can reduce cost / compute for most users by routing to cheaper models. OpenAI has a big strategic shift in the works, and consumer purchasing will shift. https://x.com/dylan522p/status/1955433082397589900

Now that the era of the scaling “law” is coming to a close, I guess every lab will have their Llama 4 moment. Grok had theirs. OpenAI just had theirs too. https://x.com/jeremyphoward/status/1954346846845129158

This guy literally built a viral website from scratch in 10 minutes with GPT-5 https://x.com/aaditsh/status/1954210152170893668

On the big picture: GPT-5 as a model is pretty much on the same curve as the other top labs. I’d expect the usual leapfrogging between Gemini, Claude, OpenAI, & Grok to continue. Where there are some big gains is that GPT-5 seems well-trained for real world tasks in new ways. https://x.com/emollick/status/1953565365465964668

To test reasoning, I got GPT-5 to create a complete launch plan for an AI app from a single idea. It did competitor research, product specs, logo, pricing, GTM strategy, roadmap, and more for me. If I were starting with zero business knowledge, this is an insane resource. https://x.com/rowancheung/status/1953505326206013820

GPT-5 Pro is an impressive geo-guesser. I gave it a cropped photo with metadata removed and it figured out the city. https://x.com/emollick/status/1954288373797203991

Today we’re open sourcing a “vibe coding agent” powered by GPT-5. It’s like @v0, but agnostic to framework, language, runtime. It can vibe code htmx and Haskell if you want. Built on @aisdk, Sandbox and AI Gateway. If you want to add codegen to your platform or build your own https://x.com/rauchg/status/1953539863703425336

We’ve been working closely with the @OpenAI team to integrate GPT-5 into Devin. Starting today, you can select a preview version of Devin that uses GPT-5 as part of our agent orchestration. GPT-5 eval results 👇 https://x.com/cognition/status/1953521661028053410

Codex CLI + GPT-5: https://x.com/gdb/status/1953556751762288653

Congrats to the whole OpenAI team on GPT-5, lots of work to make this level of progress given where we were just 2 years ago, and with such high expectations. https://x.com/OfficialLoganK/status/1953523549819613288

Credit where it’s due: seems like OpenAI has fixed a lot of GPT-5 issues in the last 12-24 hours, and Codex CLI works really well in auto mode. Still terrible if you use it in “approve before making edits” mode, but hopefully they fix it soon🤞🏼 https://x.com/rishdotblog/status/1955318363653280185

Faster GPT-5 in Cursor: https://x.com/gdb/status/1955532973119508775

PSA: If you say “think deeply” then you get the thinking model in ChatGPT for free. If you click “ChatGPT Thinking”, it costs $20/month min to access, and you get limited usages. https://x.com/jeremyphoward/status/1954366856627978684

RT @gdb: gpt-5 is the best coding model in the world and is now the default in @cursor_ai. https://x.com/xikun_zhang_/status/1955049082772402643

RT @OpenAI: We’ve scored highly enough to achieve gold at this year’s IOI online competition with a reasoning system — placing #6 when rank… https://x.com/xikun_zhang_/status/1955049010257097080

ChatGPT-5 Pro is the first model to successfully do this non-puzzle consistently. GPT-5 Thinking and GPT-5 fail as every other model before has (except for, occasionally, Sonnet). https://x.com/emollick/status/1953604710205690212

RT @deedydas: Ridiculous that OpenAI claimed 74.9% on SWE-Bench just to prove they were above Opus 4.1’s 74.5%… By running it on 477 probl… https://x.com/akbirkhan/status/1954231799590301953

showcase of a type of hard, valuable task that gpt-5 can do where previous models struggled: https://x.com/gdb/status/1953700116365492552

figured out how to “undo” the RL and turn gpt-oss back into a base model. will drop the weights tomorrow. gn https://x.com/jxmnop/status/1955099965828526160

GPT-5 occupies both the #1 and #2 positions in our long context reasoning benchmark (AA-LCR) 🤯 AA-LCR tests long context performance through testing reasoning capabilities across multiple long documents (~100k tokens). Questions typically require considering multiple documents.
https://x.com/ArtificialAnlys/status/1953523986526351576

Ok, some general comments while I am waiting for the other tasks. GPT 5 and GPT 5 Thinking are nothing very novel for research mathematicians compared to o3-pro and o4-mini-high. But GPT 5 Pro is different. It uses much more compute and the quality of the answers is superb.
https://x.com/nasqret/status/1953566692686397885

Vibe coding a full Excel clone step-by-step with GPT-5 is kinda nuts. (I know I’m still far from a full clone, but this is after about 45 minutes of work)
https://x.com/MatthewBerman/status/1954694677736956297

GPT-5 has been hovering around a 7% diff edit failure rate since its release to Cline last Thursday. How have you liked GPT-5 so far in Cline? https://x.com/cline/status/1955357460627329151

GPT-5 is live in Cline. We’ve been working with OpenAI to get this model ready, and here’s our take: it’s disciplined, persistent, & highly competent. It’s collaborative in planning & and a diligent operator while acting. It plans thoroughly, asks optioned follow-ups when https://x.com/cline/status/1953525433808695319

GPT-5 is speed-running Pokemon. It’s 3x faster than o3. https://x.com/scaling01/status/1955813023735828587

gpt-5 is the best coding model in the world and is now the default in @cursor_ai. https://x.com/gdb/status/1953521501548032512

GPT-5 Just Finished Pokemon Red! : r/singularity https://www.reddit.com/r/singularity/comments/1mq2irv/gpt5_just_finished_pokemon_red/

GPT-5 just finished Pokémon Red! 6,470 steps vs. 18,184 for o3! Check the stats site to compare! That’s a huge improvement! Well done, @OpenAI you cooked with GPT-5. What an incredible model. Next up: GPT-5 vs. Pokémon Crystal (16 Badges + Red). The run starts soon on Twitch. https://x.com/Clad3815/status/1955980772575268897

GPT-5 now rolled out to 20% of paid users and doing >2B TPM on the API! so far so good… excellent work by the eng and infra teams! https://x.com/sama/status/1953563605733118317

gpt-5 is SOTA on FrontierMath: https://x.com/gdb/status/1953710811957858404

To get a sense of GPT-5’s vibes, I exported my Tweet data over the last year and got it to write like my top posts. Then took my newsletter and made it create 3 separate long-form tweets. It’s not 100% there, but it beats Claude, which was previously my go-to for editing https://x.com/rowancheung/status/1953505497237029346

The enhance meme from Bladerunner, except the AI is asking the computer to enhance. https://x.com/emollick/status/1954534598903275605

@jxmnop @johnschulman2 @srush_nlp Super cool stuff!! How can we empirically check how far away this model is from the real base model? What benchmarks do we expect this base model to do better on and what benchmarks do we expect it to do worse on [when compared to the unmodified gpt-oss model]? https://x.com/OfirPress/status/1955463664556769426

1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions – the 2025 International Olympiad in Informatics (IOI) – placing first among AI participants! 👨‍💻👨‍💻 https://x.com/SherylHsu02/status/1954966109851119921

Announcing gold medal-level performance (#6 relative to humans; #1 relative to AIs) on this year’s International Olympiad in Informatics (IOI). Achieved with an AI reasoning system, without IOI-specific training! https://x.com/gdb/status/1954984230343282808

In my opinion, the most important takeaway from this result is that our @OpenAI International Math Olympiad (IMO) gold model is also our best competitive coding model. 🧵 https://x.com/polynoamial/status/1954966398989635668

Multiple AI systems won gold medals at the 2025 International Mathematical Olympiad (IMO). Exciting as that sounds, @GregHBurnham argues that it represents little progress: an unlucky draw of problems made the event relatively uninformative. Is that cope? Judge for yourself. 🧵 https://x.com/EpochAIResearch/status/1953567563557838920

This paper purporting to describe “intelligence explosions” seems written at a very high level of abstraction. @TomDavidsonX, how would your framework describe the following set of events? – In late 2026, an early prototype for GPT-6 suggests a breakthrough new AI algorithm. –… https://x.com/ESYudkowsky/status/1953518366372859921

This is a remarkable claim given what I have heard alleged that Elon does to manipulate X to benefit himself and his own companies and harm his competitors and people he doesn’t like. https://x.com/sama/status/1955094792804720660

Call me crazy but I think the newer openai models inc. 5 (and gemini, other maxxing models) are nearly unusable for most of my use cases. Far too verbose, extraordinarily confident in their hallucinations, difficult to steer at times but at the same time fairly sycophantic… https://x.com/jon_durbin/status/1954263916202316001

What a day. 100% of @Copilot users have GPT-5. https://x.com/mustafasuleyman/status/1953608045533204690

You might notice something new in Copilot Labs today… https://x.com/mustafasuleyman/status/1953845057389478255

GPT-5 Pro is genuinely a top tier model, it is the best. I know some GPT-5 reactions have been mixed (including from me), but OpenAI has made great strides with Pro. I have made 12 simulations that no other model would match, not from OpenAI, Google, xAI or Anthropic. All of them https://x.com/petergostev/status/1953876867745828877

great work elaine! https://x.com/sama/status/1953634172616290480

https://x.com/sama/status/1953264193890861114

I call it GPT but use the full model name. What do you call it? https://x.com/sama/status/1955736203157975424

melting silicon has a very distinct smell https://x.com/sama/status/1953530707269366234

more enlightenment on the feed (Though 5 thinking really is materially better than o3 in terms of general trustworthiness, hallucination, coding, creative writing) https://x.com/ericmitchellai/status/1954606526783799446

very happy with the pricing we are able to deliver!
https://x.com/sama/status/1953527247257899389

Sorry false alarm – for Plus users (the $20 tier instead of $200), the only “legacy model” you get is 4o. https://x.com/Teknium1/status/1954376838110986276

We significantly increased the rate limits to the reasoning model by popular demand. If correctness is really important for you, ask the model to “think deeper” or select “gpt5 thinking” in the model picker; this uses a higher reasoning effort than when you are auto switched to… https://x.com/yanndubs/status/1954621287713915192

❤️ to the eng/infra/compute teams https://x.com/gdb/status/1953602565977035206

As predicted, examples of GPT-5 nano or mini producing bad outputs abound online. Not making it clear how GPT-5 works will likely cause issues for OpenAI. I wonder if they will need to take a different approach to switching or at least educating users about what GPT-5 does. https://x.com/emollick/status/1953638367415218596

I like GPT-5 in the app. I don’t think we’ll have much use for it via the API: the small ones aren’t fast or smart enough to be useful for agents, and the big one is too slow and not good enough at prompt following. But I like GPT-5 in the app! https://x.com/fabianstelzer/status/1955182571526005124

we are considering giving a (very) small number of GPT-5 pro queries each month to plus subscribers so they can try it out! i like it too. but yeah if you wanna pay us $1k a month for 2x the input tokens feels like we should find a way to make that happen… https://x.com/sama/status/1954705761805938734

.@harjotsgill (Co-Founder & CEO @CodeRabbitAI) on testing GPT-5 and using it within their product: “It’s a generational leap. We’ve been using @OpenAI models for years, and our product is reasoning-heavy with PhD-style problems.” “GPT-5 has the highest score on some of the https://x.com/tbpn/status/1953607654926168521

> GPT-5 is the first series of models that actually doesn’t hallucinate basically at all *real-world utility-maxxing instead of benchmark-maxxing intensifies* Disclaimer: GPT-5 is still not perfect and may make (far fewer now) mistakes https://x.com/ericmitchellai/status/1954739395719807370

📢 📢📢 if you have feedback on the GPT-5 UX, especially for power users, drop your takes in this thread! no promises but very helpful for us to have grounding in specific problems/requests users have https://x.com/ericmitchellai/status/1955376872050811108

Alright, GPT-5 vibe check https://x.com/bilawalsidhu/status/1953559292713611284

ChatGPT has added back all the model selector options again https://x.com/Teknium1/status/1954371945514049595

First, GPT-5 is going to be available to ALL free users of ChatGPT. Out of 700M weekly users, I’d bet 95%+ of them have only used GPT-4o. GPT-5 is faster than GPT-4o, and smarter than o3. The upgrade will be a major intelligence boost to most of the world. https://x.com/rowancheung/status/1953502381133438999

Good to have numbers. And, as suspected, almost nobody was using o3 before, even if they paid for it. For most users, experiencing a Reasoner will be a surprise expansion of AI’s abilities, and GPT-5 will appear to be a very big leap when it auto-routes to the Thinking version. https://x.com/emollick/status/1954604953743270041

GPT-4.5 writes short-form content that’s clearer, more natural, and less slop compared to GPT-5. Deprecating GPT-4.5 feels like an unusual case of AI capabilities regressing over time. https://x.com/tamaybes/status/1955111804587348200

GPT-5 backlash proves benchmarks are dead. We’re just vibes-maxxing now. https://x.com/bilawalsidhu/status/1954989971510870255

GPT-5 feels more like a UX improvement, and less like a step change. No one likes picking between a plurality of models. GPT-5 attempts to be a one stop shop that just works. https://x.com/bilawalsidhu/status/1953587544416796933

gpt-5 pro 💪 https://x.com/gdb/status/1954595363421319222

GPT-5 rollout updates: *We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. *We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for. *GPT-5 will seem smarter starting… https://x.com/sama/status/1953893841381273969

GPT-5 Thinking is not the same for Plus and Pro Users https://x.com/scaling01/status/1955610515134460285

GPT-5 Thinking limit up to 3000/week for Plus Users. I thank you all for the participation in the first ChatGPT Plus rebellion. It looks like the civil war has ended. We forced an emergency decision. https://x.com/scaling01/status/1954609552810459203

Major update for all ChatGPT Plus Users: This chart is no longer valid. Sam has agreed to increase the limit to 3000 Thinking requests per week. You can stop rioting for now. https://x.com/scaling01/status/1954611571923255468

model switching in gpt-5 very cool! https://x.com/sama/status/1953526708742537220

OpenAI: “”We are going to fix model naming and make it less confusing”” also OpenAI: https://x.com/scaling01/status/1954292296704250005

So we have something like five versions of GPT-5 and now people have to know that GPT-5 Pro is better than GPT-5 Thinking which is better than GPT-5 unless GPT-5 decides to hand off your question to GPT-5 Thinking or GPT-5 Pro which you cannot control or know if it is happening.”” / X https://x.com/emollick/status/1953606039020126347

the /r/chatgpt AMA is mostly people begging for gpt-4o back because of its personality… really not what i expected! https://x.com/typedfemale/status/1953840064175517810

The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation. https://x.com/emollick/status/1954210778321465634

To round things up, GPT-5 is a huge step for democratizing intelligence to the world 700M weekly users now have state-of-the-art AI in their pockets and can do/learn/build wildly more things than ever before We’re living in incredible times–go build something this weekend :^) https://x.com/rowancheung/status/1953505530283975087

today we are significantly increasing rate limits for reasoning for chatgpt plus users, and all model-class limits will shortly be higher than they were before gpt-5. we will also shortly make a UI change to indicate which model is working. https://x.com/sama/status/1954602880952115603

Updates to ChatGPT: You can now choose between “Auto”, “Fast”, and “Thinking” for GPT-5. Most users will want Auto, but the additional control will be useful for some people. Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking https://x.com/sama/status/1955438916645130740

We spent the last few weeks taste-testing GPT-5, getting weird with it, pushing its limits, and built something to match the brilliance (and weirdness) we found. Introducing a fully revamped Vibe Check: – A deep-dive review of the model – Hot takes from our team – Internal https://x.com/lucas__crespo/status/1953510123797983569

Why is it the right move? Seriously? 1. Models already think more for harder problems in reasoning mode. 2. You could just always have it try to reason, then it’ll never fail you in case it needs to. 3. Any time an answer isn’t satisfactory if you didn’t have reasoning on, you https://x.com/Teknium1/status/1954519089902473436

GPT-5 for Computer-Use agents. Same tasks, same grounding model – we just swapped GPT-4o → GPT-5 as the thinking model. Left = 4o, right = 5. Watch GPT-5 pull away. 1/2 https://x.com/trycua/status/1953583236501631084

gpt-5 for immunology: https://x.com/gdb/status/1955445380310802845

gpt-5 for long context reasoning: https://x.com/gdb/status/1953747271666819380

gpt-5 for math research: https://x.com/gdb/status/1955662632771522650

gpt-5 for vibe coding whole applications: https://x.com/gdb/status/1954706670267035999

My first project at OpenAI involved teaching our models to reason and use tools by improving their competitive programming skills. Back then, GPT-4 struggled with even the simplest Codeforces problems, often oom-ing in the sandbox. It’s incredible to see that just 2.5 years https://x.com/ahelkky/status/1954973043320819907

New ChatGPT model selector. We are back to where we started 🙂 Nice to see cleaner naming though. I assume auto routing will get better over time, but for now I default to GPT-5 thinking for most queries and GPT-4.5 for writing tasks. https://x.com/bilawalsidhu/status/1955732509377089786

GPT-5 is pretty good at coding I kept adding features expecting something to break, but it just kept chugging along. I added music and sound using ElevenLabs. I *actually* enjoyed playing this game https://x.com/WesRothMoney/status/1953921754105299092

GPT-5 with high reasoning effort on SimpleBench https://x.com/scaling01/status/1953771276549358041

gpt-5: our smartest, fastest, and most useful model to date. it’s also incredible at coding. rolling out to everyone (excitingly including free ChatGPT users!) today. https://x.com/gdb/status/1953509854603358597

I had access to GPT-5. I think it is a very big deal as it is very smart & just does stuff for you Full write up in comments, but this is “make a procedural brutalist building creator where i can drag and edit buildings in cool ways” & “make it better” a bunch. I touched no code https://x.com/emollick/status/1953502029126549597

I saw a lot of people complaining about 32k context size in ChatGPT for plus users, which would be terrible for coding. But actually we are giving 196k context size for plus users when using GPT5 thinking and that’s the model you should use for coding use-cases! 32k is for the https://x.com/yanndubs/status/1955194413283737716

I suspect this is right. And I wouldn’t be surprised if the vast majority of the 700M users of ChatGPT already greatly prefer GPT-5 & that the opinion on X is not reflective of the typical experience. (Which doesn’t mean that the issues identified here aren’t very real) https://x.com/emollick/status/1954442950491902393

I used GPT-5 to leverage trade memes. From open to taking profit to setting stop losses, I followed every choice it made. Of course, I used @wasabi_protocol and decided to do this all with $troll on 3x leverage, which was freshly listed and giga sending. https://x.com/ChrisCoffeeEth/status/1954282100389281866

I’ve been using gpt-5 for a bit now. This model broke me. It is so good. I didn’t know what the price was. I assumed it would be o3-pro priced because it is that smart. Nope. Truly insane. Videos coming very soon. https://x.com/theo/status/1953507203979391011

If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly https://x.com/sama/status/1954703747495649670

Important GPT-5 PSA; if you want an answer that is maximally correct, do tell the model to think hard in your prompt. It literally will do so clearly we failed to communicate this well, apologies for that https://x.com/ericmitchellai/status/1954418339536683078

LLM meets analog. Turns out LLMs are a great brainstorming partner for synth patches. This was co-created with gpt-5 🎵 https://x.com/martin_casado/status/1953868101596192850

Let’s take a look into GPT-5’s record-setting performance on FrontierMath. How did it perform on the holdout vs. non-holdout set, how did it do across tiers, and what new Tier 4 problems did it solve? 🧵 https://x.com/EpochAIResearch/status/1955667249252978741

initial gpt-oss download stats looking exciting! https://x.com/gdb/status/1954992508964155587

i thought the transformers gpt-oss MoE finetuning was broken, how did you get it working? https://x.com/jxmnop/status/1955347764130254863

GPT-5-high is pretty good at competitive programming Just a 700 point gap in rating between GPT-5 and Gemini 2.5 Pro https://x.com/scaling01/status/1955053949637021732

my gpt-oss MFUmaxxer PR is here! ✅ cat/splice sink -> flexattn ✅ sin/cos pos embs -> complex freqs_cis ✅ moe for-loop -> grouped gemm ✅ checkpoint conversion ✅ matches huggingface fwd pass currently adding parallelism and ensuring training steps healthily ⬇️ https://x.com/khoomeik/status/1955433361402724679

tldr: Fireworks, Deepinfra, and TogetherAI are the accurate inference providers for hosting gpt-oss-120b. https://x.com/jeremyphoward/status/1955438370274087369

The first thing that is immediately noticeable about GPT-5 is the ability to code good front-end/UI GPT-5 generated this fully functioning budgeting app in one shot with ~1000 lines of code, and made it Tetris-themed It even added the sound effects https://x.com/rowancheung/status/1953502382681198610

My “Move 37” Moment with GPT-5 Today, I’m sharing one of my most remarkable experiences testing the GPT-5 Thinking and Pro models. In our lab, about 2 years ago we conducted a series of cutting-edge immunology experiments designed to manipulate the energy metabolism of T https://x.com/DeryaTR_/status/1954354352648225235

OpenAI’s o3 Crushes Grok 4 In Final, Wins Kaggle’s AI Chess Exhibition Tournament – Chess.com https://www.chess.com/news/view/kaggle-game-arena-chess-2025-day-3

when you get access to gpt-5, try a message like “use beatbot to make a sick beat to celebrate gpt-5”. it’s a nice preview of what we think this will be like as AI starts to generate its own UX and interfaces get more dynamic. it’s cool that you can interact with the https://x.com/sama/status/1953529799219319205

Breaking: GPT-5 ranked 🥇 on Humanity’s Last Exam and 🥈 on MultiChallenge SEAL Leaderboards. https://x.com/scale_AI/status/1953591873031090505

GPT-5-Pro is finally rivalling Elon Musk’s 148 IQ https://x.com/scaling01/status/1955344356547653773

Overall, the general vibes of GPT-5 feel much more human-like It’s hard to measure *vibes*, but the combination of speed, lower hallucination rate, and intelligence is very noticeable As a power user, I’ve always enjoyed o3, but the speed makes it impossible for daily queries https://x.com/rowancheung/status/1953505371487600877

Our team spent some time benchmarking the GPT-5 models on one-shot document understanding capabilities. Sharing some WIP results 💡: 1️⃣ GPT-5 mini does a good job. From initial testing it edges out Sonnet and Gemini models. 2️⃣ Surprisingly GPT-5 is middle of the pack (and also https://x.com/jerryjliu0/status/1954293351702036712

unicorn-drawing eval from gpt-3.5 to gpt-5: https://x.com/gdb/status/1953528523437600892

This is exactly what @javirandor did with chatgpt last year to get it to spit out memorized training data: https://t.co/dtFpAAw6ps There’s probably some interesting stuff to study on such “re-based” models https://x.com/florian_tramer/status/1955510942252572946

I’ve said it a hundred times but I’ll keep saying it: AI adoption and behavior change are slow — and will stay slow — no matter how fast capabilities improve. The stat in the screenshot is worth pondering: nearly a year after the release of “thinking” models, only a tiny fraction https://x.com/random_walker/status/1954912993747128554

This is what I’d expect to see if OpenAI had made general progress on fighting sycophancy and manipulation. :/ If that’s in fact what happened, OpenAI made that choice rightly. https://x.com/ESYudkowsky/status/1953885007971586089

U.S. Govt gets ChatGPT for a buck https://www.therundown.ai/p/us-government-gets-chatgpt-for-a-buck

1/ As early-access partners with @OpenAI, we’ve been stress testing GPT‑5 for weeks on the most complex financial analyst agent tasks. It’s not hype. It’s incredible. Investing & banking jobs will never be the same. 👇🧵 https://x.com/HebbiaAI/status/1953501578968605039

@SebastienBubeck Heads up, I’m fairly certain that the o3 run being compared to did not have the google search tool – which is important, since Bulbapedia gives the solution in one tool-use call to some puzzles that can take a much longer time if solved without solution info https://x.com/kiranvodrahalli/status/1956044490885751273

You can run gpt-oss-20B on Google Colab thanks to @pcuenq @reach_vb 🤯 https://x.com/fdaudens/status/1953420511137931342

our livestream tomorrow at 10 am PDT will be longer than usual, around an hour. we have a lot to show and hope you can find the time to watch! https://x.com/sama/status/1953296157586931745

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team : r/ChatGPT https://www.reddit.com/r/ChatGPT/comments/1mkae1l/gpt5_ama_with_openais_sam_altman_and_some_of_the/

Somewhat related to the data constraints: I’m not a huge fan of synthetic data because we are doing it wrong and you can feel it when using Phi models or the newer OpenAI models, especially the open source ones. The synthetic data is too clean and uniform. It makes the models https://x.com/scaling01/status/1954689516314435767

RT @ggerganov: whisper.cpp is coming to ffmpeg https://x.com/ggerganov/status/1955161982023131645

“Narrate a regency romance like a machine learning model being trained. Each step of the story must be described as data inputs, probability weights & optimization functions. The romance must emerge through correlation coefficients & unexpected variables in the training data” https://x.com/emollick/status/1955217534812082574

We’ve scored highly enough to achieve gold at this year’s IOI online competition with a reasoning system — placing #6 when ranked with humans and #1 when ranked with other AIs. In just a few weeks: • 2nd at AtCoder • Gold medal-level at IMO • Gold medal-level at IOI https://x.com/OpenAI/status/1954969035713687975

Which OpenAI model is currently your daily driver? https://x.com/bilawalsidhu/status/1955724935705309642

Discover more from Ethan B. Holland