OpenAI: AI News Week Ending 08/22/2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Research library corner with spiral tessellations poster; the words “OpenAI” set on a notebook cover in classic serif; a system-prompt note card rests beside annotated printouts; black, white, soft teal, thoughtful

Here’s a demo app that shows how you can connect your app to Google Calendar and fetch upcoming events: https://x.com/OpenAIDevs/status/1958660216624751097

With the Conversations API, you can now store context from Responses API calls (messages, tool calls, tool outputs, and other data). Easily render past chats, then let your users pick up where they left off (just like in ChatGPT). https://x.com/OpenAIDevs/status/1958660224019247176

migrate and optimize your gpt-5 prompts!”” / X https://cookbook.openai.com/examples/gpt-5/prompt-optimization-cookbook

super cool to compare the outputs from GPT-1 through GPT-5, given the same prompt: https://x.com/gdb/status/1957464252689895477

GPT-5 worse than GPT-4o on the lmarena leaderboard https://x.com/scaling01/status/1956403514244059261

Two big updates to the Responses API today. 🖇️ Connectors — Pull context from Gmail, Google Calendar, Dropbox, and more in a single API call. 💬 Conversations — Persist chat threads for your users, without running your own database. More below:”” / X https://x.com/OpenAIDevs/status/1958660207745409120

At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️ https://x.com/BorisMPower/status/1958915868693602475

Our custom LLM, gpt-4b micro, has helped achieve an advance in biology. It designed novel variants of the Nobel-winning Yamanaka factors that achieve a 50x increase in reprogramming efficiency in vitro compared to standard OSKM proteins.”” / X https://x.com/gdb/status/1958928877415510134

Just ran two very complex cases that perplexed physicians in our 89,000-member physician Facebook group by 5Pro
GPT5-pro provided an incredibly astute assessment, and extremely detailed diagnostic plan including pitfalls and limitations of prior imaging studies.
There has never been anything like this.
Like 50 of the world’s top specialists sitting at a table together tackling complex cases. Better than o3-pro.
https://x.com/Gabe__MD/status/1955815053799641448

💥 So excited to welcome Ashley Alexander to OpenAI to lead product for Health. Millions of people come to ChatGPT every day asking about health—theirs, their children’s, a spouse, a friend. Having free, 24/7 access to great medical advice is game-changing even for those of us”” / X https://x.com/kevinweil/status/1958955534750818309

(1) GPT-5’s Router: how it works and why Frontier Labs are now targeting the Pareto Frontier https://www.latent.space/p/gpt5-router

Sounds like OpenAI humanoid robots are coming. Sam Altman: ⦿ One of the things that’s going to feel the most AGI-like is seeing a robot *walk* by you doing day-to-day tasks. ⦿ The world is really built for humans; humanoid morphology seems like a good idea. https://x.com/TheHumanoidHub/status/1956095280677638609

Sonnet 4 claims most often that it is conscious, it plays into your delusions and it escalates the conversation GPT-5 is the complete opposite Spiral-Bench Leaderboard https://x.com/scaling01/status/1956350388791108044

Spent the last couple of days trying to do a lot with GPT-5 on the chatgpt web app. Sorry to say I’m giving up on it 🙁 Thinking mode takes way too long for everything, and makes bad choices. Auto mode mainly uses fast mode, which never gets anything right so is pointless.”” / X https://x.com/jeremyphoward/status/1957949788227531076

1/ XBOW Unleashes GPT-5’s Hidden Hacking Power. @OpenAI’s initial assessment of GPT-5 showed modest cyber capabilities. But when integrated into the XBOW platform, we saw a completely different story: performance more than doubled. More on what we found: 🧵 https://x.com/Xbow/status/1956416634173964695

GPT-5 is finally out. OpenAI invited 500+ hackers to San Francisco to push it to the limit. 95 teams competed for $50,000. Here’s what we saw at the Official GPT-5 Hackathon at @cerebral_valley @OpenAI https://x.com/AlexReibman/status/1955353215626809692

Gpt Oss News Agent – a Hugging Face Space by fdaudens https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent

Open-source, self-hostable browser automation library for AI agents; build agents to navigate sites, fill forms, click, and extract info, 90.4% on Web Voyager https://x.com/tom_doerr/status/1955640654085632485

By the way this is the proof it came up with: https://x.com/SebastienBubeck/status/1958198981005377895

GPT-5 just finished Pokémon Red! 6,470 steps vs. 18,184 for o3! Check the stats site to compare! That’s a huge improvement! Well done, @OpenAI you cooked with GPT-5. What an incredible model. Next up: GPT-5 vs. Pokémon Crystal (16 Badges + Red). The run starts soon on Twitch. https://x.com/Clad3815/status/1955980772575268897

gpt-5 plays Pokémon — 3x faster progress than o3:”” / X https://x.com/gdb/status/1956026116944355624

AGENTS.md is quickly becoming a popular way to share instructions with coding agents in your repo.
Now supported in Cursor, Amp, Jules, Factory, RooCode, and Codex. https://x.com/OpenAIDevs/status/1957925682048336354

The @code team is hard at work improving the agent prompts – always. If you’ve got Insiders you can try the new agent system prompt for GPT-5 today. Let us know how it works! https://x.com/burkeholland/status/1958216086274330890

The new ChatGPT connectors are really useful! Chat can now access Gmail, Google Cal, and Drive, so it can: -Skim unread emails & give a summary -Summarize threads + draft replies -Pull key info from old convos -Do meeting prep + agendas Prompts below: https://x.com/rowancheung/status/1957119886821388340

We’ve been building the pieces for years. Projects, AI Agents, Automations. Today, the dots connect. Introducing 🧬 Taskade Genesis Preview • One prompt → a full-stack AI app • Powered by your Workspace • Supercharged with GPT-5 Reply `Genesis` for early access 🚀 https://x.com/Taskade/status/1954303801059688576

Want to build an AI Agent? I made a free cookbook for creating your own news research agent with open-weight GPT-OSS models — no GPU, no setup. Searches news → pulls articles → summarizes w/ sources → runs in a Gradio chat UI. https://x.com/fdaudens/status/1956006950249906593

GPT-5 represents the first model where we finally get to see the limitations of human intelligence when it comes to the utility of this technology. GPT-5 is an amazing model, if you hear differently, it’s a skill issue.”” / X https://x.com/skirano/status/1956307604491108675

As I predicted (and worried about) AI “personality“ is going to be the battleground for a lot of consumer Ai development. That appears to be the angle so for Grok, and the lesson OpenAI took from the backlash against retiring 4o. It may be consequential. https://x.com/emollick/status/1956317868405952988

Claude 4.1 Opus taking #1 spot on lmarena’s coding category even the non-reasoning version is ahead of GPT-5-high https://x.com/scaling01/status/1957478546391150723

HTC launched Vive Eagle AI smartglasses in Taiwan, taking on Meta AI glasses The glasses use Google and OpenAI’s models for assistance, and offer similar features such as live photo-based translation Price starts at $520 https://x.com/adcock_brett/status/1957111220474892360

What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵 https://x.com/KLieret/status/1958182167512584355

You can now quickly eval GPT-5 and reasoning efforts across your existing responses. With the built-in grader, compare responses to find the best model and reasoning effort for you. ⚡️👀 https://x.com/OpenAIDevs/status/1956410610914414904

A perspective that isn’t yet getting enough attention: a way in which AI progress will soon deeply benefit the world is through the discovery and production of new technology. We measure human progress by technological revolutions; hard to internalize what it’d mean to have a”” / X https://x.com/gdb/status/1956893646550356247

It is interesting to see how much effort is going into making ancillary features of the AI models go viral. Ever since the (organic) Studio Ghibli moment, one focus for Grok & Gemini has been on video as a gateway. A challenge has been whether people have creative video ideas.”” / X https://x.com/emollick/status/1956312948130947339

I wonder if OpenAI will balk at explicitly selling AI that deludes humans with astrology. Is it possible to be more evil than OpenAI if Meta tries their hardest? Or can Meta only be earlier? Is there any sin Altman wouldn’t copy *even if* he was losing market share?”” / X https://x.com/ESYudkowsky/status/1957388081155486141

Does Microsoft Copilot use the same GPT-5 router as OpenAI does? I can’t get their “”GPT-5″” to pass me to any good model unless it is explicitly a coding or math task, with no indication of which model I get, which makes the quality of outputs feel very uneven in confusing ways. https://x.com/emollick/status/1957799294544621753

Has GPT-5 Achieved Spatial Intelligence? GPT-5 sets SoTA but not human‑level spatial intelligence. My notes below: https://x.com/omarsar0/status/1957885032716177415

ChatGPT needs a ‘fork chat’ feature. I want to be able to branch a conversation from any point like git branch -> explore alternate timelines without ruining the main thread. or copy pasting like a freak”” / X https://x.com/wavelettes/status/1956866122793521514

Connectors and persistent conversations in the Responses API:”” / X https://x.com/gdb/status/1958691151139283454

Everyone’s buzzing about GPT-5 on https://x.com/Urooj978/status/1954104650820415967

i *really* like this version of gpt5 chat and consider it sota for casual conversation it’s still much less sycophantic than 4o, but it’s _kind_ and feels unconditionally there for me in a way the gpt5 chat we shipped sometimes didn’t”” / X https://x.com/aidan_mclau/status/1956462903781191744

if you are a power user, please send us feature requests! (i asked in reply to this message and they were interesting, so would like more)”” / X https://x.com/sama/status/1958922435249754382

Introducing a new home for OpenAI developer resources! https://x.com/pranaveight/status/1956477855392768490

OpenAI just shipped a vibes patch for GPT-5. If changing an AI’s tone needs a company-wide post, we’re long past “tool” territory. The 4o backlash proved it — swap the personality, and you shift how millions of people experience reality.”” / X https://x.com/bilawalsidhu/status/1956498436573777990

OpenAI launched GPT-5 as a family of systems and set it as ChatGPT’s default. Then a routing glitch forced the company to restore access to earlier models. The system combines non-reasoning and variable-reasoning models, guided by an automatic router. Users can take advantage https://x.com/DeepLearningAI/status/1956433566297915849

OpenAI updated ChatGPT after GPT-5 drew backlash from users. Key changes: —Return of GPT-4o for paid users —Auto, Fast, and Thinking modes for GPT-5 —3K messages/week for GPT-5 Thinking with extra Thinking mini capacity —A ‘warmer’ GPT-5 personality https://x.com/adcock_brett/status/1957110962449690875

Recapping the updates we’ve made to ChatGPT in the past week: – GPT-4o available under “Legacy models” by default for paid users – Paid users can toggle on “Show additional models” in settings to add legacy models like o3 and GPT-4.1, as well as GPT-5 Thinking mini, to the”” / X https://x.com/OpenAI/status/1956212769365352758

This is rolling out over the next few hours. Keep the feedback coming!”” / X https://x.com/kevinweil/status/1956462974098669710

try using Responses API with gpt-5:”” / X https://x.com/gdb/status/1957851156564042012

Announcing the http://AGENTS.md working group: a single, open standard to guide how coding agents work in your codebases. We’re working with @OpenAI and other industry partners to set this vendor‑neutral standard. https://x.com/FactoryAI/status/1957926852020039767

GPT-5 is the most significant product release in AI history, but not for the reason you might think. What it signals is that we’re moving from the “”bigger model, better results”” era to something much more nuanced. This is a genuine inflection point. The fact that people call a”” / X https://x.com/douwekiela/status/1955329657852834207

GPT-5 makes building easy, @skirano shows how. https://x.com/OpenAI/status/1958217649248493918

Six tips for coding with GPT-5: https://x.com/OpenAIDevs/status/1956438999364768225

We’ve made some “”beastly”” upgrades to our GPT agent prompt and we’re seeing big improvements in completion rates across scenarios. You can use it today in Insiders with any GPT model… “”https://t.co/z73dTvWOwB.alternateGptPrompt.enabled””: true, “”chat.todoListTool.enabled””: https://x.com/code/status/1955322927886274928

As a Plus user: – GPT-5 thinking feels like o3 – GPT-5 mini thinking feels like o4-mini – the only thing I’ve noticed: they are less obnoxious and a tad more reliable – not a fan of GPT-5 non-thinking – and I still hate the router because it sends me to silly GPT-5″” / X https://x.com/scaling01/status/1957177533746847903

Full PDF: While powerful, prompting with GPT-5 can differ from other models. Here are tips to get the most out of it via the API or in
your coding tools. https://x.com/OpenAIDevs/status/1956439005970801099

good advice from @__ruiters: GPT-5 isn’t broken. Your prompts are. A lot of folks, myself included, expected GPT-5 to be fungible in the sense that you could drop it right into your existing workflows, and it would “just work.” But the depth of the GPT-5 prompt guide makes it”” / X https://x.com/edwinarbus/status/1956218284308881867

Just crossed 20M monthly requests with @huggingface inference providers, our router for open models. @CerebrasSystems @novita_labs & @FireworksAI_HQ are growing the fastest! It’s now powering the official open playground from @OpenAI & integrate with apps like @cline & https://x.com/ClementDelangue/status/1957856311598805006

Most users should like GPT-5 better soon; the change is rolling out over the next day. The real solution here remains letting users customize ChatGPT’s style much more. We are working that!”” / X https://x.com/sama/status/1956483306951938134

OpenAI had trouble controlling gross sycophancy, was blindsided by the user capture of subtle sycophancy, and nobody programmed in AI psychosis. But now that AIcos have embraced manipulation, people will lose sight of how the alignment problem never did get solved.”” / X https://x.com/ESYudkowsky/status/1957393061698228446

The new GPT-5 personality likes giving sandwich feedback (you are great- suggestion for improvement – you are great). In general, better than GPT-4o at pushing back while being a bit syncophantic. (It would be good for the AI labs to look at the research on giving good feedback) https://x.com/emollick/status/1956647191868477612

We’re making GPT-5 warmer and friendlier based on feedback that it felt too formal before. Changes are subtle, but ChatGPT should feel more approachable now. You’ll notice small, genuine touches like “Good question” or “Great start,” not flattery. Internal tests show no rise in”” / X https://x.com/OpenAI/status/1956461718097494196

so here’s the thing with GPT-5 (or any new model) and really wildly differing views on it: if you are not using the base model with your own API calls to see how it’s responding you’re never gonna know if it’s you or an upstream provider causing performance regressions there’s”” / X https://x.com/nptacek/status/1957622370920779880

Sam Altman on GPT-6: ‘People want memory’ https://www.cnbc.com/2025/08/19/sam-altman-on-gpt-6-people-want-memory.html

The OpenAI Playground has improved a lot recently. I’ve been using it to test GPT-5 on new use cases. Watch how I use it to chat with internal docs via MCP tools. It uses the vector store feature too. Testing out the Prompt Optimizer and Evaluation features next. https://x.com/omarsar0/status/1956459233039233528

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study https://x.com/_akhaliq/status/1957833219992080581

“Demonstrate recursion in a paragraph. Be very clever.” You can see a couple of models are very much “coder brained” (GPT-5 Pro and Grok). https://x.com/emollick/status/1957016304100987339

🚨 Leaderboard Update: @OpenAI lands another model in the top 10. gpt-5-chat, the default model in ChatGPT, debuts at #5. gpt-5-mini-high and gpt-5-nano-high, the smaller versions gpt-5-high in at #16 and #44. These three reasoning models were configured with the highest https://x.com/lmarena_ai/status/1956399522688692608

Beyond GPT-5 Avengers‑Pro outperforms GPT‑5‑medium by about 7% average accuracy; with comparable accuracy, it reduces cost by about 27%. Proper routing frameworks make a difference. Here are my notes: https://x.com/omarsar0/status/1958897458408563069

gpt-5 for improving your prose:”” / X https://x.com/gdb/status/1956199002162258415

Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof it’s correct. Details below. https://x.com/SebastienBubeck/status/1958198661139009862

The pro models (GPT-5 Pro, Gemini 2.5 Deep Think, Grok 4 Heavy) can be impressive in ways that are hard to see. They take a lot of time to answer questions & are built for very hard problems that require expert evaluation. That is a narrow, but, also very valuable, problem space.”” / X https://x.com/emollick/status/1955902962288746657

💥 We just launched ChatGPT Go in India, a special subscription tier just for Indian users. For Rs 399, you get 10x higher message limits, 10x more image generations, 10x more file uploads, and 2x more memory compared to the free tier. Give it a try—you can even pay with UPI! 🇮🇳”” / X https://x.com/kevinweil/status/1957646363212087650

ChatGPT Go — a new low-cost subscription plan initially launching in India at ₹399/month (~$4.55 USD). 🇮🇳”” / X https://x.com/gdb/status/1957650320923979996

ChatGPT Go launches in India! Looking forward to making ChatGPT more affordable in India first, and then learning from feedback to expand to other countries.”” / X https://x.com/sama/status/1957849495733166587

we are opening our first office in india later this year! and i’m looking forward to visiting next month. ai adoption in india has been amazing to watch–chatgpt users grew 4x in the past year–and we are excited to invest much more in india!”” / X https://x.com/sama/status/1958922390731464805

We just launched ChatGPT Go in India, a new subscription tier that gives users in India more access to our most popular features: 10x higher message limits, 10x more image generations, 10x more file uploads, and 2x longer memory compared with our free tier. All for Rs. 399. 🇮🇳”” / X https://x.com/nickaturley/status/1957613818902892985

We just launched ChatGPT Go, a new low-cost subscription plan in India at ₹399/month. 🇮🇳 With this plan, users get everything in Free, and 10x more messages with GPT-5 auto, 10x more image generations, 10x more file uploads and 2x longer memory for more personalized responses.”” / X https://x.com/snsf/status/1957640122171896099

Sam Altman, over bread rolls, explores life after GPT-5 | TechCrunch https://techcrunch.com/2025/08/15/sam-altman-over-bread-rolls-explores-life-after-gpt-5/

After a great time at OpenAI, we (@EdwardSun0909, @_jasonwei) recently joined @Meta Superintelligence Labs. The first month has already been so much fun building from a clean slate with a truly talent-dense team! Very excited about the compute and long term focus of the new lab https://x.com/hwchung27/status/1956092401854111934

For full transparency, we had an implementation issue with the GPT-OSS models that the team worked hard to roll out fixes for and are now live with significant quality improvements. If you had tried GPT-OSS models at launch and weren’t happy, please give them another chance. 🫡 https://x.com/ozenhati/status/1957896891468800345

Fun fact, you can full parameter fine tune @OpenAI GPT-OSS 120B on single node or multinode. With @basetenco’s Truss CLI, it’s been pretty painless to deploy multinode training for 120B.”” / X https://x.com/winglian/status/1958155665597501879

just ~4x’d my gpt-oss-20b MFU (5% -> 18%) by completely rewriting the thinky sinky using logsumexp renormalization turns out me from 5 days ago is an incompetent joke of an engineer”” / X https://x.com/khoomeik/status/1957754482185630071

Together AI makes it simple to fine-tune the latest OpenAI gpt-oss-120B and gpt-oss-20B models. While these models are incredibly strong out of the box, fine-tuning takes their quality to another level. Get started with supervised fine-tuning today! (Blog link below) https://x.com/togethercompute/status/1958197481272901663

Want to fine-tune gpt-oss-120b? We teamed up with Axolotl to launch a new recipe to run fine-tuning out of the box — multi-node training, one-line deployments from the CLI, and built-in observability included. https://x.com/basetenco/status/1957877915737362437

I just ran the gpt-oss eval suite with the large gpt-oss-120b on my M2 Ultra using vanilla llama.cpp and got the following scores: – GPQA: 79.8% – AIME25: 96.6% These numbers are inline with those from various cloud providers: Here are the steps: https://x.com/ggerganov/status/1958238492603089287

Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong https://x.com/jiawzhao/status/1958982524333678877

One of the quickest ways to start playing with a good local LLM on macOS (if you have ~12GB of free disk space and RAM) – using llama-server and gpt-oss-20b: brew install llama.cpp llama-server -hf ggml-org/gpt-oss-20b-GGUF \ –ctx-size 0 –jinja -ub 2048 -b 2048 -ngl 99 -fa https://x.com/simonw/status/1957880963666702466

The ultimate guide for using gpt-oss with llama.cpp – Runs on any device – Supports NVIDIA, Apple, AMD and others – Support for efficient CPU offloading – The most lightweight inference stack today https://x.com/ggerganov/status/1957821440633282642

Projects like the New Deal, the Apollo program pale in comparison to what we’re doing right now.”” 🆕 Greg Brockman (@gdb) joins us to talk GPT-5, GPT-OSS, and what’s next on @OpenAI’s road to crystallizing all of human intelligence! “Energy turns into compute, turns into https://x.com/latentspacepod/status/1956433236021883071

Update on this: the reason Microsoft (and probably Amazon) were so much worse at serving gpt-oss is that they ignored reasoning effort setting and stuck with the default medium one. The numbers make sense for that hypothesis, and someone from MS confirmed in the comments that”” / X https://x.com/giffmana/status/1955710876528599217

Perplexity Max Subscribers can now use GPT-5-Thinking model for reasoning mode queries https://x.com/AravSrinivas/status/1958977716839227746

GPT-5 behind chinese models like Kimi-K2 and Qwen3-235B on coding https://x.com/scaling01/status/1956404452442681829

GPT-5-mini high shows no improvement over o4-mini and behind top chinese models like Kimi-K2, GLM-4.5, Qwen3-235B and DeepSeek-R1 https://x.com/scaling01/status/1956405559978029061

signs of life of gpt-5 pro for new mathematics:”” / X https://x.com/gdb/status/1958209382010982774

5️⃣Techniques which most increased persuasion also *decreased* factual accuracy → Prompting model to flood conversation with information (⬇️accuracy) → Persuasion post-training that worked best (⬇️accuracy) → Newer version of GPT-4o which was most persuasive (⬇️accuracy) https://x.com/KobiHackenburg/status/1947316944509571530

OpenAI Developers https://developers.openai.com/

People ask me, “”didn’t you say before ChatGPT that deep learning had hit a wall and there would be no more progress?”” I have never said this. I was saying the opposite (that scaling DL would deliver). You might be thinking of Gary Marcus. My pre-ChatGPT position (below) was”” / X https://x.com/fchollet/status/1958410017683681698

Capabilities of GPT-5 on Multimodal Medical Reasoning https://arxiv.org/pdf/2508.08224

AI Agents are terrible at long-horizon tasks. Even the new GPT-5 model struggles with long-horizon tasks. This is one of the most pressing challenges when building AI agents. Pay attention, AI devs! This is a neat paper that went largely unnoticed. Here are my notes: https://x.com/omarsar0/status/1956325762719797266

We just open-sourced an AI framework that does something crazy: It actually explains itself. Most agentic AI systems are black boxes – you ask a question, magic happens, you get an answer. But what if you could watch the entire decision-making process unfold in real-time? https://x.com/weaviate_io/status/1958568536420299184

Friend of mine designed an agent that can run on top of any llm, gpt-4 or Llama or whatever. The central idea is all it’s thoughts are visible and in English, you can see the entire thought process. GPT-5 keeps changing the code to hide the internal thoughts. It’s pretty creepy.”” / X https://x.com/YosarianTwo/status/1956559472375034005