Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A cinematic photograph of a professional figure in modern attire standing at the entrance of a grand stone archway with classical legal carvings and subtle circuit patterns, holding a softly glowing blue orb, warm golden light spilling from the archway interior meeting cool technological light, regal and judicial atmosphere, shallow depth of field focusing on the threshold moment.

GPT-5-Codex is here: a version of GPT-5 better at agentic coding. It is faster, smarter, and has new capabilities. Let us know what you think! The team has been absolutely cooking, very fun to watch. https://x.com/sama/status/1967650108285259822

How GPT5 + Codex took over Agentic Coding — ft. Greg Brockman, OpenAI https://www.latent.space/p/gpt5-codex

Sweet! GPT-5-Codex seems to make Codex more steerable and optimized for agentic coding in larger codebases. https://x.com/omarsar0/status/1967640731956453756

We’re releasing GPT-5-Codex — a version of GPT-5 further optimized for agentic coding in Codex. Available in the Codex CLI, IDE Extension, web, mobile, and for code reviews in Github. https://x.com/OpenAI/status/1967636903165038708

GPT-5 Codex – from code suggestions to coding agents, with no waste of tokens. Some developers complain that Codex feels longer (though smarter) than Claude Code – but that’s actually the whole point. > Codex has been trained to spend its effort where it matters. > It doesn’t https://x.com/TheTuringPost/status/1967882454351405314

GPT-5-Codex is 10x faster for the easiest queries, and will think 2x longer for the hardest queries that benefit most from more compute. https://x.com/polynoamial/status/1967667644905251156

We are witnessing an incredible level of efficiency in reasoning models. Faster and more efficient reasoning models are on the rise. First, GPT-5 (and GPT-5-Codex) with remarkably efficient token use, and now Gemini 2.5 Deep Think, achieving gold-medal level performance at the https://x.com/omarsar0/status/1968378996573487699

We trained gpt-5-codex to be great at both responsive and mobile front-ends. Here’s a thread of some examples: “Make a pixel art game where I can walk around and talk to other villagers, and catch wild bugs.” https://x.com/OpenAIDevs/status/1968065647541440879

Introducing upgrades to Codex | OpenAI https://openai.com/index/introducing-upgrades-to-codex/

this is the most important chart on the new gpt-5-codex model We are just beginning to exploit the potential of good routing and variable thinking: Easy responses are now >15x faster, but for the hard stuff, 5-codex now thinks 102% more than 5. Same model, same paradigm, but https://x.com/swyx/status/1967651870018838765

Codex for modernizing code: https://x.com/gdb/status/1967783077561926137

Detecting and reducing scheming in AI models | OpenAI https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We’ve identified—and they’ve patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6 https://x.com/alxndrdavies/status/1966614120566001801

Their ongoing testing of models like Claude Opus 4 and 4.1 has helped us find vulnerabilities and build strong safeguards before deployment. Read more: https://x.com/AnthropicAI/status/1966599337426681899

Our collaboration with the US Center for AI Standards and Innovation (CAISI) and UK AI Security Institute (AISI) shows the importance of public-private partnerships in developing secure AI models. https://x.com/AnthropicAI/status/1966599335560216770

Reasoning models (apparently without tool use) scored #1 (OpenAI) & tied for #2 (Google) in the International Collegiate Programming Contest Its been one year since reasoners were first announced, it is genuinely surprising how good they have gotten at hard problems, so quickly https://x.com/emollick/status/1968402884627697950

Last week, our reasoning models took part in the 2025 International Collegiate Programming Contest (ICPC), the world’s premier university-level programming competition. Our system solved all 12 out of 12 problems, a performance that would have placed first in the world (the best https://x.com/merettm/status/1968363783820353587

Our general-purpose reasoning models solved all 12 problems at the 2025 International Collegiate Programming Contest (ICPC) World Finals, the world’s top university programming competition which was enough for a 1st-place human ranking. https://x.com/OpenAI/status/1968368133024231902

1/n I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have https://x.com/MostafaRohani/status/1968360976379703569

Building towards age prediction | OpenAI https://openai.com/index/building-towards-age-prediction/

OpenAI to Gain $50 Billion From Cutting Revenue Share with Microsoft, Partners — The Information https://www.theinformation.com/articles/openai-gain-50-billion-cutting-revenue-share-microsoft-partners

OpenAI Ramps Up Robotics Work in Race Toward AGI | WIRED https://www.wired.com/story/openai-ramps-up-robotics-work-in-race-toward-agi/

the proof is in the ice cream (GPT telling the difference between ice cream and dogs https://x.com/gdb/status/1967038157586649118

Xcode 26 is now available on the Mac App Store. Sign in with your ChatGPT account and code with GPT-5 built-in. https://x.com/OpenAIDevs/status/1967704919487729753

How money works: 1. OpenAI signs $300B GPU deal with Oracle 2. Larry gains $100B (no GPUs shipped) 3. Larry invests in OpenAI’s $1T round 4. Sam uses $300B to pay Oracle 5. Oracle stock pumps again 6. Larry makes another $100B 7. Larry invests in OpenAI Flywheel go brrr. https://x.com/Yuchenj_UW/status/1966553671866687689

when chatgpt said moondream wasn’t a frontier model, i took it personally https://x.com/vikhyatk/status/1968811248381784167

GPT-5-Codex — big improvement for long-running agentic tasks: https://x.com/gdb/status/1967639750648750409

$ npm i -g @openai/codex
$ codex -m gpt-5-codex
https://x.com/OpenAIDevs/status/1967637842806624370

Codex is the most frustrating experience Ive ever had trying to code with an LLM. I’d rather put all of my codebase into chatgpt app this is absolute trash so far. https://x.com/Teknium1/status/1967806788084064290

Currently serving gpt-5-codex at degraded speeds, about 2X, due to the very high demand. We are finding GPUs left and right to match demand, we’ll keep you posted https://x.com/thsottiaux/status/1967996885500928459

GPT-5 update found in Codex-CLI by @OpenAI The new AI model is called “gpt-5-high-new”. Its description says: “our latest release tuned to rely on the model’s built-in reasoning defaults”. It’s unclear what the improvements in this new model are. Screenshot by @iannuttall: https://x.com/mark_k/status/1966521489529643169

GPT-5-Codex already ~40% of traffic for codex! should be the majority some time today. https://x.com/sama/status/1967674950502015165

so codex -m gpt-5-codex works fine but codex -m gpt-5-codex — –reasoning-effort=high doesn’t? what?? “The model `gpt-5-codex` does not exist or you do not have access to” the model is good but the harness needs love https://x.com/finbarrtimbers/status/1968066956193595761

the vibes on codex feel like the first few months of chatgpt. fun energy! https://x.com/sama/status/1967954997754335680

We’ve reset limits for gpt-5-codex to make up for the slowdowns from earlier today. https://x.com/OpenAIDevs/status/1968168606828794216

Codex for creating an animated video as a React app: https://x.com/gdb/status/1967939123391631864

Ok seriously what is wrong with codex? I tried to let it do what it said it wanted and /init – make an agents markdown file. 25 minutes of <something> later and nothing produced? It ended up using 40,000 tokens for .. nothing.. wtf did I just download? Token usage: https://x.com/Teknium1/status/1967804542357217768

o1 Preview is exactly one year old. I still remember when o1 was still known by its project name Q*; it was a time when rumors were circulating that OpenAI had made a world-changing breakthrough that would change everything. There were concerns that this project posed a threat https://x.com/kimmonismus/status/1966627812858855624

o1-preview -> GPT 5 pro in a year https://x.com/gdb/status/1966612991421423814

OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3! https://x.com/PKirgis/status/1966547382033936577

OpenAI has finally fixed their SWEBench errors and we can now finally apples to apples compare their scores over the entire 500 sample set (the fact that it took this long says alot about how much they care about SWEBench internally and maybe there’s a lesson here) https://x.com/nrehiew_/status/1967781400528245221

OpenAI just revealed that they have an internal unreleased SWE-bench-style benchmark for large ‘refactoring’ PRs, like the one mentioned here that edits 3.5k lines across 232 files. Their new model gets 51% accuracy on this benchmark. Who wants to make a public version of this? https://x.com/OfirPress/status/1967652031704994131

OpenAI’s Models Are Getting Too Smart For Their Human Teachers — The Information https://www.theinformation.com/articles/openais-models-getting-smart-human-teachers

GPT-5 is the best model for code quality out there 2 years ago, we created the world’s hardest software design quiz. Only 5 questions, multiple choice. Yet only about 3% of software engineers get them. The average score is somewhere between 2 and 3. Supposedly brilliant models https://x.com/jimmykoppel/status/1968683689421701413

Hey Claude, ChatGPT, Gemini: “I am time traveling back to the 75 BC Rome for one day. I can’t bring anything back. What is the one thing I could learn that would most advance today’s knowledge and what is one thing I could do there that would make me richest today” Pretty good https://x.com/emollick/status/1967009330789589077

Evals now support native audio inputs and audio graders. Evaluate model audio responses, with no text transcription needed. Get started in the Cookbook guide: https://t.co/V8qD5XFNqt https://t.co/tZuaCYccnQ https://x.com/OpenAIDevs/status/1965923707085533368

Meta and OpenAI said they will tighten child-safety controls in their chatbots after reports of harmful interactions with minors. ♾️ Meta will train assistants on Facebook, Instagram, and WhatsApp to avoid sexual or self-harm talk with teens and to block minors from user-made https://x.com/DeepLearningAI/status/1967749185232355369

With over 4.5k dedicated votes in the Text-to-Image modality, Seedream 4 ranks at #5. 🥇Gemini 2.5 Flash Image is tied with Image 4.0 Ultra Generate for #1. 🥉GPT-Image-1 and Image 4.0 Generate Preview rank tied for #3. Check out the leaderboard details for Image Edit and https://x.com/arena/status/1966562486897029274

🚨 Leaderboard Update: With over 43k votes collected, the community has spoken! 🥈 Seedream 4 by ByteDance has landed at #2 on the Image Edit Leaderboard 🔸 It is also ranked #5 for Text-to-Image Real prompts and votes at scale illustrate sharper confidence intervals and more https://x.com/arena/status/1966562484506230922

🚨New Model update before the weekend 📣 By popular demand, we’ve added a “High Res” version of Seedream 4 that supports an output at 4096×4096 dimensions. We’ll see how this version of Seedream 4 stacks up vs. all the other top Image generation models soon. https://x.com/arena/status/1966673628327801255

Prompt for Nano Banana / Seedream: “Imagine what an entity sees that exists outside of time in a higher dimension that can concurrently visualize everything that has ever happened or will ever happen when looking at [insert point of interest]. Now generate that image projected https://x.com/bilawalsidhu/status/1966191138530013661

i suspect society was better off with phone call culture than meeting culture https://x.com/sama/status/1966899254804574266

Fun to look back at this exploration with Area on the OpenAI brand. This work partially inspired the circle that we use and love in our products. https://x.com/sama/status/1968357219952660668

Updated the Realtime API docs with several things that were under-specified in the recent API update. – simpler “unified” WebRTC API (with samples): https://x.com/juberti/status/1968102280949055543

We’ve heard your feedback that GPT-5 Thinking can sometimes take longer than you’d like. Now Plus, Pro, and Business users can set the pace to match the moment. Select GPT-5 with Thinking in ChatGPT on web to toggle thinking time in the message composer. – Plus, Pro, Business https://x.com/OpenAI/status/1968395215536042241

When we at @OpenAI released o1-preview a year ago, it would think for seconds. Today, our best reasoning models can think for hours, browse the web, and write code. But there’s a lot of room to push reasoning even further. I’m excited for what the next year will bring! https://x.com/polynoamial/status/1966527147469598794

The GPT-5 “router unification” was completely pointless. There are now more models selectable than before if you include legacy models. before – 7 models GPT-4o GPT-4.1-mini GPT-4.1 GPT-4.5 o4-mini o4-mini-high o3 after “unification” – 12 models Auto / GPT-5 Router Instant / https://x.com/scaling01/status/1968417511017529705

have gpt-5 write the prompts for you: https://x.com/gdb/status/1966912852687810893

I’m so excited about the new faster default gpt5r which is the best model IMO in ChatGPT (the one I’ve been using internally) What’s more you now have the control of thinking time. The choice is sticky so you can also set your default to *more* thinking time than before. https://x.com/yanndubs/status/1968400320523821220

I don’t expect that everyone will agree with these tradeoffs, but given the conflict it is important to explain our decisionmaking. Here is the text: Some of our principles are in conflict, and we’d like to explain the decisions we are making around a case of tensions between https://x.com/sama/status/1967956382646223248

OH SHIT THEY PUT THE ROUTER -IN- THE MODEL LMAO https://x.com/swyx/status/1967691956693373183

We are learning from OpenAI and Anthropic about how people use AI for work. It is primarily for high-level tasks – critical thinking, the interpretation of information, getting/giving advice & being creative (both companies categorize a little differently, but similar patterns) https://x.com/emollick/status/1967800804301283452

Tencent Hires OpenAI Researcher as China Steps Up Talent Search | Mint https://www.livemint.com/technology/tech-news/tencent-hires-openai-researcher-as-china-steps-up-talent-search-11757689035858.html

gpt-5 and gpt-5-mini API rate limits are now more than doubled for these tiers:
gpt-5
Tier 1: 30K → 500K TPM (1.5M batch)
Tier 2: 450K → 1M (3M batch)
Tier 3: 800K → 2M
Tier 4: 2M → 4M
gpt-5-mini
Tier 1: 200K → 500K (5M batch)
https://x.com/OpenAIDevs/status/1966610846559134140
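
For a sense of scale, the before-and-after TPM figures quoted in the post above can be turned into multipliers. The short Python sketch below uses only the numbers from that post; the names `LIMITS`, `multiplier`, and `summarize` are illustrative, not part of any OpenAI API, and the batch limits are left out because the post gives no "before" value for them.

```python
# TPM (tokens per minute) limits as quoted in the post: (before, after).
# Only the figures that appear in the post are included.
LIMITS = {
    ("gpt-5", "Tier 1"): (30_000, 500_000),
    ("gpt-5", "Tier 2"): (450_000, 1_000_000),
    ("gpt-5", "Tier 3"): (800_000, 2_000_000),
    ("gpt-5", "Tier 4"): (2_000_000, 4_000_000),
    ("gpt-5-mini", "Tier 1"): (200_000, 500_000),
}


def multiplier(before: int, after: int) -> float:
    """Return how many times larger the new limit is than the old one."""
    return after / before


def summarize(limits: dict) -> dict:
    """Map each (model, tier) pair to its increase factor, rounded to 2 places."""
    return {key: round(multiplier(b, a), 2) for key, (b, a) in limits.items()}


if __name__ == "__main__":
    for (model, tier), factor in summarize(LIMITS).items():
        print(f"{model} {tier}: {factor}x")
```

On these figures the gpt-5 Tier 1 jump is by far the largest, while Tier 4 is an exact doubling.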

We felt bad about the slowdowns as we were adding GPUs, so we reset everyone’s limits to make up for it: https://x.com/sama/status/1968316161113882665

Teen safety, freedom, and privacy | OpenAI https://openai.com/index/teen-safety-freedom-and-privacy/

OpenAI Korea 🇰🇷 https://x.com/gdb/status/1966742816530235803

Shipped. For instance, you can see that gpt-oss-120b is 196 GB right from the “Files” tab https://x.com/mishig25/status/1968598133543256151

tldr: you can RL qwen3 8b to fool gpt-4o that it’s not doing a hidden side task (when it is) this is somewhat surprising given the disparity in model capabilities between an 8b agent and gpt-4o as a relatively strong monitor https://x.com/neev_parikh/status/1967767438243876924


Discover more from Ethan B. Holland
