Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A sophisticated two-tiered birthday cake in deep blue and slate grey with crisp white frosting and rich red ribbon details, two tall candles burning brightly on top, minimalist modern styling with high contrast lighting, small AI-themed decorative elements like chat bubbles and creative symbols around the base, colorful confetti, shot in elegant product photography style with dramatic shadows and highlights.

As AI capability increases, alignment work becomes much more important. In this work, we show that a model discovers that it shouldn’t be deployed, considers behavior to get deployed anyway, and then realizes it might be a test. https://x.com/sama/status/1968674357309223020

I find it unimaginably based that the OAI Evals team keeps making benchmarks finding that Claude is better and publishing it anyway. they are 3 for 3 this year in acknowledging specifically how much Claude is better at tasks OAI care about. there is no sarcasm here folks. this https://x.com/swyx/status/1971404125553242253

NEW: Anthropic web search ✨ OpenRouter now uses the native web engines for OpenAI and Anthropic models by default. For all other models, our custom web search will be used, powered by @ExaAILabs. Configurable! 👇 https://x.com/OpenRouterAI/status/1968360919488151911

How does a (then) 2-year-old startup with NO commercial product raise $675M from Jeff Bezos, OpenAI & NVIDIA? 🤯 This is the untold story of @Figure_robot. And @adcock_brett. It’s a masterclass in building a reality-distortion field that convinced the world’s smartest https://x.com/IlirAliu_/status/1969748812437549265

Yang Song, one of the world’s top diffusion model researchers and inventor of consistency models, has left OpenAI to join Meta https://x.com/iScienceLuvr/status/1971087101203775782

Zuck just poached another Chinese researcher from OpenAI. Yang Song is a giga-brain, easily one of the strongest hires Meta has made from OpenAI so far. Some of my oai friends were shocked to see him leave. https://x.com/Yuchenj_UW/status/1971088866095603858

Abundant Intelligence – Sam Altman https://blog.samaltman.com/abundant-intelligence

Grateful to Jensen for the almost-decade of partnership! https://x.com/sama/status/1970483993486217258

In case anyone was wondering, 10GW is about 6% of the energy that all humans in the world spend thinking. https://x.com/gneubig/status/1970449455846768701

OpenAI Shows Us The Money – by Zvi Mowshowitz https://thezvi.substack.com/p/openai-shows-us-the-money

“Our vision is simple: we want to create a factory that can produce a gigawatt of new AI infrastructure every week.” - @sama, in reference to OpenAI https://x.com/kevinweil/status/1970519868324860145

10GW is about $340B of nvidia h100 at $30k/gpu (assuming 20% of power for non-gpus). if openai got a 30% volume discount, they’d pay nvidia $230b probably. so instead, maybe openai pays nvidia full price and nvidia invests the excess $100B into openai stock 😬 (just throwing https://x.com/soumithchintala/status/1970464906072801589

looking forward to what we’ll build together with NVIDIA! https://x.com/gdb/status/1970299081999426016

More compute in the making. Announcing 5 new Stargate sites with Oracle and SoftBank, putting us ahead of schedule on the 10-gigawatt commitment we announced in January. https://x.com/OpenAI/status/1970601342680084483

OpenAI & NVIDIA Announce Strategic Partnership to Deploy 10GW of NVIDIA Systems This enables OpenAI to build & deploy at least 10 gigawatts of AI datacenters with NVIDIA systems representing millions of GPUs for OpenAI’s next-gen AI infrastructure. https://x.com/OpenAINewsroom/status/1970157101633990895

OpenAI and NVIDIA Announce Strategic Partnership to Deploy 10 Gigawatts of NVIDIA Systems | NVIDIA Newsroom https://nvidianews.nvidia.com/news/openai-and-nvidia-announce-strategic-partnership-to-deploy-10gw-of-nvidia-systems

OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems | OpenAI https://openai.com/index/openai-nvidia-systems-partnership/

Together, NVIDIA and OpenAI are expanding the frontier of AI — transforming nearly every industry and unlocking use cases once unimaginable.   “There’s no partner but NVIDIA that can do this at this kind of scale, at this kind of speed,” said @OpenAI CEO Sam Altman. https://x.com/nvidianewsroom/status/1970223778937586043

OpenAI might also be developing AI glasses, a voice recorder, and a pin | The Verge https://www.theverge.com/news/781854/openai-chatgpt-hardware-rumors-smart-speaker-glasses-pin

GPT-5 is the best model for code quality out there. 2 years ago, we created the world’s hardest software design quiz. Only 5 questions, multiple choice. Yet only about 3% of software engineers get them. The average score is somewhere between 2 and 3. Supposedly brilliant models https://x.com/jimmykoppel/status/1968683689421701413

Measuring the performance of our models on real-world tasks | OpenAI https://openai.com/index/gdpval/

The implications of OpenAI’s plan to rent $450 billion worth of servers before the end of this decade are 🤯 https://x.com/amir/status/1969043037805228388

Announcing strategic partnership with @nvidia for millions of GPUs — about as much compute as they’ve shipped in 2025 in total — and an investment up to $100B as these GPUs are deployed: https://x.com/gdb/status/1970173243350008201

AI should do more than just answer questions; it should anticipate your needs and help you reach your goals. That’s what we’re beginning to build, starting with ChatGPT Pulse (rolling out now to Pro, with goal of making it available to everyone over time): https://x.com/fidjissimo/status/1971258542578663829

Introducing ChatGPT Pulse | OpenAI https://openai.com/index/introducing-chatgpt-pulse/

Now in preview: ChatGPT Pulse This is a new experience where ChatGPT can proactively deliver personalized daily updates from your chats, feedback, and connected apps like your calendar. Rolling out to Pro users on mobile today. https://x.com/OpenAI/status/1971259652684878019

Today we are launching my favorite feature of ChatGPT so far, called Pulse. It is initially available to Pro subscribers. Pulse works for you overnight, and keeps thinking about your interests, your connected data, your recent chats, and more. Every morning, you get a https://x.com/sama/status/1971297661748953263

For both $NVDA and OpenAI, the $100B $NVDA investment is perfect: 1. For OAI, the biggest question was how they were going to raise the future +$300B as the valuation is already very high, and the cash burn for the next few years is projected to be crazy. On top of it, there is https://x.com/rihardjarc/status/1970170005858726278

Two years ago. No model had surpassed GPT-4 & it wasn’t clear that was possible. Now you can get better than GPT-4 level performance on open weights models running on consumer hardware, and the state of the art in LLMs is cheaper & faster and very much more capable than GPT-4. https://x.com/emollick/status/1970790843868213361

We’ve released a large-scale study on how people are using ChatGPT. Consumer adoption has broadened beyond early-user groups, and lots of economic value is being created through both personal and professional use: https://x.com/gdb/status/1969953507215302836

💥 Announcing GDPval, a new eval that measures model performance on economically valuable, real-world tasks across 44 occupations. https://x.com/kevinweil/status/1971250647778635904

GDPval.pdf https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf

Just released GDPval: an early step towards better methods for measuring and forecasting real-world model progress. https://x.com/gdb/status/1971301844585676930

opus 4.1 beats gpt-5-high on OAI’s own new GDP eval. nice of them to be transparent hehe https://x.com/dejavucoder/status/1971253593404735706

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://x.com/OpenAI/status/1971249374077518226

OpenAI, SAP & Microsoft are launching OpenAI for Germany, a partnership to bring frontier AI to Germany’s public sector, through a sovereign, certified cloud environment. Built on SAP’s Delos Cloud and running on Microsoft Azure, this new initiative will help employees across https://x.com/OpenAINewsroom/status/1970844821624680801

so let me get this right: Oracle says Openai committed $300B for cloud compute → oracle stock jumps 36% (best day since 1992) Oracle runs on Nvidia GPUs → has to buy billions in chips from Nvidia Nvidia just announced they’re investing $100B into openai Openai uses that https://x.com/SullyOmarr/status/1970176527137718654

SAP and OpenAI partner to launch sovereign ‘OpenAI for Germany’ | OpenAI https://openai.com/global-affairs/openai-for-germany/

A research team at @OpenAI, where I am proud to be a board member, released an important new paper today. This paper looks at what might be thought of as task specific Turing Tests and shows that AI systems, even with limited guidance, perform many tasks, such as planning https://x.com/LHSummers/status/1971252567981146347

GPT-5 just passed what researchers call the “Gödel Test.” That means it’s not just solving textbook problems, it’s tackling open math conjectures that would normally take a skilled PhD student days to crack. In a new paper, GPT-5 was tested on 5 unsolved optimization https://x.com/VraserX/status/1970902050931159184

Results so far No single model dominates: GPT-5 “high” reasoning leads on tough tasks but collapses on time-critical ones. Claude-4 Sonnet balances speed vs accuracy but at higher cost. Open-source models (like Kimi-K2) show promise in adaptability. Scaling curves plateau, https://x.com/omarsar0/status/1970147904087322661

@OpenAI Interesting: 1. Linear progress across OpenAI generations (GPT-4o, o3, GPT-5) 2. Claude Opus 4.1 is on top, nearing industry expert, much better than GPT-5 high. Thanks for acknowledging competitors. https://x.com/Yuchenj_UW/status/1971254164069212231

it’s quite incredible how bad Sonnet 4 is at long-context retrieval Grok-4 > GPT-5 ~ Gemini 2.5 Pro > Claude 4 Sonnet https://x.com/scaling01/status/1970661469667660100

Language Models that Think and Chat Better Proposes a simple RL recipe to improve small open models (eg, 8B) that rivals GPT-4o and Claude 3.7 Sonnet (thinking). Pay attention to this one, AI devs! Here are my notes: https://x.com/omarsar0/status/1971215698140819516

Price analysis reveals trends in the Speech to Text market: Fireworks and Groq are the lowest-cost inference providers for Whisper Large v3, offering competitive access to OpenAI’s model. As word error rate decreases, pricing tends to increase, reflecting the performance-cost https://x.com/ArtificialAnlys/status/1971232403973943517

OpenAI launches shared projects for ChatGPT Business https://www.testingcatalog.com/openai-launches-shared-projects-for-chatgpt-business/

Progress at our datacenter in Abilene. Fun to visit yesterday! https://x.com/sama/status/1970812956733739422

Ollama now has a web search API and MCP server! ⚡️ Augment local and cloud models with the latest content to improve accuracy 🔧 Build your own search agent 🔍 Directly plugs into existing MCP clients like @OpenAI Codex, @cline, Goose (@jack) and more! Let’s go!!!! 🧵👇 https://x.com/ollama/status/1971085470785319349

Meta Poaches OpenAI Scientist to Help Lead AI Lab | WIRED https://www.wired.com/story/meta-poaches-openai-researcher-yang-song/

Over the next few weeks, we are launching some new compute-intensive offerings. Because of the associated costs, some features will initially only be available to Pro subscribers, and some new products will have additional fees. Our intention remains to drive the cost of https://x.com/sama/status/1969835407421374910

📢 New Model(s) Drop: GPT-5 Codex Low, Medium and High are now on Yupp! @OpenAI’s frontier coding models, with adaptive thinking effort and lower token usage. We explored their capabilities with some tough coding prompts: https://x.com/yupp_ai/status/1970617312559669685

Codex CLI 0.39 was released by @OpenAI BIG new feature: Codex CLI now includes /review for automated code reviews! GPT-5-Codex will investigate and find critical bugs in your code. It’s like having another team member always available. https://x.com/mark_k/status/1968934227149291535

Excited to share that OpenAI’s GPT-5-Codex is now live in Windsurf! We’re making it free to paid users for a limited time. We can’t wait to hear what you build with it! https://x.com/windsurf/status/1970549712551100523

GPT-5-Codex is Live in Cline. OpenAI’s agent-optimized version of GPT-5: > 400K context window > adaptive reasoning: 93% fewer tokens on simple tasks, 2x more on complex ones > variable thinking that scales with task complexity > $1.25/$10 per million tokens Built for coding https://x.com/cline/status/1970619799119241709

GPT-5-Codex is live in the Responses API. If you use the Codex CLI via API key, you can now also use GPT-5-Codex. https://x.com/OpenAIDevs/status/1970535239048159237

GPT-5-Codex is now available in Cursor. Let us know your thoughts! https://x.com/cursor_ai/status/1970540811168473250

gpt-5-codex is now in the API: https://x.com/gdb/status/1970631954887565823

GPT-5-Codex is now rolling out to @code developers! https://x.com/pierceboggan/status/1970572801267638421

GPT-5-Codex is relentless and runs until whatever you give it is really done. https://x.com/steipete/status/1969054373385801896

GPT-5-Codex, meet Droid. Optimized for agentic coding and tuned in Factory in collaboration with @OpenAI, we find GPT-5-Codex to be a strong daily-driver. Particular strengths: – Long-running tasks – Autonomous pull requests – Quick questions with adaptive reasoning Available https://x.com/FactoryAI/status/1970549069996302846

GPT-5-Codex, optimized for agentic coding, is rolling out to @code now! Try it out and let us know what you think. https://x.com/code/status/1970579099472056350

I am really liking the new gpt-5-codex. this is its attempt at one shotting minecraft in three.js https://x.com/JasonBotterill3/status/1969730846417629277

I have a Codex project, a Deep Research run, a ChatGPT agent task, & a GPT-5 Pro task all going at the same time. Is this that agent managing thing that some people keep saying is supposed to be the future of work? The current AI interfaces aren’t really up to the task. https://x.com/emollick/status/1969980811504939443

OpenAI solved the “make sure your code actually runs” reward with gpt-5 codex and it shows https://x.com/andrew_n_carr/status/1969784664912179533

OpenAI tests ChatGPT Agent upgrades powered by new models https://www.testingcatalog.com/openai-tests-chatgpt-agent-upgrades-powered-by-new-alpha-models/

when chatgpt said moondream wasn’t a frontier model, i took it personally https://x.com/vikhyatk/status/1968811248381784167

Why we built the Responses API https://developers.openai.com/blog/responses-api/

wow! GPT-5-Codex in Cursor just gave me the best reasoning I have ever seen in an LLM. the fact that it was able to recover and continue reasoning multiple times is mindblowing. I’ve seen many models say “but wait” repeatedly and still struggle to find the actual cause. https://x.com/_overment/status/1970630704489803857

Agent run times aren’t everything. I gave the same high level task to Sonnet 4 and GPT-5-Codex. Sonnet completed the task across 7 files in ~6 minutes. Codex is so far at 8 files but 32 minutes in still…. The Sonnet output was perfectly acceptable. https://x.com/zachtratar/status/1970625784500130065

Aviro (@aviro_ai) makes enterprise AI agents continuously upskill to deliver on complex tasks. Their runtime layer, Cortex, helped their in-house deep research agent beat OpenAI’s by 70% on enterprise search and top Microsoft’s Deep Research benchmark. https://x.com/ycombinator/status/1968691488222503194

DeepSeek’s updated V3.1 Terminus ties with gpt-oss-120b (high) as the most intelligent open weights model and offers increased instruction following and long context reasoning capabilities 🧠 Our benchmarking results indicate DeepSeek V3.1 Terminus shows a greater intelligence https://x.com/ArtificialAnlys/status/1971114096008495501

Sam Altman says ChatGPT will stop talking about suicide with teens | The Verge https://www.theverge.com/ai-artificial-intelligence/779053/sam-altman-says-chatgpt-will-stop-talking-about-suicide-with-teens

there’s this guy on tiktok who video calls chatgpt, shows it objects, asks “how much gram,” then weighs them, and if chatgpt is wrong he puts his phone in the fridge where chatgpt is forced to chat with the condiments. roko’s basilisk is definitely making a note of this man. https://x.com/paularambles/status/1971234855309672467

Shipped. For instance, you can see that gpt-oss-120b is 196 GB right from the “Files” tab https://x.com/mishig25/status/1968598133543256151

The Illusion of Readiness (Health AI) • GPT-5 & peers ace med benchmarks—but stress tests reveal fragility • Guess answers w/o images, flip under trivial prompt tweaks • Fabricate “reasoning” that sounds right but isn’t • Leaderboard wins ≠ real-world readiness https://x.com/arankomatsuzaki/status/1970684893966516477

OpenAI raids Apple for hardware talent and supply chain partners | Patently Apple https://www.patentlyapple.com/2025/09/openai-raids-apple-for-hardware-talent-and-supply-chain-partners.html

GPT-5 Pro is remarkably good at weird hypotheticals. “I have a time machine. I want to preserve the temple of Artemis at Ephesus to today. Suggest the dates & interventions I need to make” “Its 2087, a very valuable material is discovered on Io. Figure out what & the economy” https://x.com/emollick/status/1968879525447516510

We’ve made progress on the AI safety problem of detecting and reducing “scheming”: – Created evaluation environments to detect scheming – Observed current models scheming in controlled settings – Found deliberative alignment ( https://x.com/gdb/status/1969437389027492333

Is OpenAI’s Reinforcement Fine-Tuning (RFT) Worth It? · TensorZero https://www.tensorzero.com/blog/is-openai-reinforcement-fine-tuning-rft-worth-it/

What GPT-oss Leaks About OpenAI’s Training Data https://fi-le.net/oss/

🚨 New Models Alert: WebDev 💻 GPT-5-Codex and Qwen3-Coder-Plus are both now available on WebDev Arena! In the WebDev Arena, you can test out all the best frontier AI coding models on web development tasks. Vote for your preferred response and see how they stack up on the https://x.com/arena/status/1970962780225507775

XAI OPENAI TRADE SECRETS LAWSUIT complaint2.pdf https://fingfx.thomsonreuters.com/gfx/legaldocs/byvreaygope/XAI%20OPENAI%20TRADE%20SECRETS%20LAWSUIT%20complaint2.pdf

Musk’s xAI sues OpenAI, alleging theft of trade secrets – Sherwood News https://sherwood.news/tech/musks-xai-sues-openai-alleging-theft-of-trade-secrets/

Discover more from Ethan B. Holland