OpenAI News: Week Ending 09/13/2024

The statue of The Thinker with a plaque that reads “o1” on the pedestal.

“As LLMs get smarter, evals need to get harder. OpenAI’s o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs. We’re putting up $500K in prizes for the best questions.”

https://twitter.com/alexandr_wang/status/1835738937719140440

Amir Efrati on X: “new: ChatGPT is CONSERVATIVELY generating more than $225 million per month right now. that’s some kind of growth, folks. https://t.co/2perQfY2LG” / X – https://x.com/amir/status/1834347880251052203

Tibor Blaho on X: “Summary of what we have learned during AMA hour with the OpenAI o1 team today Model Names and Reasoning Paradigm – OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1 – “Preview” indicates it’s an early version of the full model – “Mini”” / X – https://x.com/btibor91/status/1834686946846597281

Cognition on X: “We worked closely with OpenAI over the last few weeks to evaluate OpenAI o1’s reasoning capabilities with Devin. We found that the new series of models is a significant improvement for agentic systems that deal with code. Linked below is a deep dive with more eval results and https://t.co/yv5bCMoXN3” / X – https://x.com/cognition_labs/status/1834292718174077014?s=46

Rohan Paul on X: “The leap of @OpenAI ‘s 01 models on PhD-level tasks are just MASSIVE, specially on PhD-level Physics. GPQA diamond benchmark (PhD-level physics, chemistry, biology): – o1 first model to surpass human PhD expert performance – Outperforms GPT-4o by a wide margin —- 🧠 AIME https://t.co/ghph1OZ5Hq” / X – https://x.com/rohanpaul_ai/status/1834294432214159439

I have played a little bit with OpenAI’s new iteration of #GPT, GPT-o1, which performs an initial reasoning step before running the LLM. It is certainly a more capable tool than previous iterations, though still struggling with the most advanced research mathematical tasks.

https://mathstodon.xyz/@tao/113132502735585408

“Some reflection on what today’s reasoning launch really means: New Paradigm I really hope people understand that this is a new paradigm: don’t expect the same pace, schedule, or dynamics of pre-training era. I believe the rate of improvement on evals with our reasoning models…”

https://twitter.com/willdepue/status/1834294935497179633

“Sam Altman says the reason this latest OpenAI model is named 01 is because it’s the beginning of a significant new paradigm. And also says ‘We have the next few years in the bag’ there is no capping out of AI progress in sight. — Full Video from “St. Louis Public Radio”

https://twitter.com/rohanpaul_ai/status/1835295597571481999

OpenAI to Release Thinking ‘Strawberry’ AI Model Within 2 Weeks

Sam Altman told OpenAI staff the company’s non-profit corporate structure will change next year | Fortune

https://fortune.com/2024/09/13/sam-altman-openai-non-profit-structure-change-next-year

OpenAI Aims for a $150 Billion Valuation – The New York Times

OpenAI O1-Mini | Hacker News

https://news.ycombinator.com/item?id=41523050

“@LanceUlanoff An OpenAI spokesperson reportedly confirmed that GPT-Next is a placeholder, not a new model The AI community was going crazy earlier this week when OpenAI Japan’s CEO put up a slide that suggested GPT-Next is coming But it seems the excitement was a bit premature :/”

https://twitter.com/rowancheung/status/1832995634183303370

“We just heard that the famed ChatGPT upgrade Strawberry is coming by September 24th but something doesn’t make sense.

It was ‘a threat to humanity’ according to certain OpenAI ex-staff (Reuters)
It ‘rises to human-level reasoner’ (leak to Bloomberg)

But according to early testers ‘its slightly better answers aren’t worth the 10 to 20 second wait’? And it often thinks for that long even if you ask it not to. And it will be pricey.”

https://x.com/AIExplainedYT/status/1833527132498112532

AI in the News: OAI releases o1, Wikipedia’s Future in the Age of AI, Deepfake Misogyny in South Korea | LinkedIn

https://www.linkedin.com/pulse/ai-news-wikipedias-future-age-deepfake-misogyny-south-florent-daudens-za3re

“o1 vibe check, week 1” / X

https://twitter.com/swyx/status/1834967538234802503

Terence Tao on O1 | Hacker News

https://news.ycombinator.com/item?id=41540902

ChatGPT o1 preview + mini Wrote My PhD Code in 1 Hour*—What Took Me ~1 Year – YouTube

OpenAI o1 — our first model trained with reinforcement learning to think hard about problems before answering. Extremely proud of the team!

This is a new paradigm with vast opportunity. This is evident quantitatively (eg reasoning metrics are already a step function improved)… https://t.co/rj0wMh4Sec
— Greg Brockman (@gdb) September 12, 2024

o1 output token pricing matches original GPT-3 pricing – $0.06 / 1K tokens

o1 input token pricing 75% cheaper than GPT-3

Edit: considering we pay for (hidden) reasoning tokens, overall price probably comparable for many use cases https://t.co/kijaEEIf04
— Nathan Labenz (@labenz) September 12, 2024
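
The pricing arithmetic in the post above can be sketched in Python. This is a rough estimate, not an official price sheet. The $0.06 per 1K output rate is the figure quoted in the post. The input rate is derived from the "75% cheaper" claim on the assumption that the GPT-3 baseline was also $0.06 per 1K. The function name and the token counts in the example are hypothetical.

```python
# Rates per 1K tokens, taken or derived from the post above (assumptions, not official pricing).
OUTPUT_PRICE_PER_1K = 0.06                       # quoted: matches original GPT-3 pricing
INPUT_PRICE_PER_1K = OUTPUT_PRICE_PER_1K * 0.25  # assumed: "75% cheaper" than a $0.06 baseline


def estimate_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate the dollar cost of one request.

    Hidden reasoning tokens are added to the visible output tokens,
    since the post says they are paid for like output.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (billed_output / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 1,000 prompt tokens, 500 visible answer tokens,
    # and 1,500 hidden reasoning tokens.
    print(f"${estimate_cost(1000, 500, 1500):.3f}")
```

With these assumed numbers the hidden reasoning tokens account for most of the bill, which is the point of the "Edit" note in the post.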

OpenAI is not revealing chain of thought text to users for o1 for reasons relating to 'competitive advantage' – 100% a means to prevent synthetic training data exfiltration.
— Mike Conover (@vagabondjack) September 12, 2024

OpenAI o1 System Card | OpenAI

https://openai.com/index/openai-o1-system-card

A breakthrough new agent interface

How do you reduce the price of advanced AI while scaling and increasing the intelligence of the system ? Are $20/month subscription plans, like for OpenAI's o1 model today, the only option for billions of us on this planet ?

At @HyperspaceAI… pic.twitter.com/O2pFUwtkik
— Varun (@varun_mathur) September 12, 2024

here is o1, a series of our most capable and aligned models yet: https://t.co/yzZGNN8HvD

o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. pic.twitter.com/Qs1HoSDOz1
— Sam Altman (@sama) September 12, 2024

We worked closely with OpenAI over the last few weeks to evaluate OpenAI o1's reasoning capabilities with Devin. We found that the new series of models is a significant improvement for agentic systems that deal with code.

Linked below is a deep dive with more eval results and… pic.twitter.com/yv5bCMoXN3
— Cognition (@cognition_labs) September 12, 2024

I've had access to @OpenAI's o1 for several weeks. My advice on using it:

1. Don’t think of it like a traditional chat model. Frame o1 in your mind as a really smart friend you’re going to send a DM to solve a problem. She’ll answer back with a very well thought out explanation…
— Andrew Mayne (@AndrewMayne) September 13, 2024

According to this figure, it makes absolutely no sense at all to serve o1-preview.

What's up with that? https://t.co/QEXEVkO9Zd
— Lucas Beyer (bl16) (@giffmana) September 12, 2024

I missed this in the launch post.

With o1, @OpenAI is introducing a new class of tokens.

Reasoning tokens.

Used for chain of thought, reasoning tokens are billed as output tokens.

Reasoning tokens count toward the 128K context window.

You need to allocate space for reasoning… pic.twitter.com/e1PBmbWbky
— virat (@virattt) September 12, 2024
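
The budgeting point in the post above can be illustrated with a small Python sketch. It assumes "128K" means 128,000 tokens and that the prompt, the hidden reasoning tokens, and the visible answer all share that one window, as the post describes. The function name, the reserve size, and the example numbers are hypothetical.

```python
CONTEXT_WINDOW = 128_000  # assumed reading of the "128K context window" quoted above


def max_visible_output(prompt_tokens: int, reasoning_reserve: int,
                       context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the visible answer.

    The reasoning reserve is subtracted up front because reasoning tokens
    count toward the same window as the prompt and the answer.
    """
    remaining = context_window - prompt_tokens - reasoning_reserve
    return max(0, remaining)  # never report a negative budget


if __name__ == "__main__":
    # Hypothetical request: a 100,000-token prompt with 25,000 tokens set aside for reasoning.
    print(max_visible_output(100_000, 25_000))
```

A result of zero signals that the prompt plus the reasoning reserve already fill the window, so the prompt would need trimming before any answer could fit.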

🍓 Finally o1 is out – our first model with general reasoning capabilities. Not only it achieves impressive results on hard, scientific tasks, but also it gets significantly improved on safety and robustness. https://t.co/FIW0pnFqgN

We found reasoning in context about safety… pic.twitter.com/FtbuivZmRc
— Lilian Weng (@lilianweng) September 12, 2024

Fun things to do with your limited o1-preview uses that can show you the power and limitations:
🤖Give it an RFP and ask it to just do the work
🤖Give it an academic paper & ask it to offer strategies for replication
🤖Ask it to create an entrepreneurial product that it can build pic.twitter.com/TrRVZjA2Yj
— Ethan Mollick (@emollick) September 13, 2024

OpenAI cites increase in business users, weighs price boosts

https://www.axios.com/2024/09/06/openai-chatgpt-cash-subscriptions

OpenAI o1 Hub | OpenAI

https://openai.com/o1/#snake-video

We put OpenAI o1 to the test against ARC Prize.

Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet.

Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI?

Our notes: https://t.co/sV6LM1foGx pic.twitter.com/xLPqLRbSaU
— ARC Prize (@arcprize) September 13, 2024

🎉Congrats to @OpenAI for releasing o1:

– Economics: @tylercowen asked o1 basically to write a college essay
– Genetics: @catbrownstein asked o1 to help her reason through "n of 1" cases – medical cases that nobody has ever seen
– Physics: @mariokrenn6240 used o1 to draft and… pic.twitter.com/av5rdBKoMa
— swyx io (@swyx) September 12, 2024

The O1 release posts are unscientific — they don’t compare against previous SOTA from other labs, they don’t cite or even acknowledge previous work in the area of inference time compute.

This is actively harmful to the research community, and bordering on disingenuous.
— Aaron Defazio (@aaron_defazio) September 12, 2024

OpenAI releases new o1 reasoning model – The Verge

https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt

Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 🍓) Let me explain 🧵 1/ pic.twitter.com/aVGAkb9kxV
— Noam Brown (@polynoamial) September 12, 2024

Excited to bring o1-mini to the world with @ren_hongyu @_kevinlu @Eric_Wallace_ and many others. A cheap model that can achieve 70% AIME and 1650 elo on codeforces. https://t.co/z0rrjWVniH
— Shengjia Zhao (@shengjia_zhao) September 12, 2024

Inspired by the new o1 model, I hacked together g1, powered by Llama-3.1 on @GroqInc. It uses reasoning chains to solve problems.

It solves the Strawberry problem ~70% of the time, with no fine tuning or few shot techniques.

A thread 🧵 (with GitHub repo!) pic.twitter.com/X1AjWFzYT2
— Benjamin Klieger (@BenjaminKlieger) September 14, 2024

Introducing OpenAI o1 | OpenAI

https://openai.com/index/introducing-openai-o1-preview

o1-mini is the most surprising research result i've seen in the past year

obviously i cannot spill the secret, but a small model getting >60% on AIME math competition is so good that it's hard to believe

congrats @ren_hongyu @shengjia_zhao for the great work!
— Jason Wei (@_jasonwei) September 12, 2024

OpenAI dropped a new o1 prompting advice guide

Since it's not just a new model and performs chain-of-thought prompting internally, the best prompts for the new ChatGPT will be completely different

If you're testing OpenAI o1, share your best prompts and results below ⬇️ pic.twitter.com/vgDfBbOoYS
— Rowan Cheung (@rowancheung) September 12, 2024

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to… pic.twitter.com/jTViQucwxr
— Jim Fan (@DrJimFan) September 12, 2024

🚨🍓 OpenAI o1 support in LangChain

OpenAI just shipped a new model in preview that uses reinforcement learning and chain-of-thought reasoning to generate more carefully thought out answers.

Announcement blog: https://t.co/BKfPPgqItf

Try it today in LangChain Python & JS/TS: pic.twitter.com/sjoM0GfeO3
— LangChain (@LangChainAI) September 12, 2024

5 papers you want to read to understand better how @OpenAI o1 might work. Focusing on Improving LLM reasoning capabilities for complex tasks via training/RLHF, not prompting. 👀

> Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (https://t.co/c2mheh8fLM)… pic.twitter.com/DKCvcQkmCa
— Philipp Schmid (@_philschmid) September 15, 2024

Terence Tao is giving commentary on o1's math capabilities on Mastodon, and has mixed but overall optimistic takeaways

World's foremost expert is seriously evaluating a model vs. his PhD students w/ mixed results. Rate of improvement is incredible. https://t.co/CsGf5weNX2 pic.twitter.com/tjqJVT8dxt
— Jay Hack (@mathemagic1an) September 15, 2024

Summary of what we have learned during AMA hour with the OpenAI o1 team today

Model Names and Reasoning Paradigm

– OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1
– "Preview" indicates it's an early version of the full model
– "Mini"… https://t.co/LdIsWgn1Hy
— Tibor Blaho (@btibor91) September 13, 2024

o1-preview is SOTA on the aider leaderboard | aider

https://aider.chat/2024/09/12/o1.html

New w/ @erinkwoo @amir:

OpenAI is planning to release Strawberry as part of ChatGPT in the next 2 weeks.

We have more exclusive details on the new model's strengths and weaknesses here: https://t.co/kYYO5OC0wK
— Stephanie Palazzolo (@steph_palazzolo) September 10, 2024

Reasoning – OpenAI API

https://platform.openai.com/docs/guides/reasoning

Learning to Reason with LLMs | OpenAI

https://openai.com/index/learning-to-reason-with-llms

It's so over.

OpenAI's 01 model got 25 out of 35 IQ questions correct, far above what most humans get.

And, these questions were never part of 01's training data, as they have never been posted to the public internet. https://t.co/u04X2CGREW
— Rohan Paul (@rohanpaul_ai) September 15, 2024

Strawberry model from @OpenAI latest by next week 🤯

"OpenAI plans to release Strawberry as part of its ChatGPT service in the next two weeks, earlier than the original fall timeline we had recently reported, said two people who have tested out the model.

Release timelines are… pic.twitter.com/u5dCoQAF0N
— Rohan Paul (@rohanpaul_ai) September 10, 2024

ChatGPT – Bounded Sequence Solution

https://chatgpt.com/share/94152e76-7511-4943-9d99-1118267f4b2b

OpenAI Targets $150B Valuation With New Major Funding Round – Bloomberg

https://www.bloomberg.com/news/articles/2024-09-11/openai-fundraising-set-to-vault-startup-s-value-to-150-billion

ChatGPT – Convolution Limit Dependence

https://chatgpt.com/share/2ecd7b73-3607-46b3-b855-b29003333b87

Just plotted the new @OpenAI model on my AI IQ tracking page.

Note that this test is an offline-only IQ quiz that a Mensa member created for my testing, which is *not in any AI training data* (so scores are lower than for public IQ tests.)

OpenAI's new model does very well pic.twitter.com/D3MDZOxzhK
— Maxim Lott (@maximlott) September 13, 2024

ChatGPT – Chebyshev Asymptotic Proof

https://chatgpt.com/share/bb0b1cfa-63f6-44bb-805e-8c224f8b9205