The statue of The Thinker with a plaque that reads “o1” on the pedestal.
“As LLMs get smarter, evals need to get harder. OpenAI’s o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs. We’re putting up $500K in prizes for the best questions. (read on)
Amir Efrati on X: “new: ChatGPT is CONSERVATIVELY generating more than $225 million per month right now. that’s some kind of growth, folks. https://t.co/2perQfY2LG” / X – https://x.com/amir/status/1834347880251052203
Tibor Blaho on X: “Summary of what we have learned during AMA hour with the OpenAI o1 team today Model Names and Reasoning Paradigm – OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1 – “Preview” indicates it’s an early version of the full model – “Mini”” / X – https://x.com/btibor91/status/1834686946846597281
Cognition on X: “We worked closely with OpenAI over the last few weeks to evaluate OpenAI o1’s reasoning capabilities with Devin. We found that the new series of models is a significant improvement for agentic systems that deal with code. Linked below is a deep dive with more eval results and https://t.co/yv5bCMoXN3” / X – https://x.com/cognition_labs/status/1834292718174077014?s=46
Rohan Paul on X: “The leap of @OpenAI ‘s 01 models on PhD-level tasks are just MASSIVE, specially on PhD-level Physics. GPQA diamond benchmark (PhD-level physics, chemistry, biology): – o1 first model to surpass human PhD expert performance – Outperforms GPT-4o by a wide margin —- 🧠 AIME https://t.co/ghph1OZ5Hq” / X – https://x.com/rohanpaul_ai/status/1834294432214159439
I have played a little bit with OpenAI’s new iteration of #GPT, GPT-o1, which performs an initial reasoning step before running the LLM. It is certainly a more capable tool than previous iterations, though still struggling with the most advanced research mathematical tasks.
“Some reflection on what today’s reasoning launch really means: New Paradigm I really hope people understand that this is a new paradigm: don’t expect the same pace, schedule, or dynamics of pre-training era. I believe the rate of improvement on evals with our reasoning models
“Sam Altman says the reason this latest OpenAI model is named 01 is because it’s the beginning of a significant new paradigm. And also says ‘We have the next few years in the bag’ there is no capping out of AI progress in sight. — Full Video from “St. Louis Public Radio”
OpenAI to Release Thinking ‘Strawberry’ AI Model Within 2 Weeks
Sam Altman told OpenAI staff the company’s non-profit corporate structure will change next year | Fortune
OpenAI Aims for a $150 Billion Valuation – The New York Times
OpenAI O1-Mini | Hacker News
“@LanceUlanoff An OpenAI spokesperson reportedly confirmed that GPT-Next is a placeholder, not a new model The AI community was going crazy earlier this week when OpenAI Japan’s CEO put up a slide that suggested GPT-Next is coming But it seems the excitement was a bit premature :/
“We just heard that the famed ChatGPT upgrade Strawberry is coming by September 24th but something doesn’t make sense.
It was ‘a threat to humanity’ according to certain OpenAI ex-staff (Reuters)
It ‘rises to human-level reasoner’ (leak to Bloomberg)
But according to early testers ‘its slightly better answers aren’t worth the 10 to 20 second wait’? And it often thinks for that long even if you ask it not to. And it will be pricey.”
https://x.com/AIExplainedYT/status/1833527132498112532
AI in the News: OAI releases o1, Wikipedia’s Future in the Age of AI, Deepfake Misogyny in South Korea | LinkedIn
“o1 vibe check, week 1” / X
Terence Tao on O1 | Hacker News
ChatGPT o1 preview + mini Wrote My PhD Code in 1 Hour*—What Took Me ~1 Year – YouTube
“OpenAI o1 — our first model trained with reinforcement learning to think hard about problems before answering. Extremely proud of the team! This is a new paradigm with vast opportunity. This is evident quantitatively (eg reasoning metrics are already a step function improved)” / X
“o1 output token pricing matches original GPT-3 pricing – $0.06 / 1K tokens o1 input token pricing 75% cheaper than GPT-3 Edit: considering we pay for (hidden) reasoning tokens, overall price probably comparable for many use cases” / X
“OpenAI is not revealing chain of thought text to users for o1 for reasons relating to ‘competitive advantage’ – 100% a means to prevent synthetic training data exfiltration.” / X
OpenAI o1 System Card | OpenAI
“A breakthrough new agent interface How do you reduce the price of advanced AI while scaling and increasing the intelligence of the system ? Are $20/month subscription plans, like for OpenAI’s o1 model today, the only option for billions of us on this planet ? At @HyperspaceAI
“here is o1, a series of our most capable and aligned models yet:
“I’ve had access to @OpenAI’s o1 for several weeks. My advice on using it: 1. Don’t think of it like a traditional chat model. Frame o1 in your mind as a really smart friend you’re going to send a DM to solve a problem. She’ll answer back with a very well thought out explanation” / X
“According to this figure, it makes absolutely no sense at all to serve o1-preview. What’s up with that?” / X
“I missed this in the launch post. With o1, @OpenAI is introducing a new class of tokens. Reasoning tokens. Used for chain of thought, reasoning tokens are billed as output tokens. Reasoning tokens count toward the 128K context window. You need to allocate space for reasoning
“🍓 Finally o1 is out – our first model with general reasoning capabilities. Not only it achieves impressive results on hard, scientific tasks, but also it gets significantly improved on safety and robustness.
“Fun things to do with your limited o1-preview uses that can show you the power and limitations: 🤖Give it an RFP and ask it to just do the work 🤖Give it an academic paper & ask it to offer strategies for replication 🤖Ask it to create an entrepreneurial product that it can build
OpenAI cites increase in business users, weighs price boosts
OpenAI o1 Hub | OpenAI
“We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1’s modest scores on ARC-AGI? Our notes:
“🎉Congrats to @OpenAI for releasing o1: – Economics: @tylercowen asked o1 basically to write a college essay – Genetics: @catbrownstein asked o1 to help her reason through “n of 1” cases – medical cases that nobody has ever seen – Physics: @mariokrenn6240 used o1 to draft and
“The O1 release posts are unscientific — they don’t compare against previous SOTA from other labs, they don’t cite or even acknowledge previous work in the area of inference time compute. This is actively harmful to the research community, and bordering on disingenuous.” / X
OpenAI releases new o1 reasoning model – The Verge
“Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI’s new o1 model series! (aka 🍓) Let me explain 🧵 1/
“Excited to bring o1-mini to the world with @ren_hongyu @_kevinlu @Eric_Wallace_ and many others. A cheap model that can achieve 70% AIME and 1650 elo on codeforces.
“Inspired by the new o1 model, I hacked together g1, powered by Llama-3.1 on @GroqInc. It uses reasoning chains to solve problems. It solves the Strawberry problem ~70% of the time, with no fine tuning or few shot techniques. A thread 🧵 (with GitHub repo!)
Introducing OpenAI o1 | OpenAI
“o1-mini is the most surprising research result i’ve seen in the past year obviously i cannot spill the secret, but a small model getting >60% on AIME math competition is so good that it’s hard to believe congrats @ren_hongyu @shengjia_zhao for the great work!” / X
“OpenAI dropped a new o1 prompting advice guide Since it’s not just a new model and performs chain-of-thought prompting internally, the best prompts for the new ChatGPT will be completely different If you’re testing OpenAI o1, share your best prompts and results below ⬇️
“OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there’re only 2 techniques that scale indefinitely with compute: learning & search. It’s time to shift focus to
“🚨🍓 OpenAI o1 support in LangChain OpenAI just shipped a new model in preview that uses reinforcement learning and chain-of-thought reasoning to generate more carefully thought out answers. Announcement blog:
“5 papers you want to read to understand better how @OpenAI o1 might work. Focusing on Improving LLM reasoning capabilities for complex tasks via training/RLHF, not prompting. 👀 > Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (
“Terence Tao is giving commentary on o1’s math capabilities on Mastodon, and has mixed but overall optimistic takeaways World’s foremost expert is seriously evaluating a model vs. his PhD students w/ mixed results. Rate of improvement is incredible.
o1-preview is SOTA on the aider leaderboard | aider
“New w/ @erinkwoo @amir: OpenAI is planning to release Strawberry as part of ChatGPT in the next 2 weeks. We have more exclusive details on the new model’s strengths and weaknesses here:
Reasoning – OpenAI API
Learning to Reason with LLMs | OpenAI
“It’s so over. OpenAI’s 01 model got 25 out of 35 IQ questions correct, far above what most humans get. And, these questions were never part of 01’s training data, as they have never been posted to the public internet.” / X
“Strawberry model from @OpenAI latest by next week 🤯 “OpenAI plans to release Strawberry as part of its ChatGPT service in the next two weeks, earlier than the original fall timeline we had recently reported, said two people who have tested out the model. Release timelines are
ChatGPT – Bounded Sequence Solution
OpenAI Targets $150B Valuation With New Major Funding Round – Bloomberg
ChatGPT – Convolution Limit Dependence
“Just plotted the new @OpenAI model on my AI IQ tracking page. Note that this test is an offline-only IQ quiz that a Mensa member created for my testing, which is *not in any AI training data* (so scores are lower than for public IQ tests.) OpenAI’s new model does very well
ChatGPT – Chebyshev Asymptotic Proof – https://chatgpt.com/share/bb0b1cfa-63f6-44bb-805e-8c224f8b9205




