OpenAI News: Week Ending 09/20/2024

A phd student sits in the rain on the sidewalk outside of a college lecture hall. Dark and sad.

September 20, 2024

OpenAI Threatens to Ban Users Who Probe Its ‘Strawberry’ AI Models | WIRED

https://www.wired.com/story/openai-threatens-bans-as-users-probe-o1-model

When an expert realizes that o1 could write his PhD code (that took him a year) in 1 hour.

Video: https://t.co/iOSUSinO8o pic.twitter.com/w2x3lTJJGA
— Stefan Streichsbier (@s_streichsbier) September 15, 2024

Inside Jony Ive’s Life After Apple and His LoveFrom Design Business – The New York Times

“Jony Ive finally confirms the OpenAI AI device. Sam is insane. He managed to seal a chatgpt distribution deal with Apple while collaborating on an iPhone killer with Apple’s top designers.

https://twitter.com/8teapi/status/1837979330867351626?s=46

AmebaGPT on X: “Video showing race of AI labs showing top ranked models from each lab from @lmsysorg. OpenAI leads by over 50 points, something we haven’t seen since March 2024. CC: @altryne @aidan_mclau @btibor91 @BorisMPower @Scobleizer @MatthewBerman @8teAPi https://t.co/a3NKcFkJRR” / X

https://x.com/amebagpt/status/1836803294796128614

“Well at least it’s disclosed now. Looking at timeline Oct 23 – OpenAI research team achieves reasoning “AGI has been achieved internally” Nov 23 – Reuters publishes rumors, which Twitter crazy people (me + others) infer is reasoning model Jul 24 – OpenAI announces cusp of” / X

https://twitter.com/8teAPi/status/1836808828282687962

“Open Dataset release by @OpenAI! 👀 OpenAI just released a Multilingual Massive Multitask Language Understanding (MMMLU) dataset on @huggingface! 🌍 MMLU test set available in 14 languages, including Arabic, German, Spanish, French,…. 🧠 Covers 57 categories from elementary to

https://twitter.com/_philschmid/status/1838230108072476951?s=46

One in five GPs use AI such as ChatGPT for daily tasks, survey finds | GPs | The Guardian

https://www.theguardian.com/society/2024/sep/17/one-in-five-gps-use-ai-such-as-chatgpt-for-daily-tasks-survey-finds

How much energy can AI use? Breaking down the toll of each ChatGPT query – The Washington Post

https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers

OpenAI Messed With the Wrong Mega-Popular Parenting Forum | WIRED

https://www.wired.com/story/mumsnet-openai-copyright-allegations

“OpenAI’s ability to shrink its reasoning model without losing many capabilities might be as big a deal as the reasoning power itself.

https://twitter.com/amir/status/1836782911250735126?s=46

OpenAI technical goals | OpenAI

https://openai.com/index/openai-technical-goals

Reasoning – OpenAI API

https://platform.openai.com/docs/guides/reasoning/how-reasoning-works

“AI contributions on @github have surged 230% since @OpenAI released ChatGPT” / X

https://twitter.com/rohanpaul_ai/status/1837829123625853259

OpenAI says the latest ChatGPT can ‘think’ – and I have thoughts | Technology | The Guardian

https://www.theguardian.com/technology/2024/sep/17/techcsape-openai-chatgpt-thoughts

“The Safety and Security Committee—a committee established to review critical safety and security issues—has made recommendations across five key areas, which we are adopting.

https://twitter.com/OpenAINewsroom/status/1835773859947069734

An update on our safety & security practices | OpenAI

https://openai.com/index/update-on-safety-and-security-practices

OpenAI is launching an ‘independent’ safety board that can stop its model releases – The Verge

https://www.theverge.com/2024/9/16/24246617/openai-independent-safety-board-stop-model-releases

“o1 is the first model in awhile that has felt really different in terms of prompting that being said, bottom line is still the same for these types of complaints: skill issue” / X

https://twitter.com/nptacek/status/1836832186558734662

The Intelligence Age

https://ia.samaltman.com

Jony Ive confirms he’s working on a new device with OpenAI – The Verge

https://www.theverge.com/2024/9/21/24250867/jony-ive-confirms-collaboration-openai-hardware

“In the past few days, I’ve been testing OpenAI o1 models, mostly o1-mini, for developing PhD or postdoc level projects. I can confidently claim that the o1 model is comparable to an outstanding PhD student in biomedical sciences! I’d rate it among the best PhDs I’ve have trained!

https://twitter.com/DeryaTR_/status/1836434726774526381

“On the tasks I spend time with, o1 preview and o1 mini *are* subjective improvements in output quality – kudos to the reasoning team @OpenAI!!” / X

https://twitter.com/JvNixon/status/1837884523092283599

“.@OpenAI is hiring ML engineers for a new multi-agent research team! We view multi-agent as a path to even better AI reasoning. Prior multi-agent experience isn’t needed. If you’d like to research this area with @kevinleestone and me fill out this form:

https://twitter.com/polynoamial/status/1836872735668195636

“I suspect that most people will not want to use o1-preview for most things. Where it shines is complex work tasks where you are an expert facing a problem that you could solve, but not quickly. If giving 4o time to ponder on the answer might result in a solution, use o1-preview

https://twitter.com/emollick/status/1834953035367264266

“NousCon: a celebration of Actually Open AI @NousResearch x @github x @arcee_ai x @huggingface x @dottxtai x @haizelabs Here is @karan4d launching Nous Forge, the open o1 competitor with open, editable, auditable chain of thought:

https://twitter.com/swyx/status/1836605035201073183

“OpenAI recently released the o1 family of models and a graph showing scaling laws for test-time compute — sadly without the x-axis labeled. Using only the public o1-mini API, I tried to reconstruct the graph as closely as possible. Original on left, my best attempt on right.

https://twitter.com/hughbzhang/status/1838288923656941860

“o1-preview is the first model to actually pull off a full sestina. Every other model has failed, though Claude 3.5 gets so close before flubbing at the envoi (the pattern of the last three lines). Also the summarized list of thoughts when you ask o1 to write poetry is something.

https://twitter.com/emollick/status/1835869076091986020

“The “thinking” process in o1 is just a brilliant bit of user experience. Though it may or may not represent the actual chain of thought, it makes the oracular answer at the end feel understandable and explainable (even if it doesn’t contain actual information?)

https://twitter.com/emollick/status/1835886044102693094

“The confusion and skepticism from technologists about o1 is remarkably similar to the early response to GPT3 and ChatGPT. “That’s all it does?” “It’s just ….” “How is this different from …?” “Don’t really get it” Meanwhile a small number of people are whispering about wild” / X

https://twitter.com/scottastevenson/status/1836811502340252020

“Note o1 is showing how unprepared we are for testing high level AIs to figure out what they are good or bad at. Instead we are turning to experts, like one of the greatest living mathematicians, to give it a vibe check (“mediocre but not completely incompetent, grad student”)

https://twitter.com/emollick/status/1835090283832442982

“🚨 OpenAI CEO Sam Altman confirms that Level-3 Agents are coming soon ” The shift to level 2 took time, but it accelerates the development of level 3. This will enable impactful agent-based experiences that will greatly impact technology advancements in technology ”

https://twitter.com/slow_developer/status/1836647241358143534

“Use cases from o1 are going to be in areas requiring deep expertise where experts can assess the value (and limits) of the system. This writeup from a healthcare startup gives some very specific examples of o1 versus 4o in complex medical admin work.

https://twitter.com/emollick/status/1835000528972984829

“Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication—o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4

https://twitter.com/yuntiandeng/status/1836114401213989366

“After spending a decent amount of time with o1-preview, I would be very surprised if it is not able to do economically valuable analytical work inside large companies. The main issue is that prompting it remains really weird. But a real R&D effort inside firms might crack that.” / X

https://twitter.com/emollick/status/1836064591479971894

“OpenAI’s new o1 model is a BIG breakthrough in AI intelligence, if IQ tests say anything. I gave it the Norway Mensa IQ test, and it blows other AIs out of the water. I’m surprised!… Because there hadn’t been public progress in the last 6mo. Link to full analysis below:

https://twitter.com/maximlott/status/1835043371339202639

“We appreciate your excitement for OpenAI o1 and we want you to be able to use it more. For Plus and Team users, we have increased rate limits for o1-mini by 7x, from 50 messages per week to 50 messages per day. o1-preview is more expensive to serve, so we’ve increased the rate” / X

https://twitter.com/OpenAI/status/1835857163765637607

“Here is my talk at @MIT (after some delay😅) I made this talk last year when I was thinking about a paradigm shift. This delayed posting is timely as we just released o1, which I believe is a new paradigm. It’s a good time to zoom out for high level thinking. (1/11)

https://twitter.com/hwchung27/status/1836842717302943774

“OpenAI increased the rate limits for its new o1-mini and o1-preview models in ChatGPT o1-mini is increasing by 7x to 50 messages per day, o1-preview is going from 30 to 50 messages per week But still no word on opening up the API to non-tier 5 users

https://twitter.com/adcock_brett/status/1837885561203224595

“o1-preview is pretty good at planning

https://twitter.com/polynoamial/status/1838251987009183775?s=46

“No more waiting. o1’s is officially on Chatbot Arena! We tested o1-preview and mini with 6K+ community votes. 🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance! 🥈o1-mini: #1 in technical areas, #2 overall.

https://twitter.com/lmarena_ai/status/1836443278033719631