“We’re releasing a new benchmark, MLE-bench, to measure how well AI agents perform at machine learning engineering. The benchmark consists of 75 machine learning engineering-related competitions sourced from Kaggle.
“We benchmarked the @OpenAI’s DevDay Eval product and @bespokelabsai’s Minicheck for hallucination detection. Minicheck is the current best hallucination detector on @guardrails_ai Hub. OpenAI: – Accuracy: 69.19% – F1: 0.7564 – High recall, lower precision Minicheck: – Accuracy:
“I’m leaving OpenAI after over 2 years of wild ride. Alongside @barret_zoph , @LiamFedus , @johnschulman2 , and many others I got to build a “low key research preview” product that became ChatGPT. While we were all excited to work on it, none of us expected it to be where it is” / X
“speaking of chatgpt, was trying to figure out the perfect walled garden i someday wanted to build. the “edit this area” of the image gen tool is so helpful for brainstorming ideas quickly. after 10 minutes of playing around, i ❤️
Orchestrating Agents: Routines and Handoffs | OpenAI Cookbook
“Reminder: 1. LLM’s, transformers…etc not state secrets any longer. 2. Apple not in OpenAI consortium = Gemini or Anthropic or both as partners. 3. If you believe AI is “core tech” then Apple will not let a third-party control their destiny no matter the cost. (See chips as” / X
“After running these additional experiments, we were impressed by a few things: 1) OpenAI o1 models show a consistent improvement over Anthropic and Google models on our long context RAG Benchmark up to 128k tokens. (3/5)” / X
The OpenAI Talent Exodus Gives Rivals an Opening | WIRED
“2) Despite lower performance than the SOTA OpenAI and Anthropic models, Google Gemini 1.5 models have consistent RAG performance at extreme context lengths of up to 2 million tokens. (4/5)” / X
“Why @OpenAI does not prioritize API Revenue and focuses on consumer products (ChatGPT), my thoughts. 🤔 Why not API: > Open models will be equally good, and enterprises might prefer more control > Models will become smaller and cheaper to run -> less revenue/margin > Other
Uber to launch AI assistant powered by OpenAI’s GPT-4o to help drivers go electric | Reuters
“The new Realtime API with web crawling is mind-blowing! Talk in realtime with any website. Powered by the OpenAI Realtime API and @firecrawl_dev 🔥 Check it out:
“Interesting observation by altimeter — OpenAI revenue exceeds Google at the time of their IPO
Microsoft’s AI Story Is Getting Complicated – WSJ
OpenAI reducing dependency on Microsoft data centers, The Information reports – TipRanks.com
OpenAI Leaders Say Microsoft Isn’t Moving Fast Enough to Supply Servers — The Information
“Launch GPT4 Chat Interface in just 3 lines of code! It can’t get simpler than this 😀. Being covered by popular publications as we speak! import gradio as gr import openai_gradio gr.load( name='gpt-4-turbo', src=openai_gradio.registry,).launch() This and more in Gradio 5 —
Before Mira Murati’s surprise exit from OpenAI, staff grumbled its o1 model had been released prematurely | Fortune
OpenAI gets $4 billion revolving credit line on top of latest funding
OpenAI partners with Cosmopolitan and Elle publisher Hearst
OpenAI’s GPT Store Has Left Some Developers in the Lurch | WIRED
“openai-gradio a Python package that makes it very easy for developers to create web apps that are powered by @OpenAI API in a few lines of code pip install openai-gradio
OpenAI Projections Imply Losses Tripling to $14 Billion in 2026 — The Information
OpenAI and Hearst Content Partnership | OpenAI
OpenAI Funding Fuels Wave of Big AI Deals — The Information
“BREAKING: Looks like OpenAI is entering the arena against Perplexity… citations are now in GPT-4o 👀
The Race to Block OpenAI’s Scraping Bots Is Slowing Down | WIRED
Generative AI’s Act o1: The Reasoning Era Begins | Sequoia Capital
OpenAI to open Paris office




