“We’re releasing a new benchmark, MLE-bench, to measure how well AI agents perform at machine learning engineering. The benchmark consists of 75 machine learning engineering-related competitions sourced from Kaggle. 

“We benchmarked the @OpenAI’s DevDay Eval product and @bespokelabsai’s Minicheck for hallucination detection. Minicheck is the current best hallucination detector on @guardrails_ai Hub. OpenAI: – Accuracy: 69.19% – F1: 0.7564 – High recall, lower precision Minicheck: – Accuracy: 

“I’m leaving OpenAI after over 2 years of wild ride. Alongside @barret_zoph , @LiamFedus , @johnschulman2 , and many others I got to build a “low key research preview” product that became ChatGPT. While we were all excited to work on it, none of us expected it to be where it is” / X

“speaking of chatgpt, was trying to figure out the perfect walled garden i someday wanted to build. the “edit this area” of the image gen tool is so helpful for brainstorming ideas quickly. after 10 minutes of playing around, i ❤️ 

Orchestrating Agents: Routines and Handoffs | OpenAI Cookbook

“Reminder: 1. LLM’s, transformers…etc not state secrets any longer. 2. Apple not in OpenAI consortium = Gemini or Anthropic or both as partners. 3. If you believe AI is “core tech” then Apple will not let a third-party control their destiny no matter the cost. (See chips as” / X

“After running these additional experiments, we were impressed by a few things: 1) OpenAI o1 models show a consistent improvement over Anthropic and Google models on our long context RAG Benchmark up to 128k tokens. (3/5)” / X

The OpenAI Talent Exodus Gives Rivals an Opening | WIRED

“2) Despite lower performance than the SOTA OpenAI and Anthropic models, Google Gemini 1.5 models have consistent RAG performance at extreme context lengths of up to 2 million tokens. (4/5)” / X

“Why @OpenAI does not prioritize API Revenue and focuses on consumer products (ChatGPT), my thoughts. 🤔 Why not API: > Open models will be equally good, and enterprises might prefer more control > Models will become smaller and cheaper to run -> less revenue/margin > Other 

Uber to launch AI assistant powered by OpenAI’s GPT-4o to help drivers go electric | Reuters

“The new Realtime API with web crawling is mind-blowing! Talk in realtime with any website. Powered by the OpenAI Realtime API and @firecrawl_dev 🔥 Check it out: 

“Interesting observation by altimeter: OpenAI revenue exceeds Google at the time of their IPO 

Microsoft’s AI Story Is Getting Complicated – WSJ

OpenAI reducing dependency on Microsoft data centers, The Information reports – TipRanks.com

OpenAI Leaders Say Microsoft Isn’t Moving Fast Enough to Supply Servers — The Information

“Launch GPT4 Chat Interface in just 3 lines of code! It can’t get simpler than this 😀. Being covered by popular publications as we speak! import gradio as gr import openai_gradio gr.load(name='gpt-4-turbo', src=openai_gradio.registry).launch() This and more in Gradio 5 

Before Mira Murati’s surprise exit from OpenAI, staff grumbled its o1 model had been released prematurely | Fortune

OpenAI gets $4 billion revolving credit line on top of latest funding

OpenAI partners with Cosmopolitan and Elle publisher Hearst

OpenAI’s GPT Store Has Left Some Developers in the Lurch | WIRED

“openai-gradio a Python package that makes it very easy for developers to create web apps that are powered by @OpenAI API in a few lines of code pip install openai-gradio 

OpenAI Projections Imply Losses Tripling to $14 Billion in 2026 — The Information

OpenAI and Hearst Content Partnership | OpenAI

OpenAI Funding Fuels Wave of Big AI Deals — The Information

“BREAKING: Looks like OpenAI is entering the arena against Perplexity… citations are now in GPT-4o 👀 

The Race to Block OpenAI’s Scraping Bots Is Slowing Down | WIRED

Generative AI’s Act o1: The Reasoning Era Begins | Sequoia Capital

OpenAI to open Paris office
