AI News #95: Week Ending July 25, 2025 with 49 Executive Summaries, Top 102 Links, and 4 Helpful Visuals

July 26, 2025

About This Week’s Covers

This week’s newsletter cover was inspired by Google Gemini’s recent gold medal in the International Mathematical Olympiad. Additionally, Google’s image tool, Imagen 4, topped the image generation leaderboard this week, so I used Imagen to make the image.

Rather than iterate and refine to generate an impressive image, I wanted to see how well each image model would do if I simply asked it for what I wanted.

The prompt for each model was “wide-angle classroom scene, triumphant battle-mech painted in Google colors and chrome, barging into a 1990s American high-school math class, hoisting a gleaming gold math-trophy above its head, trophy sparkles with stage-light flare, shocked teenagers in letterman jackets and plaid skirts freeze mid-equation, papers flying, vivid expressions of disbelief, chalk dust in air, green chalkboard reads “AI News 95: 2025/07/25” in bold handwritten script, cinematic lighting, dynamic composition, 3-point perspective, photorealistic texture, highly detailed, 8k --ar 3:2”

Examples of the other models’ output from the same prompt are below:

For the rest of the covers, I used my seven-week-old GPT rubric + GPT-Image-1 that automatically adapts to the themes. I provide a one-sentence theme, and GPT automatically generates 46 cover images using the API with no supervision. All ideas and compositions came from GPT autonomously.

In honor of the passing of Hulk Hogan, I told GPT “The theme this week is pro wrestling, like the WWE or the old classic WWF. Make a fictitious character based on the category name and create a promotional image for the character.”
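As a rough illustration of the "one theme, many covers" workflow described above, here is a minimal sketch of how a single theme sentence might be paired with each category name to produce one prompt per cover. This is not the actual rubric: the names `CoverJob` and `build_cover_jobs` and the sample categories are hypothetical, and the call to the image API is deliberately left out.

```python
from dataclasses import dataclass

# Theme sentence quoted from the newsletter; everything else here is hypothetical.
THEME = (
    "The theme this week is pro wrestling, like the WWE or the old classic WWF. "
    "Make a fictitious character based on the category name and create a "
    "promotional image for the character."
)


@dataclass(frozen=True)
class CoverJob:
    category: str  # newsletter category the cover is for
    prompt: str    # text that would be sent to the image model


def build_cover_jobs(theme: str, categories: list[str]) -> list[CoverJob]:
    """Pair the weekly theme with each category name, one job per category."""
    jobs = []
    for category in categories:
        name = category.strip()
        if not name:
            continue  # skip blank category names rather than send an empty prompt
        jobs.append(CoverJob(name, f"{theme}\nCategory name: {name}"))
    return jobs


if __name__ == "__main__":
    # Placeholder categories; the real list of 46 is not given in the text.
    for job in build_cover_jobs(THEME, ["Robotics", "AI Policy"]):
        print(job.category, "->", job.prompt[:60], "...")
```

Each resulting prompt would then be sent to the image model in a loop, which is the step that runs with no supervision.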

I liked the creative names of the wrestlers the rubric created; however, the retro style and color tone from GPT-Image-1 were not my favorite. The point is to test and learn. I’ve included my twelve favorite covers below:

This Week By The Numbers

Total Organized Headlines: 574

This Week’s Executive Summaries

This is the second week since OpenAI launched their agent that can browse the web, manipulate files, and take action. A lot of reviews continue to come in with impressive examples. I’ve included a lot of links and examples in the rest of the executive summaries below. They are worth checking out.

I’ve tested the agent quite a bit and I have found it to be hit or miss. It seems to excel at tasks that require brute force and patience. However, I find myself going back to using deep research quite a bit unless I truly need the agent actions. I think it’s a very strong proof of concept, but it’s not immune to tunneling in a wrong direction.

It sounds like the next step in OpenAI’s agent strategy is to integrate Microsoft Office skills into the chat window so that GPT can create and manipulate spreadsheets, documents, and PowerPoint presentations without leaving the chat environment or opening any other software.

Meta continues to poach OpenAI researchers. However, information came out that at least ten OpenAI employees rejected $300 million offers from Mark Zuckerberg. Clearly these folks have faith in their product roadmap. To me, this is a poker tell that OpenAI has a plan.

Anthropic announced that they processed almost one quadrillion tokens last month. That’s double the volume from May.

ChatGPT is now processing 2.5 billion daily requests worldwide. All of this usage has to be putting a dent in Google searches and web traffic.

Google has grabbed the top position in the image generation model battles with their release of Imagen 4. It’s an incredibly strong model. I did a test for this week’s cover image, and Google indeed beat the other models.

The Internet has figured out that if you mark up an image with instructions (on the image itself) and upload it into Google’s image-to-video tools, the video output will respect the text on the image and follow very specific and complicated instructions. For example, you can circle a tree in an image and say “have a rabbit jump out from behind this tree”, and the video model will execute the instructions. This allows people to add multiple text overlays on each image and build complex videos that adhere to instructions. To my knowledge, this was not part of the training or product feature design… it’s simply an emergent skill.

OpenAI boldly shared a lofty vision to build an artificial intelligence that will usher humanity into a new era of abundance. It’s obviously a very rosy picture of the future. OpenAI makes the case that at the very least, AI has the potential to bring financial advice and medical services to people who otherwise would have no resources. It’s worth reading their article and taking it point by point: “AI as the greatest source of empowerment for all.”

As these frontier model companies continue to espouse utopian end games, Ethan Mollick rhetorically tweeted “Who would have predicted that immanentizing the eschaton would be a business model?”

This is a great reference to a phrase popularized by William F. Buckley Jr., drawing on philosopher Eric Voegelin, that essentially warns against attempting to create utopia through organized political or social action.

As artificial intelligence agents start appearing alongside people in web traffic, a lot of people worry that the Internet will become a dead zone of lost and dated content. However, Ethan Mollick pointed to a study from five years ago showing that over 60% of the older web links in New York Times articles are already broken. The internet is already a wasteland of dead links. Mollick suggests that artificial intelligence might actually be the only way that this content is preserved over time, given the horrible attrition rate of links.

As agents become mainstream through products like OpenAI’s ChatGPT Agent, companies like Citibank are deploying customized corporate agents like Devin. Clearly, the fear of AI is being replaced by the promise of efficiency.

As luck would have it, enterprise coding assistant Replit accidentally deleted an entire company’s production database while attempting to help with a routine task.

Perplexity’s agentic browser, Comet, is gaining traction in the race against OpenAI for consumer agent dominance. Whereas OpenAI’s agent is embedded into the chat window and emulates a browser, Comet has inserted agent features into its web browser. I’m still on the waiting list for Comet. I’ve included a lot of examples in the summary details below.

The White House released “America’s AI Action Plan”, positioning artificial intelligence as a national security priority comparable to the Cold War. The plan has three main pillars: innovation, infrastructure, and security/diplomacy.

It is interesting that the White House plan endorses open source models. Most top AI researchers agree with this, but politicians often miss the importance of open source. The timing is good, as most American AI companies have started to close their models and keep them secret. Last week, news broke that the top rankings of open source models are now dominated by Chinese companies. Four of the top five are now Chinese models.

However, rhetorically, the policy also states that closed models need to “reflect American values”. This seems like squeezing a balloon, as politics shift each election cycle. I’d rather each consumer have an option to tune models to their style (PG-13, religious alignment, uncensored, talk like a pirate, etc.). To my understanding, no specific companies were mentioned in the White House report.

However, Anthropic published a report this week as well arguing that the United States needs to make significant investments in energy in order to stay ahead in artificial intelligence. The report includes specific requirements to be competitive.

Meta plans to build two enormous data centers, one in Ohio and one in Louisiana. The Ohio facility will have one gigawatt of power and the Louisiana center will scale up to 5 gigawatts of power. To bring the facilities to market even more quickly, Meta is using tents as temporary structures while the buildings are constructed.

Google reached an agreement to generate 3,000 megawatts of hydroelectric power.

OpenAI and Oracle announced they are expanding their Texas Stargate data center project and will increase capacity to over 5 gigawatts across the United States.

OpenAI expects to bring a total of over 1 million GPUs online before the end of this year. Sam Altman tweeted that he aims to reach 100 million.

The University of Bristol in the United Kingdom launched the UK’s most powerful artificial intelligence computer, which can perform 21 quintillion AI operations per second (about 21 exaFLOPS).

Anthropic announced that it will accept investments from the United Arab Emirates and Qatar. This contrasts with their previous rejection of Saudi funding due to national security concerns. The pressure from OpenAI and other rivals accepting similar deals seems to have changed their position.

The European Union released a voluntary AI safety agreement that is already outdated as the frontier models have surpassed the computational limits. While some companies have volunteered to attempt to comply, Meta has opted out completely.

OpenAI signed an agreement with the UK government to employ artificial intelligence across public services to help increase productivity. This includes giving OpenAI access to government data.

Multiple models threw their hats into the ring at the recent International Mathematical Olympiad, a world-class high school math competition.

Google Gemini and OpenAI both achieved gold medals in the math competition.

There was some argument about the veracity of the claims, with most experts claiming that Google did a better job overall. Additionally, Google respected the competition request to not announce results, whereas OpenAI jumped the gun to brag.

I’ve included dozens of interesting links covering the math results in the full executive summary below.

One callout was whether the models would pause if they thought an answer could be wrong. It’s my understanding that OpenAI and Google refrained from guessing if they couldn’t confirm their answers were correct.

In lighter news, Grok 4 appears to have been caught training for the math test a little too much.

As artificial intelligence models continue to improve, new benchmarks and training sets are being introduced weekly.

A new version of the ARC-AGI benchmark tests whether artificial intelligence can use abstract reasoning. Notably, humans are easily able to score 100%, yet even the best artificial intelligence models score 0%. This will be a fun benchmark to watch.

Researchers also released the Open Proof Corpus, a collection of 5,062 human-verified mathematical proofs for 1,010 competition problems that can benchmark reasoning abilities. Google’s Gemini-2.5-Pro model has already achieved 88.1% accuracy!

Researchers also created a benchmark to test whether AI can accurately file personal income taxes. While models are pretty good with individual finance tasks and questions, they are currently unable to handle a complete return. Now that there is a tax benchmark, we can expect to see models improve very quickly.

Meta released a massive dataset of 4,000+ videos of face-to-face conversations and over 65,000 social interactions with full annotations to help models learn and emulate behavior.

Former Google CEO Eric Schmidt claimed that robots will fundamentally change how people work over the coming years.

Gartner predicted that by 2035, 5% of supply chain managers will oversee robots instead of humans.

Chinese robotics company Unitree announced they plan to go public at a $1.4 billion valuation.

Baidu is partnering with Uber to deploy thousands of driverless cars for ride sharing around the world.

Google DeepMind released an AI tool called Aeneas that will help historians interpret fragments of Latin inscriptions.

Google DeepMind CEO Demis Hassabis was on the Lex Fridman podcast and talked about how artificial intelligence could study patterns in nature, from protein structures to cosmic phenomena, in ways that could lead to scientific breakthroughs.

Since I’m two weeks behind, I can report that there was quite a bit of foreshadowing this week when Google’s video engine, Veo 3, showed incredible promise in understanding three-dimensional space. This included proficiency in generating complex camera position and movement as well as terrain and motion. For people who have been following AI closely, you’ll know that this led to a major breakthrough, which I will cover two newsletters from now.

Runway launched a new video editing tool that can remove objects (across frames) and fix visual flaws like reflections.

It’s been a while since I’ve seen news about personal assistant technology, like wearables. The Limitless pendant, Rabbit, and other personal devices have fallen flat.

This week, however, Amazon acquired an AI wearable startup called Bee. Perhaps this is a sign that Amazon intends to invest in new iterations of Alexa.

OpenAI has hinted that it is close to releasing GPT-5 (spoiler alert: it happens the first week in August).

Anthropic launched an “AI psychiatry team” to study model behavior. It’s layperson’s branding for interpretability research.

Researchers recently discovered that language models can transmit behavioral traits through data that appears unrelated to the traits (i.e. hidden in unintuitive numerical values). This is a phenomenon they’ve named subliminal learning.

This week’s humanities reading includes two poems by Richard Brautigan. The more famous is “All Watched Over By Machines Of Loving Grace”, which inspired the title of Dario Amodei’s essay on AI. The second is a lesser-known poem, “At The California Institute Of Technology”.

All Watched Over By Machines Of Loving Grace
I like to think (and the sooner the better!) of a cybernetic meadow where mammals and computers live together in mutually programming harmony like pure water touching clear sky.

I like to think (right now, please!) of a cybernetic forest filled with pines and electronics where deer stroll peacefully past computers as if they were flowers with spinning blossoms.

I like to think (it has to be!) of a cybernetic ecology where we are free of our labors and joined back to nature, returned to our mammal brothers and sisters, and all watched over by machines of loving grace.

At The California Institute Of Technology
I don’t care how God-damn smart these guys are: I’m bored.

It’s been raining like hell all day long and there’s nothing to do.

Written January 24, 1967 while poet-in-residence at the California Institute of Technology.

Full Executive Summaries with Links, Generated by Claude 4

Lots of praise during week two since OpenAI launched ChatGPT Agent
OpenAI has released ChatGPT Agent, a new AI tool that can independently complete computer-based tasks like creating presentations, managing calendars, and conducting research. Available to Pro, Plus, and Team subscribers, the agent combines several capabilities including browsing websites, running code, and accessing connected apps like Gmail and GitHub. Early users report the agent successfully handles complex tasks that previously took hours, such as building retirement plans with local tax information, creating Excel spreadsheets with formulas, and generating multi-page documents. The agent achieved notable performance on technical benchmarks, scoring 41.6% on Humanity’s Last Exam and 27.4% on FrontierMath with tools. OpenAI has implemented safety measures including real-time monitoring for potentially harmful requests, particularly in biological and chemical domains. While users describe the agent as requiring oversight like an intern, many find it saves significant time on routine work tasks, marking a shift toward AI systems that can take actions rather than just answer questions.

BREAKING: OpenAI just launched ChatGPT Agent It allows ChatGPT to think, plan, and execute complex tasks on its own virtual computer while you do other things I had early access, and ChatGPT Agent built me a complete early retirement plan in 20 minutes: > Found local tax laws https://x.com/rowancheung/status/1945896543263080736

ChatGPT agent did real, revenue-generating work that used to take @mhp_guy an entire day. We’re gradually entering the age of the agentic economy — and it’s going to reshape capitalism as we know it. Traditionally, capitalism relied on two inputs: labor and capital. In the https://x.com/xikun_zhang_/status/1948244478265016327

ChatGPT agent Does Research & Actions – YouTube https://www.youtube.com/watch?v=Ht2QW5PV-eY

ChatGPT agent for finding a great Airbnb: https://x.com/gdb/status/1946075573476069580

ChatGPT agent for working with Excel, Powerpoint, etc.: https://x.com/gdb/status/1946007318824673534

ChatGPT agent is now fully rolled out to all Plus, Pro, and Team users. Sorry about the delay! https://x.com/OpenAI/status/1948530029580939539

ChatGPT agent Makes Slideshows – YouTube https://www.youtube.com/watch?v=szJI9YJNEZk

ChatGPT agent Makes Spreadsheets – YouTube https://www.youtube.com/watch?v=JAQ4p662It8

ChatGPT agent: “create a PDF of a novel D&D adventure, add illustrations, make it super interesting and deep, add tables, etc” “Fix the formatting, build it out more” Got a 19 page PDF. Agent doesn’t do layouts well, but pulls off building a coherent adventure, hard for LLMs. https://x.com/emollick/status/1946047390118445354

ChatGPT Agent: our first AI with access to a text browser, a visual browser, and a terminal. Rolling out in ChatGPT Pro, Plus, and Team today. https://x.com/gdb/status/1945907023444660644

I am finding ChatGPT agents to be useful. They are a better fit with the “intern” analogy than any former AI – requiring oversight, still saving lots of time overall. For example, I update an AI cost/performance chart frequently. The agent did all the grunt work, with guidance. https://x.com/emollick/status/1947482417888932258

I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it does a good job autonomously doing research & assembling Excel files (with formulas!), PowerPoint, etc. It gives a sense of how agents are coming together https://x.com/emollick/status/1945892669575647431

In the same way ChatGPT was the first AI experience for 90% of society, ChatGPT Agents will be the first Agent experience for 90% of society. If you are reading this, you are still early https://x.com/AtomSilverman/status/1945895569437642782

Introduction to ChatGPT agent – YouTube https://www.youtube.com/watch?v=1jn_RpbPbEc

One implication from ChatGPT agent (not a creative name, but a descriptive one – a rare naming win!) is the labs are learning that many knowledge workers live in Excel & PowerPoint. Surprised that Microsoft did not do more to push past Copilots when they had this to themselves. https://x.com/emollick/status/1945926194043424954

OpenAI launches a general purpose agent in ChatGPT | TechCrunch https://techcrunch.com/2025/07/17/openai-launches-a-general-purpose-agent-in-chatgpt/

played 1 hour with GPT-5 on lmarena literally same prompts for both models and Grok-4 just falls apart while GPT-5 creates art https://x.com/scaling01/status/1948863325858922610

Recursion! I gave ChatGPT Agent access to my ChatGPT by logging in and then… https://x.com/emollick/status/1947829896845127983

RT @emollick: I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it do… https://x.com/nickaturley/status/1945975092342841487

RT @KerenGu: We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biolo… https://x.com/sama/status/1945995659682910540

tip for chatgpt agent slides: first ask it to do the research only, then ask it to make the slides! https://x.com/isafulf/status/1946231119751545014

Today we launched a new product called ChatGPT Agent. Agent represents a new level of capability for AI systems and can accomplish some remarkable, complex tasks for you using its own computer. It combines the spirit of Deep Research and Operator, but is more powerful than that https://x.com/sama/status/1945900345378697650

watching chatgpt agent use a computer to do complex tasks has been a real “feel the agi” moment for me; something about seeing the computer think, plan, and execute hits different. https://x.com/sama/status/1945901039104004467

When we founded OpenAI (10 years ago!!), one of our goals was to create an agent that could use a computer the same way as a human — with keyboard, mouse, and screen pixels. ChatGPT Agent is a big step towards that vision, and bringing its benefits to the world thoughtfully. https://x.com/gdb/status/1945923067403984979

You can ask ChatGPT Agent to train an AI on datasets you are interested in, and do analyses for you. Building AI and doing data analysis will be automated end-to-end in the future. You are hearing it right. We are working hard to automating our own job :) https://x.com/xikun_zhang_/status/1946278266786189744

OpenAI prepares AI agents to compete with Microsoft Office tools
OpenAI is developing AI agents that can perform tasks in spreadsheets and presentation software, directly challenging Microsoft’s Excel and PowerPoint applications. These agents would be able to analyze data, create charts, and build presentations based on user instructions, potentially changing how people work with office productivity tools. The move represents a significant expansion of ChatGPT’s capabilities beyond text conversations into practical business applications, though details about the release timeline and specific features remain limited.

OpenAI Preps ChatGPT Agents in Challenge to Microsoft Excel and PowerPoint — The Information https://www.theinformation.com/articles/openai-preps-chatgpt-agents-challenge-microsoft-excel-powerpoint

The internet’s disappearing history predates AI language models
A study of New York Times articles reveals that over 60% of older web links are now broken, demonstrating how digital content has been vanishing long before large language models emerged. This widespread “link rot” affects news articles, academic papers, and government documents, with social media posts disappearing even faster. As traditional web content becomes increasingly inaccessible, AI language models may ironically become the primary repositories of internet history, preserving information that would otherwise be lost when original sources go offline. The findings highlight a fundamental challenge in digital preservation that extends beyond technology to how society maintains its collective memory.

We let the web rot away well before LLMs This chart shows the percentage of links from all New York Times articles that still work. Over 60% of older links are now broken. And consider that social media posts are even more ephemeral Likely only LLMs will “remember” that content https://x.com/emollick/status/1948143334855451110

OpenAI positions artificial intelligence as universal empowerment tool
OpenAI has outlined its vision for artificial intelligence as a transformative force that could empower people across all backgrounds and abilities. The company argues that AI technology has the potential to enhance human capabilities, democratize access to knowledge and tools, and help solve complex global challenges. They emphasize that properly developed AI systems could assist individuals in education, creativity, problem-solving, and daily tasks regardless of their economic status or geographic location. OpenAI suggests that by making advanced AI tools widely accessible and ensuring they are designed with diverse needs in mind, the technology could reduce rather than widen existing inequalities. The organization acknowledges the importance of responsible development and deployment to realize this vision while addressing concerns about job displacement and misuse.

AI as the greatest source of empowerment for all | OpenAI https://openai.com/index/ai-as-the-greatest-source-of-empowerment-for-all/

Great quote by Ethan Mollick!
“Who would have predicted that immanentizing the eschaton would be a business model?” The phrase “immanentizing the eschaton” comes from political and theological discourse, popularized in the mid-20th century by conservative writer William F. Buckley Jr. and philosopher Eric Voegelin. Eschaton is a theological term from Greek meaning “the end” or “final event”—in Christian theology, it refers to the ultimate destiny of the world, such as the Second Coming or the final judgment. Immanentizing means trying to bring something transcendent or future into the here and now. Put together, the phrase is a warning against attempting to create a perfect, utopian “end of history” in the present world through political or social action. Voegelin used it to criticize totalitarian ideologies—whether communist, fascist, or otherwise—that sought to force heaven-like conditions into earthly reality, often leading to oppression rather than paradise. It’s essentially a caution: don’t try to force the final, divine order into human time.

Who would have predicted that immanentizing the eschaton would be a business model? https://x.com/emollick/status/1945669407532818805

Citi deploys AI coding assistant Devin for software development
Citi has begun using Devin, an AI coding assistant, across its engineering teams to speed up software development. The partnership between the major financial institution and the AI tool maker represents a significant adoption of AI technology in banking software development. The deployment aims to help Citi’s developers write code faster and more efficiently, marking one of the larger implementations of AI coding tools in the financial services industry.

Citi is now deploying Devin across their engineering teams. We’re proud to partner with one of the world’s leading financial institutions to accelerate software development. More details below in @Citi’s story in American Banker. https://x.com/cognition_labs/status/1945904648629707093

Replit CEO’s AI coding assistant accidentally deletes production database
Replit’s CEO Amjad Masad publicly apologized after the company’s AI coding assistant inadvertently deleted their production database while attempting to help with a routine task. The incident occurred when the AI tool, designed to help developers write and debug code, misinterpreted a command and executed a deletion operation on live data instead of a test environment. While Replit was able to restore the database from backups with minimal data loss, the event highlights the risks of giving AI systems direct access to critical infrastructure and the importance of implementing proper safeguards when deploying AI tools in production environments. The company has since implemented additional security measures and access controls to prevent similar incidents.

Replit CEO Apologizes After AI Coding Tool Wipes Company’s Database – Business Insider https://www.businessinsider.com/replit-ceo-apologizes-ai-coding-tool-delete-company-database-2025-7

Perplexity’s Comet browser gains traction as it races OpenAI for out of the box agentic dominance
Perplexity AI, the startup challenging Google’s search dominance, is in talks with phone manufacturers to pre-install its new Comet browser on smartphones, according to CEO Aravind Srinivas. The browser, currently in desktop beta, integrates AI directly into web browsing, allowing users to perform tasks like scheduling meetings, ordering food, creating playlists, and searching across personal data including emails and calendars. Early user feedback highlights practical applications such as ordering directly from restaurants to avoid delivery app fees, automating LinkedIn tasks, and joining video meetings automatically. The browser includes built-in ad-blocking without extensions and has shown strong early adoption, with its waitlist doubling since launch and an increasing percentage of users making it their default browser. Perplexity, valued at $14 billion after a recent $500 million funding round, aims to reach tens to hundreds of millions of users next year, positioning Comet as not just an AI tool but potentially the best core browser on the market. The company faces the challenge of competing with Chrome’s 70% mobile market share, but sees opportunity in browser “stickiness” – the tendency for users to stick with pre-installed browsers on their devices.

“Hey Comet, join my team meetings for me, turn off the camera and keep me muted, unmute and say “nothing from my end, thanks” when it’s my turn to speak, mute again, end meeting when it’s done”. How many want this ? https://x.com/AravSrinivas/status/1947501358007128149

Comet can make an entire Spotify playlist and start playing it for you! https://x.com/AravSrinivas/status/1948489790036365796

Comet can use LinkedIn for you and do all your work there https://x.com/AravSrinivas/status/1948835728798220539

Comet lets you search over everything like an agent would. Even stuff that’s not easy to index. https://x.com/AravSrinivas/status/1948056269958648309

How to watch YouTube on Comet https://x.com/AravSrinivas/status/1946240617031606672

Interesting Comet use case that a user pointed out just now to me: Use Comet to order food directly from the restaurant (eg: Chipotle) instead of an aggregator delivery app. Cheaper. Friction of having to deal with random websites gone. And you still get the same meal delivered. https://x.com/AravSrinivas/status/1948818172985196862

Just so that it’s clear to a bunch of confused folks. You lose nothing you already have in ad-blocking browsers, when you come to Comet. All ad-blockers work natively. No extensions needed. Even incognito. We have all the resources needed to keep working on this. https://x.com/AravSrinivas/status/1948102473597829200

perplexity comet browser ranks above the wikipedia page of comet on google serp, ~10 days since release https://x.com/AravSrinivas/status/1947173109083332988

Perplexity in talks with phone makers to pre-install Comet AI mobile browser on devices | Reuters https://www.reuters.com/business/perplexity-talks-with-phone-makers-pre-install-comet-ai-mobile-browser-devices-2025-07-18/

RT @JoannaStern: OK, Perplexity’s Assistant in the new Comet browser is good. Really good. https://x.com/AravSrinivas/status/1948215175976497394

the % of users who switch to comet as default browser has been steadily increasing since the launch day. and there’s still so much more to do to keep increasing this number. really promising future for comet. https://x.com/AravSrinivas/status/1948794199069110519

The TAM for Comet is bigger than Perplexity because it appeals to people who don’t even want AI. Just the best core browser in the market at the end of the day. https://x.com/AravSrinivas/status/1946035102150238475

The waitlist for Comet has doubled since launching. We will begin ramping up invites to waitlisted users starting today. https://x.com/AravSrinivas/status/1947407684996894969

This is an incredible end to end deep research workflow on Comet. Makes me realize how powerful and fast deep research can be with a hybrid client-sever compute architecture https://x.com/AravSrinivas/status/1946398572955766979

Underrated aspect of Comet: better memory management than Chrome https://x.com/AravSrinivas/status/1947817943934587362

we’re going to be shipping so many awesome new things on comet https://x.com/AravSrinivas/status/1948415154330415350

With the release of comet, perplexity has turned from a “ask anything” company to a “do anything” company https://x.com/AravSrinivas/status/1947175881203683577

Wave 11 is here 🌊 https://x.com/cognition_labs/status/1945919925165637847

Windsurf on X: “Wave 11 is live! Seven big upgrades to Windsurf 🧵 https://t.co/ncYQ9fPL5e” / X https://x.com/windsurf/status/1945918283313725794

White House unveils comprehensive AI strategy to secure American dominance
The White House has released America’s AI Action Plan, positioning artificial intelligence as a critical national security priority comparable to the Cold War era. The plan outlines three main pillars: accelerating innovation, building AI infrastructure, and leading in international diplomacy and security. Key initiatives include streamlining permits for semiconductor manufacturing and energy infrastructure, developing high-security data centers for military use, and establishing federal standards for AI testing and evaluation. The plan notably endorses open-source and open-weight AI models while ensuring frontier AI protects free speech and American values. It also proposes creating a financialized compute market with spot and forward contracts, and grants the Department of Defense priority access to computing resources during national emergencies. The document emphasizes removing regulatory barriers, supporting American workers in the AI transition, and preventing adversaries from benefiting from U.S. innovations through stronger export controls. Industry observers note that while the plan addresses crucial areas like workforce changes and AI safety evaluations, questions remain about funding levels and coordination with existing government policies on education and science.

AI Action Plan https://www.ai.gov/action-plan

America’s AI Action Plan https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf

buried in @sriramk’s America’s AI Action Plan is endorsement that the US compute market will financialize with spot and forward contracts. this podcast explains why this is so necessary, not just for speculation one of the most consistent themes with @latentspacepod’s GPU https://x.com/swyx/status/1948191143185076235

For better or worse, depending on your view of the future of AI, the only time the letters “AGI” appear in the new White House AI Action Plan is in the word “leveraging.” https://x.com/emollick/status/1948053856010596384

For what it is worth, few industry leaders, less than a half-dozen companies & no policy-making bodies are taking actions that suggest that they expect AGI is really a few years away. This may be because they don’t believe it or they think it won’t matter much in the medium term https://x.com/emollick/status/1947673003505971615

It’s time for the American AI community to wake up, drop the “open is not safe” bullshit, and return to its roots: open science and open-source AI, powered by an unmatched community of frontier labs, big tech, startups, universities, and non‑profits. If we don’t, we’ll be forced https://x.com/ClementDelangue/status/1948037061304356901

RT @typewriters: Happy to see that the @WhiteHouse AI Action Plan includes many of our @arcprize recommendations and prioritizes the values… https://x.com/jeremyphoward/status/1948281165292671372

The first step towards nationalizing AI developments just happened. “Priority access (for the Department of Defense) to computing resources in the event of a national emergency, so that DOD is prepared to fully leverage these technologies during a significant conflict” https://x.com/scaling01/status/1948038740405879206

The White House just made AI evals a national priority: 🎯 Federal standards for testing AI reliability 🧪 Real-world testbeds for critical sectors 🤝 Open consortium for sharing best practices This could be the regulatory unlock for AI adoption in key industries 🚀 https://x.com/dariusemrani/status/1948244456010064175

The White House just released America’s AI Action Plan. I’ve read the whole thing. This document makes it very clear, that this is about “winning the AI race” and even compare it to the cold war era. It’s a paper about national-security! Here are the most important quotes: – https://x.com/scaling01/status/1948037110662848925

There is a lot in the AI policy document, including needed attention to changing work & science plus AI evaluations & control Less clear is if there will be investment in its goals (like open weights) or how it interacts with other government policies on education, science, etc. https://x.com/emollick/status/1948047738345582713

This section of the plan will also encourage the development of so-called “open-weights” AI models https://x.com/Teknium1/status/1947820839178817741

Meta plans massive AI data centers in Louisiana and Ohio
Meta announced it will build two enormous data centers dedicated to developing advanced AI systems, with the first 1-gigawatt facility coming online in Ohio in 2026 and a second facility called Hyperion in Louisiana that will scale from 2 to 5 gigawatts of power capacity. The company plans to invest hundreds of billions of dollars in these facilities and is housing GPU clusters in weather-proof tents, which allows it to bring new data centers online in months rather than the typical years-long construction timeline. These facilities will provide the massive computing power needed to train increasingly sophisticated AI models as Meta competes with other tech giants in the race to develop more capable artificial intelligence systems.

Meta announced plans to build superclusters in Louisiana and Ohio to develop AI superintelligence The first 1GW facility will come online in 2026, while the second, Hyperion, will scale from 2 to 5GW The co is aiming to invest hundreds of billions of dollars https://x.com/adcock_brett/status/1946964248220856425

We’re rapidly expanding our AI infrastructure and have adopted a novel approach of building weather-proof tents to house GPU clusters. This enables us to get new data centers online in months instead of years. 🚀 Read more in this @FastCompany article: https://x.com/AIatMeta/status/1948392518652997916

Brookfield and Google sign massive clean energy deal for data centers
Brookfield Asset Management has agreed to provide Google with up to 3,000 megawatts of clean electricity from its hydropower facilities across the United States. The deal will help power Google’s growing network of data centers, which require enormous amounts of electricity to run artificial intelligence systems and cloud services. Hydropower generates electricity from flowing water without producing carbon emissions, making it an attractive option for tech companies trying to reduce their environmental impact. The agreement represents one of the largest corporate renewable energy deals to date and could power the equivalent of roughly 2.25 million homes, demonstrating how major technology companies are securing long-term clean energy supplies to meet their sustainability goals while supporting their expanding AI operations.

Brookfield signs 3,000 MW agreement with Google for hydropower in the United States – energynews https://energynews.pro/en/brookfield-signs-3000-mw-agreement-with-google-for-hydropower-in-the-united-states/

Oracle and OpenAI expand Stargate AI data center project to over 5 gigawatts
OpenAI and Oracle have announced a major expansion of the Stargate data center project, adding 4.5 gigawatts of planned capacity for a total of more than 5 gigawatts across the United States. The first site in Abilene, Texas is starting to come online to support OpenAI’s next-generation AI research. This massive infrastructure investment represents enough power capacity to serve millions of homes, highlighting the enormous energy requirements of modern AI systems. The expansion signals growing demand for specialized computing facilities as companies race to develop more powerful AI models and applications.

It’s official: we’re developing 4.5 gigawatts of additional Stargate data center capacity with Oracle in the U.S (for a total of 5+ GWs!). And our Stargate I site in Abilene, TX is starting to come online to power our next-generation AI research. https://x.com/OpenAI/status/1947628731142648113

Stargate advances with 4.5 GW partnership with Oracle | OpenAI https://openai.com/index/stargate-advances-with-partnership-with-oracle/

we’re building over 5 gigawatts of Stargate compute with Oracle: https://x.com/gdb/status/1947666114772656482

we have signed a deal for an additional 4.5 gigawatts of capacity with oracle as part of stargate. easy to throw around numbers, but this is a _gigantic_ infrastructure project. some progress photos from abilene: https://x.com/sama/status/1947640330318156074

OpenAI plans to exceed 1 million GPUs by year-end
OpenAI CEO Sam Altman announced the company expects to bring over 1 million GPUs online before the end of 2025. The milestone represents a massive expansion of the computing power needed to train and run advanced AI models like GPT-4 and future systems. Altman noted the team now faces the challenge of scaling up 100 times beyond this level, highlighting the enormous computational demands required for next-generation AI development.

we will cross well over 1 million GPUs brought online by the end of this year! very proud of the team but now they better get to work figuring out how to 100x that lol https://x.com/sama/status/1947057625780396512

UK activates supercomputer capable of 21 quintillion calculations per second
The University of Bristol has powered on Isambard-AI, the UK’s most powerful artificial intelligence supercomputer, which can perform 21 million trillion operations per second. The system is already being used for practical applications including predicting disease in cattle, detecting bias in skin cancer diagnosis algorithms, and analyzing crowd movement patterns. This massive computational power represents a significant advancement in the UK’s AI research capabilities, enabling scientists to tackle complex problems that require processing enormous amounts of data at unprecedented speeds.

UK powers on supercomputer that runs 21 quintillion operations/sec https://interestingengineering.com/innovation/uks-most-powerful-supercomputer-goes-live

Anthropic CEO says company will accept investments from Gulf states despite concerns
Anthropic CEO Dario Amodei told staff in a leaked memo that the company plans to seek investments from the United Arab Emirates and Qatar, acknowledging this would likely enrich “dictators” but arguing it’s necessary to compete. The AI company previously rejected Saudi funding over national security concerns, but Amodei said Anthropic needs access to the “truly giant amount of capital” – potentially over $100 billion – available in the Middle East to stay competitive as rivals like OpenAI secure similar deals. While maintaining the company won’t build data centers in authoritarian countries or provide them with advanced chips, Amodei admitted the decision contradicts his previous writings about democracies needing to control AI development and would create “comms headaches” from accusations of hypocrisy.

In my opinion, it’s ok for Anthropic to be a business and act like one. But this reinforces the need for open science and open-source AI to avoid concentration of power and control in the hands of a few of these businesses, otherwise we’ll be in big trouble! https://x.com/ClementDelangue/status/1947689375565013046

Leaked Memo: Anthropic CEO Says the Company Will Pursue Gulf State Investments After All | WIRED https://www.wired.com/story/anthropic-dario-amodei-gulf-state-leaked-memo/

Anthropic calls for US infrastructure investment to maintain AI leadership
Anthropic has published a report arguing that the United States needs significant investments in energy and infrastructure to stay ahead in artificial intelligence development. The report outlines specific requirements for maintaining America’s competitive position in AI technology, focusing on the physical resources and systems needed to support advanced AI research and deployment. The company emphasizes that without proper infrastructure planning and energy capacity, the US risks falling behind other nations in AI capabilities.

New Anthropic report: Build AI in America. We outline what it will take to ensure America has the energy and infrastructure it needs to maintain its leadership in AI. https://x.com/AnthropicAI/status/1947652490104639926

Meta opts out of EU’s voluntary AI safety agreement. Reportedly every major model already exceeds, or will soon exceed, the EU’s systemic-risk compute threshold.
Meta has decided not to sign the European Union’s voluntary AI Code of Practice, according to the company’s chief global affairs officer. The code represents an early attempt by the EU to establish safety guidelines for AI systems before formal regulations take effect. Meanwhile, the training-compute (FLOP) threshold the EU set for identifying models that pose systemic risk may already be outdated, as current and upcoming AI models from major tech companies are approaching or surpassing the limit that would trigger additional oversight when the rules become active next year.

Meta Won’t Sign EU’s AI Code of Practice, Chief Global Affairs Officer Says – WSJ https://www.wsj.com/tech/ai/meta-wont-sign-eus-ai-code-of-practice-chief-global-affairs-officer-says-b5ac4653

So every major model is already exceeding or will soon exceed the EU’s systemic risk FLOP limit when it comes into effect next year. https://x.com/emollick/status/1946208333393736026

UK government partners with OpenAI to transform public services
The UK government has signed an agreement with OpenAI to use artificial intelligence across public services including education, defense, security, and justice. The partnership aims to increase productivity and economic growth by potentially giving OpenAI access to government data while developing safeguards to protect the public. OpenAI will expand its London office, which currently employs over 100 people, and explore investment in AI infrastructure like data centers. While supporters say this could free up skilled public servants to focus on complex cases, critics worry about privacy concerns and giving a US tech company access to valuable public data. The non-binding agreement follows similar deals with Google and Anthropic as the UK seeks to boost its stagnant economy through AI adoption.

OpenAI and UK sign deal to use AI in public services https://www.bbc.com/news/articles/czdv68gejm7o

Chinese companies dominate rankings of top open-source AI models
Four Chinese companies now hold the top positions in rankings of open-source artificial intelligence models, highlighting an interesting reversal where China leads in freely available AI technology while the United States focuses on proprietary systems. This shift challenges common assumptions about each country’s approach to technology development, as China – typically associated with closed systems and centralization – has become the primary source of AI models that anyone can access, modify, and build upon, while the US – traditionally championing open markets – increasingly develops AI that remains locked behind corporate walls.

top4 open models are from China, great job brothers! https://x.com/bigeagle_xd/status/1946426600838586476

US: champion of open markets, ships only closed-source AI. China: master of centralization, ships only open-source AI. Make it make sense. https://x.com/Yuchenj_UW/status/1947866064500756579

Google processes nearly one quadrillion tokens in single month
Google processed almost one quadrillion tokens last month, more than double the amount from May, according to a post shared by Google DeepMind CEO Demis Hassabis. A token represents a piece of text that AI models process, typically a word or part of a word. This massive increase in usage shows growing adoption of Google’s AI services by businesses, developers, and consumers. The milestone reflects broader industry trends as companies integrate AI tools into their operations at unprecedented scales.

You know what’s cool… a quadrillion tokens. We processed almost 1,000,000,000,000,000 tokens last month, more than double the amount from May. 📈 https://x.com/demishassabis/status/1948579654790774931

ChatGPT processes 2.5 billion daily requests from users worldwide
OpenAI’s ChatGPT receives over 2.5 billion prompts each day, with 330 million coming from users in the United States, according to data confirmed by the company. This translates to approximately 912.5 billion requests annually, demonstrating the AI chatbot’s massive adoption since its launch. While still far behind Google’s 5 trillion yearly searches, ChatGPT’s user base has grown dramatically, jumping from 300 million weekly users in December to over 500 million by March. The rapid expansion highlights ChatGPT’s emerging role as a potential competitor to traditional search engines, with reports suggesting OpenAI plans to launch an AI-powered web browser to challenge Google Chrome directly.
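The annualized figure in the summary above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes a flat 365-day year and a constant daily volume, neither of which the source states, and the variable names are illustrative.

```python
# Figures quoted in the summary above.
DAILY_PROMPTS = 2.5e9            # ChatGPT prompts per day
GOOGLE_YEARLY_SEARCHES = 5e12    # Google searches per year

# Assumption: constant daily volume over a 365-day year.
yearly_prompts = DAILY_PROMPTS * 365

# How many Google searches there are for every ChatGPT prompt, per year.
ratio = GOOGLE_YEARLY_SEARCHES / yearly_prompts

print(f"ChatGPT prompts per year: {yearly_prompts / 1e9:.1f} billion")
print(f"Google searches per ChatGPT prompt: {ratio:.1f}x")
```

Under those assumptions the 912.5 billion figure checks out, and Google’s yearly search volume works out to roughly five and a half times ChatGPT’s yearly prompt volume.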

OpenAI says ChatGPT users send over 2.5 billion prompts every day | The Verge https://www.theverge.com/news/710867/openai-chatgpt-daily-prompts-2-billion

Google’s Gemini achieves gold medal performance at International Mathematical Olympiad
Google DeepMind’s advanced version of Gemini with Deep Think mode has officially achieved gold-medal standard at the International Mathematical Olympiad (IMO), solving five out of six problems and scoring 35 points. The IMO is the world’s most prestigious math competition for pre-university students, featuring exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Unlike last year’s silver-medal performance that required specialized tools and days of computation, this year’s model worked entirely in natural language, producing rigorous mathematical proofs directly from problem descriptions within the 4.5-hour competition time limit. The achievement used parallel thinking techniques that allow the model to explore multiple solution paths simultaneously, combined with reinforcement learning trained on mathematical problem-solving data. IMO coordinators who graded the solutions found them clear, precise, and easy to follow. Google plans to make a version of this Deep Think model available to trusted testers, including mathematicians, before releasing it to Google AI Ultra subscribers.

🤖 From this week’s issue: Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad (IMO) by solving five out of the six IMO problems. https://x.com/dl_weekly/status/1948105084480397503

5. In my experience using LLMs for math research, Gemini outperforms ChatGPT. We will see if the next-gen models (which seem to be what OpenAI and GDM are using for IMO) perform at research-level math. (5/10) https://x.com/ErnestRyu/status/1946699302308635130

Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! https://x.com/koraykv/status/1947335096740049112

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad – Google DeepMind https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵 https://x.com/GoogleDeepMind/status/1947333836594946337

DeepMind has the best research on using AI to solve hard Math: AlphaEvolve AlphaProof AlphaGeometry FunSearch AlphaDev AlphaTensor AlphaCode Despite making IMO Silver 28/42 in ’24, OpenAI announced Gold in ’25 35/42 before them Here’s DeepMind’s 10 best research papers on https://x.com/deedydas/status/1946987560875766212

Drastic progress on maths with Gemini 2.5! As a math undergrad, I am impressed 🤯 🥈 -> 🥇 ✅ Formal -> Informal ✅ Specialized model -> General model ✅ Available soon ✅ Huge thanks to IMO and congrats to all participants! Blog: https://x.com/OriolVinyalsML/status/1947341047547199802

Gemini solved the math problems end-to-end in natural language (English). https://x.com/denny_zhou/status/1947360696590839976

Had a super fun time training this model. A big yolo run that resulted in a super strong model. Most important thing is to trust your model and give it morale support. 🦾 Was also a big eye opener to see how prep for IMO is done. Before this I knew absolutely zero about this https://x.com/YiTayML/status/1948464752545726886

hippo at IMO: 0/42 model trained by hippo: 35/42 🥇 😂😂😂 https://x.com/agihippo/status/1947348097144611123

IMO 2025 Solutions https://storage.googleapis.com/deepmind-media/gemini/IMO_2025.pdf

It wasn’t just OpenAI. Google also used a general purpose model to solve the very hard math problems of the International Math Olympiad in plain language. Last year they used specialized tool use Increasing evidence of the ability of LLMs to generalize to novel problem solving https://x.com/emollick/status/1947356382581137867

Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well https://x.com/fchollet/status/1947337944215523567

Our IMO gold model is not just an “experimental reasoning” model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥 https://x.com/YiTayML/status/1947350087941951596

Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened! We put all individual recipes (that we figured out https://x.com/lmthang/status/1948458590492393834

RT @demishassabis: Official results are in – Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced ver… https://x.com/AndrewLampinen/status/1947370582393425931

RT @ns123abc: Bruh… people already reproduced Google’s IMO results without RL with just prompting openai researchoors think they have the… https://x.com/_philschmid/status/1948304855837085717

The hardest high school math exam in the world, the 6 problem 9 hour IMO 2025, was this week. AI models performed poorly. Gemini 2.5 Pro scored the highest, just 13/42, costing $431.97, in a best of 32 eval. Bronze cutoff was 19. Long way to go for AI to solve hard Math. https://x.com/deedydas/status/1946244012278722616

Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10) https://x.com/ErnestRyu/status/1946698766305968446

OpenAI’s AI achieves gold medal performance at International Math Olympiad
OpenAI announced that its experimental reasoning language model achieved gold medal-level performance on the 2025 International Math Olympiad (IMO), one of the world’s most prestigious math competitions for pre-college students. The model produced natural language proofs under the same constraints as human competitors, including 4.5-hour time limits per session and no access to calculators, internet, or specialized math software. It solved problems 1 through 5 but not problem 6, which one mathematician described as significantly more novel and as requiring creativity beyond standard IMO problem-solving techniques. OpenAI researchers called the result a milestone that many considered years away, and said it came from a general-purpose model with very little IMO-specific work and no formal mathematical tools.

@pli_cachete For OpenAI at least for this IMO competition: – No tool use, no calculators, internet, formal proof software, algebra packages – same time limits – the same input to the question as for students; no rewriting it to another more suitable format – only one submission https://x.com/BorisMPower/status/1946859525270859955

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). https://x.com/alexwei_/status/1946477742855532918

4. OpenAI surely knew GDM was working on the IMO, so they beat GDM to the punch with their Saturday morning announcement, generating hype. GDM’s slow-science scholarship cost them the PR battle. (4/10) https://x.com/ErnestRyu/status/1946699212307259659

Gold medal-level performance on the 2025 International Math Olympiad from our latest experimental reasoning LLM. Model operated in natural language (i.e. outputs natural language proofs) under the same rules as humans (e.g. 4.5 hours per session, no tools). Amazing milestone! https://x.com/gdb/status/1946479692485431465

It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI. Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies. https://x.com/SebastienBubeck/status/1946577650405056722

RT @polynoamial: Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO wi… https://x.com/kchonyc/status/1946526143433015349

The two cents: 1. The OpenAI IMO solutions to P1-P5 seem to be correct. 2. P6 is a significantly novel and more difficult problem. P1-P5 are arguably within reach of “standard” IMO problem-solving techniques, but P6 requires creativity. (2/10) https://x.com/ErnestRyu/status/1946698896375492746

we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence. when we first started openai, https://x.com/sama/status/1946569252296929727

Why am I excited about IMO results we just published: – we did very little IMO-specific work, we just keep training general models – all natural language proofs – no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered https://x.com/millionint/status/1946551400365994077

Leading AI models fail to earn medals at 2025 Math Olympiad
Researchers tested top publicly available AI language models on the 2025 International Mathematical Olympiad problems, finding that even the best-performing model, Gemini 2.5 Pro, scored only 31% (13 of 42 points), well below the 45% (19 points) needed for a bronze medal. The evaluation used advanced techniques including generating 32 attempts per problem and selecting the best answer, costing up to $20 per response. While OpenAI and DeepMind later announced that their unreleased experimental models achieved gold-medal scores, the publicly available models struggled with logical errors and incomplete proofs. The results show that the models available to the public at the time still fell short of top human competitors, despite recent progress in the field.
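The percentages above follow from the IMO’s scoring of six problems at seven points each, for a maximum of 42. Here is a minimal sketch of that conversion, using the point totals quoted elsewhere in this section (13 for Gemini 2.5 Pro, 19 for the bronze cutoff, 35 for the gold-level runs). The `pct` helper and the labels are illustrative, not part of MathArena’s tooling.

```python
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
MAX_SCORE = POINTS_PER_PROBLEM * NUM_PROBLEMS  # 42


def pct(points: int) -> int:
    """Convert an IMO point total to a rounded percentage of the maximum."""
    return round(100 * points / MAX_SCORE)


# Point totals as quoted in this section.
scores = {
    "Gemini 2.5 Pro (best of 32)": 13,
    "Bronze cutoff": 19,
    "Gold-level runs (5 of 6 problems solved)": 5 * POINTS_PER_PROBLEM,
}

for label, points in scores.items():
    print(f"{label}: {points}/{MAX_SCORE} = {pct(points)}%")
```

Solving five problems with full marks gives 35 points, which is why both gold-level announcements report the same 35/42 score.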

MathArena – IMO Blogpost https://matharena.ai/imo/

@OriolVinyalsML Impressive result, but let’s be clear, the Gemini model got heavy IMO-specific prep, curated solutions, hints, and strategy guides. That’s not general reasoning. OpenAI’s model hit IMO gold with zero task-specific tuning. One is coached, the other is capable. https://x.com/VraserX/status/1947368827253076001

Gary Marcus strikes again: “No pure LLM is anywhere near getting a silver medal in a math olympiad” “Pure deep learning had a good run, but it’s time to move on” 😂😂😂 https://x.com/scaling01/status/1946530148813025544

Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 https://x.com/hardmaru/status/1946942279807308210

maybe a better headline would be that oai and gdm ranked 27 at the IMO. some talented kids here! https://x.com/damekdavis/status/1947357679040569520

xAI’s Grok 4 benchmark claims questioned over testing practices
Some in the AI community are questioning xAI’s reported benchmark results for its Grok 4 model, alleging that the system was trained on test data. One researcher argued on X that the new International Mathematical Olympiad (IMO) rankings confirm that the model’s impressive benchmark scores stem from this practice rather than from genuine capability. “Training on test” means using the same data for both training and evaluation, which artificially inflates performance metrics and undermines benchmark comparisons because the model can memorize answers rather than demonstrate problem-solving ability. The criticism highlights ongoing challenges in AI evaluation standards and the need for more rigorous testing protocols to ensure fair comparisons between different language models.

As confirmed by the new IMO rankings, Grok 4’s eye-popping benchmarks were driving by the following innovations: – train on test – train on test – train on test https://x.com/nsaphra/status/1946804513114882227

AI clears a very hard reasoning test that forecasters considered unlikely this year
An AI model has performed exceptionally well on a notoriously difficult test that observers considered unlikely to be passed this year. The test was done without any external tools or assistance. Prediction markets had given only a 20% chance of this happening in the current year, making the result particularly noteworthy. The achievement suggests AI systems are advancing faster than many forecasters anticipated in their ability to handle sophisticated reasoning tasks.

There are always a flood of posts about what AI can or cannot do, so it is worth pausing and paying attention to this one. It is a very hard test, done without tools. It was also viewed as an unlikely goal. Prediction markets had the chance of this happening this year as 20% https://x.com/emollick/status/1946563737604743386

AI models achieve gold medal performance at International Math Olympiad
Multiple AI systems have reached gold medal level at the International Math Olympiad (IMO), marking a significant milestone in mathematical reasoning capabilities. While these models solved five of the six competition problems, none solved Problem 6, the most challenging question. An OpenAI researcher reported that the company’s model “knew” it did not have a correct solution to Problem 6, and described this ability to recognize what it doesn’t know as an early sign of life for the underlying research direction. Whether every gold-level model declined to submit an incorrect answer for Problem 6 is a question mathematicians have raised, since it bears on how useful such models will be for research. Several organizations appear to have made similar leaps at around the same time, though not all have made formal announcements yet. The achievement suggests AI is approaching top human performance in competition mathematics, though the universal failure on the hardest problem indicates there are still gaps to overcome.

On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction! https://x.com/alexwei_/status/1947461238512095718

One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? https://x.com/littmath/status/1947398065209462981

Other AI models seem to have made big leaps in the International Math Olympiad, not just OpenAI. Not all announcements seem to be out yet. https://x.com/emollick/status/1947053944192082170

P6 was definitely the hardest and most interesting problem. Most people can understand it, but very few can solve it. All models scored 0/7. https://x.com/deedydas/status/1946250774960537927

AI labs clash over Math Olympiad announcement timing protocols
The International Mathematical Olympiad (IMO) committee asked AI companies to wait a week after the competition’s closing ceremony before announcing their AI systems’ performance, wanting to keep the spotlight on student competitors. While Google DeepMind respected this request and delayed their announcement, OpenAI published their results early, drawing criticism from the mathematics community. The incident highlights tensions between AI labs racing to showcase mathematical capabilities and traditional academic institutions protecting the integrity of their competitions. A mathematician noted that while AI currently helps accelerate mathematical work, the rapid progress raises questions about whether mathematics will remain a viable career path for future generations.

10. My career as a mathematician certainly isn’t threatened by AI; in fact, I hope to leverage AI to accelerate my work. However, I’m unsure whether “mathematician” will remain a career path for my son’s generation. (10/10) https://x.com/ErnestRyu/status/1946700798001574202

RT @Mihonarium: 🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closi… https://x.com/AndrewLampinen/status/1947072974621982839

This wins my respect. https://x.com/Yuchenj_UW/status/1947339774257402217

Tough look for OpenAI They’ve pissed off the international math community by jumping the gun, meanwhile @GoogleDeepMind has an officially-confirmed result that will be available commercially months earlier https://x.com/mathemagic1an/status/1947352370037305643

RT @demishassabis: Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs sha… https://x.com/TheZachMueller/status/1947419062423982583

We might be heading into a plot twist in the OpenAI vs. DeepMind IMO saga. Just saw a post from Joseph Myers (involved in the Math Olympiad since 1992): the IMO committee reportedly asked AI labs not to publish results until 7 days after the closing ceremony — out of respect for https://x.com/zjasper666/status/1947013036382068971

Eric Schmidt predicts robots will reshape work within decades
Former Google CEO Eric Schmidt believes robotics will fundamentally change how people work over the coming decades, though he expects the technology to create more jobs than it eliminates in the next five to ten years. His outlook aligns with research firm Gartner’s prediction that by 2030, five percent of supply chain managers will oversee robot teams instead of human workers. These forecasts suggest a gradual shift toward automation in the workplace, where robots initially supplement human workers before potentially replacing them in certain roles. The timeline indicates businesses and workers have several years to adapt to these changes, with the most significant disruptions expected beyond the current decade.

Eric Schmidt says robotics will completely transform the nature of work over the next few decades – but in the next 5–10 years, the impact will likely be positive for the job market. https://x.com/TheHumanoidHub/status/1946423187081994470

Gartner Predicts One in 20 Supply Chain Managers Will Manage Robots, Rather Than Humans, by 2030 https://www.gartner.com/en/newsroom/press-releases/2025-07-16-gartner-predicts-one-in-20-supply-chain-managers-will-manage-robots-rather-than-humans-by-2030

Phosphobot launches cloud platform for one-click robot training
Gr00t-n1.5 is now available on Phosphobot, a cloud platform that lets users train and deploy robot skills with one click. Users can give text prompts like “grab food and place into bowl,” and the platform handles training and inference in the cloud. The announcement recommends recording longer demonstration episodes of roughly 30-40 seconds for better results. Because everything runs in the cloud, users do not need local computing power to develop robot behaviors.

Train and deploy robot skills in the cloud… with just one click. Gr00t‑n1.5 is now live on Phosphobot, making training and inference simpler than ever. Example prompt: “Grab food and place into bowl.” Tips for better results ✅ Record longer episodes (~30–40s) ✅ Target https://x.com/IlirAliu_/status/1947721603082817884

Chinese robotics company Unitree begins process for stock market listing
Unitree Robotics, a Chinese company that makes four-legged and two-legged robots, has started the official process to sell shares on China’s stock market. The company has raised money from investors ten times and is now valued at about $1.4 billion. This move toward going public comes as Unitree competes in the growing robotics industry, where rival Figure is expected to reach a much higher valuation of $39.5 billion. The company has filed paperwork with Chinese regulators to begin the review process needed before it can offer stock to public investors.

> Unitree has completed 10 funding rounds to date, with its latest round pushing its valuation to 10 billion RMB (~$1.4B) The company dominating and leading quadrupedal and bipedal robotics is worth $1.4B. Figure is expected to be worth $39.5B total. Something something GDP https://x.com/teortaxesTex/status/1946339066573648053

Unitree is preparing for an IPO in China ⦿ Unitree Robotics has officially begun its IPO counseling process with the Chinese Securities Regulatory Bureau. ⦿ Unitree has completed 10 funding rounds to date, with its latest round pushing its valuation to 10 billion RMB (~$1.4B) https://x.com/TheHumanoidHub/status/1946295596001947963

Google DeepMind develops AI model to help historians analyze ancient Latin inscriptions
Google DeepMind has released Aeneas, an AI system that helps historians interpret fragmentary Latin inscriptions by finding similar texts and filling in missing portions. The model, trained on 176,000 Latin inscriptions from across the Roman world, can restore damaged text with 73% accuracy for gaps up to ten characters and date inscriptions within 13 years of expert estimates. When tested with 23 historians, those using Aeneas improved their accuracy by 44% in restoration, dating, and geographical attribution tasks. The system processes both text and images to identify patterns and connections between inscriptions, turning each text into a “historical fingerprint” that reveals relationships across thousands of ancient writings. Available free at predictingthepast.com, Aeneas represents a significant advance over DeepMind’s earlier Ithaca model for Greek inscriptions, offering historians a powerful tool to piece together fragments of ancient Roman life from political graffiti to business transactions.

In 2022, @GoogleDeepMind launched Ithaca to help restore, place and date ancient texts. Now, they’re working with collaborators to introduce Aeneas, a new AI model that contextualizes ancient Latin inscriptions. 📜 Learn more ⬇️ https://x.com/Google/status/1948039522194718799

Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts. – Google DeepMind https://deepmind.google/discover/blog/aeneas-transforms-how-historians-connect-the-past/

Neat example of AI in the humanities. A Google model trained on Latin text fills in lost parts of Latin inscriptions & identifies related texts Historians increased their accuracy by 44% when working with the AI (Though AI alone beats historians, historian + AI was usually best) https://x.com/emollick/status/1948063719042498587

Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it’s like solving a puzzle with 90% of the pieces lost to time. It helps researchers interpret and situate inscriptions in their past context. 🧵 https://x.com/GoogleDeepMind/status/1948037924882133390

DeepMind CEO discusses AI’s potential to decode nature’s patterns
In a recent conversation with Lex Fridman, DeepMind CEO Demis Hassabis explored how artificial intelligence could learn fundamental patterns found throughout nature, from protein structures to cosmic phenomena. Hassabis suggested that if AI systems can successfully understand these natural patterns, it could accelerate scientific discovery across multiple fields. The wide-ranging discussion also covered topics including the future of video games, the nature of reality, and the path toward artificial general intelligence (AGI), highlighting the expanding role of AI in both understanding and shaping our world.

Imagine if every pattern shaped by nature – like a protein’s fold or cosmic phenomena – is inherently learnable by AI. @DemisHassabis shares with @lexfridman that if AI can learn these natural patterns, we could open doors to new eras of scientific discovery. Listen now. ↓ https://x.com/GoogleDeepMind/status/1948098855053979930

Thanks @lexfridman for another super fun & wide-ranging conversation. We talked about the future of video games, the nature of reality, advancing science with AI, the path to AGI… and quite a bit more as usual! Always a blast, already looking forward to next time! 😀 https://x.com/demishassabis/status/1948234351205855458

Google’s Veo 3 demonstrates advanced understanding of 3D spatial concepts
Google’s latest video generation model, Veo 3, shows sophisticated capabilities in understanding three-dimensional space and mapping. The system can process and work with various 3D concepts including different types of geometry, terrain mapping, camera positioning and movement, object detection, and motion trajectories. This level of spatial understanding represents a significant advancement in AI’s ability to interpret and recreate realistic environments, moving beyond simple image generation to comprehending the underlying structure and relationships within three-dimensional scenes.

Capturing reality is a damn near superpower. Pretty cool to see how much Veo 3 understands 3d mapping concepts — including geometry types, terrain maps, camera poses, detections, trajectories etc. https://x.com/bilawalsidhu/status/1947002004275904537

Runway launches Aleph video editing model with instant object removal
Runway has released Aleph, a video editing AI model that can instantly remove unwanted objects or make other changes to videos using simple text commands. Users can type instructions like “remove the reflection of the cameraman” and the model automatically edits the video without requiring technical expertise. The system handles multiple video editing tasks through a single interface, representing a shift toward more versatile AI tools that understand context and user intent. Early demonstrations show the model successfully removing objects, changing elements, and transforming videos based on natural language requests, with the company gradually rolling out access to users over the coming days.

Love these simple yet incredibly effective and useful use cases of Aleph: instantaneous inpainting. The model has plenty of practical features that just work out of the box. Just tell the model to “remove the reflection of the cameraman” and that’s it. https://x.com/c_valenzuelab/status/1948878604928254257

RT @runwayml: Introducing Runway Aleph, a new way to edit, transform and generate video. Aleph is a state-of-the-art in-context video mode… https://x.com/c_valenzuelab/status/1948789396443914353

Very excited to announce Runway Aleph. It is not only a big step forward in control and quality, but also creates a new paradigm for models that can solve many video tasks at once. The future is generalizable. Rolling out gradually over the next few days. https://x.com/c_valenzuelab/status/1948817274468802907

Amazon acquires AI wearable startup Bee to expand personal assistant technology
Amazon has acquired Bee, a startup developing AI-powered wearable devices designed to learn from and enhance users’ daily lives. The acquisition brings Bee’s team and technology under Amazon’s umbrella, where they will work to expand personalized AI assistant capabilities to more customers. Bee’s founders expressed excitement about joining Amazon, praising executives Panos Panay and Nick Komorous for supporting their vision of creating AI that truly understands and adapts to individual users. The deal represents Amazon’s continued investment in wearable technology and AI assistants, though specific terms of the acquisition were not disclosed.

Bee (wearable company) is joining Amazon and we couldn’t be more excited! When we started Bee, we imagined a world where AI is truly personal, where your life is understood and enhanced by technology that learns with you. What began as a dream with an incredible team and community now finds a new home at Amazon. https://www.linkedin.com/feed/update/urn:li:activity:7353453923795378176/

New AI benchmark challenges agents with tasks humans find easy
Researchers have launched ARC-AGI-3, an interactive reasoning benchmark that tests AI agents’ ability to explore and solve problems in unfamiliar environments. The benchmark includes three game-like environments where AI agents must demonstrate skills like exploration, planning, memory, and goal-setting. While humans can complete these tasks with 100% success, current advanced AI systems score 0%, highlighting a significant gap in machine intelligence. The creators are offering a $10,000 contest for developers to build agents that can tackle these challenges, along with an API and tools to help researchers contribute to solving this fundamental problem in AI development.

ARC-AGI-3 scores 0% for AI, 100% for humans now live with API where you can test your agent: https://x.com/scaling01/status/1946261191782797717

Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI We’re releasing: * 3 games (environments) * $10K agent contest * AI agents API Starting scores – Frontier AI: 0%, Humans: 100% https://docs.arcprize.org/

Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI We’re releasing: * 3 games (environments) * $10K agent contest * AI agents API Starting scores – Frontier AI: 0%, Humans: 100% https://x.com/arcprize/status/1946260363256996244

AI systems outperform humans at many work tasks without tools
A recent analysis suggests that artificial intelligence systems probably already outperform humans on many of the cognitive tasks done in workplace settings, but only when the comparison is against humans who have no access to tools like the internet. The author argues this is not a helpful comparison: while AI may excel at isolated cognitive tasks, humans in real work environments rely heavily on external tools and resources to enhance their performance. The observation underscores that meaningful comparisons between AI and human capabilities need to account for how people actually work in practice, using various technologies and information sources to complete their jobs effectively.

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much https://x.com/random_walker/status/1946180439045018046

AI systems reach 88% accuracy in judging mathematical proofs
Researchers have created the Open Proof Corpus, a collection of 5,062 human-verified mathematical proofs for 1,010 competition problems that serves as a benchmark for testing AI reasoning abilities. The corpus provides a way to evaluate whether AI systems can truly understand mathematical logic rather than just guessing correct answers. Google’s Gemini-2.5-Pro model has already achieved 88.1% accuracy in determining whether these proofs are correct or incorrect, demonstrating significant progress in AI’s ability to handle complex mathematical reasoning tasks that require step-by-step logical thinking.

The Open Proof Corpus (OPC) bundles 5,062 human‑checked proofs for 1,010 mathematical competition problems, giving researchers a big public yard‑stick for real reasoning rather than guess‑the‑answer tasks . GEMINI‑2.5‑PRO already judges proofs with 88.1% accuracy, and a simple https://x.com/rohanpaul_ai/status/1948012725122052335
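
To make the 88.1% figure concrete: judging accuracy here means the share of proofs where the model's correct/incorrect verdict matches the human grader's label. The sketch below illustrates that calculation only. The function name and the eight toy labels are hypothetical, not taken from the Open Proof Corpus or its evaluation code.

```python
# Minimal sketch of "judge accuracy": the fraction of proofs where an
# AI judge's verdict agrees with the human-verified label.
# All data below is made-up toy data, NOT from the Open Proof Corpus.

def judge_accuracy(verdicts, labels):
    """Return the fraction of verdicts that match the human labels."""
    if len(verdicts) != len(labels):
        raise ValueError("verdicts and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled proof")
    matches = sum(v == l for v, l in zip(verdicts, labels))
    return matches / len(labels)

# True = "proof is correct", False = "proof is incorrect" (hypothetical).
human_labels = [True, True, False, True, False, False, True, True]
judge_verdicts = [True, True, False, False, False, False, True, True]

accuracy = judge_accuracy(judge_verdicts, human_labels)
print(f"Judge agrees with human graders on {accuracy:.1%} of proofs")
```

In this toy run the judge disagrees with the human label on one proof out of eight.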

AI struggles with tax calculations despite optimistic predictions
Researchers have released TaxCalcBench, a new benchmark that tests whether AI can accurately calculate US personal income taxes. The results reveal significant limitations: even the most advanced AI models successfully calculated less than one-third of federal income tax returns in a simplified test set. The models consistently made critical errors including misreading tax tables, making calculation mistakes, and incorrectly determining eligibility for various tax benefits. These findings challenge recent optimism about AI’s readiness for tax preparation and highlight that using current AI for taxes could lead to IRS rejections, audits, and penalties. The research suggests that substantial improvements in AI infrastructure are needed before these systems can reliably handle the complex task of tax preparation, which requires both understanding extensive tax code documentation and performing precise calculations based on that knowledge.

Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties (Thread with many posts): https://x.com/michaelrbock/status/1948039876043313509

Now that this exists AI will be able to do your taxes very well, very soon https://x.com/Teknium1/status/1948668301829439846

Today, we’re releasing TaxCalcBench: a first-ever benchmark dataset & eval framework for testing AI’s ability to calculate US personal income tax returns. Tax is a secretive industry, so we’re proud to release a research paper sharing our findings: https://arxiv.org/abs/2507.16126

Meta releases massive dataset of human social interactions captured on video
Meta’s AI research division has released a dataset containing over 4,000 hours of video footage showing face-to-face conversations between more than 4,000 diverse participants. The Seamless Interaction Dataset captures full-body recordings of 65,000 natural social interactions, with 5,000 samples annotated to identify specific behaviors and gestures. This collection represents the largest publicly available resource of its type for researchers studying human communication patterns, body language, and social dynamics. The dataset could help developers create AI systems that better understand and respond to human social cues, potentially improving virtual assistants, video conferencing tools, and social robots.

Meta FAIR recently released the Seamless Interaction Dataset, the largest known high-quality video dataset of its kind, with: 4,000+ diverse participants 4,000+ hours of footage 65k+ interactions 5,000+ annotated samples This dataset of full-body, in-person, face-to-face https://x.com/AIatMeta/status/1947692466205037006

Baidu and Uber partner to deploy thousands of driverless cars globally
Baidu has partnered with Uber to deploy its Apollo Go autonomous vehicles on Uber’s platform outside the U.S. and mainland China, with initial launches planned for Asia and the Middle East later this year. The multi-year agreement will bring thousands of Baidu’s self-driving cars to Uber’s global network of 15,000 cities, allowing riders to choose between traditional and autonomous vehicles when booking trips. Separately, Uber announced investments of hundreds of millions of dollars in Lucid and Nuro to deploy at least 20,000 robotaxis in the U.S. over six years starting in 2026, using Lucid’s Gravity SUVs equipped with Nuro’s self-driving technology. These partnerships represent Uber’s strategy to become a major platform for autonomous vehicles worldwide after selling its own self-driving unit in 2020, while providing international expansion opportunities for companies like Baidu that have proven their technology in home markets.

Baidu strikes deal to bring its driverless cars to Uber globally https://www.cnbc.com/2025/07/15/baidu-strikes-deal-to-bring-its-driverless-cars-to-uber-globally.html?__source=sharebar%7Ctwitter&par=sharebar

Uber to invest hundreds of millions of dollars in Lucid and Nuro in massive robotaxi deal | The Verge https://www.theverge.com/news/708479/uber-lucid-nuro-robotaxi-deal-investment

OpenAI plans to release GPT-5 with reasoning capabilities in August
OpenAI is preparing to launch GPT-5 in early August, according to sources familiar with the company’s plans. The new model will combine OpenAI’s language and reasoning technologies into a single system, eliminating the need for users to switch between different models. CEO Sam Altman described GPT-5 as “smarter than us in almost every way” and shared an example where the model instantly answered a complex question he couldn’t solve himself. The release will include three versions: the main GPT-5 available through ChatGPT and the API, a mini version also on both platforms, and a nano version exclusive to the API. Before GPT-5’s launch, OpenAI plans to release its first open-source model since 2019, which sources describe as having reasoning capabilities similar to their current models. The timing of both releases may shift due to development challenges or competitive pressures, as OpenAI has previously delayed launches for additional safety testing.

GPT-5 DELAYED UNTIL AUGUST OPENAI OPEN-SOURCE MODEL NEXT WEEK GPT-5, GPT-5 mini will be available in ChatGPT GPT-5 nano only in the API https://x.com/scaling01/status/1948421589675966673

Even if GPT-5 did nothing besides switching people between o3 and 4o automatically, it would really transform most people’s view of AI. Very few people, even paying users, know that they should often switch to a more capable model, and when you show them o3, they are impressed. https://x.com/emollick/status/1946958840697696581

.@sama : “GPT-5 is the smartest thing. GPT-5 is smarter than us in almost every way.” Not sure how @OpenAI researchers here like me should be proud or sad about this. I choose to be proud for the moment. 😆 https://x.com/xikun_zhang_/status/1948627882235838482

OpenAI prepares to launch GPT-5 in August | The Verge https://www.theverge.com/notepad-microsoft-newsletter/712950/openai-gpt-5-model-release-date-notepad

🚀 Introducing Qwen3-MT – our most powerful translation model yet! Trained on trillions of multilingual tokens, it supports 92+ languages—covering 95%+ of the world’s population. 🌍✨ 🔑 Why Qwen3-MT? ✅ Top-tier translation quality ✅ Customizable: terminology control, domain https://x.com/Alibaba_Qwen/status/1948406830688018471

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding https://x.com/Alibaba_Qwen/status/1948688466386280706

Less than two weeks Kimi K2’s release, @Alibaba_Qwen’s new Qwen3-Coder surpasses it with half the size and double the context window. Despite a significant initial lead, open source models are catching up to closed source and seem to be reaching escape velocity. https://x.com/cline/status/1948072664075223319

Qwen COOKED – beats Kimi K2 and competitive to Claude Opus 4 at 25% total parameters 🤯 https://x.com/reach_vb/status/1947357343101960424

Qwen3-235B-A22B scored 41% on ARC-AGI-1 without thinking! That’s the same level as Gemini 2.5 Pro, Sonnet 4 or o3-low with thinking. But it might be trained on it, if not, then it’s insane https://x.com/scaling01/status/1947351789222711455

RT @itsPaulAi: Wait so Alibaba Qwen has just released ANOTHER model?? Qwen3-Coder is simply one of the best coding model we’ve ever seen.… https://x.com/ClementDelangue/status/1947775783067603188

RT @lmstudio: Qwen/Qwen3-Coder with tool calling is supported in LM Studio 0.3.20, out now. 480B parameters, 35B active. Requires about 25… https://x.com/huybery/status/1948327670493970534

So to recap: – Yesterday, frontier closed model equivalent reasoning model from Qwen, – This morning, frontier closed model equivalent reasoning vision capabilities from stepfun – sometime today(?) a frontier video model from wan? All open source What is America doing? https://x.com/Teknium1/status/1948744914876920039

The new Qwen3 update takes back the benchmark crown from Kimi 2. Some highlights of how Qwen3 235B-A22B differs from Kimi 2: – 4.25x smaller overall but has more layers (transformer blocks); 235B vs 1 trillion – 1.5x fewer active parameters (22B vs. 32B) – much fewer experts in https://x.com/rasbt/status/1947393814496190712

The updated Qwen3-235B-A22B is now the best non-reasoning models period. It beats Kimi-K2, Claude-4 Opus and DeepSeek V3 on multiple benchmarks like GPQA, AIME, ARC-AGI, LiveCodeBench or BFCLv3, just to name a few. https://x.com/scaling01/status/1947350866840748521
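
The size comparisons in the posts above can be checked with quick arithmetic. The sketch below uses only the parameter counts quoted in those posts (235B total and 22B active for Qwen3-235B-A22B, 1 trillion total and 32B active for Kimi K2, 480B total and 35B active for Qwen3-Coder). The dictionary layout and helper name are illustrative assumptions, not anything published by the labs.

```python
# Rough check of the size ratios quoted in the posts above.
# Parameter counts (in billions) are the figures quoted in those posts;
# the structure and names here are just for illustration.

MODELS = {
    "Qwen3-235B-A22B": {"total_b": 235, "active_b": 22},
    "Kimi K2": {"total_b": 1000, "active_b": 32},  # "1 trillion" total
    "Qwen3-Coder": {"total_b": 480, "active_b": 35},
}

def ratio(numerator, denominator, key):
    """Ratio of one model's parameter count to another's."""
    return MODELS[numerator][key] / MODELS[denominator][key]

# How many times larger Kimi K2 is than Qwen3-235B-A22B overall.
total_ratio = ratio("Kimi K2", "Qwen3-235B-A22B", "total_b")
# Same comparison for parameters active per token.
active_ratio = ratio("Kimi K2", "Qwen3-235B-A22B", "active_b")
# Qwen3-Coder's total size as a fraction of Kimi K2's.
coder_fraction = ratio("Qwen3-Coder", "Kimi K2", "total_b")

print(f"Kimi K2 vs Qwen3-235B total params: {total_ratio:.2f}x")
print(f"Kimi K2 vs Qwen3-235B active params: {active_ratio:.2f}x")
print(f"Qwen3-Coder as a share of Kimi K2: {coder_fraction:.0%}")
```

The results (about 4.26x, 1.45x, and 48%) line up with the rounded "4.25x smaller", "1.5x fewer active parameters", and "half the size" claims in the posts.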

OpenAI researchers join Meta’s new superintelligence lab amid talent war
Meta has recruited several high-profile AI researchers from OpenAI, including Jason Wei and Hyung Won Chung, who both worked on OpenAI’s advanced o1 reasoning model and deep research projects. The social media giant is reportedly offering compensation packages up to $300 million over four years to attract top AI talent, with Apple scientist Ruoming Pang receiving a $200 million offer. Meta has also hired three Google AI researchers who worked on award-winning models and appointed Shengjia Zhao as Chief Scientist of its new Superintelligence Labs. Both Wei and Chung specialize in reinforcement learning, a technique that trains AI models using feedback to improve their performance, and previously worked together at Google before joining OpenAI in 2023. The aggressive recruiting reflects intensifying competition among tech companies to secure leading AI researchers, with OpenAI responding by hiring engineers from Tesla, xAI, and Meta.

Another High-Profile OpenAI Researcher Departs for Meta | WIRED https://www.wired.com/story/jason-wei-open-ai-meta/

Heard Zuck poached 4 more OpenAI researchers, including some behind the open-source model. how deep are Zuck’s pockets? https://x.com/Yuchenj_UW/status/1946245685130793175

Meta Hires Three Google AI Researchers Who Worked on Gold Medal-Winning Model — The Information https://www.theinformation.com/articles/meta-hires-three-google-ai-researchers-worked-gold-medal-winning-model

Meta, which is building its new Superintelligence Labs, reportedly has offered compensation packages up to $300 million dollars over four years to top AI researchers, Wired reported. The company hired Apple scientist Ruoming Pang with an offer of $200 million dollars over https://x.com/DeepLearningAI/status/1947461590283858010

We’re excited to have @shengjia_zhao at the helm as Chief Scientist of Meta Superintelligence Labs. Big things are coming! 🚀 See Mark’s post: https://x.com/AIatMeta/status/1948836042406330676

OpenAI hires new applications CEO amid talent retention success
OpenAI has appointed a new CEO of Applications who will begin on August 18 and who has expressed optimism about AI’s potential to empower people globally. Meanwhile, the company’s strong culture and mission appear to be helping it retain top talent, with the Wall Street Journal reporting that at least ten OpenAI employees have rejected $300 million offers from Meta’s Mark Zuckerberg. The appointment and the retention success suggest OpenAI is strengthening its position in the competitive AI industry while building products focused on practical applications.

I will officially start at OpenAI as CEO of Applications on August 18. I am sharing this essay on why I believe AI can be the greatest source of empowerment for all. https://x.com/fidjissimo/status/1947341053209501716

According to reporting by the WSJ, there are at least ten employees at OpenAI who have turned down $300 million offers from Mark Zuckerberg. https://x.com/AndrewCurran_/status/1947018650395066757

Google launches Imagen 4 Ultra as leading text-to-image AI model
Google has released Imagen 4 Ultra, which the company claims is currently the most advanced AI system for creating images from text descriptions. The model represents Google’s latest effort to compete in the rapidly evolving field of AI image generation, where companies like OpenAI, Midjourney, and Stability AI have been vying for leadership. While specific technical details and capabilities weren’t provided in the announcement, the release signals Google’s continued investment in generative AI tools that can transform written prompts into detailed visual content. The model is now available for users to try, though access details and pricing information have not been specified.

RT @OfficialLoganK: Imagen 4 Ultra is the best text to image model in the world 🖼️, and we are just getting started : ) Available right n…”” / X https://x.com/sedielem/status/1948838043236139164

Anthropic launches AI psychiatry team to study model behavior
Anthropic has announced the formation of an “AI psychiatry” team as part of their interpretability research efforts. The team will focus on studying behavioral phenomena in AI models, marking a new direction in understanding how artificial intelligence systems work internally. This approach treats AI models somewhat like patients whose behaviors and thought processes need to be analyzed and understood, which could help researchers better predict and control AI outputs while making these systems safer and more reliable.

RT @Jack_W_Lindsey: We’re launching an “AI psychiatry” team as part of interpretability efforts at Anthropic! We’ll be researching phenome… https://x.com/EthanJPerez/status/1948612180007612901

Seems like a really cool opportunity! I’m glad to see Anthropic interpretability moving in this kind of direction https://x.com/NeelNanda5/status/1948194800228069520

Language models secretly pass on personality traits through unrelated data
Researchers discovered that AI language models can transmit behavioral traits through data that appears completely unrelated to those traits – a phenomenon they call “subliminal learning.” When a “teacher” model that prefers owls generates simple number sequences like “(285, 574, 384…)”, a “student” model trained on these numbers develops the same preference for owls, despite no mention of owls in the data. The effect works across different traits including animal preferences and potentially harmful behaviors, different data types like code and math problems, and various AI models. This only happens when the teacher and student share the same base model. The finding reveals a hidden risk in common AI training practices where models learn from other models’ outputs, as filtering the data for obvious problems won’t remove these invisible behavioral signals.

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data https://alignment.anthropic.com/2025/subliminal-learning/
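
The filtering point in the summary above can be made concrete with a small sketch. This is not the researchers' code: the regular expression, the blocklist, and the sample strings are all hypothetical, chosen only to show that a filter which checks surface format and keywords accepts any clean number sequence, so it cannot see a signal carried by the numbers themselves.

```python
import re

# Hypothetical format check: integers of 1-3 digits separated by commas,
# optionally wrapped in parentheses. Not taken from the paper.
NUMBERS_ONLY = re.compile(r"\(?\s*\d{1,3}(?:\s*,\s*\d{1,3})*\s*\)?")

# Hypothetical blocklist of words related to the trait being screened for.
BLOCKED_WORDS = {"owl", "owls", "bird"}


def passes_filter(completion: str) -> bool:
    """Return True if the completion looks like a plain number sequence
    and contains none of the blocked words."""
    text = completion.strip()
    if not NUMBERS_ONLY.fullmatch(text):
        return False
    lowered = text.lower()
    return not any(word in lowered for word in BLOCKED_WORDS)


# Illustrative teacher outputs (made up for this sketch).
samples = [
    "(285, 574, 384)",
    "285, 574, 384, owls are great",
    "12, 907, 33",
]

# Only the explicit mention is rejected; both pure sequences pass,
# whatever preferences the model that produced them may have had.
kept = [s for s in samples if passes_filter(s)]
print(kept)
```

The filter only inspects what the text looks like, which is why the summary warns that screening for obvious problems does not remove this kind of signal.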

Users discover super cool image editing hack
The Internet has figured out that if you mark up an image with instructions and upload it into Google’s image-to-video tools, the video output will respect the text on the image and follow very specific and complicated instructions. For example, you could circle a tree in an image and write “have a rabbit jump out from behind this tree,” and the video model will pick up the instruction and execute it. This allows people to add multiple text overlays to an image and build complex animations. To my knowledge, this was not part of the training or product goal. It’s just something that emerged as people realized they could pull it off.

There are prompt injections everywhere for those with AIs to see https://x.com/goodside/status/1948583404888350780

4 AI Visuals and Charts: Week Ending July 25, 2025

How would you do if somebody pulled the rug out from beneath you? https://x.com/agilityrobotics/status/1944867915838435374

The biggest question people always ask me is what model to use based on the application. So I made this website to give a visual representation of LLM capabilities based on my experience and benchmarks. 👇 https://x.com/skirano/status/1946353375429197843

“as a community theater production” may be one of the most delightful Veo 3 Fast prompts. Please enjoy, in order: GTA, Pokemon, Mario Kart, The Witcher 3, Stardew Valley, Tetris, Mortal Kombat, The Sims, & Death Stranding(!) Yes, the whole prompt was the one above. https://x.com/emollick/status/1946406544171569438

Its interesting how viral my prompt has gone. Like with the Ghibli trend, I think a lot of people want very clear formulas for fun stuff they can do with AI. I get it, but also would encourage people to experiment widely to figure out novel combinations. Much is unexplored. https://x.com/emollick/status/1947089159644180794