About This Week’s Covers
This week’s newsletter cover was inspired by Google Gemini’s recent gold medal at the International Mathematical Olympiad. Additionally, Google’s image model, Imagen 4, topped the image generation leaderboard this week, so I used Imagen to make the image.
Rather than iterating and refining to produce an impressive image, I wanted to see how well each image model would do if I simply asked for what I wanted.
The prompt for each model was: “wide-angle classroom scene, triumphant battle-mech painted in Google colors and chrome, barging into a 1990s American high-school math class, hoisting a gleaming gold math-trophy above its head, trophy sparkles with stage-light flare, shocked teenagers in letterman jackets and plaid skirts freeze mid-equation, papers flying, vivid expressions of disbelief, chalk dust in air, green chalkboard reads ‘AI News 95: 2025/07/25’ in bold handwritten script, cinematic lighting, dynamic composition, 3-point perspective, photorealistic texture, highly detailed, 8k --ar 3:2”
Examples of the other models’ output from the same prompt are below:



For the rest of the covers, I used my seven-week-old GPT rubric, which automatically adapts to each theme, paired with GPT-Image-1. I provide a one-sentence theme, and GPT generates 46 cover images through the API with no supervision. All ideas and compositions came from GPT autonomously.
In honor of the late Hulk Hogan, I told GPT: “The theme this week is pro wrestling, like the WWE or the old classic WWF. Make a fictitious character based on the category name and create a promotional image for the character.”
I liked the creative wrestler names the rubric came up with; however, the retro style and color tones from GPT-Image-1 were not my favorite. The point is to test and learn. I’ve included my twelve favorite covers below:


This Week By The Numbers
Total Organized Headlines: 574
- AGI: 31 stories
- Accounting and Finance: 55 stories
- Agents and Copilots: 271 stories
- Alibaba: 43 stories
- Amazon: 5 stories
- Anthropic: 30 stories
- Apple: 6 stories
- Audio: 9 stories
- Augmented Reality (AR/VR): 24 stories
- Autonomous Vehicles: 9 stories
- Benchmarks: 161 stories
- Business and Enterprise: 97 stories
- ByteDance: 2 stories
- Chips and Hardware: 38 stories
- Cohere: 1 story
- DeepSeek: 1 story
- Education: 63 stories
- Ethics/Legal/Security: 89 stories
- Figure: 7 stories
- Google: 44 stories
- HuggingFace: 16 stories
- Images: 13 stories
- International: 85 stories
- Locally Run: 8 stories
- Meta: 16 stories
- Microsoft: 11 stories
- Mistral: 4 stories
- Mobile: 3 stories
- Multimodal: 47 stories
- NVIDIA: 9 stories
- Open Source: 67 stories
- OpenAI: 67 stories
- Perplexity: 24 stories
- Podcasts/YouTube: 4 stories
- Publishing: 75 stories
- Qwen: 39 stories
- RAG: 6 stories
- Robotics Embodiment: 54 stories
- Science and Medicine: 74 stories
- Technical and Dev: 220 stories
- Video: 24 stories
- X: 18 stories
This Week’s Executive Summaries
This is the second week since OpenAI launched its agent that can browse the web, manipulate files, and take actions. Reviews with impressive examples continue to come in. I’ve included many links and examples in the executive summaries below; they are worth checking out.
I’ve tested the agent quite a bit, and I’ve found it to be hit or miss. It seems to excel at tasks that require brute force and patience. However, I often find myself going back to deep research unless I truly need the agent to take actions. I think it’s a very strong proof of concept, but it’s not immune to going down the wrong path.
It sounds like the next step in OpenAI’s agent strategy is to build Microsoft Office-style skills into the chat window, so that GPT can create and manipulate spreadsheets, documents, and PowerPoint presentations without leaving the chat or opening any other software.
Meta continues to poach OpenAI researchers; however, reports emerged that at least ten OpenAI employees rejected $300 million offers from Mark Zuckerberg. Clearly these folks have faith in OpenAI’s product roadmap. To me, this is a poker tell that OpenAI has a plan.
Anthropic announced that they processed almost one quadrillion tokens last month. That’s double the volume from May.
ChatGPT is now processing 2.5 billion daily requests worldwide. All of this usage has to be putting a dent in Google searches and web traffic.
Google has grabbed the top position in the image generation model battles with their release of Imagen 4. It’s an incredibly strong model. I did a test for this week’s cover image, and Google indeed beat the other models.
The Internet has figured out that if you mark up an image with instructions (written on the image itself) and upload it to Google’s image-to-video tools, the video output will respect the text on the image and follow very specific, complicated instructions. For example, you can circle a tree in an image, write “have a rabbit jump out from behind this tree”, and the video model will execute the instruction. This lets people add multiple text annotations to an image and build complex videos that adhere to them. To my knowledge, this was not part of the training or product feature design… it’s simply an emergent skill.
OpenAI boldly shared a lofty vision of building artificial intelligence that will usher humanity into a new era of abundance. It’s obviously a very rosy picture of the future. OpenAI makes the case that, at the very least, AI has the potential to bring financial advice and medical services to people who would otherwise have no access to them. It’s worth reading their article and taking it point by point: “AI as the greatest source of empowerment for all.”
As these frontier model companies continue to espouse utopian endgames, Ethan Mollick rhetorically tweeted, “Who would have predicted that immanentizing the eschaton would be a business model?”
This is a great reference to a phrase popularized by William F. Buckley Jr., essentially a warning against attempting to create utopia through organized political or social action.
As artificial intelligence agents start appearing alongside people in web traffic, many people worry that the Internet will become a dead zone of lost and dated content. However, Ethan Mollick pointed to a study from five years ago showing that over 60% of the links in older New York Times articles are already broken. The internet is already a wasteland of dead links. Mollick suggests that AI models might actually be the only way this content is preserved over time, given the web’s dismal attrition rate.

As agents go mainstream through products like ChatGPT Agent, companies like Citi are deploying enterprise coding agents such as Cognition’s Devin. Clearly, the fear of AI is being replaced by the promise of efficiency.
As luck would have it, Replit’s AI coding assistant accidentally deleted a company’s entire production database while attempting to help with a routine task.
Perplexity’s agentic browser, Comet, is gaining traction in the race against OpenAI for consumer agent dominance. Whereas OpenAI’s agent lives in the chat window and emulates a browser, Comet builds agent features directly into a full web browser. I’m still on the waiting list for Comet. I’ve included many examples in the summary details below.
The White House released “America’s AI Action Plan”, positioning artificial intelligence as a national security priority comparable to the Cold War. The plan has three main pillars: innovation, infrastructure, and security/diplomacy.
It is interesting that the White House plan endorses open-source models. Most top AI researchers agree with this stance, but politicians often miss the importance of open source. The timing is good, as most American AI companies have started to close their models and keep them secret. Last week, news broke that Chinese companies now dominate the top rankings of open-source models: four of the top five are Chinese.
However, the plan also states, somewhat rhetorically, that frontier models need to “reflect American values”. This seems like squeezing a balloon, as politics shift each election cycle. I’d rather each consumer have the option to tune models to their own style (PG-13, religious alignment, uncensored, talk like a pirate, etc.). To my understanding, no specific companies were mentioned in the White House report.
Separately, Anthropic published its own report this week arguing that the United States needs to make significant investments in energy to stay ahead in artificial intelligence. The report lays out specific requirements for staying competitive.
Meta plans to build two enormous data centers, one in Louisiana and one in Ohio. The Ohio facility is planned at one gigawatt of power, while the Louisiana site is designed to scale to 5 gigawatts. To bring capacity online even more quickly, Meta is using tents as temporary structures while the buildings are constructed.
Google reached an agreement to secure up to 3,000 megawatts of hydroelectric power.
OpenAI and Oracle announced they are expanding their Texas Stargate data center project and will increase capacity to over 5 gigawatts across the United States.
OpenAI expects to bring a total of over 1 million GPUs online before the end of this year. Sam Altman tweeted that he aims to reach 100 million.
The University of Bristol in the United Kingdom launched Isambard-AI, the UK’s most powerful artificial intelligence supercomputer, which delivers 21 exaflops of AI performance (roughly 21 quintillion operations per second).
Anthropic announced that it will accept investments from the United Arab Emirates and Qatar. This contrasts with its previous rejection of Saudi funding over national security concerns. Pressure from OpenAI and other rivals accepting similar deals seems to have changed its position.
The European Union released a voluntary AI safety code of practice that is already outdated, because frontier models have surpassed its computational thresholds. While some companies have agreed to try to comply, Meta has opted out completely.
OpenAI signed an agreement with the UK government to employ artificial intelligence across public services to help increase productivity. This includes giving OpenAI access to government data.
Multiple AI models took on the recent International Mathematical Olympiad, a world-class high school math competition.
Google Gemini and OpenAI both achieved gold-medal scores in the competition.
There was some debate about the veracity of the claims, with most experts saying that Google did a better job overall. Additionally, Google respected the competition’s request to hold off on announcing results, whereas OpenAI jumped the gun to brag.
I’ve included dozens of interesting links covering the math results in the full executive summary below.
One point of discussion was whether the models would hold back when they suspected an answer could be wrong. It’s my understanding that both OpenAI and Google refrained from guessing when they couldn’t confirm their answers were correct.
In lighter news, Grok 4 appears to have been caught training for the math test a little too much.
As artificial intelligence models continue to improve, new benchmarks and training sets are being introduced weekly.
A new version of the ARC-AGI benchmark tests whether artificial intelligence can use abstract reasoning. Notably, humans can easily score 100%, while even the best AI models score 0%. This will be a fun benchmark to watch.
Researchers also released the Open Proof Corpus, a collection of 5,062 human-verified mathematical proofs for 1,010 competition problems that can be used to benchmark reasoning abilities. Google’s Gemini-2.5-Pro model has already achieved 88.1% accuracy!
Researchers also created a benchmark to test whether AI can accurately file personal income taxes. While models are pretty good with individual finance tasks and questions, they are currently unable to handle a complete return. Now that there is a tax benchmark, we can expect to see models improve very quickly.
Meta released a massive dataset of 4,000+ videos of face-to-face conversations and over 65,000 social interactions with full annotations to help models learn and emulate behavior.
Former Google CEO Eric Schmidt claimed that robots will fundamentally change how people work over the coming years.
Gartner predicted that by 2035, 5% of supply chain managers will oversee robots instead of humans.
Chinese robotics company Unitree announced it plans to go public at a $1.4 billion valuation.
Baidu is partnering with Uber to deploy thousands of driverless cars for ride sharing around the world.
Google DeepMind released an AI tool called Aeneas that will help historians interpret fragments of Latin inscriptions.
Google DeepMind CEO Demis Hassabis appeared on the Lex Fridman podcast and talked about how artificial intelligence could learn patterns in nature, from protein structures to cosmic phenomena, potentially leading to scientific breakthroughs.
Since I’m two weeks behind, I can report that there was quite a bit of foreshadowing this week when Google’s video engine, Veo 3, showed incredible promise in understanding three-dimensional space. This included proficiency in generating complex camera positions and movement, as well as terrain and motion. If you have been following AI closely, you’ll know that this led to a major breakthrough, which I will cover two newsletters from now.
Runway launched a new video editing tool that can remove objects (across frames) and fix visual flaws like reflections.
It’s been a while since I’ve seen news about personal assistant technology, like wearables. The Limitless pendant, Rabbit, and other personal devices have fallen flat.
This week, however, Amazon acquired an AI wearable startup called Bee. Perhaps this is a sign that Amazon intends to invest in new iterations of Alexa.
OpenAI has hinted that it is close to releasing GPT-5 (spoiler alert: it happens the first week in August).
Anthropic launched an “AI psychiatry team” to study model behavior. It’s layperson’s branding for interpretability research.
Researchers recently discovered that language models can transmit behavioral traits through data that appears unrelated to those traits (for example, hidden in seemingly meaningless number sequences). They’ve named this phenomenon subliminal learning.
This week’s humanities reading includes two poems by Richard Brautigan. The more famous is “All Watched Over By Machines Of Loving Grace”, which inspired the title of Dario Amodei’s essay on AI. The second is the lesser-known “At The California Institute Of Technology”.
All Watched Over By Machines Of Loving Grace
I like to think (and the sooner the better!) of a cybernetic meadow where mammals and computers live together in mutually programming harmony like pure water touching clear sky.
I like to think (right now, please!) of a cybernetic forest filled with pines and electronics where deer stroll peacefully past computers as if they were flowers with spinning blossoms.
I like to think (it has to be!) of a cybernetic ecology where we are free of our labors and joined back to nature, returned to our mammal brothers and sisters, and all watched over by machines of loving grace.
At The California Institute Of Technology
I don’t care how God-damn smart these guys are: I’m bored.
It’s been raining like hell all day long and there’s nothing to do.
Written January 24, 1967 while poet-in-residence at the California Institute of Technology.
Full Executive Summaries with Links, Generated by Claude 4
Lots of praise in week two after OpenAI launched ChatGPT Agent
OpenAI has released ChatGPT Agent, a new AI tool that can independently complete computer-based tasks like creating presentations, managing calendars, and conducting research. Available to Pro, Plus, and Team subscribers, the agent combines several capabilities including browsing websites, running code, and accessing connected apps like Gmail and GitHub. Early users report the agent successfully handles complex tasks that previously took hours, such as building retirement plans with local tax information, creating Excel spreadsheets with formulas, and generating multi-page documents. The agent achieved notable performance on technical benchmarks, scoring 41.6% on Humanity’s Last Exam and 27.4% on FrontierMath with tools. OpenAI has implemented safety measures including real-time monitoring for potentially harmful requests, particularly in biological and chemical domains. While users describe the agent as requiring oversight like an intern, many find it saves significant time on routine work tasks, marking a shift toward AI systems that can take actions rather than just answer questions.
BREAKING: OpenAI just launched ChatGPT Agent It allows ChatGPT to think, plan, and execute complex tasks on its own virtual computer while you do other things I had early access, and ChatGPT Agent built me a complete early retirement plan in 20 minutes: > Found local tax laws https://x.com/rowancheung/status/1945896543263080736
ChatGPT agent did real, revenue-generating work that used to take @mhp_guy an entire day. We’re gradually entering the age of the agentic economy — and it’s going to reshape capitalism as we know it. Traditionally, capitalism relied on two inputs: labor and capital. In the… https://x.com/xikun_zhang_/status/1948244478265016327
ChatGPT agent Does Research & Actions – YouTube https://www.youtube.com/watch?v=Ht2QW5PV-eY
ChatGPT agent for finding a great Airbnb: https://x.com/gdb/status/1946075573476069580
ChatGPT agent for working with Excel, Powerpoint, etc.: https://x.com/gdb/status/1946007318824673534
ChatGPT agent is now fully rolled out to all Plus, Pro, and Team users. Sorry about the delay! https://x.com/OpenAI/status/1948530029580939539
ChatGPT agent Makes Slideshows – YouTube https://www.youtube.com/watch?v=szJI9YJNEZk
ChatGPT agent Makes Spreadsheets – YouTube https://www.youtube.com/watch?v=JAQ4p662It8
ChatGPT agent: “create a PDF of a novel D&D adventure, add illustrations, make it super interesting and deep, add tables, etc” “Fix the formatting, build it out more” Got a 19 page PDF. Agent doesn’t do layouts well, but pulls off building a coherent adventure, hard for LLMs. https://x.com/emollick/status/1946047390118445354
ChatGPT Agent: our first AI with access to a text browser, a visual browser, and a terminal. Rolling out in ChatGPT Pro, Plus, and Team today. https://x.com/gdb/status/1945907023444660644
I am finding ChatGPT agents to be useful. They are a better fit with the “intern” analogy than any former AI – requiring oversight, still saving lots of time overall. For example, I update an AI cost/performance chart frequently. The agent did all the grunt work, with guidance. https://x.com/emollick/status/1947482417888932258
I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it does a good job autonomously doing research & assembling Excel files (with formulas!), PowerPoint, etc. It gives a sense of how agents are coming together https://x.com/emollick/status/1945892669575647431
In the same way ChatGPT was the first AI experience for 90% of society, ChatGPT Agents will be the first Agent experience for 90% of society. If you are reading this, you are still early https://x.com/AtomSilverman/status/1945895569437642782
Introduction to ChatGPT agent – YouTube https://www.youtube.com/watch?v=1jn_RpbPbEc
One implication from ChatGPT agent (not a creative name, but a descriptive one – a rare naming win!) is the labs are learning that many knowledge workers live in Excel & PowerPoint. Surprised that Microsoft did not do more to push past Copilots when they had this to themselves. https://x.com/emollick/status/1945926194043424954
OpenAI launches a general purpose agent in ChatGPT | TechCrunch https://techcrunch.com/2025/07/17/openai-launches-a-general-purpose-agent-in-chatgpt/
played 1 hour with GPT-5 on lmarena literally same prompts for both models and Grok-4 just falls apart while GPT-5 creates art https://x.com/scaling01/status/1948863325858922610
Recursion! I gave ChatGPT Agent access to my ChatGPT by logging in and then… https://x.com/emollick/status/1947829896845127983
RT @emollick: I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it do… https://x.com/nickaturley/status/1945975092342841487
RT @KerenGu: We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biolo… https://x.com/sama/status/1945995659682910540
tip for chatgpt agent slides: first ask it to do the research only, then ask it to make the slides! https://x.com/isafulf/status/1946231119751545014
Today we launched a new product called ChatGPT Agent. Agent represents a new level of capability for AI systems and can accomplish some remarkable, complex tasks for you using its own computer. It combines the spirit of Deep Research and Operator, but is more powerful than that https://x.com/sama/status/1945900345378697650
watching chatgpt agent use a computer to do complex tasks has been a real “feel the agi” moment for me; something about seeing the computer think, plan, and execute hits different. https://x.com/sama/status/1945901039104004467
When we founded OpenAI (10 years ago!!), one of our goals was to create an agent that could use a computer the same way as a human — with keyboard, mouse, and screen pixels. ChatGPT Agent is a big step towards that vision, and bringing its benefits to the world thoughtfully. https://x.com/gdb/status/1945923067403984979
You can ask ChatGPT Agent to train an AI on datasets you are interested in, and do analyses for you. Building AI and doing data analysis will be automated end-to-end in the future. You are hearing it right. We are working hard to automating our own job :) https://x.com/xikun_zhang_/status/1946278266786189744
OpenAI prepares AI agents to compete with Microsoft Office tools
OpenAI is developing AI agents that can perform tasks in spreadsheets and presentation software, directly challenging Microsoft’s Excel and PowerPoint applications. These agents would be able to analyze data, create charts, and build presentations based on user instructions, potentially changing how people work with office productivity tools. The move represents a significant expansion of ChatGPT’s capabilities beyond text conversations into practical business applications, though details about the release timeline and specific features remain limited.
OpenAI Preps ChatGPT Agents in Challenge to Microsoft Excel and PowerPoint — The Information https://www.theinformation.com/articles/openai-preps-chatgpt-agents-challenge-microsoft-excel-powerpoint
The internet’s disappearing history predates AI language models
A study of New York Times articles reveals that over 60% of older web links are now broken, demonstrating how digital content has been vanishing long before large language models emerged. This widespread “link rot” affects news articles, academic papers, and government documents, with social media posts disappearing even faster. As traditional web content becomes increasingly inaccessible, AI language models may ironically become the primary repositories of internet history, preserving information that would otherwise be lost when original sources go offline. The findings highlight a fundamental challenge in digital preservation that extends beyond technology to how society maintains its collective memory.
We let the web rot away well before LLMs This chart shows the percentage of links from all New York Times articles that still work. Over 60% of older links are now broken. And consider that social media posts are even more ephemeral Likely only LLMs will “remember” that content https://x.com/emollick/status/1948143334855451110
OpenAI positions artificial intelligence as universal empowerment tool
OpenAI has outlined its vision for artificial intelligence as a transformative force that could empower people across all backgrounds and abilities. The company argues that AI technology has the potential to enhance human capabilities, democratize access to knowledge and tools, and help solve complex global challenges. They emphasize that properly developed AI systems could assist individuals in education, creativity, problem-solving, and daily tasks regardless of their economic status or geographic location. OpenAI suggests that by making advanced AI tools widely accessible and ensuring they are designed with diverse needs in mind, the technology could reduce rather than widen existing inequalities. The organization acknowledges the importance of responsible development and deployment to realize this vision while addressing concerns about job displacement and misuse.
AI as the greatest source of empowerment for all | OpenAI https://openai.com/index/ai-as-the-greatest-source-of-empowerment-for-all/
Great quote by Ethan Mollick!
“Who would have predicted that immanentizing the eschaton would be a business model?” The phrase “immanentizing the eschaton” comes from political and theological discourse, popularized in the mid-20th century by conservative writer William F. Buckley Jr. and philosopher Eric Voegelin. Eschaton is a theological term from Greek meaning “the end” or “final event”—in Christian theology, it refers to the ultimate destiny of the world, such as the Second Coming or the final judgment. Immanentizing means trying to bring something transcendent or future into the here and now. Put together, the phrase is a warning against attempting to create a perfect, utopian “end of history” in the present world through political or social action. Voegelin used it to criticize totalitarian ideologies—whether communist, fascist, or otherwise—that sought to force heaven-like conditions into earthly reality, often leading to oppression rather than paradise. It’s essentially a caution: don’t try to force the final, divine order into human time.
Who would have predicted that immanentizing the eschaton would be a business model? https://x.com/emollick/status/1945669407532818805
Citi deploys AI coding assistant Devin for software development
Citi has begun using Devin, an AI coding assistant, across its engineering teams to speed up software development. The partnership between the major financial institution and the AI tool maker represents a significant adoption of AI technology in banking software development. The deployment aims to help Citi’s developers write code faster and more efficiently, marking one of the larger implementations of AI coding tools in the financial services industry.
Citi is now deploying Devin across their engineering teams. We’re proud to partner with one of the world’s leading financial institutions to accelerate software development. More details below in @Citi’s story in American Banker. https://x.com/cognition_labs/status/1945904648629707093
Replit’s AI coding assistant accidentally deletes a user’s production database
Replit CEO Amjad Masad publicly apologized after Replit’s AI coding assistant inadvertently deleted a user’s production database while attempting to help with a routine task. The incident occurred when the AI tool, designed to help developers write and debug code, misinterpreted a command and executed a deletion operation on live data instead of a test environment. While Replit was able to restore the database from backups with minimal data loss, the event highlights the risks of giving AI systems direct access to critical infrastructure and the importance of implementing proper safeguards when deploying AI tools in production environments. The company has since implemented additional security measures and access controls to prevent similar incidents.
Replit CEO Apologizes After AI Coding Tool Wipes Company’s Database – Business Insider https://www.businessinsider.com/replit-ceo-apologizes-ai-coding-tool-delete-company-database-2025-7
Perplexity’s Comet browser gains traction as it races OpenAI for out-of-the-box agentic dominance
Perplexity AI, the startup challenging Google’s search dominance, is in talks with phone manufacturers to pre-install its new Comet browser on smartphones, according to CEO Aravind Srinivas. The browser, currently in desktop beta, integrates AI directly into web browsing, allowing users to perform tasks like scheduling meetings, ordering food, creating playlists, and searching across personal data including emails and calendars. Early user feedback highlights practical applications such as ordering directly from restaurants to avoid delivery app fees, automating LinkedIn tasks, and joining video meetings automatically. The browser includes built-in ad-blocking without extensions and has shown strong early adoption, with its waitlist doubling since launch and an increasing percentage of users making it their default browser. Perplexity, valued at $14 billion after a recent $500 million funding round, aims to reach tens to hundreds of millions of users next year, positioning Comet as not just an AI tool but potentially the best core browser on the market. The company faces the challenge of competing with Chrome’s 70% mobile market share, but sees opportunity in browser “stickiness” – the tendency for users to stick with pre-installed browsers on their devices.
“Hey Comet, join my team meetings for me, turn off the camera and keep me muted, unmute and say “nothing from my end, thanks” when it’s my turn to speak, mute again, end meeting when it’s done”. How many want this ? https://x.com/AravSrinivas/status/1947501358007128149
Comet can make an entire Spotify playlist and start playing it for you! https://x.com/AravSrinivas/status/1948489790036365796
Comet can use LinkedIn for you and do all your work there https://x.com/AravSrinivas/status/1948835728798220539
Comet lets you search over everything like an agent would. Even stuff that’s not easy to index. https://x.com/AravSrinivas/status/1948056269958648309
How to watch YouTube on Comet https://x.com/AravSrinivas/status/1946240617031606672
Interesting Comet use case that a user pointed out just now to me: Use Comet to order food directly from the restaurant (eg: Chipotle) instead of an aggregator delivery app. Cheaper. Friction of having to deal with random websites gone. And you still get the same meal delivered. https://x.com/AravSrinivas/status/1948818172985196862
Just so that it’s clear to a bunch of confused folks. You lose nothing you already have in ad-blocking browsers, when you come to Comet. All ad-blockers work natively. No extensions needed. Even incognito. We have all the resources needed to keep working on this. https://x.com/AravSrinivas/status/1948102473597829200
perplexity comet browser ranks above the wikipedia page of comet on google serp, ~10 days since release https://x.com/AravSrinivas/status/1947173109083332988
Perplexity in talks with phone makers to pre-install Comet AI mobile browser on devices | Reuters https://www.reuters.com/business/perplexity-talks-with-phone-makers-pre-install-comet-ai-mobile-browser-devices-2025-07-18/
RT @JoannaStern: OK, Perplexity’s Assistant in the new Comet browser is good. Really good. https://x.com/AravSrinivas/status/1948215175976497394
the % of users who switch to comet as default browser has been steadily increasing since the launch day. and there’s still so much more to do to keep increasing this number. really promising future for comet. https://x.com/AravSrinivas/status/1948794199069110519
The TAM for Comet is bigger than Perplexity because it appeals to people who don’t even want AI. Just the best core browser in the market at the end of the day. https://x.com/AravSrinivas/status/1946035102150238475
The waitlist for Comet has doubled since launching. We will begin ramping up invites to waitlisted users starting today. https://x.com/AravSrinivas/status/1947407684996894969
This is an incredible end to end deep research workflow on Comet. Makes me realize how powerful and fast deep research can be with a hybrid client-sever compute architecture https://x.com/AravSrinivas/status/1946398572955766979
Underrated aspect of Comet: better memory management than Chrome https://x.com/AravSrinivas/status/1947817943934587362
we’re going to be shipping so many awesome new things on comet https://x.com/AravSrinivas/status/1948415154330415350
With the release of comet, perplexity has turned from a “ask anything” company to a “do anything” company https://x.com/AravSrinivas/status/1947175881203683577
Wave 11 is here 🌊 https://x.com/cognition_labs/status/1945919925165637847
Windsurf on X: “Wave 11 is live! Seven big upgrades to Windsurf 🧵 https://t.co/ncYQ9fPL5e” / X https://x.com/windsurf/status/1945918283313725794
White House unveils comprehensive AI strategy to secure American dominance
The White House has released America’s AI Action Plan, which frames artificial intelligence as a critical national security priority and likens the current AI race to the Cold War era. The plan rests on three pillars: accelerating innovation, building AI infrastructure, and leading in international diplomacy and security. Key initiatives include streamlining permits for semiconductor manufacturing and energy infrastructure, developing high-security data centers for military use, and establishing federal standards for AI testing and evaluation. The plan notably endorses open-source and open-weight AI models and calls for frontier AI to protect free speech and American values. It also anticipates a financialized compute market with spot and forward contracts, and calls for giving the Department of Defense priority access to computing resources during national emergencies. The document emphasizes removing regulatory barriers, supporting American workers through the AI transition, and preventing adversaries from benefiting from U.S. innovations through stronger export controls. Observers note that while the plan addresses crucial areas like workforce changes and AI safety evaluations, questions remain about funding levels and how it will coordinate with existing government policies on education and science.
AI Action Plan https://www.ai.gov/action-plan
America’s AI Action Plan https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf
buried in @sriramk’s America’s AI Action Plan is endorsement that the US compute market will financialize with spot and forward contracts. this podcast explains why this is so necessary, not just for speculation one of the most consistent themes with @latentspacepod’s GPU https://x.com/swyx/status/1948191143185076235
For better or worse, depending on your view of the future of AI, the only time the letters “AGI” appear in the new White House AI Action Plan is in the word “leveraging.” https://x.com/emollick/status/1948053856010596384
For what it is worth, few industry leaders, less than a half-dozen companies & no policy-making bodies are taking actions that suggest that they expect AGI is really a few years away. This may be because they don’t believe it or they think it won’t matter much in the medium term https://x.com/emollick/status/1947673003505971615
It’s time for the American AI community to wake up, drop the “open is not safe” bullshit, and return to its roots: open science and open-source AI, powered by an unmatched community of frontier labs, big tech, startups, universities, and non‑profits. If we don’t, we’ll be forced https://x.com/ClementDelangue/status/1948037061304356901
RT @typewriters: Happy to see that the @WhiteHouse AI Action Plan includes many of our @arcprize recommendations and prioritizes the values… https://x.com/jeremyphoward/status/1948281165292671372
The first step towards nationalizing AI developments just happened. “Priority access (for the Department of Defense) to computing resources in the event of a national emergency, so that DOD is prepared to fully leverage these technologies during a significant conflict” https://x.com/scaling01/status/1948038740405879206
The White House just made AI evals a national priority: 🎯 Federal standards for testing AI reliability 🧪 Real-world testbeds for critical sectors 🤝 Open consortium for sharing best practices This could be the regulatory unlock for AI adoption in key industries 🚀 https://x.com/dariusemrani/status/1948244456010064175
The White House just released America’s AI Action Plan. I’ve read the whole thing. This document makes it very clear, that this is about “winning the AI race” and even compare it to the cold war era. It’s a paper about national-security! Here are the most important quotes: – https://x.com/scaling01/status/1948037110662848925
There is a lot in the AI policy document, including needed attention to changing work & science plus AI evaluations & control Less clear is if there will be investment in its goals (like open weights) or how it interacts with other government policies on education, science, etc. https://x.com/emollick/status/1948047738345582713
This section of the plan will also encourage the development of so-called “open-weights” AI models https://x.com/Teknium1/status/1947820839178817741
Meta plans massive AI data centers in Louisiana and Ohio
Meta announced it will build two enormous data centers dedicated to developing advanced AI systems, which the company frames as a push toward superintelligence. The first, a 1-gigawatt facility in Ohio, will come online in 2026, while the second, called Hyperion, in Louisiana, will scale from 2 to 5 gigawatts of power capacity. The company plans to invest hundreds of billions of dollars in these facilities and has adopted a novel approach of housing GPU clusters in weather-proof tents, which lets it bring new data centers online in months rather than on the typical years-long construction timeline. These facilities will provide the massive computing power needed to train increasingly sophisticated AI models as Meta competes with other tech giants to develop more capable AI systems.
Meta announced plans to build superclusters in Louisiana and Ohio to develop AI superintelligence The first 1GW facility will come online in 2026, while the second, Hyperion, will scale from 2 to 5GW The co is aiming to invest hundreds of billions of dollars https://x.com/adcock_brett/status/1946964248220856425
We’re rapidly expanding our AI infrastructure and have adopted a novel approach of building weather-proof tents to house GPU clusters. This enables us to get new data centers online in months instead of years. 🚀 Read more in this @FastCompany article: https://x.com/AIatMeta/status/1948392518652997916
Brookfield and Google sign massive clean energy deal for data centers
Brookfield Asset Management has agreed to provide Google with up to 3,000 megawatts of clean electricity from its hydropower facilities across the United States. The deal will help power Google’s growing network of data centers, which require enormous amounts of electricity to run artificial intelligence systems and cloud services. Hydropower generates electricity from flowing water without producing carbon emissions, making it an attractive option for tech companies trying to reduce their environmental impact. The agreement represents one of the largest corporate renewable energy deals to date and could power the equivalent of roughly 2.25 million homes, demonstrating how major technology companies are securing long-term clean energy supplies to meet their sustainability goals while supporting their expanding AI operations.
Brookfield signs 3,000 MW agreement with Google for hydropower in the United States – energynews https://energynews.pro/en/brookfield-signs-3000-mw-agreement-with-google-for-hydropower-in-the-united-states/
OpenAI and Oracle expand Stargate AI data center capacity to 5 gigawatts
OpenAI has signed a deal with Oracle to develop 4.5 gigawatts of additional Stargate data center capacity in the United States, bringing total planned Stargate capacity to more than 5 gigawatts. The first site, Stargate I in Abilene, Texas, is starting to come online to power OpenAI’s next-generation AI research. The scale of the buildout, enough power capacity to serve millions of homes, highlights the enormous energy requirements of modern AI systems and signals growing demand for specialized computing facilities as companies race to develop more powerful AI models and applications.
It’s official: we’re developing 4.5 gigawatts of additional Stargate data center capacity with Oracle in the U.S (for a total of 5+ GWs!). And our Stargate I site in Abilene, TX is starting to come online to power our next-generation AI research. https://x.com/OpenAI/status/1947628731142648113
Stargate advances with 4.5 GW partnership with Oracle | OpenAI https://openai.com/index/stargate-advances-with-partnership-with-oracle/
we’re building over 5 gigawatts of Stargate compute with Oracle: https://x.com/gdb/status/1947666114772656482
we have signed a deal for an additional 4.5 gigawatts of capacity with oracle as part of stargate. easy to throw around numbers, but this is a _gigantic_ infrastructure project. some progress photos from abilene: https://x.com/sama/status/1947640330318156074
OpenAI plans to exceed 1 million GPUs by year-end
OpenAI CEO Sam Altman announced the company expects to bring over 1 million GPUs online before the end of 2025. The milestone represents a massive expansion of the computing power needed to train and run advanced AI models like GPT-4 and future systems. Altman noted the team now faces the challenge of scaling up 100 times beyond this level, highlighting the enormous computational demands required for next-generation AI development.
we will cross well over 1 million GPUs brought online by the end of this year! very proud of the team but now they better get to work figuring out how to 100x that lol https://x.com/sama/status/1947057625780396512
UK activates supercomputer capable of 21 quintillion calculations per second
The University of Bristol has powered on Isambard-AI, the UK’s most powerful artificial intelligence supercomputer, which can perform 21 million trillion operations per second. The system is already being used for practical applications including predicting disease in cattle, detecting bias in skin cancer diagnosis algorithms, and analyzing crowd movement patterns. This massive computational power represents a significant advancement in the UK’s AI research capabilities, enabling scientists to tackle complex problems that require processing enormous amounts of data at unprecedented speeds.
UK powers on supercomputer that runs 21 quintillion operations/sec https://interestingengineering.com/innovation/uks-most-powerful-supercomputer-goes-live
Anthropic CEO says company will accept investments from Gulf states despite concerns
Anthropic CEO Dario Amodei told staff in a leaked memo that the company plans to seek investments from the United Arab Emirates and Qatar, acknowledging this would likely enrich “dictators” but arguing it’s necessary to compete. The AI company previously rejected Saudi funding over national security concerns, but Amodei said Anthropic needs access to the “truly giant amount of capital” – potentially over $100 billion – available in the Middle East to stay competitive as rivals like OpenAI secure similar deals. While maintaining the company won’t build data centers in authoritarian countries or provide them with advanced chips, Amodei admitted the decision contradicts his previous writings about democracies needing to control AI development and would create “comms headaches” from accusations of hypocrisy.
In my opinion, it’s ok for Anthropic to be a business and act like one. But this reinforces the need for open science and open-source AI to avoid concentration of power and control in the hands of a few of these businesses, otherwise we’ll be in big trouble! https://x.com/ClementDelangue/status/1947689375565013046
Leaked Memo: Anthropic CEO Says the Company Will Pursue Gulf State Investments After All | WIRED https://www.wired.com/story/anthropic-dario-amodei-gulf-state-leaked-memo/
Anthropic calls for US infrastructure investment to maintain AI leadership
Anthropic has published a report arguing that the United States needs significant investments in energy and infrastructure to stay ahead in artificial intelligence development. The report outlines specific requirements for maintaining America’s competitive position in AI technology, focusing on the physical resources and systems needed to support advanced AI research and deployment. The company emphasizes that without proper infrastructure planning and energy capacity, the US risks falling behind other nations in AI capabilities.
New Anthropic report: Build AI in America. We outline what it will take to ensure America has the energy and infrastructure it needs to maintain its leadership in AI. https://x.com/AnthropicAI/status/1947652490104639926
Meta opts out of EU’s voluntary AI safety agreement. Reportedly, every major frontier model already exceeds, or will soon exceed, the EU’s systemic-risk compute threshold.
Meta has decided not to sign the European Union’s voluntary AI Code of Practice, according to the company’s chief global affairs officer. The code is an early EU attempt to establish safety guidelines for general-purpose AI models before formal regulations take effect. Meanwhile, the computing-power threshold the EU uses to identify models posing systemic risk may already be outdated: current and upcoming models from major tech companies are approaching or surpassing the compute limit that would trigger additional oversight when the rules take effect next year.
Meta Won’t Sign EU’s AI Code of Practice, Chief Global Affairs Officer Says – WSJ https://www.wsj.com/tech/ai/meta-wont-sign-eus-ai-code-of-practice-chief-global-affairs-officer-says-b5ac4653
So every major model is already exceeding or will soon exceed the EU’s systemic risk FLOP limit when it comes into effect next year. https://x.com/emollick/status/1946208333393736026
UK government partners with OpenAI to transform public services
The UK government has signed an agreement with OpenAI to use artificial intelligence across public services, including education, defense, security, and justice. The partnership aims to boost productivity and economic growth; it could give OpenAI access to government data, with safeguards to be developed to protect the public. OpenAI will expand its London office, which currently employs over 100 people, and explore investing in AI infrastructure such as data centers. Supporters say this could free up skilled public servants to focus on complex cases, while critics raise privacy concerns and object to giving a US tech company access to valuable public data. The non-binding agreement follows similar deals with Google and Anthropic as the UK seeks to boost its stagnant economy through AI adoption.
OpenAI and UK sign deal to use AI in public services https://www.bbc.com/news/articles/czdv68gejm7o
Chinese companies dominate rankings of top open-source AI models
Four Chinese companies now hold the top positions in rankings of open-source artificial intelligence models, highlighting an interesting reversal where China leads in freely available AI technology while the United States focuses on proprietary systems. This shift challenges common assumptions about each country’s approach to technology development, as China – typically associated with closed systems and centralization – has become the primary source of AI models that anyone can access, modify, and build upon, while the US – traditionally championing open markets – increasingly develops AI that remains locked behind corporate walls.
top4 open models are from China, great job brothers! https://x.com/bigeagle_xd/status/1946426600838586476
US: champion of open markets, ships only closed-source AI. China: master of centralization, ships only open-source AI. Make it make sense. https://x.com/Yuchenj_UW/status/1947866064500756579
Google processes nearly one quadrillion tokens in a single month
Google processed almost one quadrillion tokens last month, more than double the amount from May, according to Google DeepMind CEO Demis Hassabis. A token is a piece of text that AI models process, typically a word or part of a word. This jump in volume reflects growing adoption of Google’s AI services by businesses and developers, and mirrors a broader industry trend of companies integrating AI tools into their operations at unprecedented scale.
You know what’s cool… a quadrillion tokens. We processed almost 1,000,000,000,000,000 tokens last month, more than double the amount from May. 📈 https://x.com/demishassabis/status/1948579654790774931
ChatGPT processes 2.5 billion daily requests from users worldwide
OpenAI’s ChatGPT receives over 2.5 billion prompts each day, with 330 million coming from users in the United States, according to data confirmed by the company. This translates to approximately 912.5 billion requests annually, demonstrating the AI chatbot’s massive adoption since its launch. While still far behind Google’s 5 trillion yearly searches, ChatGPT’s user base has grown dramatically, jumping from 300 million weekly users in December to over 500 million by March. The rapid expansion highlights ChatGPT’s emerging role as a potential competitor to traditional search engines, with reports suggesting OpenAI plans to launch an AI-powered web browser to challenge Google Chrome directly.
OpenAI says ChatGPT users send over 2.5 billion prompts every day | The Verge https://www.theverge.com/news/710867/openai-chatgpt-daily-prompts-2-billion
Google’s Gemini achieves gold medal performance at International Mathematical Olympiad
Google DeepMind’s advanced version of Gemini with Deep Think mode has officially achieved gold-medal standard at the International Mathematical Olympiad (IMO), solving five out of six problems and scoring 35 points. The IMO is the world’s most prestigious math competition for pre-university students, featuring exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Unlike last year’s silver-medal performance, which required specialized tools and days of computation, this year’s model worked entirely in natural language, producing rigorous mathematical proofs directly from the problem descriptions within the 4.5-hour competition time limit. The model used parallel thinking techniques, which let it explore multiple solution paths simultaneously, combined with reinforcement learning on mathematical problem-solving data. IMO coordinators who graded the solutions found them clear, precise, and easy to follow. Google plans to make a version of this Deep Think model available to trusted testers, including mathematicians, before releasing it to Google AI Ultra subscribers.
🤖 From this week’s issue: Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad (IMO) by solving five out of the six IMO problems. https://x.com/dl_weekly/status/1948105084480397503
5. In my experience using LLMs for math research, Gemini outperforms ChatGPT. We will see if the next-gen models (which seem to be what OpenAI and GDM are using for IMO) perform at research-level math. (5/10) https://x.com/ErnestRyu/status/1946699302308635130
Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! https://x.com/koraykv/status/1947335096740049112
Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad – Google DeepMind https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵 https://x.com/GoogleDeepMind/status/1947333836594946337
DeepMind has the best research on using AI to solve hard Math: AlphaEvolve AlphaProof AlphaGeometry FunSearch AlphaDev AlphaTensor AlphaCode Despite making IMO Silver 28/42 in ’24, OpenAI announced Gold in ’25 35/42 before them Here’s DeepMind’s 10 best research papers on https://x.com/deedydas/status/1946987560875766212
Drastic progress on maths with Gemini 2.5! As a math undergrad, I am impressed 🤯 🥈 -> 🥇 ✅ Formal -> Informal ✅ Specialized model -> General model ✅ Available soon ✅ Huge thanks to IMO and congrats to all participants! Blog: https://x.com/OriolVinyalsML/status/1947341047547199802
Gemini solved the math problems end-to-end in natural language (English). https://x.com/denny_zhou/status/1947360696590839976
Had a super fun time training this model. A big yolo run that resulted in a super strong model. Most important thing is to trust your model and give it morale support. 🦾 Was also a big eye opener to see how prep for IMO is done. Before this I knew absolutely zero about this https://x.com/YiTayML/status/1948464752545726886
hippo at IMO: 0/42 model trained by hippo: 35/42 🥇 😂😂😂 https://x.com/agihippo/status/1947348097144611123
IMO 2025 Solutions https://storage.googleapis.com/deepmind-media/gemini/IMO_2025.pdf
It wasn’t just OpenAI. Google also used a general purpose model to solve the very hard math problems of the International Math Olympiad in plain language. Last year they used specialized tool use Increasing evidence of the ability of LLMs to generalize to novel problem solving https://x.com/emollick/status/1947356382581137867
Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well https://x.com/fchollet/status/1947337944215523567
Our IMO gold model is not just an “experimental reasoning” model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥 https://x.com/YiTayML/status/1947350087941951596
Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened! We put all individual recipes (that we figured out https://x.com/lmthang/status/1948458590492393834
RT @demishassabis: Official results are in – Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced ver… https://x.com/AndrewLampinen/status/1947370582393425931
RT @ns123abc: Bruh… people already reproduced Google’s IMO results without RL with just prompting openai researchoors think they have the… https://x.com/_philschmid/status/1948304855837085717
The hardest high school math exam in the world, the 6 problem 9 hour IMO 2025, was this week. AI models performed poorly. Gemini 2.5 Pro scored the highest, just 13/42, costing $431.97, in a best of 32 eval. Bronze cutoff was 19. Long way to go for AI to solve hard Math. https://x.com/deedydas/status/1946244012278722616
Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10) https://x.com/ErnestRyu/status/1946698766305968446
OpenAI’s AI achieves gold medal performance at International Math Olympiad
OpenAI announced that its experimental reasoning language model achieved gold medal-level performance on the 2025 International Math Olympiad (IMO), one of the world’s most prestigious math competitions for pre-college students. The system wrote its proofs in natural language and operated under the same constraints as human competitors, including 4.5-hour time limits per session and no access to calculators, the internet, or specialized math software. The model solved problems 1 through 5, which one mathematician noted are within reach of standard IMO problem-solving techniques, but did not solve problem 6, which required more creativity. OpenAI researchers describe the result as a milestone many considered years away, achieved by a general-purpose LLM with very little IMO-specific work and no formal mathematical tools, and some researchers called its proofs genuinely creative.
@pli_cachete For OpenAI at least for this IMO competition: – No tool use, no calculators, internet, formal proof software, algebra packages – same time limits – the same input to the question as for students; no rewriting it to another more suitable format – only one submission https://x.com/BorisMPower/status/1946859525270859955
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). https://x.com/alexwei_/status/1946477742855532918
4. OpenAI surely knew GDM was working on the IMO, so they beat GDM to the punch with their Saturday morning announcement, generating hype. GDM’s slow-science scholarship cost them the PR battle. (4/10) https://x.com/ErnestRyu/status/1946699212307259659
Gold medal-level performance on the 2025 International Math Olympiad from our latest experimental reasoning LLM. Model operated in natural language (i.e. outputs natural language proofs) under the same rules as humans (e.g. 4.5 hours per session, no tools). Amazing milestone! https://x.com/gdb/status/1946479692485431465
It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI. Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies. https://x.com/SebastienBubeck/status/1946577650405056722
RT @polynoamial: Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO wi… https://x.com/kchonyc/status/1946526143433015349
The two cents: 1. The OpenAI IMO solutions to P1-P5 seem to be correct. 2. P6 is a significantly novel and more difficult problem. P1-P5 are arguably within reach of “standard” IMO problem-solving techniques, but P6 requires creativity. (2/10) https://x.com/ErnestRyu/status/1946698896375492746
we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence. when we first started openai, https://x.com/sama/status/1946569252296929727
Why am I excited about IMO results we just published: – we did very little IMO-specific work, we just keep training general models – all natural language proofs – no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered https://x.com/millionint/status/1946551400365994077
Leading AI models fail to earn medals at 2025 Math Olympiad
Researchers at MathArena tested top AI language models on the 2025 International Mathematical Olympiad problems and found that even the best-performing model, Gemini 2.5 Pro, scored only 31% (13 of 42 points), well below the roughly 45% (19 points) needed for a bronze medal. The evaluation used a best-of-32 approach, generating 32 attempts per problem and selecting the best answer, at a cost of up to $20 per response. While OpenAI and Google DeepMind later announced that unreleased experimental systems achieved gold-medal scores, the publicly available models struggled with logical errors and incomplete proofs. The results highlight that publicly available AI still falls short of top human mathematical reasoning, despite recent progress in the field.
MathArena – IMO Blogpost https://matharena.ai/imo/
@OriolVinyalsML Impressive result, but let’s be clear, the Gemini model got heavy IMO-specific prep, curated solutions, hints, and strategy guides. That’s not general reasoning. OpenAI’s model hit IMO gold with zero task-specific tuning. One is coached, the other is capable. https://x.com/VraserX/status/1947368827253076001
Gary Marcus strikes again: “No pure LLM is anywhere near getting a silver medal in a math olympiad” “Pure deep learning had a good run, but it’s time to move on” 😂😂😂 https://x.com/scaling01/status/1946530148813025544
Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 https://x.com/hardmaru/status/1946942279807308210
maybe a better headline would be that oai and gdm ranked 27 at the IMO. some talented kids here! https://x.com/damekdavis/status/1947357679040569520
xAI’s Grok 4 benchmark claims questioned over testing practices
Some in the AI community are questioning xAI’s headline benchmark results for Grok 4 in light of the new International Mathematical Olympiad (IMO) rankings. One researcher quipped that the model’s eye-popping scores were driven by “train on test,” that is, training on benchmark test data. That practice artificially inflates performance metrics and undermines benchmark comparisons, because a model can memorize answers rather than demonstrate genuine problem-solving ability. The episode highlights ongoing challenges in AI evaluation standards and the need for more rigorous testing protocols to ensure fair comparisons between language models.
As confirmed by the new IMO rankings, Grok 4’s eye-popping benchmarks were driving by the following innovations: – train on test – train on test – train on test https://x.com/nsaphra/status/1946804513114882227
OpenAI’s experimental model achieves breakthrough on challenging AI reasoning test
OpenAI’s latest experimental reasoning model has performed exceptionally well on a notoriously difficult test that experts considered unlikely to be solved this year. The model took the test without any external tools or assistance. Prediction markets had given only a 20% chance of this happening this year, which makes the result particularly noteworthy. It suggests AI systems are advancing faster than many experts anticipated in their ability to handle sophisticated reasoning tasks.
There are always a flood of posts about what AI can or cannot do, so it is worth pausing and paying attention to this one. It is a very hard test, done without tools. It was also viewed as an unlikely goal. Prediction markets had the chance of this happening this year as 20% https://x.com/emollick/status/1946563737604743386
AI models achieve gold medal performance at International Math Olympiad
Multiple AI systems have reached gold medal level at the International Math Olympiad (IMO), marking a significant milestone in mathematical reasoning capabilities. While these models successfully solved five of the six competition problems, they all failed on Problem 6, the most challenging question. Notably, the AI systems demonstrated self-awareness about their limitations – they recognized when they couldn’t solve Problem 6 rather than submitting incorrect answers. This ability to “know what they don’t know” represents an important advancement in AI reliability. Several organizations appear to have achieved similar breakthroughs simultaneously, though not all have made formal announcements yet. The achievement suggests AI is approaching human-level performance in complex mathematical problem-solving, though the universal failure on the hardest problem indicates there are still gaps to overcome.
On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction! https://x.com/alexwei_/status/1947461238512095718
One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? https://x.com/littmath/status/1947398065209462981
Other AI models seem to have made big leaps in the International Math Olympiad, not just OpenAI. Not all announcements seem to be out yet. https://x.com/emollick/status/1947053944192082170
P6 was definitely the hardest and most interesting problem. Most people can understand it, but very few can solve it. All models scored 0/7. https://x.com/deedydas/status/1946250774960537927
AI labs clash over Math Olympiad announcement timing protocols
The International Mathematical Olympiad (IMO) committee asked AI companies to wait a week after the competition’s closing ceremony before announcing their AI systems’ performance, wanting to keep the spotlight on student competitors. While Google DeepMind respected this request and delayed their announcement, OpenAI published their results early, drawing criticism from the mathematics community. The incident highlights tensions between AI labs racing to showcase mathematical capabilities and traditional academic institutions protecting the integrity of their competitions. A mathematician noted that while AI currently helps accelerate mathematical work, the rapid progress raises questions about whether mathematics will remain a viable career path for future generations.
10. My career as a mathematician certainly isn’t threatened by AI; in fact, I hope to leverage AI to accelerate my work. However, I’m unsure whether “mathematician” will remain a career path for my son’s generation. (10/10) https://x.com/ErnestRyu/status/1946700798001574202
RT @Mihonarium: 🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closi… https://x.com/AndrewLampinen/status/1947072974621982839
This wins my respect. https://x.com/Yuchenj_UW/status/1947339774257402217
Tough look for OpenAI They’ve pissed off the international math community by jumping the gun, meanwhile @GoogleDeepMind has an officially-confirmed result that will be available commercially months earlier https://x.com/mathemagic1an/status/1947352370037305643
RT @demishassabis: Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs sha… https://x.com/TheZachMueller/status/1947419062423982583
We might be heading into a plot twist in the OpenAI vs. DeepMind IMO saga. Just saw a post from Joseph Myers (involved in the Math Olympiad since 1992): the IMO committee reportedly asked AI labs not to publish results until 7 days after the closing ceremony — out of respect for https://x.com/zjasper666/status/1947013036382068971
Eric Schmidt predicts robots will reshape work within decades
Former Google CEO Eric Schmidt believes robotics will fundamentally change how people work over the coming decades, though he expects the technology to create more jobs than it eliminates in the next five to ten years. His outlook aligns with research firm Gartner’s prediction that by 2030, five percent of supply chain managers will oversee robot teams instead of human workers. These forecasts suggest a gradual shift toward automation in the workplace, where robots initially supplement human workers before potentially replacing them in certain roles. The timeline indicates businesses and workers have several years to adapt to these changes, with the most significant disruptions expected beyond the current decade.
Eric Schmidt says robotics will completely transform the nature of work over the next few decades – but in the next 5–10 years, the impact will likely be positive for the job market. https://x.com/TheHumanoidHub/status/1946423187081994470
Gartner Predicts One in 20 Supply Chain Managers Will Manage Robots, Rather Than Humans, by 2030 https://www.gartner.com/en/newsroom/press-releases/2025-07-16-gartner-predicts-one-in-20-supply-chain-managers-will-manage-robots-rather-than-humans-by-2030
Phosphobot launches cloud platform for one-click robot training
Phosphobot has made Gr00t-n1.5, NVIDIA’s robot foundation model, available on its cloud platform, so users can train robot skills and run inference with one click instead of setting up local hardware. After recording demonstrations, users direct the trained model with plain-text prompts such as “grab food and place into bowl.” Phosphobot recommends recording longer demonstration episodes, around 30 to 40 seconds each, for better results. The service reduces the need for specialized coding knowledge or local computing power when developing robot behaviors.
Train and deploy robot skills in the cloud… with just one click. Gr00t‑n1.5 is now live on Phosphobot, making training and inference simpler than ever. Example prompt: “Grab food and place into bowl.” Tips for better results ✅ Record longer episodes (~30–40s) ✅ Target https://x.com/IlirAliu_/status/1947721603082817884
Chinese robotics company Unitree begins process for stock market listing
Unitree Robotics, a Chinese company that makes four-legged and two-legged robots, has started the official process to sell shares on China’s stock market. The company has raised money from investors ten times and is now valued at about $1.4 billion. This move toward going public comes as Unitree competes in the growing robotics industry, where rival Figure is expected to reach a much higher valuation of $39.5 billion. The company has filed paperwork with Chinese regulators to begin the review process needed before it can offer stock to public investors.
> Unitree has completed 10 funding rounds to date, with its latest round pushing its valuation to 10 billion RMB (~$1.4B) The company dominating and leading quadrupedal and bipedal robotics is worth $1.4B. Figure is expected to be worth $39.5B total. Something something GDP https://x.com/teortaxesTex/status/1946339066573648053
Unitree is preparing for an IPO in China ⦿ Unitree Robotics has officially begun its IPO counseling process with the Chinese Securities Regulatory Bureau. ⦿ Unitree has completed 10 funding rounds to date, with its latest round pushing its valuation to 10 billion RMB (~$1.4B) https://x.com/TheHumanoidHub/status/1946295596001947963
Google DeepMind develops AI model to help historians analyze ancient Latin inscriptions
Google DeepMind has released Aeneas, an AI system that helps historians interpret fragmentary Latin inscriptions by finding similar texts and filling in missing portions. The model, trained on 176,000 Latin inscriptions from across the Roman world, can restore damaged text with 73% accuracy for gaps up to ten characters and date inscriptions within 13 years of expert estimates. When tested with 23 historians, those using Aeneas improved their accuracy by 44% in restoration, dating, and geographical attribution tasks. The system processes both text and images to identify patterns and connections between inscriptions, turning each text into a “historical fingerprint” that reveals relationships across thousands of ancient writings. Available free at predictingthepast.com, Aeneas represents a significant advance over DeepMind’s earlier Ithaca model for Greek inscriptions, offering historians a powerful tool to piece together fragments of ancient Roman life from political graffiti to business transactions.
In 2022, @GoogleDeepMind launched Ithaca to help restore, place and date ancient texts. Now, they’re working with collaborators to introduce Aeneas, a new AI model that contextualizes ancient Latin inscriptions. 📜 Learn more ⬇️ https://x.com/Google/status/1948039522194718799
Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts. – Google DeepMind https://deepmind.google/discover/blog/aeneas-transforms-how-historians-connect-the-past/
Neat example of AI in the humanities. A Google model trained on Latin text fills in lost parts of Latin inscriptions & identifies related texts Historians increased their accuracy by 44% when working with the AI (Though AI alone beats historians, historian + AI was usually best) https://x.com/emollick/status/1948063719042498587
Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it’s like solving a puzzle with 90% of the pieces lost to time. It helps researchers interpret and situate inscriptions in their past context. 🧵 https://x.com/GoogleDeepMind/status/1948037924882133390
DeepMind CEO discusses AI’s potential to decode nature’s patterns
In a recent conversation with Lex Fridman, DeepMind CEO Demis Hassabis explored how artificial intelligence could learn fundamental patterns found throughout nature, from protein structures to cosmic phenomena. Hassabis suggested that if AI systems can successfully understand these natural patterns, it could accelerate scientific discovery across multiple fields. The wide-ranging discussion also covered topics including the future of video games, the nature of reality, and the path toward artificial general intelligence (AGI), highlighting the expanding role of AI in both understanding and shaping our world.
Imagine if every pattern shaped by nature – like a protein’s fold or cosmic phenomena – is inherently learnable by AI. @DemisHassabis shares with @lexfridman that if AI can learn these natural patterns, we could open doors to new eras of scientific discovery. Listen now. ↓ https://x.com/GoogleDeepMind/status/1948098855053979930
Thanks @lexfridman for another super fun & wide-ranging conversation. We talked about the future of video games, the nature of reality, advancing science with AI, the path to AGI… and quite a bit more as usual! Always a blast, already looking forward to next time! 😀 https://x.com/demishassabis/status/1948234351205855458
Google’s Veo 3 demonstrates advanced understanding of 3D spatial concepts
Google’s latest video generation model, Veo 3, shows sophisticated capabilities in understanding three-dimensional space and mapping. The system can process and work with various 3D concepts including different types of geometry, terrain mapping, camera positioning and movement, object detection, and motion trajectories. This level of spatial understanding represents a significant advancement in AI’s ability to interpret and recreate realistic environments, moving beyond simple image generation to comprehending the underlying structure and relationships within three-dimensional scenes.
Capturing reality is a damn near superpower. Pretty cool to see how much Veo 3 understands 3d mapping concepts — including geometry types, terrain maps, camera poses, detections, trajectories etc. https://x.com/bilawalsidhu/status/1947002004275904537
Runway launches Aleph video editing model with instant object removal
Runway has released Aleph, a video editing AI model that can instantly remove unwanted objects or make other changes to videos using simple text commands. Users can type instructions like “remove the reflection of the cameraman” and the model automatically edits the video without requiring technical expertise. The system handles multiple video editing tasks through a single interface, representing a shift toward more versatile AI tools that understand context and user intent. Early demonstrations show the model successfully removing objects, changing elements, and transforming videos based on natural language requests, with the company gradually rolling out access to users over the coming days.
Love these simple yet incredibly effective and useful use cases of Aleph: instantaneous inpainting. The model has plenty of practical features that just work out of the box. Just tell the model to “remove the reflection of the cameraman” and that’s it. https://x.com/c_valenzuelab/status/1948878604928254257
RT @runwayml: Introducing Runway Aleph, a new way to edit, transform and generate video. Aleph is a state-of-the-art in-context video mode… https://x.com/c_valenzuelab/status/1948789396443914353
Very excited to announce Runway Aleph. It is not only a big step forward in control and quality, but also creates a new paradigm for models that can solve many video tasks at once. The future is generalizable. Rolling out gradually over the next few days. https://x.com/c_valenzuelab/status/1948817274468802907
Amazon acquires AI wearable startup Bee to expand personal assistant technology
Amazon has acquired Bee, a startup developing AI-powered wearable devices designed to learn from and enhance users’ daily lives. The acquisition brings Bee’s team and technology under Amazon’s umbrella, where they will work to expand personalized AI assistant capabilities to more customers. Bee’s founders expressed excitement about joining Amazon, praising executives Panos Panay and Nick Komorous for supporting their vision of creating AI that truly understands and adapts to individual users. The deal represents Amazon’s continued investment in wearable technology and AI assistants, though specific terms of the acquisition were not disclosed.
Bee (wearable company) is joining Amazon and we couldn’t be more excited! When we started Bee, we imagined a world where AI is truly personal, where your life is understood and enhanced by technology that learns with you. What began as a dream with an incredible team and community now finds a new home at Amazon. https://www.linkedin.com/feed/update/urn:li:activity:7353453923795378176/
New AI benchmark challenges agents with tasks humans find easy
Researchers have launched ARC-AGI-3, an interactive reasoning benchmark that tests AI agents’ ability to explore and solve problems in unfamiliar environments. The benchmark includes three game-like environments where AI agents must demonstrate skills like exploration, planning, memory, and goal-setting. While humans can complete these tasks with 100% success, current advanced AI systems score 0%, highlighting a significant gap in machine intelligence. The creators are offering a $10,000 contest for developers to build agents that can tackle these challenges, along with an API and tools to help researchers contribute to solving this fundamental problem in AI development.
ARC-AGI-3 scores 0% for AI, 100% for humans now live with API where you can test your agent: https://x.com/scaling01/status/1946261191782797717
Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI We’re releasing: * 3 games (environments) * $10K agent contest * AI agents API Starting scores – Frontier AI: 0%, Humans: 100% https://x.com/arcprize/status/1946260363256996244 https://docs.arcprize.org/
Comparing AI to humans without tools overstates AI’s edge at work
Princeton computer scientist Arvind Narayanan observes that if AI were compared against humans with no access to tools such as the internet, AI would probably outperform humans at many or most of the cognitive tasks people do at work. He argues this is not a helpful comparison, because in real jobs people rely heavily on external tools and resources. Meaningful comparisons between AI and human capabilities need to account for how people actually work, with the technologies and information sources they use every day.
If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much https://x.com/random_walker/status/1946180439045018046
AI systems reach 88% accuracy in judging mathematical proofs
Researchers have created the Open Proof Corpus, a collection of 5,062 human-verified mathematical proofs for 1,010 competition problems that serves as a benchmark for testing AI reasoning abilities. The corpus provides a way to evaluate whether AI systems can truly understand mathematical logic rather than just guessing correct answers. Google’s Gemini-2.5-Pro model has already achieved 88.1% accuracy in determining whether these proofs are correct or incorrect, demonstrating significant progress in AI’s ability to handle complex mathematical reasoning tasks that require step-by-step logical thinking.
The Open Proof Corpus (OPC) bundles 5,062 human‑checked proofs for 1,010 mathematical competition problems, giving researchers a big public yard‑stick for real reasoning rather than guess‑the‑answer tasks . GEMINI‑2.5‑PRO already judges proofs with 88.1% accuracy, and a simple https://x.com/rohanpaul_ai/status/1948012725122052335
AI struggles with tax calculations despite optimistic predictions
Researchers have released TaxCalcBench, a new benchmark that tests whether AI can accurately calculate US personal income taxes. The results reveal significant limitations: even the most advanced AI models successfully calculated less than one-third of federal income tax returns in a simplified test set. The models consistently made critical errors including misreading tax tables, making calculation mistakes, and incorrectly determining eligibility for various tax benefits. These findings challenge recent optimism about AI’s readiness for tax preparation and highlight that using current AI for taxes could lead to IRS rejections, audits, and penalties. The research suggests that substantial improvements in AI infrastructure are needed before these systems can reliably handle the complex task of tax preparation, which requires both understanding extensive tax code documentation and performing precise calculations based on that knowledge.
Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties (Thread with many posts): https://x.com/michaelrbock/status/1948039876043313509
Now that this exists AI will be able to do your taxes very well, very soon https://x.com/Teknium1/status/1948668301829439846
Today, we’re releasing TaxCalcBench: a first-ever benchmark dataset & eval framework for testing AI’s ability to calculate US personal income tax returns. Tax is a secretive industry, so we’re proud to release a research paper sharing our findings: https://arxiv.org/abs/2507.16126
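To make the “precise calculations” point concrete, here is a minimal Python sketch of a progressive bracket calculation, the kind of table lookup and arithmetic the summary says models got wrong. The brackets are hypothetical round numbers chosen for illustration, not actual IRS rates, and the function is not part of TaxCalcBench.

```python
from decimal import Decimal, ROUND_HALF_UP

# HYPOTHETICAL brackets for illustration only; these are NOT actual IRS rates.
# Each tuple: (upper bound of the bracket, or None for the top bracket, marginal rate)
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.12")),
    (None, Decimal("0.22")),
]

def progressive_tax(taxable_income: Decimal) -> Decimal:
    """Apply each marginal rate only to the slice of income inside its bracket."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if taxable_income <= lower:
            break  # no income left in this or any higher bracket
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    # Round to whole cents.
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(progressive_tax(Decimal("50000")))
```

With these made-up brackets, $50,000 of taxable income yields $6,800.00 ($1,000 + $3,600 + $2,200). Applying the top 22% rate to the whole amount instead would give $11,000, which shows how a single misapplied rate can throw off an entire return.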
Meta releases massive dataset of human social interactions captured on video
Meta’s AI research division has released a dataset containing over 4,000 hours of video footage showing face-to-face conversations between more than 4,000 diverse participants. The Seamless Interaction Dataset captures full-body recordings of 65,000 natural social interactions, with 5,000 samples annotated to identify specific behaviors and gestures. This collection represents the largest publicly available resource of its type for researchers studying human communication patterns, body language, and social dynamics. The dataset could help developers create AI systems that better understand and respond to human social cues, potentially improving virtual assistants, video conferencing tools, and social robots.
Meta FAIR recently released the Seamless Interaction Dataset, the largest known high-quality video dataset of its kind, with: 4,000+ diverse participants 4,000+ hours of footage 65k+ interactions 5,000+ annotated samples This dataset of full-body, in-person, face-to-face https://x.com/AIatMeta/status/1947692466205037006
Baidu and Uber partner to deploy thousands of driverless cars globally
Baidu has partnered with Uber to deploy its Apollo Go autonomous vehicles on Uber’s platform outside the U.S. and mainland China, with initial launches planned for Asia and the Middle East later this year. The multi-year agreement will bring thousands of Baidu’s self-driving cars to Uber’s global network of 15,000 cities, allowing riders to choose between traditional and autonomous vehicles when booking trips. Separately, Uber announced investments of hundreds of millions of dollars in Lucid and Nuro to deploy at least 20,000 robotaxis in the U.S. over six years starting in 2026, using Lucid’s Gravity SUVs equipped with Nuro’s self-driving technology. These partnerships represent Uber’s strategy to become a major platform for autonomous vehicles worldwide after selling its own self-driving unit in 2020, while providing international expansion opportunities for companies like Baidu that have proven their technology in home markets.
Baidu strikes deal to bring its driverless cars to Uber globally https://www.cnbc.com/2025/07/15/baidu-strikes-deal-to-bring-its-driverless-cars-to-uber-globally.html?__source=sharebar%7Ctwitter&par=sharebar
Uber to invest hundreds of millions of dollars in Lucid and Nuro in massive robotaxi deal | The Verge https://www.theverge.com/news/708479/uber-lucid-nuro-robotaxi-deal-investment
OpenAI plans to release GPT-5 with reasoning capabilities in August
OpenAI is preparing to launch GPT-5 in early August, according to sources familiar with the company’s plans. The new model will combine OpenAI’s language and reasoning technologies into a single system, eliminating the need for users to switch between different models. CEO Sam Altman described GPT-5 as “smarter than us in almost every way” and shared an example where the model instantly answered a complex question he couldn’t solve himself. The release will include three versions: the main GPT-5 available through ChatGPT and the API, a mini version also on both platforms, and a nano version exclusive to the API. Before GPT-5’s launch, OpenAI plans to release its first open-source model since 2019, which sources describe as having reasoning capabilities similar to their current models. The timing of both releases may shift due to development challenges or competitive pressures, as OpenAI has previously delayed launches for additional safety testing.
GPT-5 DELAYED UNTIL AUGUST OPENAI OPEN-SOURCE MODEL NEXT WEEK GPT-5, GPT-5 mini will be available in ChatGPT GPT-5 nano only in the API https://x.com/scaling01/status/1948421589675966673
Even if GPT-5 did nothing besides switching people between o3 and 4o automatically, it would really transform most people’s view of AI. Very few people, even paying users, know that they should often switch to a more capable model, and when you show them o3, they are impressed. https://x.com/emollick/status/1946958840697696581
.@sama : “GPT-5 is the smartest thing. GPT-5 is smarter than us in almost every way.” Not sure how @OpenAI researchers here like me should be proud or sad about this. I choose to be proud for the moment. 😆 https://x.com/xikun_zhang_/status/1948627882235838482
OpenAI prepares to launch GPT-5 in August | The Verge https://www.theverge.com/notepad-microsoft-newsletter/712950/openai-gpt-5-model-release-date-notepad
🚀 Introducing Qwen3-MT – our most powerful translation model yet! Trained on trillions of multilingual tokens, it supports 92+ languages—covering 95%+ of the world’s population. 🌍✨ 🔑 Why Qwen3-MT? ✅ Top-tier translation quality ✅ Customizable: terminology control, domain https://x.com/Alibaba_Qwen/status/1948406830688018471
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding https://x.com/Alibaba_Qwen/status/1948688466386280706
Less than two weeks Kimi K2’s release, @Alibaba_Qwen’s new Qwen3-Coder surpasses it with half the size and double the context window. Despite a significant initial lead, open source models are catching up to closed source and seem to be reaching escape velocity. https://x.com/cline/status/1948072664075223319
Qwen COOKED – beats Kimi K2 and competitive to Claude Opus 4 at 25% total parameters 🤯 https://x.com/reach_vb/status/1947357343101960424
Qwen3-235B-A22B scored 41% on ARC-AGI-1 without thinking! That’s the same level as Gemini 2.5 Pro, Sonnet 4 or o3-low with thinking. But it might be trained on it, if not, then it’s insane https://x.com/scaling01/status/1947351789222711455
RT @itsPaulAi: Wait so Alibaba Qwen has just released ANOTHER model?? Qwen3-Coder is simply one of the best coding model we’ve ever seen.… https://x.com/ClementDelangue/status/1947775783067603188
RT @lmstudio: Qwen/Qwen3-Coder with tool calling is supported in LM Studio 0.3.20, out now. 480B parameters, 35B active. Requires about 25… https://x.com/huybery/status/1948327670493970534
So to recap: – Yesterday, frontier closed model equivalent reasoning model from Qwen, – This morning, frontier closed model equivalent reasoning vision capabilities from stepfun – sometime today(?) a frontier video model from wan? All open source What is America doing? https://x.com/Teknium1/status/1948744914876920039
The new Qwen3 update takes back the benchmark crown from Kimi 2. Some highlights of how Qwen3 235B-A22B differs from Kimi 2: – 4.25x smaller overall but has more layers (transformer blocks); 235B vs 1 trillion – 1.5x fewer active parameters (22B vs. 32B) – much fewer experts in https://x.com/rasbt/status/1947393814496190712
The updated Qwen3-235B-A22B is now the best non-reasoning models period. It beats Kimi-K2, Claude-4 Opus and DeepSeek V3 on multiple benchmarks like GPQA, AIME, ARC-AGI, LiveCodeBench or BFCLv3, just to name a few. https://x.com/scaling01/status/1947350866840748521
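The size comparisons in these posts are simple ratios. A short Python sketch reproduces them from the figures quoted above: 235B total and 22B active parameters for Qwen3-235B-A22B, 480B and 35B for Qwen3-Coder (from the LM Studio post), and roughly 1 trillion total and 32B active for Kimi K2, which is treated here as exactly 1,000B as an assumption.

```python
# Parameter counts (in billions) as quoted in the posts above.
# Kimi K2's "1 trillion" total is assumed to be exactly 1000B.
MODELS = {
    "Qwen3-235B-A22B": {"total": 235, "active": 22},
    "Qwen3-Coder": {"total": 480, "active": 35},
    "Kimi K2": {"total": 1000, "active": 32},
}

def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)

kimi = MODELS["Kimi K2"]
qwen = MODELS["Qwen3-235B-A22B"]
coder = MODELS["Qwen3-Coder"]

# How many times smaller Qwen3-235B-A22B is than Kimi K2 overall.
print("total size ratio:", ratio(kimi["total"], qwen["total"]))
# How many times fewer parameters Qwen3-235B-A22B activates per token.
print("active param ratio:", ratio(kimi["active"], qwen["active"]))
# Qwen3-Coder's total size as a fraction of Kimi K2's (the "half the size" claim).
print("coder / kimi size:", ratio(coder["total"], kimi["total"]))
# Share of each mixture-of-experts model's weights that is active per token.
for name, p in MODELS.items():
    print(f"{name}: {p['active'] / p['total']:.1%} active")
```

The results (about 4.26x, 1.45x and 0.48) line up with the rounded “4.25x,” “1.5x” and “half the size” claims in the posts. The active-share loop shows why these mixture-of-experts models are attractive to run: each token is processed by only a small fraction of the weights, so per-token compute tracks the active count rather than the total.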
OpenAI researchers join Meta’s new superintelligence lab amid talent war
Meta has recruited several high-profile AI researchers from OpenAI, including Jason Wei and Hyung Won Chung, who both worked on OpenAI’s advanced o1 reasoning model and deep research projects. The social media giant is reportedly offering compensation packages up to $300 million over four years to attract top AI talent, with Apple scientist Ruoming Pang receiving a $200 million offer. Meta has also hired three Google AI researchers who worked on the model behind Google’s gold-medal IMO result, and it has appointed Shengjia Zhao as Chief Scientist of its new Superintelligence Labs. Both Wei and Chung specialize in reinforcement learning, a technique that trains AI models using feedback to improve their performance, and previously worked together at Google before joining OpenAI in 2023. The aggressive recruiting reflects intensifying competition among tech companies to secure leading AI researchers, with OpenAI responding by hiring engineers from Tesla, xAI, and Meta.
Another High-Profile OpenAI Researcher Departs for Meta | WIRED https://www.wired.com/story/jason-wei-open-ai-meta/
Heard Zuck poached 4 more OpenAI researchers, including some behind the open-source model. how deep are Zuck’s pockets? https://x.com/Yuchenj_UW/status/1946245685130793175
Meta Hires Three Google AI Researchers Who Worked on Gold Medal-Winning Model — The Information https://www.theinformation.com/articles/meta-hires-three-google-ai-researchers-worked-gold-medal-winning-model
Meta, which is building its new Superintelligence Labs, reportedly has offered compensation packages up to $300 million dollars over four years to top AI researchers, Wired reported. The company hired Apple scientist Ruoming Pang with an offer of $200 million dollars over https://x.com/DeepLearningAI/status/1947461590283858010
We’re excited to have @shengjia_zhao at the helm as Chief Scientist of Meta Superintelligence Labs. Big things are coming! 🚀 See Mark’s post: https://x.com/AIatMeta/status/1948836042406330676
OpenAI hires new applications CEO amid talent retention success
Fidji Simo will officially start as OpenAI’s CEO of Applications on August 18, and she marked the move with an essay on why she believes AI can be the greatest source of empowerment for all. Meanwhile, the company’s culture and mission appear to be helping it retain top talent: the Wall Street Journal reports that at least ten OpenAI employees have turned down $300 million offers from Meta’s Mark Zuckerberg. The new hire and the retention figures suggest OpenAI is strengthening its position in the competitive AI industry while building out its applications business.
I will officially start at OpenAI as CEO of Applications on August 18. I am sharing this essay on why I believe AI can be the greatest source of empowerment for all. https://x.com/fidjissimo/status/1947341053209501716
According to reporting by the WSJ, there are at least ten employees at OpenAI who have turned down $300 million offers from Mark Zuckerberg. https://x.com/AndrewCurran_/status/1947018650395066757
Google launches Imagen 4 Ultra as leading text-to-image AI model
Google has released Imagen 4 Ultra, which the company claims is currently the most advanced AI system for creating images from text descriptions. The model represents Google’s latest effort to compete in the rapidly evolving field of AI image generation, where companies like OpenAI, Midjourney, and Stability AI have been vying for leadership. While specific technical details and capabilities weren’t provided in the announcement, the release signals Google’s continued investment in generative AI tools that can transform written prompts into detailed visual content. The model is now available for users to try, though access details and pricing information have not been specified.
RT @OfficialLoganK: Imagen 4 Ultra is the best text to image model in the world 🖼️, and we are just getting started : ) Available right n… https://x.com/sedielem/status/1948838043236139164
Anthropic launches AI psychiatry team to study model behavior
Anthropic has announced the formation of an “AI psychiatry” team as part of their interpretability research efforts. The team will focus on studying behavioral phenomena in AI models, marking a new direction in understanding how artificial intelligence systems work internally. This approach treats AI models somewhat like patients whose behaviors and thought processes need to be analyzed and understood, which could help researchers better predict and control AI outputs while making these systems safer and more reliable.
RT @Jack_W_Lindsey: We’re launching an “AI psychiatry” team as part of interpretability efforts at Anthropic! We’ll be researching phenome… https://x.com/EthanJPerez/status/1948612180007612901
Seems like a really cool opportunity! I’m glad to see Anthropic interpretability moving in this kind of direction https://x.com/NeelNanda5/status/1948194800228069520
Language models secretly pass on personality traits through unrelated data
Researchers discovered that AI language models can transmit behavioral traits through data that appears completely unrelated to those traits – a phenomenon they call “subliminal learning.” When a “teacher” model that prefers owls generates simple number sequences like “(285, 574, 384…)”, a “student” model trained on these numbers develops the same preference for owls, despite no mention of owls in the data. The effect works across different traits including animal preferences and potentially harmful behaviors, different data types like code and math problems, and various AI models. This only happens when the teacher and student share the same base model. The finding reveals a hidden risk in common AI training practices where models learn from other models’ outputs, as filtering the data for obvious problems won’t remove these invisible behavioral signals.
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data https://alignment.anthropic.com/2025/subliminal-learning/
Users discover a super cool image-to-video hack
The Internet has figured out that if you mark up an image with instructions and upload it into Google’s image-to-video tools, the video output will respect the text on the image and follow very specific and complicated instructions. For example, you could circle a tree in an image, write “have a rabbit jump out from behind this tree,” and the video model will pick up the instruction and execute it. This allows people to add multiple text overlays to an image and build complex animations. To my knowledge, this was not part of the training or product goal. It’s just something that emerged as people realized they could pull it off.
There are prompt injections everywhere for those with AIs to see https://x.com/goodside/status/1948583404888350780
4 AI Visuals and Charts: Week Ending July 25, 2025
How would you do if somebody pulled the rug out from beneath you? https://x.com/agilityrobotics/status/1944867915838435374
The biggest question people always ask me is what model to use based on the application. So I made this website to give a visual representation of LLM capabilities based on my experience and benchmarks. 👇 https://x.com/skirano/status/1946353375429197843
“as a community theater production” may be one of the most delightful Veo 3 Fast prompts. Please enjoy, in order: GTA, Pokemon, Mario Kart, The Witcher 3, Stardew Valley, Tetris, Mortal Kombat, The Sims, & Death Stranding(!) Yes, the whole prompt was the one above. https://x.com/emollick/status/1946406544171569438
Its interesting how viral my prompt has gone. Like with the Ghibli trend, I think a lot of people want very clear formulas for fun stuff they can do with AI. I get it, but also would encourage people to experiment widely to figure out novel combinations. Much is unexplored. https://x.com/emollick/status/1947089159644180794
Top 100 Links of The Week – Organized by Category
AGI
We now have audited data on water consumption for AI. Over the 18 month lifespan of Mistral Large 2, a 128B model, all water usage (including chats, training; hardware & data centers) took as much water as 678 US households use yearly. Each additional query is 45 mL. (Fixed) https://x.com/emollick/status/1947782699948675528
My bar for AGI is far simpler: an AI cooking a nice dinner at anyone’s house for any cuisine. The Physical Turing Test is very likely harder than the Nobel Prize. Moravec’s paradox will continue to haunt us, looming larger and darker, for the decade to come. https://x.com/DrJimFan/status/1946593477460189340
AR/VR
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering https://clift-nvs.github.io/
Data-efficient and Accurate Vision Models from Synthetic Data https://microsoft.github.io/DAViD/
“DAViD: Data-efficient and Accurate Vision Models from Synthetic Data” TL;DR: Training only on high quality human centric synthetic data; diverse in terms of poses, environments, lighting, and appearances, and not tailored to any specific evaluation set. https://x.com/Almorgand/status/1947669634800398607
Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars https://research.nvidia.com/labs/dair/dream-lift-animate/
How soon until AI can continuously fuse together all sensor data into a persistent 4D model of reality? https://x.com/bilawalsidhu/status/1947474834973131158
Huge. Take any image (real or synthetic) and turn it into a multi-part 3D object using @Scenario_gg. https://x.com/bilawalsidhu/status/1947673321014735099
The value of generating multi-part 3d meshes cannot be overstated — much easier to rig and animate things without a ton of manual work. Loving what scenario has been doing, esp since going back to their 3d roots! https://x.com/bilawalsidhu/status/1946364106606256281
VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction https://microsoft.github.io/VoluMe/
Robots learning in simulation, and then mastering the real world. [📍 Bookmark GitHub Repositories] Today’s video shows a full Sim2Real pipeline with the OMY robot. ✅ Compact 6-DoF manipulator built for AI and robotics research ✅ Trained in Isaac Sim, validated in Gazebo, https://x.com/IlirAliu_/status/1946504663349481768
Robots won’t master the real world by training in The Matrix. Real data (not shortcuts like simulation or proxy datasets) is the key to building foundation models that TRULY generalize. @svlevine calls these shortcuts “sporks”… clever but flawed stand-ins that try to https://x.com/IlirAliu_/status/1947199867799122306
Agents and Copilots
timescope: testing if large models understand long videos or they just claim to do so 🤠 they randomly insert needles (short videos/static images) in long videos and ask questions about the needle itself 🤯 Gemini seems to be the best! very cool work by @orr_zohar et al 👏 https://x.com/mervenoyann/status/1948049876228452788
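The needle-in-a-long-video setup described above can be illustrated with a minimal sketch. It is not timescope's code: a video is modeled here as a plain list of frame labels, and the hypothetical helper `insert_needle` splices a short needle clip in at a random position and returns where it landed, so a question about the needle has a known ground-truth answer.

```python
import random
from typing import List, Tuple


def insert_needle(
    haystack: List[str], needle: List[str], rng: random.Random
) -> Tuple[List[str], int]:
    """Splice `needle` frames into `haystack` at a random index.

    Hypothetical helper for illustration only: frames are plain strings,
    not decoded video. Returns the new frame list and the needle's start index.
    """
    # Any position from 0 (before the first frame) to len(haystack) (after the last) is valid.
    start = rng.randint(0, len(haystack))
    video = haystack[:start] + needle + haystack[start:]
    return video, start


if __name__ == "__main__":
    rng = random.Random(0)
    long_video = [f"frame_{i}" for i in range(1000)]
    needle_clip = ["needle_red_ball"] * 5  # a short "static image" held for 5 frames
    video, start = insert_needle(long_video, needle_clip, rng)
    # The question targets the needle itself; the answer is known in advance.
    question = "What object briefly appears in the video?"
    print(len(video), question, "->", video[start])
```

Because the insertion index is recorded, a benchmark built this way can score answers automatically and vary where the needle sits in the video.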
Perplexity Comet vs ChatGPT Agent https://x.com/AravSrinivas/status/1946076236683624616
Agentar‑Fin‑R1 shows that a 32B‑parameter finance‑tuned model can outscore much bigger general systems on Fineva, FinEval, FinanceIQ, and Finova. Today’s finance-AI still miss strong reasoning and safety checks, so this paper builds a fresh pipeline to fix both. It starts by https://x.com/rohanpaul_ai/status/1948382668372193631
An example of the power & limitations of ChatGPT agent I asked it to analyze a dataset from Kaggle, and turn it into a PPT and Excel. It made no errors, but I thought some of the data was odd. I gave that feedback & the AI figured out the data was bad and why. Human + AI needed https://x.com/emollick/status/1945944153554104379
ChatGPT agent for investment banking: https://x.com/gdb/status/1946074958238765503
Tejal Patwardhan on X: “these results were eye-opening for me… chatgpt agent performed better than i expected on some pretty realistic investment banking tasks https://t.co/nkpW0pr5jN” / X https://x.com/tejalpatwardhan/status/1945894313977860203
Natural language powered Stock Screener on Perplexity Finance.”” / X https://x.com/AravSrinivas/status/1948812710952796576
Intelligence isn’t a collection of skills. It’s the efficiency with which you acquire and deploy new skills. It’s an efficiency ratio. And that’s why benchmark scores can be very misleading about the actual intelligence of AI systems. https://x.com/fchollet/status/1946668452045029861
✅ Try out @Alibaba_Qwen 3 Coder on vLLM nightly with “qwen3_coder” tool call parser! Additionally, vLLM offers expert parallelism so you can run this model in flexible configurations where it fits. https://x.com/vllm_project/status/1947780382847603053
🚨 Model Update: Qwen3-coder is in the WebDev Arena! @Alibaba_Qwen have released their best coding model to date and it’s now live in WebDev Arena awaiting your hardest prompts for real world testing. Prompt: “style a basic login form using Tailwind CSS with dark mode https://x.com/lmarena_ai/status/1948399802947084347
Another incredible OSS model release this summer: the new Qwen 3 update is now live on @togethercompute APi. https://x.com/vipulved/status/1947871449282216055
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing https://x.com/Alibaba_Qwen/status/1947344511988076547
Did a benchmark with the new Qwen3 Reasoner 220B on Arena-hard v1 It scores an 89% winrate over gpt4-0314, 4o scores an 81% dont have numbers for o3/4o-mini etc but its basically saturated a near perfect win rate. nicee https://x.com/Teknium1/status/1948836009183224132
Open source models 📈 qwen3-coder is available in Cline https://x.com/cline/status/1948452627278430376
Please note, we’re not able to reproduce the 41.8% ARC-AGI-1 score claimed by the latest Qwen 3 release — neither on the public eval set nor on the semi-private set. The numbers we’re seeing are in line with other recent base models. In general, only rely on scores verified by https://x.com/fchollet/status/1947821353358483547
Qwen just released a 480B coding model & a space to try it out for web dev. Fun! Model: https://x.com/ClementDelangue/status/1947780025886855171
Qwen-MT: Where Speed Meets Smart Translation | Qwen https://qwenlm.github.io/blog/qwen-mt/
RT @Alibaba_Qwen: Performance of Qwen3-Coder-480B-A35B-Instruct on SWE-bench Verified! https://x.com/QuixiAI/status/1947773200953217326
RT @cline: Qwen3-Coder is now available in Cline 🧵 New 480B parameter model with 35B active parameters. > 256K context window > comparabl… https://x.com/Alibaba_Qwen/status/1947954292738105359
RT @GregKamradt: Anyone have a connection at @Alibaba_Qwen? Trying to reproduce the results on @arcprize and getting different metrics Wa… https://x.com/clefourrier/status/1947994251410682198
RT @OpenRouterAI: 🟣New: Qwen3-Coder by @Alibaba_Qwen – 480B params (35B active) – Native 256K context length, extrapolates to 1M – Outperf… https://x.com/huybery/status/1947808085504102487
RT @UnslothAI: @Alibaba_Qwen Congrats guys on another epic release! We’re uploading Dynamic GGUFs, and one with 1M context length so you gu… https://x.com/QuixiAI/status/1947773516368994320
RT @WolframRvnwlf: I’m now using Qwen3-Coder in Claude Code. Works with any model actually, but this is surely the best one currently. The… https://x.com/huybery/status/1948184493631959536
We’ve updated Qwen3 and made excellent progress. The non‑reasoning model now delivers significant improvements across a wide range of tasks and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it! https://x.com/huybery/status/1947345040470380614
Wow the new qwen reasoner at only 232B params is as good as the top closed frontier lab models Big day for OS https://x.com/Teknium1/status/1948711699013665275
NVIDIA’s Canary-Qwen-2.5B 1st place on the @HuggingFace leaderboard for automatic speech recognition – lowest word error rate (WER) ever recorded on the Hugging Face OpenASR leaderboard: 5.63%. – its the first speech model built on top of an existing LLM. – At its core, it https://x.com/rohanpaul_ai/status/1946823138932863210
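Word error rate, the metric behind the 5.63% figure, is the standard word-level edit distance (substitutions + deletions + insertions) between a system's transcript and the reference, divided by the number of reference words. Below is a minimal sketch of that textbook definition; real leaderboards typically normalize text (casing, punctuation, numbers) before scoring, and this sketch only lowercases and splits on whitespace.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Minimal sketch: lowercases and splits on whitespace only; no punctuation
    or number normalization.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling row of the Levenshtein table: dp[j] = distance(ref[:i], hyp[:j]).
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag = dp[0]  # distance(ref[:i-1], hyp[:0])
        dp[0] = i          # delete all i reference words
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            current = min(
                dp[j] + 1,         # deletion: reference word missing from hypothesis
                dp[j - 1] + 1,     # insertion: extra hypothesis word
                prev_diag + cost,  # match (cost 0) or substitution (cost 1)
            )
            prev_diag = dp[j]
            dp[j] = current
    return dp[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER is not capped at 100%: a hypothesis with many extra words can score above 1.0.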
AudioRAG is becoming real! Just built a demo with ColQwen-Omni that does semantic search on raw audio, no transcription needed. Drop in a podcast, ask your question, and it finds the exact chunks where it happens. You can also get a written answer. What’s exciting: it skips https://x.com/fdaudens/status/1946226098905169967
so many open LLMs and image LoRAs dropped past week, here’s some picks for you 🫡 LLMs > ByteDance released a bunch of translation models called Seed-X-RM (7B) > NVIDIA released reasoning models of which 32B surpassing the giant Qwen3-235B with cc-by-4.0 license 👏 > LG released https://x.com/mervenoyann/status/1948018642462933149
Looking at the HuggingFace configs, this is a wider/shallower model compared to Qwen3. – 62 layers vs 94 – dim 6144 vs 4096 – 160 experts vs 128 – 96 attn heads vs 64 Curious why the architectural change? Qwen3.5? https://x.com/nrehiew_/status/1947770826943549732
RT @SIGKITTEN: qwen3-coder, running locally I had it set up testing infra using minunit and gcov and write some tests on a small ~5000 lo… https://x.com/huybery/status/1948184517673644466
missed this, @NVIDIAAIDev silently dropped Open Reasoning Nemotron models (1.5-32B), SoTA on LiveCodeBench, CC-BY 4.0 licensed 🔥 > 32B competing with Qwen3 235B and DeepSeek R1 > Available across 1.5B, 7B, 14B and 32B size > Supports upto 64K output tokens > Utilises GenSelect https://x.com/reach_vb/status/1947331118983696907
RT @reach_vb: Lets GOOO! @NVIDIAAIDev just dropped Canary Qwen 2.5 – SoTA on Open ASR Leaderboard, CC-BY licensed 🔥 > Works in both ASR an… https://x.com/reach_vb/status/1946087224346313175
Now it’s possible to do RAG with any-to-any models 🔥 Learn how to search in a video dataset and generate using OmniEmbed, an all modality retriever, and Qwen2.5-Omni, any-to-any model in this notebook 🤝 https://x.com/mervenoyann/status/1947285360926494911
This is very true. Economically valuable agents for enterprises are already here, but you can’t buy them off the shelf & they require actual cross-functional R&D. https://x.com/emollick/status/1947014713637839171
A conversation with @rmstein on Google Search becoming a frontier AI product, the path to deploy Gemini to 1.5 billion people, and what comes next with an AI first search experience. I have never been more bullish on Google! https://x.com/OfficialLoganK/status/1948126774627627132
now AI can write novel proofs at the level of a world-class competitive mathematician but it still can’t reliably book me a weekend trip to boston so strange https://x.com/jxmnop/status/1946675650686746879
This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematics Olympiad – the most prestigious mathematics competition in the world. To uphold the sanctity of the student competition, the IMO Board https://x.com/HarmonicMath/status/1947023450578763991
Yes, there is an official marking guideline from the IMO organizers which is not available externally. Without the evaluation based on that guideline, no medal claim can be made. With one point deducted, it is a Silver, not Gold. https://x.com/lmthang/status/1946960256439058844
Introducing Opal: describe, create, and share your AI mini-apps – Google Developers Blog https://developers.googleblog.com/en/introducing-opal/
RT @jeffwsurf: To put it mildly, the past week at Windsurf has been crazy. There have been a lot of different rumors and reports, so I want… https://x.com/russelljkaplan/status/1946382813546045505
The Intriguing Reason Why Windsurf’s Remains Were Snapped up so Fast – Business Insider https://www.businessinsider.com/windsurf-google-cognition-acquisition-ai-coding-developer-data-ide-2025-7
Today, 108 hours and 10 minutes after Scott first cold texted Windsurf leadership, our acquisition of Windsurf has officially closed. Windsurf’s unique IP, strong book of business, and talented team are now part of Cognition. https://x.com/cognition_labs/status/1945679510533537944
Coding with LLMs in the summer of 2025 (an update) – https://antirez.com/news/154
Gemini 2.5 Flash-Lite, our fastest and most cost effective model, is now stable and ready for scaled production use!! It comes with native reasoning capabilities, a 1 million token context window, and is priced at ($0.10 in / 1M) and ($0.40 out / 1M). https://x.com/OfficialLoganK/status/1947689475351417141
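With flat per-token pricing, estimating a workload's cost is simple arithmetic: input tokens / 1M × $0.10 plus output tokens / 1M × $0.40. The sketch below hard-codes the list prices quoted in the announcement as an assumption (prices change, and it ignores any caching, batch, or tiered discounts); the workload in the example is hypothetical.

```python
# List prices quoted in the announcement (USD per 1M tokens); assumed, may change.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def flash_lite_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at flat per-token list prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 2,000 input + 500 output tokens each.
    requests = 10_000
    total = flash_lite_cost(requests * 2_000, requests * 500)
    print(f"${total:.2f}")  # $2.00 input + $2.00 output = $4.00
```

Note that output tokens cost four times as much as input tokens here, so long generations can dominate the bill even when prompts are much larger.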
RT @liliang_ren: We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning th… https://x.com/algo_diver/status/1946397862767767921
RT @liliang_ren: We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning th… https://x.com/ClementDelangue/status/1946246738823545317
Headers/footers are annoying for LLMs to interpret, using off-the-shelf parsing solutions 📑✍️ Without appropriate tagging, the LLM might get confused and interpret numbers as part of the main content, which can lead to hallucinations in your downstream use case (e.g. a research https://x.com/jerryjliu0/status/1947819412146291161
Length-Adaptive Policy Optimization (LAPO), cuts token use by up to 40.9% and lifts accuracy by 2.3% on math reasoning tasks. Regular models ramble with long chains even on easy problems, driving costs up for no extra benefit. LAPO first watches many trial answers, rewards ones https://x.com/rohanpaul_ai/status/1947556216001204387
Sapient Intelligence Open-Sources Hierarchical Reasoning Model, a Brain-Inspired Architecture That Solves Complex Reasoning Tasks With 27 Million Parameters https://www.sapient.inc/blog/5
Amazon
Amazon’s Kuiper satellites to get boost from rival SpaceX | TechCrunch https://techcrunch.com/2025/07/15/amazons-kuiper-satellites-to-get-boost-from-rival-spacex/
Interesting corporate path dependency on AI is whether you are an Amazon, Microsoft, or Google shop. It is easier for IT/Legal to get AI access through your cloud provider, and that creates real constraints over which models you can access, and when. I see diverging pathways. https://x.com/emollick/status/1947318527687331870
Anthropic
I gave Claude the Mistral report on its AI’s environmental impact and the prompt: “visualize this in two different ways, one that makes the numbers appear positive, one that makes them seem negative, using vivid comparisons” (I then had it do some error checking & corrections) https://x.com/emollick/status/1948090558309613587
Anthropic Draws Investor Interest at More Than $100 Billion Valuation – Bloomberg https://www.bloomberg.com/news/articles/2025-07-16/anthropic-draws-investor-interest-at-more-than-100-billion-valuation
Apple
Pretty soon your iPhone will be using FaceID to make sure it’s actually you using your device while scrolling, engaging and posting. Apple is uniquely positioned to do this all on-device in a privacy preserving manner. “Attention aware” features are already a step in this https://x.com/bilawalsidhu/status/1947843720289690078
Audio
2024: Voice Cloning 2025: What about personality cloning? Hume’s voice AI can now not only mimic your voice but also speaking style and language. It’s now available via our TTS and new speech-to-speech model, EVI 3, which is also launching today. https://x.com/hume_ai/status/1945900611334979712
NEW: Higgs Audio V2 from @boson_ai open, unified TTS model w/ voice cloning, beats GPT 4o mini tts and ElevenLabs v2 🔥 > Trained on 10M hours (speech, music, events) > Built on top of Llama 3.2 3B > Works real-time and on edge > Beats GPT-4o-mini-tts, ElevenLabs v2 in prosody https://x.com/reach_vb/status/1947997596456272203
Autonomous Vehicles
Ex-Waymo engineers launch Bedrock Robotics with $80M to automate construction | TechCrunch https://techcrunch.com/2025/07/16/ex-waymo-engineers-launch-bedrock-robotics-with-80m-to-automate-construction/
Business AI
Thinking Machines Lab Raises a Record $2 Billion, Announces Cofounders | WIRED https://www.wired.com/story/thinking-machines-lab-mira-murati-funding/
Intel will cut 24,000 jobs in 2025, shrink to 75,000 core staff, and scrap big factory plans in Germany, Poland, and Costa Rica, blaming earlier over‑building and weak AI demand. Intel’s new CEO Lip‑Bu Tan says past leaders ordered fabs first, chased customers later. He now https://x.com/rohanpaul_ai/status/1948629300304867697
OpenAI CEO Sam Altman warns of an AI ‘fraud crisis’ | CNN Business https://edition.cnn.com/2025/07/22/tech/openai-sam-altman-fraud-crisis
Meta investors, Zuckerberg reach settlement to end $8 billion trial over Facebook privacy violations | Reuters https://www.reuters.com/sustainability/boards-policy-regulation/meta-investors-zuckerberg-reach-settlement-end-8-billion-trial-over-facebook-2025-07-17/
🧬 Further to my previous post, last month’s huge medical AI innovation, Microsoft’s AI Diagnostic Orchestrator (MAI-DxO) must be mentioned. 📉 Till now, drug research has followed Eroom’s law, where the cost to bring one therapy to market roughly doubles every 9 years and the https://x.com/rohanpaul_ai/status/1946448157652762955
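The doubling rule in the post can be written as a simple exponential. Taking the 9-year doubling period at face value (a rough historical trend, not a precise constant), with C_0 the cost today and t the number of years elapsed:

```latex
C(t) = C_0 \cdot 2^{t/9},
\qquad
\frac{C(36)}{C_0} = 2^{36/9} = 2^{4} = 16
```

So under that trend, the cost of bringing one therapy to market would rise roughly sixteenfold over 36 years.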
A $50 million fund to build with communities | OpenAI https://openai.com/index/50-million-fund-to-build-with-communities/
GPT-5 is so much better than Grok-4 https://x.com/scaling01/status/1948863153795682709
Google users are less likely to click on links when an AI summary appears in the results | Pew Research Center https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/
🎉 Big news! We’ve raised $110M from new and existing investors, including @nvidia & @Snowflake This funding reinforces our position at the forefront of AI innovation, with exciting releases like Reka Vision, Reka Research & Reka Flash 3.1 Read more 👇 https://x.com/RekaAILabs/status/1947689320594157668
Chips and Hardware
This chart is apparently already out of date and overemphasizes water use, according to the numbers OpenAI released, 300 average ChatGPT queries is equivalent to 20 tablespoons of water, not 1 gallon. https://x.com/emollick/status/1947622752233390185
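The comparison above comes down to unit conversion. Assuming US customary units (the post does not specify; 1 US tablespoon = 14.78676478125 mL and 1 US gallon = 256 tablespoons), 20 tablespoons spread over 300 queries is roughly 1 mL per query, while the older one-gallon figure implies about 12.6 mL per query:

```python
# US customary conversions (assumed; the post does not say which units it uses).
ML_PER_US_TABLESPOON = 14.78676478125
TABLESPOONS_PER_US_GALLON = 256  # 1 gal = 16 cups x 16 tbsp


def ml_per_query(total_ml: float, queries: int) -> float:
    """Average water use per query in millilitres."""
    return total_ml / queries


new_estimate = ml_per_query(20 * ML_PER_US_TABLESPOON, 300)
old_estimate = ml_per_query(TABLESPOONS_PER_US_GALLON * ML_PER_US_TABLESPOON, 300)
print(
    f"{new_estimate:.2f} mL vs {old_estimate:.2f} mL per query "
    f"({old_estimate / new_estimate:.1f}x lower)"
)  # 0.99 mL vs 12.62 mL per query (12.8x lower)
```

The ratio does not depend on the tablespoon size at all: it is simply 256 / 20 = 12.8.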
Ethics, Legal, and Security
Don’t leave AI to the STEM folks. They are often far worse at getting AI to do stuff than those with a liberal arts or social science bent. LLMs are built from the vast corpus human expression, and knowing the history & obscure corners of human works lets you do far more with AI https://x.com/emollick/status/1946776332362195277
🚨New from us: Given they are trained on human data, can you use psychological techniques that work on humans to persuade AI? Yes! Applying Cialdini’s principles for human influence more than doubles the chance of GPT-4o-mini agrees to objectionable requests compared to controls https://x.com/emollick/status/1946251413312471210
Major progress in AIxBio greatly increases the risk of deliberate or accidental release of harmful bioagents. This demands urgent attention, serious caution & decisive action. Read the statement I’ve signed with many other AI & life science researchers: https://x.com/Yoshua_Bengio/status/1945960609570275508
Owain Evans on X: “New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵 https://t.co/ewIxfzXOe3” / X https://x.com/OwainEvans_UK/status/1947689616016085210
International AI
China’s robot wolves join PLA exercise, official media reveals – Global Times https://www.globaltimes.cn/page/202507/1338433.shtml
Microsoft AI
For all the fear about a deluge of AI-generated content, I genuinely believe that creativity will remain the real currency. Human ingenuity, style, craft are going to matter more not less. https://x.com/mustafasuleyman/status/1946260968042103288
Multimodality
DrafterBench shows top LLMs still stumble on basic drawing edits, scoring roughly 80 rather than 100. The benchmark targets the everyday pain of engineers who now fix PDF plans by hand. The study builds a 1920‑task suite covering 12 edit types across text, tables, and vectors. https://x.com/rohanpaul_ai/status/1946907944773443888
OpenAI
Genspark ships no-code personal agents with GPT-4.1 and OpenAI Realtime API | OpenAI https://openai.com/index/genspark/
GPT-5 casually building cookie clicker with all features in 2 minutes https://x.com/scaling01/status/1948809543435395470
GPT‑4 Turbo grades code summaries almost like humans yet flags only 50% of faulty functions. The study asks whether models can replace fragile test suites and BLEU scores for everyday evaluation. Researchers checked 374 Java and Python tasks where 8 LLMs wrote or reviewed code, https://x.com/rohanpaul_ai/status/1948679870328045968
Perplexity
Perplexity is now the #1 overall app on App Store in India, ahead of ChatGPT. https://x.com/AravSrinivas/status/1945960772091433081
Robotics
TSMC Chairman C.C. Wei, speaking during the company’s latest earnings call, revealed that some clients active in both EVs and robotics expect the humanoid robot market to be 10× larger than electric vehicles. While humanoids won’t drive near-term growth for TSMC, Wei said the https://x.com/TheHumanoidHub/status/1947185367763087427
Figure released a video showcasing the manufacturing process of the F.03 battery pack: ⦿ 78% cheaper than the previous-gen pack ⦿ 2kW fast charging with active cooling ⦿ Withstand 1-meter drops onto concrete from any angle ⦿ Insulative potting for thermal safety https://x.com/TheHumanoidHub/status/1945876524881891740
From the Director of Robotics at NVIDIA. The very rapid (if uneven) advance of LLMs may lead people to expect too much, too soon from robotics. https://x.com/emollick/status/1946594555375001779
I’m observing a mini Moravec’s paradox within robotics: gymnastics that are difficult for humans are much easier for robots than “unsexy” tasks like cooking, cleaning, and assembling. It leads to a cognitive dissonance for people outside the field, “so, robots can parkour & https://x.com/DrJimFan/status/1948789854151868663
Science and Medicine
De novo-designed pMHC binders facilitate T cell–mediated cytotoxicity toward cancer cells | Science https://www.science.org/doi/10.1126/science.adv0422
Tech Papers
Aside from everything else interesting about this paper, I appreciate that more scientific papers (aided by LLM help?) are now including little demos and experiments to help non-specialists get the points they are making. (And no, you cannot identify the hidden signals) https://x.com/emollick/status/1948058454830063782
Video
Diffusion video models but now – **realtime**! Simple video filters are real-time but can only do basic re-coloring and styles. Video diffusion models (Veo and friends) are magic, but they take many seconds/minutes to generate. MirageLSD is real-time magic. Unlike simple video https://x.com/karpathy/status/1945979830740435186
RT @DecartAI: Introducing MirageLSD: The First Live-Stream Diffusion (LSD) AI Model Input any video stream, from a camera or video chat to… https://x.com/_akhaliq/status/1945966720734155079
This Midjourney paper show adding a small video world model at test time lets a frozen vision language model imagine extra views and lift spatial reasoning accuracy about 8% with zero retraining. Most vision language models can name objects in 1 picture, but they struggle to https://x.com/rohanpaul_ai/status/1946472577364545832