Education: AI News Week Ending 07/25/2025

Image created with OpenAI GPT-Image-1. Image prompt: over-the-top 1990s pro-wrestling promo poster, chalk-dust ring featuring “Professor Pain” cracking a giant ruler across his knee, flying graduation caps; overhead gym lights, grainy print texture, vivid neon titles

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much https://x.com/random_walker/status/1946180439045018046

@OriolVinyalsML Impressive result, but let’s be clear, the Gemini model got heavy IMO-specific prep, curated solutions, hints, and strategy guides. That’s not general reasoning. OpenAI’s model hit IMO gold with zero task-specific tuning. One is coached, the other is capable. https://x.com/VraserX/status/1947368827253076001

@pli_cachete For OpenAI at least for this IMO competition: – No tool use, no calculators, internet, formal proof software, algebra packages – same time limits – the same input to the question as for students; no rewriting it to another more suitable format – only one submission https://x.com/BorisMPower/status/1946859525270859955

🤖 From this week’s issue: Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad (IMO) by solving five out of the six IMO problems. https://x.com/dl_weekly/status/1948105084480397503

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). https://x.com/alexwei_/status/1946477742855532918

10. My career as a mathematician certainly isn’t threatened by AI; in fact, I hope to leverage AI to accelerate my work. However, I’m unsure whether “mathematician” will remain a career path for my son’s generation. (10/10) https://x.com/ErnestRyu/status/1946700798001574202

4. OpenAI surely knew GDM was working on the IMO, so they beat GDM to the punch with their Saturday morning announcement, generating hype. GDM’s slow-science scholarship cost them the PR battle. (4/10) https://x.com/ErnestRyu/status/1946699212307259659

5. In my experience using LLMs for math research, Gemini outperforms ChatGPT. We will see if the next-gen models (which seem to be what OpenAI and GDM are using for IMO) perform at research-level math. (5/10) https://x.com/ErnestRyu/status/1946699302308635130

Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! https://x.com/koraykv/status/1947335096740049112

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad – Google DeepMind https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵 https://x.com/GoogleDeepMind/status/1947333836594946337

As confirmed by the new IMO rankings, Grok 4’s eye-popping benchmarks were driving by the following innovations: – train on test – train on test – train on test https://x.com/nsaphra/status/1946804513114882227

DeepMind has the best research on using AI to solve hard Math: AlphaEvolve AlphaProof AlphaGeometry FunSearch AlphaDev AlphaTensor AlphaCode Despite making IMO Silver 28/42 in ’24, OpenAI announced Gold in ’25 35/42 before them Here’s DeepMind’s 10 best research papers on https://x.com/deedydas/status/1946987560875766212

Drastic progress on maths with Gemini 2.5! As a math undergrad, I am impressed 🤯 🥈 -> 🥇 ✅ Formal -> Informal ✅ Specialized model -> General model ✅ Available soon ✅ Huge thanks to IMO and congrats to all participants! Blog: https://x.com/OriolVinyalsML/status/1947341047547199802

Gary Marcus strikes again: “No pure LLM is anywhere near getting a silver medal in a math olympiad” “Pure deep learning had a good run, but it’s time to move on” 😂😂😂 https://x.com/scaling01/status/1946530148813025544

Gemini solved the math problems end-to-end in natural language (English). https://x.com/denny_zhou/status/1947360696590839976

Gold medal-level performance on the 2025 International Math Olympiad from our latest experimental reasoning LLM. Model operated in natural language (i.e. outputs natural language proofs) under the same rules as humans (e.g. 4.5 hours per session, no tools). Amazing milestone! https://x.com/gdb/status/1946479692485431465

Had a super fun time training this model. A big yolo run that resulted in a super strong model. Most important thing is to trust your model and give it morale support. 🦾 Was also a big eye opener to see how prep for IMO is done. Before this I knew absolutely zero about this https://x.com/YiTayML/status/1948464752545726886

hippo at IMO: 0/42 model trained by hippo: 35/42 🥇 😂😂😂 https://x.com/agihippo/status/1947348097144611123

IMO 2025 Solutions https://storage.googleapis.com/deepmind-media/gemini/IMO_2025.pdf

It wasn’t just OpenAI. Google also used a general purpose model to solve the very hard math problems of the International Math Olympiad in plain language. Last year they used specialized tool use. Increasing evidence of the ability of LLMs to generalize to novel problem solving https://x.com/emollick/status/1947356382581137867

It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI.
Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies. https://x.com/SebastienBubeck/status/1946577650405056722

MathArena – IMO Blogpost https://matharena.ai/imo/

maybe a better headline would be that oai and gdm ranked 27 at the IMO. some talented kids here! https://x.com/damekdavis/status/1947357679040569520

Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 https://x.com/hardmaru/status/1946942279807308210

Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well https://x.com/fchollet/status/1947337944215523567

On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction! https://x.com/alexwei_/status/1947461238512095718

One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? https://x.com/littmath/status/1947398065209462981

Other AI models seem to have made big leaps in the International Math Olympiad, not just OpenAI. Not all announcements seem to be out yet. https://x.com/emollick/status/1947053944192082170

Our IMO gold model is not just an “experimental reasoning” model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥 https://x.com/YiTayML/status/1947350087941951596

P6 was definitely the hardest and most interesting problem. Most people can understand it, but very few can solve it. All models scored 0/7. https://x.com/deedydas/status/1946250774960537927

Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened! We put all individual recipes (that we figured out https://x.com/lmthang/status/1948458590492393834

RT @demishassabis: Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs sha… https://x.com/TheZachMueller/status/1947419062423982583

RT @demishassabis: Official results are in – Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced ver… https://x.com/AndrewLampinen/status/1947370582393425931

RT @Mihonarium: 🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closi… https://x.com/AndrewLampinen/status/1947072974621982839

RT @ns123abc: Bruh… people already reproduced Google’s IMO results without RL with just prompting openai researchoors think they have the… https://x.com/_philschmid/status/1948304855837085717

RT @polynoamial: Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO wi… https://x.com/kchonyc/status/1946526143433015349

The hardest high school math exam in the world, the 6 problem 9 hour IMO 2025, was this week. AI models performed poorly. Gemini 2.5 Pro scored the highest, just 13/42, costing $431.97, in a best of 32 eval. Bronze cutoff was 19. Long way to go for AI to solve hard Math. https://x.com/deedydas/status/1946244012278722616

The two cents: 1. The OpenAI IMO solutions to P1-P5 seem to be correct. 2. P6 is a significantly novel and more difficult problem. P1-P5 are arguably within reach of “standard” IMO problem-solving techniques, but P6 requires creativity. (2/10) https://x.com/ErnestRyu/status/1946698896375492746

There are always a flood of posts about what AI can or cannot do, so it is worth pausing and paying attention to this one. It is a very hard test, done without tools. It was also viewed as an unlikely goal. Prediction markets had the chance of this happening this year as 20% https://x.com/emollick/status/1946563737604743386

This wins my respect. https://x.com/Yuchenj_UW/status/1947339774257402217

Tough look for OpenAI They’ve pissed off the international math community by jumping the gun, meanwhile @GoogleDeepMind has an officially-confirmed result that will be available commercially months earlier https://x.com/mathemagic1an/status/1947352370037305643

Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10) https://x.com/ErnestRyu/status/1946698766305968446

we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence. when we first started openai, https://x.com/sama/status/1946569252296929727

We might be heading into a plot twist in the OpenAI vs. DeepMind IMO saga. Just saw a post from Joseph Myers (involved in the Math Olympiad since 1992): the IMO committee reportedly asked AI labs not to publish results until 7 days after the closing ceremony — out of respect for https://x.com/zjasper666/status/1947013036382068971

Why am I excited about IMO results we just published: – we did very little IMO-specific work, we just keep training general models – all natural language proofs – no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered https://x.com/millionint/status/1946551400365994077

Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI
We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API
Starting scores – Frontier AI: 0%, Humans: 100%
https://docs.arcprize.org/

In 2022, @GoogleDeepMind launched Ithaca to help restore, place and date ancient texts. Now, they’re working with collaborators to introduce Aeneas, a new AI model that contextualizes ancient Latin inscriptions. 📜 Learn more ⬇️ https://x.com/Google/status/1948039522194718799

Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts. – Google DeepMind https://deepmind.google/discover/blog/aeneas-transforms-how-historians-connect-the-past/

Neat example of AI in the humanities. A Google model trained on Latin text fills in lost parts of Latin inscriptions & identifies related texts Historians increased their accuracy by 44% when working with the AI (Though AI alone beats historians, historian + AI was usually best) https://x.com/emollick/status/1948063719042498587

Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it’s like solving a puzzle with 90% of the pieces lost to time. It helps researchers interpret and situate inscriptions in their past context. 🧵 https://x.com/GoogleDeepMind/status/1948037924882133390

The Open Proof Corpus (OPC) bundles 5,062 human‑checked proofs for 1,010 mathematical competition problems, giving researchers a big public yard‑stick for real reasoning rather than guess‑the‑answer tasks. GEMINI‑2.5‑PRO already judges proofs with 88.1% accuracy, and a simple https://x.com/rohanpaul_ai/status/1948012725122052335

Perplexity Comet vs ChatGPT Agent https://x.com/AravSrinivas/status/1946076236683624616

now AI can write novel proofs at the level of a world-class competitive mathematician but it still can’t reliably book me a weekend trip to boston so strange https://x.com/jxmnop/status/1946675650686746879

This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematics Olympiad – the most prestigious mathematics competition in the world. To uphold the sanctity of the student competition, the IMO Board https://x.com/HarmonicMath/status/1947023450578763991

Yes, there is an official marking guideline from the IMO organizers which is not available externally. Without the evaluation based on that guideline, no medal claim can be made. With one point deducted, it is a Silver, not Gold. https://x.com/lmthang/status/1946960256439058844

Its pretty funny that the Turing Test used to be a very big deal a couple years ago & now it isn’t. (I know retrospectively we all know how flawed it was, but for decades it was The Test, and the only way to beat it was through limited interaction & trickery, eg Eugene Goostman) https://x.com/emollick/status/1946791395894714758

Don’t leave AI to the STEM folks. They are often far worse at getting AI to do stuff than those with a liberal arts or social science bent. LLMs are built from the vast corpus human expression, and knowing the history & obscure corners of human works lets you do far more with AI https://x.com/emollick/status/1946776332362195277

official results from @atcoder World Tour Finals are in — great results for both humans (#1 and #3 onwards) and AI (#2 in the world!). a milestone for AI for solving hard problems. https://x.com/gdb/status/1945989983569129632

Cohere Labs – Catalyst Grants https://cohere.com/research/grants

Kaggle declined an AI competition for weapon detection @cover_thz, a company whose mission is to prevent school shootings b/c they don’t want anything to do with weapons @wcukierski this will save lives, will you take a look? https://x.com/adcock_brett/status/1946236211686990130

How do RAG systems retrieve the right context? In this clip from our new Retrieval Augmented Generation course, you’ll get a high-level look at how retrievers use both keyword and semantic search, along with metadata filtering to find relevant documents, and why hybrid search https://x.com/DeepLearningAI/status/1948488412996006073
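
For readers who want a concrete picture of the idea in the clip above, here is a minimal sketch of hybrid retrieval: filter by metadata, then blend a keyword score with a semantic score. It is an illustration under stated assumptions, not the course's implementation. The names `Doc`, `keyword_score`, `hybrid_search`, and the weight `alpha` are hypothetical, the embeddings are hand-written toy vectors standing in for a real embedding model, and the keyword score is plain term overlap rather than a production ranker such as BM25.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    text: str
    embedding: list[float]  # toy, hand-written vector (assumption: no real model)
    metadata: dict = field(default_factory=dict)


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (simple term overlap)."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, query_embedding, docs, filters=None, alpha=0.5, top_k=3):
    """Metadata filter first, then rank by a weighted keyword + semantic score."""
    filters = filters or {}
    candidates = [
        d for d in docs
        if all(d.metadata.get(k) == v for k, v in filters.items())
    ]
    scored = [
        (alpha * keyword_score(query, d.text)
         + (1 - alpha) * cosine(query_embedding, d.embedding), d)
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    Doc("IMO 2025 gold medal results", [1.0, 0.0], {"topic": "math"}),
    Doc("Latin inscriptions restored by Aeneas", [0.0, 1.0], {"topic": "history"}),
    Doc("olympiad proofs in natural language", [0.9, 0.1], {"topic": "math"}),
]

# Query restricted to math documents; the history document is filtered out.
results = hybrid_search("imo gold", [1.0, 0.0], docs, filters={"topic": "math"})
for score, doc in results:
    print(round(score, 3), doc.text)
```

The second math document shares no keywords with the query but still ranks because its toy embedding is close to the query's, which is the usual argument for combining the two signals.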

Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick https://x.com/random_walker/status/1947259631257932250