Ethan B. Holland

Over 56,100 manually organized AI links and counting

Education: AI News Week Ending 05/23/2025

May 23, 2025

Image created with Ideogram 3.0. Image prompt: Lower-East-Side street-corner photograph reminiscent of a late-80s album cover: weathered red-brick tenement with exterior fire-escapes, canvas awning shading racks of vintage clothes; above the awning, a hand-painted board reads ‘Education SPORTSWEAR’; a hanging blade sign in cursive script reads ‘Education Boutique’; a chalkboard propped outside lists ‘Back-to-School Education Specials’ in messy handwriting; warm golden-hour light, subtle 35mm film grain, muted yet punchy color palette, gritty NYC vibe.

Deep Think in 2.5 Pro has landed. 🤯 It’s a new enhanced reasoning mode using our research in parallel thinking techniques – meaning it explores multiple hypotheses before responding. This enables it to handle incredibly complex math and coding problems more effectively. https://x.com/GoogleDeepMind/status/1924881598102839373

Mindblowing demo: John Link led a team of AI agents to discover a forever-chemical-free immersion coolant using Microsoft Discovery. The agents surfaced a material “unknown to humans” — in hours, not months — and the team synthesized it in the lab. “It’s literally very cool.” https://x.com/vitrupo/status/1924568771353841999

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO https://x.com/GoogleDeepMind/status/1924888095448825893

Mastering Claude Code in 30 minutes – YouTube https://www.youtube.com/watch?v=6eBSHbLKuN0

We’re introducing thought summaries in 2.5 Flash and Pro via the Gemini API and @GoogleCloud’s #VertexAI. These organize the model’s thoughts into a clear format with headers and key information about its actions to give more transparency. https://x.com/GoogleDeepMind/status/1924879655762632816

With GPT-4 as a tutor, Nigerian students saw years of learning in weeks. Important World Bank research investigates if AI chatbots can effectively and affordably boost learning in Nigeria. 🇳🇬 Researchers conducted a Randomized Controlled Trial (RCT) in Nigeria. First-year https://x.com/rohanpaul_ai/status/1925614762139713851

As someone involved in academic research on AI, it is notable to me that most of the key experiments showing the impressive abilities of AI on work, medicine, psychology, and so many other fields were done on GPT-4… a model that is now so obsolete that it is gone from ChatGPT. https://x.com/emollick/status/1923134492115365905

QoL Update: Starting today, you will see an AI generated summary for all papers of Hugging Face Papers! 🔥 GG @mishig25 🐐 https://x.com/reach_vb/status/1925517801197879737

Meta FAIR and Rothschild Foundation Hospital present a groundbreaking study mapping how language representations emerge in the brain, revealing striking parallels with LLMs. This research offers unprecedented insights into the neural development of language, showing how AI https://x.com/AIatMeta/status/1925590735254167926

The current state of research on AI and education: Growing evidence that, when used as a tutor with instructor guidance, AI seems to have quite significant positive effects. When used alone to get help with homework, it can act as a shortcut that hurts learning. Still early days. https://x.com/emollick/status/1925055450254385592

Very big impact: The final version of a randomized, controlled World Bank study finds using a GPT-4 tutor with teacher guidance in a six-week after-school program in Nigeria had “more than twice the effect of some of the most effective interventions in education” at very low costs https://x.com/emollick/status/1924919060753465537

Opus (makes a simple math error) https://x.com/lefthanddraft/status/1925617749704778145

What is the Agentic Web? 8 important updates from #MSBuild 1. Agents as first-class business & M365 entities. 2. Microsoft Entra Agent ID for knowing your agents. 3. NLWeb, MCP, Open Protocols as the foundation layer for an open agent ecosystem. 4. Agentic DevOps https://x.com/TheTuringPost/status/1924910543154119105

[15 Apr 2025] Gemini’s AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL https://x.com/swyx/status/1923367096995443189

We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: – Qwen2.5-Math-1.5B: 36.0% → 73.6% – Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: https://x.com/ypwang61/status/1917596101953348000

Semantic Layer Summit https://www.semanticlayersummit.com/

Claude-4 Opus is definitely not a frontier model for mathematics (screenshot from MathArena) Results for Claude-4 Sonnet haven’t been published. https://x.com/scaling01/status/1926018522372514037

I built my own Spanish Language Learning MCP server! I’m making my own personal Duolingo with this to help me with my learning gaps! I finished it just in time to board! Started it last night during our team coding time! Might turn it into a meetup talk! https://x.com/DThompsonDev/status/1921000920587870379

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (eg DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (eg offline RL, DPG, etc). Enjoy! https://x.com/sirbayes/status/1924669407521063112

My Prompt, My Reality by @ttunguz https://tomtunguz.com/user-perception-quality/

Some of the best social science papers are “whodunnits” where the researchers steadily track down the answer to a mystery by eliminating suspects and finding others. This is an interesting (and important) thread about changes in college experiences for rich and poor students. https://x.com/emollick/status/1924518969639108929

Everyone has this👇 technology that we were all blown away by a year ago, for free. I even screenshotted the problem from the video to use in this demo to test it out. Yet a weakness is the GPT-4o vision system, which still has trouble with details like where my line was. https://x.com/emollick/status/1924304791636725855