Education: AI News Week Ending 07/18/2025

Education: AI News Week Ending 07/18/2025

July 18, 2025

OpenAI’s Agent mode can now work with Spreadsheets achieving 45% on SpreadsheetBench https://x.com/scaling01/status/1945896464632148366

🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different https://x.com/lmarena_ai/status/1945866381880373490

5 Things You Need to Know About Moonshot AI and Kimi K2, the New #1 model on the Hub https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained

Every ML Engineer’s dream loss curve: “Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spike, demonstrating MuonClip as a robust solution for stable, large-scale LLM training.” https://x.com/hardmaru/status/1943976259236901315

For those unfamiliar with Kimi K2: – Surpasses models like GPT-4.1 and Claude 4 Opus on coding benchmarks – Scores new highs on math and STEM tests among non-reasoning systems – Doesn’t even have multimodal or reasoning capabilities yet kimi [dot] com https://x.com/rowancheung/status/1944647747027558636

I think I will spend the rest of the day letting Kimi generate these reports. They are so nice to look at compared to what OpenAI, Anthropic and others give you https://x.com/scaling01/status/1944850575470027243

It’s so beautiful to see the @Kimi_Moonshot team participating in every single community discussions or pull requests on @huggingface (the little blue bubbles on the right). In my opinion, every serious AI organization should dedicate meaningful time and ressources to this https://x.com/ClementDelangue/status/1946208120385999328

It’s undeniable with Kimi-K2 China has reached the frontier and will surpass the US next year”” / X https://x.com/scaling01/status/1944045857340359044

Kimi has a distinct writing style that is free of most of the patterns we now associate with AI generated text. Both Kimi and DeepSeek’s prose is apparently even more impressive in Chinese. Both of these models have a unique ‘voice’, quite different from Western AI. https://x.com/AndrewCurran_/status/1944434569899290839

Kimi is 200 people, very few of them with “frontier experience”, a platform (but you can buy such data) and a modest GPU budget. In theory there are many dozens of business entities that could make K2 in the West. It’s telling how none did. Not sure what it’s telling tho.”” / X https://x.com/teortaxesTex/status/1944856509734961596

Kimi is a really weird model, and it needs a lot more testing to figure out For example, I gave it an altered version of Great Gatsby and it found the two alterations (as does Claude) but then made up a ton of hallucinated nonsense that sounded plausible but was just plain wrong https://x.com/emollick/status/1944974487369158864

Kimi K2 is an incredible model.”” / X https://x.com/skirano/status/1944123290525831317

Kimi K2 is now available on https://x.com/togethercompute/status/1944952034840732138

Kimi K2 is number one trending on HF, congrats! https://x.com/huggingface/status/1944155602583691492

Kimi K2 is so good at tool calling and agentic loops, can call multiple tools in parallel and reliably, and knows “”when to stop””, which is another important property. It’s the first model I feel comfortable using in production since Claude 3.5 Sonnet. https://x.com/skirano/status/1944475540951621890

Kimi K2 just hit #1 on @huggingface trending models in <24 hours! This MoE powerhouse packs 1T params with 32B active – crushing coding challenges and autonomous agent tasks. https://x.com/fdaudens/status/1943996876778614948

Kimi K2 now on https://x.com/togethercompute/status/1945143838911128019

Kimi K2, the latest from @Kimi_Moonshot is now live in the Arena! https://x.com/lmarena_ai/status/1944827675597791456

Kimi K2: Open Agentic Intelligence https://moonshotai.github.io/Kimi-K2/

Kimi team is more american than most American labs lol”” / X https://x.com/Teknium1/status/1944430651278537098

Kimi team just trained a state of the art open source model 32B active parameter/1T total with 0 training instabilities, thanks to MuonClip, this is amazing https://x.com/eliebakouch/status/1943687750563004801

Kimi-k2 seems to be a very good (and giant & odd) open weights model that may be the new leader in open LLMs. It is not beating the frontier closed models on my weird tests, but it doesn’t have a reasoner yet. More testing needed but Chinese open weights models are impressive. https://x.com/emollick/status/1943901440453259374

past week had huuuge releases, here’s our picks 🔥 > moonshot released Kimi K2, sota LLM with 1T total 32B active parameters 🤯 > @huggingface released SmolLM3-3B, best LM for it’s size, offers thinking mode 💭 as well as the dataset, smoltalk2 > Alibaba released WebSailor-3B, https://x.com/mervenoyann/status/1944757807191888080

Pretty wild that @Kimi_Moonshot dropped a 1T parameter (32B active) MoE trained on 15.5 Trillion tokens – MIT licensed 🔥 Beats all other open weights models across coding, agentic and reasoning benchmarks Ofcourse live on Hugging Face! 🤗 https://x.com/reach_vb/status/1943703030026641801

RT @ArtificialAnlys: While Moonshot AI’s Kimi k2 is the leading open weights non-reasoning model in the Artificial Analysis Intelligence In…”” / X https://x.com/zacharynado/status/1944945039647629548

RT @DeepInfra: Moonshot AI’s Kimi 2 is now live on DeepInfra, as always at the best price of $0.55/$2.20, full tool call and context suppor…”” / X https://x.com/jeremyphoward/status/1944939322735780260

RT @htihle: Results from kimi-k2 on WeirdML! It does very well for a non-reasoning model. Like a scaled up deepseek-v3, beating out gpt-4.1…”” / X https://x.com/bigeagle_xd/status/1944325829657554962

RT @huggingface: Kimi K2 is number one trending on HF, congrats! https://x.com/_akhaliq/status/1944159007456784512

RT @ivanfioravanti: Kimi-Dev-72B-4bit-DWQ is on mlx-community! It took 9 hours to create 😅 Quick performance test on M3 Ultra: Prompt: 56…”” / X https://x.com/awnihannun/status/1944108947411284374

RT @Kimi_Moonshot: 🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & Ace…”” / X https://x.com/stanfordnlp/status/1944114320226263165

RT @koltregaskes: Kimi-K2 tops EQ-Bench, the benchmark that measures emotional intelligence. https://x.com/jeremyphoward/status/1944326479246147899

RT @lmarena_ai: 🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 over…”” / X https://x.com/Kimi_Moonshot/status/1945897926796185841

RT @lmarena_ai: Kimi K2, the latest from @Kimi_Moonshot is now live in the Arena! https://x.com/Kimi_Moonshot/status/1945462820147249523

RT @masondrxy: New K2 model from @Kimi_Moonshot is officially supported by @LangChainAI on @GroqInc! See 👇 https://x.com/Hacubu/status/1945144499228811676

RT @OpenRouterAI: Kimi K2 is now passing 200 tokens per second on OpenRouter Props to @GroqInc !”” / X https://x.com/JonathanRoss321/status/1945779694256722025

RT @reach_vb: LOVE ITT! You can run Kimi K2 (1T token MoE) on a single M4 Max 128GB VRAM (w/ offloading) or a single M3 Ultra (512GB) 🔥 Th…”” / X https://x.com/reach_vb/status/1944997786329460978

RT @sam_paech: Kimi-K2 just took top spot on both EQ-Bench3 and Creative Writing! Another win for open models. Incredible job @Kimi_Moonsh…”” / X https://x.com/Teknium1/status/1944285648825069759

RT @sdrzn: Seriously blown away by Moonshot’s new Kimi K2 model in @cline. It beats Claude Opus 4 on coding benchmarks and is up to 90% che…”” / X https://x.com/ClementDelangue/status/1946316382313869778

RT @weights_biases: NEW: Kimi K2 is now live on W&B Inference by @CoreWeave! It’s the first truly open challenger, ready for production wi…”” / X https://x.com/l2k/status/1945225318928634149

Seen many people mention how kimi K2 for example has no CoT or thinking which isn’t true, more of an issue with terminology Main difference with reasoning models (in terms of actual functionality) is the thinking is hidden during general non-verifiable rl, so the model can”” / X https://x.com/Grad62304977/status/1944050338551484702

Some thoughts on the decisions behind Kimi K2’s architecture – from our infra staff”” / X https://x.com/Kimi_Moonshot/status/1944589115510734931

Thank you to @Kimi_Moonshot for quickly addressing my queries on the correct system prompt for Kimi K2! We’ll be re-uploading all BF16 + dynamic @unslothai GGUFs with fixed tool calling & the new sys prompt! Sys prompt = “”You are Kimi, an AI assistant created by Moonshot AI.”””” / X https://x.com/danielhanchen/status/1946163064665260486

That’s from Kimi K2 blog post. In case someone says «wow and it’s not RL-trained». It very much is, don’t get misled by the absence of long CoT. Looks like DeepResearch but It’s probably similar to what’s been happening since Sonnet 3.5, giving it uncanny «pre-reasoner» powers. https://x.com/teortaxesTex/status/1944416704253018372

The success of Kimi K2 is no accident. The unfortunate reality in AI is that user experiences haven’t yet fully caught up to raw model capabilities. Experiences have plateaued. There are only so many coding assistants, research tools, or agents you can realistically offer, and https://x.com/skirano/status/1945505132323766430

TheZvi’s answer “why isn’t there American Kimi” basically: incentives. I *partially* buy it. But given the Concern about the dominance of Chinese open models, expressed by numerous patriotic think tanks, I think we could expect *someone* rising to the task. https://x.com/teortaxesTex/status/1945624983985639487

This is what 200 tokens/second looks like with Kimi K2 on @GroqInc For reference, Claude Sonnet-4 is usually delivered at ~60 TPS https://x.com/cline/status/1945354314844922172

True, the first ever application of Muon was to break the 3-second barrier in the CIFAR-10 speedrun. For perspective on scale that was a 3e14 flop training; @Kimi_Moonshot’s K2 is 3e24 flops, 10 orders of magnitude larger. https://x.com/kellerjordan0/status/1945701578645938194

We’ve just fixed 2 bugs in Kimi-K2-Instruct huggingface repo. Please update the following files to apply the fix: – tokenizer_config.json: update chat-template so that it works for multi-turn tool calls. – tokenization_kimi.py: update encode method to enable encoding special”” / X https://x.com/Kimi_Moonshot/status/1945050874067476962

We’ve submitted Kimi K2 to @lmarena_ai. Waiting to be added to the match pool: https://x.com/Kimi_Moonshot/status/1944754256059453823

You might not have heard of Moonshot AI, but within 24 hours, their Kimi K2 model shot to the top of the Hugging Face trending models. So… who are they, and why does this matter? 🧵Here are a few standout facts:”” / X https://x.com/fdaudens/status/1945128932040208867

Kimi K2 at 185 t/s (or even higher, nearly 220 in my short tests) is probably the best use of Groq to date, and can make K2 immediately more compelling than Sonnet 4. Impressive that they’ve managed to fit this 1T monster on their chips. https://x.com/teortaxesTex/status/1944950183051321542

Quick start project for Claude Code on Kimi:”” / X https://x.com/jeremyphoward/status/1944326308210921652

Very interesting – you can use Kimi with the Anthropic API. This means, perhaps most importantly, that you can now use Kimi with Claude Code! 🤯 https://x.com/jeremyphoward/status/1944322841866125597

RT @allhands_ai: Kimi-K2 is definitely the first strong open-weight competitor to Claude Sonnet. 65.4% on SWE-Bench Verified in OpenHands,…”” / X https://x.com/TheZachMueller/status/1945545349352829439

The DeepSeek moment was supercharged by pent-up consumer demand for a good free AI for those who wouldn’t pay (especially for students for homework) A reason Kimi K2 has not had the immediate public impact of DeepSeek may be, for most consumers/students, DeepSeek is good enough”” / X https://x.com/emollick/status/1944764085741957153

RT @yawnxyz: Kimi K2 is **INCREDIBLE** at using tools. I built a chrome extension to chat with Google Maps, but I never posted it. All th…”” / X https://x.com/bigeagle_xd/status/1945087963408351728

I’ve been a bit quiet on X recently. The past year has been a transformational experience. Grok-4 and Kimi K2 are awesome, but the world of robotics is a wondrous wild west. It feels like NLP in 2018 when GPT-1 was published, along with BERT and a thousand other flowers that https://x.com/DrJimFan/status/1944443447953498285

I doubt that Sama’s delay of open model is about Kimi. But I don’t find the logic here compelling either. «Only nerds noticed Kimi». Well, Sama is loathed. The point of his model is, above all things, PR. If it’s not open SOTA, reports will notice *that*. I think he wants SOTA. https://x.com/teortaxesTex/status/1944263611398180954

Rumors that OpenAI delayed their open-source model because of Kimi are fun, but from what I hear: – the model is much smaller than Kimi K2 (<< 1T parameters) – super powerful – but due to some (frankly absurd) reason I can’t say, they realized a big issue just before release, so”” / X https://x.com/Yuchenj_UW/status/1944235634811379844

Super excited to see Kimi K2 land on Perplexity. If you’re fine-tuning, quick reminder: using the Muon optimizer during both fine-tuning and RL phases gives the best results (details are in our Moonlight paper).”” / X https://x.com/Kimi_Moonshot/status/1944224975428497549

Grok 4 suggests that scaling still works (with the diminishing returns predicted by the scaling law), and that tool use can unlock performance gains. Kimi suggests there continues to be big opportunities from improvements in methods (Muon, etc.). Lots of paths for AI right now.”” / X https://x.com/emollick/status/1944306918631018856

“these results were eye-opening for me… chatgpt agent performed better than i expected on some pretty realistic investment banking tasks”
https://x.com/tejalpatwardhan/status/1945894313977860203

ChatGPT agent for investment banking:”” / X https://x.com/gdb/status/1946074958238765503

Citi and Ant International Pilot AI-Enabled Forecasting Solution to Enhance FX Risk Management for Airline Customers
https://www.citigroup.com/global/news/press-release/2025/citi-ant-international-ai-solution-enhance-fx-risk-management-airline-customers

Citi, Ant International pilot AI-powered FX tool for clients to help cut hedging costs | Reuters https://www.reuters.com/business/finance/citi-ant-international-pilot-ai-powered-fx-tool-clients-help-cut-hedging-costs-2025-07-18/

OpenAI working on payment checkout system within ChatGPT, FT reports | Reuters https://www.reuters.com/business/openai-working-payment-checkout-system-within-chatgpt-ft-reports-2025-07-16/

It’s the year of the social sciences hacker. We’re about to see leaps in innovation that don’t come from engineers. Instead, they’ll come from people who’ve never gotten to build before. I couldn’t be more excited about it. https://x.com/mustafasuleyman/status/1945164452761899025

💥 Announcing ChatGPT agent: a powerful new agent that can use a computer, browse the web, write code, use a terminal, write reports, create images, edit spreadsheets, and even create slides for you. The slides often… need some work. But you know how this goes: first it’s https://x.com/kevinweil/status/1945896640780390631

ChatGPT agent for finding a great Airbnb:”” / X https://x.com/gdb/status/1946075573476069580

ChatGPT agent is ready to introduce itself. https://x.com/OpenAI/status/1945890050077782149

ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths. https://x.com/OpenAI/status/1945904743148323285

Introducing ChatGPT agent: bridging research and action | OpenAI https://openai.com/index/introducing-chatgpt-agent/

Just launched ChatGPT Agent (sorry GPT-5 waiters, it is coming!), the most capable AI agent model to date! It has been such an honor to be part of a crazy sprint to get this amazing model trained and shipped together with an absolutely gem team (@isafulf , @caseychu9 ,”” / X https://x.com/xikun_zhang_/status/1945895070269583554

OpenAI’s New ChatGPT Agent Tries to Do It All | WIRED https://www.wired.com/story/openai-chatgpt-agent-launch/

RT @emollick: I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it do…”” / X https://x.com/nickaturley/status/1945975092342841487

tip for chatgpt agent slides: first ask it to do the research only, then ask it to make the slides!”” / X https://x.com/isafulf/status/1946231119751545014

Vibe Check: OpenAI Enters the Browser Wars With ChatGPT Agent https://every.to/vibe-check/vibe-check-openai-enters-the-browser-wars-with-chatgpt-agent

When we founded OpenAI (10 years ago!!), one of our goals was to create an agent that could use a computer the same way as a human — with keyboard, mouse, and screen pixels. ChatGPT Agent is a big step towards that vision, and bringing its benefits to the world thoughtfully.”” / X https://x.com/gdb/status/1945923067403984979

You can ask ChatGPT Agent to train an AI on datasets you are interested in, and do analyses for you. Building AI and doing data analysis will be automated end-to-end in the future. You are hearing it right. We are working hard to automating our own job :)”” / X https://x.com/xikun_zhang_/status/1946278266786189744

ChatGPT Agent has lower performance than o3 on PaperBench, SWE-Bench verified, OpenAI PRs and OpenAI Research Engineer Interview questions https://x.com/scaling01/status/1945932154455695752

Claude for Financial Services \ Anthropic https://www.anthropic.com/news/claude-for-financial-services

We’ve launched Claude for Financial Services. Claude now integrates with leading data platforms and industry providers for real-time access to comprehensive financial information, verified across internal and industry sources. https://x.com/AnthropicAI/status/1945889476556853520

New AI features in Google Search: Call a business or do research
https://blog.google/products/search/deep-search-business-calling-google-search/

We’re bringing Gemini 2.5 Pro to AI Mode: giving you access to our most intelligent AI model, right in @Google Search. With its advanced reasoning capabilities, watch how it can tackle incredibly difficult math problems, with links to learn more ↓ https://x.com/GoogleDeepMind/status/1945515683451736246

Highly recommend this Stanford lecture video with @_jasonwei and @hwchung27 🙂 It’s one of my favorites on scaling laws and the bitter lesson! Also Hyung’s “”Don’t teach. Incentivize”” video: https://x.com/danielhanchen/status/1945298282961625262

🎥 Want the text from any YouTube video? Now you can — no plugins, no installs. Just drop the link, and our YouTube MCP turns it into text instantly. Try it now with this Agent: https://x.com/OmniMCP/status/1942855673324397021

An MCP Server for Legal Research (SCOTUS Opinions) 🧑‍⚖️ In less than 10 minutes I indexed 100+ Supreme Court opinions from 2022-2024, using LlamaCloud to parse/index the data with really high accuracy, and then made it available as an MCP server to any AI client. You can then use https://x.com/jerryjliu0/status/1941181730536444134

NotebookLM introduces curated featured notebooks with partners https://blog.google/technology/google-labs/notebooklm-featured-notebooks/

I built a voice assistant that analyzes the entire stock market. Built my backend and MCP endpoint using FastAPI on Python and it works. This was exciting to build ngl ❤️. https://x.com/dnaijatechguy/status/1940375435017384271

Coming off @Google IO, we’ve made it possible to build AI Agents with real-time data from verified sources via Google ADK + Dappier 🧠⚡ – Define agents and tools using Google ADK – Plug into Dappier for web search + latest data for stocks, sports, news, and more https://x.com/DappierAI/status/1928430036257759269

AWS Imagine Conference for Education, State, and Local Government Leaders https://aws.amazon.com/government-education/imagine/?trk=37ba8024-7bd0-4e6e-9f99-64ac3f875a94&sc_channel=el

News super hard Math benchmark FrontierMath Tier 4 is released. o4-mini (high) is the #1 here with only 6.3% accuracy. Containse several hundred unpublished, expert-level mathematics problems that takes specialists hours to days to solve. Difficulty Tiers 1-3 cover https://x.com/rohanpaul_ai/status/1943926160750260510

Announcing AI Aspire, our new advisory firm to help enterprises with their AI strategy and transformation journey! We are partnering with Bain & Company and looking forward to helping businesses unlock scalable, transformative value. C-suite is now realizing that top-down”” / X https://x.com/AndrewYNg/status/1945148766962729370

RT @Yong18850571: (1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 6…”” / X https://x.com/tri_dao/status/1945273354157539836

guidde・Magically create video documentation with AI https://www.guidde.com/

Human organizations are structures designed, in part, to take error-prone, highly variable humans and minimize risk from their mistakes and flaws. I think it is very possible to imagine many organizational structures (with humans involved) that similarly deal with AI error rates”” / X https://x.com/emollick/status/1943802776552685803

AI might “solve” loneliness, but this could be a problem, as the discomfort of loneliness shapes us in important ways. 💔 https://x.com/fdaudens/status/1944759763822133493

Study warns of ‘significant risks’ in using AI therapy chatbots | TechCrunch https://techcrunch.com/2025/07/13/study-warns-of-significant-risks-in-using-ai-therapy-chatbots/

In an academic bookstore and it is one of the times where I want a good AI trained on all books, even imperfectly. I want to learn a bit about the smells of antiquity & the history of idea of gray & etc. but am not going to read every book. I could learn a lot from an AI who has.”” / X https://x.com/emollick/status/1944073386797543880

Kids are asking AI companions to solve their problems, according to a new study. Here’s why that’s a problem | CNN https://www.cnn.com/2025/07/16/health/teens-ai-companion-wellness

Btw if you’re learning how to build LLMs from the ground up, there’s now a 17h companion video course for my LLMs From Scratch book on Manning: https://x.com/rasbt/status/1944402436346524113