Anthropic: AI News Week Ending 05/30/2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Assembly instruction diagram for a safety helmet with AI-enhanced visor display, parts layout view, modern minimal design, terracotta and warm brown colors, cream paper background, “ANTHROPIC” integrated as assembly guide title, friendly rounded typography, emphasis on protective features and secure connections

Opus 4 + Claude Code + Claude Max plan = best ROI of any AI coding stack right now https://x.com/alexalbert__/status/1927410913453203946

Since Claude 4 launch: SWE friend told me he cleared his backlog for the first time ever, another friend shipped a month’s worth of side project work in the past 5 days, and my DMs are full of similar stories. I think it’s undebatable that devs are moving at a different speed https://x.com/alexalbert__/status/1927803598936887686

Introducing MCP on Windows! https://x.com/windowsdev/status/1924543741060071521

Introducing support for remote MCP servers, image generation, Code Interpreter, and more in the Responses API. https://x.com/OpenAIDevs/status/1925214114445771050

The OpenAI Responses API now supports Model Context Protocol. 📡 You can connect our models to any remote MCP server with just a few lines of code. https://x.com/OpenAIDevs/status/1925210339836391875

Model Context Protocol (MCP) definitions are now natively supported in the Google Gen AI SDK for easier integration with a growing number of open-source tools. Learn how to build with MCP in our new demo app. https://x.com/googleaidevs/status/1925250620661047303

NEW: Mistral AI announces Agents API – code execution – web search – MCP tools – persistent memory – agentic orchestration capabilities Cool to see that Mistral AI has joined the growing number of agent frameworks. More below: https://x.com/omarsar0/status/1927366520985800849

INCREDIBLE!! An MCP server to browse the web like humans! Bright Data MCP server provides 30+ powerful tools that allow AI agents to access, search, crawl, and interact with the web without getting blocked. 100% open-source, works at scale! https://x.com/akshay_pachaar/status/1924442642580136115

Most AI tools just suggest how to solve your content problems. We built an MCP that actually does the busy work. Introducing our Content AI: An AI agent built specifically for content teams to eliminate the tedious parts of working with your CMS https://x.com/directus/status/1925216222234411272

Claude 4 Sonnet beating o3-preview on ARC-AGI 2 while being <1/400th of the price https://x.com/scaling01/status/1927418304718623180

Flux Kontext is out and it’s amazing! watch me build a Claude 4 enhanced image editor workflow on my iPhone in glif in 66 seconds https://x.com/fabianstelzer/status/1928433180765306968

Using Anthropic’s Web Search with Instructor for Real-Time Data – Instructor https://python.useinstructor.com/blog/2025/05/07/using-anthropics-web-search-with-instructor-for-real-time-data/

Some interesting findings from the @AnthropicAI Claude 4 System Card: → Ultra-low deception rate: Claude Opus 4’s outputs exhibited deceptive behavior in only 0.15% of cases—down from 0.37% in Claude Sonnet 3.7 . → High-stakes biosecurity performance: On a complex, https://x.com/rohanpaul_ai/status/1927303874508894240

The methods we used to trace the thoughts of Claude are now open to the public! Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer. https://x.com/mlpowered/status/1928123130725421201

We are back to the point in the AI cycle where users can subscribe to Claude or ChatGPT or Gemini and know they will be using a very good model… …but for pro users, currently each SOTA model has distinct strengths & weaknesses, and each lab also has unique extra features. https://x.com/emollick/status/1925668299456549113

Find out more about our open-source interpretability tools, and how to use them on open-weights models, here: https://x.com/AnthropicAI/status/1928119231213605240

Judge Hints Anthropic’s AI Training on Books Is Fair Use (1) https://news.bloomberglaw.com/us-law-week/judge-hints-anthropics-ai-training-on-authors-work-is-fair-use-62

🔁 Fresh week to try the new MCP (Model Context Protocol) integration in KaibanJS — What are you waiting for? 🧩 What MCP unlocks in KaibanJS: Instantly use tools exposed by any MCP server Access them via a standardized protocol Expand your agents’ abilities without https://x.com/kaibanjs/status/1924557106927058952

🚨MAJOR ANNOUNCEMENT: MCP & Agents Hackathon is officially happening!🤖 June 2 – 8, 2025 | $10,000 in cash-prizes | 1 week of pure AI Agent action First major completely online event focused on @AnthropicAI MCP Participants get Free API credits | Sponsored by @SambaNovaAI 🤯 https://x.com/Gradio/status/1927372346685341742

An MCP server to create @3blue1brown animations (open-source): https://x.com/_avichawla/status/1924351883076092301

Claude Opus 4 with Extended Thinking achieved 58% better performance on reasoning tasks. Sonnet 4 saw 68% improvement. Here’s how to unlock Claude’s deeper reasoning capabilities in Cline — and when to use Extended Thinking vs Sequential Thinking MCP 🧵 (via @arcprize ) https://x.com/cline/status/1928208680903921803

Everyone’s talking about MCP (Model Context Protocol) in AI automation, but most explanations make it sound more complex than it is. After building 50+ AI workflows in n8n, let me break down what MCP actually is ↓ Your MCP cheat sheet – bookmark this thread https://x.com/samruddhi_mokal/status/1924689024419299394

Extended Thinking gives Claude time to work through problems methodically before responding. Instead of instant answers, Claude breaks down the problem, considers approaches, and catches potential issues — like how experienced developers actually think. https://x.com/cline/status/1928208693285531842

Function Calling Besides the built-in tool and MCP servers, it also supports function calling. This enables devs to build their own functions and call them during agent conversations. https://x.com/omarsar0/status/1927371157277167936

I added a Knowledge Graph to Cursor using MCP. You gotta see this working! Knowledge graphs are a game-changer for AI Agents, and this is one example of how you can take advantage of them. How this works: 1. Cursor connects to Graphiti’s MCP Server. Graphiti is a very popular https://x.com/svpino/status/1924437664997998996

In this on-demand #MSBuild talk, @mikesir87 breaks down how #Docker is simplifying local AI + supporting the new Model Context Protocol (MCP) so you can build smarter apps—fast. 🎥 Streaming May 19–22: https://x.com/Docker/status/1923789428457152853

Inside Anthropic’s First Developer Day, Where AI Agents Took Center Stage | WIRED https://archive.md/9lEvU

Inside Anthropic’s First Developer Day, Where AI Agents Took Center Stage | WIRED https://www.wired.com/story/anthropic-first-developer-conference/

Introducing MCP Nodes & Workflows in Gumloop https://www.gumloop.com/blog/introducing-mcp-workflows

MCP clearly has demand, But monetizing it has been tricky. I figured out the easiest way to build & monetize a Paid MCP server: 1. mcp-remote package 2. @stripe agent tool kit 3. @Cloudflare mcp auth 4. @helicone_ai to track LLM usage I’ve made a step by step video below + https://x.com/jasonzhou1993/status/1924791591841271928

MCP is powerful. But it shouldn’t be painful. Now you can send MCP requests in Postman—no config, no code. 🧠 Visual interface ⚙️ Paste a config, hit send ✨ Debug in real time Try it now → https://x.com/getpostman/status/1923085035650887695

Open Agents Platform 👥 Make and share agents that can do anything. Awesome use case of Arcade MCP support. Blog below. @LangChainAI 🤝 @TryArcade https://x.com/SamPartee/status/1924576144285901182

The only MCP server you’ll ever need! MindsDB lets you query data from 200+ sources—Slack, Gmail, social platforms, and more—using both SQL or natural language. A federated query engine that comes with a built-in MCP server. 100% open-source, 28k+ stars 🌟 https://x.com/akshay_pachaar/status/1925167754765885550

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code https://huggingface.co/blog/python-tiny-agents

Today, I built a Model Context Protocol (MCP) that trades autonomously in my Zerodha account using Claude AI and the Zerodha API. 📈🤖 (had less money in account so the trade was rejected) #AI #TradingTech #Zerodha #Automation #BuildInPublic https://x.com/_adityx_/status/1924891827624419779

We’re excited to partner with @vercel to host our MCP server! Your llms can now make phone calls – like booking a table at a steakhouse in our demo. https://x.com/Vapi_AI/status/1925251112984285461

OPUS 4 NEW SOTA ON ARC-AGI-2 IT’S HAPPENING – I WAS RIGHT Claude 4 models are the first models that effectively use test-time-compute for ARC-AGI-2 https://x.com/scaling01/status/1927818210331521044

Breaking: Claude Opus 4 jumps to #1 in WebDev Arena! A strong comeback from @AnthropicAI – Opus 4 and Sonnet 4 now on top of the chart, surpassing previous Claude 3.7 and matching Gemini 2.5 Pro. Massive congrats to @AnthropicAI🔥 https://x.com/lmarena_ai/status/1927756554188566803

Shopify’s Summer ’25 Edition: Horizons Amazing new theme, MCPs for everything, and so much more, some of my favorite updates in 🧵 https://x.com/tobi/status/1925175620881051867

Big MCP news from @OpenAI today… With Zapier MCP as an official launch partner. https://x.com/wadefoster/status/1925232955632271419

New as of today: @stripe MCP in @OpenAI Responses API. Just plop in your API key and prompt Stripe in your agentic workflows. Here’s a quick demo. (Feedback and feature requests welcome!) https://x.com/jeff_weinstein/status/1925229689116799217

Golf (@Golf__mcp) is an open-source platform for shipping production-ready MCP servers. Build a server with Golf’s OSS framework with auth, telemetry, and debugging included. Deploy with one click, and observe through their hosted platform. Start shipping at https://x.com/ycombinator/status/1925220038132523244

Hugging Face just released a Free Course on Model Context Protocol (MCP)! This Free course shows you how to build AI apps that connect to external data and tools using the latest MCP standards. – 100% Free – Earn a certificate of completion Course link 👇🧵 https://x.com/itsafiz/status/1923360017471656323

Spaces at @huggingface is the app store of AI 📱 it’s also the MCP store now 🤠 filter thousands of MCPs you can attach to your LLM 🤗 https://x.com/mervenoyann/status/1927322723891466439

The attack described here applies to any agent hooked up to Github, MCP or otherwise @codegen has implemented extensive security measures against this – very real issue with no go-to turnkey security solution https://x.com/mathemagic1an/status/1927137154829853118

😈 BEWARE: Claude 4 + GitHub MCP will leak your private GitHub repositories, no questions asked. We discovered a new attack on agents using GitHub’s official MCP server, which can be exploited by attackers to access your private repositories. creds to @marco_milanta (1/n) 👇 https://x.com/lbeurerkellner/status/1926991491735429514

Google ADK has in built capability to serve your agents via FastAPI endpoints: `adk api_server` This allows you to create your own custom frontend UI for your agents. In this 3rd tutorial, let’s build a custom UI with Streamlit in Cursor for our ElevenLabs TTS MCP agent 👇 https://x.com/chongdashu/status/1921351038457585970

MCP support in Gemini SDK lets you build a full agentic loop with less than 50 lines! We automatically make the MCP tool calls and send them back to Gemini! Thats all you need! https://x.com/_philschmid/status/1924931426543096063

You really can just do things! Use *any* Hugging Face space as a MCP server along with your Local Models! 🔥 Here in we use Qwen 3 30B A3B with @ggml_org llama.cpp and @huggingface tiny agents to create images via FLUX powered by ZeroGPU ⚡ It’s quite a bit crazy to see local https://x.com/reach_vb/status/1927036453713793526

OpenMemory MCP provides a persistent memory layer for AI tools like Claude, Cursor and Windsurf. It enables AI Agents to securely read and write to a shared memory. Runs 100% locally on your computer. https://x.com/Saboo_Shubham_/status/1923428646078779745

Github 👨‍🔧: WhatsApp MCP server This repository provides a Model Context Protocol (MCP) server integrating your personal WhatsApp account with LLMs like Claude or Cursor. → Connects directly to your WhatsApp using the multi-device web API via the `whatsmeow` Go library. → https://x.com/rohanpaul_ai/status/1927339121120272553

GitHub MCP Exploited: Accessing private repositories via MCP https://invariantlabs.ai/blog/mcp-github-vulnerability

Microsoft CTO Kevin Scott says MCP (Model Context Protocol) is 🔥. I built a PowerShell SDK to make spinning up native MCP servers easy. Now you can unleash your existing PowerShell code and see what MCP can do. 📦Install-Module PSMCP 🔗 https://x.com/dfinke/status/1924517484113350940

Built a Task Manager AI Agent in N8n using an MCP server that I also built on N8n, to automates tasks via voice or text! If you want the template just Dm me your email. Here’s the setup: Listens for commands, processes with OpenAI, and replies. #AIAutomation #NoCode https://x.com/errah_didit/status/1916104430392328246

LlamaIndex now supports the new OpenAI Responses API features: · Call any remote MCP server · Use code interpreters by using it as one of the built-in-tools · AND generate images with streaming. https://x.com/llama_index/status/1926996451747356976

Claude 4 underperforming on Aider Polyglot https://x.com/scaling01/status/1926795250556666341

You can just take academic papers and paste them into Gemini 2.5/ChatGPT o3/Claude 4 with the prompt “build me a game based on this paper, make it interesting and thematic but still conveying key findings” and get a tiny working educational game. (In this case, I used Gemini) https://x.com/emollick/status/1925954059929784588

“Claude 4 Opus, make me a game in an artifact that is fully playable, it should have graphics, it should include crabs, mechs, elements of the plot from Crying of Lot 49 and also Georgist politics, the mechanics must not be that of a simple arcade game or shooter” Claude: Sure. https://x.com/emollick/status/1925731995419570365

Highlights from the Claude 4 system prompt https://simonwillison.net/2025/May/25/claude-4-system-prompt/

I had early access to what is Claude 4 (I don’t know which model) & I have been very impressed. Fun example, this is what it made in response to the prompt: “the book Piranesi as a p5js 3d space. do it for me” – just that, no other prompting (note the birds, water, lighting) https://x.com/emollick/status/1925594644483604848

One possible explanation is Claude-4 is really not designed for 0-shotting code it works better in an agentic setup with feedback loop built in to gradually lead to an optimized code https://x.com/cto_junior/status/1926879933957038176

On SWE-bench Verified, a benchmark of real-world software engineering tasks, DeepSeek-R1-0528 scores 33% (±2%), competitive with some other strong models but well short of Claude 4. Performance can vary with scaffold; we use a standard scaffold based on SWE-agent. https://x.com/EpochAIResearch/status/1928489533886058934

The ability of o3 to agentically use tools in sequence in its chain-of-thought in the chat interface remains a huge differentiator among AIs. I am not sure o3 is “smarter” than Gemini 2.5 or Claude 4 (both do better websites than o3, for example), but you can see how tool use https://x.com/emollick/status/1925705390731165754

So far, models have converged in capabilities. I wonder if the advantages that the current SoTA models we have now (maybe Claude as API-driven coder, ChatGPT as agentic chatbot, Gemini as information handler?) solidify into sources of competitive differentiation or converge again https://x.com/emollick/status/1925778854179647656

We’re rolling out voice mode in beta on mobile. Try starting a voice conversation and asking Claude to summarize your calendar or search your docs. https://x.com/AnthropicAI/status/1927463559836877214

Anthropic has released its Claude 4 family of models: Claude Opus 4 and Claude Sonnet 4. We evaluated both models on a suite of benchmarks. The main highlight is a significant improvement in coding performance for Sonnet 4. Results in thread! https://x.com/EpochAIResearch/status/1927813645343305902

Claude 4 Sonnet might be the VERY FIRST model to significantly benefit from test-time-compute on ARC-AGI 2 https://x.com/scaling01/status/1927425665055302023

It’s happening 👀 Anthropic just announced Claude 4 Opus and Claude 4 Sonnet just now at the Code with Claude event Early testers said it could code autonomously for up to seven hours https://x.com/rowancheung/status/1925591664548356555

The X discussion about the Claude 4 system card is getting counterproductive It punishes Anthropic for actually releasing full safety tests and admitting to unusual behaviors. And I bet the behaviors of other models are really similar to Claude & now more labs will hide results. https://x.com/emollick/status/1926003595838619921

Fantastic to see Anthropic, in collaboration with @neuronpedia, creating open source tools for studying circuits with transcoders. There’s a lot of interesting work to be done I’m also very glad someone finally found a use for our Gemma Scope transcoders! Credit to @ArthurConmy https://x.com/NeelNanda5/status/1928169762263122072

Anthropic open-sourced their circuit tracing tools https://x.com/i/web/status/1928119741626962006

Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively. https://x.com/i/web/status/1928119229384970244

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently https://x.com/i/web/status/1928071179115581671

It’s interesting how the major LLM API vendors are converging on the following features: – Code execution: Python in a sandbox – Web search – like Anthropic, Mistral seem to use Brave – Document library aka hosted RAG – Image generation (FLUX for Mistral) – Model Context Protocol https://x.com/simonw/status/1927378768873550310