Anthropic: AI News Week Ending 09/05/2025

September 5, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Anthropic, minimalist safety card made from diagonal stripes of small bananas, calm white backdrop, photorealistic, editorial, minimal, high detail, 3:2 landscape

We’ve developed Claude for Chrome, where Claude works directly in your browser and takes actions on your behalf. We’re releasing it at first as a research preview to 1,000 users, so we can gather real-world insights on how it’s used. https://x.com/AnthropicAI/status/1960417002469908903

To test models’ performance on Claude Code, we ran GLM-4.5 against Claude Sonnet 4 and other open-source models on 52 practical programming tasks. While GLM-4.5 demonstrated strong performance against top open-source models, it secured a 40.4% win rate against Claude Sonnet 4. https://x.com/Zai_org/status/1962522761630482700
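As a quick sanity check on the 40.4% figure, here is a minimal sketch. It assumes each of the 52 tasks counts as one head-to-head comparison, which the tweet does not spell out, and the function name is purely illustrative.

```python
# Assumption: one comparison per task, so wins / total tasks = win rate.
TOTAL_TASKS = 52
REPORTED_WIN_RATE = 40.4  # percent, as quoted in the tweet


def implied_wins(total_tasks: int, rate_percent: float) -> int:
    """Return the whole number of wins closest to the reported rate."""
    return round(total_tasks * rate_percent / 100)


wins = implied_wins(TOTAL_TASKS, REPORTED_WIN_RATE)
recomputed_rate = round(100 * wins / TOTAL_TASKS, 1)

print(f"{wins} wins out of {TOTAL_TASKS} tasks is {recomputed_rate}%")
```

Under that assumption the reported rate corresponds to 21 wins out of 52 tasks; the remaining 31 tasks would be losses or ties, which the tweet does not break down.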

🚀 Introducing slime v0.1.0 — An open-source RL infra powering models like GLM-4.5, built by THUDM & Zhipu AI. @Zai_org RL infra 朱小霖 shared a deep dive on Zhihu into how they redefined high-performance RL infra👇 🛠️ What’s new in v0.1.0? • High-performance inference for https://x.com/ZhihuFrontier/status/1962751555591086226

Announcing GLM Coding Plan for Claude Code! After seeing the amazing adoption of GLM-4.5 over the past month, we’re making it more accessible. Get started: https://x.com/Zai_org/status/1962522757536887205

Have been tinkering with GLM 4.5 for about an hour. It is about 3x faster than Claude Code + Opus 4.1 and 5x faster than GPT-5-high, but still feels just as good as closed-source models. I am definitely more productive than with other models due to GLM-4.5’s speed. https://x.com/Tim_Dettmers/status/1962603940291260533

Get a free visual guidebook to learn MCPs from scratch (with 11 projects):
https://x.com/_avichawla/status/1961677843903185078

New Anthropic Research: Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks.
https://x.com/JackYoustra/status/1963280250923868239

🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵) https://x.com/corbtt/status/1962954306078048297

Turn Claude Code into a Financial Analyst 🤖💹 In this video we point Claude Code at a bucket of 10k filing PDFs, and have it perform complex analysis across the entire set of docs! Claude Code doesn’t have file understanding out of the box (it kind of does, but it’s terrible / https://x.com/jerryjliu0/status/1962586155523940828

AI Agents for notes and research is wild! Claude Code is a beast at this. https://x.com/omarsar0/status/1962268120069853538

Claude Code in Zed https://x.com/zeddotdev/status/1963258131191853285

We just added a few major tool updates to the code execution tool in the Anthropic API: – bash tool for running any bash command – str_replace for precise file editing – view for reading files, browsing dirs, displaying images – create for writing new files https://x.com/alexalbert__/status/1962912152555225296
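To illustrate what "precise file editing" by string replacement can mean, here is a minimal local sketch. It is not the Anthropic API or its actual tool behavior: the function name `str_replace_once` and the rule that the target must match exactly once are assumptions made for illustration.

```python
def str_replace_once(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `text`.

    Hypothetical helper: refusing ambiguous or missing matches keeps an
    edit from silently landing in the wrong place.
    """
    if not old:
        raise ValueError("the text to replace must not be empty")
    occurrences = text.count(old)
    if occurrences != 1:
        raise ValueError(f"expected exactly 1 match, found {occurrences}")
    return text.replace(old, new, 1)


source = "a = 1\nb = 2\n"
edited = str_replace_once(source, "b = 2", "b = 3")
print(edited)
```

Requiring a unique match is one possible design choice; it trades convenience for the guarantee that an edit either applies where intended or fails loudly.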

Introducing SemTools – add blazing-fast semantic search to your entire filesystem without a vector database ⚡️ Coding agents like Claude Code/Cursor have full access to the CLI like grep, cat, and pipe operations for search. But they lack ‘proper’ semantic search that’s actually https://x.com/jerryjliu0/status/1961488443663597857

I’m learning the true Hanlon’s razor is: never attribute to malice or incompetence that which is best explained by someone being a bit overstretched but intending to get around to it as soon as they possibly can. https://x.com/AmandaAskell/status/1961577559344455769

Asking Claude to read over a post I am working on and find errors… (Asking it to use web search solves the problem, of course) https://x.com/emollick/status/1961136674349482174

We’ve raised $13 billion at a $183 billion post-money valuation. This investment, led by @ICONIQCapital, will help us expand our capacity, improve model capabilities, and deepen our safety research. https://x.com/AnthropicAI/status/1962909472017281518

ChatMCP, now in FastMCP Cloud by @PrefectIO. – Push a commit. – Get a remote MCP server. – Get a chat client automatically connected to it. Deploy, use, test, dogfood, experiment all on one platform. https://x.com/fastmcp/status/1961436552057278512

GitHub: MCP-Universe https://x.com/_philschmid/status/1962935892999331922

Introducing 20+ connectors powered by MCP and a fully controllable Memory in Le Chat—making it one of the most connected and relevant AI assistants for enterprises and consumers. Why switch to Le Chat? A 🧵 https://x.com/MistralAI/status/1962881084183527932

This is massive for AI! Everyone knows about MCP and A2A, but you can’t build complete agentic solutions without people! That’s what the Agent-User Interaction Protocol (AG-UI) is for. This is a protocol for building user-facing AI agents. It’s a bridge between a backend AI https://x.com/svpino/status/1962844250539962521

Together, these updates unlock new capabilities and make the code execution tool more efficient, requiring fewer tokens on average. Learn more in the docs: https://x.com/alexalbert__/status/1962912195983114725

BOOM! Now you can deploy powerful MCP servers to Google Cloud in just a single command🔥 > $ gradio deploy --provider gcloud > Built-in queue for scaling up to production workloads⚡ Keep reading to know more. https://x.com/Gradio/status/1963636954999754955

Claude Code: no evals [well known code agent company]: no evals [well known code agent company 2]: kinda halfassed evals [leading vibe coding company]: no evals [ceo of company selling you evals]: mmmmm yess all my top customers do evals, you should do evals [vc’s in love https://x.com/swyx/status/1963725773355057249

We estimate that Claude Opus 4.1 has a 50%-time-horizon of around 1 hr 45 min (95% confidence interval of 50 to 195 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min. https://x.com/METR_Evals/status/1961527692072993272

How can we benchmark Agents in realistic, complex environments? MCP-Universe is a new benchmark using Model Context Protocol (MCP) servers to test Agents on 231 challenging, practical tasks. Benchmark: 1️⃣ Tasks from 6 practical domains, Location Navigation, Repository https://x.com/_philschmid/status/1962935890415599650

MCP-Bench Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers https://x.com/_akhaliq/status/1961456699564294651

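Early this year, when I wanted to improve tool-calling, reliable benchmarks were badly lacking. I figured, “MCP is hot, so in a few days there will surely be an open-source mcp-bench to use,” but months passed and none showed up. So why are several mcp-bench releases suddenly coming out every week? https://x.com/bigeagle_xd/status/1961461441799852128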

Anthropic raises $13B Series F at $183B post-money valuation \ Anthropic https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation

A book chapter I co-wrote many moons ago on longtermism just came out. I forgot we gave such dire warnings about going into academia 😆 https://x.com/AmandaAskell/status/1963266155944218806

Anyone know how to opt out of Anthropic’s new 5-year (!) data retention policy? https://x.com/michael_nielsen/status/1961439837791367501

looks like you can opt out of them training on your data, but they’ll still store your data for 5 years? https://x.com/vikhyatk/status/1961511207577534731

Apple loses 4 top AI researchers as key robotics lead heads to Meta, others join OpenAI and Anthropic – India Today https://www.indiatoday.in/technology/news/story/apple-loses-4-top-ai-researchers-as-key-robotics-lead-heads-to-meta-others-join-openai-and-anthropic-2781133-2025-09-03

@vikhyatk If you opt out, the retention period is 30 days (no change to the existing period). https://x.com/sammcallister/status/1961520548510400753

Updates to Consumer Terms and Privacy Policy \ Anthropic https://www.anthropic.com/news/updates-to-our-consumer-terms

Want to try a new haircut? Check out this AI workflow: 1. upload a selfie & prompt your desired haircut 2. uses Nano Banana to generate your haircut 3. then Kling 2.1 morphs from old you to new you 4. Claude helping behind the scenes with all the prompts link to glif below 👇 https://x.com/fabianstelzer/status/1961441746878939431

We just added OpenAI Codex CLI formal support in Hugging Face MCP Server – go play with it now!! 🔥 https://x.com/reach_vb/status/1963599978909008321

Finally, MCP servers can now deliver UI-rich experiences!
MCP servers in Claude/Cursor don’t offer any UI experience yet, like charts. It’s just text/JSON.
mcp-ui lets you add interactive web components to its output that can be rendered by the MCP client. https://x.com/_avichawla/status/1961677831861395495

Le Chat. Custom MCP connectors. Memories. | Mistral AI https://mistral.ai/news/le-chat-mcp-connectors-memories

From payments data and refunds to invoices and subscriptions, @MistralAI’s users can now handle it all inside Le Chat with @stripe’s MCP. Here’s how it works: https://x.com/emilygsands/status/1962884010289590583

For 𝜏²-Bench Telecom, OpenAI’s GPT-5 and o3 achieve scores of >80% with a lead over other frontier models, followed by the new Grok Code Fast 1 and Grok 4 from xAI. Models noted for their capabilities in agentic use cases and tool calling performed well, such as the Claude https://x.com/ArtificialAnlys/status/1962881324727087253

At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro https://x.com/TransluceAI/status/1963286326062846094