Anthropic: AI News Week Ending 10/03/2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Photorealistic Times Square at golden hour, every billboard and LED screen displaying Anthropic branding with Constitutional AI principles and the distinctive ‘A’ logo, massive screens showing Claude chat interfaces, warm sunlight casting long shadows, crowds of people looking up at the ethical AI manifesto takeover, vibrant oranges and purples in the sky, NYC architectural detail, professional photography

Sonnet 4.5 is the most important coding model release in a while. From our early-access evals, we estimate it’s roughly the same jump in capabilities between Claude 3.5 and 4. As a result, Devin is >2x faster and 12% better on our internal benchmarks. https://x.com/russelljkaplan/status/1972725070083838250

Sonnet 4.5 crushing GPT-5 high on ARC-AGI 2 https://x.com/scaling01/status/1973081750189334587

MCP Pointer is a local tool combining an MCP Server with a Chrome Extension. The extension lets you visually select DOM elements in the browser, and the MCP server makes this textual context available to agentic coding tools through MCP. https://x.com/firt/status/1970504677776044334

Now this is a good way to use MCPs. Chrome MCP is a great tool to debug and automate browser with Cursor. https://x.com/ozgrozer/status/1970548504167616541

🚨 Apple working on MCP support to enable agentic AI on Mac, iPhone, and iPad https://x.com/MCP_Community/status/1970428384024072657

Building agents with the Claude Agent SDK \ Anthropic https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk

Claude Sonnet 4.5 runs autonomously for 30+ hours of coding?! The record for GPT-5-Codex was just 7 hours. What’s Anthropic’s secret sauce? https://x.com/Yuchenj_UW/status/1972708720527425966

Introducing Claude Sonnet 4.5 \ Anthropic https://www.anthropic.com/news/claude-sonnet-4-5

Anthropic asked 7 researchers to opine on the productivity boost they get from Claude Sonnet 4.5. Results: one “qualitative answer”, +15%, +20%, +20%, +30%, +40%, +100%; respondent indicated that his workflow is “now mainly focused on managing multiple agents” https://x.com/deredleritt3r/status/1972770139297767720
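
To give a rough sense of how spread out those self-reports are, the six numeric answers quoted above can be summarized in a few lines of Python. This is only a sketch: the qualitative answer is left out, and treating the percentages as directly comparable is an assumption made here, not something the source claims.

```python
from statistics import mean, median

# The six numeric self-reported productivity boosts quoted above, in percent.
# The seventh response was qualitative and is excluded here.
reported_boosts = [15, 20, 20, 30, 40, 100]

average_boost = mean(reported_boosts)
typical_boost = median(reported_boosts)

print(f"mean:   {average_boost}%")
print(f"median: {typical_boost}%")
```

The single +100% answer pulls the mean (37.5%) well above the median (25%), so with a sample this small the median is arguably the fairer summary.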

Claude https://claude.ai/new

Claude Sonnet 4.5 knows when it’s being tested https://www.transformernews.ai/p/claude-sonnet-4-5-evaluation-situational-awareness

Claude Sonnet 4.5 System Card https://assets.anthropic.com/m/12f214efcc2f457a/original/

Anthropic’s new Claude 4.5 Sonnet is now the #4 most intelligent model, beats 4.1 Opus, and places Anthropic in the top 3 in the race for frontier intelligence Claude 4.5 Sonnet offers a clear upgrade for Claude 4.1 Opus and Claude 4 Sonnet users, with greater intelligence at https://x.com/ArtificialAnlys/status/1972854742167761204

Anthropic released a TON of updates today: • Sonnet 4.5 (with context awareness) • Claude Code 2.0 (+ new mascot 🦀) • Claude API: context editing + memory tool @mikeyk sat down for a special launch day chat about 4.5 and the @AnthropicAI developer roadmap! (and more: • VS https://x.com/latentspacepod/status/1973017487190139140

Piloting Claude for Chrome \ Anthropic https://www.anthropic.com/news/claude-for-chrome

Anthropic supremacy in the coding category of lmarena Claude 4.5 Sonnet on tied first with Claude 4.1 Opus https://x.com/scaling01/status/1973836516205134135

Claude can now create and use files \ Anthropic https://www.anthropic.com/news/create-files

Models are good enough now to start significantly speeding up researchers across fields. Sonnet 4.5 using the code execution and file creation feature in claude dot ai was able to replicate all the experiments in a published econ paper just from the raw data. https://x.com/alexalbert__/status/1972749073016132018

AI agents are now capable of doing real, if bounded, work. But that work can be very valuable. For example, the new Claude Sonnet 4.5 was able to replicate published economics research from data files & the paper. We need to figure out what to do with it: https://x.com/emollick/status/1972737754363752557

In real-world development scenarios, GLM-4.6 surpasses GLM-4.5 and reaches near-parity with Claude Sonnet 4, while clearly outperforming other open-source baselines. https://x.com/Zai_org/status/1973034644091392002

Anthropic experiments with real-time UI generation on Claude https://www.testingcatalog.com/anthropic-experiments-with-an-agent-for-gereating-ui-on-the-fly/

Subagents in Claude Code work like a coordinated team: one debugs, another tests, another refines. Each becomes an expert at its task, working in sequence to solve the problem at hand. https://x.com/claudeai/status/1971666134492696749

Enabling Claude Code to work more autonomously \ Anthropic https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously

Kernel MCP’s search_docs + @AnthropicAI Claude Code = PBJ for vibe coding browser automations 🥜🫐 https://x.com/onkernel/status/1970929983460970825

I say it often, but the fact that Codex & Claude Code use UX that are accessible only to actual coders is a shame. They are genuinely powerful (& fun) tools for people who don’t touch code at all to generate useful (or creative) little applications. The barriers are unnecessary. https://x.com/emollick/status/1971784128581587048

We’re at an inflection point in AI’s impact on cybersecurity. Claude now outperforms human teams in some cybersecurity competitions, and helps teams discover and fix code vulnerabilities. At the same time, attackers are using AI to expand their operations. https://x.com/AnthropicAI/status/1974199155657748868

maybe the most impressive part from Sonnet 4.5 alignment information. Not only can it push back, but it has a sophisticated theory of user’s mind. Other models also can speculate about the user’s play (DS does that a lot) but aren’t trained to treat it as actionable info. https://x.com/teortaxesTex/status/1973264029599842380

🚨 Anthropic has a new CTO: Rahul Patil, the former CTO of Stripe. Patil started at the company earlier this week, taking over from co-founder Sam McCandlish, who will move to a new role as Chief Architect. Read more in @TechCrunch from @russellbrandom https://x.com/zeffmax/status/1973833211835974046

AI for Cyber Defenders \ red.anthropic.com https://red.anthropic.com/2025/ai-for-cyber-defenders/

GLM 4.6 runs quite fast on an M3 Ultra with mlx-lm even at higher precision. Pretty remarkable that it benchmarks competitive to the just-released Sonnet 4.5. Hope those benchmarks hold-up in day-to-day use. Here’s a run using 5.5 bpw quantized model, generating 5.3k tokens at https://x.com/awnihannun/status/1973063906341114327

sonnet 4.5 is noticeably better at compacting conversations than any other model i’ve used i’ve never felt like i wasn’t experiencing SOME task degradation after compacting context near the end of a context window tbh this is the only thing i’ve noticed so far that marks an https://x.com/nickbaumann_/status/1972838170493628847

Cognition | Rebuilding Devin for Claude Sonnet 4.5: Lessons and Challenges https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges

We rebuilt Devin for Claude Sonnet 4.5. Available starting today as an Agent Preview that’s over 2x faster and 12% better on our Jr. Developer Evals. https://x.com/cognition/status/1972709846937157704

Model Context Protocol (MCP) might sound abstract, but Damian Brady, @damovisa, breaks down why it matters for developers. Hint: it’s about giving tools like GitHub Copilot the power to actually do things, not just suggest code. Learn more: https://x.com/docsmsft/status/1970880434515943826

Announcing our public preview of Chrome DevTools MCP! Experience the full power of DevTools in your AI coding agent → https://x.com/ChromiumDev/status/1970505063064825994

We’re open-sourcing Scribe, an MCP-based system that lets agents actually run notebook cells and receive Jupyter outputs – text, errors, images, etc. This is the agent scaffolding that gave us the results above! (6/7) https://x.com/GoodfireAI/status/1973789154174754877

I’ve read the docs of new Laravel MCP: https://x.com/PovilasKorop/status/1969699384943071715

New episode of the VS Code Insiders Podcast is out, and we’re diving deep into the @github MCP Registry! https://x.com/code/status/1972800559489876276

This might be the coolest MCP integration I’ve seen: ableton-mcp It exposes tools for controlling playback, changing the tempo, writing midi clips, renaming tracks (use an LLM to help you organize and rename), searching samples, choosing an instrument etc For live jamming as a https://x.com/nsthorat/status/1970281878126239928

Figma MCP server, now with design context anywhere you work → Remote access → Connection with Figma Make → New Code Connect UI components https://x.com/figma/status/1970532828166398328

I had some early access to Sonnet 4.5. It is a really good model. I saw especially big jumps in doing finance and statistics, which tend to get overlooked in the focus on coding. https://x.com/emollick/status/1972709877823721526

.@arcprize results for Sonnet 4.5 – On par with gpt-5 We see a boost from no thinking > 1K thinking budget Minimal perf gain from 8K > 16K Then large jump from 16K > 32K https://x.com/GregKamradt/status/1973081243907399962

Sonnet 4.5 is out! It’s the most aligned frontier model yet; a lot of progress relative to Sonnet 4 and Opus 4.1! https://x.com/janleike/status/1972731237480718734

Sonnet 4.5 now available on OpenRouter https://x.com/scaling01/status/1972708145253188072

The response to Sonnet 4.5 has been incredible. We’ve also seen some people hitting weekly limits, particularly if you’re using Opus. We’ve just reset limits for all paid users to give everyone a fresh start with Sonnet 4.5. https://x.com/alexalbert__/status/1973522280195170337

FYI we’ve rebranded the Claude Code SDK to the Claude Agent SDK to reflect that it’s the best way to build any general purpose agent, not just coding agents. Especially now when paired with Sonnet 4.5. https://x.com/alexalbert__/status/1972718342197981194

LangChain just showed how you can turn Claude Code into a domain specific AI agent. Claude Code + highly-condensed context + MCP tool to access more information as needed. https://x.com/Saboo_Shubham_/status/1969449238708064552

We just shipped a bunch of updates to Claude Code, making it a more capable agent than ever. And on top of it: it’s now powered by Claude Sonnet 4.5. https://x.com/_catwu/status/1972711105157054772

new claude code vscode extension is pretty slick https://x.com/gallabytes/status/1972805892610617466

Live in Cline: Claude Sonnet 4.5 What’s different from Sonnet 4: > more terse, less narration > maintains state across sessions via context files > enhanced capability across long tasks 200k or 1M context window & the same $3/$15 pricing (increased > 200k tokens) https://x.com/cline/status/1972708023232852309

We also shipped a ton of new Claude Code features today, including: – checkpointing – a new VSCode extension – tab to think – a new mascot, we call him Clawd 🦀 https://x.com/trq212/status/1972784970054893877

Claude Sonnet 4.5 vs 4.0 (measured on coding tasks): App from scratch → 4.5: ~20min, functional build. → 4.0: 40+min, lower quality. Simple code change → 4.5: 3 tool calls, correct edit. → 4.0: longer chain, same result. Complex refactor → 4.5: faster + tests passed. → https://x.com/augmentcode/status/1973431992097308983

Claude Sonnet 4.5 continues the tradition of Claude verbal cleverness. For fun, I gave it this very random prompt: “Mash these up into a fine paste: [I quote the final line of 100 Years of Solitude] And: 10 PRINT “HELLO WORLD” 20 GOTO 10” Lots of smart bits in the answer. https://x.com/emollick/status/1972839189181006114

Managing context on the Claude Developer Platform \ Anthropic https://www.anthropic.com/news/context-management

Imagine with Claude https://claude.ai/imagine/

My favorite thing about Claude Sonnet 4.5 so far is how aware it is of its context. When coding and seeing it go “woow that’s 66k tokens” it’s very funny. I’ve never seen a model do that before. I wonder if it has any impact on performance. https://x.com/skirano/status/1973026387528458451

Claude Sonnet 4.5 Is A Very Good Model – by Zvi Mowshowitz https://thezvi.substack.com/p/claude-sonnet-45-is-a-very-good-model

Download Claude https://claude.ai/download

Automated Behavioral Audit Scores shows improvements across all categories for Claude 4.5 Sonnet less misaligned behaviour, less cooperation on malicious requests, less sycophancy https://x.com/scaling01/status/1972713543775682654

I polled the Claude Code team and everyone now uses Sonnet 4.5 as their daily driver! We think it’s the strongest all-around coding model and are excited for you to use it. https://x.com/_catwu/status/1973524717899489340

Today vibe coding goes pro. Introducing Bolt v2: → World’s best agents (Claude Code, Codex) → Built in backend (hosting, DB, storage, …) → No error loops, no setup nightmares Now anyone can build without boundaries. https://x.com/boltdotnew/status/1973063093849567591

🅰️ Claude Sonnet 4.5 + LangSmith 🦜🛠️ ICYMI – @AnthropicAI just dropped Claude Sonnet 4.5! LangSmith just shipped both cost-tracking and prompt playground support, allowing you to quickly discover how this powerful new model impacts your agents. Get started with docs below 👇 https://x.com/Hacubu/status/1972811123176186131

We’re running a “Built with Claude Sonnet 4.5” challenge. We want to see the coolest things you can build with 4.5 in the next week. Four winners will receive one year of Claude Max 20x and $1k in Claude API credits. https://x.com/alexalbert__/status/1973071320025014306

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15) https://x.com/Jack_W_Lindsey/status/1972732219795153126

Claude Sonnet 4.5 is now available in the OpenHands LLM provider and OpenHands Cloud! If you’re using the CLI, GUI, or are a cloud subscriber just select it at the appropriate dropdown. Overall, seems like a very good model 😃 We’re still working on evals! https://x.com/OpenHandsDev/status/1973485506714652998

Introducing Claude Sonnet 4.5—the best coding model in the world. It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains on tests of reasoning and math. https://x.com/claudeai/status/1972706807345725773

so we analyzed millions of diff edits from cline users and apparently GLM-4.6 hits 94.9% success rate vs claude 4.5’s 96.2%. to be clear, diff edits are not the end-all-be-all metric for coding agents. but what’s interesting is three months ago this gap was 5-10 points. open https://x.com/nickbaumann_/status/1973846157886697771
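
The two success rates quoted above are easier to compare when restated as failure rates. The arithmetic below is a small sketch using only the two percentages from the post; the "relative failures" figure is derived here for illustration and does not appear in the source.

```python
# Diff-edit success rates quoted in the post above, in percent.
glm_4_6_rate = 94.9
claude_4_5_rate = 96.2

# Gap between the two success rates, in percentage points.
gap_points = round(claude_4_5_rate - glm_4_6_rate, 1)

# The same numbers viewed as failure rates (diff edits that did not succeed).
glm_failures = round(100 - glm_4_6_rate, 1)
claude_failures = round(100 - claude_4_5_rate, 1)

# How many failed edits GLM-4.6 has for each failed edit from Claude 4.5.
relative_failures = round(glm_failures / claude_failures, 2)

print(gap_points, glm_failures, claude_failures, relative_failures)
```

A 1.3-point gap in success rate corresponds to 5.1% versus 3.8% failed edits, so on these figures GLM-4.6 fails roughly a third more often even though the headline numbers look close.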

🧵 behind the scenes of our first Claude Code product outside the terminal: @claudeai for VS Code ! https://x.com/nateparrott/status/1972717967415582796

🚨 Leaderboard shakeup in the top slot! Claude Sonnet 4.5 now tied for #1 in the Text Arena, matching Claude Opus 4.1! 🏆 Quick reminder: the Arena rankings are powered by tens of thousands of real human votes which have put @AnthropicAI’s Claude Sonnet 4.5 joins the very top https://x.com/arena/status/1973828836510085385

The Magic of Claude Code https://www.alephic.com/writing/the-magic-of-claude-code

Box tested Claude Sonnet 4.5 for data extraction accuracy with Box AI on 40,000 fields across 1,500+ documents and images across a range of industries and use-cases. Overall, the model performed 4 percentage points better than Sonnet 4, and it saw major gains especially in areas https://x.com/levie/status/1972777693377949830

🚨 New Model Update @AnthropicAI’s Claude Sonnet 4.5 Thinking 32k is now live in the Arena! https://x.com/arena/status/1973491519857549618

GLM-4.6 is now live in Kilo Code, and the economics are wild: 48.6% win rate against Claude Sonnet 4 $0.60/$2.20 per million tokens (vs Claude’s $3/$15) 30% more token efficient 200K context window This changes the math on AI-assisted development. 🧵 https://x.com/kilocode/status/1973396250729877904
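
To make the quoted per-token prices concrete, here is a minimal cost sketch. The prices come from the post above; the workload (1M input tokens, 200k output tokens) and the `Pricing` helper are hypothetical, chosen only for illustration, and the sketch ignores caching, long-context surcharges, and the claimed difference in token efficiency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in US dollars per million tokens (hypothetical helper)."""

    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the total dollar cost for the given token counts."""
        return (
            input_tokens / 1_000_000 * self.input_per_million
            + output_tokens / 1_000_000 * self.output_per_million
        )


# Prices as quoted in the post above.
glm_4_6 = Pricing(input_per_million=0.60, output_per_million=2.20)
claude_sonnet = Pricing(input_per_million=3.00, output_per_million=15.00)

# Hypothetical workload, not taken from the source.
input_tokens = 1_000_000
output_tokens = 200_000

glm_cost = glm_4_6.cost(input_tokens, output_tokens)
claude_cost = claude_sonnet.cost(input_tokens, output_tokens)

print(f"GLM-4.6: ${glm_cost:.2f}")
print(f"Claude:  ${claude_cost:.2f}")
print(f"ratio:   {glm_cost / claude_cost:.0%}")
```

For this assumed workload the bill comes to about $1.04 versus $6.00, or roughly 17% of the Claude price. The ratio shifts with the input/output mix, since the output price gap ($2.20 vs $15) is wider than the input gap ($0.60 vs $3).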

anthropic’s thinking campaign is just so damn tasteful… it feels like a warm room. the aesthetic is comforting as hell, i kinda get a sense of claude as ai helping you be you from it. that’s why we’re weaving claude into everything we build. i’ll post more about sonnet 4.5 https://x.com/signulll/status/1973828026761695439

Vibe Check: Claude Sonnet 4.5 https://every.to/vibe-check/vibe-check-claude-sonnet-4-5

Anthropic works on customizable Skills for Claude https://www.testingcatalog.com/anthropic-works-on-customizable-skills-for-claude/

Sonnet 4.5 is now available in Cursor! We’re excited to hear what you think. https://x.com/cursor_ai/status/1972713190074261949

Sonnet 4.5 is now the primary model in Amp. I used it a lot this weekend while coding, and we saw incremental yet meaningful improvements that make it an obvious upgrade. Based on our eval/integration work and my personal experience: • It Just Works; keep doing what you were https://x.com/sqs/status/1972748958813626829

Yes, we built an MCP server for Spotify during the podcast recording 🤯 @ShrekOverflow shows us how to vibe code while breaking down the mysteries of OAuth Turns out, OAuth doesn’t have to be scary when you’ve got AI agents building with you! https://x.com/positiveblue2/status/1970906422998450356

🚨 Leaderboard Update: we have a four-way tie for #1 in the Arena! 🏆 The very top tier is now tied across the strongest models in the world: 🏆 Claude Sonnet 4.5 32k Thinking 🏆 Claude Sonnet 4.5 standard 🏆 Claude Opus 4.1 🏆 Gemini 2.5 Pro All separated by just a few Arena https://x.com/arena/status/1974215622474293262

My review of Sonnet 4.5 based on ~30 hours of Claude Code use is that it’s basically the same as Opus 4.1. Which is quite good! But not as good as GPT-5 (codex thinking=high). Claude Code is still much more polished than Codex. But I find GPT-5 much stronger as a model. https://x.com/finbarrtimbers/status/1973922679418974298

I have an enterprise plan for ChatGPT ($35/month?) and a Claude 20x Max plan ($200/month). I get equal value from them. The ChatGPT plan is high value for the price. Claude Max is not. I still get >$200 of value from it, but ChatGPT is way better utility/$. https://x.com/finbarrtimbers/status/1973923264524398687

Sonnet 4.5 was just released – it takes the #1 spot on our finance and programming benchmarks! We had the chance to test the new @claudeai model ahead of launch and found it to be exceptionally capable, especially for agentic applications. (1/6) https://x.com/_valsai/status/1972707249748582454

Announcing @ChromeDevTools MCP! 🚀 Connect your AI coding agent to Chrome’s powerful automation & debugging capabilities with ease. Key features: ✅ Reliable automation: It can programmatically handle clicks, form fills, dialogs, and page navigation with ease. ✅ Performance https://x.com/addyosmani/status/1970503277621256263

Now sounds like OpenAI Codex might be beating out Claude Code.. Spent some time reverse engineering Codex CLI. Let’s see what decisions @OpenAI made that @AnthropicAI didn’t TAKEAWAY: The model already understands shell. Just use that It’s even simpler than Claude Code. A https://x.com/imjaredz/status/1973035370041532685

Today we’re expanding Microsoft 365 Copilot with the addition of Anthropic’s Claude models. Customers can now use both OpenAI and Claude — starting in Researcher and Copilot Studio, and coming to more experiences soon. Our multi-model approach goes beyond choice. It’s all about https://x.com/satyanadella/status/1970884338993778855

Claude Sonnet 4.5 and 4.5 Thinking are now available for Perplexity Pro and Max subscribers. https://x.com/perplexity_ai/status/1972751588629545329

Turn design into code with Claude Code + @figma. Through MCP, Claude sees your mockup at the data level—component hierarchies, design tokens, auto-layout rules—and translates it into production-ready code. https://x.com/claudeai/status/1970541285615264071

We’ve focused on improving Claude’s skills in defensive cybersecurity. The results of this are visible in Claude Sonnet 4.5, which is comparable or superior to Opus 4.1 in cybersecurity tasks—yet both faster and cheaper. Read more: https://x.com/AnthropicAI/status/1974199158929305738

Grok Code Fast is right up there. Higher diff edit success rate than Claude 4.5 and GPT-5 Codex and much much cheaper if I may add. Try it out, don’t stop using it and keep sending all that feedback our way. It’s only going to get better. https://x.com/gauravisnotme/status/1974001009778115066

I’m feeling like sonnet 4.5 is bad its really really fucking up in ways sonnet 4 and opus 4.1 did not unfortunately https://x.com/Teknium1/status/1973476714924876218

Sonnet 4.5 with “significant improvements in sycophancy” https://x.com/scaling01/status/1972713224727412804

We evaluated Anthropic’s Sonnet 4.5 with our minimal agent. New record on SWE-bench verified: 70.6%! Same price/token as Sonnet 4, but takes more steps, ending up being more expensive. Cost analysis details & link to full trajectories in 🧵 https://x.com/klieret/status/1972766908878667877

Sonnet 4.5 – the best non-reasoning model on ARC-AGI-2 (GPT-5 is at 0%) https://x.com/scaling01/status/1973083158812782802

Sonnet 4.5 ranking 4th in the overall on LiveBench #1 in Coding #1 in Mathematics https://x.com/scaling01/status/1973088409359982623

Sonnet 4.5 ranking #1 on Deep Research Bench https://x.com/scaling01/status/1973088829138460987

Anthropic hires new CTO with focus on AI infrastructure | TechCrunch https://techcrunch.com/2025/10/02/anthropic-hires-new-cto-with-focus-on-ai-infrastructure/

Anthropic to triple international workforce in global AI push https://www.cnbc.com/2025/09/26/anthropic-global-ai-hiring-spree.html

New on the Anthropic Engineering Blog: Most developers have heard of prompt engineering. But to get the most out of AI agents, you need context engineering. We explain how it works: https://x.com/AnthropicAI/status/1973098580060631341

Having done RL at OpenAI and Anthropic, here’s what I can say about GRPO: https://x.com/McaleerStephen/status/1972464814808240592

We released a patch this week (v3.32.3) which includes GLM-4.6. The model hits 94.9% success rate on diff edits in our testing. That’s within 1.3 points of Sonnet 4.5 at 10% the cost. https://x.com/cline/status/1973870619013136850

We applaud @CAgovernor for signing @Scott_Wiener’s SB 53, establishing transparency requirements for frontier AI companies that will help us all have better data about these systems and the companies building them. Anthropic is proud to have supported this bill. https://x.com/jackclarkSF/status/1972773280877826232

Qwen3 VL 235B works surprisingly well and is 10x cheaper than Sonnet https://x.com/scaling01/status/1973777774121984175