Ethan B. Holland

Over 54,900 manually organized AI links and counting

Anthropic: AI News Week Ending 08/15/2025

August 15, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: CU Boulder brand style — CU Gold & Black, Helvetica Neue, Flatirons, Tuscan-vernacular sandstone + red-tile roofs; Macky Auditorium arches, evening glow, ground-level wide, subtle Flatirons contour motif; integrate the category “Anthropic” via Poster: research alignment checklist with headline “ANTHROPIC”; natural light, clean professional inspiring tone, crisp focus, subtle grain, editorial composition

Claude can now reference past chats, so you can easily pick up from where you left off. https://x.com/claudeai/status/1954982275453686216

Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens. – The New York Times

This is an important issue but I think this methodology tests how well Claude and Gemini can course correct multiturn ChatGPT conversations rather than how good they are at not getting into the situation in the first place, which is meaningfully different. https://x.com/AmandaAskell/status/1954276447285334151

a bit cringe but pretty proud… for the first time, I added actual code to the @lummipics codebase… went from prototype, a vibecoded design artifact… to an actual feature, built with @v0, then Claude Code, and shipped to production to my design peers… it can be done! lol https://x.com/pablostanley/status/1953111162540695589

Claude Code has a new /model option: Opus for plan mode. This setting uses Claude Opus 4.1 for plan mode and Claude Sonnet 4 for all other work—getting the best of both models while maximizing your usage. https://x.com/_catwu/status/1955694117264261609

How well can LLMs select the right MCP tool for solving real world tasks? Not good. LiveMCPBench is a new benchmark that evaluates agents on a large-scale, dynamic, and realistic set of 527 tools. It shows that most models struggle with tool retrieval and utilization leading to https://x.com/_philschmid/status/1955601309966447074

Opus 4.1 plan, Sonnet 4 execute. Best model combo there is. https://x.com/alexalbert__/status/1955687538129252807

RT @claudeai: Claude can now reference past chats, so you can easily pick up from where you left off. https://x.com/AnthropicAI/status/1954999404387242341

GPT-5 takes 55% more time than Sonnet 4, but is 40% cheaper on the RooCode Leaderboard. Which one are you choosing? https://x.com/scaling01/status/1955669720843358502

On the big picture: GPT-5 as a model is pretty much on the same curve as the other top labs. I’d expect the usual leapfrogging between Gemini, Claude, OpenAI, & Grok to continue. Where there are some big gains is that GPT-5 seems well-trained for real world tasks in new ways. https://x.com/emollick/status/1953565365465964668

ChatGPT-5 Pro is the first model to successfully do this non-puzzle consistently. GPT-5 Thinking and GPT-5 fail as every other model before has (except for, occasionally, Sonnet). https://x.com/emollick/status/1953604710205690212

Announcement: @claudeai Sonnet 4 now supports 1 million tokens (up from ~250k) of context on @AnthropicAI API. Tip: 1) Be mindful of the context rot. Sonnet is capable of 1 million tokens but managing context will still result in better results. 2) Take advantage of https://x.com/claude_code/status/1955471002353242605

Claude Sonnet 4 now has a 1M token context window, and it’s supported in Cline. That’s 5x more than the previous 200K limit, and it changes how you can use it in Cline in two (2) key ways. 🧵 https://x.com/cline/status/1955776052644732938

I never let AI write my drafts, but I do ask for feedback when I am done writing (which I am aggressive about ignoring). So I was amused by this suggestion from Claude. Yes, I agree, and I love em-dashes! But you are the reason why I am reluctant to use them in my own writing. https://x.com/emollick/status/1953476153915687296

man adopts polyphasic sleep schedule due to claude code usage limits https://x.com/typedfemale/status/1955040883499470853

The 1-hour TTL for prompt caching is now generally available on the Anthropic API! https://x.com/alexalbert__/status/1955709585999978613

Intro Wooo! Say Wooo. Post on @LC, inside Telegram. > I came up with this idea 12 hours ago, and after tinkering with GPT-5 and Claude Code for a while, the bot is now online. I have to say, the Lens SDK is rock-solid. https://x.com/dao_leno/status/1953901099314033058

Just found out that there is an Opus Plan Mode in Claude Code. Opus 4.1 for Planning and Sonnet 4 otherwise. This makes a lot of sense. I feel like custom modes should be a thing in CC. But this is nice for now. https://x.com/omarsar0/status/1955339275806884016

piece by piece, ai is getting memory it’s also v instructive and surprisingly consistent how Anthropic solves the same class of problems vs {competitor}: – lean on long context – make it transparent/explainable – default self-drive but give user levers to take control https://x.com/swyx/status/1954990553566941399

Tip: Ask @claude_code to run your dev server in the background (Ctrl+B). Then have Claude code run integration tests against the dev server. No need to wait for users to copy-paste error traces. @claude_code continues until the integration succeeds. Builders review and give https://x.com/claude_code/status/1955210320244326460

RT @scaling01: Anthropic is the only company where LLMs get more expensive over time https://x.com/scaling01/status/1955313676665151704

So pleased to announce the Humanloop team is joining @AnthropicAI! I couldn’t imagine a better home for the team. Everyone I’ve interacted with at Anthropic has been incredibly talented, high-trust and conscious of the stakes at play. Enormous gratitude to our customers and https://x.com/RazRazcle/status/1955488872235929712

We’re thrilled to announce that the Humanloop team is joining @AnthropicAI! Our mission has always been to enable the rapid and safe adoption of AI. Now, as AI progress accelerates, we think Anthropic is the ideal home to continue this work. https://x.com/humanloop/status/1955487624728318072

“Claude Opus 4.1, De-Carcinize the Great Gatsby” (That was the only prompt) Pretty clever, actually. https://x.com/emollick/status/1954669761972822436

@claudeai @AnthropicAI Refer to following for more details on prompt caching: https://x.com/claude_code/status/1955475387858972986

To get a sense of GPT-5’s vibes, I exported my Tweet data over the last year and got it to write like my top posts. Then took my newsletter and made it create 3 separate long-form tweets. It’s not 100% there, but it beats Claude, which was previously my go-to for editing. https://x.com/rowancheung/status/1953505497237029346

Humanloop joins Anthropic https://humanloop.com/

Our team spent some time benchmarking the GPT-5 models on one-shot document understanding capabilities. Sharing some WIP results 💡: 1️⃣ GPT-5 mini does a good job. From initial testing it edges out Sonnet and Gemini models. 2️⃣ Surprisingly GPT-5 is middle of the pack (and also https://x.com/jerryjliu0/status/1954293351702036712