Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A sophisticated minimalist birthday cake on a slate grey pedestal with two elegant lit candles, the cake decorated with subtle wave patterns in deep blue and white frosting with red accents, photographed in dramatic high-contrast lighting that creates a warm glow around the candles against a dark background, modern and artistic composition celebrating two years.

We’ve been building agents for a while and Claude Code SDK has unlocked capabilities that were previously impossible. This is an enterprise-grade Business Analyst Agent we’ve built out on top of the SDK https://x.com/ponnappa/status/1968710157409374241

I find it unimaginably based that the OAI Evals team keeps making benchmarks finding that Claude is better and publishing it anyway. they are 3 for 3 this year in acknowledging specifically how much Claude is better at tasks OAI care about. there is no sarcasm here folks. this https://x.com/swyx/status/1971404125553242253

NEW: Anthropic web search ✨ OpenRouter now uses the native web engines for OpenAI and Anthropic models by default. For all other models, our custom web search will be used, powered by @ExaAILabs. Configurable! 👇 https://x.com/OpenRouterAI/status/1968360919488151911

A postmortem of three recent issues \ Anthropic https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues

Announcing our public preview of Chrome DevTools MCP! Experience the full power of DevTools in your AI coding agent → https://x.com/ChromiumDev/status/1970505063064825994

Chrome DevTools (MCP) for your AI agent  |  Blog  |  Chrome for Developers https://developer.chrome.com/blog/chrome-devtools-mcp

Introducing the Data Commons Model Context Protocol (MCP) Server: Streamlining Public Data Access for AI Developers – Google Developers Blog https://developers.googleblog.com/en/datacommonsmcp/

100+ MCP servers now available in a container. Just pull the Docker image to use your favourite MCP server. 100% Opensource. https://x.com/Saboo_Shubham_/status/1968724555188281575

Metastone unveils MCP-AgentBench: a new benchmark evaluating real-world language agent performance with MCP-mediated tools. It features 33 live servers & 188 tools to rigorously test agent capabilities beyond traditional metrics. https://x.com/HuggingPapers/status/1969864853238985001

Did you see that the Agent Research Environment is MCP compatible? -> using any MCP tools with any agent is now completely trivial! Check it out! We’ve used an LLM agent to 1) move a robot arm remotely 2) depending on real time web search results! 😀 How to in thread ^^ https://x.com/clefourrier/status/1970394602592182627

China’s Alibaba just dropped a Python framework for building multi-agent apps. AgentScope lets you build AI agents visually with MCP tools, memory, rag, and reasoning capabilities. Works with any LLM and supports real-time steering. 100% Opensource. https://x.com/Saboo_Shubham_/status/1967274908742025356

Keep thinking (some random video PR from Anthropic with no context). https://x.com/claudeai/status/1968705632095158393

Agents that load dynamic MCP tools risk security and quality issues: • Prompt injection • Unreliable tool calls • Unexpected changes • Wasted tokens 𝚖𝚌𝚙-𝚝𝚘-𝚊𝚒-𝚜𝚍𝚔 generates static tools you control so they stay stable and predictable. https://x.com/vercel/status/1968416108018548766

Claude is pretty funny: “Give me 10 brilliant ideas for a science fiction short short story, pick the most brilliant and execute it terribly” It picked: “People start receiving Amazon packages from parallel universes where they made different life choices.” And it was terrible https://x.com/emollick/status/1969185633018024057

Clockwise MCP | The MCP Server for Time https://www.getclockwise.com/mcp

Factory AI CEO @matanSF on why their agents (droids) outperform: Most agents are locked to a single model; Claude code for Claude, Codex for GPT. “We built ours to be fully model-agnostic. Like an engineer fluent in many languages vs just one, they’re more adaptable and https://x.com/tbpn/status/1971322883315314995

Hey Claude: “Sentient crabs have emerged from the depths and are causing the apocalypse. But work inside my company must go on. Create a powerpoint for the most prosaic and boring meeting that only somewhat alludes to the Crabpocalypse outside.” (It went with pretty dark humor) https://x.com/emollick/status/1970215922427142323

It’s interesting how “better at code” has become the defining goal of almost every AI lab over the last twelve months. I think Claude Code getting a bunch of people onto $200/month plans proved that code is one of the most economically valuable applications of this technology. https://x.com/simonw/status/1970147806225854531

More terse than Claude. Works perfectly with Cline’s thinking slider — max it out and the model thinks exactly as much as needed. Full details: https://x.com/cline/status/1970619811853148550

New on the Claude Developer Platform: tool helpers in beta for the Python and Typescript SDKs. Tool helpers simplify tool creation and execution with: – Automatic input validation – A tool runner for automated tool handling in conversations. https://x.com/alexalbert__/status/1968721888487829661

The Claude Code SDK now supports custom tools and hooks directly in code. Additionally, we’ve refreshed all our docs with complete references and 10 new guides on how to utilize the SDK. https://x.com/trq212/status/1966586970458542297

Tri Dao says Claude Code makes him 1.5x more productive and that it’s quite helpful at writing Triton kernels https://x.com/scaling01/status/1970146206203416666

China’s Alibaba just dropped an opensource 30B agentic LLM that outperforms Claude 4 Sonnet, DeepSeek v3.1, Kimi k2 on a range of agentic search benchmarks. Only 3B parameters are activated per token. 100% open-source. https://x.com/unwind_ai_/status/1969053988143477186

New, very needed benchmark from @scale_AI: SWE-Bench Pro. Includes: – Multi-file edits – 100+ lines changed on average – Complex dependencies across large codebases. Current top model scores: – GPT-5: 23.3% – Claude Opus 4.1: 22.7% – Others drop further (<15%) https://x.com/alexandr_wang/status/1969805196462358919

Results so far. No single model dominates: GPT-5 “high” reasoning leads on tough tasks but collapses on time-critical ones. Claude-4 Sonnet balances speed vs accuracy but at higher cost. Open-source models (like Kimi-K2) show promise in adaptability. Scaling curves plateau, https://x.com/omarsar0/status/1970147904087322661

Claude Sonnet 4 and Opus 4.1 are now available in Microsoft 365 Copilot, bringing Claude’s advanced reasoning capabilities to millions of enterprise users. Read more: https://x.com/AnthropicAI/status/1970907112831328296

@OpenAI Interesting: 1. Linear progress across OpenAI generations (GPT-4o, o3, GPT-5) 2. Claude Opus 4.1 is on top, nearing industry expert, much better than GPT-5 high. Thanks for acknowledging competitors. https://x.com/Yuchenj_UW/status/1971254164069212231

Bring your design context straight into @code with the @figma MCP server! Go from idea to product while staying in the flow. https://x.com/code/status/1970621943821861217

We’re excited to partner with Figma as one of their MCP client partners! https://x.com/allhands_ai/status/1970955961293795831

it’s quite incredible how bad Sonnet 4 is at long-context retrieval Grok-4 > GPT-5 ~ Gemini 2.5 Pro > Claude 4 Sonnet https://x.com/scaling01/status/1970661469667660100

Language Models that Think and Chat Better Proposes a simple RL recipe to improve small open models (eg, 8B) that rivals GPT-4o and Claude 3.7 Sonnet (thinking). Pay attention to this one, AI devs! Here are my notes: https://x.com/omarsar0/status/1971215698140819516

Really cool to see that Anthropic also uses JAX for inference on Google TPU. I’m curious whether they also use JAX for inference on GPUs (Azure/AWS) or if they developed a separate codebase for it. https://x.com/borisdayma/status/1968697704361468354

Infra bugs are evil. Kudos to the team at @AnthropicAI for finding the bugs, and then for transparently reporting them in their fairly detailed writeup. https://x.com/hyhieu226/status/1968708468820312435

Lots of sympathy to the Anthropic team 🙏🙏🙏 https://x.com/cHHillee/status/1968536182284849459

Ollama now has a web search API and MCP server! ⚡️ Augment local and cloud models with the latest content to improve accuracy 🔧 Build your own search agent 🔍 Directly plugs into existing MCP clients like @OpenAI Codex, @cline, Goose (@jack) and more! Let’s go!!!! 🧵👇 https://x.com/ollama/status/1971085470785319349

Most grasping methods fail outside clean lab settings. Open-loop breaks under noise. Closed-loop fails in clutter… Grasp-MPC, from TUM and collaborators, combines model-based MPC with data-driven value functions for robust 6DoF closed-loop grasping… even on moving or cluttered https://x.com/IlirAliu_/status/1970197071123652990

Elon Musk: “@yacineMTB Winning was never in the set of possible outcomes for Anthropic” https://x.com/elonmusk/status/1970537297792651492
