Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A sophisticated minimalist birthday cake on a slate grey pedestal with two elegant lit candles, the cake decorated with subtle wave patterns in deep blue and white frosting with red accents, photographed in dramatic high-contrast lighting that creates a warm glow around the candles against a dark background, modern and artistic composition celebrating two years.
We’ve been building agents for a while and Claude Code SDK has unlocked capabilities that were previously impossible. This is an enterprise-grade Business Analyst Agent we’ve built out on top of the SDK. https://x.com/ponnappa/status/1968710157409374241
I find it unimaginably based that the OAI Evals team keeps making benchmarks finding that Claude is better and publishing it anyway. they are 3 for 3 this year in acknowledging specifically how much Claude is better at tasks OAI care about. there is no sarcasm here folks. this https://x.com/swyx/status/1971404125553242253
NEW: Anthropic web search ✨ OpenRouter now uses the native web engines for OpenAI and Anthropic models by default. For all other models, our custom web search will be used, powered by @ExaAILabs. Configurable! 👇 https://x.com/OpenRouterAI/status/1968360919488151911
A postmortem of three recent issues \ Anthropic https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
Announcing our public preview of Chrome DevTools MCP! Experience the full power of DevTools in your AI coding agent → https://x.com/ChromiumDev/status/1970505063064825994
Chrome DevTools (MCP) for your AI agent | Blog | Chrome for Developers https://developer.chrome.com/blog/chrome-devtools-mcp
Introducing the Data Commons Model Context Protocol (MCP) Server: Streamlining Public Data Access for AI Developers – Google Developers Blog https://developers.googleblog.com/en/datacommonsmcp/
100+ MCP servers now available in a container. Just pull the Docker image to use your favourite MCP server. 100% Opensource. https://x.com/Saboo_Shubham_/status/1968724555188281575
Metastone unveils MCP-AgentBench A new benchmark evaluating real-world language agent performance with MCP-mediated tools. It features 33 live servers & 188 tools to rigorously test agent capabilities beyond traditional metrics. https://x.com/HuggingPapers/status/1969864853238985001
Did you see that the Agent Research Environment is MCP compatible? -> using any MCP tools with any agent is now completely trivial! Check it out! We’ve used an LLM agent to 1) move a robot arm remotely 2) depending on real time web search results! 😀 How to in thread ^^ https://x.com/clefourrier/status/1970394602592182627
China’s Alibaba just dropped a Python framework for building multi-agent apps. AgentScope lets you build AI agents visually with MCP tools, memory, rag, and reasoning capabilities. Works with any LLM and supports real-time steering. 100% Opensource. https://x.com/Saboo_Shubham_/status/1967274908742025356
Keep thinking (some random video PR from Anthropic with no context). https://x.com/claudeai/status/1968705632095158393
Agents that load dynamic MCP tools risk security and quality issues: • Prompt injection • Unreliable tool calls • Unexpected changes • Wasted tokens 𝚖𝚌𝚙-𝚝𝚘-𝚊𝚒-𝚜𝚍𝚔 generates static tools you control so they stay stable and predictable. https://x.com/vercel/status/1968416108018548766
Claude is pretty funny: “Give me 10 brilliant ideas for a science fiction short short story, pick the most brilliant and execute it terribly” It picked: “People start receiving Amazon packages from parallel universes where they made different life choices.” And it was terrible https://x.com/emollick/status/1969185633018024057
Clockwise MCP | The MCP Server for Time https://www.getclockwise.com/mcp
Factory AI CEO @matanSF on why their agents (droids) outperform: Most agents are locked to a single model; Claude code for Claude, Codex for GPT. “We built ours to be fully model-agnostic. Like an engineer fluent in many languages vs just one, they’re more adaptable and https://x.com/tbpn/status/1971322883315314995
Hey Claude: “Sentient crabs have emerged from the depths and are causing the apocalypse. But work inside my company must go on. Create a powerpoint for the most prosaic and boring meeting that only somewhat alludes to the Crabpocalypse outside.” (It went with pretty dark humor) https://x.com/emollick/status/1970215922427142323
It’s interesting how “better at code” has become the defining goal of almost every AI lab over the last twelve months. I think Claude Code getting a bunch of people onto $200/month plans proved that code is one of the most economically valuable applications of this technology. https://x.com/simonw/status/1970147806225854531
More terse than Claude. Works perfectly with Cline’s thinking slider — max it out and the model thinks exactly as much as needed. Full details: https://x.com/cline/status/1970619811853148550
New on the Claude Developer Platform: tool helpers in beta for the Python and TypeScript SDKs. Tool helpers simplify tool creation and execution with: – Automatic input validation – A tool runner for automated tool handling in conversations. https://x.com/alexalbert__/status/1968721888487829661
The Claude Code SDK now supports custom tools and hooks directly in code. Additionally, we’ve refreshed all our docs with complete references and 10 new guides on how to utilize the SDK. https://x.com/trq212/status/1966586970458542297
Tri Dao says Claude Code makes him 1.5x more productive and that it’s quite helpful at writing Triton kernels https://x.com/scaling01/status/1970146206203416666
China’s Alibaba just dropped an opensource 30B agentic LLM that outperforms Claude 4 Sonnet, DeepSeek v3.1, Kimi k2 on a range of agentic search benchmarks. Only 3B parameters are activated per token. 100% open-source. https://x.com/unwind_ai_/status/1969053988143477186
New, very needed benchmark from @scale_AI: SWE-Bench Pro. Includes: – Multi-file edits – 100+ lines changed on average – Complex dependencies across large codebases. Current top model scores: – GPT-5: 23.3% – Claude Opus 4.1: 22.7% – Others drop further (<15%) https://x.com/alexandr_wang/status/1969805196462358919
Results so far: No single model dominates: GPT-5 “high” reasoning leads on tough tasks but collapses on time-critical ones. Claude-4 Sonnet balances speed vs accuracy but at higher cost. Open-source models (like Kimi-K2) show promise in adaptability. Scaling curves plateau, https://x.com/omarsar0/status/1970147904087322661
Claude Sonnet 4 and Opus 4.1 are now available in Microsoft 365 Copilot, bringing Claude’s advanced reasoning capabilities to millions of enterprise users. Read more: https://x.com/AnthropicAI/status/1970907112831328296
@OpenAI Interesting: 1. Linear progress across OpenAI generations (GPT-4o, o3, GPT-5) 2. Claude Opus 4.1 is on top, nearing industry expert, much better than GPT-5 high. Thanks for acknowledging competitors. https://x.com/Yuchenj_UW/status/1971254164069212231
Bring your design context straight into @code with the @figma MCP server! Go from idea to product while staying in the flow. https://x.com/code/status/1970621943821861217
We’re excited to partner with Figma as one of their MCP client partners! https://x.com/allhands_ai/status/1970955961293795831
it’s quite incredible how bad Sonnet 4 is at long-context retrieval Grok-4 > GPT-5 ~ Gemini 2.5 Pro > Claude 4 Sonnet https://x.com/scaling01/status/1970661469667660100
Language Models that Think and Chat Better Proposes a simple RL recipe to improve small open models (eg, 8B) that rivals GPT-4o and Claude 3.7 Sonnet (thinking). Pay attention to this one, AI devs! Here are my notes: https://x.com/omarsar0/status/1971215698140819516
Really cool to see that Anthropic also uses JAX for inference on Google TPU. I’m curious whether they also use JAX for inference on GPUs (Azure/AWS) or if they developed a separate codebase for it. https://x.com/borisdayma/status/1968697704361468354
Infra bugs are evil. Kudos to the team at @AnthropicAI for finding the bugs, and then for transparently reporting them in their fairly detailed writeup. https://x.com/hyhieu226/status/1968708468820312435
Lots of sympathy to the Anthropic team 🙏🙏🙏 https://x.com/cHHillee/status/1968536182284849459
Ollama now has a web search API and MCP server! ⚡️ Augment local and cloud models with the latest content to improve accuracy 🔧 Build your own search agent 🔍 Directly plugs into existing MCP clients like @OpenAI Codex, @cline, Goose (@jack) and more! Let’s go!!!! 🧵👇 https://x.com/ollama/status/1971085470785319349
Most grasping methods fail outside clean lab settings. Open-loop breaks under noise. Closed-loop fails in clutter… Grasp-MPC, from TUM and collaborators, combines model-based MPC with data-driven value functions for robust 6DoF closed-loop grasping… even on moving or cluttered https://x.com/IlirAliu_/status/1970197071123652990
Elon Musk on X: “@yacineMTB Winning was never in the set of possible outcomes for Anthropic” https://x.com/elonmusk/status/1970537297792651492




