Agents and Copilots: AI News Week Ending 09/19/2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5-20250929. Image prompt: A semi-transparent figure in traditional British barrister robes stands in a limestone corridor of Lincoln’s Inn, holding a luminous holographic legal brief that bathes the classical stone architecture in cool blue light. The figure’s autonomous, ghostly presence suggests an AI agent acting with dignity and independence, while behind it ancient law books with glowing circuit-pattern spines line the walls. Cinematic photograph with dramatic side lighting, regal judicial atmosphere, stone and digital elements merged.

🏖️ Summarization Middleware As agent loops get long (either because lots of messages or lots of tool calls) you want to summarize what has occurred so you don’t overflow context (and break your workflow). LangChain’s new middleware automatically summarizes history to keep you https://x.com/sydneyrunkle/status/1967991069368275282

Avoid overflowing context windows with LangChain’s SummarizationMiddleware. This is especially important for long running conversations that have lots of messages and agent loops with lots of tool calls. https://x.com/LangChainAI/status/1967993889958031560
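
The idea in the two posts above can be sketched in a few lines of Python. This is a minimal illustration and not LangChain's SummarizationMiddleware API: the names `estimate_tokens`, `compact_history`, and `naive_summarize`, the word-count token estimate, and the thresholds are all assumptions made for this example. A real system would call a model to produce the summary.

```python
from typing import Callable, List


def estimate_tokens(message: str) -> int:
    # Crude stand-in for a tokenizer (assumption): count whitespace-separated words.
    return len(message.split())


def compact_history(
    messages: List[str],
    max_tokens: int,
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return the history unchanged while it fits the budget; otherwise
    replace everything except the last `keep_last` messages with one summary."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= max_tokens or len(messages) <= keep_last:
        return messages
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


def naive_summarize(older: List[str]) -> str:
    # Hypothetical placeholder: a real implementation would ask an LLM.
    return f"[summary of {len(older)} earlier messages]"
```

Calling `compact_history` before each model call keeps the most recent turns verbatim, which is where tool results the agent still needs usually sit, and compresses only the older part of the loop.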

Twilio published this step-by-step guide on how to connect a Twilio phone number to the Realtime API, including getting a new number and pointing it at our SIP servers. https://x.com/juberti/status/1968384883568632125

We’ve been cooking this summer: Holo1.5 is here! SOTA UI localization + QA, 3× gains vs Qwen-2.5 VL 🍳 Now up to 72B 💥 — a strong base for computer-use agents like Surfer. • Open weights on HuggingFace 🤗 https://x.com/laurentsifre/status/1967512750285861124

🚀 Holo1.5 is here. Our next-gen open foundation models for Computer Use agents: 3B, 7B, and new 72B. ✨ +10% accuracy vs Holo1 💻 SOTA UI localization & understanding 🤗 Open weights on HuggingFace: https://x.com/hcompany_ai/status/1967682730851782683

Holo1.5 – Open Foundation Models for Computer Use Agents https://www.hcompany.ai/blog/holo-1-5

Here is a simple cookbook to get started on the localization task: just drop in any UI screenshot and witness the accuracy of the predicted clicks! 🖱️ (3/4) https://x.com/tonywu_71/status/1967520054989504734

Announcing Agent Payments Protocol (AP2) | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol

Meta just made training AI agents 25x faster. This is a breakthrough for robotics and complex planning. Meta’s FAIR open sourced a new method called Scalable Option Learning. It trains a specialized agent at the scale previously seen only with LLMs. Here’s how it works: The https://x.com/JacksonAtkinsX/status/1967284333678350342

GPT-5-Codex is here: a version of GPT-5 better at agentic coding. It is faster, smarter, and has new capabilities. Let us know what you think! The team has been absolutely cooking, very fun to watch. https://x.com/sama/status/1967650108285259822

How GPT5 + Codex took over Agentic Coding — ft. Greg Brockman, OpenAI https://www.latent.space/p/gpt5-codex

Sweet! GPT-5-Codex seems to make Codex more steerable and optimized for agentic coding in larger codebases. https://x.com/omarsar0/status/1967640731956453756

We’re releasing GPT-5-Codex — a version of GPT-5 further optimized for agentic coding in Codex. Available in the Codex CLI, IDE Extension, web, mobile, and for code reviews in Github. https://x.com/OpenAI/status/1967636903165038708

GPT-5 Codex – from code suggestions to coding agents, with no waste of tokens. Some developers complain that Codex feels longer (though smarter) than Claude Code – but that’s actually the whole point. > Codex has been trained to spend its effort where it matters. > It doesn’t https://x.com/TheTuringPost/status/1967882454351405314

GPT-5-Codex is 10x faster for the easiest queries, and will think 2x longer for the hardest queries that benefit most from more compute. https://x.com/polynoamial/status/1967667644905251156

We are witnessing an incredible level of efficiency in reasoning models. Faster and more efficient reasoning models are on the rise. First, GPT-5 (and GPT-5-Codex) with remarkably efficient token use, and now Gemini 2.5 Deep Think, achieving gold-medal level performance at the https://x.com/omarsar0/status/1968378996573487699

We trained gpt-5-codex to be great at both responsive and mobile front-ends. Here’s a thread of some examples: “Make a pixel art game where I can walk around and talk to other villagers, and catch wild bugs.” https://x.com/OpenAIDevs/status/1968065647541440879

Introducing upgrades to Codex | OpenAI https://openai.com/index/introducing-upgrades-to-codex/

this is the most important chart on the new gpt-5-codex model We are just beginning to exploit the potential of good routing and variable thinking: Easy responses are now >15x faster, but for the hard stuff, 5-codex now thinks 102% more than 5. Same model, same paradigm, but https://x.com/swyx/status/1967651870018838765

Codex for modernizing code: https://x.com/gdb/status/1967783077561926137

(1/3) Thrilled to announce a new Gemini breakthrough! Building on our success at IMO this year, an advanced version of Gemini Deep Think achieved gold-medal level performance at the ICPC 2025 World Finals – one of the world’s leading competitive programming competitions. https://x.com/quocleix/status/1968361041487904855

(2/3) Our model solved 10 out of 12 problems to achieve gold medal level. We were able to achieve this through breakthroughs in parallel thoughts, multi-step reasoning, and novel reinforcement learning techniques. You can find Gemini’s solutions here: https://x.com/quocleix/status/1968361222849642929

An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the ICPC 2025 – one of the world’s most prestigious programming contests. 🏅 Building on the model’s success in math at the IMO, this marks another historic milestone for advanced AI. 🧵 https://x.com/GoogleDeepMind/status/1968361776321323420

Incredible milestone: an advanced version of Gemini 2.5 Deep Think achieved gold-medal performance at the ICPC World Finals, a top global programming competition, solving an impressive 10/12 problems. Such a profound leap in abstract problem-solving – congrats to @googledeepmind! https://x.com/sundarpichai/status/1968365605851218328

AI has officially beaten me at the ICPC World Finals. It reminds me of a rare ICPC skill: being able to quickly read a teammate’s code and spot bugs. This skill takes years to train, and explains why AI often makes coding slower (see arXiv:2507.09089). No matter how strong AI https://x.com/ZeyuanAllenZhu/status/1968568919482089764

amazing to get all 12 problems correct! https://x.com/sama/status/1968474300026859561

ICPC is a very hard and meaningful challenge: https://x.com/gdb/status/1968415631906324792

perfect score on the 2025 ICPC programming competition from our latest reasoning system: https://x.com/gdb/status/1968404060001968429

🔥 Genspark AI Browser now available on Windows and Mac! 💻 On-Device Free AI – The world’s first browser letting you choose from 169 AI models to run entirely on-device. No internet required, completely private, lightning fast, and totally free! Beyond traditional browsers – https://x.com/genspark_ai/status/1966109976944062893

Happy to land this data-efficient model! Our team is dedicated to building cutting-edge, efficient reasoning models. We are excited to release MobileLLM-R1, a series of sub-billion parameter reasoning models. Collaborating w/ @zechunliu, Changsheng Zhao et al. https://x.com/erniecyc/status/1966511167053910509

We have released small-scale reasoning models MobileLLM-R1 (0.14B, 0.35B, 0.95B) that are trained from scratch with just 4.2T pre-training tokens (10% of Qwen3), while its reasoning performance is on-par with Qwen3-0.6B. Thanks the three core contributors for their great work! https://x.com/tydsh/status/1967476530826854674

Meta MobileLLM-R1-140M, which can run 100% locally in your browser, no server inference required. Vibe coded a chat app powered by transformers.js in anycoder https://x.com/_akhaliq/status/1967460621802438731

Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training, https://x.com/zechunliu/status/1966560134739751083

Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the https://x.com/_akhaliq/status/1966498058822103330

GitHub has launched its own MCP Registry, each server is backed by a @github repository. It integrates with the main open-source registry by automatically displaying any servers that developers self-publish to the OSS MCP Community Registry. Currently, the installation feature https://x.com/_philschmid/status/1968221801999167488

Building towards age prediction | OpenAI https://openai.com/index/building-towards-age-prediction/

Why do AI models keep “hallucinating”? @OpenAI’s paper argues: Models aren’t broken. Training and benchmarks reward confident guesses over honesty. Proposed solutions: – Change benchmark scoring: not penalize models for “I don’t know” – Realign current leaderboards instead of https://x.com/TheTuringPost/status/1966638472854483129

Notion and GitHub MCPs, along with Gmail and Google Calendar integrations available to all Perplexity Pro users. Linear MCP, and Outlook connector available to Enterprise Pro customers. https://x.com/AravSrinivas/status/1968077082958991786

Perplexity Pro users can now connect their email, calendar, Notion, and GitHub to Perplexity. Enterprise Pro users can also connect Linear and Outlook. https://x.com/perplexity_ai/status/1967982962886291895

Read more about Perplexity Enterprise Max on our blog: https://x.com/perplexity_ai/status/1968707015389364335

🐻Qwen3-Next just dropped on Together AI 80B parameters, 3B activated. Two models: ⚡Thinking: Outperforms Gemini-2.5-Flash-Thinking on reasoning benchmarks 🧠Instruct: Matches 235B model performance on key tasks Available now via our API 🚀 https://x.com/togethercompute/status/1966932629078634543

I think the significance of this is under-appreciated: the assumption has often been that AI agents are brittle as one failure in a chain breaks a task But this paper shows smart models are self-correcting & that small gains in accuracy lead to exponential gains in task horizons https://x.com/emollick/status/1968365586628694101
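
One simple way to see why small accuracy gains can stretch task horizons so sharply: assume every step succeeds independently with probability p. This is a simplification made here, not necessarily the paper's model, and it ignores self-correction. The chance of finishing n steps is then p^n, so the longest task completed at least half the time is floor(ln 0.5 / ln p). The function name `horizon` is hypothetical.

```python
import math


def horizon(step_accuracy: float, target_success: float = 0.5) -> int:
    """Largest n with step_accuracy**n >= target_success,
    under the assumption that steps fail independently."""
    if not 0.0 < step_accuracy < 1.0:
        raise ValueError("step_accuracy must be strictly between 0 and 1")
    return math.floor(math.log(target_success) / math.log(step_accuracy))
```

Under this toy model, `horizon(0.9)` is 6 steps, `horizon(0.99)` is 68, and `horizon(0.999)` is 692: each tenfold cut in the per-step error rate lengthens the horizon roughly tenfold.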

Xcode 26 is now available on the Mac App Store. Sign in with your ChatGPT account and code with GPT-5 built-in. https://x.com/OpenAIDevs/status/1967704919487729753

So it looks like Claude got there first: an actually smart phone assistant that can take complex requests that involve both common sense and complicated constraints. It is still beta feeling though & I found I needed to use the bigger Opus model as Sonnet was not smart enough. https://x.com/emollick/status/1966170169367232556

1/7 We’re launching Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI’s Deep Research with only 30B (Activated 3B) parameters! Tongyi DeepResearch agent demonstrates state-of-the-art results, scoring 32.9 on Humanity’s Last Exam, https://x.com/Ali_TongyiLab/status/1967988004179546451

👏 Super proud of the @weaviate_io labs team! The Weaviate Query Agent is now in GA! Check out this demo notebook and some exciting customer stories: 📄 https://x.com/bobvanluijt/status/1968609785416196347

MetaBuddy boosted their user engagement by 3x and cut trainer analysis time by 60% – using Weaviate’s Query Agent. 𝗕𝘂𝘁 𝗳𝗶𝗿𝘀𝘁, 𝘄𝗵𝗮𝘁 𝗶𝘀 𝗠𝗲𝘁𝗮𝗕𝘂𝗱𝗱𝘆? MetaBuddy is a platform that connects fitness enthusiasts through interactive wellness experiences, events, and https://x.com/weaviate_io/status/1968691524318761165

One of the larger barriers to more people using agentic coding tools from the big AI companies to build their own small apps is that you have to go through GitHub to use them, a website that is nearly incomprehensible to most non-coders. https://x.com/emollick/status/1968108637882290550

Replit Agent 3 is no longer just for coders. After raising $478 million, Replit Agent is becoming the go-to agent framework for everything from apps to bots to automations. Here’s how top builders are using it 🧵 (save this + share with your team) https://x.com/AtomSilverman/status/1968069396925948079

Tested out @interaction Poke, an Agent that proactively interacts with you through iMessage. Poke orchestrates tasks by creating and communicating with specialized, temporary “agents”. The `send_message_to_agent` tool allows Poke to delegate tasks to new on the fly created agents https://x.com/_philschmid/status/1967245592947831086

This is wild! Don’t sleep on Replit Agent 3. It makes it extremely easy to vibe code AI automation workflows. Watch how I use it to build a workflow that tracks new Claude Code releases and sends Slack notifications. Zero code written! https://x.com/omarsar0/status/1966949907149058551

We’re excited to announce: The Weaviate Query Agent is now GA! WQA is a Weaviate-native agent that transforms natural language questions into precise database operations, giving you reliable, fully transparent results. It supports: • Dynamic filters • Smart routing across https://x.com/weaviate_io/status/1968336678751260748

🤝Adding human-in-the-loop to deep agents Many tools that you may want to give to agents will take actions in the real world. For these tools, you will often want to add “human-in-the-loop” steps – require a human user to approve, edit, or respond to their request to execute https://x.com/hwchase17/status/1967653399517925853
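
The approve / edit / respond pattern described above can be sketched generically in Python. This is not the deep agents or LangGraph API: `ToolCall`, `Decision`, `run_with_approval`, and the `ask_human` callback are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Set


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class Decision:
    action: str                                # "approve", "edit", or "respond"
    new_args: Optional[Dict[str, Any]] = None  # used when action == "edit"
    message: Optional[str] = None              # used when action == "respond"


def run_with_approval(
    call: ToolCall,
    tools: Dict[str, Callable[..., Any]],
    needs_approval: Set[str],
    ask_human: Callable[[ToolCall], Decision],
) -> Any:
    """Execute a tool call, pausing for a human decision on sensitive tools."""
    if call.name not in needs_approval:
        return tools[call.name](**call.args)
    decision = ask_human(call)
    if decision.action == "approve":
        return tools[call.name](**call.args)
    if decision.action == "edit":
        # Run the tool with the arguments the human supplied instead.
        return tools[call.name](**(decision.new_args or {}))
    if decision.action == "respond":
        # Skip execution and hand the human's message back to the agent.
        return decision.message
    raise ValueError(f"unknown action: {decision.action}")
```

Keeping the gate outside the tool itself means the same tool can run unattended in a sandbox and behind approval in production, simply by changing the `needs_approval` set.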

online RL is one of the most exciting directions for the field, and i’ve been incredibly impressed with Cursor being seemingly the first to implement it successfully at scale with a frontier capability. so cool! https://x.com/willdepue/status/1966876626169287035

We’re launching GLM Coding Plans with @Zai_org for Cline users. $3/month gets you 120 prompts per 5-hour cycle with GLM-4.5. $15/month gets you 600 prompts. Both plans give you frontier-level coding AI at a fraction of typical subscription costs. (details below) https://x.com/cline/status/1968820438156640490

A big issue with today’s agent implementation is that they don’t ask questions, even when the thinking trace says the AI believes more information is required. Many disappointing results would be solved by just asking for clarification when needed, especially as task time grows. https://x.com/emollick/status/1968339804975948274

The problem with the fact that the AI labs are run by coders who think code is the most vital thing in the world, is that the labs keep developing supercool specialized tools for coding (Codex, Claude Code, Cursor, etc.) but every other form of work is stuck with generic chatbots https://x.com/emollick/status/1967704853171638494

Major version bump! 🚨 We’ve been shipping our faces off. New today: 💥 Claude Code integration 🎉 AI Code reviews 📈 Code agent analytics 📦 Best-in-class sandboxes … Codegen is now the open platform for running code agents at scale. Excited to launch this to the world! https://x.com/mathemagic1an/status/1968341907316347352

You can now use custom commands in Cursor! We’ve seen these work particularly well for reusable prompts within our team. Cursor 1.6 also includes a faster and more reliable Agent terminal, support for MCP Resources, and a /summarize command. https://x.com/cursor_ai/status/1967990959645528195

@scorecardai @raziborsky @paige__eth @GoogleLabs prize: Markdown MCP by Andrew DiZenzo A faster more token efficient way to use MCP via just 3 tool calls. https://x.com/dariusemrani/status/1967496103424934320

Has there been any public documentation or discussion of Claude’s “skills” based approach to handling specialized tasks? Very “I know kung fu,” but with the AI as Neo. https://x.com/emollick/status/1967976023448051888

An agentic principal engineer that you can install as an MCP server. It can index any codebase and surface insights you didn’t even know about. https://x.com/svpino/status/1965855311451009254

MCP is coming to enterprise. Here are winners from the enterprise MCP hackathon in SF you need to see! 🧵 hosted by @zmzlois @TobinSouth @WorkOS @GoogleLabs @Vapi_AI @convex_dev @SmitheryDotAI https://x.com/dariusemrani/status/1967492478132715824

Seriously fun rabbit hole to go down trying all the different mode, voice, and style combinations for audio generation in Copilot Labs. Give it a try and drop your favorites in the comments – the funnier the better. https://x.com/mustafasuleyman/status/1965933341569618062

Notion 3.0 with Agents is out today! It’s the first Knowledge Work Agent in the world. It works with Notion databases. It can do multi-step actions and autonomous work up to 20+ mins, and a brand-new memory system (using Notion pages and databases! 👌) With 3.0, Notion isn’t https://x.com/ivanhzhao/status/1968761820241609063

Fiverr is going back to startup mode https://x.com/michakaufman/status/1967624550020985069

Introducing Notion 3.0 https://www.notion.com/blog/introducing-notion-3-0

🌟Our latest LangChain Academy course – Deep Agents with LangGraph – is now live!🌟 Many agents today follow the same simple pattern: run in a loop, call tools. That architecture works well enough, but it breaks down as tasks get more complex. Today, companies of all sizes – https://x.com/LangChainAI/status/1968708505201951029

WebWeaver: Structuring Web-Scale Evidence for Deep Research • Dual-agent framework (Planner + Writer) • Dynamic outlines: search ↔ refine ↔ search (human-like loop) • Memory-grounded, section-by-section synthesis → avoids long-context failures • SOTA across DeepResearch https://x.com/arankomatsuzaki/status/1968161793127416197

You can now use any open LLM as your coding assistant in VS Code with the @huggingface Provider for GitHub Copilot Chat. Just pick your fav open model and start building! Vibe-coding is all you need!? https://x.com/SergioPaniego/status/1968333964621578716

🚀 Kimi K2 Official Turbo API — 50% OFF for 30 days Code faster, ship sooner. Try it now: https://x.com/Kimi_Moonshot/status/1967829577037910427

Our engineer wrote about the thinking and technical story behind Checkpoint Engine. 👉 https://x.com/Kimi_Moonshot/status/1967923416008462785

Excited to release a preview of Moondream 3. A 9B param, 2B active MoE vision language model that makes no compromises; offering state-of-the-art visual reasoning while still retaining an efficient and deployment-friendly form factor. https://x.com/vikhyatk/status/1968800178640429496

when chatgpt said moondream wasn’t a frontier model, i took it personally https://x.com/vikhyatk/status/1968811248381784167

Say hi to https://x.com/interaction/status/1965093198482866317

GPT-5-Codex — big improvement for long-running agentic tasks: https://x.com/gdb/status/1967639750648750409

$ npm i -g @openai/codex $ codex -m gpt-5-codex https://x.com/OpenAIDevs/status/1967637842806624370

Codex is the most frustrating experience I’ve ever had trying to code with an LLM. I’d rather put all of my codebase into chatgpt app this is absolute trash so far. https://x.com/Teknium1/status/1967806788084064290

Currently serving gpt-5-codex at degraded speeds, about 2X, due to the very high demand. We are finding GPUs left and right to match demand, we’ll keep you posted https://x.com/thsottiaux/status/1967996885500928459

GPT-5 update found in Codex-CLI by @OpenAI The new AI model is called “gpt-5-high-new”. Its description says: “our latest release tuned to rely on the model’s built-in reasoning defaults”. It’s unclear what the improvements in this new model are. Screenshot by @iannuttall: https://x.com/mark_k/status/1966521489529643169

GPT-5-Codex already ~40% of traffic for codex! should be the majority some time today. https://x.com/sama/status/1967674950502015165

so codex -m gpt-5-codex works fine but codex -m gpt-5-codex — –reasoning-effort=high doesn’t? what?? “The model `gpt-5-codex` does not exist or you do not have access to” the model is good but the harness needs love https://x.com/finbarrtimbers/status/1968066956193595761

the vibes on codex feel like the first few months of chatgpt. fun energy! https://x.com/sama/status/1967954997754335680

We’ve reset limits for gpt-5-codex to make up for the slowdowns from earlier today. https://x.com/OpenAIDevs/status/1968168606828794216

Codex for creating an animated video as a React app: https://x.com/gdb/status/1967939123391631864

Ok seriously what is wrong with codex? I tried to let it do what it said it wanted and /init – make an agents markdown file. 25 minutes of <something> later and nothing produced? It ended up using 40,000 tokens for .. nothing.. wtf did I just download? Token usage: https://x.com/Teknium1/status/1967804542357217768

o1 Preview is exactly one year old. I still remember when o1 was still known by its project name Q*; it was a time when rumors were circulating that OpenAI had made a world-changing breakthrough that would change everything. There were concerns that this project posed a threat https://x.com/kimmonismus/status/1966627812858855624

o1-preview -> GPT 5 pro in a year https://x.com/gdb/status/1966612991421423814

OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3! https://x.com/PKirgis/status/1966547382033936577

OpenAI has finally fixed their SWEBench errors and we can now finally apples to apples compare their scores over the entire 500 sample set (the fact that it took this long says a lot about how much they care about SWEBench internally and maybe there’s a lesson here) https://x.com/nrehiew_/status/1967781400528245221

OpenAI just revealed that they have an internal unreleased SWE-bench-style benchmark for large ‘refactoring’ PRs, like the one mentioned here that edits 3.5k lines across 232 files. Their new model gets 51% accuracy on this benchmark. Who wants to make a public version of this? https://x.com/OfirPress/status/1967652031704994131

OpenAI’s Models Are Getting Too Smart For Their Human Teachers — The Information https://www.theinformation.com/articles/openais-models-getting-smart-human-teachers

GPT-5 is the best model for code quality out there. 2 years ago, we created the world’s hardest software design quiz. Only 5 questions, multiple choice. Yet only about 3% of software engineers get them. The average score is somewhere between 2 and 3. Supposedly brilliant models https://x.com/jimmykoppel/status/1968683689421701413

Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE. https://x.com/_avichawla/status/1967476110285021213

Agentic RAG takes RAG beyond “retrieve then answer.” With AI agents that choose tools, adapt strategies, and critique outputs, it’s the smarter way to build LLM apps. https://x.com/n8n_io/status/1963927630043807862

Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables, https://x.com/sayashk/status/1966550402129592738

For AI devs: The main takeaway is that a simple RL recipe, plus smart context and length management, can unlock single-agent research abilities that rival multi-agent scaffolds. If you’re building agents, consider: – Limiting tools to force strategy learning. – Normalizing RL https://x.com/omarsar0/status/1966900784844730562

ReSum: Long-Horizon Web Agents Without Context Limits • Problem: ReAct hits context limits in long searches (32k tokens) • Solution: ReSum periodically compresses history → compact reasoning states • ReSumTool-30B: specialized summarizer extracts key evidence & gaps • https://x.com/arankomatsuzaki/status/1968161796642279549

Software engineers shouldn’t fear being replaced by AI. They should fear being asked to maintain the sprawling mess of AI-generated legacy code their employer’s systems will soon run on. Because that one will actually happen. https://x.com/fchollet/status/1968125424141287903

Amazon unveils new agentic AI-powered Seller Assistant to help independent sellers grow https://www.aboutamazon.com/news/innovation-at-amazon/seller-assistant-agentic-ai

AWS released an open-source framework that lets you orchestrate multiple AI agents and handle complex conversations. https://x.com/LiorOnAI/status/1964403536067743764

If you’re trying to build an agent, the Claude Code SDK is by far the best place to start. It’s the same agentic harness that Claude Code CLI uses but you can swap out the prompts/tools and completely customize it to your needs. https://x.com/alexalbert__/status/1966601430808088596

As you may have seen, the @github MCP registry is now LIVE! But how do you use it in @code? Checkout this video by @liamchampton to see how you can access the MCP servers live in the registry today 👇 https://x.com/code/status/1968122206837178848

Browse and install MCP servers from the new @GitHub MCP registry directly within @code 🙂 Available today in VS Code Insiders https://x.com/pierceboggan/status/1968173615070969875

New version of VS Code chat automatically selects the appropriate LLM model based on your request and rate limits. It automatically chooses between these models: Claude Sonnet 4, GPT-5, GPT-5 mini, GPT-4.1, Gemini Pro 2.5. Handy. https://x.com/housecor/status/1966429828808352233

The power of MCP explained in one picture! Without MCP: – Every LLM app wrote its own tool integration – M apps & N tools = M×N integrations With MCP: – Create an MCP server for your tool and plug it into an LLM app – You go from M×N integrations to M+N integrations https://x.com/_avichawla/status/1966751224356892769
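
The M×N versus M+N arithmetic above is easy to check with concrete numbers. The function names below are hypothetical and the counts are illustrative, not taken from the post.

```python
def integrations_without_mcp(apps: int, tools: int) -> int:
    # Every app needs its own connector to every tool.
    return apps * tools


def integrations_with_mcp(apps: int, tools: int) -> int:
    # Each app implements one MCP client; each tool ships one MCP server.
    return apps + tools
```

For example, with 5 apps and 20 tools, `integrations_without_mcp(5, 20)` gives 100 bespoke connectors, while `integrations_with_mcp(5, 20)` gives 25 components to build and maintain.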

We are rolling out experiments to provide larger context windows in VS Code Insiders, now at 200k for GPT-5 and Claude Sonnet 4. Thank you @code community for the feedback! https://x.com/pierceboggan/status/1967991280006566102

We heard your feedback on the Claude Code SDK – we’ve added code references, custom tools, and hooks support. It’s never been faster to build your own agents! https://x.com/_catwu/status/1966943489759080940

Agent SDK overview https://x.com/alexalbert__/status/1966601435153388019

LiveMCP-101: This paper introduces LiveMCP-101, a novel real-time evaluation framework with a benchmark designed to stress-test agents on complex, real-world tasks. It moves beyond the mock data and synthetic environments of previous works. More notes ↓ https://x.com/omarsar0/status/1966525731082768782

Frontier Models Struggle: The results are revealing: even the most advanced LLMs achieve a task success rate below 60%. Performance degrades substantially as task difficulty increases, with the top model, GPT-5, scoring only 39.02% on hard tasks. https://x.com/omarsar0/status/1966525793586360384

A sign of a company being advanced in AI adoption was that they built their own internal chatbot using APIs. As the major labs’ chatbots become agentic, bringing together many tools in a single interface & as they add memory & projects, the custom API chatbots are falling behind https://x.com/emollick/status/1967621099853946926

Optimize work and maximize outcomes with the next generation of AI Companion for Zoom Workplace | Zoom – Zoom https://news.zoom.com/ai-companion-3-0-and-zoom-workplace/

We did Google I/O back to back years with NotebookLM and both nights before the show I literally didn’t sleep. The first year of I/O (2023) we were going to live demo Notebook (Project Tailwind) and announce it to the world. Google Labs was fairly unknown and this was our one https://x.com/raizamrtn/status/1968508322329575452

tldraw canvas agent starter kit is dropping today https://x.com/tldraw/status/1968655029247648229

🚀 Now in public preview: #GitHub Copilot in #VSCode can automatically select the best #AI model for your task. Smarter, faster, and more reliable coding—no manual switching needed. Full post here: https://x.com/amandaksilver/status/1967788045488492604

Copilot Chat comes to the Microsoft 365 apps | Microsoft Community Hub https://techcommunity.microsoft.com/blog/microsoft365copilotblog/copilot-chat-comes-to-the-microsoft-365-apps/4453349

We’ve heard your feedback that GPT-5 Thinking can sometimes take longer than you’d like. Now Plus, Pro, and Business users can set the pace to match the moment. Select GPT-5 with Thinking in ChatGPT on web to toggle thinking time in the message composer. – Plus, Pro, Business https://x.com/OpenAI/status/1968395215536042241

When we at @OpenAI released o1-preview a year ago, it would think for seconds. Today, our best reasoning models can think for hours, browse the web, and write code. But there’s a lot of room to push reasoning even further. I’m excited for what the next year will bring! https://x.com/polynoamial/status/1966527147469598794

⚡️Ling-flash-2.0⚡️ is now open source. 100B MoE LLM • only 6.1B active params --> 3x faster than 36B dense (200+ tok/s on H20) --> Beats ~40B dense LLM on complex reasoning --> Powerful coding and frontend development Small activation. Big performance. https://x.com/AntLingAGI/status/1968323481730433439

🆙Qwen Code v0.0.10 & v0.0.11 bring new features and dev-friendly improvements: ✨New UX & Productivity · Subagents for smarter task decomposition · Todo Write tool for task tracking · “Welcome Back” project summary on reopen! · Customizable cache Strategy ⚡Performance & Dev https://x.com/Alibaba_Qwen/status/1966451235328008563

Paper2Agent brings research papers ‘to life.’ This open tool from @Stanford transforms static papers into interactive AI assistants that can explain and apply their methods. It builds on the MCP and works in 2 layers: – Paper2MCP: Extracts the paper’s methods and code into an https://x.com/TheTuringPost/status/1968829219858956774

Why Agents Fail: The paper provides a fine-grained failure analysis, identifying seven common error types: ignoring requirements, overconfident self-solving, unproductive thinking, wrong tool selection, syntactic errors, semantic errors, and output parsing errors. Paper: https://x.com/omarsar0/status/1966525809302417436

1. Model agnostic. 2. Inference agnostic. & now, 3. Platform agnostic. Cline for JetBrains is here. (install it below) https://x.com/cline/status/1968360125686759505

This is Ray3. The world’s first reasoning video model, and the first to generate studio-grade HDR. Now with an all-new Draft Mode for rapid iteration in creative workflows, and state of the art physics and consistency. Available now for free in Dream Machine. https://x.com/LumaLabsAI/status/1968684330034606372