Agents and Copilots: AI News Week Ending 09/12/2025

September 12, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Agents, oceanfront balcony table with travel planner notebook, notebook cover embossed with interconnected node diagram, gentle sea breeze, soft morning light, photorealistic, editorial, minimal, landscape, vacation, no text overlays

Adobe Announces General Availability of AI Agents for Businesses to Transform Customer Experience Orchestration https://news.adobe.com/news/2025/09/adobe-announces-general-availability-ai-agents

The problem with using SaaS vendors with their own AI solutions is that their incentives are to use cheap models, as little reasoning as possible & to stick with outdated prompting & RAG strategies rather than updating them as AI improves. Not all vendors succumb to temptation, many do. https://x.com/emollick/status/1965204136984805399

ByteDance Seed presents AgentGym-RL • First unified RL framework for multi-turn agent training (no SFT) • Modular, extensible design across web, search, games, embodied & science tasks • Agents rival/surpass commercial models on 27 task https://x.com/arankomatsuzaki/status/1965979980971782414

📊 @Kimi_Moonshot’s K2-0905 on @GroqInc scored 7th overall at 94% on Roo Code evals, the 1st open-source model to break the 90+ barrier. It’s also the fastest and cheapest in the top 10, while holding its own on accuracy. View the full leaderboard: https://x.com/roo_code/status/1965098976677658630

It feels the coding agent frontier is now open-weights: GLM 4.5 costs only $3/month and is on par with Sonnet. Kimi K2.1 Turbo is 3x speed, 7x cheaper vs Opus 4.1, but as good. Kimi K2.1 feels clean. The best model for me. GPT-5 is only good for complicated specs; too slow. https://x.com/Tim_Dettmers/status/1965021602267217972

Kimi K2 0905 upgrade: Substantial improvement in agentic capabilities, modest change in overall intelligence Key takeaways: ➤ Intelligence increased +2 pts in our Artificial Analysis Intelligence Index ➤ Agentic capabilities substantially improved as shown by our two new https://x.com/ArtificialAnlys/status/1965010554499788841

🚨 Leaderboard Disrupted! Two new models have entered the Top 10 Text leaderboard: 🔸#6 Qwen3-max-preview (Proprietary) by @Alibaba_Qwen 🔸#8 Kimi-K2-0905-preview (Modified MIT) by @Kimi_Moonshot tied with 7 others. Note that this puts Kimi-K2-0905-preview in a tight race for https://x.com/arena/status/1965115050273976703

Gemma 3n now available in the Play Store for on-device, internet free with speech, text and image input! Open local AI Assistants are coming to everyone! 🤯 – New on-device speech-to-text and speech-to-translated-text. – Process audio batch inference for clips up to 30 seconds. https://x.com/_philschmid/status/1965742109157188031

You asked, we shipped! Scripted mode just dropped for audio generation in Copilot Labs (c/o our new MAI-Voice-1 model). Scripted mode: reads your input verbatim Emotive: riffs a bit for max drama Story: performs multiple voices/characters Try out all 3 ➡️ https://x.com/mustafasuleyman/status/1965825187393511565

Claude’s new ability to work with Excel files is the best I have seen so far I have given it existing spreadsheets to work with and asked it to create new ones. Good use of formatting, formulas, etc. It created all of this, including 406 formulas, from one prompt (& its solid). https://x.com/emollick/status/1965608685297922315

AI agents can finally talk to your frontend! The AG-UI Protocol bridges the critical gap between AI agents and frontend apps, making human-agent collaboration seamless. MCP: Agents to tools A2A: Agents to agents AG-UI: Agents to users 100% open-source. https://x.com/akshay_pachaar/status/1963945302991450272

Google Nano Banana 🍌 is crazy good at static ads… But it only generates one image at a time. This n8n AI Agent helps you generate 1000s of winning ad variations in minutes, fully automated. → Built with the latest Nano Banana image model → Creates static ad images in https://x.com/mikefutia/status/1963967610611003671

We’re adding a web fetch tool to the Anthropic API 🪃 Using the web fetch tool, Claude will fetch and analyze content from any webpage URL—no additional infrastructure needed. https://x.com/alexalbert__/status/1965809009795153955

We’ve (finally) added full support for MCP tools in ChatGPT. In developer mode, developers can create connectors and use them in chat for write actions (not just search/fetch). Update Jira tickets, trigger Zapier workflows, or combine connectors for complex automations. https://x.com/OpenAIDevs/status/1965807401745207708

Announcing CSVToChat – a data analyst agent! Chat with any CSV to ask questions, run analysis, and generate charts with your data. 100% free and open source. https://x.com/nutlope/status/1963987722705473980

The Math Inc. team is excited to introduce Gauss, a first-of-its-kind autoformalization agent for assisting human expert mathematicians at formal verification. https://www.math.inc/gauss

🚀 v1.104 of @code is here! Check out what’s new: 🤖 Improved coding agent integration 📄 https://x.com/code/status/1966145747566375215

🚀 We just dropped the codebase for our Jupyter Agent Dataset! 📊 Downloaded 7TB of Kaggle Notebooks & Datasets, creating 0.2B agentic training traces to empower LLMs to craft/edit Jupyter notebooks. 🧠 Check out our blog for the full scoop! 🔍 https://x.com/_BaptisteColle/status/1965782453492326494

🛠️ AI devs, your @Firebase workflow just got a serious upgrade. Postman’s #MCP for Firebase now lets AI tools: 🔍 Query Firestore 📊 Generate Data Connect schemas 🤖 Talk to Gemini about Firebase 🧰 Manage + observe your app 🛡️ Validate security rules ⚙️ Provision services https://x.com/getpostman/status/1963301115534962992

A couple of days ago, we crossed 5000 published templates in our Template Library. 🎉 As of today, that number is already at 5191. That’s 5K ways to save time, learn from others, and share your own automations with the world. Thank you all! 🔗 Check out our Template Library https://x.com/n8n_io/status/1963508532138909946

Above all, the thing I’m most proud of is our team. We were thrilled to bring on Windsurf’s world-class team and welcome two of our early investors, Christian Lawless (Conversion Capital, Founder & General Partner) and Emily Cohen (Neo, Partner), who made the high-conviction https://x.com/cognition/status/1965086661253185645

Back in May @zeeg opened an issue on the @code repo pointing out the rules/instructions situation was a bit out of control. Today we announced that @code officially supports https://x.com/burkeholland/status/1966168396636238194

How Monte Carlo is transforming data troubleshooting with LangGraph + LangSmith As a leading data + AI observability platform, @montecarlodata has built an AI Troubleshooting Agent that can launch hundreds of sub-agents to investigate data issues in parallel— helping enterprises https://x.com/LangChainAI/status/1966147004175888845

one of my agents evolved into a bit of a digital micromanager https://x.com/TGUPJ/status/1963673625560588767

RLFactory: a plug-and-play RL framework for LLM tool use • Async tool calls (faster, 6.8× throughput) • Decoupled training & environment (low setup cost) • Flexible reward design (rule, model, tool-based) • Outperforms bigger models (Qwen3-4B > Qwen2.5-7B) https://x.com/arankomatsuzaki/status/1965613896007647245

Today we are releasing the full data and training pipeline for Jupyter Agent! We hope it serves as blueprint to generate high-quality synthetic agentic traces for other domains, too! You can read all about it in the step-by-step walkthrough here: https://x.com/lvwerra/status/1965786893536342346

We’ve trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL. https://x.com/cursor_ai/status/1966264815175049526
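
The two Tab figures above can be combined into a rough estimate of how the total number of accepted suggestions changed. This is a back-of-the-envelope sketch, not Cursor's own calculation: it assumes the 28% is a relative increase in accept rate (not percentage points) and that both figures are measured over the same baseline. The function name `relative_accepted` is hypothetical.

```python
def relative_accepted(suggestion_change: float, accept_rate_change: float) -> float:
    """Ratio of accepted suggestions (new model / old model).

    accepted = suggestions * accept_rate, so relative changes multiply.
    Both arguments are fractions, e.g. -0.21 for "21% fewer".
    """
    return (1 + suggestion_change) * (1 + accept_rate_change)


# Figures quoted in the post: 21% fewer suggestions, 28% higher accept rate.
ratio = relative_accepted(-0.21, 0.28)
print(f"accepted suggestions vs. previous model: {ratio:.4f}x")
```

Under those assumptions the product is about 1.01, so roughly the same number of suggestions end up accepted while noticeably fewer are shown.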

When we started Cognition, we were a small group of engineers who shared a lifelong love of coding and more than a decade of friendship. We hunkered down in a New York apartment and built the product we always wanted for ourselves. While many things have changed since then, the https://x.com/cognition/status/1965086662612177299

AI Agents suck at long-horizon tasks. AgentGym-RL aims to train strong LLM agents with long-horizon capabilities. Finds that post-training and test-time compute scale better than model size alone for agentic tasks. Leads to 7B models that beat much larger systems. My notes: https://x.com/omarsar0/status/1966167111681921451

If you follow me you know that I love Claude Code and I probably changed my life. Been wondering why is leagues ahead of all coding agents before it… so I spent some time digging under the hood. TAKEAWAY: “Simple is better than complex. (my favorite line from the Zen of https://x.com/imjaredz/status/1965083721713041564

mcp support in chatgpt: https://x.com/gdb/status/1965810388966248652

Warp Code launched yesterday — here’s what’s new: – Top coding agent: #3 SWE-bench, 52% Terminal-Bench – Built-in code review – Native editor – Slash Commands, Project Rules, and more We’re already seeing millions more lines of code shipped through Warp. https://x.com/warpdotdev/status/1963683282538688694

ok life update: i’ll be joining @Cognition! • Cog just went from 0 to $10b in 2 years • Net burn $20m in company history • Avg successful Devin impl sees >5x growth, $1.5m/yr customer expanded >10x in 8 months (not typo) • Windsurf x Cog cross sell going great, looking fwd https://x.com/swyx/status/1965183110016098617

Replit Closes $250 Million in Funding to Build on Customer Momentum https://replit.com/news/funding-announcement

Scott’s plans for the $400M we raised: https://x.com/cognition/status/1965185627357683776

We’ve raised over $400M at a $10.2B post-money valuation to advance the frontier of AI coding agents. The round was led by Founders Fund with other existing investors including Lux, 8VC, Neo, Elad Gil, Definition Capital, and Swish VC all doubling down. We’re also joined by new https://x.com/cognition/status/1965086655821525280

Really thoughtful post from the legendary @swyx about why he joined Cognition. He has an amazing knack for predicting the future and writing about it clearly. Welcome to the team! https://x.com/russelljkaplan/status/1965214122699882944

🤗 Use Hugging Face Inference Providers with GitHub Copilot Chat in VS Code https://huggingface.co/docs/inference-providers/en/guides/vscode

AI agents can prototype apps… But shipping real software takes hours of testing, debugging, and refactoring. Agent 3 is 10× more autonomous — it keeps going where others get stuck. The “Full Self-Driving” moment of software. https://x.com/amasad/status/1965800350071590966

Most AI websites all look the same… but yours don’t have to 😁 Here are 10 design styles you can steal to make your next build stand out: Neumorphism → soft embossed surfaces Glassmorphism → frosted glass vibes Skeuomorphism → textures that feel too real Flat design → no https://x.com/boltdotnew/status/1963688815232721048

Introducing AI Key, a small device that lets AI control your entire phone. just plug it in and ask it to complete a task. pre-order now. https://x.com/adamcohenhillel/status/1962922020704027040

MBZUAI and G42 Launch K2 Think: A Leading Open-Source System for Advanced AI Reasoning https://www.prnewswire.com/news-releases/mbzuai-and-g42-launch-k2-think-a-leading-open-source-system-for-advanced-ai-reasoning-302551074.html

⚡️ Efficient weight updates for RL at trillion-parameter scale 💡 Best practice from Kimi @Kimi_Moonshot vLLM is proud to collaborate with checkpoint-engine: • Broadcast weight sync for 1T params in ~20s across 1000s of GPUs • Dynamic P2P updates for elastic clusters • https://x.com/vllm_project/status/1965824120920342916

Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅ https://x.com/Kimi_Moonshot/status/1965785427530629243

Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video: 1. DeepSeek V3/R1 2. OLMo 2 3. Gemma 3 4. Mistral Small 3.1 5. Llama 4 6. Qwen3 7. SmolLM3 8. Kimi 2 9. GPT-OSS 10. Grok 2.5 11. GLM-4.5 https://x.com/rasbt/status/1965798055141429523

Stop building AI agents that ignore your instructions. This Python framework guarantees LLM Agents follow your rules in production. Every single time. 100% Opensource. https://x.com/Saboo_Shubham_/status/1963428564398932074

I’m surprised Agentic RAG is not getting more attention. That’s all about to change. Here’s why: https://x.com/omarsar0/status/1965115682322042954

💁‍♂️ Introducing human-in-the-loop Middleware! 💵 Tool calls can be risky and expensive! For certain tool calls you may want to get user feedback to approve, deny, or modify them before execution. Our new middleware provides an easy off-the-shelf way to build this into your https://x.com/sydneyrunkle/status/1966184060360757340
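
The approve / deny / modify pattern described above can be sketched in plain Python. This is a minimal illustration of the idea only, not the middleware's actual API: `ToolCall`, `run_with_approval`, the `risky` set, and the example `refund` tool and `cap_reviewer` are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set, Tuple


@dataclass
class ToolCall:
    """A pending tool invocation proposed by the agent (hypothetical type)."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


# A reviewer returns ("approve", None), ("deny", None) or ("modify", new_args).
Decision = Tuple[str, Optional[Dict[str, Any]]]


def run_with_approval(
    call: ToolCall,
    tools: Dict[str, Callable[..., Any]],
    reviewer: Callable[[ToolCall], Decision],
    risky: Set[str],
) -> Any:
    """Execute a tool call, pausing for review when the tool is marked risky."""
    if call.name in risky:
        action, new_args = reviewer(call)
        if action == "deny":
            return f"denied: {call.name}"
        if action == "modify":
            if new_args is None:
                raise ValueError("modify requires replacement arguments")
            call = ToolCall(call.name, new_args)
        elif action != "approve":
            raise ValueError(f"unknown decision: {action}")
    return tools[call.name](**call.args)


# Hypothetical tool and reviewer: cap any refund above 100.
tools = {"refund": lambda amount: f"refunded {amount}"}


def cap_reviewer(call: ToolCall) -> Decision:
    if call.args["amount"] > 100:
        return ("modify", {"amount": 100})
    return ("approve", None)
```

In a real deployment the reviewer would block on user input rather than apply a fixed rule; the point of the sketch is only that the check sits between the model's proposed call and its execution.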

Today we want to share a hot & impressive debate: Why do today’s AI Agents often feel all hype, no results? 🧠 Zhihu mind explorer, Prof. 俞扬 from Nanjing University explains: LLM-based Agents ≠ LLMs themselves: LLMs focus on generation/prediction, while agents focus on https://x.com/ZhihuFrontier/status/1964928650081698167

👨‍🔬 How to turn Claude Code into a domain specific coding agent We ran experiments to determine the best methods for running agents like Claude Code on domain specific tasks, such as writing LangGraph code. In this blog post we dive into different techniques, and show the best https://x.com/LangChainAI/status/1966184074755846207

Introducing the MCP Registry | mcp blog https://blog.modelcontextprotocol.io/posts/2025-09-08-mcp-registry-preview/

Nice! No more hunting for servers like we’re in 2005. https://x.com/fdaudens/status/1965207738189054317

We’ve asked ourselves for a while – “how do you get claude code to be good at writing langgraph code?” One of our summer interns (Aliyan) set out to try to quantify this Most interesting finding to me – a good https://x.com/hwchase17/status/1966186630521479288

.@_carlosejimenez just merged a PR that fixes the SWE-bench bug that allowed agents to ‘look into the future’. Our analysis showed that this bug was only exploited by a few agents a handful of times. Version update coming soon with a bunch of other extra things! https://x.com/OfirPress/status/1965978758336163907

Today I am introducing @EragonAI , your AI Operating System for work. Our goal is simple: Build the AI that can help you run your company from idea to IPO. In order to truly fix work, we need to solve the root problem. We all use tons of business apps from ERP to CRM to https://x.com/joshua_sirota/status/1963640704300650924

Microsoft just released a free AI Agent training for beginners. No paywall. No coding required. This could change how millions learn AI. Here’s what’s inside ↓ https://x.com/sukh_saroy/status/1963755753359184195

This is the best beginner-friendly course on AI agents! AI Agents for Beginners is a 12-lesson course that will teach you everything you need to get started with building AI agents. It’s free and available on Github. https://x.com/Sumanth_077/status/1963960472627298496

Agent Lab from @Arva_AI is a platform for building and monitoring self-improving AI agents in financial crime compliance. Global banks & fintechs are already using it to scale agents for sanctions screening and EDD — handling 100k+ alerts monthly. https://x.com/ycombinator/status/1963650767950131583

We challenged ourselves to build the cleanest, highest-signal factuality benchmark out there. Today, we’re releasing the result: SimpleQA Verified ✅🥇 On this more reliable, 1,000-prompt eval, Gemini 2.5 Pro establishes a new SOTA, outperforming other frontier models. We’re https://x.com/lkshaas/status/1965799946621202719

Notte (@nottecore) is a platform to build and deploy web AI agents that don’t break. It turns messy, repetitive web tasks into reliable, self-healing workflows so businesses can confidently scale automation. https://x.com/ycombinator/status/1963980445915238677

codex team continues to ship fast https://x.com/gdb/status/1964396487372394631

PSA: codex has web search, you have to explicitly enable it with the --search flag
https://x.com/gdb/status/1964787043932021032

ChatGPT Developer mode: Full MCP client access for connectors and tools. Available now for Plus and Pro users. Get started: https://platform.openai.com/docs/guides/developer-mode

ChatGPT Developer Mode: Full MCP client access | Hacker News https://news.ycombinator.com/item?id=45199713

ChatGPT finally has MCP support 🔥 https://x.com/victormustar/status/1965813896360632692

Interesting example of how open weights models provide opportunities for innovation. Salesforce builds a strong deep research agent from OpenAI’s small open source model. Though open models development is dependent on the good will of OpenAI, Mistral & a few Chinese firms. https://x.com/emollick/status/1965735119307817245

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
https://agentgym-rl.github.io/