Agents and Copilots: AI News Week Ending 08/22/2025

August 22, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Chicago newsroom at dawn, L-train rattling past the windows, assignment board humming; the word “Agents” lettered on a glass newsroom door in bold grotesque sans; autonomous assistant terminals route field stories with crisp dashboards; energetic, editorial realism, high detail, soft winter light

Today, we’re bringing agentic capabilities to AI Mode in Search for Google AI Ultra subscribers. But… what is actually different? Let’s say you want to make a dinner reservation. Traditionally, that would require multiple searches, concurrent tabs, and a lot of manual https://x.com/GoogleAI/status/1958561117833228705

AI agent that taps and types on your iPhone across apps https://x.com/tom_doerr/status/1955655887684591829

🚨 BIG NEWS: MICROSOFT INTRODUCES THE AGENTIC WEB 🚨 AI is leveling up! Microsoft is leading the charge with the Agentic Web, where AI agents work together across individuals, teams, businesses, and entire ecosystems. Learn more clicking thread below >> https://x.com/MicrosoftLearn/status/1924588000207634800

AI Mode in Google Search adds personalization, agentic features https://blog.google/products/search/ai-mode-agentic-personalized/

Firecrawl /v2 is here! I got early access, and it’s hands down one of the most advanced search APIs. Context engineering for AI agents just got a whole lot simpler. Search the web, news, and images all in one shot. It’s impressive for building advanced deep research agents. https://x.com/omarsar0/status/1957837839405920282

PSA It’s a new era of ergonomics. The primary audience of your thing (product, service, library, …) is now an LLM, not a human. LLMs don’t like to navigate, they like to scrape. LLMs don’t like to see, they like to read. LLMs don’t like to click, they like to curl. Etc etc. https://x.com/karpathy/status/1914494203696177444

American companies are losing market share to Chinese open-source companies! Anthropic’s coding market share on OpenRouter went from 46% in July down to 32% in a month. The reason for it? Qwen3-Coder https://x.com/scaling01/status/1956858471682617553

Zapier customers ran 50 million AI tasks in 20 days‼️ In fact, more than 25% of all Zapier AI tasks EVER happened in the last two months. We’re now past 300M AI tasks (323M all-time since Jan ‘23 👇) https://x.com/wadefoster/status/1955732118912500165

@karpathy now API stands for AI Prompt Interface https://x.com/Yuchenj_UW/status/1914495349164851457

Bring AI to your formulas with the COPILOT function in Excel https://techcommunity.microsoft.com/blog/microsoft365insiderblog/bring-ai-to-your-formulas-with-the-copilot-function-in-excel/4443487?ocid=usoc_TWITTER_M365_spl100008458718741

With the Conversations API, you can now store context from Responses API calls (messages, tool calls, tool outputs, and other data). Easily render past chats, then let your users pick up where they left off (just like in ChatGPT). https://x.com/OpenAIDevs/status/1958660224019247176

Two big updates to the Responses API today. 🖇️ Connectors — Pull context from Gmail, Google Calendar, Dropbox, and more in a single API call. 💬 Conversations — Persist chat threads for your users, without running your own database. More below: https://x.com/OpenAIDevs/status/1958660207745409120

Update: Perplexity now serves over 300M user queries every week. 3x growth in approx 9 months from the time we hit 100m weekly queries. https://x.com/AravSrinivas/status/1957943423539040566

the top things we’re seeing people use @snowglobe_so / simulations since launch: – training data generation. by far the biggest cohort – bootstrapped eval data for early lifecycle dev work – pre launch safety testing – understanding / enumerating user behavior trajectories https://x.com/ShreyaR/status/1958811497196659207

🚨Thrilled to share our latest progress on Computer Use Agent, ComputerRL, an end-to-end RL method which achieves 48.1% success rate on OSWorld Benchmark with only 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models, state-of-the-art performance. https://x.com/ShawLiu12/status/1958212802956742990

It’s been exciting to help support @cartesia_ai on Line, their new code-first voice agent platform — solving how they instantly cold start voice agents at scale. ⚡️ Here’s a fun one we built that tells you how to get started on Modal – call +1(626)746-1433 🤙 https://x.com/modal/status/1957865381613224050

Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-in-class products are built in code. ▶️ Watch us build an advanced voice agent with background reasoning in just minutes. https://x.com/cartesia_ai/status/1957862421667664216

Line, our new voice agent platform, launches today. Building voice agents is really really hard for developers. These agents need to be fast, intelligent, accurate, controllable, scalable. They’re a front door to customers and users. We’ve built this first version of Line as a https://x.com/krandiash/status/1957863360730657200

Congrats to the team @GetToplinePro on their $27M Series B! Topline Pro helps the 2.5M skilled tradespeople in the US succeed by acting as their digital team—driving requests, booking jobs, and handling admin so they can focus on their craft. Trusted nationwide, they’ve helped https://x.com/ycombinator/status/1956029337142222991

Introducing ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. https://x.com/Zai_org/status/1958175133706891613

Built a prank call agent with my own voice in minutes – the possibilities are endless 🚀 https://x.com/rohan_tib/status/1957864976582078949

GitHub just got less independent at Microsoft after CEO resignation | The Verge https://www.theverge.com/news/757461/microsoft-github-thomas-dohmke-resignation-coreai-team-transition

We are super excited to release OpenCUA — the first from 0 to 1 computer-use agent foundation model framework and open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data. 🔗 [Paper] https://x.com/xywang626/status/1956400403911962757

Ant Group just released UI-Venus on @huggingface It’s a native UI agent achieving SOTA in grounding & navigation tasks from just screenshots. Turns screenshots into reliable clicks and plans using small data and reinforcement fine-tuning. The usual way, supervised fine https://x.com/rohanpaul_ai/status/1956777729304711639

1/ XBOW Unleashes GPT-5’s Hidden Hacking Power. @OpenAI’s initial assessment of GPT-5 showed modest cyber capabilities. But when integrated into the XBOW platform, we saw a completely different story: performance more than doubled. More on what we found: 🧵 https://x.com/Xbow/status/1956416634173964695

Gpt Oss News Agent – a Hugging Face Space by fdaudens https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent

Open-source, self-hostable browser automation library for AI agents; build agents to navigate sites, fill forms, click, and extract info, 90.4% on Web Voyager https://x.com/tom_doerr/status/1955640654085632485

We’re working on something called SuperMemory for all Perplexity users. It’s in the final stages of testing. Early tests suggest it’s working much better than anything else out there. https://x.com/AravSrinivas/status/1958226686442664092

CHROME EXTENSIONS DEV IS DEAD we built Lovable for extensions. – you describe what it will do – our ai agent analyzes the target website – builds a 1-shot working chrome extension check out how it builds an auto-expander for X: https://x.com/AmirZak6/status/1954232161130680834

Kernel (@onkernel) provides Crazy Fast Browser Infrastructure. Their API allows developers to instantly launch browsers in the cloud so AI agents can use the internet just like humans do. https://x.com/ycombinator/status/1955660623821390089

Announcing Open Lovable 🔥 We’ve built an open-source AI web app builder that can transform any website URL into a working, editable clone, giving you a foundation to build on instantly. All powered by @GroqInc, @e2b, and Firecrawl. https://x.com/firecrawl_dev/status/1955660448587735393

🔍New in LangGraph Studio: Trace mode View your LangSmith traces in real time right inside Studio. Annotate runs, and add them to datasets or annotation queues, bringing the power of LangSmith tracing directly into your workflow. Debug faster and dig deeper, without any of the https://x.com/LangChainAI/status/1956411858312949946

🧠🤖Deep Agents — now in JavaScript! Simple tool-calling loops break down on long-horizon or intricate problems. Deep Agents, like Deep Research, Claude Code & Manus, chain reasoning, adapt plans, and juggle tools to get results. Now you can build your own using our JS https://x.com/LangChainAI/status/1957478324554395998

🚀 @SkySQL just cracked the code on hallucination-free SQL generation. Using @LlamaIndex, they built AI agents that turn natural language into accurate SQL queries across complex database schemas. Key wins: ✅ Zero hallucinated queries ✅ Faster development cycles ✅ Seamless https://x.com/llama_index/status/1955660667798933672

Access it with the Cline provider https://x.com/cline/status/1958017104369840500

ADHD “second brain” with n8n. GitHub link now live. A little while back, we shared a story about someone using n8n to manage the everyday struggles of ADHD with small, gentle automations. Now, they’ve put all their workflows on GitHub so anyone can check them out and start https://x.com/n8n_io/status/1955585194724839924

Changelog: https://x.com/cline/status/1956383188089221370

Docs: https://x.com/cline/status/1957670675415724284

Fun Fact: Responses API was built specifically for a world where models would be doing complex tasks with many tool calls – glad to hear the folks at @augmentcode are getting the most out of it! https://x.com/sherwinwu/status/1957659638834593831

High-end motion website fully built with @lovable_dev 🔥 Crazy what you can create with good taste. Shoutout to @viktoroddy for bringing this to life. Live preview in the comments. https://x.com/felixhhaas/status/1954907347777974710

How do we measure AI fluency at Zapier? Here are some role-by-role examples 🧵 https://x.com/wadefoster/status/1930680089651425452

I’ll be representing the @cline team at the @cerebral_valley AI Fintech Hackathon this weekend, August 23rd and 24th. Participants will receive Cline Credits, winners will receive MORE Cline Credits! If you’re in town come say hi! https://x.com/inferencetoken/status/1957937729188266432

I’ve been building on the internet for 20+ years. Every time the tools get easier, the ideas get bigger 💪💻 @VibeCodeApp just took a massive leap forward, building an app that can turn any idea into a working product in minutes. NO code & NO barriers. https://x.com/alexisohanian/status/1955751219684872324

in V3.1, DeepSeek focuses on two new use cases – SWE agent – Search Agent The former is a natural evolution of their whole aspiration that started with DeepSeek-Coder. Today, “just grok it in a chat window” is not cutting it. The latter is an evolution of their web tool. https://x.com/teortaxesTex/status/1958750497965302118

LLMs need focus, because attention isn’t enough. (thread on the Focus Chain, Cline’s link to persistent context) https://x.com/cline/status/1956394230357877209

More json prompting tips: https://x.com/_philschmid/status/1956351661229703246

New stealth model in Cline: “Sonic” Designed for coding & free to use, because your usage helps improve the model while it’s in alpha. https://x.com/cline/status/1958017077362704537

Run LLM-generated Python code safely. @LangChainAI + @daytonaio demo → secure sandbox, code generation, file ops, auto-cleanup. Full guide: https://x.com/daytonaio/status/1958907262334116004

very common question we are getting since the latest @cline release: “what’s the best way to manage context in cline?” honestly? you really don’t need to manage it at all. we’ve built cline to manage it for you. the focus chain & auto compact persist the important context https://x.com/nickbaumann_/status/1957669736491470999

You can now launch agents directly from Linear. Delegate an issue or mention @cursor in a comment to start an agent’s work on the task. https://x.com/cursor_ai/status/1958627514852811034

Hitting context limits used to mean losing your conversation history and starting over. Auto Compact changes this. When Cline approaches token limits, it automatically creates a comprehensive summary preserving all technical decisions and code changes, then continues exactly https://x.com/cline/status/1957670663508124073
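
The idea described in the post above (summarize older history once the conversation nears a token budget, then continue) can be sketched roughly as follows. This is not Cline's implementation: `TOKEN_LIMIT`, `COMPACT_THRESHOLD`, `count_tokens`, `summarize`, and `maybe_compact` are all hypothetical names, the token counter is a crude word count, and the summarizer is a placeholder for what would be an LLM call.

```python
from typing import List

TOKEN_LIMIT = 200        # hypothetical context budget
COMPACT_THRESHOLD = 0.8  # hypothetical: compact once 80% of the budget is used


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: List[str]) -> str:
    # Placeholder for an LLM-written summary: keeps the first five words of each message.
    return "SUMMARY: " + " | ".join(" ".join(m.split()[:5]) for m in messages)


def maybe_compact(history: List[str], keep_recent: int = 2) -> List[str]:
    """Replace older messages with one summary when the history nears the budget."""
    total = sum(count_tokens(m) for m in history)
    if total < TOKEN_LIMIT * COMPACT_THRESHOLD:
        return history  # still comfortably under the budget
    older, recent = history[:-keep_recent], history[-keep_recent:]
    if not older:
        return history  # nothing old enough to fold into a summary
    return [summarize(older)] + recent


# Eight long messages exceed the threshold, so the first six are summarized.
history = [f"message {i} " + "word " * 30 for i in range(8)]
compacted = maybe_compact(history)
print(len(history), "->", len(compacted))
```

Keeping the most recent messages verbatim is one possible design choice; it preserves the exact wording of the current step while the summary carries earlier decisions.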

Cursor CLI now includes MCPs, Review Mode, /compress, @-files, and other UX improvements. https://x.com/cursor_ai/status/1956458242655281339

It is crazy to think that MCP was only released in November. I summarized the launches & announcements from Microsoft, Replicate, Sentry, AgentOps, Spotify, Globant, Jira, Filecoin, Dify, and more this week! 🧵 (save for later) https://x.com/AtomSilverman/status/1956148199783326195

This Claude MCP AI agent writes better posts than your $5,000 ghostwriter while I was doom-scrolling TikTok at 4am, it analyzed my entire content history, found 12 psychological triggers, and built me a content blueprint that actually converts. What agencies charge $15K for https://x.com/aryanXmahajan/status/1955661629280199080

Announcing our $70M Series B co-led by @stripe and Addition, and with participation from @USV, @firstround, @BloombergBeta, @BoxGroup, @RibbitCapital, and other top investors. We also recently shipped two AI-native tools: Stedi Agent and MCP server. For more, check below. ⬇️ https://x.com/stedi/status/1956002043342078342

1️⃣ Convert any collection of documents into an interactive MCP server through LlamaCloud 2️⃣ Convert any document workflow into an MCP server through LlamaCloud – codify a repeatable process that the user can easily trigger, without complex prompting! 3️⃣ Build a custom agentic https://x.com/jerryjliu0/status/1957873536456093903

We have a new comprehensive Model Context Protocol (MCP) documentation section, to help you connect your AI applications to external tools and data sources through a standardized interface. 🔌 Learn how MCP works – connecting LLMs to databases, tools, and services through a https://x.com/llama_index/status/1957840992360710557

Model Context Protocol (MCP), clearly explained (with visuals): https://x.com/_avichawla/status/1956966727042154846

🚀 Qwen Chat Desktop for Windows is here! 💻 All the power of Qwen Chat — now with MCP support for smarter, faster agents. ⚡ Run up MCP Servers, supercharge your productivity, and stay in control. 📥 Download now → https://x.com/Alibaba_Qwen/status/1956399490698735950

There’s been a lot of Discourse about Qwen’s rejection of hybrid paradigm. “Did DeepSeek fall for the hybrid meme?” But hybrids make *so much sense* if you’re building a fast, economical SWE agent, which is exactly what 3.1 is for. It’s all been for Aider, Claude Code, MCPs. https://x.com/teortaxesTex/status/1958437173948023127

What is the best all-in-one platform for vibe coding games and interactive experiences? Has anyone made sharing & discovery a first class citizen too — so you can play & remix community creations? https://x.com/bilawalsidhu/status/1958240030637302223

We designed Line with everything we learned about how best in class AI agents are built. – Use code. The best AI experiences are too complex to be built any other way. – Build iteratively with evals. Ultimately evals define the capabilities of any AI product (agents or models) - https://x.com/bclyang/status/1957868316711846236

ARC-AGI-3 Preview: +3 Games Released We’ve opened 3 previously private holdout games from the Preview Agent Competition Now 6 games are available to play online and via Agents API Each game was selected to expand the novelty of ARC-AGI-3 public games Can you beat them? https://x.com/arcprize/status/1958597816823202216

i am still 100% convinced if you are getting 100x faster you are delusional or you were so bad at programming to begin with let me explain 1. 100x = 3.5 days == what you did in 1 year delusional or really really bad at programming https://x.com/ThePrimeagen/status/1957973911544463397

This is cool. AI agents that survive Snowglobe don’t just “pass tests.” They get smarter with every failure. More resilient. More reliable. More real-world ready. Huge win by @guardrails_ai. https://x.com/alex_prompter/status/1956360410862354435

We’re expanding the Epoch AI Benchmarking Hub with five new external benchmarks: TerminalBench, DeepResearchBench, METR Time Horizons, GSO, and WebDevArena! These benchmarks test AI’s ability to perform complex tasks through coding or tool use. 🧵 https://x.com/EpochAIResearch/status/1956384193891688625

🚨 Leaderboard Update Claude Opus 4.1 Thinking by @AnthropicAI debuts in the Text & WebDev Arenas – going straight to the top. 🚀 A few highlights: 💠Claude Opus 4.1 is now the only model to rank #1 across all major categories 💠#1 Overall, tied with three other models: https://x.com/lmarena_ai/status/1957473753337889079

Inspired by Zapier, we created a role-by-role AI fluency chart at The Rundown Even as an AI-first startup, frameworks like this are very helpful to *set the standard* for new hires Highly recommend you do the same at your company! https://x.com/rowancheung/status/1957500035266146633

GPT-5 just finished Pokémon Red! 6,470 steps vs. 18,184 for o3! Check the stats site to compare! That’s a huge improvement! Well done, @OpenAI you cooked with GPT-5. What an incredible model. Next up: GPT-5 vs. Pokémon Crystal (16 Badges + Red). The run starts soon on Twitch. https://x.com/Clad3815/status/1955980772575268897

gpt-5 plays Pokémon — 3x faster progress than o3: https://x.com/gdb/status/1956026116944355624
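
The step counts quoted in the Pokémon Red post above allow a quick check of the rounded “3x” figure. A minimal sketch using only those two numbers; the variable names are illustrative.

```python
# Step counts quoted in the post above for finishing Pokémon Red.
gpt5_steps = 6_470
o3_steps = 18_184

# How many steps o3 needed for every step GPT-5 needed.
speedup = o3_steps / gpt5_steps

# Share of o3's steps that GPT-5 did not need.
steps_saved = 1 - gpt5_steps / o3_steps

print(f"speedup: {speedup:.2f}x, steps saved: {steps_saved:.1%}")
```

On these numbers the ratio works out to roughly 2.8x, so “3x” is a reasonable rounding rather than an exact figure.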

Test‑time scaling Best‑of‑3 and pass@3 markedly boost AFM, e.g., GAIA 69.9 and HLE 33.2, closing the gap with larger proprietary agent stacks. Overall, Chain-of-Agents enables training single-agent foundation models that natively simulate multi-agent collaboration, combining https://x.com/omarsar0/status/1958186655552245839

Where Agents Fail Typical errors include missing referenced files, skipping required actions, wrong tool choice (e.g., trying to “create PDF” directly instead of writing in Word, then converting), and poor planning order. File creation/editing in docx/xlsx is particularly https://x.com/omarsar0/status/1956325872908247220

We’re excited to share that @TectonAI will soon join Databricks, providing enterprises with fast, reliable, real-time data for deploying AI agents. Tecton’s technology helps enterprises leverage their mission-critical data to power AI agents for critical use cases. Bringing https://x.com/databricks/status/1959041076087726523

Agent Bricks | Databricks on AWS https://docs.databricks.com/aws/en/generative-ai/agent-bricks/

Introducing Parallel | Web Search Infrastructure for AIs | Parallel Web Systems | Enterprise Deep Research API https://parallel.ai/blog/introducing-parallel

Command A Reasoning: Enterprise-grade control for AI agents https://cohere.com/blog/command-a-reasoning

Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and https://x.com/deepseek_ai/status/1958417062008918312

I get ~10 spam calls per day (various automated voicemails, “loan pre-approval” etc) and ~5 spam messages per day (usually phishing). – I have AT&T Active Armor, all of the above still slips through. – All of the above is always from new, unique numbers so blocking doesn’t work. https://x.com/karpathy/status/1957574489358873054

Lakebase | Databricks – The first serverless Postgres database integrated with the lakehouse, built for the AI era. https://www.databricks.com/product/lakebase

Firebase Data Connect now supports full-text search, powered by Postgres 🔎 This means you can quickly and efficiently locate information within large datasets by searching for keywords and phrases across multiple columns at once. You can also fine-tune your search in multiple https://x.com/Firebase/status/1955678282755535223

For over a year, @jeremyphoward has been in stealth mode. In this exclusive talk, he showcases what he’s been working on. He & @johnowhitaker show us SolveIt, a new dev environment and programming paradigm. 🤯 Imagine this workflow: – Build a web app & interact with its UI https://x.com/HamelHusain/status/1956514524628127875

The GitHub Copilot coding agent is now available on every page of GitHub. Delegate any coding task, anywhere you are across GitHub to the coding agent. 🎉 Engage coding agent via: 1. Global launcher (new) 2. Assign an issue 3. Delegate from VS Code 4. https://x.com/lukehoban/status/1958022776578797984

You can now delegate tasks to GitHub Copilot coding agent from any page on GitHub 🤖 Open the new Agents panel in one click, write a simple prompt, then hit Enter. GitHub Copilot works in the background, and opens a PR for your review. No interruptions to your workflow required. https://x.com/github/status/1957894152412082643

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory https://m3-agent.github.io/

AGENTS.md is quickly becoming a popular way to share instructions with coding agents in your repo. Now supported in Cursor, Amp, Jules, Factory, RooCode, and Codex. https://x.com/OpenAIDevs/status/1957925682048336354

The @code team is hard at work improving the agent prompts – always. If you’ve got Insiders you can try the new agent system prompt for GPT-5 today. Let us know how it works! https://x.com/burkeholland/status/1958216086274330890

The new ChatGPT connectors are really useful! Chat can now access Gmail, Google Cal, and Drive, so it can: -Skim unread emails & give a summary -Summarize threads + draft replies -Pull key info from old convos -Do meeting prep + agendas Prompts below: https://x.com/rowancheung/status/1957119886821388340

We’ve been building the pieces for years. Projects, AI Agents, Automations. Today, the dots connect. Introducing 🧬 Taskade Genesis Preview • One prompt → a full-stack AI app • Powered by your Workspace • Supercharged with GPT-5 Reply `Genesis` for early access 🚀 https://x.com/Taskade/status/1954303801059688576

Want to build an AI Agent? I made a free cookbook for creating your own news research agent with open-weight GPT-OSS models — no GPU, no setup. Searches news → pulls articles → summarizes w/ sources → runs in a Gradio chat UI. https://x.com/fdaudens/status/1956006950249906593

AGENTS.md https://agents.md/

Codex CLI now works with your ChatGPT login, with generous GPT-5 use included in the plus and pro plans. $ brew install codex $ codex It’s that simple. https://x.com/thsottiaux/status/1957133984657481956

Figma canvas to build AI agent workflows. Sim is a lightweight, user-friendly platform for building AI agent workflows in minutes. It natively supports all major LLMs, Vector DBs, etc. 100% open-source with 7k+ stars! https://x.com/_avichawla/status/1957691571908038717

Excited to release: Jupyter Agent 2 The agent can load data, execute code, plot results inside Jupyter faster than you can scroll! 🤖 Powered by Qwen3-Coder ⚡️ Running on Cerebras ⚙️ Executed in E2B ↕️ Upload your files All videos are in *real time*! https://x.com/lvwerra/status/1957832240416580024

I tried @Alibaba_Qwen Qwen3-Coder today inside @cline . Very impressed. It helped me solve a tricky deployment: putting a Dockerized vibe-coded project onto https://x.com/chunhualiao/status/1956957519315956074

Chain-of-Agents End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL https://x.com/_akhaliq/status/1958188925333189110

Apple’s AI Turnaround Plan: Robots, Lifelike Siri, Home Security Cameras (AAPL) – Bloomberg https://www.bloomberg.com/news/articles/2025-08-13/apple-s-ai-turnaround-plan-robots-lifelike-siri-and-home-security-cameras

Mark Gurman on X: “BREAKING: Apple prepares ambitious AI devices comeback with multiple robots, smart speaker with a screen, lifelike version of Siri with conversational abilities, redesigned Siri, new Home OS, major home security push & more. Details on the plans here — https://t.co/KsQIrKl4wI” https://x.com/markgurman/status/1955695572913995841

Some signs that catching up in the AI model space is rapidly becoming challenging for even the most highly capitalized companies. https://x.com/emollick/status/1955950539797119463

Tencent’s Hunyuan team dropped big models: —Hunyuan-GameCraft for generating playable videos from a single scene image and user actions —Hunyuan-Large-Vision, a versatile and powerful multimodal understanding model https://x.com/adcock_brett/status/1957111107933409607

AI in HR: in an experiment with 70,000 applicants in the Philippines, an LLM voice recruiter beat humans in hiring customer service reps, with 12% more offers & 18% more starts. Also better matches (17% higher 1-month retention), less gender discrimination & equal satisfaction. https://x.com/emollick/status/1957465671748448738

🚀 Introducing #AutoCodeBench by Tencent Hunyuan! We built the first fully automated LLM–sandbox workflow to create high-difficulty, multilingual, balanced & diverse code benchmarks — no human annotation required! We’re open-sourcing a suite of related projects: 🔹 https://x.com/TencentHunyuan/status/1957751900608110982

99% of AI testing = shallow demos that only work in perfect conditions. Snowglobe flips that. It’s like crash-testing your AI agents with thousands of real-world edge cases.. over and over again.. until they actually get better. This is massive. Props to @guardrails_ai https://x.com/godofprompt/status/1956359876109652297

Day 3 of #CodingWithGLM 🤝 @SST_dev opencode GLM-4.5 is now live on the @SST_dev opencode platform! 🚀 Access the ultimate developer advantage: we tested on SWEBench-Verified-Mini, a 50-datapoint subset of SWEBench-Verified. The results confirm our model is a powerful https://x.com/Zai_org/status/1956335531555721345

What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵 https://x.com/KLieret/status/1958182167512584355
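
The per-turn model switching described in the post above can be illustrated with a toy loop. This is a sketch under stated assumptions, not mini-SWE-agent's code: `fake_gpt5`, `fake_sonnet4`, `MODELS`, and `run_agent` are hypothetical stand-ins, the "models" just return canned strings, and a uniform random choice per turn is assumed.

```python
import random
from typing import Callable, Dict, List


# Hypothetical stand-ins for real model clients: each takes the history so far
# and returns the next action as text.
def fake_gpt5(history: List[str]) -> str:
    return f"gpt-5 action #{len(history)}"


def fake_sonnet4(history: List[str]) -> str:
    return f"sonnet-4 action #{len(history)}"


MODELS: Dict[str, Callable[[List[str]], str]] = {
    "gpt-5": fake_gpt5,
    "sonnet-4": fake_sonnet4,
}


def run_agent(task: str, max_turns: int, rng: random.Random) -> List[str]:
    """Run a fixed number of turns, choosing a model uniformly at random each turn."""
    history = [task]
    used = []
    for _ in range(max_turns):
        name = rng.choice(list(MODELS))       # pick this turn's model
        history.append(MODELS[name](history))  # both models share one history
        used.append(name)
    return used


used = run_agent("fix the failing test", max_turns=6, rng=random.Random(0))
print(used)
```

The point of the sketch is that both models read and extend the same shared history, so each turn builds on whatever the other model did before it.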

We apply ComputerRL to the open-source GLM-4-9B-0414 model and evaluate its performance on the OSWorld benchmark. Our AutoGLM-OS-9B, built upon GLM-4-9B-0414, achieves state-of-the-art accuracy and demonstrates substantial improvements for general-purpose agents in desktop https://x.com/Zai_org/status/1958175307019829754

We’re leading Sola’s $17M Series A. Most companies are still figuring out how to get real value from AI, meanwhile manual back-office work continues to quietly drain time and resources. Sola is an AI-native platform that makes AI bots to automate these workflows. Their agents https://x.com/a16z/status/1956015900408099206

Databricks just signed a Series K term sheet at >$100B valuation to scale two flagship products: 🔥 Lakebase — serverless Postgres with true compute/storage separation 🧠 Agent Bricks — agentic framework with built-in reasoning guardrails for enterprise data https://x.com/alighodsi/status/1957795160416309717

Command A Reasoning @cohere is now available in anycoder https://x.com/_akhaliq/status/1958602589681197494

Command A Reasoning is here! It’s designed to tackle complex enterprise tasks like deep research and data analysis. 🔥 As part of our commitment to the research ecosystem, we’re releasing the model weights. 🎉 https://x.com/Cohere_Labs/status/1958576284763611322

Command A Reasoning looks somewhat interesting unfortunately with bad license and only for private use if you don’t want to pay https://x.com/scaling01/status/1958561844810903708

Introducing Command A Reasoning, our most advanced model for enterprise reasoning tasks. https://x.com/cohere/status/1958542682890047511

@deepseek_ai 3.1 reasons to get hyped about DeepSeek v3.1 1: Hybrid reasoning 2: Agentic tool use 3: Improved coding 3.1: Best-in-class latency on Baseten https://x.com/basetenco/status/1958515897972232526

@nrehiew_ That’s not why it’s because reasoning uses up context length too fast to get to the end of an agentic coding loop https://x.com/Teknium1/status/1958898159326765075

DeepSeek trained its agentic coder as a non reasoner. There is a reason Anthropic evaluated Opus 4.1 without thinking on SweBench, Claude Code has thinking off by default and Qwen released Qwen Coder for Qwen code as a non reasoner. We do not need reasoning for Agentic Coding. https://x.com/nrehiew_/status/1958838487895117956

DeepSeek-V3.1 officially released! Key highlights of the update: – hybrid thinking model – more efficient reasoning – improved reasoning for search – better tool calling and agentic capabilities – improvements on many benchmarks: SWE-Bench: 44.6% -> 66%, Aider Polyglot https://x.com/scaling01/status/1958438863279681824

We have done 2 cohorts and ~800 people have meaningfully engaged with the AI evals course. Hamel shared a bunch of testimonials with me yesterday. I was really astounded that they are not just generic testimonials; people mentioned very specific results and concepts. It seems https://x.com/sh_reya/status/1957139727322411291

90% of Games Developers Already Using AI in Workflows, According to New Google Cloud Research – Aug 18, 2025 https://www.googlecloudpresscorner.com/2025-08-18-90-of-Games-Developers-Already-Using-AI-in-Workflows,-According-to-New-Google-Cloud-Research

Gemini 2.5 Pro, now generally available in @code https://x.com/code/status/1958238346313863263

This also mentions that Gemini is a hybrid reasoning model, I don’t think this was confirmed before https://x.com/eliebakouch/status/1958603730951029157

🤯Imagine this: – Build a web app & interact with it on the same screen as your code – Use live vars from your REPL directly in AI – Turn any function into AI tool instantly A live environment fusing the best ideas from Literate Programming, Smalltalk, & Jupyter. https://x.com/jeremyphoward/status/1956517085603127412

Over the past few months, we brought AI Mode in Search to the US, India, and the UK. Now, we’re rolling out our most powerful AI search experience to 180+ new countries and territories in English. Excited to expand access to even more languages and regions soon, so stay tuned! https://x.com/rmstein/status/1958552694626607616

Introducing 𝘃𝗶𝗯𝗲-𝗹𝗹𝗮𝗺𝗮 to streamline your LlamaIndex development with context-aware coding agents. A command-line tool that automatically configures your favorite coding agents with up-to-date context and best practices about LlamaIndex framework, LlamaCloud and https://x.com/llama_index/status/1958656414295237014

Does Microsoft Copilot use the same GPT-5 router as OpenAI does? I can’t get their “GPT-5” to pass me to any good model unless it is explicitly a coding or math task, with no indication of which model I get, which makes the quality of outputs feel very uneven in confusing ways. https://x.com/emollick/status/1957799294544621753

Is text-only information enough for LLM/VLM Web Agents? 🤔 Clearly not. 🙅‍♂️ The modern web is a rich tapestry of text, images 🖼️, and videos 🎥. To truly assist us, agents need to understand it all. That’s why we built MM-BrowseComp. 🌐 We’re introducing MM-BrowseComp 🚀, a new https://x.com/GeZhang86038849/status/1958381269617955165

Announcing the AGENTS.md working group: a single, open standard to guide how coding agents work in your codebases. We’re working with @OpenAI and other industry partners to set this vendor‑neutral standard. https://x.com/FactoryAI/status/1957926852020039767

GPT-5 is the most significant product release in AI history, but not for the reason you might think. What it signals is that we’re moving from the “bigger model, better results” era to something much more nuanced. This is a genuine inflection point. The fact that people call a https://x.com/douwekiela/status/1955329657852834207

GPT-5 makes building easy, @skirano shows how. https://x.com/OpenAI/status/1958217649248493918

Six tips for coding with GPT-5: https://x.com/OpenAIDevs/status/1956438999364768225

We’ve made some “beastly” upgrades to our GPT agent prompt and we’re seeing big improvements in completion rates across scenarios. You can use it today in Insiders with any GPT model… "https://t.co/z73dTvWOwB.alternateGptPrompt.enabled": true, "chat.todoListTool.enabled": https://x.com/code/status/1955322927886274928

The OpenAI Playground has improved a lot recently. I’ve been using it to test GPT-5 on new use cases. Watch how I use it to chat with internal docs via MCP tools. It uses the vector store feature too. Testing out the Prompt Optimizer and Evaluation features next. https://x.com/omarsar0/status/1956459233039233528

Beyond GPT-5: Avengers‑Pro outperforms GPT‑5‑medium by about 7% average accuracy; with comparable accuracy, it reduces cost by about 27%. Proper routing frameworks make a difference. Here are my notes: https://x.com/omarsar0/status/1958897458408563069

The pro models (GPT-5 Pro, Gemini 2.5 Deep Think, Grok 4 Heavy) can be impressive in ways that are hard to see. They take a lot of time to answer questions & are built for very hard problems that require expert evaluation. That is a narrow, but, also very valuable, problem space. https://x.com/emollick/status/1955902962288746657

GPT-5 behind Chinese models like Kimi-K2 and Qwen3-235B on coding https://x.com/scaling01/status/1956404452442681829

GPT-5-mini high shows no improvement over o4-mini and behind top Chinese models like Kimi-K2, GLM-4.5, Qwen3-235B and DeepSeek-R1 https://x.com/scaling01/status/1956405559978029061

Perplexity Finance is relentlessly disrupting finance for Indian stocks. Enjoy stock screening with simple natural language search. Goal is to help every investor in India, small and big, free and paid users. 💫🇮🇳 https://x.com/AravSrinivas/status/1958385027185877066

Comet is now available for all US-based Perplexity Pro users. Browse at the speed of thought. https://x.com/perplexity_ai/status/1955684209483534657

Perplexity Finance now supports India markets. Find market summaries, follow the latest news, deep-dive on BSE and NSE stocks, and track Indian company earnings. Perplexity Finance defaults to Indian markets for users in India, or can be found using the toggle on the home page. https://x.com/PPLXfinance/status/1955613694047420688

Perplexity Max subscribers have access to Max Assistant mode on Comet. At this moment, it’s the closest approximation to something like Claude Code for the browser. Capable of running long-horizon research tasks contextually to what you’re reading. https://x.com/AravSrinivas/status/1958238462504824959

Price Alerts for Indian stocks are now available to all Perplexity Pro and Max users in India! Currently supported on web and mobile web. Mobile app support and availability for free users coming shortly! 🇮🇳 https://x.com/AravSrinivas/status/1958018286622244896

Stock screening on Indian stocks is now available to all Perplexity Finance users. It works with natural language – just type what output, filters, and sorting you want. Available on web, mobile web, and mobile apps. https://x.com/jeffgrimes9/status/1958364311178674232

I often rant about how 99% of attention is about to be LLM attention instead of human attention. What does a research paper look like for an LLM instead of a human? It’s definitely not a pdf. There is huge space for an extremely valuable “research app” that figures this out. https://x.com/karpathy/status/1943411187296686448

Firefox adds LLM capabilities for add-ons via llama.cpp https://x.com/ggerganov/status/1957844552150110227

Semantic compression beats raw long context. Chunk-level summaries in RAG not only matched or outperformed long-context baselines but did so with ~20% of the tokens. Well-structured summarization improves retrieval precision, reduces noise, and can even shorten execution steps. https://x.com/omarsar0/status/1956325856265326923

This is really exciting and impressive, and this stuff is in my area of mathematics research (convex optimization). I have a nuanced take. (re Claim: gpt-5-pro can prove new interesting mathematics.) 🧵 (1/9) https://x.com/ErnestRyu/status/1958408925864403068

@Prashant_1722 @TechByMarkandey @snowglobe_so generally available right now! https://x.com/ShreyaR/status/1956396368270074217

the biggest feature request on Snowglobe continues to be an SDK to kick off simulations. all i can say is we’re in the kitchen cooking, and this is going to come very very soon! https://x.com/ShreyaR/status/1958949657792614675

AI Agents are terrible at long-horizon tasks. Even the new GPT-5 model struggles with long-horizon tasks. This is one of the most pressing challenges when building AI agents. Pay attention, AI devs! This is a neat paper that went largely unnoticed. Here are my notes: https://x.com/omarsar0/status/1956325762719797266