Agents and Copilots: AI News Week Ending 08/08/2025

August 8, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Ornate showgirl glamour in orange-and-teal tones, sequined feather-fan stage scene featuring a glamorous chorus line dancer holding a glowing holographic assistant orb, stylized text “Agents” in glittering marquee lights above the stage; spotlit, dramatic contrast, vintage grain, cinematic, high-detail

Introducing Wide Research https://x.com/ManusAI_HQ/status/1950939617647333444

Claude Code can now handle long-running tasks in the background. Start your dev server, run tests, or build your project without blocking your workflow https://x.com/_catwu/status/1953926541370630538

Agentic commerce has arrived https://shopify.dev/docs/agents

A compilation of experiences I made with GPT-5 in one shot. The poem camera app is particularly impressive because the model came up with all the details, like the way the photos stack in the gallery, the photo developing animation, etc https://x.com/skirano/status/1953516768317628818

The GPT 5 launch included a chart showing 52.8 as a bigger number than 69.1, which in turn is shown as the same magnitude as 30.8. Not quite ASI… https://x.com/jeremyphoward/status/1953509671446196715

entering the fast fashion era of SaaS very soon https://x.com/sama/status/1952084574366032354

someday soon something smarter than the smartest person you know will be running on a device in your pocket, helping you with whatever you want. this is a very remarkable thing. https://x.com/sama/status/1952879515287601465

Something I think about a lot: who knows how many brilliant ideas never saw the light of day because “I don’t know how to do that.” Pretty crazy to think that with AI everyone now has a reasonable VC advisor, coder, or professor on hand to teach you about anything you want https://x.com/mustafasuleyman/status/1951323569905934427

Red teamers assemble! ⚔️💰 We’re putting $500K on the line to stress‑test just released open‑source model. Find novel risks, get your work reviewed by OpenAI, Anthropic, Google, UK AISI, Apollo, and help harden AI for everyone. https://x.com/woj_zaremba/status/1952886644090241209

We’re launching a $500K Red Teaming Challenge to strengthen open source safety. Researchers, developers, and enthusiasts worldwide are invited to help uncover novel risks—judged by experts from OpenAI and other leading labs. https://x.com/OpenAI/status/1952818694054355349

Need a table? Just ask. Perplexity is partnering with @OpenTable to bring restaurant reservations directly into Perplexity products. https://x.com/perplexity_ai/status/1952434779036774488

The bluster around this issue reveals that Cloudflare’s leadership is either dangerously misinformed on the basics of AI, or simply more flair than cloud. https://x.com/perplexity_ai/status/1952532113095643185

Claude Opus 4.1 (“claude-leopard-v2-02-prod”) “Opus 4.1 is here – Try our latest model for more problem solving power.” https://x.com/btibor91/status/1952366658326036781

Claude Opus 4.1 \ Anthropic https://www.anthropic.com/news/claude-opus-4-1

Claude Opus 4.1 beats GPT-5 on SWE bench https://x.com/Sauers_/status/1953504854044704973

Claude Opus 4.1 is available in Cursor! Let us know what you think. https://x.com/cursor_ai/status/1952782293925298655

Going live with the fellas @tbpn in an hour to talk about Opus 4.1 and Claude Code https://x.com/alexalbert__/status/1952801100299681959

GPT-5 (medium reasoning) is the new leader on the Short Story Creative Writing benchmark! GPT-5 mini (medium reasoning) is much better than o4-mini (medium reasoning). Claude Opus 4.1 shows gains over Opus 4. https://x.com/LechMazur/status/1953658077300875656

Earlier this summer, we told you that AI voice agents are hot. For an idea of just how hot: Andreessen Horowitz is backing EliseAI, which makes AI voice agents for property mgmt + healthcare, at a $2B valuation. w/ @srimuppidi @coryweinberg https://x.com/steph_palazzolo/status/1952740505747382364

China’s ByteDance just released an LLM-based agent for general purpose software engineering tasks. Trae Agent comes with an interactive CLI that can execute complex workflows using simple English prompts. It works with OpenAI and Anthropic API. 100% opensource. https://x.com/Saboo_Shubham_/status/1942047679758151783

Google’s AI coding agent Jules is now out of beta | TechCrunch https://techcrunch.com/2025/08/06/googles-ai-coding-agent-jules-is-now-out-of-beta/

GPT-5 Hands-On: Welcome to the Stone Age https://www.latent.space/p/gpt-5-review

GPT-5’s Router: how it works and why Frontier Labs are now targeting the Pareto Frontier https://www.latent.space/p/gpt5-router

@aidan_mclau @cursor_ai The straight up GPT-5 in Codex CLI fixed a bug in 3 minutes that I was working on for three or four hours this morning…can’t wait to try in Cursor. https://x.com/sound4movement/status/1953583522587017345

💥 It’s here! GPT-5 is rolling out in ChatGPT for everyone, starting today. It’s a 🤯 good model, and we’ve simplified the UI alongside it. No more choosing between gpt-4o and o4-mini. When you ask a hard question and the model needs to think hard, it does. When it can give you https://x.com/kevinweil/status/1953502681181618277

AMA with @sama + some members of the GPT-5 team Tomorrow 11am PT. https://x.com/OpenAI/status/1953548075760595186

Codex CLI + GPT-5: https://x.com/gdb/status/1953556751762288653

Does OpenAI not do basic integration testing? At the time of release, the first code sample provided in the GPT-5 docs could not be run, because someone accidentally deleted the `output_text` property. My CI notified me. Why didn’t theirs? https://x.com/jeremyphoward/status/1953610071654772985

going to try live-tweeting the GPT-5 livestream. first, GPT-5 is an integrated model, meaning no more model switcher and it decides when it needs to think harder or not. it is very smart, intuitive, and fast. it is available to everyone, including the free tier, w/reasoning! https://x.com/sama/status/1953502614676811865

GPT-5 (medium reasoning) sets a new record on the Confabulations/Hallucinations on Provided Texts benchmark! https://x.com/LechMazur/status/1953582063686434834

GPT-5 claims #1 spot on LiveBench https://x.com/scaling01/status/1953602929375813677

gpt-5 for long context reasoning: https://x.com/gdb/status/1953747271666819380

GPT-5 gets 74.9 on SWE-bench. Wonder what the budget per task is. https://x.com/OfirPress/status/1953502998627221519

GPT-5 in the high reasoning setting hit the 100K token limit for our evaluations on 10/290 Tier 1-3 samples (3%). This means our evaluation might slightly underestimate the reasoning capabilities of GPT-5. https://x.com/EpochAIResearch/status/1953615908695314564

GPT-5 is extremely sensitive to instructions. Either give it demonstrations or tell it explicitly how you want the output. Avoid doing both. If you do, GPT-5 will override the examples with your output instructions. Sharing more just in case you face this issue: https://x.com/omarsar0/status/1953876255037612531

GPT-5 is here – and it’s #1 across the board. 🥇#1 in Text, WebDev, and Vision Arena 🥇#1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to @OpenAI on this https://x.com/lmarena_ai/status/1953504958378356941

GPT-5 is here! 🚀 For the first time, users don’t have to choose between models — or even think about model names. Just one seamless, unified experience. It’s also the first time frontier intelligence is available to everyone, including free users! GPT-5 sets new highs across https://x.com/ElaineYaLe6/status/1953607005144506454

GPT-5 is here. Rolling out to everyone starting today. https://x.com/OpenAI/status/1953504357821165774

GPT-5 is live in Cline. We’ve been working with OpenAI to get this model ready, and here’s our take: it’s disciplined, persistent, & highly competent. It’s collaborative in planning & a diligent operator while acting. It plans thoroughly, asks optioned follow-ups when https://x.com/cline/status/1953525433808695319

GPT-5 is now available in Cursor. It’s the most intelligent coding model our team has tested. We’re launching it for free for the time being. Enjoy! https://x.com/cursor_ai/status/1953519580627742750

GPT-5 is now available on Perplexity and Comet for Max and Pro subscribers. Just ask. https://x.com/perplexity_ai/status/1953537170964459632

GPT-5 new SOTA on WeirdML beating o3-pro https://x.com/scaling01/status/1953919743842238472

GPT-5 only a 3% improvement over o3 at reproducing scientific papers https://x.com/scaling01/status/1953503883331846629

GPT-5 pricing is insane IT’S OVER https://x.com/scaling01/status/1953509084008710547

GPT-5 rollout updates: *We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. *We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for. *GPT-5 will seem smarter starting https://x.com/sama/status/1953893841381273969

GPT-5 sentiment from the trenches (AKA 24 hours in Cline users’ hands): It’s a precision instrument, not a Swiss Army knife. Give it detailed prompts and it delivers exactly what you asked for — no tangents, no hallucinations about “finished” code. However, it’s less performant https://x.com/cline/status/1953898747928441017

GPT-5 sets a new record on FrontierMath! On our scaffold, GPT-5 with high reasoning effort scores 24.8% (±2.5%) and 8.3% (±4.0%) in tiers 1-3 and 4, respectively. https://x.com/EpochAIResearch/status/1953615906535313664
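
A quick sanity check on the quoted ±2.5% for tiers 1-3: the post does not say how the uncertainty was computed, but if it is a plain binomial standard error (an assumption), the 290 Tier 1-3 samples mentioned in Epoch's note on the 100K token limit give a matching figure. The function name below is illustrative only.

```python
import math

def binomial_standard_error_pct(score_pct, n_samples):
    """Standard error, in percentage points, of a pass rate measured on n_samples."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)

# 24.8% on tiers 1-3, from the post; 290 samples, from Epoch's token-limit note.
# Treating the quoted "±" as a binomial standard error is an assumption.
tier_1_3_error = binomial_standard_error_pct(24.8, 290)
print(f"tiers 1-3: 24.8% ± {tier_1_3_error:.1f}%")

# Share of samples that hit the 100K token limit (10 of 290).
token_limit_share = 100.0 * 10 / 290
print(f"hit token limit: {token_limit_share:.1f}% of samples")
```

Under that assumption the computed error rounds to the ±2.5% in the post, and 10 of 290 samples is about 3.4%, in line with the "3%" Epoch reports.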

GPT-5 system card capability evals reactions thread. First observation: ~no improvement on all the coding evals that aren’t SWEBench https://x.com/eli_lifland/status/1953507434238288230

GPT-5 Thinking is less deceptive than o3 However when elicited to display deceptive behaviour it jumps to 28% https://x.com/scaling01/status/1953504438691221856

GPT-5 was doing 2B tokens per minute 3 hours after launch 🤯 https://x.com/kevinweil/status/1953649263411704195

GPT-5 with big improvements in Tau-Bench except the airline category https://x.com/scaling01/status/1953505637242974695

GPT-5 with high reasoning effort on SimpleBench https://x.com/scaling01/status/1953771276549358041

GPT-5: $0.625/$5.00 with flex pricing is ridiculous https://x.com/scaling01/status/1953517149768593903

Hallucinations are almost gone with GPT-5 https://x.com/scaling01/status/1953507569609134506

ICYMI, OpenAI released an insane amount of guides on how to use GPT-5. > Examples > Prompting guide > New features guide > Reasoning tips > Setting verbosity > New tool calling features > Migration guide And much more. https://x.com/omarsar0/status/1953583336603234726

If GPT-5 made this chart I’m bearish 😭 https://x.com/iScienceLuvr/status/1953503815292092904

In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness. https://x.com/METR_Evals/status/1953525150374150654

Introducing GPT-5 | OpenAI https://openai.com/index/introducing-gpt-5/

Introducing GPT-5 Our best AI system yet, rolling out to all ChatGPT users and developers starting today. https://x.com/OpenAI/status/1953526577297600557

Long context reasoning performance: A stand out is long context reasoning performance as shown by our AA-LCR evaluation whereby GPT-5 occupies the #1 and #2 positions. https://x.com/ArtificialAnlys/status/1953507713222422866

Lots of excitement about GPT-5 in Codex CLI via your ChatGPT plan. Some details: 1. Yes, if you sign in with ChatGPT, usage is included via your paid plan! 2. Still determining exact rate limits, but the goal is to be generous: — Pro users should basically not hit limits https://x.com/embirico/status/1953590991870697896

made a little Sankey to show you why I’m fuming ChatGPT Plus before vs after the GPT-5 release https://x.com/scaling01/status/1953780931552031056

Markets disappointed by GPT-5 OpenAI getting crushed on Polymarket https://x.com/scaling01/status/1953515099257282763

model switching in gpt-5 very cool! https://x.com/sama/status/1953526708742537220

New in Notion AI’s toolbelt: @OpenAI’s GPT-5 It’s fast, thorough, and handles complex work 15% better than other models we’ve tested. A great choice for tasks with multiple moving parts. Gradual rollout starting today. https://x.com/NotionHQ/status/1953506907924443645

OpenAI GPT-5 System Card released “GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, https://x.com/iScienceLuvr/status/1953503173932724614

Priority Processing debuts with GPT-5. under-hyped imo for apps where millisecond matters, pay extra and get our fastest token speeds just add "service_tier": "priority" to your requests https://x.com/jeffintime/status/1953857260729643136

Quick PSA. Settings for minimizing GPT-5 latency (time to first token). "service_tier": "priority", "reasoning_effort": "minimal", "verbosity": "low". P50 TTFT with these settings is ~750ms. With the defaults, it’s >3s. The default settings are the right starting point for https://x.com/kwindla/status/1953868672470331423
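
The settings quoted in the two posts above can be gathered into one request body. This is a minimal sketch, not the documented API shape: the three field names and values are copied from the posts, while the `model` and `input` fields, the helper name `low_latency_payload`, and the idea that all three sit at the top level of the request are assumptions to check against the official API reference.

```python
import json

# Field names and values as quoted in the posts above. Their exact placement
# in a real API request is an assumption, not something the posts confirm.
LOW_LATENCY_SETTINGS = {
    "service_tier": "priority",
    "reasoning_effort": "minimal",
    "verbosity": "low",
}

def low_latency_payload(prompt, model="gpt-5"):
    """Hypothetical helper: merge the quoted settings into a request body."""
    payload = {"model": model, "input": prompt}  # assumed field names
    payload.update(LOW_LATENCY_SETTINGS)
    return payload

# Print the body that would be sent; no network call is made here.
print(json.dumps(low_latency_payload("Say hello."), indent=2))
```

Keeping the settings in one dictionary makes it easy to switch back to the defaults, which the second post says are the right starting point.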

RT @lmarena_ai: GPT-5 is here – and it’s #1 across the board. 🥇#1 in Text, WebDev, and Vision Arena 🥇#1 in Hard Prompts, Coding, Math, Cre… https://x.com/aidan_mclau/status/1953517672941158577

Think harder is back! Routing changes in GPT-5 OpenAI means capability is moving from model selection to prompting https://x.com/dariusemrani/status/1953591404003045562

this is the detail of GPT-5 I’m most proud of GPT-4 launched at $30/$60, no cache discount since then, it’s been an unrelenting cross-team push to collapse the cost of intelligence. we’re nowhere near done https://x.com/jeffintime/status/1953534466854453751

We are actively evaluating GPT-5 models on document understanding capabilities 🔎📄 – specifically screenshotting the page and feeding it into the model. A WIP preliminary finding is that even though on paper GPT-5 is $1.25 per 1M tokens, it uses 4-5x more tokens than GPT-4.1, https://x.com/jerryjliu0/status/1953582723672814054
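
One way to read the arithmetic in this preliminary finding: if GPT-5 spends 4-5x as many tokens on the same page, its list price has to be scaled by that multiplier before comparing it with another model. The sketch below uses only the two figures in the post. The function name is illustrative, and applying the multiplier directly to the $1.25 rate is an assumption, since the post is truncated.

```python
GPT5_PRICE_PER_M_TOKENS = 1.25   # USD per 1M tokens, as quoted in the post
TOKEN_MULTIPLIERS = (4, 5)       # "4-5x more tokens than GPT-4.1", from the post

def effective_price_per_m(list_price, token_multiplier):
    """Price per 1M tokens of equivalent GPT-4.1 usage (illustrative helper)."""
    return list_price * token_multiplier

low, high = (effective_price_per_m(GPT5_PRICE_PER_M_TOKENS, m) for m in TOKEN_MULTIPLIERS)
print(f"effective price: ${low:.2f} to ${high:.2f} per 1M GPT-4.1-equivalent tokens")
```

On this reading, GPT-5 only comes out cheaper for the screenshot workflow if the comparison model costs more than roughly $5.00 to $6.25 per 1M tokens.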

We’re also releasing v0.16 of the Codex CLI today. – GPT-5 is now the default model – Use with your ChatGPT plan – A new, refreshed terminal UI `npm i -g @openai/codex` to update https://x.com/OpenAIDevs/status/1953559797883891735

We’ve put together some guides on how to get started with GPT-5: 💬 Prompting guide: https://x.com/OpenAIDevs/status/1953528513480347840

What the hell man, this is such a lame way to technically not lie. «A unified system» is… literally just SEPARATE CoT + non-CoT models + a router. > OpenAI reasoning models, including gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano > gpt-5-main just fuck off washed https://x.com/teortaxesTex/status/1953512363031757048

A hypothesis: gpt-oss is trained entirely on synthetic data, from pre-training to post-training. The approach enhances safety and helps smaller models achieve better performance. https://x.com/huybery/status/1952905224890532316

attention is 0.84% of gpt oss, intelligence is stored in those 99.16% mlp layer, attn is key to unlock it https://x.com/shxf0072/status/1953143243992166849

BREAKING: OpenAI just released two open-weight models: gpt-oss-120b and gpt-oss-20b. The 120B model is on par with o4-mini on reasoning benchmarks and can run on a single 80GB GPU. The 20B model achieves similar results to o3-mini and can run on edge devices with 16GB of https://x.com/rowancheung/status/1952777754904072566

curious about the training data of OpenAI’s new gpt-oss models? i was too. so i generated 10M examples from gpt-oss-20b, ran some analysis, and the results were… pretty bizarre time for a deep dive 🧵 https://x.com/jxmnop/status/1953899426075816164

Everyone is sleeping on AMD for local models – gpt-oss 20B running on an AMD GPU @ 52 tok/sec in a <$1000 laptop https://x.com/dzhng/status/1953132623280165193

gpt-oss for entirely local tool use: https://x.com/gdb/status/1952802157956350221

gpt-oss https://gpt-oss.com/

gpt-oss is a big deal; it is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4-mini, that you can run locally on your own computer (or phone with the smaller size). We believe this is the best and most usable open model in the https://x.com/sama/status/1952778518225723434

gpt-oss is out! we made an open model that performs at the level of o4-mini and runs on a high-end laptop (WTF!!) (and a smaller one that runs on a phone). super proud of the team; big triumph of technology. https://x.com/sama/status/1952777539052814448

gpt-oss-120b & gpt-oss-20b Model Card | OpenAI https://openai.com/index/gpt-oss-model-card/

GPT-OSS-120B casually calculating the product of two random 30-digit numbers. without any tools, just 18k tokens https://x.com/scaling01/status/1952892387539259455

Just released gpt-oss: state-of-the-art open-weight language models that deliver strong real-world performance. Runs locally on a laptop! https://x.com/gdb/status/1952780717638942910

RT @ggerganov: Llama.cpp supports the new gpt-oss model in native MXFP4 format The ggml inference engine (powering llama.cpp) can run the… https://x.com/ggerganov/status/1952978670328660152

RT @OpenAIDevs: Student credits for gpt-oss With @huggingface, we’re offering 500 students $50 in inference credits to explore gpt-oss.… https://x.com/reach_vb/status/1953010091377958984

Frontier models, capable of agentic reasoning, can now run on your Macbook Pro 🧑‍💻 @OpenAI’s release of GPT-OSS 20B and 120B are the biggest releases in open-source this year. Build agentic workflows with @llama_index that run 100% locally! Huge props to @LoganMarkewich and https://x.com/jerryjliu0/status/1952883595787239563

GPT-OSS models seem to be slopmaxxed on math/coding and reasoning – they are great at that but they completely lack taste and common sense at least that’s my vibe so far https://x.com/scaling01/status/1952881329772564764

I think gpt-oss was always expected to be put in an agent harness that uses search for all its world knowledge. I’ve always argued this is not a valid replacement, the rich connections it builds from actual backprop on the world’s knowledge – not just facts, but the aggregate https://x.com/Teknium1/status/1953230352568467761

I was just about to make a post that GPT-OSS-120B is nonetheless an overall good for the very low end. But I honestly don’t know what it is good at, except benchmarks. Coding seems to suck, creative writing is terrible… So it’s just a math model? https://x.com/scaling01/status/1953047913954791696

I’m thrilled @OpenAI has released two open weight models. Thank you to all my friends at OpenAI for this gift! I’m also encouraged that from my quick tests gpt-oss-120b looks strong (though we should still wait for rigorous 3rd party evals). https://x.com/AndrewYNg/status/1952838045235126510

i’ve spent the last couple hours talking to gpt-oss and can safely say it’s unlike any model i’ve tested one second it’s coding for me at a professional level, the next it’s making up basic facts and clinging to them no matter what i say something very strange is going on https://x.com/jxmnop/status/1953216881361600729

I’ve written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI’s new OSS models. For those interested in the details: https://x.com/Guangxuan_Xiao/status/1953656755109376040

ICYMI: you can vibe test the latest gpt-oss models on gpt-oss[.]com 💥 We partnered with @OpenAI to bring easy access to the model right down to a browser near you! https://x.com/reach_vb/status/1953041435999010916

Introducing gpt-oss | OpenAI https://openai.com/index/introducing-gpt-oss/

Is it over for gpt-oss ? What are these Aider Polyglot scores? https://x.com/scaling01/status/1952780629772321257

It’s looking bad bois.. Aider Polyglot results for GPT-OSS-120B: 41.8% for comparison: Kimi-K2: 59.1% DeepSeek-R1: 56.9% Qwen3 32B: 40.0% https://x.com/scaling01/status/1953047534122713130

Our new @OpenAI open models https://x.com/polynoamial/status/1952778238368887184

Thank you @OpenAI for open-sourcing these great models! 🙌 We’re proud to be the official launch partner for gpt-oss (20B & 120B) – now supported in vLLM 🎉 ⚡ MXFP4 quant = fast & efficient 🌀 Hybrid attention (sliding + full) 🤖 Strong agentic abilities 🚀 Easy deployment 👉🏻 https://x.com/vllm_project/status/1952784530466849091

The gpt-oss models have been post-trained to use two specific first-party tools: 1. a web browser that can search, read pages, follow links, and cite sources 2. an interactive python notebook This will give gpt-oss based agents super powerful capabilities out of the box! https://x.com/corbtt/status/1952810876165312805

We fixed some issues for @OpenAI’s gpt-oss model! 1. Jinja template has extra \n s, didn’t parse thinking sections + tool calling wasn’t rendered correctly 2. Some versions miss <|channel|>final -> this is a must! 3. F16 infs: use F32+BF16! We made a few free Colab notebooks as https://x.com/danielhanchen/status/1953901104150065544

Well, it took just 2 hours for OSS-GPT to hit #1 on @huggingface. Don’t remember seeing anything rise that fast! https://x.com/fdaudens/status/1952814865795698954

🚨 It’s official: OpenAI’s gpt-oss-120b & gpt-oss-20b just landed on Hugging Face! Brand new open-weight LLMs ready for anyone to try, fine-tune, and run anywhere. Here’s what makes this drop a big deal: https://x.com/fdaudens/status/1952781183575593234

And just like that, @OpenAI gpt-oss is now the number one trending model on @huggingface, out of almost 2M open models 🚀 People sometimes forget that they’ve already transformed the field: GPT-2, released back in 2019 is HF’s most downloaded text-generation model ever, and https://x.com/ClementDelangue/status/1952827283808375168

RT @satyanadella: Excited to bring OpenAI’s gpt-oss models to Azure AI Foundry and to Windows via Foundry Local. It’s hybrid AI in action:… https://x.com/xikun_zhang_/status/1952902211278913629

Ollama and @nvidia collaborate to accelerate gpt-oss on GeForce RTX and RTX PRO GPUs. NVIDIA and Ollama are advancing their partnership to boost model performance on NVIDIA GeForce RTX and RTX PRO GPUs. This collaboration enables users on RTX-powered PCs to accurately leverage https://x.com/ollama/status/1952782326926328313

Introducing Showrunner: the Netflix of AI From our South Park AI experiment to today we’ve believed AI movies/shows are a playable medium. We just raised a round from Amazon & more and the Alpha is live today Comment for an access code to make with all our shows. https://x.com/fablesimulation/status/1950589974262448505

concerning https://x.com/DZhang50/status/1953510507631071658

Developers, brace yourselves. @lovable_dev just dropped a wild new AI agent — it builds apps, games, and tools in under 10 minutes. No code. Just prompts. I built 50+ working apps and games. Here’s what I tried: Bubble Shooter Game (Fully Playable) https://x.com/ketan_tayal16/status/1948724087418769465

This is crazy… Lovable’s new agent writes better code than humans, works non-stop, never quits …and costs less than Netflix. Here 8 apps people built in under an hour: 1. Hand controlled shooter game https://x.com/AngryTomtweets/status/1948655404160102876

The future of shopping is here, brought to you by @Shopify and @Copilot. Imagine having a perfect shopping assistant in your pocket. Shop with conversation, not clicks – you’ll never go back to endless scrolling. Very excited to partner with the Shopify team. Lots more to come! https://x.com/mustafasuleyman/status/1952804181061799961

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/

Y Combinator: “https://t.co/0d2RAFb7in (@IronLedgerAI) provides AI agents for property accounting, starting with accounts payable. They’re already eliminating 100s of hours of work for 1000s of multifamily units every month.” https://x.com/ycombinator/status/1950647587549089921

🚀@RiselyAI’s agents automate admin work across college campuses. Their first product, the AI Advisor, unifies student data to flag at-risk students, deliver personalized support, and improve retention. Congrats on the launch, @shahryarsabbasi, @sadiasaifuddin1 & @danial_asif! https://x.com/ycombinator/status/1950602852751253983

🤔 (it’s a person demoing a local agent on their mac) https://x.com/sama/status/1952767676922974463

✨ICYMI AG-UI’s Launch week The Agent-User Interaction protocol allows you to bring your AI agents into Frontend applications First party integrations with LangGraph, CrewAI, Mastra, AG2, LlamaIndex, Agno, Pydantic Here’s everything we shipped last week: 1. AG-UI CLI 📟 https://x.com/CopilotKit/status/1949848261541392844

🏗️Implementing deepagents: a technical walkthrough A deep dive into the source code of the `deepagents` package: – How to implement sub agents (including custom sub agents) – How to use LangGraph state as a virtual file system Video: https://x.com/hwchase17/status/1952408450878918834

💯 Excellent in both agentic use and text generation. Try it now at https://x.com/Zai_org/status/1952404744225349799

Announcing a new chapter for AI agents: Lindy 3.0. Our vision has always been the “AI employee”: as capable as humans (can do anything on a computer) and as easy to use (just ask). 3.0 takes 3 giant steps in this direction, with Agent Builder, Autopilot, and Team Collaboration. https://x.com/Altimor/status/1952414217187086441

Be among the first to try GPT-5 in JetBrains AI Assistant and the coding agent Junie! 👉 https://x.com/jetbrains/status/1953501570017919424

Coding is the new black, Kimi is here to make it done with style. ：） https://x.com/crystalsssup/status/1953373854791049536

I heard @ampcode is better than Claude Code. That’s a high bar b/c CC is excellent So I used Amp and did comparisons and analysis. It’s solid Now I’m digging into the why with the CEO, @sqs Is it an incremental improvement or a paradigm shift? Join us ⤵️ https://x.com/isaac_flath/status/1952399160579366957

It’s only real docker if it comes from the container region of France. Everything else is sparkling hypervisor. https://x.com/code_star/status/1953153930944446852

Jules can now open pull requests. Once it’s done with a task, you can ask Jules to bundle the changes, write a summary, and open a PR ready for review. From plan to code to commit to PR, all in one loop. Check out the latest in the Jules Changelog https://x.com/julesagent/status/1952446750167310456

Just found out that Agent Reinforcement Trainer hit #1 on GitHub trending repositories over the weekend! Lots of folks training their own models successfully! https://x.com/corbtt/status/1952477405265989652

Opencode, the popular AI coding agent, now supports Together AI models! https://x.com/togethercompute/status/1952495692557046141

RT @unwind_ai_: LangChain literally reverse-engineered Claude Code, Manus, and Deep Research. They packaged detailed system prompts, plann… https://x.com/hwchase17/status/1953318560870310063

The worst part about vibe coding is sitting around waiting for the model to be done https://x.com/OfficialLoganK/status/1951728111784997331

This N8N Youtube Agent generated $100k in 60 days While you’re still brainstorming titles and editing videos manually… This AI Agent is pulling video prompts from Google Sheets, generating videos with Google Veo3, writing SEO titles using GPT-4o, uploading to YouTube, and https://x.com/SubhaghV/status/1942664236863218147

We just wrapped up the @Cline × Cerebras hackathon—and it was amazing! Over 800 developers came together to try instant vibe coding powered by Cerebras’ lightning-fast inference. Here are the winners who crushed it with their projects: https://x.com/CerebrasSystems/status/1952511328964509794

here’s an open question we’re trying to figure out: which of these hypotheses is right? 1) models matter, agents don’t: Claude Code or Cursor CLI or any agent does well on top of a good model like opus / gpt-5, and does poorly on top of a bad model like gpt-4.1-nano 2) agents https://x.com/charliebholtz/status/1953833772513644771

I’m thrilled to announce the definitive course on Claude Code, created with @AnthropicAI and taught by Elie Schoppik @eschoppik. If you want to use highly agentic coding – where AI works autonomously for many minutes or longer, not just completing code snippets – this is it. https://x.com/AndrewYNg/status/1953097967361245251

LangChain supports Claude Opus 4.1 and Sonnet 4’s citable search results — now GA on the Anthropic API and Vertex AI. With this update, you can: • Return search results with built-in titles and source links • Automatically link citations to the tool call that generated them https://x.com/LangChainAI/status/1953863129915420719

New Claude Code features are here: Microcompact: Clear old tool calls to extend session length Subagents: @-mention support + model selection for agents PDF support: Read PDFs directly from your file system https://x.com/_catwu/status/1952488684579873195

One more Claude Code feature dropping today: customizable status lines for your terminal. Type /statusline <status description> to get started. https://x.com/_catwu/status/1953927012592366062

RT @AnthropicAI: Today we’re releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. htt… https://x.com/TheZachMueller/status/1952771262415004146

So today i gave Claude Code a try for the first time (previously used Codex and Cursor). It implemented my two functions correctly, but… OMG what an awfully verbose code, i absolutely hated it and rewrote it all in no time and 3x less code😬 But I’ll give it a few more tries https://x.com/giffmana/status/1952434564472644016

Today we’re releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. https://x.com/AnthropicAI/status/1952768432027431127

Use Claude Opus 4.1 in @Windsurf https://x.com/cognition/status/1952778217728962943

Ethan Mollick: “Going almost two years with no substantive improvements to GPTs is surprising. I know the GPT store & consumer use was quietly abandoned by OpenAI, but when I talk to organizations they often view GPTs as an important tool for non-technical people to create & distribute AI uses.” https://x.com/emollick/status/1953221298743578646

We just raised a $21M Series A to build the cloud runtime for AI agents. More than 88% of the Fortune 100 is already using E2B, together with companies like Hugging Face, Perplexity, Groq, LMArena or Manus. E2B is delivering the Cloud for AI Agents. We are hiring the A-team in https://x.com/e2b/status/1949819133240934452

anthropics/claude-code-security-review: An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities. https://github.com/anthropics/claude-code-security-review

Automate security reviews with Claude Code \ Anthropic https://www.anthropic.com/news/automate-security-reviews-with-claude-code

Claude Code can now automatically review your code for security vulnerabilities. https://x.com/AnthropicAI/status/1953135070174134559

Workshop: Build Document-based AI Agents in Finance 📚🏦 There’s a lot of great finance agent tools out there (rogo/hebbia/perplexity); this workshop will help provide a code-based foundation so you can build your own custom finance automation workflows – whether it’s deep https://x.com/jerryjliu0/status/1953108641558540720

Researchers at Google built AlphaEvolve, an agent that lets Gemini 2.0 Flash and Pro repeatedly run, assess, and edit code until unit tests improve. Starting from placeholder functions, the loop produced a new routine for complex 4×4 matrix multiplication, matched or topped https://x.com/DeepLearningAI/status/1952112235196678274

Introducing Eigent: the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fully customizable workers and MCPs. Public beta available to download for MacOS, https://x.com/guohao_li/status/1950224830512390590

Developers, Reinvented https://ashtom.github.io/developers-reinvented

CEO Tim Cook says Apple ready to open its wallet to catch up in AI | Reuters https://www.reuters.com/business/ceo-tim-cook-says-apple-ready-open-its-wallet-catch-up-ai-2025-08-01/

📝 OpenAI custom tools: OpenAI’s new custom tools feature lets you constrain tool arguments to follow regex patterns, or grammars you define with Lark. You can now incorporate these tools into LangGraph agents. Try it out: https://x.com/chester_curme/status/1953839543074889993
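
To make the idea of a regex-constrained tool argument concrete, here is a minimal sketch using only Python's standard library. It does not call the OpenAI API or LangGraph. The `validate_tool_argument` helper and the date pattern are hypothetical names chosen for illustration, and the sketch checks an argument after the fact, which is only an approximation of constraining the model while it generates.

```python
import re

# Hypothetical constraint: the tool only accepts dates shaped like "2025-08-08".
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_tool_argument(raw: str, pattern: re.Pattern = DATE_PATTERN) -> str:
    """Return `raw` unchanged if the whole string matches `pattern`.

    Raises ValueError otherwise, so a caller can reject or retry the tool call.
    """
    if pattern.fullmatch(raw) is None:
        raise ValueError(f"argument {raw!r} does not match {pattern.pattern!r}")
    return raw
```

A caller would pass the model's proposed argument through `validate_tool_argument` before running the tool, and treat a `ValueError` as a signal to ask the model again.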

gpt-5 is now the default in cursor (replacing claude) “it’s the smartest coding model we’ve tried” - @mntruell, cursor ceo https://x.com/aidan_mclau/status/1953520087027380243

more fun than ever to be a software engineer https://x.com/gdb/status/1951882297172779336

our devin and windsurf agentic eval results for GPT-5. GPT-5 is now free in @windsurf for a limited time, lmk how it matches up with our evals https://x.com/itsandrewgao/status/1953525984231067949

RT @LangChainAI: 💻 Introducing Open SWE: An Open-Source Asynchronous Coding Agent Open SWE is a fully autonomous, cloud-based coding agent… https://x.com/Hacubu/status/1953168346356314376

According to Lovable CEO @antonosika, 10% of new websites created last month were built with Lovable. One in every ten new sites on the internet. Yesterday, I built and deployed a form generator SaaS in just 22 minutes using Lovable. Here’s exactly how I did it: https://x.com/aliscodes/status/1948334684082962876

Jules now has web search built in. It will proactively search the web to find the latest documentation, related implementations, and up-to-date information for your task. This means you get more accurate code from the jump, so you can spend less time searching and more time https://x.com/julesagent/status/1953852699944136847

RT @ori_press: We just benchmarked Qwen 3 Coder and GLM 4.5 on AlgoTune, and they manage to beat Claude Opus 4! We’re excited to see if the… https://x.com/OfirPress/status/1952470237947085146

To unlock true personalization, continuity, and complex reasoning, we must engineer memory into our Agents. It’s one of the most critical and under-explored frontiers in AI at the moment. I wrote a deep dive for everything you need to know about memory in AI agents from: 💡It’s https://x.com/_philschmid/status/1952370348600533000

I wonder if the fact that there’s no frontier leap in Agentic evals like SWE Bench, OpenAI internal PRs and implementing ML papers implies that models are saturating and that agent scaffolds actually matter more than ever. There’s never been a better time to be an agent wrapper? https://x.com/nrehiew_/status/1953531014825095492

We’ve acquired @Invisible_HQ, a stellar team with deep expertise in scalable infrastructure for agents with past experience at Uber and Cloud Kitchens. The team will help us in scaling Comet securely and reliably among both consumers and enterprises. More soon! https://x.com/AravSrinivas/status/1952803397410930807

Amazon CEO wants to put ads in your Alexa+ conversations | TechCrunch https://techcrunch.com/2025/07/31/amazon-ceo-wants-to-put-ads-in-your-alexa-conversations/

Using @claudeai you now have search results as content blocks, bringing citations to agent applications – no more document workarounds needed! 𝙎𝙚𝙖𝙧𝙘𝙝 𝙧𝙚𝙨𝙪𝙡𝙩𝙨 𝙖𝙨 𝙘𝙤𝙣𝙩𝙚𝙣𝙩 𝙗𝙡𝙤𝙘𝙠𝙨 by @AnthropicAI enables proper source attribution for results from tool https://x.com/llama_index/status/1953859971072114766

When you’re helping someone build an LLM judge, the goal is to maximize the bits of information you get about a task per unit of human effort/frustration. Our work on ALHF shows that natural language feedback is both information-dense and ergonomic. Check it out in Agent Bricks! https://x.com/jefrankle/status/1953297527089897944

Now @lovable_dev has 200M, AI apps are going crazy. We’ve built a 3-prompt Stripe integration that works reliably for vibe coders https://x.com/johnyeo_/status/1946263335663448253

“but how will AGIs get access to the Internet”, they used to ask me. I guess at this point this isn’t actually much of a relevant update though https://x.com/ESYudkowsky/status/1951011949074063794

Want to see your front-end changes live? Jules can now run and render your web application, then send you screenshots to verify the changes. Plus, you can add links to a public image for visual context in any task. More at https://x.com/julesagent/status/1953487085093917025

What a day. 100% of @Copilot users have GPT-5. https://x.com/mustafasuleyman/status/1953608045533204690

GPT-5 is now the default model in Factory. Over the past few weeks, we have worked in close collaboration with the @OpenAI team. We have found GPT-5 to be highly agentic, detail-oriented, and comprehensive — particularly when searching and planning. Within Factory, GPT-5 https://x.com/FactoryAI/status/1953516542924353759

Use @OpenAI’s new custom tools feature with @LangChain agents! https://x.com/sydneyrunkle/status/1953881101602038035

💡GPT-OSS 20B 2/4 bits GGUFs are available. Enjoy! https://x.com/HaihaoShen/status/1953729639081554002

interesting swiglu variant from the gpt-oss model: clamps inputs and adds a skip connection https://x.com/vikhyatk/status/1952808827281391701

New projects already being built on GPT OSS! Build your own with our Model APIs here -> https://x.com/basetenco/status/1952882156059148737

Next to Qwen3 of comparable size: Looks like gpt-oss is a wide (vs deep) model https://x.com/rasbt/status/1952842273848279364

One line of code is all it takes to fine-tune the gpt-oss models from @OpenAI 🔥 > Support to target the MoE expert layers with PEFT > Kernels for FlashAttention3 & MegaBlocks > Fast inference with MXFP4 quantization format In our testing, these models are extremely efficient https://x.com/_lewtun/status/1952788132908404941

openai/harmony: Renderer for the harmony response format to be used with gpt-oss https://github.com/openai/harmony

qianwen-res.oss-cn-beijing.aliyuncs.com https://qianwen-res.oss-cn-beijing.aliyuncs.com/

RT @CerebrasSystems: OpenAI GPT-OSS-120B is live on Cerebras 3,000 tokens/s – fastest OpenAI model on record 1 second reasoning time 131K c… https://x.com/cline/status/1952960760759632025

RT @mattshumer_: It’s over. OpenAI just crushed it. We have their o3-level open-source model running on @GroqInc at 500 tokens per second.… https://x.com/JonathanRoss321/status/1953119620103381440

RT @reach_vb: BOOOOM! You can now run @OpenAI gpt-oss 20B natively in @GoogleColab T4 for FREE! 🔥 Powered by Transformers ⚡ The setup tak… https://x.com/_lewtun/status/1953441199253069936

RT @thanosthinking: running gpt-oss:20b on @ollama with Turbo and web search 🏎️ 💨 very happy with how the web search turned out 🙂 and o… https://x.com/ollama/status/1952882173255856223

We’re thrilled to announce Axolotl v0.12.0. We’re ramping up our distributed training featureset with ND parallel multi-node training, and FP8 support. We’ve also added fine-tuning for gpt-oss, FSDP support for TiledMLP, and many more exciting features. 1/5 https://x.com/axolotl_ai/status/1953845149391630472

The Harmony format from gpt-oss is now supported for datasets on the @huggingface Hub 🧘 Nifty feature by @calebfahlgren! https://x.com/_lewtun/status/1953870411050959110

DeepSeek-R1: 2.66 million H800 hours GPT-OSS-120B: 2.1 million H100 hours https://x.com/scaling01/status/1952784655838564376

Pro | Dia Browser https://www.diabrowser.com/pro

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens! 🔧 Powered by: • Dual Chunk Attention (DCA) – A length extrapolation method that splits long sequences into manageable chunks while preserving global coherence. • https://x.com/Alibaba_Qwen/status/1953760230141309354

Qwen3-Coder is now available on Cerebras, 17x faster than on GPU providers. And it’s completely free. Try it out directly in your developer flow, or signup for our virtual hackathon tomorrow. It’s a $5,000 prize 🙂 @CerebrasSystems @cline https://x.com/SarahChieng/status/1951453803905163693

Small but mighty! Qwen3-Coder-Flash and GLM-4.5-Air are now on @FireworksAI_HQ Despite being smaller and faster, Qwen3 Coder Flash 30B and GLM 4.5-Air achieve almost the same quality as their larger counterparts on tool use benchmarks. The secret of good model behavior is in https://x.com/dzhulgakov/status/1952049826067050735

🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source. 🔍 Key Highlights: 🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese 🔹 In-pixel https://x.com/Alibaba_Qwen/status/1952398250121756992

💡 You get 2,000 free Qwen Code runs every day! Run this one simple command: npx @qwen-code/qwen-code@latest Hit Enter, and that’s it! 🚀 Now with Qwen OAuth support, super easy to use. Try it now and supercharge your vibe code! 💻⚡ GitHub: https://x.com/Alibaba_Qwen/status/1953835877555151134

Just included example scripts for aligning models using GSPO (including VLM example) 🙆‍♂️🙆‍♂️ GSPO is the latest RL alignment algo by @Alibaba_Qwen and it’s already supported in the latest TRL v0.20 release. Super-easy-to-get-started example scripts below, GO run them! 👩‍💻👩‍💻 https://x.com/SergioPaniego/status/1952305247411691871

Qwen-Image demo on Hugging Face getting absolutely hammered right now 😀 https://x.com/victormustar/status/1952416615351366033

Qwen-Image: Crafting with Native Text Rendering | Qwen https://qwenlm.github.io/blog/qwen-image/

RT @Alibaba_Qwen: 🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507: smarter, sharper, and 256K-ready! 🔹 Instruct: Boosted ge… https://x.com/NandoDF/status/1953223478087143640

RT @Alibaba_Qwen: 🚀 Meet Qwen-Image, a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graph… https://x.com/mervenoyann/status/1952455331205841261

So, I did some coding this week… – Qwen3 Coder Flash (30B-A3B) – Mixture-of-Experts setup with 128 experts, 8 active per token – In pure PyTorch (optimized for human readability) – in a standalone Jupyter notebook – Runs on a single A100 https://x.com/rasbt/status/1951635208375034191
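
The "128 experts, 8 active per token" setup mentioned above can be illustrated with a small top-k routing sketch in plain Python. This is not the code from the linked notebook. The `route_token` helper, the random logits, and the choice to softmax-normalize only the selected experts' scores are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, as stated in the post
TOP_K = 8          # experts active per token, as stated in the post


def route_token(router_logits):
    """Pick the TOP_K highest-scoring experts and normalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for a single token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
```

Only the 8 selected experts would run for this token, and their outputs would be combined using the returned weights, which is why the active parameter count per token is far smaller than the total.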

Today we release the APIs of our Flash series, which support Qwen3-Coder and Qwen3-2507 now. Both APIs support the context length of 1M tokens. They are fast and accurate, and they are cost-effective as well. Feel free to give them a try! Qwen3-Coder-Flash Model Card: https://x.com/Alibaba_Qwen/status/1952767585596145773

This was a really fun collab during my time at @databricks !! It’s basically a product answer to the fact that: (1) People want to optimize their agents and to specialize them for downstream preferences (no free lunch!) (2) People don’t have upfront training sets, or even https://x.com/lateinteraction/status/1953227168336458059