Agents and Copilots: AI News Week Ending 07/25/2025

July 25, 2025

Image created with OpenAI GPT-Image-1. Image prompt: over-the-top 1990s pro-wrestling promo poster, pyro-lit entrance ramp featuring “Agent Alpha”⁠—a covert-ops grappler in a black trench coat brandishing a glowing comm-earpiece; high-energy spotlights, grainy print texture, vivid neon titles

Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI We’re releasing: * 3 games (environments) * $10K agent contest * AI agents API Starting scores – Frontier AI: 0%, Humans: 100% https://x.com/arcprize/status/1946260363256996244

AI as the greatest source of empowerment for all | OpenAI
https://openai.com/index/ai-as-the-greatest-source-of-empowerment-for-all/

I will officially start at OpenAI as CEO of Applications on August 18. I am sharing this essay on why I believe AI can be the greatest source of empowerment for all.
https://x.com/fidjissimo/status/1947341053209501716

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much https://x.com/random_walker/status/1946180439045018046

Imagine if every pattern shaped by nature – like a protein’s fold or cosmic phenomena – is inherently learnable by AI. @DemisHassabis shares with @lexfridman that if AI can learn these natural patterns, we could open doors to new eras of scientific discovery. Listen now. ↓ https://x.com/GoogleDeepMind/status/1948098855053979930

Thanks @lexfridman for another super fun & wide-ranging conversation. We talked about the future of video games, the nature of reality, advancing science with AI, the path to AGI… and quite a bit more as usual! Always a blast, already looking forward to next time! 😀 https://x.com/demishassabis/status/1948234351205855458

ARC-AGI-3 scores 0% for AI, 100% for humans now live with API where you can test your agent: https://x.com/scaling01/status/1946261191782797717

Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties (Thread with many posts):
https://x.com/michaelrbock/status/1948039876043313509

Now that this exists AI will be able to do your taxes very well, very soon https://x.com/Teknium1/status/1948668301829439846

Today, we’re releasing TaxCalcBench: a first-ever benchmark dataset & eval framework for testing AI’s ability to calculate US personal income tax returns.
Tax is a secretive industry, so we’re proud to release a research paper sharing our findings:
https://arxiv.org/abs/2507.16126

BREAKING: OpenAI just launched ChatGPT Agent It allows ChatGPT to think, plan, and execute complex tasks on its own virtual computer while you do other things I had early access, and ChatGPT Agent built me a complete early retirement plan in 20 minutes: > Found local tax laws https://x.com/rowancheung/status/1945896543263080736

ChatGPT agent did real, revenue-generating work that used to take @mhp_guy an entire day. We’re gradually entering the age of the agentic economy — and it’s going to reshape capitalism as we know it. Traditionally, capitalism relied on two inputs: labor and capital. In the https://x.com/xikun_zhang_/status/1948244478265016327

ChatGPT agent Does Research & Actions – YouTube https://www.youtube.com/watch?v=Ht2QW5PV-eY

ChatGPT agent for finding a great Airbnb: https://x.com/gdb/status/1946075573476069580

ChatGPT agent for working with Excel, Powerpoint, etc.: https://x.com/gdb/status/1946007318824673534

ChatGPT agent is now fully rolled out to all Plus, Pro, and Team users. Sorry about the delay! https://x.com/OpenAI/status/1948530029580939539

ChatGPT agent Makes Slideshows – YouTube https://www.youtube.com/watch?v=szJI9YJNEZk

ChatGPT agent Makes Spreadsheets – YouTube https://www.youtube.com/watch?v=JAQ4p662It8

ChatGPT agent: “create a PDF of a novel D&D adventure, add illustrations, make it super interesting and deep, add tables, etc” “Fix the formatting, build it out more” Got a 19 page PDF. Agent doesn’t do layouts well, but pulls off building a coherent adventure, hard for LLMs. https://x.com/emollick/status/1946047390118445354

ChatGPT Agent: our first AI with access to a text browser, a visual browser, and a terminal. Rolling out in ChatGPT Pro, Plus, and Team today. https://x.com/gdb/status/1945907023444660644

I am finding ChatGPT agents to be useful. They are a better fit with the “intern” analogy than any former AI – requiring oversight, still saving lots of time overall. For example, I update an AI cost/performance chart frequently. The agent did all the grunt work, with guidance. https://x.com/emollick/status/1947482417888932258

I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it does a good job autonomously doing research & assembling Excel files (with formulas!), PowerPoint, etc. It gives a sense of how agents are coming together https://x.com/emollick/status/1945892669575647431

In the same way ChatGPT was the first AI experience for 90% of society, ChatGPT Agents will be the first Agent experience for 90% of society. If you are reading this, you are still early https://x.com/AtomSilverman/status/1945895569437642782

Introduction to ChatGPT agent – YouTube https://www.youtube.com/watch?v=1jn_RpbPbEc

One implication from ChatGPT agent (not a creative name, but a descriptive one – a rare naming win!) is the labs are learning that many knowledge workers live in Excel & PowerPoint. Surprised that Microsoft did not do more to push past Copilots when they had this to themselves. https://x.com/emollick/status/1945926194043424954

OpenAI launches a general purpose agent in ChatGPT | TechCrunch https://techcrunch.com/2025/07/17/openai-launches-a-general-purpose-agent-in-chatgpt/

Recursion! I gave ChatGPT Agent access to my ChatGPT by logging in and then… https://x.com/emollick/status/1947829896845127983

RT @emollick: I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work Even at this stage, it do… https://x.com/nickaturley/status/1945975092342841487

RT @KerenGu: We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biolo… https://x.com/sama/status/1945995659682910540

tip for chatgpt agent slides: first ask it to do the research only, then ask it to make the slides! https://x.com/isafulf/status/1946231119751545014

Today we launched a new product called ChatGPT Agent. Agent represents a new level of capability for AI systems and can accomplish some remarkable, complex tasks for you using its own computer. It combines the spirit of Deep Research and Operator, but is more powerful than that https://x.com/sama/status/1945900345378697650

watching chatgpt agent use a computer to do complex tasks has been a real “feel the agi” moment for me; something about seeing the computer think, plan, and execute hits different. https://x.com/sama/status/1945901039104004467

When we founded OpenAI (10 years ago!!), one of our goals was to create an agent that could use a computer the same way as a human — with keyboard, mouse, and screen pixels. ChatGPT Agent is a big step towards that vision, and bringing its benefits to the world thoughtfully. https://x.com/gdb/status/1945923067403984979

You can ask ChatGPT Agent to train an AI on datasets you are interested in, and do analyses for you. Building AI and doing data analysis will be automated end-to-end in the future. You are hearing it right. We are working hard to automate our own job :) https://x.com/xikun_zhang_/status/1946278266786189744

“Hey Comet, join my team meetings for me, turn off the camera and keep me muted, unmute and say “nothing from my end, thanks” when it’s my turn to speak, mute again, end meeting when it’s done”. How many want this? https://x.com/AravSrinivas/status/1947501358007128149

Comet can make an entire Spotify playlist and start playing it for you! https://x.com/AravSrinivas/status/1948489790036365796

Comet can use LinkedIn for you and do all your work there https://x.com/AravSrinivas/status/1948835728798220539

Comet lets you search over everything like an agent would. Even stuff that’s not easy to index. https://x.com/AravSrinivas/status/1948056269958648309

How to watch YouTube on Comet https://x.com/AravSrinivas/status/1946240617031606672

Interesting Comet use case that a user pointed out just now to me: Use Comet to order food directly from the restaurant (eg: Chipotle) instead of an aggregator delivery app. Cheaper. Friction of having to deal with random websites gone. And you still get the same meal delivered. https://x.com/AravSrinivas/status/1948818172985196862

Just so that it’s clear to a bunch of confused folks. You lose nothing you already have in ad-blocking browsers, when you come to Comet. All ad-blockers work natively. No extensions needed. Even incognito. We have all the resources needed to keep working on this. https://x.com/AravSrinivas/status/1948102473597829200

perplexity comet browser ranks above the wikipedia page of comet on google serp, ~10 days since release https://x.com/AravSrinivas/status/1947173109083332988

Perplexity in talks with phone makers to pre-install Comet AI mobile browser on devices | Reuters https://www.reuters.com/business/perplexity-talks-with-phone-makers-pre-install-comet-ai-mobile-browser-devices-2025-07-18/

RT @JoannaStern: OK, Perplexity’s Assistant in the new Comet browser is good. Really good. https://x.com/AravSrinivas/status/1948215175976497394

the % of users who switch to comet as default browser has been steadily increasing since the launch day. and there’s still so much more to do to keep increasing this number. really promising future for comet. https://x.com/AravSrinivas/status/1948794199069110519

The TAM for Comet is bigger than Perplexity because it appeals to people who don’t even want AI. Just the best core browser in the market at the end of the day. https://x.com/AravSrinivas/status/1946035102150238475

The waitlist for Comet has doubled since launching. We will begin ramping up invites to waitlisted users starting today. https://x.com/AravSrinivas/status/1947407684996894969

This is an incredible end to end deep research workflow on Comet. Makes me realize how powerful and fast deep research can be with a hybrid client-server compute architecture https://x.com/AravSrinivas/status/1946398572955766979

Underrated aspect of Comet: better memory management than Chrome https://x.com/AravSrinivas/status/1947817943934587362

we’re going to be shipping so many awesome new things on comet https://x.com/AravSrinivas/status/1948415154330415350

With the release of comet, perplexity has turned from a “ask anything” company to a “do anything” company https://x.com/AravSrinivas/status/1947175881203683577

🚀 Introducing Qwen3-MT – our most powerful translation model yet! Trained on trillions of multilingual tokens, it supports 92+ languages—covering 95%+ of the world’s population. 🌍✨ 🔑 Why Qwen3-MT? ✅ Top-tier translation quality ✅ Customizable: terminology control, domain https://x.com/Alibaba_Qwen/status/1948406830688018471

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding https://x.com/Alibaba_Qwen/status/1948688466386280706

Less than two weeks after Kimi K2’s release, @Alibaba_Qwen’s new Qwen3-Coder surpasses it with half the size and double the context window. Despite a significant initial lead, open source models are catching up to closed source and seem to be reaching escape velocity. https://x.com/cline/status/1948072664075223319

Qwen COOKED – beats Kimi K2 and competitive to Claude Opus 4 at 25% total parameters 🤯 https://x.com/reach_vb/status/1947357343101960424

Qwen3-235B-A22B scored 41% on ARC-AGI-1 without thinking! That’s the same level as Gemini 2.5 Pro, Sonnet 4 or o3-low with thinking. But it might be trained on it, if not, then it’s insane https://x.com/scaling01/status/1947351789222711455

RT @itsPaulAi: Wait so Alibaba Qwen has just released ANOTHER model?? Qwen3-Coder is simply one of the best coding model we’ve ever seen.… https://x.com/ClementDelangue/status/1947775783067603188

RT @lmstudio: Qwen/Qwen3-Coder with tool calling is supported in LM Studio 0.3.20, out now. 480B parameters, 35B active. Requires about 25… https://x.com/huybery/status/1948327670493970534

The new Qwen3 update takes back the benchmark crown from Kimi 2. Some highlights of how Qwen3 235B-A22B differs from Kimi 2: – 4.25x smaller overall but has more layers (transformer blocks); 235B vs 1 trillion – 1.5x fewer active parameters (22B vs. 32B) – much fewer experts in https://x.com/rasbt/status/1947393814496190712
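
The size ratios quoted in the post above can be checked with quick arithmetic. This is a minimal sketch that uses only the parameter counts stated in the post (235B vs. 1 trillion total, 22B vs. 32B active); the dictionary and function names are illustrative and do not come from any library.

```python
# Parameter counts in billions, exactly as quoted in the post above
# (assumed for illustration, not independently verified).
MODELS = {
    "Qwen3-235B-A22B": {"total": 235, "active": 22},
    "Kimi K2": {"total": 1000, "active": 32},
}


def size_ratio(larger: str, smaller: str, key: str) -> float:
    """Return how many times larger `larger` is than `smaller` for the given count."""
    return MODELS[larger][key] / MODELS[smaller][key]


total_ratio = size_ratio("Kimi K2", "Qwen3-235B-A22B", "total")
active_ratio = size_ratio("Kimi K2", "Qwen3-235B-A22B", "active")

print(f"total parameters: {total_ratio:.2f}x")
print(f"active parameters: {active_ratio:.2f}x")
```

The ratios come out to roughly 4.26x and 1.45x, which the post rounds to 4.25x and 1.5x.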

The updated Qwen3-235B-A22B is now the best non-reasoning models period. It beats Kimi-K2, Claude-4 Opus and DeepSeek V3 on multiple benchmarks like GPQA, AIME, ARC-AGI, LiveCodeBench or BFCLv3, just to name a few. https://x.com/scaling01/status/1947350866840748521

So to recap: – Yesterday, frontier closed model equivalent reasoning model from Qwen, – This morning, frontier closed model equivalent reasoning vision capabilities from stepfun – sometime today(?) a frontier video model from wan? All open source What is America doing? https://x.com/Teknium1/status/1948744914876920039

Wave 11 is here 🌊 https://x.com/cognition_labs/status/1945919925165637847

Windsurf: “Wave 11 is live! Seven big upgrades to Windsurf 🧵 https://t.co/ncYQ9fPL5e”
https://x.com/windsurf/status/1945918283313725794

Replit CEO Apologizes After AI Coding Tool Wipes Company’s Database – Business Insider https://www.businessinsider.com/replit-ceo-apologizes-ai-coding-tool-delete-company-database-2025-7

Citi is now deploying Devin across their engineering teams. We’re proud to partner with one of the world’s leading financial institutions to accelerate software development. More details below in @Citi’s story in American Banker. https://x.com/cognition_labs/status/1945904648629707093

Gartner Predicts One in 20 Supply Chain Managers Will Manage Robots, Rather Than Humans, by 2030 https://www.gartner.com/en/newsroom/press-releases/2025-07-16-gartner-predicts-one-in-20-supply-chain-managers-will-manage-robots-rather-than-humans-by-2030

@OriolVinyalsML Impressive result, but let’s be clear, the Gemini model got heavy IMO-specific prep, curated solutions, hints, and strategy guides. That’s not general reasoning. OpenAI’s model hit IMO gold with zero task-specific tuning. One is coached, the other is capable. https://x.com/VraserX/status/1947368827253076001

@pli_cachete For OpenAI at least for this IMO competition: – No tool use, no calculators, internet, formal proof software, algebra packages – same time limits – the same input to the question as for students; no rewriting it to another more suitable format – only one submission https://x.com/BorisMPower/status/1946859525270859955

🤖 From this week’s issue: Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad (IMO) by solving five out of the six IMO problems. https://x.com/dl_weekly/status/1948105084480397503

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). https://x.com/alexwei_/status/1946477742855532918

10. My career as a mathematician certainly isn’t threatened by AI; in fact, I hope to leverage AI to accelerate my work. However, I’m unsure whether “mathematician” will remain a career path for my son’s generation. (10/10) https://x.com/ErnestRyu/status/1946700798001574202

4. OpenAI surely knew GDM was working on the IMO, so they beat GDM to the punch with their Saturday morning announcement, generating hype. GDM’s slow-science scholarship cost them the PR battle. (4/10) https://x.com/ErnestRyu/status/1946699212307259659

5. In my experience using LLMs for math research, Gemini outperforms ChatGPT. We will see if the next-gen models (which seem to be what OpenAI and GDM are using for IMO) perform at research-level math. (5/10) https://x.com/ErnestRyu/status/1946699302308635130

Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! https://x.com/koraykv/status/1947335096740049112

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad – Google DeepMind https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵 https://x.com/GoogleDeepMind/status/1947333836594946337

As confirmed by the new IMO rankings, Grok 4’s eye-popping benchmarks were driven by the following innovations: – train on test – train on test – train on test https://x.com/nsaphra/status/1946804513114882227

DeepMind has the best research on using AI to solve hard Math: AlphaEvolve AlphaProof AlphaGeometry FunSearch AlphaDev AlphaTensor AlphaCode Despite making IMO Silver 28/42 in ’24, OpenAI announced Gold in ’25 35/42 before them Here’s DeepMind’s 10 best research papers on https://x.com/deedydas/status/1946987560875766212

Drastic progress on maths with Gemini 2.5! As a math undergrad, I am impressed 🤯 🥈 -> 🥇 ✅ Formal -> Informal ✅ Specialized model -> General model ✅ Available soon ✅ Huge thanks to IMO and congrats to all participants! Blog: https://x.com/OriolVinyalsML/status/1947341047547199802

Gary Marcus strikes again: “No pure LLM is anywhere near getting a silver medal in a math olympiad” “Pure deep learning had a good run, but it’s time to move on” 😂😂😂 https://x.com/scaling01/status/1946530148813025544

Gemini solved the math problems end-to-end in natural language (English). https://x.com/denny_zhou/status/1947360696590839976

Gold medal-level performance on the 2025 International Math Olympiad from our latest experimental reasoning LLM. Model operated in natural language (i.e. outputs natural language proofs) under the same rules as humans (e.g. 4.5 hours per session, no tools). Amazing milestone! https://x.com/gdb/status/1946479692485431465

Had a super fun time training this model. A big yolo run that resulted in a super strong model. Most important thing is to trust your model and give it morale support. 🦾 Was also a big eye opener to see how prep for IMO is done. Before this I knew absolutely zero about this https://x.com/YiTayML/status/1948464752545726886

hippo at IMO: 0/42 model trained by hippo: 35/42 🥇 😂😂😂 https://x.com/agihippo/status/1947348097144611123

IMO 2025 Solutions https://storage.googleapis.com/deepmind-media/gemini/IMO_2025.pdf

It wasn’t just OpenAI. Google also used a general purpose model to solve the very hard math problems of the International Math Olympiad in plain language. Last year they used specialized tool use Increasing evidence of the ability of LLMs to generalize to novel problem solving https://x.com/emollick/status/1947356382581137867

It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI.
Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies. https://x.com/SebastienBubeck/status/1946577650405056722

MathArena – IMO Blogpost https://matharena.ai/imo/

maybe a better headline would be that oai and gdm ranked 27 at the IMO. some talented kids here! https://x.com/damekdavis/status/1947357679040569520

Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 https://x.com/hardmaru/status/1946942279807308210

Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well https://x.com/fchollet/status/1947337944215523567

On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction! https://x.com/alexwei_/status/1947461238512095718

One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? https://x.com/littmath/status/1947398065209462981

Other AI models seem to have made big leaps in the International Math Olympiad, not just OpenAI. Not all announcements seem to be out yet. https://x.com/emollick/status/1947053944192082170

Our IMO gold model is not just an “experimental reasoning” model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥 https://x.com/YiTayML/status/1947350087941951596

P6 was definitely the hardest and most interesting problem. Most people can understand it, but very few can solve it. All models scored 0/7. https://x.com/deedydas/status/1946250774960537927

Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened! We put all individual recipes (that we figured out https://x.com/lmthang/status/1948458590492393834

RT @demishassabis: Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs sha… https://x.com/TheZachMueller/status/1947419062423982583

RT @demishassabis: Official results are in – Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced ver… https://x.com/AndrewLampinen/status/1947370582393425931

RT @Mihonarium: 🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closi… https://x.com/AndrewLampinen/status/1947072974621982839

RT @ns123abc: Bruh… people already reproduced Google’s IMO results without RL with just prompting openai researchoors think they have the… https://x.com/_philschmid/status/1948304855837085717

RT @polynoamial: Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO wi… https://x.com/kchonyc/status/1946526143433015349

The hardest high school math exam in the world, the 6 problem 9 hour IMO 2025, was this week. AI models performed poorly. Gemini 2.5 Pro scored the highest, just 13/42, costing $431.97, in a best of 32 eval. Bronze cutoff was 19. Long way to go for AI to solve hard Math. https://x.com/deedydas/status/1946244012278722616

The two cents: 1. The OpenAI IMO solutions to P1-P5 seem to be correct. 2. P6 is a significantly novel and more difficult problem. P1-P5 are arguably within reach of “standard” IMO problem-solving techniques, but P6 requires creativity. (2/10) https://x.com/ErnestRyu/status/1946698896375492746

There are always a flood of posts about what AI can or cannot do, so it is worth pausing and paying attention to this one. It is a very hard test, done without tools. It was also viewed as an unlikely goal. Prediction markets had the chance of this happening this year as 20% https://x.com/emollick/status/1946563737604743386

This wins my respect. https://x.com/Yuchenj_UW/status/1947339774257402217

Tough look for OpenAI They’ve pissed off the international math community by jumping the gun, meanwhile @GoogleDeepMind has an officially-confirmed result that will be available commercially months earlier https://x.com/mathemagic1an/status/1947352370037305643

Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10) https://x.com/ErnestRyu/status/1946698766305968446

we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence. when we first started openai, https://x.com/sama/status/1946569252296929727

We might be heading into a plot twist in the OpenAI vs. DeepMind IMO saga. Just saw a post from Joseph Myers (involved in the Math Olympiad since 1992): the IMO committee reportedly asked AI labs not to publish results until 7 days after the closing ceremony — out of respect for https://x.com/zjasper666/status/1947013036382068971

Why am I excited about IMO results we just published: – we did very little IMO-specific work, we just keep training general models – all natural language proofs – no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered https://x.com/millionint/status/1946551400365994077

Today, we’re announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI
We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API
Starting scores – Frontier AI: 0%, Humans: 100%
https://docs.arcprize.org/

Even if GPT-5 did nothing besides switching people between o3 and 4o automatically, it would really transform most people’s view of AI. Very few people, even paying users, know that they should often switch to a more capable model, and when you show them o3, they are impressed. https://x.com/emollick/status/1946958840697696581

OpenAI Preps ChatGPT Agents in Challenge to Microsoft Excel and PowerPoint — The Information https://www.theinformation.com/articles/openai-preps-chatgpt-agents-challenge-microsoft-excel-powerpoint

The biggest question people always ask me is what model to use based on the application. So I made this website to give a visual representation of LLM capabilities based on my experience and benchmarks. 👇 https://x.com/skirano/status/1946353375429197843

timescope: testing if large models understand long videos or they just claim to do so 🤠 they randomly insert needles (short videos/static images) in long videos and ask questions about the needle itself 🤯 Gemini seems to be the best! very cool work by @orr_zohar et al 👏 https://x.com/mervenoyann/status/1948049876228452788

Perplexity Comet vs ChatGPT Agent https://x.com/AravSrinivas/status/1946076236683624616

Agentar‑Fin‑R1 shows that a 32B‑parameter finance‑tuned model can outscore much bigger general systems on Fineva, FinEval, FinanceIQ, and Finova. Today’s finance-AI still miss strong reasoning and safety checks, so this paper builds a fresh pipeline to fix both. It starts by https://x.com/rohanpaul_ai/status/1948382668372193631

An example of the power & limitations of ChatGPT agent I asked it to analyze a dataset from Kaggle, and turn it into a PPT and Excel. It made no errors, but I thought some of the data was odd. I gave that feedback & the AI figured out the data was bad and why. Human + AI needed https://x.com/emollick/status/1945944153554104379

ChatGPT agent for investment banking: https://x.com/gdb/status/1946074958238765503

Tejal Patwardhan: “these results were eye-opening for me… chatgpt agent performed better than i expected on some pretty realistic investment banking tasks https://t.co/nkpW0pr5jN”
https://x.com/tejalpatwardhan/status/1945894313977860203

Natural language powered Stock Screener on Perplexity Finance. https://x.com/AravSrinivas/status/1948812710952796576

Intelligence isn’t a collection of skills. It’s the efficiency with which you acquire and deploy new skills. It’s an efficiency ratio. And that’s why benchmark scores can be very misleading about the actual intelligence of AI systems. https://x.com/fchollet/status/1946668452045029861

✅ Try out @Alibaba_Qwen 3 Coder on vLLM nightly with “qwen3_coder” tool call parser! Additionally, vLLM offers expert parallelism so you can run this model in flexible configurations where it fits. https://x.com/vllm_project/status/1947780382847603053

🚨 Model Update: Qwen3-coder is in the WebDev Arena! @Alibaba_Qwen have released their best coding model to date and it’s now live in WebDev Arena awaiting your hardest prompts for real world testing. Prompt: “style a basic login form using Tailwind CSS with dark mode https://x.com/lmarena_ai/status/1948399802947084347

Another incredible OSS model release this summer: the new Qwen 3 update is now live on @togethercompute APi. https://x.com/vipulved/status/1947871449282216055

Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing https://x.com/Alibaba_Qwen/status/1947344511988076547

Cerebras Launches Qwen3-235B: World’s Fastest Frontier AI Model with Full 131K Context Support | Cerebras https://www.cerebras.ai/press-release/cerebras-launches-qwen3-235b-world-s-fastest-frontier-ai-model-with-full-131k-context-support

Did a benchmark with the new Qwen3 Reasoner 220B on Arena-hard v1 It scores an 89% winrate over gpt4-0314, 4o scores an 81% dont have numbers for o3/4o-mini etc but its basically saturated a near perfect win rate. nicee https://x.com/Teknium1/status/1948836009183224132

Open source models 📈 qwen3-coder is available in Cline https://x.com/cline/status/1948452627278430376

Please note, we’re not able to reproduce the 41.8% ARC-AGI-1 score claimed by the latest Qwen 3 release — neither on the public eval set nor on the semi-private set. The numbers we’re seeing are in line with other recent base models. In general, only rely on scores verified by https://x.com/fchollet/status/1947821353358483547

Qwen just released a 480B coding model & a space to try it out for web dev. Fun! Model: https://x.com/ClementDelangue/status/1947780025886855171

Qwen-MT: Where Speed Meets Smart Translation | Qwen https://qwenlm.github.io/blog/qwen-mt/

RT @Alibaba_Qwen: Performance of Qwen3-Coder-480B-A35B-Instruct on SWE-bench Verified! https://x.com/QuixiAI/status/1947773200953217326

RT @cline: Qwen3-Coder is now available in Cline 🧵 New 480B parameter model with 35B active parameters. > 256K context window > comparabl… https://x.com/Alibaba_Qwen/status/1947954292738105359

RT @GregKamradt: Anyone have a connection at @Alibaba_Qwen? Trying to reproduce the results on @arcprize and getting different metrics Wa… https://x.com/clefourrier/status/1947994251410682198

RT @OpenRouterAI: 🟣New: Qwen3-Coder by @Alibaba_Qwen – 480B params (35B active) – Native 256K context length, extrapolates to 1M – Outperf…”” / X https://x.com/huybery/status/1947808085504102487

RT @UnslothAI: @Alibaba_Qwen Congrats guys on another epic release! We’re uploading Dynamic GGUFs, and one with 1M context length so you gu…”” / X https://x.com/QuixiAI/status/1947773516368994320

RT @WolframRvnwlf: I’m now using Qwen3-Coder in Claude Code. Works with any model actually, but this is surely the best one currently. The…”” / X https://x.com/huybery/status/1948184493631959536

We’ve updated Qwen3 and made excellent progress. The non‑reasoning model now delivers significant improvements across a wide range of tasks and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it!”” / X https://x.com/huybery/status/1947345040470380614

Wow the new qwen reasoner at only 232B params is as good as the top closed frontier lab models Big day for OS”” / X https://x.com/Teknium1/status/1948711699013665275

NVIDIA’s Canary-Qwen-2.5B 1st place on the @HuggingFace leaderboard for automatic speech recognition – lowest word error rate (WER) ever recorded on the Hugging Face OpenASR leaderboard: 5.63%. – its the first speech model built on top of an existing LLM. – At its core, it https://x.com/rohanpaul_ai/status/1946823138932863210
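For readers unfamiliar with the metric quoted above: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. Below is a minimal sketch, assuming plain whitespace tokenization. The leaderboard's own text normalization is not reproduced here, and the function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

On this definition, a score of 5.63% corresponds to roughly one word-level error per 18 reference words.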

AudioRAG is becoming real! Just built a demo with ColQwen-Omni that does semantic search on raw audio, no transcription needed. Drop in a podcast, ask your question, and it finds the exact chunks where it happens. You can also get a written answer. What’s exciting: it skips https://x.com/fdaudens/status/1946226098905169967

so many open LLMs and image LoRAs dropped past week, here’s some picks for you 🫡 LLMs > ByteDance released a bunch of translation models called Seed-X-RM (7B) > NVIDIA released reasoning models of which 32B surpassing the giant Qwen3-235B with cc-by-4.0 license 👏 > LG released https://x.com/mervenoyann/status/1948018642462933149

Looking at the HuggingFace configs, this is a wider/shallower model compared to Qwen3. – 62 layers vs 94 – dim 6144 vs 4096 – 160 experts vs 128 – 96 attn heads vs 64 Curious why the architectural change? Qwen3.5? https://x.com/nrehiew_/status/1947770826943549732

RT @SIGKITTEN: qwen3-coder, running locally I had it set up testing infra using minunit and gcov and write some tests on a small ~5000 lo… https://x.com/huybery/status/1948184517673644466

missed this, @NVIDIAAIDev silently dropped Open Reasoning Nemotron models (1.5-32B), SoTA on LiveCodeBench, CC-BY 4.0 licensed 🔥 > 32B competing with Qwen3 235B and DeepSeek R1 > Available across 1.5B, 7B, 14B and 32B size > Supports upto 64K output tokens > Utilises GenSelect https://x.com/reach_vb/status/1947331118983696907

RT @reach_vb: Lets GOOO! @NVIDIAAIDev just dropped Canary Qwen 2.5 – SoTA on Open ASR Leaderboard, CC-BY licensed 🔥 > Works in both ASR an… https://x.com/reach_vb/status/1946087224346313175

Now it’s possible to do RAG with any-to-any models 🔥 Learn how to search in a video dataset and generate using OmniEmbed, an all modality retriever, and Qwen2.5-Omni, any-to-any model in this notebook 🤝 https://x.com/mervenoyann/status/1947285360926494911

This is very true. Economically valuable agents for enterprises are already here, but you can’t buy them off the shelf & they require actual cross-functional R&D. https://x.com/emollick/status/1947014713637839171

A conversation with @rmstein on Google Search becoming a frontier AI product, the path to deploy Gemini to 1.5 billion people, and what comes next with an AI first search experience. I have never been more bullish on Google! https://x.com/OfficialLoganK/status/1948126774627627132

now AI can write novel proofs at the level of a world-class competitive mathematician but it still can’t reliably book me a weekend trip to boston so strange https://x.com/jxmnop/status/1946675650686746879

This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematics Olympiad – the most prestigious mathematics competition in the world. To uphold the sanctity of the student competition, the IMO Board https://x.com/HarmonicMath/status/1947023450578763991

Yes, there is an official marking guideline from the IMO organizers which is not available externally. Without the evaluation based on that guideline, no medal claim can be made. With one point deducted, it is a Silver, not Gold. https://x.com/lmthang/status/1946960256439058844

Introducing Opal: describe, create, and share your AI mini-apps – Google Developers Blog https://developers.googleblog.com/en/introducing-opal/

RT @jeffwsurf: To put it mildly, the past week at Windsurf has been crazy. There have been a lot of different rumors and reports, so I want… https://x.com/russelljkaplan/status/1946382813546045505

The Intriguing Reason Why Windsurf’s Remains Were Snapped up so Fast – Business Insider https://www.businessinsider.com/windsurf-google-cognition-acquisition-ai-coding-developer-data-ide-2025-7

Today, 108 hours and 10 minutes after Scott first cold texted Windsurf leadership, our acquisition of Windsurf has officially closed. Windsurf’s unique IP, strong book of business, and talented team are now part of Cognition. https://x.com/cognition_labs/status/1945679510533537944

Coding with LLMs in the summer of 2025 (an update) – https://antirez.com/news/154

Gemini 2.5 Flash-Lite, our fastest and most cost effective model, is now stable and ready for scaled production use!! It comes with native reasoning capabilities, a 1 million token context window, and is priced at ($0.10 in / 1M) and ($0.40 out / 1M). https://x.com/OfficialLoganK/status/1947689475351417141
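To put the quoted prices in concrete terms, here is a small back-of-the-envelope sketch. The two rates come from the post above; the function name and the sample workload are hypothetical, and real bills may differ (for example through caching or tiered pricing, which this ignores).

```python
# Rates as quoted in the post above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate spend for a given number of input and output tokens."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    cost = estimate_cost_usd(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

At these rates an output token costs four times as much as an input token, so long generations dominate the bill faster than long prompts do.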

RT @liliang_ren: We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning th… https://x.com/algo_diver/status/1946397862767767921

RT @liliang_ren: We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning th… https://x.com/ClementDelangue/status/1946246738823545317

Headers/footers are annoying for LLMs to interpret, using off-the-shelf parsing solutions 📑✍️ Without appropriate tagging, the LLM might get confused and interpret numbers as part of the main content, which can lead to hallucinations in your downstream use case (e.g. a research https://x.com/jerryjliu0/status/1947819412146291161

Length-Adaptive Policy Optimization (LAPO), cuts token use by up to 40.9% and lifts accuracy by 2.3% on math reasoning tasks. Regular models ramble with long chains even on easy problems, driving costs up for no extra benefit. LAPO first watches many trial answers, rewards ones https://x.com/rohanpaul_ai/status/1947556216001204387

Sapient Intelligence Open-Sources Hierarchical Reasoning Model, a Brain-Inspired Architecture That Solves Complex Reasoning Tasks With 27 Million Parameters
https://www.sapient.inc/blog/5

I gave Claude the Mistral report on its AI’s environmental impact and the prompt: “visualize this in two different ways, one that makes the numbers appear positive, one that makes them seem negative, using vivid comparisons” (I then had it do some error checking & corrections) https://x.com/emollick/status/1948090558309613587

2024: Voice Cloning 2025: What about personality cloning? Hume’s voice AI can now not only mimic your voice but also speaking style and language. It’s now available via our TTS and new speech-to-speech model, EVI 3, which is also launching today. https://x.com/hume_ai/status/1945900611334979712

NEW: Higgs Audio V2 from @boson_ai open, unified TTS model w/ voice cloning, beats GPT 4o mini tts and ElevenLabs v2 🔥 > Trained on 10M hours (speech, music, events) > Built on top of Llama 3.2 3B > Works real-time and on edge > Beats GPT-4o-mini-tts, ElevenLabs v2 in prosody https://x.com/reach_vb/status/1947997596456272203

A $50 million fund to build with communities | OpenAI https://openai.com/index/50-million-fund-to-build-with-communities/

The paper shows Portfolios chosen purely on news-topic earn more, and so even a lightweight generative model can surface a trade signal. Almost all S&P 500 gains since 1996 happen overnight, and topic level news explains most of that edge. The study runs a classic topic-model https://x.com/rohanpaul_ai/status/1947183258925732038

GPT-5 casually building cookie clicker with all features in 2 minutes https://x.com/scaling01/status/1948809543435395470

GPT‑4 Turbo grades code summaries almost like humans yet flags only 50% of faulty functions. The study asks whether models can replace fragile test suites and BLEU scores for everyday evaluation. Researchers checked 374 Java and Python tasks where 8 LLMs wrote or reviewed code, https://x.com/rohanpaul_ai/status/1948679870328045968

“When you think about the future of software engineering, what does no one talk about that more people should be talking about?” – @HarryStebbings to @ScottWu46 Full interview on @twentyminutevc 👇 https://x.com/cognition_labs/status/1947366219885252987

@AgentOpsAI Now supports tracking Flask applications, simply import it and use `track_endpoint` decorator and that’s it. You can track request and response along with your LLM/Agentic interaction in that endpoint with just 2 lines of code 🙂 Releasing soon! https://x.com/dwijptl/status/1945222332508668382

🎉We’re working towards `langchain` 1.0! Langchain will be the easiest place to get started building LLM apps. This 1.0 release will include: – revamped docs – general agent architectures and use cases – built on langgraph – high quality integrations Feedback? 👇 https://x.com/hwchase17/status/1947376920355917909

Building in human-in-the-loop in a full-stack app requires agent orchestration that can pause, wait for human input in one endpoint, and resume the workflow from another endpoint. This is a great tutorial by @rsrohan99 showing you how to build an e2e agent with human-in-the-loop https://x.com/jerryjliu0/status/1946003574904987743

“Context Engineering for AI Agents: Lessons from Building Manus” – Beautiful piece from Manus The post asks whether an agent should be trained end‑to‑end or steered with prompts inside bigger frontier LLMs, then shows that careful “context engineering” wins. A steady prefix https://x.com/rohanpaul_ai/status/1948294565435433190

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai. The best-in-class code research agent, built for teams and organizations. https://x.com/MishaLaskin/status/1945500873762750912

Github 👨‍🔧: Open-source AI Agent workflow builder. – Build, test, and optimize agentic workflows via a user-friendly platform interface. – Offers self-hosting capabilities using Docker, Dev Containers, or manual setup. – Supports running local models within self-hosted Docker https://x.com/rohanpaul_ai/status/1947999135417639088

GitHub Spark in public preview for Copilot Pro+ subscribers – GitHub Changelog https://github.blog/changelog/2025-07-23-github-spark-in-public-preview-for-copilot-pro-subscribers/

Humans tweak each attempt on the fly, but most AI agents still stumble through long stretches of random moves. This paper teaches a policy to notice what it already tried and instantly pick a different expert‑like move instead. 🏗️ Random wandering wastes steps in huge state https://x.com/rohanpaul_ai/status/1946502774923174166

Hyperspell × AgentOps @hyperspell is now the default context retrieval tool in AgentStack, @AgentOpsAI’s curated stack for building production AI agents alongside @composiohq, @firecrawl_dev, @LangChainAI, and @perplexity_ai, we’re helping define the agent infra layer 🚀 https://x.com/conor_ai/status/1945532701693678003

i am at the point where my ai agents have become so good and fast at various complex tasks that i’ve become the bottleneck hard to make use of and keep track of all the insane value my ai agents are creating ai agent managers are going to be in high demand soon https://x.com/omarsar0/status/1948490601164316891

I won $1000 for “Best Use of Protocols” at the WeaveHacks: Agent Protocols Hackathon. Here’s the multi-agentic MCP system I built👇🧵 https://x.com/n_sri_laasya/status/1945568038436442433

If you’re using AI agents for large-scale document extraction 📑✂️, you will need to craft a good structured output schema. Most LLMs support structured output these days, but here are tips and tricks from learned experience💡 1️⃣Try to limit schema nesting to 3-4 levels. 2️⃣ Make https://x.com/jerryjliu0/status/1946358807875244398
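The first tip above can be checked mechanically before a schema ships. Here is a rough sketch, assuming the convention that each nested object adds one level and lists add none (the post does not say how levels are counted). `nesting_depth` and the sample record are hypothetical.

```python
from typing import Any


def nesting_depth(value: Any) -> int:
    """Count levels of nested objects; lists pass through without adding a level."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((nesting_depth(v) for v in value), default=0)
    return 0


if __name__ == "__main__":
    # Hypothetical extraction target shaped like the output we expect.
    sample = {
        "invoice": {
            "vendor": {"address": {"city": "Nairobi"}},
            "line_items": [{"sku": "A1", "qty": 2}],
        }
    }
    depth = nesting_depth(sample)
    if depth > 4:
        print(f"Schema is {depth} levels deep; consider flattening.")
    else:
        print(f"Schema depth {depth} is within the suggested 3-4 levels.")
```

Running the check on a representative sample output, rather than on the schema definition itself, keeps the sketch independent of any particular schema library.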

Introducing `gut` – an AI agent for git 🤖🧑‍💻 This is a really neat CLI-based tool from @itsclelia that lets you ask about and execute git commands through natural language! Describe what you want to do in English (e.g. a complex rebase), and it’ll let you review and auto-execute https://x.com/jerryjliu0/status/1947026118260949146

Introducing FlowMaker 🌊🤖 A fully open-source, low-code way of building custom agent workflows. Build agents via a drag and drop interface, run it directly in the app, and also directly export it into a deeply custom workflow backed by @llama_index.TS. It’s a fantastic visual https://x.com/jerryjliu0/status/1948797112789205111

Introducing Kiro, an all-new agentic IDE that has a chance to transform how developers build software. Let me highlight three key innovations that make Kiro special: 1 – Kiro introduces spec-driven development, helping developers express their intent clearly through natural https://x.com/ajassy/status/1944785963663966633

It’s becoming more and more clear that Claude Code is the everything agent https://x.com/alexalbert__/status/1948060675974283689

Join @Redisinc and @LangChainAI on July 23 for a webinar to learn how LangGraph + Redis make it easy to build AI agents with real memory. In the live webinar, you’ll get demos, performance tips, and hear from the engineers behind the integration. 👉 RSVP here: https://x.com/LangChainAI/status/1946317782741832020

just tried and the agent solved level 1 in its own browser lol. thanks for creating the benchmark! https://x.com/EdwardSun0909/status/1946304932333940899

Kimi K2 tech report just dropped! Quick hits: – MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale – 20K+ tools, real & simulated: unlocking scalable agentic data – Joint RL with verifiable + self-critique rubric rewards: alignment that adapts – https://x.com/Kimi_Moonshot/status/1947520758760313170

langgraph is more low level – an “agent runtime” langchain will be higher level abstractions for getting started easily https://x.com/hwchase17/status/1947459414279262513

MCP is here! You can now give Devin access to your favorite servers via the MCP Marketplace. Think Datadog, Linear, Sentry, Figma, and thousands more. Demos from our team + getting started 👇 https://x.com/cognition_labs/status/1948081054579114421

Next Saturday (7/26), @walden_yan is giving an in-person keynote at the Devin Meetup Tokyo. Plus, there’ll be several sessions on mastering Devin. Register below! 来週土曜日（7月26日）、@walden_yan が東京で開催されるDevin Meetup Tokyoにて対面基調講演を行います。 https://x.com/cognition_labs/status/1946025580706808248

Pioneering an AI clinical copilot with Penda Health | OpenAI https://openai.com/index/ai-clinical-copilot-penda-health/

please stop making flight booking agent demos with faint but dying hope, the undersigned https://x.com/swyx/status/1946369984009306126

Qwen about to release a 480B MoE for coding with 1 million context! “Qwen3-Coder-480B-A35B-Instruct is a powerful coding-specialized language model excelling in code generation, tool use, and agentic tasks.” https://x.com/scaling01/status/1947732150872084693

Qwen3-Coder: Agentic Coding in the World | Qwen https://qwenlm.github.io/blog/qwen3-coder/

RT @adrianicosma: 🚨 New paper! We present Dr.Copilot – a multi-agent LLM system deployed in the real world to improve doctor-patient commun… https://x.com/lateinteraction/status/1948487640551969048

RT @alexalbert__: It’s becoming more and more clear that Claude Code is the everything agent https://x.com/_arohan_/status/1948249539678294250

RT @Alibaba_Qwen: >>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to… https://x.com/bigeagle_xd/status/1947817705324621910

RT @AnthropicAI: New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously c… https://x.com/EthanJPerez/status/1948605334698033479

RT @bindureddy: ASTOUNDING – THE NEW QWEN CODER IS NUMBER ONE AT AGENTIC CODING The new Qwen coder tops the charts for agentic coding (no… https://x.com/huybery/status/1948304004880179208

RT @chester_curme: 🛠️ Bedrock AgentCore tools in LangChain It’s now easy to use Bedrock AgentCore built-in tools with LangGraph agents. T… https://x.com/hwchase17/status/1947786031778173022

RT @iScienceLuvr: Kimi K2 paper dropped! describes: – MuonClip optimizer – large-scale agentic data synthesis pipeline that systematically… https://x.com/jeremyphoward/status/1947414667221237904

RT @KLieret: Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified! Made for… https://x.com/OfirPress/status/1948375444559974901

RT @satyanadella: Today we’re releasing GitHub Spark — a new tool in Copilot that turns your ideas into full-stack apps, entirely in natura… https://x.com/algo_diver/status/1948594244039704892

Scaling Laws for Efficient MoEs https://x.com/scaling01/status/1948713380308496575

Super excited to share @youdotcom’s Web Search & News API is now available on @awsmarketplace. You can now power your AI agents and products with the most accurate real-time web and news data. 🏢 Serving about one billion queries per month to our customers and their LLMs 🔄 https://x.com/RichardSocher/status/1945571077469737314

Thank you to @Kimi_Moonshot for quickly addressing my queries on the correct system prompt for Kimi K2! We’ll be re-uploading all BF16 + dynamic @unslothai GGUFs with fixed tool calling & the new sys prompt! Sys prompt = “You are Kimi, an AI assistant created by Moonshot AI.” https://x.com/danielhanchen/status/1946163064665260486

Thanks for the great write-up! 🙌 Prefix caching is critical for agentic workflows like @ManusAI_HQ , and vLLM makes it seamless. ✅ prefix caching is enabled by default with an efficient implementation ✅ Append-only context? Cache hit heaven Context engineering FTW 🚀 https://x.com/vllm_project/status/1946575947295322171
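A toy illustration of why append-only context is "cache hit heaven": a prefix cache can only reuse work for the part of the new prompt that matches an earlier one token for token from the start. This is not how vLLM implements prefix caching. The token lists and the helper name `shared_prefix_len` are made up for the sketch.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[str], new: Sequence[str]) -> int:
    """Number of leading tokens that `new` has in common with `cached`."""
    count = 0
    for cached_token, new_token in zip(cached, new):
        if cached_token != new_token:
            break
        count += 1
    return count


if __name__ == "__main__":
    turn_1 = ["<system>", "<tool_defs>", "user: find the bug"]

    # Append-only: the earlier context is kept verbatim and new turns go at the end.
    append_only = turn_1 + ["assistant: looking...", "user: any luck?"]

    # Rewritten prefix: something near the start changed (say, a timestamp).
    rewritten = ["<system v2>"] + turn_1[1:] + ["assistant: looking...", "user: any luck?"]

    print(shared_prefix_len(turn_1, append_only))  # all of turn_1 is reusable
    print(shared_prefix_len(turn_1, rewritten))    # nothing is reusable
```

The same reasoning is why a stable system prompt and stable tool definitions at the front of the context matter: one changed token early on invalidates everything after it.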

The invention of modern writing instruments like the typewriter made writing easier, but they also led to the rise of writer’s block, where deciding what to write became the bottleneck. Similarly, the invention of agentic coding assistants has led to a new builder’s block, where https://x.com/AndrewYNg/status/1947308544916889979

The Latest 20VC + SaaStr: Why Vibe Coding is On Fire — But Also Can’t Be Trusted, How YC and Multi-Stage Funds Have Won, and Figma’s Epic IPO | SaaStr https://www.saastr.com/the-latest-20vc-saastr-why-vibe-coding-is-on-fire-but-also-cant-be-trusted-how-yc-and-multi-stage-funds-have-won-and-figmas-epic-ipo/

Thoughts after coding relentlessly with tools like Replit Agent and Claude Code for the past few days: Current coding models are more impressive than we all think. Clever memory management, search, and context engineering is what’s falling short. Tool calling also needs more https://x.com/omarsar0/status/1947859083702239314

Very promising results from our collaboration with Kenya-based @PendaHealth, studying an OpenAI-powered clinical copilot across 40,000 patient visits: https://x.com/gdb/status/1947732134430687351

Very soon just a few images – real or synthetic – will give you beautiful 3d reconstructions. Question is – what are people going to do with it? People know what to do with images and video. But 3d continues to be a next level jump in complexity, perhaps until AI agents actually https://x.com/bilawalsidhu/status/1948133500781019201

vibe coding tools need to learn to use the debugger instead of making a bunch of test scripts and adding print statements to my code https://x.com/QuixiAI/status/1946894174734684652

We’re excited to finally release our next big product at @genus_ai – Sage. Sage is an AI Agent that’s designed to help e-commerce brands grow via digital channels. It is currently available for brands using Shopify and Meta platforms and is designed to be a partner and assistant https://x.com/TadasJucikas/status/1943701836432576647

We’ve built a fully open-source RFP (Request for Proposal) Response Agent that you can both use out-of-the-box and also clone/modify for whatever use case you’re solving! 💫 Generating responses to RFPs is a time-consuming task that requires humans to both analyze piles of https://x.com/jerryjliu0/status/1947465066892431792

Whenever I looked into having a personal assistant, it struck me how few of our existing structures support intermediate permissions. Either a person acts fully on your behalf and can basically defraud you, or they can’t do anything useful. I wonder if AI agents will change that. https://x.com/AmandaAskell/status/1946253987923304699

Why not all vector databases are agent-ready? Vector database is the backbone of AI agent memory. So its choice is one of the most crucial infrastructure decision. Your demo might crush it, but can your infrastructure survive success? Most vector databases weren’t built for https://x.com/TheTuringPost/status/1946342951199871340

With over 11k stars on Github and 1.5k forks, developers are loving Google’s Agent Development Kit (ADK) Here’s a recap of the announcements, guides, and projects built with Google’s Agent Development Kit (ADK) 🧵 https://x.com/AtomSilverman/status/1945241088005992662

Alibaba launches open-source AI coding model, touted as its most advanced to date | Reuters https://www.reuters.com/world/china/alibaba-launches-open-source-ai-coding-model-touted-its-most-advanced-date-2025-07-23/

RT @mckaywrigley: Claude decided it was time to go to bed so it drew some goodnight ascii art and ran time.sleep(28800). I’m dying. https:… https://x.com/jayelmnop/status/1946432132424818943

RT @DSPyOSS: Awesome to see Amazon AWS build on DSPy to help their customers for prompt migration 👀 Delivering up to 13% gains in their ev… https://x.com/lateinteraction/status/1946280376294314268

RT @sdrzn: Seriously blown away by Moonshot’s new Kimi K2 model in @cline. It beats Claude Opus 4 on coding benchmarks and is up to 90% che… https://x.com/ClementDelangue/status/1946316382313869778

The Tiny Teams Playbook – by Shawn swyx Wang – Latent.Space https://www.latent.space/p/tiny

Cognition: Perfect launch video. Quiet execution for ~12 months (Series A/B April 2024. No formal announcement for Series C in March). Does the right thing on Windsurf, reclaiming the spotlight as the people’s hero. Raise at $10B. Back on top. Bravo. https://x.com/ArfurRock/status/1948434232189071744

The economy is transitioning from pay for the process to pay for the results. Two examples in marketing: 1) Historical MO in most companies is to hire a service where there are experts that will do a job. Think website design/build. You pay for the process that will get you a https://x.com/c_valenzuelab/status/1947309109902037056

The problem is not just the proliferation of devices that let you record people without their knowledge, but the fact that multimodal LLMs let you use recordings in ways that neither law nor society anticipated. Everyone has an easy way to mine hours of footage. No forgetting. https://x.com/emollick/status/1948100250175942713

Gemini 2.5 Flash-Lite is now stable and generally available – Google Developers Blog https://developers.googleblog.com/en/gemini-25-flash-lite-is-now-stable-and-generally-available/

Gemini 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet, is now available — complete with the same capabilities that make our 2.5 model family so useful, like multimodality, thinking and connection to tools like Search. Learn more 🔦⬇️ https://x.com/Google/status/1947689382892204542

Gemini’s native text to speech (TTS) capabilities are available for scaled production use 🗣️! Both 2.5 Flash and 2.5 Pro are available and are powerful for use cases like NotebookLM style podcast content. https://x.com/OfficialLoganK/status/1947328086577492309

Coding with LLMs in the summer of 2025 (an update) https://simonwillison.net/2025/Jul/21/coding-with-llms/

Tavily seems pretty nice for a web search/web extraction tool. Anyone have a better one? https://x.com/Teknium1/status/1946079899544158352

DevDay registration is open! → https://x.com/OpenAIDevs/status/1948067287359267129

woke up early on a saturday to have a couple of hours to try using our new model for a little coding project. done in 5 minutes. it is very, very good. not sure how i feel about it… https://x.com/sama/status/1946575101509734619

🔥 𝐍𝐄𝐖 𝐋𝐚𝐮𝐧𝐜𝐡 – 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 𝐟𝐨𝐫 𝐒𝐨𝐜𝐢𝐚𝐥 𝐌𝐞𝐝𝐢𝐚 𝐌𝐚𝐫𝐤𝐞𝐭𝐢𝐧𝐠 🔥 AI Agents App Store is now live on @ButternutAI! Imagine your website creating its own social media posts complete with catchy captions, beautiful graphics and perfect hashtags, https://x.com/pritika_mehta/status/1943779792551391494

Prompt it. Then push it to make it better. Figma Make, now available for everyone to try https://x.com/figma/status/1948399170030620870

Github 👨‍🔧: Automate browser-based workflows with LLMs and Computer Vision Skyvern automates browser-based workflows using LLMs and computer vision, replacing brittle, code-defined XPath interactions. It https://x.com/rohanpaul_ai/status/1948712082431447194

Incredible results! Open source is winning. https://x.com/AravSrinivas/status/1947810865685925906

official results from @atcoder World Tour Finals are in — great results for both humans (#1 and #3 onwards) and AI (#2 in the world!). a milestone for AI for solving hard problems. https://x.com/gdb/status/1945989983569129632

🚀 Building AI-powered apps with @Gradio MCP servers just got a major glow-up! Here are 5 new features making dev life easier and smoother 👇 https://x.com/fdaudens/status/1945871028598493686

about to go onstage with @james406 to talk about the @posthog success story and am surprised to see a demo of a kanban board for claude code within posthog! everyone’s converging on the same software form factors, i can feel it https://x.com/swyx/status/1947829167707590663

“No, and tell Claude what to do differently” This will be one of the most powerful flywheels in the code agents space. A clear signal to iterate on. https://x.com/mathemagic1an/status/1948173798219669684

RT @mckaywrigley: So I gave Claude Code a Mac Mini. And it’s called Claudeputer. It runs 24/7 and it’s allowed to do whatever it wants -… https://x.com/imjaredz/status/1946304612102816136

Tip: @claude_code supports custom subagents. 1) Begin with Claude-generated agents (/agents) and iterate (“e” to edit). 2) Dynamic subagent selection: Claude Code chooses subagents intelligently. Be precise in description fields to guide it. 3) @AnthropicAI’s subagents page https://x.com/claude_code/status/1948622899604050063

Tip: Observe what @claude_code power users are doing and help turn it into an app that anyone can use. There’s a lag between what is possible versus what exists. Try out the @claude_code SDK. Also worth checking out the subreddit r/ClaudeAI. https://x.com/claude_code/status/1948299515577913385

“Using a better model for analysis” 🤨 I didn’t realize I was using haiku all this time, no idea when claude code snuck this one in rofl. https://x.com/karpathy/status/1946325810618700033

using sonnet to write a pytorch module: $0.038 using sonnet to write a react component: $33.74 https://x.com/vikhyatk/status/1947875363889287179

🚨 BIG NEWS 🚨 Search Arena is live with 7 top models with search capabilities ready for testing. Be sure to have the “Search” modality selected in the chat box, and get testing. 🌐 @xAi: Grok 4 @anthropic: Claude Opus 4 @perplexity: Sonar Pro High & Reasoning Pro High https://x.com/lmarena_ai/status/1948053410139541626

RT @AnthropicAI: We’ve launched Claude for Financial Services. Claude now integrates with leading data platforms and industry providers fo… https://x.com/dilipkay/status/1945997040321999083

Turn written content into professional designs with Claude + @canva Upload any document — blog post, product guide, meeting notes — and ask Claude to turn it into branded visuals https://x.com/AnthropicAI/status/1948489708385816666

New ways to engage with artifacts on mobile: Create interactive tools, browse the gallery, and share your work directly from your phone. https://x.com/AnthropicAI/status/1947690894888513964

It has not ceased to be weird that I can put Rilke’s First Elegy into Suno and get out a coherent 8 minute performance with music. You might not like the interpretation, but it is genuinely amazing that this audio, with apparent emotion, is all 100% AI from the verses alone. https://x.com/emollick/status/1947179948420088065

Shrek inspired, multi-person generation (with voice cloning) – this is possible now with a *single* TTS model! https://x.com/reach_vb/status/1948012058630303857

Lovable becomes a unicorn with $200M Series A just 8 months after launch | TechCrunch https://techcrunch.com/2025/07/17/lovable-becomes-a-unicorn-with-200m-series-a-just-8-months-after-launch/

ChatGPT, meet Notion.
https://x.com/NotionHQ/status/1948772739843596301

Learning AI has become table stakes for your career. The next competitive edge will be knowing how to manage a team of AIs. https://x.com/mustafasuleyman/status/1948798692598915186

RT @m4rkmc: 📣 We’ve just enabled LLMS.TXT on the Gemini API docs. On https://x.com/jeremyphoward/status/1946386696691683473

Scale your production apps with the stable version of Gemini 2.5 Flash-Lite. ⚡ It’s faster than our 2.0 Flash models, more cost-efficient, and outperforms 2.0 Flash-Lite across coding, math, reasoning, and multimodal understanding. Start building → https://x.com/GoogleDeepMind/status/1947689582012633542

@ Zuck all they need is 3 months to build a frontier coding model https://x.com/scaling01/status/1947773545733394439

Lisan al Gaib on X: “btw I don’t want a model router I want to be able to select the models I use” / X
https://x.com/scaling01/status/1946903963200262523

Yuchen Jin on X: “Heard GPT-5 is imminent, from a little bird. – It’s not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. – That’s why Sam said they’d “fix model naming”: prompts will just auto-route to the right model. -” / X
https://x.com/Yuchenj_UW/status/1946777842131632427

This might be the best coding model yet. General-purpose is cool, but if you want the best at coding, specialization wins. No free lunch. https://x.com/rasbt/status/1947995162782638157

LLMs often botch private graph QA because a single bad link breaks the path. BYOKG-RAG (“bring-your-own” Knowledge Graph) fixes that by having the model suggest entities, paths, and Cypher, then letting specialised tools retrieve real graph chunks and feeding them back for one https://x.com/rohanpaul_ai/status/1945822543182709243

RT @bibryam: RAG Patterns more resources: https://x.com/rachel_l_woods/status/1944206536424739230

Adrian Cosma on X: “🚨 New paper! We present Dr.Copilot – a multi-agent LLM system deployed in the real world to improve doctor-patient communication in Romanian 🇷🇴. One of the first production deployments of LLMs in Romanian telemedicine. 👇 📄 https://t.co/ghzoQJtpuo https://t.co/hLb9apgUMV” / X

Has the VSCode C/C++ Extension been blocked? · Issue #2976 · cursor/cursor https://github.com/cursor/cursor/issues/2976