Image created with gemini-3.1-flash-image-preview, prompt written by claude-sonnet-4-5. Image prompt: Photorealistic aerial view of a frozen bay at winter dusk with multiple intersecting pathways and channels carved through ice sheets, some trails fresh and white, others refrozen and dark, paths converging and diverging organically across the surface, deep blue to orange sunset gradient sky reflected in exposed water patches, National Geographic quality, bold sans-serif text ‘AGENTS’ overlaid, 16:9 landscape format, hyperreal ice textures with dramatic shadows.

What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People’s Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately. https://x.com/karpathy/status/2017296988589723767

❤️ We are partnering with @MiniMax_AI to give Ollama users free usage of MiniMax M2.5 for the next couple of days! ollama run minimax-m2.5:cloud Use MiniMax M2.5 with OpenCode, Claude Code, Codex, OpenClaw via ollama launch! OpenCode: ollama launch opencode --model… https://x.com/ollama/status/2022018134186791177

Eigent day 0 supports @MiniMax_AI M2.5! Try M2.5 on your open source cowork! With Chinese New Year (Horse) coming, we asked Eigent to generate 10 complete HTML/CSS/JS games (no libraries) across arcade, puzzle, runner, strategy, memory, idle and more. The Developer Agent called… https://x.com/Eigent_AI/status/2021983494407069926

Introducing M2.5, an open-source frontier model designed for real-world productivity. – SOTA performance at coding (SWE-Bench Verified 80.2%), search (BrowseComp 76.3%), agentic tool-calling (BFCL 76.8%) & office work. – Optimized for efficient execution, 37% faster at complex… https://x.com/minimax_ai/status/2021980761210134808

MiniMax M2.5 is live now on OpenRouter! @MiniMax_AI’s update to their powerful agentic model M2.1 comes with improved reliability and performance on long running tasks. It’s become a powerful general agent, capable of much more than writing code. https://x.com/OpenRouter/status/2021983955898315238

MiniMax M2.5: Built for Real-World Productivity. – MiniMax News | MiniMax https://www.minimax.io/news/minimax-m25

MiniMax’s new open M2.5 and M2.5 Lightning near state-of-the-art while costing 1/20th of Claude Opus 4.6 | VentureBeat https://venturebeat.com/technology/minimaxs-new-open-m2-5-and-m2-5-lightning-near-state-of-the-art-while

MiniMax-M2.5 is a surprising new step in open coding models. The first model where I’ve been able to independently confirm that it’s better than the most recent Claude Sonnet. It showed up in our benchmarks below, and in my vibe checks it felt strong and diverse. https://x.com/gneubig/status/2021988250240598108

80.2% on SWE-Bench Verified and 76.3% on BrowseComp is quite impressive. Try @MiniMax_AI M2.5 on @Eigent_AI https://x.com/guohao_li/status/2021984827923476922

M2.5 runs at 100 tokens per second. That’s 3x faster than Opus. At $0.06/M blended with caching, you can run subagents in the CLI and just leave them going. Fast models exist. Cheap models exist. Both at SOTA performance is new. https://x.com/cline/status/2022034678065373693
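
Taking the quoted figures at face value, the economics of an always-on subagent are easy to sketch. This is rough arithmetic only; a real bill depends on cache hit rates and the input/output token mix:

```python
# Back-of-envelope cost of one subagent left running, using the numbers
# quoted above: 100 tokens/sec throughput, $0.06 per million blended tokens.
# Real costs vary with cache hits and the input/output ratio.
TOKENS_PER_SEC = 100
PRICE_PER_M_TOKENS = 0.06  # dollars, blended with caching

tokens_per_hour = TOKENS_PER_SEC * 3600
cost_per_hour = tokens_per_hour / 1_000_000 * PRICE_PER_M_TOKENS

print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.4f}/hour")
# -> 360,000 tokens/hour -> $0.0216/hour
```

At roughly two cents an hour of continuous output, "just leave them going" is an economically reasonable default.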

A sane but extremely bull case on OpenClaw (Clawdbot) | Brandon Wang https://brandon.wang/2026/clawdbot

Apple’s iOS 26.4 Siri Update Runs Into Snags in Internal Testing; iOS 26.5, 27 – Bloomberg https://www.bloomberg.com/news/articles/2026-02-11/apple-s-ios-26-4-siri-update-runs-into-snags-in-internal-testing-ios-26-5-27

Meta AI prepares Avacado, Manus Agent, OpenClaw integration https://www.testingcatalog.com/meta-ai-redies-avacado-manus-agent-and-openclaw-integration/

Goldman Sachs taps Anthropic’s Claude to automate accounting https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html

I pointed Claude Cowork at a set of 107 documents (PPTs, Word docs, Excel) that were initially hand-created for my class at Wharton & expanded on by AI. They make up a very complex business case with lots of issues & opportunities. AI was able to one-shot the case from the documents… https://x.com/emollick/status/2021638881158857204

Sabotage Risk Report: Claude Opus 4.6 https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf

An updated Gemini 3 Deep Think is out today: 📈 Achieves SOTA on ARC-AGI-2, MMMU-Pro, and HLE. 🥇Gold-medal level on Physics & Chemistry Olympiads. It turns out the best way to solve hard problems is still to think about them. Read more: https://x.com/NoamShazeer/status/2021988459519652089

Gemini 3 Deep Think (2/26) Semi Private Eval – ARC-AGI-1: 96.0%, $7.17/task – ARC-AGI-2: 84.6%, $13.62/task New ARC-AGI SOTA model from @GoogleDeepMind https://x.com/arcprize/status/2021985585066652039

Gemini 3 Deep Think scores 84.6% on ARC-AGI-2 https://x.com/scaling01/status/2021981766249328888

Sundar buried the real story in the cost data. Gemini 3 Deep Think went from 45.1% to 84.6% on ARC-AGI-2 in under 3 months. That’s an 88% improvement on a benchmark specifically designed to resist brute-force scaling. The number that matters: $13.62 per task. The previous Deep… https://x.com/aakashgupta/status/2022025020839801186

The new Gemini Deep Think is achieving some truly incredible numbers on ARC-AGI-2. We certified these scores in the past few days. https://x.com/fchollet/status/2021983310541729894

Thrilled to announce a big upgrade to Gemini 3 Deep Think that hits new records on the most rigorous benchmarks in maths, science & reasoning – including 84.6% on ARC-AGI-2, 48.4% Humanity’s Last Exam without tools, and 3455 Elo rating on Codeforces! https://x.com/demishassabis/status/2022053593910821164

Today, we updated Gemini 3 Deep Think to further accelerate modern science, research and engineering. With 84.6% on ARC-AGI-2 and a new standard on Humanity’s Last Exam, see how this specialized reasoning mode is advancing research & development 🧵↓ https://x.com/Google/status/2021982003818823944

We updated Gemini 3 Deep Think in @GeminiApp. Available for Ultra subscribers and slowly opening Gemini API access (fill out form below). – 48.4%, without tools, on Humanity’s Last Exam. – 84.6% on ARC-AGI-2, verified by the ARC Prize Foundation. – Elo of 3455 on Codeforces. … https://x.com/_philschmid/status/2021989093110927798

An updated & faster Gemini 3 Deep Think is taking off! 🚀 Our smartest mode to date!™️ PhD-level reasoning to the most rigorous STEM challenges (models’ gotta think harder). Gold medal-level results on Physics & Chemistry Olympiads. 🧪💻 Full details: https://x.com/OriolVinyalsML/status/2021982720860233992

Anupam Pathak, a Google R&D lead in Google’s Platforms and Devices division, tested Deep Think’s ability to speed up the design of physical components. It’s proving that deep reasoning can translate directly into faster, more efficient prototyping. https://x.com/Google/status/2022007994897379809

At Duke University, the Wang Lab used Deep Think to optimize crystal growth for new semiconductors. Deep Think designed a recipe to grow thin films larger than 100 μm — hitting a precision target that previous methods struggled to hit. https://x.com/Google/status/2022007988823973977

Gemini 3 Deep Think: AI model update designed for science https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think/

nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion | AINews https://news.smol.ai/issues/25-08-26-nano-banana

The upgraded Gemini 3 DeepThink is now live! 🚀 We’re already seeing engineers and researchers leverage it as a partner in their design and development processes. I love this example of Anupam Pathak using DeepThink to go from prompt to physical prototype, actually designing… https://x.com/tulseedoshi/status/2021997867305775324

We’ve updated Gemini 3 Deep Think to better tackle the complexity of real-world research, science, and engineering. ♊ 🚀 It achieves gold-medal standards on the written portions of the Physics and Chemistry Olympiads, building on gold-level performance at IMO and ICPC, and has… https://x.com/JeffDean/status/2021989820604539250

We’ve upgraded our specialized reasoning mode Gemini 3 Deep Think to help solve modern science, research, and engineering challenges – pushing the frontier of intelligence. 🧠 Watch how the Wang Lab at Duke University is using it to design new semiconductor materials. 🧵 https://x.com/GoogleDeepMind/status/2021981510400709092

What’s ahead for commercial experiences in 2026 https://blog.google/products/ads-commerce/digital-advertising-commerce-2026/

people sleep on last week’s open multimodal releases > GLM-OCR: sota OCR model > MiniCPM-o-4.5: Gemini 2.5-flash level Omni model that runs on your phone > InternS1: efficient generalist VLM outperforming on science tasks all allow commercial use freely 🔥 https://x.com/mervenoyann/status/2021233480957304913

Gemini in Chrome: Your agentic browsing assistant – YouTube https://www.youtube.com/watch?v=5OR4c87Xt-E

BREAKING: @OpenAI just launched a new Codex model, Spark: it serves at 1,000 tokens per second. It’s blow-your-hair-back fast. It’s their first model publicly released on Cerebras hardware, and you can see the difference. We’ve been testing internally @every for the last week or… https://x.com/danshipper/status/2022009455773200569

GPT-5.3-Codex still doing a bit of the thing of taking your wording a bit too literally. It labeled things in a UI we made as “Breadcrumbs” instead of just… using them as the concept of breadcrumbs. https://x.com/kylebrussell/status/2020927139546358171

Introducing GPT-5.3-Codex-Spark | OpenAI https://openai.com/index/introducing-gpt-5-3-codex-spark/

Introducing OpenAI Frontier | OpenAI https://openai.com/index/introducing-openai-frontier/

More than 1 million people downloaded Codex App in the first week. 60+% growth in overall Codex users last week! We’ll keep Codex available to Free/Go users after this promotion; we may have to reduce limits there but we want everyone to be able to try Codex and start building. https://x.com/sama/status/2020977975081177343

Now in deep research you can: – Connect to apps in ChatGPT and search specific sites – Track real-time progress and interrupt with follow-ups or new sources – View fullscreen reports https://x.com/OpenAI/status/2021299936948781095

OpenAI Abandons ‘io’ Branding for Its AI Hardware | WIRED https://www.wired.com/story/openai-drops-io-branding-hardware-devices/

OpenAI works on ChatGPT Skills, upgrades Deep Research https://www.testingcatalog.com/openai-works-on-chatgpt-skills-upgrades-deep-research/

OpenAI’s Jony Ive-Designed Device Delayed to 2027 – MacRumors https://www.macrumors.com/2026/02/10/openais-jony-ive-designed-device-delayed-to-2027/

OpenAI’s new Codex app hits 1M+ downloads in first week — but limits may be coming to free and Go users | VentureBeat https://venturebeat.com/technology/openais-new-codex-app-hits-1m-downloads-in-first-week-but-limits-may-be

Skills in OpenAI API https://developers.openai.com/cookbook/examples/skills_in_api

We just announced new primitives for building agents. Here are 10 tips on running multi-hour workflows reliably 👇 https://x.com/OpenAIDevs/status/2021725246244671606

We’re introducing a new set of primitives in the Responses API for long-running agentic work on computers. Server-side compaction • Enable multi-hour agent runs without hitting context limits. Containers with networking • Give OpenAI-hosted containers controlled internet… https://x.com/OpenAIDevs/status/2021286050623373500

This is batshit insane. Gemini 3 Deep Think just scored a 3455 on Codeforces, equivalent to the #8 best competitive programmer in the world. The previous best was 2727 (#175) from OpenAI o3. This is an absolutely superhuman result for AI and technology at large. https://x.com/deedydas/status/2022021396768133336?s=46

OpenAI partners with Cerebras | OpenAI https://openai.com/index/cerebras-partnership/

Testing ads in ChatGPT | OpenAI https://openai.com/index/testing-ads-in-chatgpt/

GLM-5: From Vibe Coding to Agentic Engineering https://simonwillison.net/2026/Feb/11/glm-5/

GLM-5: From Vibe Coding to Agentic Engineering https://z.ai/blog/glm-5

Introducing GLM-5: From Vibe Coding to Agentic Engineering GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens. https://x.com/Zai_org/status/2021638634739527773

GLM-5 was pre-trained on 28.5T tokens and uses DeepSeek Sparse Attention. https://x.com/scaling01/status/2021627498451370331

Claude Opus 4.6 thinking has landed at #1 across Code and Text Arena! Both thinking and non-thinking have taken the top 2 spots across both leaderboards. @AnthropicAI now has 4 of the top 5 models in the Code Arena. A few highlights: – #1 Code Arena: scoring 1576 – #1 Text… https://x.com/arena/status/2020956227795288132

‼️Position: AI coding agent research needs recalibration. We’ve heavily optimized for solo autonomy, and far less for designing agents that empower the humans using them. It’s time to build human-centered coding agents. 🧵 https://x.com/ZhiruoW/status/2021603778982473813

🧪 Agent Observability Powers Agent Evaluation 🧪 When something goes wrong in traditional software, you know what to do: check the error logs, look at the stack trace, find the line of code that failed. But AI agents have changed what we’re debugging. When an agent takes… https://x.com/LangChain/status/2021722975121420496

Anything you can do in Obsidian you can do from the command line. Obsidian CLI is now available in 1.12 (early access). https://x.com/obsdmd/status/2021241384057930224

Been thriving with GitHub Copilot CLI lately and VS Code 1.109 just took it up another level. Native worktrees, MCP apps, Mermaid diagrams in responses, and a massive wave of UX upgrades. The Copilot team is shipping at an insane pace. 🚀 https://x.com/JoeCuevasJr/status/2021074196034630103

Been using this model for a bit now, the combination of speed and intelligence is insane. It genuinely feels like a new paradigm shift. Excited to plug it into more specialized coding pipelines. https://x.com/skirano/status/2022014051572969481

Between OpenClaw and VLMs getting much better at computer use – the loom recording to automated workflow dream is nearly ready for prime time. https://x.com/bilawalsidhu/status/2021617353793093843

Building Better Coding Agent Harnesses: at @LangChain we’re thinking hard about the science of harness engineering + open research on what works & doesn’t. A quick peek at our deepagents X Terminal Bench 2.0 work, shoutout to @alexgshaw & Harbor (they’re great). Broad research… https://x.com/Vtrivedy10/status/2022018287408910745

Canadian girlfriend Ai coding strikes again. | Hacker News https://news.ycombinator.com/item?id=46690007

Clawdbot and Moltbook are a False Alarm – For Now https://secondthoughts.ai/p/clawdbot-and-moltbook

Cognition | Windsurf Codemaps: Understand Code, Before You Vibe It https://cognition.ai/blog/codemaps

I think people on this site don’t realize how much people’s interactions with “AI” turn out to be, when you ask them: customer service lines (which are almost certainly not GenAI yet) or Siri or maybe a free model (often via an off-brand “ChatPT” AI app they downloaded somewhere) https://x.com/emollick/status/2021066407140786460

Introducing Composer 1.5 · Cursor https://cursor.com/blog/composer-1-5

OpenClaw is What Apple Intelligence Should Have Been – Jake Quist https://www.jakequist.com/thoughts/openclaw-is-what-apple-intelligence-should-have-been

The craft of engineering is rapidly changing. At @tryramp, we built our own background coding agent to accelerate faster. We call it Inspect. It wrote 30% of merged frontend + backend PRs in the past week. It’s powered by @opencode, @modal and @CloudflareDev. It runs fully in… https://x.com/zachbruggeman/status/2010728444771074493?s=46

Towards self-driving codebases · Cursor https://cursor.com/blog/self-driving-codebases

[2602.05842] Reinforcement World Model Learning for LLM-based Agents https://arxiv.org/abs/2602.05842

One Year of MCP — with David Soria Parra and AAIF leads from OpenAI, Goose, Linux Foundation – YouTube https://www.youtube.com/watch?v=z6XWYCM3Q8s&t=3670s

@MiniMax_AI M2.5 is now in Cline. + 80.2% SWE-Bench Verified. + 100 tps. $0.06/M blended cost. + 10B activated parameters. And it’s free in Cline for a limited time! https://x.com/cline/status/2022034591075512636

🚨Busy week for new models in the Arena: MiniMax M2.5 by @MiniMax_AI is now available in the Text and Code Arena. Bring your toughest prompts and see how it stacks up against the latest models in real-world use. In Battle mode, your votes power the leaderboards. Learn more… https://x.com/arena/status/2021987555655422257

Honestly I wanna release this beast ASAP — I’m dying to go back to my hometown for Spring Festival 😂 But the more training compute we put in, the more it keeps rising. Painfully happy problem. We hear you guys. M2.5 soon. https://x.com/SkylerMiao7/status/2021587213230715306

Instant access to M2.5 on MiniMax Agent web/desktop! @MiniMax_AI https://x.com/MiniMaxAgent/status/2021595954143515106

MiniMax M2.5 is now live on BLACKBOX AI. A frontier model designed for real world execution with strong reasoning, reliable tool use, and complex multi step workflows. Engineered for demanding workloads. Ready for production scale orchestration. Switch instantly in the… https://x.com/blackboxai/status/2022140484601225420

A glance at MiniMax M2.5, are you ready? https://x.com/SkylerMiao7/status/2021578926884053084

Congrats @MiniMax_AI! 🎉 Free for 3 days on Qoder, it’s time to put M2.5 through some serious coding sessions! https://x.com/qoder_ai_ide/status/2021983111161213365

MiniMax just dropped M2.5 and it’s on par with Opus 4.6 while being 20x cheaper and 3x faster??? https://x.com/shydev69/status/2021989925143597123

// Automating Sub-Agent Creation for Agentic Orchestration // Multi-agent systems are powerful but inflexible. Building agentic systems today relies on static and predefined roles. For example, an agentic AI coder might have a coder agent, a searcher agent, a reviewer agent. https://x.com/dair_ai/status/2021215864557797608

Agent Labs: Welcome to GPT Wrapper Summer – by swyx (Shawn) https://www.latent.space/p/agent-labs

Agents @ Work: Dust.tt – Latent.Space https://www.latent.space/p/dust

Agents @ Work: Lindy.ai – Latent.Space https://www.latent.space/p/lindy

Before LangChain, teams stitched together a patchwork: a framework (or bespoke glue code), generic observability (logs/APM), a spreadsheet of prompts and test cases, and a deployment stack designed for stateless APIs. That approach fails for agents for a reason LangChain keeps… https://x.com/marvinvista/status/2021605778285814092

CooperBench update: we gave agents git. It didn’t cure the curse of coordination, but we found more interesting cases of miscoordination. We set up self-hosted git servers so agent pairs could actually see and share each other’s code. Cooperation improves marginally, but new… https://x.com/_Hao_Zhu/status/2021252996848550005

deepagents now supports byo sandboxes, giving your agents the power to execute code in an isolated env. you can use our builtin integrations for @modal, @daytonaio, and @RunloopAI, or bring your own sandbox provider! docs: https://x.com/sydneyrunkle/status/2022025934774374503

Does the sandbox run your agent? Or does your agent run the sandbox? Sounds arcane. It’s not. Agent-in-Sandbox: fast to ship, but LLM-generated code has the same permissions as your whole agent. Sandbox-as-Tool: agent calls out to sandboxes for execution only. You can give… https://x.com/chriscorcoran/status/2021631151970865530

Everyone is building “data agents” but nobody agrees on what that means. The term gets applied to everything from a simple SQL chatbot to a fully autonomous data scientist. This ambiguity makes it impossible for users and builders to reason about what a system can actually do. https://x.com/dair_ai/status/2021252863150924244

Expanding our long-running agents research preview · Cursor https://cursor.com/blog/long-running-agents

How Cognition Uses Devin to Build Devin – by Nader Dabit https://nader.substack.com/p/how-cognition-uses-devin-to-build

I hold this truth to be self-evident: Putting the agent in a different container than the environment makes a lot more architectural sense. https://x.com/bernhardsson/status/2021527682534760709

I simply do not see how Open Claw and systems like it won’t completely disrupt virtual assistant businesses like Athena etc. It’s been an absolute game changer allowing me to context switch like a madman without dropping a beat. VA doesn’t even do it justice. It’s like I have… https://x.com/bilawalsidhu/status/2019612006811095199

I think one of the most important questions in multi-agent AI right now is one almost nobody is asking: when you add more agents, are you actually getting collaboration, or are you just spending more compute? Collaboration and communication are huge bottlenecks for multi-agent… https://x.com/omarsar0/status/2021013257348419670

Long-running agents are now available at https://t.co/3PT8c7azU3 for Ultra, Teams, and Enterprise plans. With our new harness, agents can complete much larger tasks. https://x.com/cursor_ai/status/2022046178708492445

Minions: Stripe’s one-shot, end-to-end coding agents | Stripe Dot Dev Blog https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents

More and more agents need a workspace: a container to execute code and other processes. We see two different ways of setting this up: 1. Agent IN a sandbox 2. Sandbox as a tool. Wrote up the pros and cons of each! Ty to @nfcampos @RunloopAI @e2b @0thernet for their insights https://x.com/hwchase17/status/2021265779803521245
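
The second pattern ("sandbox as a tool") can be sketched in a few lines. This is a hypothetical illustration only: a local subprocess stands in for the sandbox, and the function names are invented; a real deployment would hand the code to a container provider such as Modal, Runloop, or E2B.

```python
import subprocess

# Sketch of the "sandbox as a tool" layout: the agent loop runs outside,
# and only generated code crosses into an isolated runner. A subprocess
# plays the role of the sandbox here, purely for illustration.

def run_in_sandbox(code: str) -> str:
    """Tool exposed to the agent: execute code in isolation, return output."""
    result = subprocess.run(
        ["python3", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout + result.stderr

def agent_step(task: str) -> str:
    # Stand-in for an LLM call that writes code for the task.
    code = f"print(len({task!r}))"
    return run_in_sandbox(code)
```

The trade-off named in the thread falls out directly: in this layout the LLM-generated code only gets the sandbox's permissions, whereas running the whole agent inside the sandbox gives generated code everything the agent itself can touch.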

New course: A2A: The Agent2Agent Protocol, built with @googlecloudtech and @IBMResearch, and taught by Holt Skinner, @ivnardini, and Sandi Besen. Connecting agents built with different frameworks usually requires extensive custom integration. This short course teaches you A2A… https://x.com/AndrewYNg/status/2021985280102973931

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments https://huggingface.co/blog/openenv-turing

Personal, Local, Private AI Agents: Soumith Chintala – YouTube https://www.youtube.com/watch?v=jMoAaZP_Kkw

Pi: The Minimal Agent Within OpenClaw | Armin Ronacher’s Thoughts and Writings https://lucumr.pocoo.org/2026/1/31/pi/

tl;dr Today, we’re announcing our new company @EntireHQ to build the next developer platform for agent-human collaboration. Open, scalable, independent, and backed by a $60M seed round. Plus, we are shipping Checkpoints to automatically capture agent context. In the last three… https://x.com/ashtom/status/2021255786966708280

Welcome to the team, @cognition. The dedicated AI coding agent company joins us as a Global Partner. Find out more: https://x.com/AstonMartinF1/status/2020845510345830653

Why Agentic AI Breaks Legacy Identity — and What Infrastructure Leaders Must Do Next | Teleport https://goteleport.com/why-agentic-ai-breaks-legacy-identity/

Don’t Build Agents, Build Skills Instead – Barry Zhang & Mahesh Murag, Anthropic – YouTube https://www.youtube.com/watch?v=CEvIs9y1uog&t=715s

Folks claim to set the state of the art on ARC-AGI-2 using an RLM, a deeply recursive one, to manage the long horizon. “Other agent harnesses keep everything in the model’s context window. We don’t. Agentica uses a stateful REPL to manage context. This is an RLM-style loop.” https://x.com/lateinteraction/status/2021994073675247816

🤖 From this week’s issue: A research article presenting Google’s evaluation of 180 agent configurations, revealing multi-agent systems boost parallelizable tasks by 81% but degrade sequential tasks by 70%. https://x.com/dl_weekly/status/2020935994787143726

Kimi Agent Swarm blog is here 🐝 https://t.co/XjPeoRVNxG Kimi can spawn a team of specialists to: – Scale output: multi-file generation (Word, Excel, PDFs, slides) – Scale research: parallel analysis of news from 2000-2025 – Scale creativity: a book in 20 writing styles https://x.com/Kimi_Moonshot/status/2021141949416362381

Kimi Agent Swarm: 100 Sub-Agents at Scale https://www.kimi.com/blog/agent-swarm

Launching mini-SWE-agent 2.0, the simplest coding agent. Near SoTA performance, with the agent/model/environment only ~100 lines each. Powering benchmarks and RL training at NVIDIA, Anyscale, Stanford and many more! https://x.com/KLieret/status/2021606142699356215

The companies that succeed in the future are going to make very heavy use of AI. People will manage teams of agents to do very complex things. Today we are launching Frontier, a new platform to enable these companies. https://x.com/sama/status/2019441198734209374

> Someone reverse-engineered how Claude Code’s Agent Teams communicate. > No WebSocket. No gRPC. No message queue. > They read and write JSON files on disk. > Each agent gets an inbox at ~/.claude/teams/inboxes/{agent}.json. Messages append to a JSON array. Protocol… https://x.com/peter6759/status/2022156692985983266
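
The mechanism described is plain append-to-a-JSON-file message passing. A minimal sketch, assuming the inbox path from the tweet and an invented message schema (the real field names are not public):

```python
import json
from pathlib import Path

# File-based message passing as described above: each agent's inbox is a
# JSON array on disk. The {"from": ..., "body": ...} schema is a guess,
# not Claude Code's actual format.
INBOX_DIR = Path.home() / ".claude" / "teams" / "inboxes"

def send(agent: str, sender: str, body: str) -> None:
    """Append a message to the target agent's inbox file."""
    INBOX_DIR.mkdir(parents=True, exist_ok=True)
    inbox = INBOX_DIR / f"{agent}.json"
    messages = json.loads(inbox.read_text()) if inbox.exists() else []
    messages.append({"from": sender, "body": body})
    inbox.write_text(json.dumps(messages, indent=2))

def drain(agent: str) -> list[dict]:
    """Read and clear an agent's inbox."""
    inbox = INBOX_DIR / f"{agent}.json"
    if not inbox.exists():
        return []
    messages = json.loads(inbox.read_text())
    inbox.write_text("[]")
    return messages
```

Note there is no locking here: two concurrent writers can clobber each other's read-append-write cycle, which is the obvious cost of this design relative to a real message queue.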

Agentic Video Editing This is crazy! I just asked Claude Code to build me an entire agent-powered video editing app. ~10K lines of code. Uses Claude Agent SDK + Claude Opus 4.6. It’s really good. Runs locally. Highly customizable. You can just build things.”” https://x.com/omarsar0/status/2020912965885538664

Claude opus 4.6 (adaptive) takes the lead on WeirdML with 77.9%, ahead of gpt-5.2 (xhigh) at 72.2%. It sets a new high score on 3 tasks including scoring 73% on the hardest task (digits_generalize), up from 59%. Opus 4.6 is extremely token hungry and uses an average of 32k… https://x.com/htihle/status/2020845875447074874

Claude Opus 4.6: System Card Part 1: Mundane Alignment + MW https://thezvi.substack.com/p/claude-opus-46-system-card-part-1

guys, i’m still on the @claudeai cowork hype train. if you are not using this for a routine knowledge work task once a day you are probably miscalibrated for HOW MUCH BETTER THIS THING IS compared to Anthropic’s initial Computer Use demo. in this task i had Cowork: – scan my… https://x.com/swyx/status/2016899888718442842

lotta chatter about RLMs and whether or not they’re useful over coding agents. i decided to just go ahead and try. i had claude code implement itself an RLM skill using bash as the execution environment / files as the variables. this is effectively implemented “inside a coding… https://x.com/tenobrus/status/2020770310958768449

Our teams have been building with a 2.5x-faster version of Claude Opus 4.6. We’re now making it available as an early experiment via Claude Code and our API. https://x.com/claudeai/status/2020207322124132504?s=20

Speed up responses with fast mode – Claude Code Docs https://code.claude.com/docs/en/fast-mode

The AI Labs don’t yet do a good job explaining how the upgrades to their harnesses change work. For example, since Opus 4.6, Claude Code will spontaneously use subagents to do work in parallel. This is very helpful with a real impact on tasks, but was sort of quietly rolled out. https://x.com/emollick/status/2021329048811741250

We need a shorthand way of saying: “An AI did the work, but I vouch for the result.” Saying “I did it” feels slightly sketchy, but saying “Claude did it” feels like avoiding responsibility. https://x.com/geoffreylitt/status/2008243455810748713?s=46

You can just build things. Ported my AI coloring book app I built for my kids to iOS/Swift with the new fast mode for Claude Opus 4.6 in @code! https://x.com/pierceboggan/status/2020616390974353880

Claude Opus 4.6: System Card Part 2: Frontier Alignment https://thezvi.substack.com/p/claude-opus-46-system-card-part-2

I strongly echo the concerns about the objectivity and methodology in @AnthropicAI’s safety evaluations for Claude models. Our team specifically studies the computer-use and browser-use scenarios. The system card reports low attack success rates for Claude Opus 4.6, around ~10% in… https://x.com/hhsun1/status/2021696367216005139

The poetry tastes of GenAI: “I want you to suggest two poems that you think apply very well to the current state of GenAI models like you. Don’t just pick popular poems and back-justify. Think hard about options first.” ChatGPT, Gemini & Claude all suggest Borges’s “The Golem” https://x.com/emollick/status/2021677609872986450

Anthropic’s More Than $20 Billion Funding to Close as Soon as Next Week https://finance.yahoo.com/news/anthropic-more-20-billion-funding-010451731.html?guccounter=1

I made 2 requests to Opus 4.6 Thinking in Antigravity and got rate-limited. It changed 30 LOC. lol https://x.com/scaling01/status/2021636359509979555

I’ve tested the latest generation of all the major AIs on theoretical physics research and Claude 4.6 has absolutely blown me away with how capable it is in physics. It feels like a Claude Code moment for research is not that far off. It has a very detailed understanding of… https://x.com/ibab/status/2019879195028123847?s=20

Can just a 4B model solve IMO-level proof problems at the level of much stronger LLMs like Gemini 3 Pro? Yes, if you can train the LLM to scale test-time compute well! We’re very excited to release our 4B model “QED-Nano”, built via an awesome open collab! Details below 🧵⬇️ https://x.com/aviral_kumar2/status/2022057927368995097

Early testers of Gemini 3 Deep Think are already seeing results. We partnered with researchers to explore how this model could tackle rigorous, real-world applications — from spotting hidden flaws in research papers to optimizing semiconductor growth. Here’s how early testers… https://x.com/Google/status/2022007977419415958

If you’re an Ultra subscriber, you can try the latest in the Gemini App, but we’re also making Deep Think available for the first time in the Gemini API! Request early access here: https://x.com/tulseedoshi/status/2021997870858350640

@GeminiApp Do people realize how crazy that thing is?? https://x.com/LexnLin/status/2021986194780041394

Codeforces results is “”no tools””? So Gemini 3.0 Deep Think cannot write test cases to test its solution before submission? I guess even the top1 human can’t get 3455 under this condition.”” https://x.com/YouJiacheng/status/2021985843074994534

Gemini 3 Deep Think benchmarks look amazing! On Codeforces, it scored 3,455 Elo. Apparently, only 7 humans in the world have a higher coding Elo score! A friend just sent me an output about a cancer mechanism that was so great that I am now resubscribing to Ultra for DT access!”” https://x.com/DeryaTR_/status/2022030594037989493

Gemini 3 Deep Think can help make things. 🧠 Here’s our side project: We sketched a laptop stand and Deep Think coded that into an interactive prototyping tool. We used that tool to generate a STL file, which we sent to @fleet_ai. And now I have a new laptop stand! What will”” https://x.com/joshwoodward/status/2022001967795777996

Gemini 3 Deep Think is available now in the @GeminiApp for Google AI Ultra subscribers and via the Gemini API to select researchers, engineers and enterprises through our early access program. Learn more ↓”” https://x.com/Google/status/2021982018679312829

Gemini 3 Deep Think is getting a significant upgrade. We’ve refined Deep Think in close partnership with scientists and researchers to tackle tough, real-world challenges. And it’s pushing the frontier across the most challenging benchmarks, achieving an unprecedented 84.6% on”” https://x.com/sundarpichai/status/2022002445027873257

Gemini 3 Deep Think now excels across scientific domains like chemistry and physics — achieving gold medal-level results on the written sections of the 2025 International Physics and Chemistry Olympiads.”” https://x.com/Google/status/2021982010739503138

Parsing PDFs at scale with LLMs is cost prohibitive. Newer models (e.g. Gemini 3) are good at reading PDFs, but you burn unnecessary vision tokens even when the page is text heavy. We’ve built a “cost-optimizer” into LlamaParse that will dynamically route pages to”” https://x.com/jerryjliu0/status/2021267495123140760
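The routing idea can be sketched in a few lines. This is not LlamaParse’s implementation, just an illustrative policy with made-up thresholds and per-page costs: pages with a healthy extractable text layer go to cheap text parsing, and only image-heavy or scanned pages get sent to a vision model.

```python
# Illustrative page router: cheap text extraction for text-heavy pages,
# a costly vision model only when there is little extractable text.
# The threshold and cost figures below are invented for illustration.

def route_page(extracted_chars: int, has_images: bool,
               min_chars: int = 200) -> str:
    """Return which parsing backend to use for a single PDF page."""
    if extracted_chars >= min_chars and not has_images:
        return "text"    # the text layer alone is good enough
    return "vision"      # scanned / image-heavy page: use a vision model

def estimate_cost(pages: list[tuple[int, bool]],
                  text_cost: float = 0.0001,
                  vision_cost: float = 0.005) -> float:
    """Rough per-document cost under the routing policy above."""
    total = 0.0
    for chars, has_images in pages:
        total += text_cost if route_page(chars, has_images) == "text" else vision_cost
    return total

pages = [(1500, False), (80, True), (900, False)]
print([route_page(c, i) for c, i in pages])  # ['text', 'vision', 'text']
print(round(estimate_cost(pages), 4))        # 0.0052
```

With vision parsing ~50x the cost of text extraction in this toy pricing, routing two of three pages to the text path already cuts the document cost by roughly two thirds.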

The upgraded Deep Think mode is rolling out now in the @GeminiApp for Google AI Ultra subscribers. For scientific researchers and developers, we’re opening a Vertex AI Early Access Program for the API. Start discovering → https://x.com/GoogleDeepMind/status/2021981517791342807

There are only 7 people on the planet who can beat Gemini 3 Deep Think in coding competitions. It has an Elo of 3455. A bit over a year ago the best systems were at 2727 (o3-preview).”” https://x.com/scaling01/status/2021983388442509478

Today, we’re releasing a significant upgrade to our specialized reasoning mode, Gemini 3 Deep Think. Deep Think is built to drive practical applications, enabling researchers to interpret complex data and engineers to model physical systems through code. With the updated Deep”” https://x.com/GeminiApp/status/2021985731577852282

OpenAI Town Hall with Sam Altman – YouTube https://www.youtube.com/watch?v=Wpxv-8nG8ec&t=1179s

📣 Shipping software with Codex without touching code. Here’s how a small team steering Codex opened and merged 1,500 pull requests to deliver a product used by hundreds of internal users with zero manual coding.”” https://x.com/OpenAIDevs/status/2021637918847381656

🚀 deepagents v0.4 is out with: 🧩 pluggable sandboxes (modal, daytona, runloop) 🧠 smarter conversation history summarization 💬 responses API default for OpenAI models”” https://x.com/sydneyrunkle/status/2021289479139422296

A full Super Bowl ad for Codex, that’s wild”” https://x.com/iScienceLuvr/status/2020650521758179561

After a near-death experience, ChatGPT gave me closure my doctors didn’t https://www.axios.com/2026/02/11/chatgpt-postpartum-health-scare

Apps in ChatGPT – YouTube https://www.youtube.com/watch?v=2C4Cs6503gw

Big drop for Codex users later today! You can just build things.”” https://x.com/sama/status/2019442016594088211

Codex app is actually insane. I took an idea in my head, refined it with ChatGPT, copied the entire chat into Codex Plan mode, picked Codex’s suggested options, hit run… and it built everything in one shot. I’ve been coding for 15 years. I reviewed the output carefully. It was”” https://x.com/arrakis_ai/status/2021071947640312052

Codex ended up working for a little over 2 hours and 40 minutes in one run. It has now been working for a further 45 minutes (and counting) on the same C codebase. GPT-5.3 high token usage is incredible. I’ve used barely 10% of my weekly usage. It keeps working until the tests”” https://x.com/CtrlAltDwayne/status/2020479866777510134

Codex is now over 1 million active users!”” https://x.com/sama/status/2019219967250669741

Codex-Spark is currently text-only with a 128k context window. We’ll introduce more capabilities, including larger models, longer context lengths, and multimodal input, as we learn from our first production deployment of low-latency infrastructure and hardware.”” https://x.com/OpenAIDevs/status/2022009943105433809

Deep research in ChatGPT is now powered by GPT-5.2. Rolling out starting today with more improvements.”” https://x.com/OpenAI/status/2021299935678026168

From how the team operates, I always thought Codex would eventually win. But I am pleasantly surprised to see it happening so quickly. Thank you to all the builders; you inspire us to work even harder.”” https://x.com/sama/status/2021606985469211065

GPT-5.3 Codex is now available in Cursor! It’s noticeably faster than 5.2 and is now the preferred model for many of our engineers.”” https://x.com/cursor_ai/status/2020921643145519249

gpt-5.3-codex for rewriting applications between languages:”” https://x.com/gdb/status/2021272681237361027

GPT-5.3-Codex is here! *Best coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld). *Mid-task steerability and live updates during tasks. *Faster! Less than half the tokens of 5.2-Codex for same tasks, and >25% faster per token! *Good computer use.”” https://x.com/sama/status/2019474754529321247

GPT-5.3-Codex is rolling out in @cursor_ai, @code, and @github today. We’re starting with a small set of API customers as part of a phased release. This is the first model we’re treating as high cybersecurity capability under our Preparedness Framework. We’ll continue to scale”” https://x.com/OpenAIDevs/status/2020921792941166928

GPT-5.3-Codex is rolling out today in Cursor, GitHub, and VS Code!”” https://x.com/sama/status/2020940847190356092

GPT-5.3-Codex-Spark is launching today as a research preview for Pro. More than 1000 tokens per second! There are limitations at launch; we will rapidly improve.”” https://x.com/sama/status/2022011797524582726

GPT-5.3-Codex-Spark is now in research preview. You can just build things–faster.”” https://x.com/OpenAI/status/2022009582210715925

GPT-5.3-Codex-Spark size: ~700B@30B OpenAI’s new GPT-5.3-Codex-Spark is the first model for which we can somewhat reliably estimate its size. Cerebras inference: 1000 tokens/s – GLM-4.7 is 355@32B, 92 layers 1400 tokens/s – Qwen3-235B is 235@22B, 94 layers 3000 tokens/s -“” https://x.com/scaling01/status/2022028580226768995#m
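The size estimate above rests on a simple assumption: on the same serving stack, decode speed is roughly inversely proportional to active parameter count (decode is memory-bandwidth bound). A back-of-envelope version using the reference points quoted in the post is below; note the naive scaling ignores layer count and other factors, which is why it yields a cruder, higher band than the post’s corrected ~30B-active figure.

```python
# Back-of-envelope: on the same hardware, decode tokens/s scales roughly
# as 1 / active_params. Reference points are the public Cerebras numbers
# quoted in the post; the result is a crude band, not a confirmed size.

refs = {
    "GLM-4.7":    (32e9, 1400),   # (active params, tokens/s)
    "Qwen3-235B": (22e9, 3000),
}

def estimate_active_params(target_tps: float) -> tuple[float, float]:
    """Return (low, high) active-param estimates implied by the references."""
    estimates = [active * tps / target_tps for active, tps in refs.values()]
    return min(estimates), max(estimates)

low, high = estimate_active_params(1000)   # Codex-Spark serves ~1000 tok/s
print(f"~{low/1e9:.0f}B to ~{high/1e9:.0f}B active")  # ~45B to ~66B active
```

Correcting for layer count and kernel efficiency (which the post does) pushes the estimate lower; the point of the sketch is only that public throughput numbers bound the active size to within a factor of ~2.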

He truly is! Since he joined OpenAI, we haven’t seen an interview with @SebastienBubeck, but here is one we did with him a couple of years ago. Still a very interesting read”” https://x.com/TheTuringPost/status/2020920421487608259

How would you prefer us to charge for Codex?”” https://x.com/sama/status/2019814741129195576

I do wonder how @cursor_ai feels about having made a partnership with openai, promoted and defaulted openai’s models in Cursor, only for them to withhold their coding model from them just a few versions later. I was pretty disappointed when I saw their CEO on stage with Sama”” https://x.com/Teknium/status/2020659530162692568

I hope Codex will inspire a new generation of builders and dreamers.”” https://x.com/thsottiaux/status/2020671175462912492

I’ve been using 5.3 Codex for 3 weeks. It’s an incredible model. I’ve built so much stuff with it. Made a vid showing everything I love about it, as well as a few call-outs of things I hope OpenAI changes.”” https://x.com/theo/status/2020279916760355142

i’ve joined the other side – codex is now my daily driver. the app is great and the model is highly effective, and med/high is fast enough to not block my work so they’ve clearly improved its efficiency too i thought i’d miss using the hooks and custom slash commands of cc but i”” https://x.com/atzydev/status/2020547181019607330

If you are not using the new Codex app, you are really wasting your development time. I always wondered who was going to replace IDEs like Cursor, because they are memory hungry when running multiple projects. Thought it was going to be terminals for a while, but you should be”” https://x.com/webtkdev/status/2020380003708596707

Introducing GPT-5.3-Codex-Spark, our ultra-fast model purpose built for real-time coding. We’re rolling it out as a research preview for ChatGPT Pro users in the Codex app, Codex CLI, and IDE extension.”” https://x.com/OpenAIDevs/status/2022009906329739681

It actually worked! For the past couple of days I’ve been throwing 5.3-codex at the C codebase for SimCity (1989) to port it to TypeScript. Not reading any code, very little steering. Today I have SimCity running in the browser. I can’t believe this new world we live in.”” https://x.com/ccccjjjjeeee/status/2021160492039811300

New art project. Train and inference GPT in 243 lines of pure, dependency-free Python. This is the *full* algorithmic content of what is needed. Everything else is just for efficiency. I cannot simplify this any further.”” https://x.com/karpathy/status/2021694437152157847
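Not the project’s code, but a taste of what “pure, dependency-free Python” means in practice: the core operations of a GPT, like a single attention step, can be written with plain lists and the `math` module, no numpy.

```python
# One single-query attention step in dependency-free Python:
# weight each value row by softmax(q . k / sqrt(d)).
import math

def softmax(xs):
    m = max(xs)                            # subtract max for stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, keys, values):
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    # weighted sum of value rows
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

out = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)  # first component larger: the query matched the first key
```

Everything else in a full implementation (batching, multiple heads, training loops) is, as the post says, machinery for efficiency layered on top of steps like this one.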

nice to see community repro of how token efficient GPT-5.3-Codex is! and we’re only getting started”” https://x.com/reach_vb/status/2021158781539713109

Not seeing GPT-5.3-Codex in @code? The rollout has been paused, but we’ll let you know as soon as we have an update!”” https://x.com/code/status/2021041639926673503

OpenAI Codex-Spark powered by Cerebras You can now just build things faster–at 1,000 tokens/s.”” https://x.com/cerebras/status/2022021218208297302

Opinion | I Left My Job at OpenAI. Putting Ads on ChatGPT Was the Last Straw. – The New York Times https://www.nytimes.com/2026/02/11/opinion/openai-ads-chatgpt.html

Over 300M people use ChatGPT to learn how to do something every week. More than half of US ChatGPT users say it enables them to achieve things that previously felt impossible. These are just a few stories of what they are building.”” https://x.com/OpenAI/status/2019822532795547807

Quite a visual from OpenAI. Your system of record is a dumb pipe and we will layer 5 rows of value on top of it to steal the relationship and all the economics along with it No wonder SaaS is in the gutter”” https://x.com/buccocapital/status/2019598551228223526

Software development is undergoing a renaissance in front of our eyes. If you haven’t used the tools recently, you likely are underestimating what you’re missing. Since December, there’s been a step function improvement in what tools like Codex can do. Some great engineers at”” https://x.com/gdb/status/2019566641491963946

Spark is now rolled out to 100% of Pro users. Update the Codex app or CLI if you don’t see it. Infra is not completely stable, but we’re working on that. Proof attached”” https://x.com/thsottiaux/status/2022034024655728709

Starting something new at OpenAI! Excited to serve as Chief Futurist, where I’ll be working on studying AI impacts and engaging the world to discuss them, in collaboration with colleagues across the org and the research community.”” https://x.com/jachiam0/status/2021633259583812007

The 5.3 lovefest is so nice to see. Don’t think we’ve had so much excitement for a model since the original GPT-4.”” https://x.com/sama/status/2019813802049696064

the ability in codex cli with gpt 5.3 to instantly redirect the agent without waiting for your commands to be unqueued and risk interrupting the agent’s current session is so underrated codex cli is goated.”” https://x.com/blader/status/2020211746401841161

this is what i see when someone says “i asked chat GPT””” https://x.com/myelessar/status/2020818458653466918

This week on How I AI: OpenAI product lead on getting the most out of Codex https://www.lennysnewsletter.com/p/this-week-on-how-i-ai-the-power-users

try the codex app!”” https://x.com/gdb/status/2021093839315054690

Ultra-low latency Codex:”” https://x.com/gdb/status/2022010171124523148

We updated GPT-5.2 (the instant model) in ChatGPT today. Not a huge change, but hopefully you find it a little better.”” https://x.com/sama/status/2021452911511998557

We’re giving a small group of API customers early access to Codex-Spark to experiment with it in their products, helping us continue optimizing performance beyond Codex. We’ll expand access to more ChatGPT users and API developers as we add more capacity.”” https://x.com/OpenAIDevs/status/2022009955189158211

with codex, building is for everyone:”” https://x.com/gdb/status/2020651347293716694

You can just build things.”” https://x.com/OpenAI/status/2020649757434327362

claude code, codex, etc. are incredible products turns out you can build a really good agentic coding system quickly but they are exceptionally bad *terminals*. screen flashes, the scrolling doesn’t work, pasting often fails, etc turns out building a good CLI is very hard”” https://x.com/jxmnop/status/2021633739097563167

After a long time testing the new Opus 4.6 and Codex 5.3 models, the most striking thing was how much trickier model releases are to read in 2026. I’m in my post-benchmark era. Claude is still king, but Codex is closer than ever.”” https://x.com/natolambert/status/2020881482873811070

This is a great read if you are building complex applications with Claude Code and Codex. Most AI coding agents can generate a frontend. But building a real full-stack application is a completely different story. The gap between generating a landing page and shipping a working”” https://x.com/omarsar0/status/2020891961511809456

TLDR: codex 5.3 is a very useful coding tool, claude 4.6 is the first of many general agents to come.”” https://x.com/natolambert/status/2020885646555107619

Opus 4.6 dethroned GPT-5.2-xhigh on WeirdML and is now in clear first place! Opus finds much shorter (so presumably more simple and elegant) solutions to the problems. But code execution times went up. So maybe the difference in code length is due to optimizations? Would love”” https://x.com/scaling01/status/2020847174909665712

Opus 4.6, Codex 5.3, and the post-benchmark era https://www.interconnects.ai/p/opus-46-vs-codex-53

3 years ago, we emailed Jensen with requests for Blackwell. Today, we released GPT-5.3-Codex, a SOTA model designed for GB200-NVL72. Nitpicking ISA, simming rack designs, and tailoring our arch to the system has been a fun experience! I’m grateful to our collaborators at NVIDIA.”” https://x.com/trevorycai/status/2019482450855096440

At @nvidia, we use a lot of AI coding tools. Codex with GPT-5.3-codex is particularly impressive. The engineers I know here are big codex power users. The capabilities of these coding agents are advancing quickly, it’s quite exciting. With 5.3, I’m particularly impressed with”” https://x.com/benklieger/status/2021707684211569033

VS Code gives you extremely powerful building blocks with custom agents, parallel subagents, and slash commands to compose your own workflows. Here is /review command that uses Opus 4.6 fast mode, GPT-5.3-Codex, and Gemini 3 Pro to independently review changes and grade each”” https://x.com/pierceboggan/status/2021094988205969465

🤖 From this week’s issue: Official blog post announcing Qwen3-Coder-Next, an 80B-parameter coding model achieving competitive performance on SWE-Bench (70.6% on Verified) while enabling 10x higher throughput for repository-level agentic workflows.”” https://x.com/dl_weekly/status/2021690941879250945

🚀 Introducing Qwen-Image-2.0 — our next-gen image generation model! 🎨 Your imagination, unleashed. ✨ Type a paragraph → get pro slides ✨ Describe a scene → get photoreal 2K magic ✨ Add text → it just works (no more glitchy letters!) ✨ Key upgrades: ✅ Professional”” https://x.com/Alibaba_Qwen/status/2021137577311600949

A quick update — we’ve fixed a Qwen-Image 2.0 bug in Qwen Chat that impacted: • Classical Chinese poem ordering in image generation • Character consistency during image editing ✅Patch is live now! https://t.co/DWnxVxa0hY Go test it out and drop us your feedback.”” https://x.com/Alibaba_Qwen/status/2021510747671720368

Folks ask about training an RLM as if it were hypothetical. In the paper, we do post-train and release open-weights RLM-Qwen3-8B-v0.1 on HF. It’s a tiny proof of concept, but it was surprisingly easy to get a marked jump in capability. Maybe learning to recurse is not too hard for 8B.”” https://x.com/lateinteraction/status/2020877152854409691

https://t.co/DIetNHHMp3 Qwen3.5 architecture is out: A vision language model, hybrid SSM-Transformer using Gated DeltaNet linear attention mixed with standard attention, interleaved MRoPE, and shared+routed MoE experts.”” https://x.com/QuixiAI/status/2021109801606893837

Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched | AINews https://news.smol.ai/issues/25-09-05-1t-models

Qwen https://qwen.ai/blog?id=a6f483777144685d33cd3d2af95136fcbeb57652&from=research.research-list

Qwen https://qwen.ai/blog?id=qwen-image-2.0

Qwen https://qwen.ai/blog?id=qwen-image-layered

Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT | AINews https://news.smol.ai/issues/25-08-04-qwen-image

❤️ GLM-5 is on Ollama’s cloud! It’s free to start, and with higher limits available on the paid plans. ollama run glm-5:cloud It’s fast. You can connect it to Claude Code, Codex, OpenCode, OpenClaw via ollama launch! Claude: ollama launch claude --model glm-5:cloud”” https://x.com/ollama/status/2021667631405674845

🎉 The mysterious Pony Alpha is finally revealed, congrats to @Zai_org on releasing GLM-5! SGLang is ready to support on day-0. 🛠️ 744B params (40B active) model built for complex systems engineering & long-horizon agentic tasks 📚 28.5T tokens pretraining for a stronger”” https://x.com/lmsysorg/status/2021639499374375014

🔥Congrats to @Zai_org on launching GLM-5 — 744B parameters (40B active), trained on 28.5T tokens, integrating DeepSeek Sparse Attention to keep deployment cost manageable while preserving long-context capacity. vLLM has day-0 support for GLM-5-FP8 with: 📖 DeepSeek Sparse”” https://x.com/vllm_project/status/2021656482698387852

🚀 Zhipu AI GLM-5: A Real Step Into the Top Tier? Zhihu contributor toyama nao offers a concise verdict: “”A hard road upward — the stairway to godhood.”” 🔮From recovery to contention Over the past six months (4.5 → 5.0), Zhipu has climbed back into China’s first tier and now”” https://x.com/ZhihuFrontier/status/2022161058321047681

GLM-5 by @Zai_org is now the #1 open model in Code Arena, tied with Kimi-K2.5-Thinking! Overall #6 on par with Gemini-3-pro, 100+pts below Claude-Opus-4.6 in agentic webdev tasks. Congrats to the @Zai_org GLM team on the new milestone! 👏”” https://x.com/arena/status/2021996281141629219

GLM-5 from @Zai_org just climbed to #1 among open models in Text Arena! ▫️#1 open model on par with claude-sonnet-4.5 & gpt-5.1-high ▫️#11 overall; scoring 1452, +11pts over GLM-4.7 Test it out in the Code Arena and keep voting, we’ll see how GLM-5 performs for agentic coding”” https://x.com/arena/status/2021725350481526904

GLM-5 is coming to Coding Plan Pro users within one week, and we’re working to bring it to everyone after that. To be upfront: compute is very tight. Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference. We appreciate your understanding”” https://x.com/Zai_org/status/2021656633320018365

GLM-5 is now on AI Gateway. Better long-range planning, multiple thinking modes, and improved multi-step agent tasks versus previous models (https://t.co/Yqx8kVZ3i8). Use model: 'zai/glm-5' to get started.”” https://x.com/vercel_dev/status/2021655129347539117
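As a sketch of what that model id looks like in practice: gateways like this typically expose an OpenAI-compatible chat endpoint, so a request body might look like the following. The exact endpoint and schema are assumptions here, so the example only constructs the payload rather than sending it.

```python
# Hypothetical OpenAI-compatible request for GLM-5 via a gateway.
# Only the model id 'zai/glm-5' comes from the announcement; the rest
# of the shape is the common chat-completions convention.
import json

def build_chat_request(prompt: str) -> dict:
    return {
        "model": "zai/glm-5",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request("Plan a multi-step refactor of this repo.")
print(json.dumps(payload, indent=2))
```

In a real client you would POST this body to the gateway's chat-completions URL with your API key in the `Authorization` header.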

GLM-5 is the new leading open weights model! GLM-5 leads the Artificial Analysis Intelligence Index amongst open weights models and makes large gains over GLM-4.7 in GDPval-AA, our agentic benchmark focused on economically valuable work tasks GLM-5 is @Zai_org’s first new”” https://x.com/ArtificialAnlys/status/2021678229418066004

GLM-5 is ZAI’s new flagship. 744B params (40B active), trained on 28.5T tokens, and built for complex systems engineering and long-horizon agentic tasks. Two things worth paying attention to: 1. They integrated DeepSeek Sparse Attention to cut deployment costs while keeping”” https://x.com/cline/status/2021999167875555694

GLM-5 just launched — now available in Qoder. On Qoder Bench — our benchmark for real-world software engineering tasks — GLM-5 outperforms Sonnet 4.5 and approaches Opus 4.5. At a fraction of the cost. High demand expected — brief waits possible during peak hours. Scaling in”” https://x.com/qoder_ai_ide/status/2021639227814092802

GLM-5, the latest frontier open model from @Zai_org, is available now on Modal. We partnered with https://t.co/nhqgwNEWkB to release an endpoint that will be free for a limited time.”” https://x.com/modal/status/2021645783733616800

Pony Alpha Stealth model reveal: GLM-5 from @Zai_org GLM-5 is a new 744B foundation model for coding and agentic usecases. It achieves SOTA scores on top agent benchmarks, and has been used successfully in many agent flows during its Stealth period. Live now on OpenRouter!”” https://x.com/OpenRouter/status/2021639702789730631

Average Throughput of GLM-5 on Openrouter is 14 tps”” https://x.com/scaling01/status/2021981416452764058

Build more. Spend less. GLM-5 is now on YouWare. Landing pages, portfolios, prototypes. All handled fast, with a 200K context window. Save your premium credits for the big builds.”” https://x.com/YouWareAI/status/2021982784948936874

Congrats @Zai_org on GLM-5! Love the permissive MIT license (vs K2.5’s modified MIT). Haven’t chatted with it yet so no vibes, but from the numbers I’m not compelled to switch from @Kimi_Moonshot K2.5: • Similar evals, but GLM-5’s are at bf16 while K2.5’s are at int4 – GLM-5″” https://x.com/QuixiAI/status/2021651135615184988

Day-0 with @Zai_org: GLM-5 is live on DeepInfra 🔥 Built for long-horizon agents that plan, orchestrate, and self-correct. Serving ~100 TPS at launch and as usual the best price on the market!”” https://x.com/DeepInfra/status/2021666854088110318

GLM 5 is 2x the total parameters of GLM 4.5 + DeepSeek sparse attention for efficient long context. This is going to be a crazy model”” https://x.com/eliebakouch/status/2020824645868630065

“GLM MoE DSA” is landing in transformers 👀”” https://x.com/xeophon/status/2020815776890909052

GLM-4.7-Flash-GGUF is now the most downloaded model on @UnslothAI.”” https://x.com/Zai_org/status/2021207517557051627

GLM-5 already available on OpenRouter (with even lower prices)”” https://x.com/scaling01/status/2021637257103651040

GLM-5 has a 200k context length and maximum output of 128k”” https://x.com/scaling01/status/2021628691357298928

GLM-5 is massive. 745B params. LETS FUCKING GOOOOO This should be fun!”” https://x.com/scaling01/status/2020840989947298156

GLM-5 pricing: $1 input and $3.2 output. There is also a GLM-5 Code variant that is more expensive 👀 Almost 8 times cheaper than Opus”” https://x.com/scaling01/status/2021628971939418522

GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It’s quite good in my initial testing and pretty fast as well. It generated a highly functional space invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to @ActuallyIsaak and @kernelpool for the port.”” https://x.com/awnihannun/status/2022007608811696158

https://t.co/ctlyPtiB3j GLM-5 architecture is out: ~740B parameters ~50B active 78 layers, MLA attention lifted from DeepSeek V3, plus DeepSeek V3.2’s sparse attention indexer for 200k context. Basically DeepSeek V3 scale with DSA bolted on.”” https://x.com/QuixiAI/status/2021111352895393960
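The MoE shape is what makes the economics work: only ~40B of the ~744B weights fire per token. A rough sketch of the arithmetic, using the common ~2 FLOPs per active parameter per token rule of thumb (an estimate, not a vendor figure):

```python
# Rough arithmetic on the reported GLM-5 shape: ~744B total, ~40B active.
# Decode compute scales with *active* params, which is why a 744B MoE can
# be served at roughly the cost of a ~40B dense model.

total_params  = 744e9
active_params = 40e9

activation_ratio = active_params / total_params   # fraction of weights used per token
flops_per_token  = 2 * active_params              # classic ~2*N decode estimate

print(f"active fraction: {activation_ratio:.1%}")              # ~5.4%
print(f"decode compute: ~{flops_per_token/1e9:.0f} GFLOPs/token")
```

The sparse-attention indexer attacks the other cost axis, long-context KV traffic, so the two choices together keep a frontier-scale model deployable.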

GLM-5 is out on @huggingface 🔥 > A40B/744B, trained on more tokens (28.5T) > outperforms/on par with closed sota > allows commercial use (MIT licensed) 💗 use with vLLM/SGLang locally or through HF Inference Providers thanks to @novita_labs and @Zai_org 📦”” https://x.com/mervenoyann/status/2021642658188538348

DeepSeek V4-lite, Minimax 2.5, GLM-5 what a bloodbath will Qwen accelerate the release of 3.5?”” https://x.com/teortaxesTex/status/2021586965594857487


Discover more from Ethan B. Holland
