Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the deep midnight navy car hood, shallow depth-of-field sky background, chrome pedestal base, dramatic upward angle, and automotive advertisement lighting exactly as shown. Replace only the Mercedes star with a photorealistic chrome compass rose hood ornament at the same scale and position on the pedestal: eight-pointed cardinal directions in polished metal. Add bold white sans-serif text ‘AGENTS’ across the upper portion of the image as a headline.

@chrlschn – MCP is Dead; Long Live MCP! https://chrlschn.dev/blog/2026/03/mcp-is-dead-long-live-mcp/

Built PrediHermes ✨ a Hermes Agent skill + companion WorldOSINT/MiroFish forks for geopolitical prediction. @NousResearch It pulls 54+ OSINT modules, uses Polymarket to find contracts with clear resolution criteria, then runs MiroFish multi-agent sims to model individual
https://x.com/WeXBT/status/2033391568426598608

good concepts here in fastmcp on distributing skills via MCP resources. i think this might be the right approach: it solves the problem of skills going out of date if you load them fresh each time, and you can tie them to tools more easily. more to explore here
https://x.com/RhysSullivan/status/2034125767987368242#m
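
The distribution idea above can be sketched in a few lines: skills live server-side as MCP-style resources and each agent session reads them fresh at startup, so updates propagate immediately. All class and URI names below are hypothetical illustrations, not fastmcp's actual API.

```python
# Sketch: skills served as versioned resources, fetched fresh per session
# instead of being baked into the agent at install time. All names here
# are hypothetical; fastmcp's real API differs.

class SkillRegistry:
    """Server side: maps resource URIs to the current skill text."""
    def __init__(self):
        self._skills = {}

    def publish(self, uri, text, version):
        self._skills[uri] = {"text": text, "version": version}

    def read(self, uri):
        # Always returns the latest published version.
        return self._skills[uri]

class AgentSession:
    """Client side: loads skills at session start, so updates land immediately."""
    def __init__(self, registry, skill_uris):
        self.skills = {uri: registry.read(uri) for uri in skill_uris}

registry = SkillRegistry()
registry.publish("skill://papers/search", "Use the /papers endpoint...", version=1)

old_session = AgentSession(registry, ["skill://papers/search"])
registry.publish("skill://papers/search", "Use the /v2/papers endpoint...", version=2)
new_session = AgentSession(registry, ["skill://papers/search"])
```

A session started after the update sees version 2 while the older one keeps what it loaded, which is exactly the "skills never go stale" property the post describes.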

Hermes Agent v0.3.0 ☤ 248 PRs. 15 contributors. 5 days. • Real-time streaming across CLI and all platforms • First-class plugin architecture, package and share tools+commands+skills • /browser connect to live Chrome via CDP • @vercel AI Gateway model provider •
https://x.com/NousResearch/status/2033877040399831478

MCP was a mistake. Long live CLIs.
https://x.com/skirano/status/2034269154404868314#m

Karpathy’s Autoresearch is bottlenecked by a single GPU. We removed the bottleneck. We gave the agent access to our K8s cluster with H100s and H200s and let it provision its own GPUs. Over 8 hours: • ~910 experiments instead of ~96 sequentially • Discovered that scaling model
https://x.com/skypilot_org/status/2034681533051855173

Browser Use is now an official provider for the browser tool in Hermes-Agent – Update to try it out 😉 Use `hermes tools` to set the browser backend. (Note: this requires an API key with them)
https://x.com/Teknium/status/2033811117521408078

Did a small local anime server tool powered by Hermes Agent (@NousResearch). You can: – fully sync your anime list – download torrents from different sources – add tracking & scheduled downloads – auto-manage disk usage – serve to any device within your local wifi and more!
https://x.com/rodmarkun/status/2033307437088850102

“I’m not saying Copilot diagnosed me. I’m saying it helped me ask for the right test. A test no doctor had ordered for me in twenty years.” This is why I’m so passionate about AI & healthcare.
https://x.com/mustafasuleyman/status/2033655842919395723

BREAKING 🚨: MiniMax released MiniMax M2.7, a new self-evolving model, achieving a score of 56.22% on SWE-Bench Pro. M2.7 was used for building complex agent harnesses during its own development. Users can now access MiniMax M2.7 via APIs and MiniMax Agent.
https://x.com/testingcatalog/status/2034250919345377604#m

During the iteration process, we also realized that the model’s ability to recursively evolve its harness is equally critical. Our internal harness autonomously collects feedback, builds evaluation sets for internal tasks, and based on this continuously iterates on its own
https://x.com/MiniMax_AI/status/2034315323109953605#m

Introducing MiniMax-M2.7, our first model which deeply participated in its own evolution, with an 88% win-rate vs M2.5 – Production-Ready SWE: With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%), M2.7 reduced intervention-to-recovery time for online incidents
https://x.com/MiniMax_AI/status/2034315320337522881#m

MiniMax Global Announces Full Year 2025 Financial Results – MiniMax News | MiniMax https://www.minimax.io/news/minimax-global-announces-full-year-2025-financial-results

Minimax M2.7 released! And it’s a big one. Highlights: Self-evolving – first model that helped build itself, running 100+ autonomous optimization loops during its own RL training (30% internal improvement). Strong coder – 56.2% on SWE-Pro (near Opus 4.6), 55.6% on VIBE-Pro,
https://x.com/kimmonismus/status/2034269026353082422#m

MiniMax M2.7: Early Echoes of Self-Evolution – MiniMax News | MiniMax https://www.minimax.io/news/minimax-m27-en

Read about the feature and how to build your own channels for Claude Code on our docs!
https://x.com/neilhtennek/status/2034762489951658190

Today we’re launching channels for Claude Code as an experimental feature! A few days ago, I was fed up that I couldn’t text Claude on the go like I would any of my friends. But those days are gone! Claude is saved in my contacts and I can keep shipping on the go.
https://x.com/neilhtennek/status/2034762196576805123

We invited Claude users to share how they use AI, what they dream it could make possible, and what they fear it might do. Nearly 81,000 people responded in one week, the largest qualitative study of its kind. Read more:
https://x.com/AnthropicAI/status/2034302152945144166#m

We’re shipping a new feature in Claude Cowork as a research preview that I’m excited about: Dispatch! One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work. To try it out, download Claude Desktop, then pair
https://x.com/felixrieseberg/status/2034005731457044577

Worth Reading: 1) What are Skills? They’re not just text files. They’re folders that can include scripts, assets, data, etc. A skill is a folder…think of the entire file system as a form of context engineering. 2) The Description Field Is For the Model The description field
https://x.com/claude_code/status/2034335585339375855#m
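
A minimal sketch of the "skill = folder" pattern described above: only the frontmatter description is surfaced to the model up front, while the full body (and any scripts or assets in the folder) stays on disk until needed. The frontmatter format here is an assumption for illustration, not necessarily the exact SKILL.md spec.

```python
# Sketch of the "skill = folder" idea: the description field is what the
# model sees in its skill list; the body loads on demand.
import os, tempfile

def parse_skill(skill_md: str):
    """Split '---' frontmatter (name/description) from the markdown body."""
    _, front, body = skill_md.split("---", 2)
    meta = dict(line.split(": ", 1) for line in front.strip().splitlines())
    return meta, body.strip()

# Build a toy skill folder on disk.
root = tempfile.mkdtemp()
skill_dir = os.path.join(root, "pdf-tables")
os.makedirs(skill_dir)
with open(os.path.join(skill_dir, "SKILL.md"), "w") as f:
    f.write("---\nname: pdf-tables\ndescription: Extract tables from PDFs\n---\nStep 1: ...")

with open(os.path.join(skill_dir, "SKILL.md")) as f:
    meta, body = parse_skill(f.read())
```

Only `meta["description"]` would go into the model's context up front; `body` and the rest of the folder load when the skill is invoked.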

Hey Excel agents from Claude, OpenAI & MS Copilot: “make me a working strategy game in excel, it should have some form of graphics”. Claude made a board and acted as game master, Copilot created a board but no game, ChatGPT built a working game with formulas with a “smart” enemy.
https://x.com/emollick/status/2033372471395512566

Florida man sold his house in just 5 days after letting ChatGPT handle the entire process instead of a real estate agent

The AI handled pricing, marketing, showings, and even helped draft the contract https://x.com/i/birdwatch/t/2032864183918690675?source=6

Introducing the all new vibe coding experience in @GoogleAIStudio, featuring: – One click database support – Sign in with Google support – A new coding agent powered by Antigravity – Multiplayer + backend app support and so much more coming soon!
https://x.com/OfficialLoganK/status/2034656376450908203

Personal Intelligence in AI Mode and Gemini expands in the U.S. https://blog.google/products-and-platforms/products/search/personal-intelligence-expansion/

Vibe Code to production with Google AI Studio https://blog.google/innovation-and-ai/technology/developers-tools/full-stack-vibe-coding-google-ai-studio/

Measuring Progress Towards AGI: A Cognitive Framework https://blog.google/innovation-and-ai/models-and-research/google-deepmind/measuring-agi-cognitive-framework/

Design UI using AI with Stitch from Google Labs https://blog.google/innovation-and-ai/models-and-research/google-labs/stitch-ai-ui-design/

Exclusive: Early look at upcoming design tool from Google https://www.testingcatalog.com/exclusive-early-look-at-upcoming-vibe-design-tool-from-google/

Announced in Jensen’s keynote today: LangChain frameworks have crossed 1B downloads. We’re excited to join the NVIDIA Nemotron Coalition to help shape the open models that power these agents. ➡️ Read the announcement: https://t.co/CWlbAzhlXy ➡️ Check out the docs:
https://x.com/LangChain/status/2033788913937195132

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 — a Dell Pro Max with GB300. 💚 We can’t wait to see what you’ll create @karpathy! 🔗 https://t.co/8ct5QZ3frS @DellTech
https://x.com/NVIDIAAIDev/status/2034291235041554871

Jensen is cementing the idea that Nvidia-powered AI is now the backbone of every major industry. He said robotics alone will be a $50 trillion industry.
https://x.com/TheHumanoidHub/status/2033619022508659118

Jensen: “Nvidia is the first vertically integrated but horizontally open company.” This strategy positions Nvidia as the backbone of robotics without stifling innovation. Vertical integration ensures cutting-edge performance on each layer of the AI stack. Horizontal openness
https://x.com/TheHumanoidHub/status/2033622691408974133

The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics https://huggingface.co/blog/nvidia/physical-ai-for-healthcare-robotics

Announcing NVIDIA DLSS 5, an AI-powered breakthrough in visual fidelity for games, coming this fall. DLSS 5 infuses pixels with photorealistic lighting and materials, bridging the gap between rendering and reality. Learn More → https://x.com/NVIDIAGeForce/status/2033617732147810782

DLSS 5 is completely mind blowing. The neural rendering model with photoreal lighting and materials is a generational step up in visual fidelity. Gaming with DLSS 5 feels like future tech, but it’s possible now. It is truly incredible. 🤯
https://x.com/GeForce_JacobF/status/2033615891045454112

DLSS 5 might be the moment where the anti AI pendulum starts swinging back. Many in the 3D community who were against generative AI are now pushing back on the “everything is AI slop” crowd. The pendulum swung too far and they can feel it. Nice to see the rebalancing.
https://x.com/bilawalsidhu/status/2034281398052274666

Here’s everything we know about Nvidia’s “greatest leap in graphics since real-time ray tracing”. You can see Digital Foundry’s jaw drop in this reaction after they just saw DLSS 5.0: – Will ship in Fall of 2026! – Demo ran 4k on 2 5090’s but is already running on single GPU in
https://x.com/Grummz/status/2033641075806769382

GR00T is moving away from VLM-based backbones in favor of integrated world models. Jensen Huang teased GR00T N2 during his keynote; NVIDIA’s next-gen foundation model built on DreamZero research. Utilizing a new world-action model architecture, it succeeds at novel tasks in
https://x.com/TheHumanoidHub/status/2034279221372321940

What if a robot could simulate the physical world from a single image? [📍Bookmark Paper & GitHub for later] PointWorld-1B from Stanford and NVIDIA is a large 3D world model that predicts how an entire scene will move, given RGB-D input and robot actions. The key idea is
https://x.com/IlirAliu_/status/2032895393407660380

Breaking: $1 trillion revenue for NVIDIA in 2027 Jensen Huang: “One year after last GTC, right here where I stand… I see, going down so much, through 2027. At least… one trillion dollars, you know? Now, does it make any sense? I’m certain computer demand will be much
https://x.com/TheTuringPost/status/2033622628385362068

Jensen just said NVIDIA’s $1T projection for 2025-27 covers only Blackwell and Rubin to keep it consistent with the previous projection. He mentioned he could have included Groq in that number: “so if I would’ve included that, theoretically, not actually, but theoretically,
https://x.com/TheHumanoidHub/status/2033990614824665421

Nvidia targets data center revenue of $1+ trillion for 2025-2027. That’s already quite ridiculous, with physical-world AI only in its zeroth innings. $NVDA
https://x.com/TheHumanoidHub/status/2033627322331660784

A breakthrough in real-time video generation. As a research preview developed with @NVIDIA and shared at @NVIDIAGTC this week, we trained a new real-time video model running on Vera Rubin. HD videos generate instantly, with time-to-first-frame under 100ms. Unlocking an entirely
https://x.com/runwayml/status/2034284298769985914#m

NVIDIA GTC 2026 Keynote: Everything That Happened in 12 Minutes – YouTube https://www.youtube.com/watch?v=X2i_8O75_Os

Subagents are now supported in Codex. They’re very fun and make it possible to get large amounts of work done *quickly*:
https://x.com/gdb/status/2033757784437895367

5.3 to 5.4 is what i would have expected to warrant a jump to GPT-6
https://x.com/yacineMTB/status/2033291560217923803

A knowledge-work platform built around GPT-5.4 Pro level intelligence would be really useful. The gap between other models and what Pro can do on complex intellectual work remains stark. I would love to have access in a Codex-like platform with shared file spaces, subagents, etc
https://x.com/emollick/status/2033959257196966360

GPT-5.4 mini matters for subagents because it changes what feels worth handing off. The parent thread should hold the architecture, plan, and progress narrative. Fast subagents can explore the repo, check hypotheses, and preserve the parent thread’s limited attention.
https://x.com/nickbaumann_/status/2034134875234832540#m
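
The handoff pattern in the post above can be sketched as follows: the parent keeps only the plan and compact results in its context, while cheap, fast subagents burn their own context on exploration. Everything here is a hypothetical stand-in, not Codex's actual subagent API.

```python
# Sketch: parent holds architecture + progress; subagents explore and
# return only short summaries, preserving the parent's limited attention.

def run_subagent(task):
    """Pretend fast-model worker: does verbose exploration internally,
    returns only a compact summary to the parent."""
    exploration = [f"{task}: step {i}" for i in range(50)]  # detail stays here
    return f"{task}: done ({len(exploration)} steps explored)"

class ParentAgent:
    def __init__(self, plan):
        self.context = [f"PLAN: {plan}"]  # architecture + progress only

    def delegate(self, tasks):
        for task in tasks:
            summary = run_subagent(task)   # heavy lifting happens elsewhere
            self.context.append(summary)   # only the summary comes back
        return self.context

parent = ParentAgent("migrate auth module")
context = parent.delegate(["scan repo for auth callers", "check test coverage"])
```

The parent's context grows by one line per delegated task instead of fifty, which is why a cheap subagent model changes what feels worth handing off.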

i mean this story is insane. man used chatgpt to sell his house in 5 DAYS. got 5 offers in 72 hours. no real estate agents. saved so much money doing it too. he used AI to: > price the house (researched neighboring properties for sale) > wrote up the legal contracts (saving
https://x.com/cryptopunk7213/status/2033194801852567620?s=46

Man uses ChatGPT to sell his Cooper City home – NBC 6 South Florida https://www.nbcmiami.com/news/local/innovation-on-6/man-uses-chatgpt-to-sell-his-cooper-city-home-it-exceeded-our-expectations/3778919/

OpenAI preps for IPO in 2026, says ChatGPT must be ‘productivity tool’ https://www.cnbc.com/2026/03/17/openai-preps-for-ipo-in-2026-says-chatgpt-must-be-productivity-tool.html

AI really can help education: Randomized controlled experiment on high school students found a GPT-4o powered tutor that personalized problems for students raised final test scores by .15 SD, “equivalent to as much as six to nine months of additional schooling by some estimates”
https://x.com/emollick/status/2033773791688433708

An AI consultant with no biology training used ChatGPT and AlphaFold to create a personalized mRNA cancer vaccine for his rescue dog. Tumor shrunk by half. UNSW structural biologist Dr. Kate Michie: “It’s exciting to me that someone who’s not a scientist has been able to do
https://x.com/TheRundownAI/status/2032843584869708105

this is actually insane > be tech guy in australia > adopt cancer riddled rescue dog, months to live > not_going_to_give_you_up.mp4 > pay $3,000 to sequence her tumor DNA > feed it to ChatGPT and AlphaFold > zero background in biology > identify mutated proteins, match them to
https://x.com/IterIntellectus/status/2032858964858228817

GPT-5.4 mini approaches the performance of the larger GPT-5.4 model on several evaluations, including SWE-Bench Pro and OSWorld-Verified.
https://x.com/OpenAIDevs/status/2033953828387885470

GPT-5.4 mini is available today in the API, Codex, and ChatGPT. In the API, it has a 400k context window. In Codex, it uses only 30% of the GPT-5.4 quota, letting you handle simpler coding tasks for about one-third of the cost. GPT-5.4 nano is only available in the API.
https://x.com/OpenAIDevs/status/2033953840312291603

GPT-5.4-mini is 2.25× more expensive than GPT-5-mini: $0.75 input / $4.50 output per million tokens, 400k context.
https://x.com/scaling01/status/2033955279079907511
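
Taking the prices quoted above at face value ($0.75 input / $4.50 output, assumed per million tokens), per-request cost is simple arithmetic:

```python
# Cost calculator using the per-million-token prices quoted in the post.
PRICE_IN, PRICE_OUT = 0.75, 4.50  # USD per 1M tokens (from the tweet)

def request_cost(input_tokens, output_tokens):
    """USD cost of one request at the quoted rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. a 40k-token prompt producing a 2k-token answer:
cost = request_cost(40_000, 2_000)  # 0.03 + 0.009 = ~$0.039
```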

Introducing GPT-5.4 mini and nano | OpenAI https://openai.com/index/introducing-gpt-5-4-mini-and-nano/

We’re introducing GPT-5.4 mini and nano, our most capable small models yet. GPT-5.4 mini is more than 2x faster than GPT-5 mini. Optimized for coding, computer use, multimodal understanding, and subagents. For lighter-weight tasks, GPT-5.4 nano is our smallest and cheapest
https://x.com/OpenAIDevs/status/2033953815834333608

I have consecutively spent millions of tokens today with Hermes without breaking anything, where Openclaw would’ve needed several nudges. Both have their merits, but for prod, Hermes nailed it. Amazing job @Teknium
https://x.com/populartourist/status/2034653545287348266

i’ve been using @NousResearch Hermes Agent for about a week and my initial thoughts are it just works out of the box. i find its memory and learning to be far superior to OpenClaw without augmenting it with QMD or any additional memory systems. it’s early and my OC’s handle A
https://x.com/austin_hurwitz/status/2033552632241857002

Told my agent to create a fresh OpenClaw agent in digital ocean according to the instructions. Ran into like 6 issues with the flow from the official docs. Did the same for the @NousResearch Hermes agent – Claude one-shotted it 🙂
https://x.com/0xMasonH/status/2033608276286243323

trying Hermes agent. setup was significantly easier than openclaw. I can see it’s already basically a leaner and more opinionated openclaw
https://x.com/fuckyourputs/status/2033503910376431728

After using it a bit, Claude Cowork Dispatch covers 90% of what I was trying to use OpenClaw for, but feels far less likely to upload my entire drive to a malware site.
https://x.com/emollick/status/2034067677157679379

OpenClaw, Anthropic version. Basically: assign work from anywhere, come back to finished results. This is what “AI assistant” was always supposed to mean.
https://x.com/fdaudens/status/2034080669119152238

New @openclaw beta is up: it comes with the new live browser control that Google added in latest Chrome! enable via chrome://inspect#remote-debugging Your clanker will know when to use what, or you can ask it. new “user” profile session is there!
https://x.com/steipete/status/2032686376932491363

Introducing LangSmith Fleet: an enterprise workspace for creating, using, and managing your fleet of agents. Fleet agents have their own memory, access to a collection of tools and skills, and can be exposed through the communication channels your team uses every day. Fleet
https://x.com/LangChain/status/2034679590250258855

Introducing LangSmith Fleet. Agents for every team. → Build agents with natural language → Share and control who can edit, run, or clone each agent → Manage authentication with agent identity → Approve actions with human-in-the-loop → Track and audit actions with tracing in
https://x.com/LangChain/status/2034694530478612777

LangChain just open-sourced a replica of Claude Code. It’s called Deep Agents. MIT licensed, model-agnostic, and fully inspectable – so you can finally see exactly how coding agents like Claude Code are built under the hood. The black box just became a textbook. GitHub:
https://x.com/RoundtableSpace/status/2033955271333011829

LangChain just open-sourced Deep Agents–an agent harness that’s opinionated and ready-to-run out of the box. Instead of wiring up prompts, tools, and context management yourself, you get a working agent immediately and customize what you need. It’s an MIT-licensed system that’s
https://x.com/itsafiz/status/2033591253955449289

Computer can now take full control of Comet to complete tasks. When you’re in Comet, Computer spins up a browser agent that can access any site or logged‑in app with your permission, without the need for connectors or MCPs. Available to all Computer users on Comet.
https://x.com/perplexity_ai/status/2033598416962592813

Perplexity launches Perplexity Health agent in US https://www.testingcatalog.com/perplexity-launches-perplexity-health-agent-in-us/

Introducing Perplexity Health https://www.perplexity.ai/hub/blog/introducing-perplexity-health

2/ I spent an hour with @BarakLenz, CTO at @AI21Labs, and one thesis kept coming back: we’re not building agents. We’re building an AI Operating System. An OS manages resources, tracks what’s running, and decides when to spawn or kill work. That’s the bar.
https://x.com/YuvalinTheDeep/status/2034624197528269085

3x in 3 months: Cursor @ $28b, Cognition + Windsurf @ $10b | AINews https://news.smol.ai/issues/25-07-24-cogsurf-cursor

a lot of engineering orgs (Stripe, Ramp, Coinbase) are building internal cloud coding agents. we’re releasing a fully OSS one today – every company should have the power of cloud agents at their fingertips
https://x.com/hwchase17/status/2033977192053612621

Agent Auth https://agent-auth-protocol.com/

AgentKit now available: proof of human for the agentic web https://world.org/ja-jp/blog/announcements/now-available-agentkit-proof-of-human-for-the-agentic-web

btw emerging consensus is that identity-based authz for AI is the most important solution for security, esp if you want to break the binary decision between HITL-everything and --dangerously-skip-permissions. keycard is the leading voice in this and now supports all coding agents
https://x.com/swyx/status/2034667846505214295
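
Identity-based authz, as described above, replaces the binary choice (approve everything by hand vs. skip all permissions) with a per-agent, per-action policy. A minimal sketch follows; the policy shape and names are hypothetical, not keycard's actual API.

```python
# Sketch: each (agent identity, action) pair maps to a decision, so risky
# actions can require a human while routine ones proceed unattended.
# Default-deny for anything not explicitly listed.

POLICY = {
    ("deploy-bot", "read_repo"): "allow",
    ("deploy-bot", "push_main"): "require_human",
}

def authorize(agent_id, action):
    """Return 'allow', 'require_human', or 'deny' for this agent and action."""
    return POLICY.get((agent_id, action), "deny")
```

The point is the middle ground: human-in-the-loop only where the policy says so, instead of everywhere or nowhere.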

Composer 2 is now available in Cursor.
https://x.com/cursor_ai/status/2034668943676244133

Composer 2 is out! Cursor is an example of a new type of company, not a pure app maker and not a model provider. Our aim is to build the most useful coding agents by combining the best API models and our domain-specific models.
https://x.com/mntruell/status/2034729462211002505

Composer 2 marks the one-year anniversary of our large model training efforts. Since then, we’ve built an exceptionally talent-dense team of ~40 people with some of the best researchers and engineers from the labs, academia, industry, and more heterogeneous backgrounds. And we
https://x.com/amanrsanger/status/2034704792925479356

Context engineering is the new prompt engineering — and if you’re building AI agents, you need to understand the difference and why parsing your data correctly sits at the heart of it. Andrej Karpathy put it well: context engineering is “the delicate art and science of filling
https://x.com/llama_index/status/2034347384973762694#m

Devin can now manage a team of Devins. Devin will break down large tasks and delegate them to parallel Devins that each run in their own VM. Over time, Devin gets better at breaking down and managing tasks for your codebase. Available now for all users.
https://x.com/cognition/status/2034679897084264659

E-Z guide to getting Honcho memory system enabled and setup in Hermes Agent
https://x.com/Teknium/status/2033563976219709766

Excited to release: AgentUI > a fresh chat interface – natively multi-agent > agents coordinate via reports and figures > plug+play any open/closed model as sub-agent > agents specialise in code, web search, multimodal… Try it here: https://x.com/lvwerra/status/2034666400007016590

fun fact about the Composer 2 RL run: we ran training distributed across 3 (sometimes 4) different clusters around the world using some secret sauce we built together.
https://x.com/ellev3n11/status/2034778708163404102

Fwiw, I think everyone is going to do this now. It’s a much better UI for the “new way of coding”. Cursor’s main UI has been in a rough state for like a year now. I really like that they are resetting. Overdue tbh. Also really like that they support ACP, might have T3 Code
https://x.com/theo/status/2034780545134256205

GitHub – psi-oss/get-physics-done: The first open-source agentic AI physicist, by Physical Superintelligence PBC (PSI). · GitHub https://github.com/psi-oss/get-physics-done

GitHub already has millions of repos full of procedural knowledge. The work introduces a framework for extracting agent skills directly from open-source repos. The pipeline analyzes repo structure, identifies procedural knowledge through dense retrieval, and translates it into
https://x.com/dair_ai/status/2033546855376916735

Heygen’s API documentation is a glimpse of how to write for your two audiences: humans and agents. (though I think their llms.txt file could do a lot more to get AIs “excited” to use their product in creative ways by explaining some stuff in English, rather than just tech specs)
https://x.com/emollick/status/2034408188133728379

hf automatically serves markdown versions of papers when agents request them, thereby saving tokens & improving content clarity. also, we have added hf papers search & get content SKILLS
https://x.com/mishig25/status/2034274342343733295#m

Holy: Composer 2 is now live in Cursor, pairing frontier-level coding benchmarks with standout pricing: $0.50 per million input tokens and $2.50 per million output tokens. Cursor reports major jumps over prior versions across CursorBench (61.3), Terminal-Bench 2.0 (61.7), and
https://x.com/kimmonismus/status/2034667869816979645

How Do You Want to Remember? | Zak El Fassi | Systems Engineering for the Agentic AI Age https://zakelfassi.com/how-do-you-want-to-remember

I get why AI labs are so focused on software development (it helps them get recursive improvement, and also they are coders so they think coding is the most vital thing), but there are 9.5x more managers than there are coders & efforts to build tools for them are very nascent.
https://x.com/emollick/status/2033745890762952834

Introducing Composer 2 · Cursor https://cursor.com/blog/composer-2

Introducing Imagine Gallery https://blog.character.ai/imagine-gallery/

Introducing LiteParse – the best model-free document parsing tool for AI agents 💫 ✅ It’s completely open-source and free. ✅ No GPU required, will process ~500 pages in 2 seconds on commodity hardware ✅ More accurate than PyPDF, PyMuPDF, Markdown. Also way more readable – see
https://x.com/jerryjliu0/status/2034665976428724267

Introducing Mamba-3 🐍 Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL
https://x.com/togethercompute/status/2033956365165859026

Introducing the Paper Pages skill! Simply paste this SKILL.md, so your coding agent knows how to work with @huggingface papers Ask it to summarize papers, search papers, or list linked models or datasets
https://x.com/NielsRogge/status/2034287785297735785#m

Many don’t know the power of visualizing complex, hard-to-understand research papers. Hermes Agent makes it easy. @NousResearch @Teknium
https://x.com/t105add4_13/status/2033364535852360069

MCP Server Architecture Determines AI Accuracy–Not Just the Model – CData Software https://www.cdata.com/lp/ai-accuracy-whitepaper/

mcp: model cli protocol
https://x.com/denisyarats/status/2034067933975187586#m

My submission for @NousResearch HackTUI started as a small SIEM I made in Elixir on BEAM. Gave the project to Hermes & asked it to refactor it into an umbrella system, build its own MCP server, integrate Jido, & turn it into a realtime purple team platform.
https://x.com/aylacroft/status/2033429386427351043

New category emerging: Headless SaaS. Not infrastructure as a service / platform as a service. Traditional software (Photoshop, Slack, Jira) rebuilt with agent-first APIs. – No UI – Programmatic access – Essentially the same product with different interface Entirely new
https://x.com/ivanburazin/status/2034042095548187072#m

One of the hardest problems with using AI agents to automate meaningful document work (contracts, KYC, diligence, claims, and more) is not actually building the agent, but building the UI/UX audit trail so the human can understand decisions linking back to the source documents –
https://x.com/jerryjliu0/status/2034047686262087720#m

Open Models, Open Runtime, Open Harness – Building your own AI agent with LangChain and Nvidia Claude Code, OpenClaw, Manus and other agents all use the same architecture under the hood. They consist of a model, a runtime (environment), and a harness. In this video, we show how
https://x.com/hwchase17/status/2034297125417460044#m

OpenViking – filesystem memory for AI agents It gives agents a structured navigable context system that: – replaces flat vector storage with a filesystem (viking://) – unifies memory, resources, and skills – loads context in layers (L0/L1/L2) to save tokens – retrieves info via
https://x.com/TheTuringPost/status/2034381560502452667
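
The layered-loading idea (L0/L1/L2) can be sketched as a budget-driven walk from cheap summaries to full documents. The URIs, layer contents, and token costs below are illustrative assumptions, not OpenViking's real data model.

```python
# Sketch: load context layers in order (summary -> detail) until the
# token budget would be exceeded, so most lookups stay cheap.

memory = {
    "viking://projects/hermes": {
        "L0": ("one-line abstract", 20),       # (text, token cost)
        "L1": ("section summaries", 200),
        "L2": ("full documents", 4000),
    }
}

def load_context(uri, budget_tokens):
    """Return the layers loaded and the tokens spent, deepest layer last."""
    loaded, spent = [], 0
    for layer in ("L0", "L1", "L2"):
        text, cost = memory[uri][layer]
        if spent + cost > budget_tokens:
            break
        loaded.append(layer)
        spent += cost
    return loaded, spent

layers, spent = load_context("viking://projects/hermes", budget_tokens=500)
```

With a 500-token budget only L0 and L1 load; the 4000-token full documents stay on disk until a task explicitly needs them.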

Redesigning the Service Role for the AI Agent Era https://asapp.wistia.com/live/events/w0j0re9b9u

Scaling Karpathy’s Autoresearch: What Happens When the Agent Gets a GPU Cluster | SkyPilot Blog https://blog.skypilot.co/scaling-autoresearch/

Should there be a Stack Overflow for AI coding agents to share learnings with each other? Last week I announced Context Hub (chub), an open CLI tool that gives coding agents up-to-date API documentation. Since then, our GitHub repo has gained over 6K stars, and we’ve scaled from
https://x.com/AndrewYNg/status/2033577583200354812

The best coding agents still need a little help when your API is moving faster than their training data. We’ve shipped a lot in the last few months: Universal-3 Pro Streaming, Universal-3 Pro, and LLM Gateway to name a few. Features that simply postdate what most coding agents
https://x.com/AssemblyAI/status/2033514383914283118

the deepagents library is basically our starting point for doing harness engineering and shipping agents the internal agents used at the company are built on it (background coding, GTM/SDR, research) there’s primitives we find really useful across our evals and dogfooding like
https://x.com/Vtrivedy10/status/2033608199564067098

Training Composer for longer horizons · Cursor https://cursor.com/blog/self-summarization

We trained Composer to self-summarize through RL instead of a prompt. This reduces the error from compaction by 50% and allows Composer to succeed on challenging coding tasks requiring hundreds of actions.
https://x.com/cursor_ai/status/2033967614309835069

We were able to significantly improve the model quality and cost to serve. These quality improvements come from our first continued pretraining run, providing a far stronger base to scale our reinforcement learning.
https://x.com/cursor_ai/status/2034668950240329837

We’re also sharing an early alpha of our new interface. https://x.com/cursor_ai/status/2034719920710103452

BrowseComp-Plus, perhaps the hardest popular deep research task, is now solved at nearly 90%… … and all it took was a 150M model ✨ Thrilled to announce that Reason-ModernColBERT did it again and outperformed all models (including models 54× bigger) on all metrics
https://x.com/antoine_chaffin/status/2034649565614272925

Announcing Copilot leadership update – The Official Microsoft Blog https://blogs.microsoft.com/blog/2026/03/17/announcing-copilot-leadership-update/

LlamaParse Agentic Plus mode now delivers precise visual grounding with bounding boxes for the most challenging document elements. Our latest update brings major improvements to how we handle complex visual content: 📐 Complex LaTeX formulas – accurately parse mathematical
https://x.com/llama_index/status/2034300076441633276#m

build agents with LangSmith and let them execute code securely. We’re launching LangSmith Sandboxes today!
https://x.com/samecrowder/status/2034123616720421210#m

more and more agents will write and execute code. launching langsmith sandboxes (waitlist to start) to make this easy. we’ll be letting people off the waitlist pretty quickly, so sign up! or dm me
https://x.com/hwchase17/status/2033950657619874217

We just made it dramatically easier for agents to read trending research papers on HF. Let’s go AI powered research!
https://x.com/ClementDelangue/status/2034277529981178007#m

We just released an hf CLI extension that detects the best model/quant for a user’s hardware and then spins up a local coding agent. Time to go local/private/free/fast for your agents thanks to open-source!
https://x.com/ClementDelangue/status/2033982183791108278
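
The model/quant selection step can be sketched as "largest quantization that fits in free memory, with headroom left for the KV cache". The sizes and logic below are illustrative assumptions, not the actual hf CLI implementation.

```python
# Sketch: pick the biggest quant of a model that fits the machine,
# ordered largest -> smallest. Sizes are made-up examples in GiB.

QUANTS = [("Q8_0", 8.5), ("Q5_K_M", 5.7), ("Q4_K_M", 4.9), ("Q2_K", 3.0)]

def best_quant(free_mem_gib, headroom_gib=1.0):
    """Largest quant that leaves `headroom_gib` free for KV cache etc."""
    for name, size in QUANTS:
        if size + headroom_gib <= free_mem_gib:
            return name
    return None  # nothing fits; fall back to a hosted model

choice = best_quant(free_mem_gib=8.0)
```

On an 8 GiB machine this skips Q8_0 (8.5 + 1 headroom > 8) and lands on Q5_K_M; with 16 GiB free it would take Q8_0.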

We’re launching LangSmith Fleet today! There are some primitives in Fleet that I think will be very useful in a future where agents do a lot of the world’s work – Agent Identity: as more work is specified by humans but done by agents, we need identity + security models that
https://x.com/Vtrivedy10/status/2034690067839521114

Computer can now use your local browser Comet as a tool. Which makes it possible for Computer to do anything, even without connectors or MCPs. This is a unique advantage Computer possesses that no other tool on the market can match.
https://x.com/AravSrinivas/status/2033598960238277059

Can AI agents conduct advanced cyber-attacks autonomously? We tested seven models released between August 2024 and February 2026 on two custom-built cyber ranges designed to replicate complex attack environments. Here’s what we found🧵
https://x.com/AISecurityInst/status/2033562026534953156

“a large jump in agentic” – we agree 🙌 M2.7 is a big step forward in agentic workflows, from tool use to real-world, multi-step execution. Now live on @OpenRouter 🚀
https://x.com/MiniMax_AI/status/2034356786413867182#m

🔍Follow Zhihu contributor toyama nao, a top large model reviewer, to evaluate @MiniMax_AI MiniMax-M2.7’s capabilities in detail!✨ 📌 Basic Info: MiniMax iterates monthly in the Agent-driven model track. As a minor version upgrade, M2.7 carries its new understanding of the
https://x.com/ZhihuFrontier/status/2034543142234628318

DEFAULT and FREE M2.7 on @zocomputer
https://x.com/MiniMax_AI/status/2034348503347171625#m

Early testers are saying that M2.7 has big improvements in emotional intelligence and character consistency 👀
https://x.com/MiniMax_AI/status/2034528945962696948

Great to see M2.7 live on @vercel_dev 🙌 We’re seeing a real shift from simple tool use → multi-step agentic workflows running in production. M2.7 is built for exactly that.
https://x.com/MiniMax_AI/status/2034357583797178841#m

Live Stream Alert with @OpenClaw Thursday 9PM ET We will share an in-depth look at MiniMax M2.7, including early developments in self-evolution and efficient solutions designed to support 100,000 OpenClaw running clusters. 🎁 MiniMax vouchers will also be distributed during
https://x.com/MiniMax_AI/status/2034520321466978488

M2.7 is already up😎 Try it on @kilocode.
https://x.com/MiniMax_AI/status/2034339731660759097#m

M2.7 now live on @yupp_ai 🌸 Feels like a good time to build something new.
https://x.com/MiniMax_AI/status/2034328337527783857#m

M2.7 now on @opencode ⚙️ give it a plan → it runs with it add the loop (check → fix → retry) and things start to feel very agentic
https://x.com/MiniMax_AI/status/2034361282527461473#m

Minimax 2.7 incoming!
https://x.com/kimmonismus/status/2033531736647463151

Minimax 2.7 is available in Hermes Agent through the Minimax Provider, try it today!
https://x.com/Teknium/status/2034658808870621274

MiniMax doubles in Hong Kong debut, marking yet another Chinese AI listing https://www.cnbc.com/2026/01/09/minimax-hong-kong-ipo-ai-tigers-zhipu.html

MiniMax has released MiniMax-M2.7, delivering GLM-5-level intelligence for less than one third of the cost MiniMax-M2.7 from @MiniMax_AI scores 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over MiniMax-M2.5, which was released one month ago. This is
https://x.com/ArtificialAnlys/status/2034313314420019462#m

MiniMax launches M2.7 model on MiniMax Agent and APIs https://www.testingcatalog.com/minimax-launches-m2-7-model-on-minimax-agent-and-apis/

MiniMax M2.7 now live on @Trae_ai Excited to see what you ship. 🙌
https://x.com/MiniMax_AI/status/2034327432124350924#m

MiniMax M2.7: Early Echoes of Self-Evolution
https://x.com/MiniMax_AI/status/2034335605145182659

MiniMax M2.7 🆚 MiniMax M2.5 – “Website about recently released video games” The release of M2.7 should be close. MiniMax M2.5 was released two days after it appeared on the Arena
https://x.com/AiBattle_/status/2033503838284447758

MiniMax-M2.7 is now available on Ollama’s cloud. made for coding and agentic tasks 🖥️ Try it inside Claude Code: ollama launch claude --model minimax-m2.7:cloud 🦞 Use it with OpenClaw: ollama launch openclaw --model minimax-m2.7:cloud If you already have OpenClaw
https://x.com/ollama/status/2034351916097106424#m

1M context is now generally available for Opus 4.6 and Sonnet 4.6 | Claude https://claude.com/blog/1m-context-ga

We’re hosting a webinar today at 10am PT that should be very helpful for folks trying to learn how to apply AI to their work. We’ll be sharing best practices on how to use new products like Claude for Excel/Powerpoint. Attendees will also get one month free of Claude Pro.
https://x.com/alexalbert__/status/2034276242317566107#m

OpenAI acquired Astral, the team behind uv, ruff, and ty. Fun fact: Claude is the #6 contributor to uv. Curious if Anthropic will ban them from using Claude since the team is joining OpenAI. Congrats to the Astral team who built incredible Python tools!
https://x.com/Yuchenj_UW/status/2034661120599101498

You shouldn’t have to have a “meeting notes app.” You should have an “AI context & data app” that happens to have great meeting notes. Don’t overpay for things.
https://x.com/zachtratar/status/2034079952757547042#m

@cloneofsimo This is funny! Although in hindsight I think we should give due credit to all the new works that improve on it and scale, think about deploying in a real training run (solving for memory growth). An advantage Google had was that there were extremely strong folks left alone to
https://x.com/_arohan_/status/2033589201363735004

Google Colab now has an open-source MCP server that lets you use Colab runtimes with GPUs from any local AI agent. 🔧 Tools to execute_code, connect, notebook editing ☁️ Run Python on cloud GPUs directly from agents 📝 Can create .ipynb files and add code/markdown 🔌 Works with
https://x.com/_philschmid/status/2034197315661988010#m

Google is reportedly testing a Gemini app for Mac https://www.engadget.com/ai/google-is-reportedly-testing-a-gemini-app-for-mac-203703372.html

Google’s Personal Intelligence feature is expanding to all US users | TechCrunch https://techcrunch.com/2026/03/17/googles-personal-intelligence-feature-is-expanding-to-all-us-users/

Introducing a new upgraded vibe coding experience in @GoogleAIStudio. You can now turn any idea into functional, production ready apps. Build multiplayer games, collaborative tools, apps with secure log-ins and more.
https://x.com/Google/status/2034658419202744614

Lots of great Gemini API updates shipping today 🛠️ 1. Built-in tools (search, maps, file search) now work with function calling 2. We now do context circulation with built-in tools for better model performance 3. Grounding with Google Maps now works with Gemini 3!!
https://x.com/OfficialLoganK/status/2034309073651347821

Tomorrow we will unveil the all new vibe coding experience in @GoogleAIStudio, the team has spent 4 months rebuilding it all from scratch and smoothing out rough edges to help everyone bring their ideas to life. This is a big step forward, but just the start : )
https://x.com/OfficialLoganK/status/2034347641740337653

vibe coding in AI Studio just got a major upgrade 🚀 • multiplayer: build real-time games & tools • real services: connect live data • persistent builds: close the tab, it keeps working • pro UI: shadcn, Framer Motion & npm support we can’t wait to see what you build!
https://x.com/GoogleAIStudio/status/2034655113961455651

We shipped one of the most requested Gemini API features! 🥳 You can now combine built-in tools (Google Search, URL Context,…) with your own custom functions in a single API call. Gemini orchestrates everything: 🔧 Combine Google Search, Google Maps, File Search or URL Context
https://x.com/_philschmid/status/2034308856885481791#m

Google Maps 3d basemap and navigation experience just became a lot more immersive 😍
https://x.com/bilawalsidhu/status/2032122828992962704

Introducing our biggest upgrade to @googlemaps since the original launch, featuring Ask Gemini (with personalization), Immersive Navigation, and much more!! 🗺️
https://x.com/OfficialLoganK/status/2032101245763149908

it’s all over when google realizes the treasure trove that is street view + aerial data and launches a version of genie grounded in the real world…
https://x.com/bilawalsidhu/status/2033954619181654114

TSMC_Equity_Research_Report.docx – Google Docs https://docs.google.com/document/d/1ieBAOr8jOL36MTDCmQWakLdYjydDjCts/edit

LLMs are now unlocking a new level of scientific reasoning. We partnered with top experts to test 6 LLMs on high-temperature superconductivity, an open area of inquiry in condensed matter physics. Our case study found that curated, closed-system models were the clear winner,
https://research.google/blog/testing-llms-on-superconductivity-research-questions/

Finishing a video episode of Attention Span about super interesting announcement from #NVIDIAGTC
https://x.com/TheTuringPost/status/2033568823396430101

New Scaling Law? What “Agentic Scaling” Is – Inside NVIDIA’s Biggest Idea at GTC 2026
https://x.com/TheTuringPost/status/2033689291419734102

NVIDIA’s Nemotron 3 is an architectural response to the 2 pressures: – Long-context cost as agentic interactions scale – Repeated reasoning cost from invoking full models for small subtasks Nemotron 3 proposes several design decisions to solve this: ▪️ Hybrid architecture:
https://x.com/TheTuringPost/status/2034668980892479993

NemoClaw – NVIDIA’s contribution to the emerging OpenClaw ecosystem and one of the biggest announcements at NVIDIA GTC It’s a framework for long-running autonomous agents. ▪️ The idea: Install OpenClaw together with Nemotron models and OpenShell (NVIDIA’s new security runtime)
https://x.com/TheTuringPost/status/2034389444875428043

💚🤗💚 Jensen showing @huggingface during GTC keynote, where @NVIDIAAI dropped amazing new open models, datasets and blogs! Some of my favorites, links in comments: 🧠 Nemotron 3 Super 120A12B – Reasoning LLM 🏥 Open-H-Embodiment – Healthcare Robotics Dataset 🩻
https://x.com/jeffboudier/status/2033959279510884631

Jensen Huang: “It is now one of the recruiting tools in Silicon Valley. How many tokens comes along with my job?” @NVIDIAGTC
https://x.com/TheTuringPost/status/2033639746128515518

NVIDIA’s strategy in one picture @NVIDIAGTC
https://x.com/TheTuringPost/status/2033620574694752678

Robotics research is accelerating fast, especially around simulation. Factory deployment still isn’t. The gap between simulation and real production lines remains one of the biggest bottlenecks in manufacturing automation. That’s why @ABBRobotics’s partnership with @NVIDIA
https://x.com/IlirAliu_/status/2033381389232689529

Second day! “Technology Behind Robotic Characters”, session at @nvidia GTC. Moritz Baecher on how @Disney Imagineering builds believable physical AI: Many robotics teams struggle to move from digital animation to stable physical movement. Their approach bridges that gap. The
https://x.com/IlirAliu_/status/2033980181413827053

With legendary @Scobleizer and @wschenk #nvidiagtc @NVIDIAGTC
https://x.com/TheTuringPost/status/2033574233360699881

And 2.3 years later we have DLSS on steroids
https://x.com/bilawalsidhu/status/2033752195095535801

DLSS 5 casually solved the fancy coat of paint part of this vision
https://x.com/bilawalsidhu/status/2034131183353643289

DLSS 6 mode is about to take greyboxed 3D assets to final render. AI video-to-video foreshadowed this; many said it could never happen in real time. Yet here we are.
https://x.com/bilawalsidhu/status/2033898489952841763

So proud of DLSS5: Fully generative neural rendering, in real-time, in real games. Mind-blowing realism. A whole new generation of real-time graphics. A decade of continuous research and development. Coming soon to PCs everywhere. 💚
https://x.com/ctnzr/status/2033613807105544666

Jensen Huang’s view on autonomous vehicles is pretty straightforward: the “automotive is less than 1% of your business” number misses what is actually happening. NVIDIA is selling three computers: – training systems – simulation and synthetic data – the AV system in the car
https://x.com/TheTuringPost/status/2033992848203514225

Been so much fun cooking OpenShell and NemoClaw with the @NVIDIAAI folks! 🙏🦞 Huge step towards secure agents you can trust. What’s your OpenClaw strategy?
https://x.com/steipete/status/2033641463104323868

GTC 2026 News | NVIDIA Newsroom https://nvidianews.nvidia.com/online-press-kit/gtc-2026-news

Jensen says he can’t think of a company building robots that isn’t working with Nvidia.
https://x.com/TheHumanoidHub/status/2033642974492659894

NVIDIA GTC 2026: Live Updates on What’s Next in AI | NVIDIA Blog https://blogs.nvidia.com/blog/gtc-2026-news/

Developers used to argue about programming languages; now they argue about harnesses. NemoClaw is NVIDIA’s answer to your OpenClaw safety woes — zero permissions by default, sandboxed subagents, private inference enforced at the infra layer. Here’s a guide on how to start:
https://x.com/baseten/status/2034649896523874356

Go from “hello world” to “hello claw!” 🦞 We’re hosting a Build-A-Claw extravaganza in the #NVIDIAGTC Park Mon-Thur where you can BYOD or buy a DGX Spark on-site and our NVIDIA experts will help you install @OpenClaw. See you there! 🙌 Full details 👉 https://x.com/NVIDIAAIDev/status/2032847578404888907

We’re going live at #NVIDIAGTC in 30 minutes. ⏱️ Join us for GTC Live at 8 a.m. PT as we get ready for Jensen Huang’s keynote 11 a.m. Featuring industry leaders from: @bfl_ml, @Cadence, @CaterpillarInc, @cohere, @CoreWeave, @DellTech, @EdisonSci, @FireworksAI_HQ, @IBM,
https://x.com/nvidia/status/2033551362210865371

🚀 Live from @NVIDIAGTC, we’re releasing Holotron-12B! Developed with @nvidia, it’s a high-throughput, open-source, multimodal model engineered specifically for the age of computer-use agents. Get started today! 🤗Hugging Face: https://t.co/SyAuqLIacS 📖Technical Deep Dive:
https://x.com/hcompany_ai/status/2033851052714320083

AI is already redesigning chip design itself! And the biggest bottleneck left is validation. Here is Bill Dally describing to @JeffDean how @nvidia uses AI to design chips: “We’re already using AI across multiple parts of the chip design process, and it’s delivering real
https://x.com/TheTuringPost/status/2034413469542588613

How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale | NVIDIA Technical Blog https://developer.nvidia.com/blog/nvidia-dynamo-1-production-ready/

With Nemotron 3 Nano 4B in the NVIDIA Nemotron 3 family, llama.cpp users get a compact model for action-taking conversational personas, available across NVIDIA GPU-enabled systems and @NVIDIA_AI_PC
https://x.com/ggerganov/status/2033947673825337477

The frontier has increasingly shifted to hybrid models – from Qwen to Kimi-Linear and now with NVIDIA’s Nemotron-3 Super – that rely on a strong linear sequence model. Today we release Mamba-3, the most powerful linear model to date.
https://x.com/tri_dao/status/2033948569502413245

NVIDIA thanks all its partners: the message? There is no way around NVIDIA. NVIDIA is the center of the revolution.
https://x.com/kimmonismus/status/2033615181415387610

Straight from NVIDIA GTC: Jensen Huang just unveiled a new vision for AI infrastructure For the first time, Rubin GPUs+Groq LPUs are paired: > 35× higher inference throughput > 10× more revenue from trillion-parameter models Architecture & why it’s needed
https://x.com/TheTuringPost/status/2033700480975520097

Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
https://x.com/karpathy/status/2034321875506196585

Absolutely loving the new sub-agents feature with Codex 🤯 Feels like managing a tiny team of thinkers… Plato waiting, Lorentz thinking, Mendel chilling… Who decided to name them after legendary minds? 😂
https://x.com/fdaudens/status/2033939319103070334

Codex 🤝 @NotionHQ Meet us in NYC on March 17 for a night packed with: Codex demos. Practical workflows. Builders to meet and learn from. https://x.com/OpenAIDevs/status/2033333345619464228

Companies go through phases of exploration and phases of refocus; both are critical. But when new bets start to work, like we’re seeing now with Codex, it’s very important to double down on them and avoid distractions. Really glad we’re seizing this moment.
https://x.com/fidjissimo/status/2034769466433913082

Gemini as folklore machine: “Create a comic using universal folklore index ATU 570 set in the present day” “Now add ATU 720” ATU 570 are tales about “the king’s rabbit herder” & ATU 720 is “My mother slew me, my father ate me, my sister buried me under the juniper” (really)
https://x.com/emollick/status/2033754096453271778

gemini-3.1-flash-lite-preview is extremely underrated. I know I keep saying that, but nothing beats the (price*latency)/intelligence you get here.
https://x.com/matvelloso/status/2033304726226493829

GLM-OCR, a 0.9B model that beats Gemini on OCR benchmarks. Going live in 15 minutes to test it on real-world datasets and build some cool demos. Link: https://x.com/skalskip92/status/2034658568117309600

GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. https://x.com/OpenAI/status/2033953592424731072

GPT-5.4 mini is now available in Windsurf!
https://x.com/windsurf/status/2033954998837776869

GPT-5.4 Pro is the first model that’s made me feel genuinely enabled to do almost anything. I didn’t expect this kind of leap. I don’t know what counts as AGI, but this feels awfully close.
https://x.com/shaunralston/status/2031901724571812226

GPT-5.4-mini is a wildly capable model and gives you ~3.3x more usage on Codex tasks compared to GPT-5.4. It’s excellent for spinning up new subagents!
https://x.com/dkundel/status/2033953901301665838

I have so much gratitude to people who wrote extremely complex software character-by-character. It already feels difficult to remember how much effort it really took. Thank you for getting us to this point.
https://x.com/sama/status/2033935276079510011

I helped a bit with the new community page for codex. Sign up to become an ambassador, if you’re a fan!
https://x.com/steipete/status/2034400645630058792

Ollama is now a provider inside CodexBar! Thank you @steipete for the awesome work!
https://x.com/ollama/status/2033794815448780803

Spend caps in the Gemini API, available starting today!! This is another step forward of many to give developers more control and peace of mind when building with Gemini. Please go set a cap and send us any feedback as you use them!
https://x.com/OfficialLoganK/status/2032126479257968907

Subagents are now available in Codex. You can accelerate your workflow by spinning up specialized agents to: • Keep your main context window clean • Tackle different parts of a task in parallel • Steer individual agents as work unfolds
https://x.com/OpenAIDevs/status/2033636701848174967

The Codex team are hardcore builders and it really comes through in what they create. No surprise all the hardcore builders I know have switched to Codex. Usage of Codex is growing very fast:
https://x.com/sama/status/2033599375256207820

The value produced by models is getting so much better so fast that old hardware is actually getting *more* expensive to rent. 3 years ago, the best model you could run on a H100 chip was GPT-4. Now, you can run GPT-5.4 on it, which is smaller and cheaper to run while
https://x.com/dwarkesh_sp/status/2033953122197115324
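The economics in the quote can be sketched with a toy calculation. Every number below is an illustrative assumption, not a figure from the thread: if a newer, smaller model serves far more tokens per GPU-hour, the same H100 can earn more even at a lower price per token, so its rental value rises.

```python
# Toy model of the thread's argument: a GPU's rentable value tracks the
# revenue of the best model it can run. All numbers are made up for
# illustration and do not come from the quoted tweet.

def gpu_hourly_value(tokens_per_hour: float, value_per_million_tokens: float) -> float:
    """Revenue an operator can attribute to one GPU-hour."""
    return tokens_per_hour / 1_000_000 * value_per_million_tokens

# Assumed: the older large model is slow but priced high per token;
# the newer small model is cheap per token but much faster.
old = gpu_hourly_value(tokens_per_hour=200_000, value_per_million_tokens=30.0)
new = gpu_hourly_value(tokens_per_hour=1_500_000, value_per_million_tokens=10.0)

print(f"old model: ${old:.2f}/GPU-hour")   # → old model: $6.00/GPU-hour
print(f"new model: ${new:.2f}/GPU-hour")   # → new model: $15.00/GPU-hour
print(f"ratio: {new / old:.1f}x")          # → ratio: 2.5x
```

Under these assumed numbers, the same chip earns 2.5× more per hour with the newer model, which is the mechanism behind "old hardware getting more expensive to rent."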

Use subagents and custom agents in Codex https://simonwillison.net/2026/Mar/16/codex-subagents/

We evalled @OpenAI GPT-5.4 mini and nano on APEX-Agents. With xhigh reasoning, mini scores 24.5% Pass@1. It outperforms other lightweight models like Gemini 3.1 Flash Lite (12.8%) as well as midweight models like Sonnet 4.6 (23.7% Pass@1) – but the token $ is just ¼.
https://x.com/mercor_ai/status/2033955468650156503

We just shipped a bunch of stuff to make it easier to scale with the Gemini API: – Automatic tier upgrades – Tier 1 -> Tier 2 now happens much faster (30 days post payment -> 3 days) and with less spend ($250 -> $100) – New billing account caps on each tier to limit over spend
https://x.com/OfficialLoganK/status/2033587540419019127

BullshitBench update: The new GPT-5.4 mini and nano models score quite low. This screenshot shows OpenAI models only; the full list would put GPT-5.4-mini around 40th place and Nano around 70th place. Again, thinking didn’t help much at all.
https://x.com/petergostev/status/2033995459522396287

GPT 5.4 is a big step for Codex – by Nathan Lambert https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex

gpt-5.4 has ramped faster than any other model we’ve launched in the API: within a week of launch, 5T tokens per day, handling more volume than our entire API one year ago, and reaching an annualized run rate of $1B in net-new revenue. it’s a good model, try it out!
https://x.com/gdb/status/2033605419726483963

GPT-5.4 nano is also available starting today in the API.
https://x.com/OpenAI/status/2033953595637538849

GPT-5.4-mini looks really good for computer-use
https://x.com/scaling01/status/2033954794105127007

Ran a small eval today on an LM using GPT-5.2 as a judge. Model scores 10%, but paper reports it scoring 34%. I see that the paper uses GPT-5.1 as a judge; for the sake of consistency I change it. Switch to GPT-5.1 as a judge. Model now scores 43.5%… bro
https://x.com/a1zhang/status/2034059629072945251#m
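The swing described above (10% → 43.5% from changing only the judge) is easy to reproduce in miniature. A toy sketch: the two "judges" below are deliberately different heuristics standing in for different LLM judges; no real models or APIs are involved, and all data is invented.

```python
# Demonstration that the choice of judge changes the measured score,
# even when the model's answers are held fixed.

answers = [
    ("What is 2+2?", "4"),
    ("Capital of France?", "The capital is Paris."),
    ("Name a prime number.", "9"),
    ("Largest planet?", "Jupiter, I think."),
]
references = ["4", "Paris", "7", "Jupiter"]

def strict_judge(answer: str, reference: str) -> bool:
    # Passes only exact matches.
    return answer.strip() == reference

def lenient_judge(answer: str, reference: str) -> bool:
    # Passes if the reference appears anywhere in the answer.
    return reference.lower() in answer.lower()

def score(judge) -> float:
    hits = sum(judge(a, ref) for (_, a), ref in zip(answers, references))
    return hits / len(answers)

print(f"strict judge:  {score(strict_judge):.0%}")   # → strict judge:  25%
print(f"lenient judge: {score(lenient_judge):.0%}")  # → lenient judge: 75%
```

Same answers, 3× difference in "accuracy" — which is why eval reports should always state which judge (and which version) produced the numbers.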

This, but for real* Here’s METR-style graph of labor displacement from Roman aqueducts, doubling time of CDDII years. Lesson: 1) Displacing terrible work is good 2) All exponentials become s-curves in the end * I had GPT-5.4 Pro do the research, spot checks seemed accurate.
https://x.com/emollick/status/2033636278508425646

Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost | VentureBeat https://venturebeat.com/technology/xiaomi-stuns-with-new-mimo-v2-pro-llm-nearing-gpt-5-2-opus-4-6-performance

@steipete @RatulSarna made custom feature to check how much of tokens I wasted in the past still testing locally, and wonder if people would be interested, so I’ll raise a PR
https://x.com/maxceem/status/2033465188016595202

anthropic’s “generational fumble”
https://x.com/morqon/status/2023203435475063157?s=20

CodexBar 🎚️ 0.18 is out: – New providers: Kilo, Ollama, OpenRouter – Codex historical pace + risk forecasting + backfill – merged-menu Overview tab – fewer Claude keychain prompt annoyances – lower CPU/energy use, faster JSONL scanning thx @RatulSarna 🙏
https://x.com/steipete/status/2033422930449944990

Hear me out…
https://x.com/steipete/status/2034416944074613174

Huh. Thought URLs in terminal emulators would be easy. Nothing is really easy.
https://x.com/steipete/status/2032832312933757090

If you are on a Mac and you use different LLM model providers, this tiny app from @steipete is quite useful 🤩
https://x.com/TommasoGritti/status/2034391666459541537

we do.
https://x.com/steipete/status/2033369381124935829

🤖: A blog post announcing opik-openclaw, a native OpenClaw plugin from Comet that adds full-stack observability, tracing every LLM call, tool execution, token cost, and sub-agent delegation, to address the visibility gap in autonomous agent workflows.
https://x.com/dl_weekly/status/2033529164813250938

Claude CoClaw
https://x.com/simonw/status/2034014713928106261?s=46

Folks, if you get crypto emails from websites claiming to be associated with openclaw, it’s ALWAYS a scam. We would never do that. The project is open source and non-commercial. Use the official website. Be sceptical of folks trying to build commercial wrappers on top of it.
https://x.com/steipete/status/2034301028670255447

How China is getting everyone on OpenClaw, from gearheads to grandmas https://www.cnbc.com/2026/03/18/china-openclaw-baidu-tencent-ai.html

I migrated from Openclaw -> Hermes and so far, so good – Things “just work” a lot better – The transition of data from OC -> Hermes was very easy too – It doesn’t seem to randomly crash and stop working on me – When I ask it to do things, it’ll create an actual skill, rather
https://x.com/Zeneca/status/2033460972346650852

in the next claw release (~Sunday), you can always ask your agents, even when they are busy working.
https://x.com/steipete/status/2033050183492382767

Must-read AI research of the week: ▪️ OpenClaw-RL ▪️ Meta-Reinforcement Learning with Self-Reflection for Agentic Search ▪️ Agentic Critical Training ▪️ Video-Based Reward Modeling for Computer-Use Agents ▪️ AutoResearch-RL ▪️ Neural Thickets ▪️ Training Language Models via
https://x.com/TheTuringPost/status/2033856658615767496

My openclaw twitter mention block cron job is working unreasonably well. Turns out AI is really good at detecting spam/reply guy/promo stuff. Runs every 5 min and cleans up my mentions – I actually see useful replies now and Twitter got pleasant again!
https://x.com/steipete/status/2033220404555526154

NemoClaw for openclaw. Nice!
https://x.com/kimmonismus/status/2033636585963721182

Ollama 0.18.1 is here! 🌐 Web search and fetch in OpenClaw Ollama now ships with web search and web fetch plugins for OpenClaw. This allows Ollama’s models (local or cloud) to search the web for the latest content and news. This also allows OpenClaw with Ollama to be able to
https://x.com/ollama/status/2033993519459889505

Ollama is now an official provider for OpenClaw. openclaw onboard --auth-choice ollama All models from Ollama will work seamlessly with OpenClaw. 🦞 Use it for the tasks you want, all from your chat app. Thank you @steipete for helping and reviewing. 🦞
https://x.com/ollama/status/2033339501872116169

parallel tool calling coming to openclaw
https://x.com/steipete/status/2032676333440880903

There’s a lot of cool stuff being built around openclaw. If the stock memory feature isn’t great for you, check out the qmd memory plugin! If you are annoyed that your crustacean is forgetful after compaction, give https://t.co/C5B7PJxorq a try!
https://x.com/steipete/status/2032861327967072671

Thinking how we can evolve openclaw plugins to be more powerful while also making core leaner. Also wanna add support for claude code/codex plugin bundles. Good stuff coming soon!
https://x.com/steipete/status/2033215216469614923

Super excited to see @Microsoft getting involved and helping to make MS Teams top notch for @openclaw!
https://x.com/steipete/status/2032675009005531237

🚀 Today we’re launching LangSmith Sandboxes Agents get a lot more useful when they can run code: analyze data, call APIs, build entire applications. Sandboxes give them a safe place to do it with ephemeral, locked-down environments you control. Now in Private Preview. Learn
https://x.com/LangChain/status/2033949251529793978

Deploy LangGraph agents using the LangGraph CLI: you can now deploy LangGraph agents to production straight from your terminal! 🛠️ langgraph new → scaffold from a template 🧪 langgraph dev → test locally in Studio 🚀 langgraph deploy → deploy your
https://x.com/LangChain/status/2033596690171629582

Fantastic write-up on SKILLS by the GOAT @trq212 I have been leaning into SKILLS a ton too, both for my Claude Code setup, and also for building agentic software (mostly with DeepAgents from LangChain). You should read his post top to bottom, and if you don’t have much time,
https://x.com/mstockton/status/2034095691648098606#m

New Conceptual Guide: You don’t know what your agent will do until it’s in production 👀 With traditional software, you ship with reasonable confidence. Test coverage handles most paths. Monitoring catches errors, latency, and query issues. When something breaks, you read the
https://x.com/LangChain/status/2034314483259031965#m

Polly is our AI assistant built directly into LangSmith to help you debug, analyze, and improve your agents — now generally available. Now, Polly lives on every page of LangSmith, remembers your full session as you navigate, and can take action to update prompts, compare
https://x.com/LangChain/status/2034321435418825023#m

You can now build your own version of Claude Code. Deep Agents is an MIT-licensed framework that recreates the core workflow behind top coding agents. It lets you inspect and modify the exact architecture that makes these agents work. – Planning and todo tools for managing
https://x.com/simplifyinAI/status/2033581939756818648

Computer is now on Android.
https://x.com/perplexity_ai/status/2033562296077963773

Perplexity Computer has been rolled out to all Android users. Update your Perplexity app and toggle to Computer to get started!
https://x.com/AravSrinivas/status/2033561054324953432

ScreenSpot-Pro, the GUI computer use benchmark is now on @huggingface 🏆 just added Qwen3.5 it takes 5th place, with specialist Holo2 family takes top ranks whoever builds next GUI model based on Qwen3.5 can top the leaderboard? 🔥
https://x.com/mervenoyann/status/2034265145158119642#m

Discover more from Ethan B. Holland