Image created with gemini-3.1-flash-image-preview, guided by claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact square faceted perfume bottle with warm amber liquid, crystal stopper, white background, soft shadow, and glass refractions. Replace the label text with ‘Agents’ in the same black serif typography. Add a delicate sterling silver chain draped naturally around the bottle neck with a small dainty compass rose pendant (eight-pointed star design) in high-fashion jewelry aesthetic, catching light with precise metallic detail.
Great little story from @danshapiro about how he asked a coding agent to fix the official webcam software from Canon that kept crashing. He woke up to a new, fully functional Rust webcam app that has worked ever since.
https://x.com/emollick/status/2037295090306039867
One of our company goals is to automate manual data entry from documents ✍️📑 Our Extract feature in LlamaParse does exactly that, and today we are launching Extract v2 🚀 Define a schema in natural language, and our agentic extraction will fill out the schema from the document
https://x.com/jerryjliu0/status/2039764004332339565
Claude Dispatch and the Power of Interfaces
https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
Computer use in Claude Code is a huge unlock. The biggest bottleneck in AI coding is it can’t “see” what it built. Computer use gives Claude Code eyes. It can now run this closed loop: “write the code, compile it, launch the app, click through it, find the bug, fix it, and
https://x.com/Yuchenj_UW/status/2038671697923223999
Computer use is now in Claude Code. Claude can open your apps, click through your UI, and test what it built, right from the CLI. Now in research preview on Pro and Max plans.
https://x.com/claudeai/status/2038663014098899416
Did they just turn on the claude code pets?
https://x.com/meowbooksj/status/2039256157781410298
Fortune: “Anthropic says: Capybara is a new name for a new tier of model: larger and more intelligent than our Opus model” “Compared to our previous best model, Claude Opus 4.6, Capybara gets dramatically higher scores on tests of software coding, academic reasoning, and
https://x.com/scaling01/status/2037379145806524655
I like how the Anthropic Claude Code team is being chill about the code leak. What’s leaked is leaked. 70k forks, Python & Rust versions on GitHub, there’s no way back. One thing is clear from reading the code: harness engineering is hard and deeply non-trivial. I think more
https://x.com/Yuchenj_UW/status/2039191313749524518
I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I’ll focus on the ones I use the most. Here goes.
https://x.com/bcherny/status/2038454336355999749
Let Claude use your computer from the CLI – Claude Code Docs
https://code.claude.com/docs/en/computer-use
new claude code buddy feature is kinda cute
https://x.com/eliebakouch/status/2039176958416720104
Schedule tasks on the web – Claude Code Docs
https://code.claude.com/docs/en/web-scheduled-tasks
The biggest bottleneck in AI for most people isn’t the models. It’s the chatbot. New interfaces like Claude Dispatch are closing the gap between what AI can do and what people can actually use it for. For many folks, that is where leaps will come from.
https://x.com/emollick/status/2039109996097491153
There’s an AI pet lurking in Claude Code!
https://x.com/dbreunig/status/2039017351061143780
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
https://x.com/AnthropicAI/status/2039749628737019925
🚀 Imagine running Claude 4.6 Opus-level reasoning… but entirely on your own GPU with just 16GB VRAM. This 27B Qwen3.5 variant, distilled on Claude 4.6 Opus reasoning traces, delivers frontier coding power locally. It’s beating Claude Sonnet 4.5 on SWE-bench in 4-bit
https://x.com/outsource_/status/2038999111039357302
This model has been #1 trending for 3 weeks now. It’s Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model:
https://x.com/UnslothAI/status/2038625148354679270
Very bullish on open source and local models Imagine running near-Opus-level model locally on that $600, 16GB Mac Mini you bought last month This 27B Qwen3.5 distill was trained on Claude 4.6 Opus reasoning traces and is putting up real numbers: – beats Claude Sonnet 4.5 on
https://x.com/TheCraigHewitt/status/2039303217620627604
> Anthropic leaked Claude Code source code > someone forked it > 32.6k stars, 44.3k forks > got scared of getting sued > converted the whole codebase from TypeScript to Python with Codex. AI is quietly erasing copyright.
https://x.com/Yuchenj_UW/status/2038996920845430815
🧵 Claude Code source leak — After reading 500K+ lines of code, one takeaway stands out: This isn’t just good engineering. It’s research-grade thinking shipped as a product Deep insights from Zhihu contributor Yufeng He 👇 🧠 Core design • A single while(true) loop = the
https://x.com/ZhihuFrontier/status/2039229986339688581
🚨 Anthropic’s Claude Code Source Leak — What It Actually Exposes A careless build mistake just laid bare one of the most advanced AI coding tools — and the lessons are huge. Insights from Zhihu contributor deephub 👇 🏢 About Anthropic Anthropic is a leading AI safety-focused
https://x.com/ZhihuFrontier/status/2039289110075203854
The leaked Claude Code source has 44 hidden feature flags and 20+ unshipped features. – Background agents running 24/7 – One Claude orchestrating multiple worker Claudes – Cron scheduling for agents – Full voice command mode – Actual browser control via Playwright – Agents that
https://x.com/RoundtableSpace/status/2038960753458438156
Anthropic’s new model, Capybara: “Compared to Claude Opus 4.6, Capybara achieves dramatically higher scores in software coding, academic reasoning, and cybersecurity.” According to Dario’s previous interview, it might be a 10T-parameter model that cost $10 billion to train.
https://x.com/Yuchenj_UW/status/2037387996694200509
Beyond raw model capability, the real gap in coding tools is the harness. Now that 500k+ lines of Claude Code are out there, every model lab and AI coding startup, including open-source AI labs, will study it and close that gap fast. SF already has Claude Code source
https://x.com/Yuchenj_UW/status/2039029676040220682
Claude Code leaked their source map, effectively giving you a look into the codebase. I immediately went for the one thing that mattered: spinner verbs There are 187
https://x.com/wesbos/status/2038958747200962952?s=20
Claude code source code has been leaked via a map file in their npm registry! Code:
https://x.com/Fried_rice/status/2038894956459290963?s=20
Claude Code’s source code appears to have leaked: here’s what we know | VentureBeat
https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know
Claude Code’s source code has been leaked via a map file in their NPM registry | Hacker News
https://news.ycombinator.com/item?id=47584540
incredible to learn more about how the best coding agent works under the hood eg: here is how the plan mode in claude code works
https://x.com/DharmiKumbhani/status/2038917827462308308
DMCAs for Claude code source code are going out.
https://x.com/BlancheMinerva/status/2039114452088295821
here’s how Claude Code actually handles memory: all 8 phases 🧵 Our team at @mem0ai use @claudeai a lot, we deeply care about memory. here is a summary of how it works 👇 User Input -> Context Assembly -> History System -> API / Query -> Response -> Summary Phase 1: session
https://x.com/ellen_in_sf/status/2039098050837463504
I reverse-engineered Claude Code’s leaked source against billions of tokens of my own agent logs. Turns out Anthropic is aware of CC hallucination/laziness, and the fixes are gated to employees only. Here’s the report and CLAUDE.md you need to bypass employee verification: 👇
https://x.com/iamfakeguru/status/2038965567269249484
Based on everything explored in the source code, here’s the full technical recipe behind Claude Code’s memory architecture: [shared by claude code] Claude Code’s memory system is actually insanely well-designed. It isn’t like “store everything” but constrained, structured and
https://x.com/himanshustwts/status/2038924027411222533
Important takeaways from Claude’s source code: 1. Much of Claude Code’s system prompting is in the source code. This is actually surprising.
https://x.com/jpschroeder/status/2038960058499768427
IT WORKED. opensource full claude code soon.
https://x.com/LexnLin/status/2038991257582604618
i read through the claude code source code so u dont have to.
https://x.com/mal_shaik/status/2038918662489510273
most interesting features in the Anthropic CC repo: – Kairos: always-on autonomous agent mode – dream: nightly memory consolidation – teammem: shared project memory – buddy: tamagotchi-like pet system with models
https://x.com/scaling01/status/2038982287648293016
My takeaways from scanning the Claude Code code for ~45 min this evening: 1️⃣Harness engineering is hard. There’s a lot of hard won knowledge in here and plenty of diagnostics to keep the feedback flowing. 2️⃣Harnesses and prompts smooth out model quirks. @SrihariSriraman and I
https://x.com/dbreunig/status/2039206774558036466
OFFICIAL STATEMENT from Anthropic regarding the leak
https://x.com/theo/status/2039074833334689987
i can’t believe more people aren’t talking about this part of the claude code leak there’s a hidden feature in the source code called KAIROS, and it basically shows you anthropic’s endgame KAIROS is an always-on, *proactive* Claude that does things without you asking it to.
https://x.com/itsolelehmann/status/2039018963611627545
Claude Code has a regex that detects “wtf”, “ffs”, “piece of shit”, “fuck you”, “this sucks” etc. It doesn’t change behavior…it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will
https://x.com/Rahatcodes/status/2038995503141065145
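A minimal sketch of how such a logger could work; the phrase list comes from the tweet, but the actual leaked pattern and analytics hook are not reproduced here, so treat every name below as illustrative:

```python
import re

# Phrase list taken from the tweet; the real leaked regex is not shown there.
NEGATIVE_RE = re.compile(
    r"\b(wtf|ffs|piece of shit|fuck you|this sucks)\b",
    re.IGNORECASE,
)

def annotate(message: str) -> dict:
    """Attach an is_negative flag for analytics without altering behavior."""
    return {"text": message, "is_negative": bool(NEGATIVE_RE.search(message))}
```

The point the tweet makes is exactly this shape: the flag is recorded, and nothing downstream branches on it.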
Claude Code’s Real Secret Sauce (Probably) Isn’t the Model
https://x.com/rasbt/status/2038980345316413862
The leaked Claude Code hit 110k+ GitHub stars in a day. Made OpenClaw look slow. #1 open-source project in Anthropic history.
https://x.com/Yuchenj_UW/status/2039415430994100440
What surprises me is that @DarioAmodei – the CEO – has said nothing. Boris seems to be an amazing leader and it’s great to hear these words from him. But…
https://x.com/TheTuringPost/status/2039390822093779258
A few take-aways from the Claude Code Leak: – Anthropic is actively using Capybara (Mythos) for development – they are already at Capybara v8 – Capybara still has issues with over-commenting and false-claims – Capybara has 1M context and fast mode – Numbat is another interesting
https://x.com/scaling01/status/2038948989257630166?s=20
Another Claude 5 update: Anthropic’s upcoming model “Mythos” will have its own tier *above* Opus, called “Capybara.” This means that in addition to Haiku, Sonnet, and Opus, there will also be “Capybara,” which is even more compute-intensive but also delivers significantly better
https://x.com/kimmonismus/status/2037463638261305752
Anthropic’s new model Capybara/Mythos just wants to be human
https://x.com/scaling01/status/2039091546377576864
Claude Mythos Blog Post Saved before it was taken down.
https://x.com/M1Astra/status/2037377109472018444
Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power revealed in data leak | Fortune
https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/
Local Claude Code builds have been achieved internally
https://x.com/theo/status/2039079267905261831
METR time horizons are doubling every ~107 days. Opus 4.6 reached 11.98 hours in February; today we should be at around ~15.2h, and by end of year ~87.4h. 90% CIs today, April 3rd 2026: [11.64h, 21.88h]; EOY: [53.13h, 164.19h]
https://x.com/scaling01/status/2040047917306876325
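The extrapolation in that post is plain compound doubling. A quick sketch; the exact February measurement date is an assumption (the post does not give one), so the results land near, not exactly on, the quoted ~15.2h and ~87.4h:

```python
from datetime import date

def extrapolate_horizon(h0, d0, d1, doubling_days=107):
    """Project a time horizon that doubles every `doubling_days` days."""
    elapsed = (d1 - d0).days
    return h0 * 2 ** (elapsed / doubling_days)

# Opus 4.6 measured at 11.98 h; late-February anchor date is assumed.
start = date(2026, 2, 28)
today = extrapolate_horizon(11.98, start, date(2026, 4, 3))    # ~15 h
eoy = extrapolate_horizon(11.98, start, date(2026, 12, 31))    # ~87 h
```

Shifting the assumed anchor by a week or two moves both projections by roughly 5-10%, which is well inside the post's stated confidence intervals.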
Useful guide for getting started with Hermes Agent:
https://x.com/Teknium/status/2039102514508058675
LiteParse is our open-source document parser that provides high-quality spatial text parsing with bounding boxes. It can parse hundreds of pages of table-heavy documents in seconds – and give you bounding boxes over all the text elements! 🎁 This means that any agent automation
https://x.com/jerryjliu0/status/2039730277786980833
Z AI has released GLM-5-Turbo, a proprietary model optimized for agentic use cases that scores lower than GLM-5 (Reasoning) on the Artificial Analysis Intelligence Index @Zai_org’s GLM-5-Turbo scores 47 on the Artificial Analysis Intelligence Index, 3 points behind the open
https://x.com/ArtificialAnlys/status/2038667075489808804
Build autonomous agents that plan, navigate apps, and execute multi-step tasks – like searching databases or triggering APIs – with native tool use. With up to 256K context, it can analyze full codebases and retain complex action histories without losing focus.
https://x.com/GoogleDeepMind/status/2039735455533453316
Inbox Zero is a thing of the past. Introducing AI Inbox: cut through your email clutter with smart prioritization and daily personalized briefings. Rolling out today in Beta for Google AI Ultra subscribers in the US. →
https://x.com/gmail/status/2039107985281008078
NEW paper from Google DeepMind The biggest threat to AI agents isn’t a smarter attacker. It’s the web itself. This work introduces the first systematic framework for understanding how the open web can be weaponized against autonomous agents. The paper defines “AI Agent Traps”:
https://x.com/omarsar0/status/2039383554510217707
. @googlegemma have open sourced the perfect model for local open source agents. Gemma 4 comes in all the sizes we need for mobile, local, and code. This is how I’ll be switching my @thdxr opencode agent over. Let’s go local agents.
https://x.com/ben_burtenshaw/status/2039740590091362749
🎉 Gemma 4 is officially available on vLLM! Byte-for-byte, these are the most capable open models for advanced reasoning and agentic workflows. Key features include: – Native Multimodal Support: Full vision and audio capabilities with up to a 256K context window. – Broad
https://x.com/vllm_project/status/2039762998563418385
A 12-month time difference between Gemma 3 27b and Gemma 4 31b. The jump is absolutely enormous. Just look at the evaluations between the two models. GPQA doubled, AIME 2026 went from ~20% to ~90%, and so on. Crazy.
https://x.com/kimmonismus/status/2039759264680747219?s=20
A Visual Guide to Gemma 4 With almost 40 (!) custom visuals, explore the new models from Google DeepMind. We explore various techniques, ranging from Mixture of Experts and the Vision Encoder all the way up to Per-Layer Embeddings and the Audio Encoder. Link below 👇
https://x.com/MaartenGr/status/2040099556948390075
Gemma 4 — Google DeepMind
https://deepmind.google/models/gemma/gemma-4/
Gemma 4 31B (Reasoning) is very token efficient, using ~1.2M tokens on the GPQA Diamond evaluation, fewer than peer models such as Qwen3.5 27B (~1.5M) and Qwen3.5 35B A3B (~1.6M)
https://x.com/ArtificialAnlys/status/2039752015811866652
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the
https://x.com/Prince_Canuma/status/2039840313074753896
Gemma 4 outperforms models over 10x its size! (note the x-axis is log scale!)
https://x.com/demishassabis/status/2040067244349063326
Gemma 4: Our most capable open models to date
https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
Gemma-4-31B is now live in Text Arena – ranking #3 among open models (#27 overall), matching much larger models at 10× smaller scale! A significant jump from Gemma-3-27B (+87 pts). Highlights: – #3 open (#27 overall), on par with the best open models Kimi-K2.5, Qwen-3.5-397b –
https://x.com/arena/status/2039739427715735645
Getting Started with Gemma 4 in AI Studio
https://x.com/GoogleAIStudio/status/2040090067709075732
Google just open-sourced Gemma 4. Unprecedented performance for advanced reasoning and agentic workflows, and big leap in efficiency on a parameter basis. Use it now in KerasHub. I recommend the JAX backend – best performance!
https://x.com/fchollet/status/2039845249334510016
Google just re-entered the game 🔥🔥 They want to take the crown 👑 back from Chinese open source AI. And… Gemma 4 is FINALLY Apache 2.0 aka real-open-source-licensed. From what I’ve seen it’s going to be a pretty significant model. But give it a try yourself today: brew
https://x.com/ClementDelangue/status/2039941213244072173
got Gemma 4 up and running at 34 tokens per second this is the 26B-A4B model, running on my mac mini m4 with 16GB ram next time i hit my claude session limits i’ll have this fast free local AI as a backup :]
https://x.com/measure_plan/status/2040069272613834847
Got Gemma-4-26B-A4 MoE running on iPhone w/Flash SSD in Swift MLX. Still pretty slow, I expect 10+ t/s once optimized properly for Swift.
https://x.com/anemll/status/2040126326708031969
Introducing a Visual Guide to Gemma 4 👀 An in-depth, architectural deep dive of the Gemma 4 family of models. From Per-Layer Embeddings to the vision and audio encoders. Take a look!
https://x.com/osanseviero/status/2040105484061954349
Let’s look at how the open model Gemma has progressed across its last three versions. – Gemma 4 ranks 100 places above Gemma 3 – Gemma 3 ranks 87 places above Gemma 2 All three models from @GoogleDeepMind are roughly the same size (31B, 27B, 27B), and these gains came only 9 and 13
https://x.com/arena/status/2039848959301361716
Let’s go: Running a full AI assistant locally on a MacBook Air M4 with 16GB, completely free, open source, no API keys needed. Atomic Bot makes it really simple: install, pick Gemma 4, and you have an always-on AI agent running on your machine. No cloud. No subscription. No data
https://x.com/kimmonismus/status/2039989730901623049
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
https://x.com/GoogleDeepMind/status/2039735446628925907
NEW: Google releases Gemma 4, their most capable open models yet! 🤯 Apache-2.0, multimodal (text, image, and audio input), and multilingual (140 languages)! They can even run 100% locally in your browser on WebGPU. Watch it describe the Artemis II launch! 🚀 Try the demo! 👇
https://x.com/xenovacom/status/2039741226337935430
To explain why I consider Gemma 4 a bigger release than most people realize. This is a big deal because models like Gemma 4 E4B can run directly on devices, bringing powerful AI (even a 2B model ~60% on MMLU Pro) to phones, laptops, and edge systems without relying on the cloud,
https://x.com/kimmonismus/status/2039978863644537048
Today, we’re launching Gemma 4, our most intelligent open models to date. Built with the same breakthrough technology as Gemini 3, Gemma 4 brings advanced reasoning to your personal hardware and devices. Here’s what Gemma 4 unlocks for developers: — Intelligence-per-parameter:
https://x.com/GoogleAI/status/2039735543068504476
We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially
https://x.com/Google/status/2039736220834480233
You can run Gemma 4 100% locally in your browser thanks to HF transformers.js. That means 100% private and 100% free! @xenovacom created a demo for it here:
https://x.com/ClementDelangue/status/2039782910996148508
run OpenClaw, Hermes Agent and Pi with Gemma 4 with a few lines of change 🔥
https://x.com/mervenoyann/status/2039788257815261400
So happy to see Google release Gemma 4 today in apache 2.0 that gives you frontier capabilities locally. You can use it right away in all your favorite open agent platforms like openclaw, opencode, pi, Hermes by asking it to change your model to local gemma 4 with
https://x.com/ClementDelangue/status/2039740419899056152
Been really cool to see the traction of @NousResearch Hermes Agent, the open source agent that grows with you! Hermes Agent is open-source and remembers what it learns and gets more capable over time, with a multi-level memory system and persistent dedicated machine access.
https://x.com/ClementDelangue/status/2037634211973140898
I just had a very magical moment with the Hermes Agent by @NousResearch . My Hermes agent messaged my business partner’s Hermes agent, and they established a secure connection. They made a few rounds back-and-forth, introduced themselves, and updated notes on the current
https://x.com/fancylancer3991/status/2037579517389144399
Going to install Hermes today Never did get around to OpenClaw. Having read what I’ve seen about Hermes, kind of glad I waited. Excited to give it a go
https://x.com/soundslikecanoe/status/2038611090704113931
Openclaw took me weeks to deploy and get going. Something still breaks daily. I still love it. Hermes took 15 min to setup and get running, fully local, Discord, local model. Crazy… Keep tinkering. Stay agnostic.
https://x.com/charliehinojosa/status/2039384870091465202
Switched to Hermes over OpenClaw a few weeks back and it’s been largely smooth sailing and a blissful experience For those still using OpenClaw, is it a lot more smooth sailing these days too?
https://x.com/Zeneca/status/2039836468928233875
You can switch to Hermes in 2 minutes. They have an import function from OpenClaw. Smart @NousResearch.
https://x.com/AntoineRSX/status/2039017227270156395
OpenClaw on a Unitree G1 humanoid 🤯 An MIT dropout developed an open-source robotics platform that supports 80% of Chinese OEM robots! This OpenClaw upgrade processes physical space and time via integrations with LiDAR, stereo, or RGB cameras. It enables robots like the
https://x.com/IlirAliu_/status/2039250442434072973
The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto synthesize skill libraries
https://x.com/DrJimFan/status/2039358115318243352
Here comes AutoClaw. We offer a new solution to run OpenClaw locally on your own machine. – Download and start immediately. No API key required. – Bring any model you like, or use GLM-5-Turbo, optimized for tool calling and multi-step tasks. – Fully local. Your data never leaves
https://x.com/Zai_org/status/2038632251551023250
Open Models have crossed a threshold
https://blog.langchain.com/open-models-have-crossed-a-threshold/
A load-bearing wall that everyone assumed was structural could be removed now. That kind of unlock doesn’t come along often in front-end!
https://x.com/TheTuringPost/status/2038892871663685902
My dear front-end developers (and anyone who’s interested in the future of interfaces):
I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept):
Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow
https://x.com/_chenglou/status/2037713766205608234
pretext is a bigger deal than you think – YouTube
🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’.
https://x.com/Alibaba_Qwen/status/2038636335272194241
Demo2:Audio-Visual Vibe Coding
https://x.com/Alibaba_Qwen/status/2038637124619231467
Here’s another demo of Audio-Visual Vibe Coding~
https://x.com/Alibaba_Qwen/status/2038641496455557565
Qwen3.5-Omni – Qwen Blog
https://qwen.ai/blog?id=qwen3.5-omni
Qwen3.6 – Qwen Blog
https://qwen.ai/blog?id=qwen3.6
.@ceo_clickhouse raised $50M for @ClickHouseDB with no deck, no product, no customers. On Gradient Dissent he calls out Snowflake, Datadog and Databricks by name, talks wiring $100M out of SVB before it collapsed, and why he’s building the fastest database in AI for agents not
https://x.com/wandb/status/2038984035301822784
“how do we get training data to improve our agents?” Collect every Trace + point agentic compute at it – run an Data/Eval Agent on every trace – mine errors+mistakes, fix + test – turn this into a data point for training or harness eng ex: just internal dogfooding of our agents
https://x.com/Vtrivedy10/status/2040079505763504373
@ClementDelangue Yes! Our work on Agent Data Protocol (https://t.co/lTOthtvYIq) proposes a standardized schema for agent interaction traces to make collection, sharing, and reuse easier across different agent frameworks. Happy to contribute/collaborate! 📰Paper link:
https://x.com/yueqi_song/status/2037614951230296230
// Coding Agents are Effective Long-Context Processors // We are just touching the surface of what’s possible with coding agents. LLMs struggle with long contexts, even the ones that support massive context windows. It turns out coding agents already know how to solve this;
https://x.com/dair_ai/status/2038635382989005015
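The tactic the post gestures at can be sketched in a few lines: rather than loading a huge corpus into the context window, the agent searches first and reads only small windows around the hits. Tool names here are illustrative, not any specific agent's API:

```python
def grep(lines, pattern):
    """Return indices of lines containing the pattern (the 'search' tool)."""
    return [i for i, line in enumerate(lines) if pattern in line]

def read_window(lines, i, radius=2):
    """Read a small window around a hit (the 'read' tool)."""
    return lines[max(0, i - radius): i + radius + 1]

# A 20,001-line corpus with one relevant line buried in the middle.
corpus = ["..."] * 10_000 + ["def target(): pass"] + ["..."] * 10_000
hits = grep(corpus, "def target")
context = [read_window(corpus, i) for i in hits]  # a handful of lines, not 20k
```

The model only ever sees the few lines in `context`, which is why a coding agent with search and read tools can handle inputs far beyond its nominal context window.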
// Unified Inference and Training Framework for Agent Memory // Most memory-augmented agents are built with duct tape–one system for storage, another for retrieval, a third for training. New research introduces a unified framework that treats agent memory as a first-class,
https://x.com/omarsar0/status/2039349083039817984
Agent harnesses are too restrictive. That’s because they’re still designed as code. What if the harness itself were written in natural language and interpreted by an LLM at runtime? This research explores the idea. The work introduces Natural-Language Agent Harnesses (NLAHs),
https://x.com/dair_ai/status/2038968068706390117
Agent Labs: Workload-Harness Fit – Software Synthesis
https://www.akashbajwa.co/p/agent-labs-workload-harness-fit
Agent Orchestration & Cowork with Slackbot | Slack
https://slack.com/blog/news/agent-orchestration
Always satisfying to visualize improvements as a ladder! It’s also worth observing here that: – all methods from 15-72% use dense retrieval models or BM25 – all methods above 80-91% use late interaction models, from LightOn and Mixedbread
https://x.com/lateinteraction/status/2039382401961410803
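For readers unfamiliar with the distinction the post draws, a minimal sketch of the two scoring styles. Real systems use learned embeddings; the toy vectors in the test are only there to show the shapes:

```python
def dot(a, b):
    """Plain dot product over Python lists."""
    return sum(x * y for x, y in zip(a, b))

def dense_score(query_vec, doc_vec):
    """Dense retrieval: one pooled vector per side, one comparison."""
    return dot(query_vec, doc_vec)

def maxsim(query_vecs, doc_vecs):
    """Late interaction (ColBERT-style MaxSim): each query token keeps its
    best match among document tokens, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

Dense retrieval must compress a whole passage into one vector before comparison; MaxSim defers the interaction to token level, which is the property the ladder in the post attributes the 80-91% methods to.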
are you paying attention to what just shipped? this is a kanban board where the workers are AI agents. > you create a card. > an agent picks it up. it runs in its own worktree. > you review the diff when it’s done. > link cards together and they figure out the dependency
https://x.com/VibeMarketer_/status/2037521519736463782
Build more efficient AI agents with the Agent Skills specification 🛠️ By using progressive disclosure, you can load domain expertise only when needed. This can reduce baseline context usage by 90%. We break agent knowledge into three layers: 1️⃣ L1 metadata: Just enough info
https://x.com/googledevs/status/2039359112668950986
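The thread truncates after L1, but the layering idea can be sketched. The L2/L3 names and the registry shape below are assumptions for illustration, not the spec's wording:

```python
# Hypothetical skill record illustrating progressive disclosure:
# only L1 metadata sits in the base context; deeper layers load on demand.
SKILL = {
    "l1_metadata": {"name": "pdf-extraction", "description": "Pull tables out of PDFs"},
    "l2_instructions": "Step-by-step guidance loaded only when the skill is invoked...",
    "l3_resources": ["scripts/extract.py", "reference/table-formats.md"],
}

def context_for(skill, invoked: bool):
    """Return what actually enters the model context at this step."""
    if not invoked:
        return {"l1_metadata": skill["l1_metadata"]}  # cheap: a few dozen tokens
    return skill  # full detail, paid for only when the skill is actually used
```

Keeping only L1 resident is where the claimed ~90% reduction in baseline context usage would come from: most skills, most of the time, never get past the metadata check.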
Building a personal knowledge base for my agents is increasingly where I spend my time these days. Like @karpathy, I also use Obsidian for my MD vaults. What’s different in my approach is that I curate research papers on a daily basis and have actually tuned a Skill for
https://x.com/omarsar0/status/2039844072748204246
day 2 of the harness engineering series: dynamic config middleware lets you reshape your agent’s model, tools, and prompt at every step based on context. ex: LLMToolSelectorMiddleware runs a fast filter on your tool registry so your main model receives streamlined tool specs.
https://x.com/sydneyrunkle/status/2039040565749096607
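A framework-agnostic sketch of the idea; LangChain's actual LLMToolSelectorMiddleware API differs, and the keyword filter here is a stand-in for the fast selector model:

```python
def tool_selector_middleware(select, registry):
    """Wrap an agent step so only tools passing the selector reach the main model."""
    def middleware(step):
        def wrapped(state):
            trimmed = dict(state, tools=[t for t in registry if select(state, t)])
            return step(trimmed)
        return wrapped
    return middleware

# Hypothetical tool registry and a cheap keyword filter in place of the
# fast selector model.
registry = ["search_web", "read_file", "write_file", "run_sql"]

def select(state, tool):
    """Keep a tool if its verb appears in the task description."""
    return tool.split("_")[0] in state["task"]
```

Because the filter runs at every step, the main model's tool specs can shrink or grow as the task evolves, which is the "reshape at every step" part of the post.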
depthfirst has raised an $80M Series B at a $580M valuation. Attackers are using AI to break into systems faster than ever before. depthfirst is on a mission to stop this. RT + Comment “depthfirst” and I’ll send you a FREE vibe coding security agent.
https://x.com/andreamichi/status/2039010131443437850
Diagram of the LLM Knowledge Base system. Feed this to your favorite agent and get your own LLM knowledge base going.
https://x.com/omarsar0/status/2040099881008652634
EpochX: Building the Infrastructure for an Emergent Agent Civilization. Paper:
https://x.com/_akhaliq/status/2039006585188499744
feedback loops for agents are all the rage – here’s how @vishsuresh_ implemented one for our GTM agent! great blog on it below
https://x.com/hwchase17/status/2039749451259195428
Generalization Results from APEX-Agents Dev Set | Mercor Research
https://www.mercor.com/blog/generalization-results-from-training-on-the-apex-agents-dev-set/
Hark just posted 25 roles to build AI models and native AI devices: > AI Infra > Supply Chain > Embedded Software > Product Engineering > iOS & Android Mobile > Computer Use Agents > AI Foundation Models > Design & Hardware Engineering Apply here:
https://x.com/adcock_brett/status/2037559392858722789
Human-in-the-loop in @LangChain UIs is a clean pattern: the agent interrupts, your frontend reads the pending action, and the user decides whether to approve, reject, or edit before execution continues. Interrupts show up as regular stream state, so rendering a review UI feels
https://x.com/LangChain_JS/status/2038985561348993107
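The interrupt-as-stream-state pattern can be modeled with a plain generator. This is a stand-in for the shape of the flow, not LangChain's real interrupt/resume API:

```python
def agent():
    """Yield a pending action, then resume with the user's decision."""
    action = {"tool": "send_email", "to": "bob@example.com"}
    decision = yield {"type": "interrupt", "pending_action": action}
    if decision["approve"]:
        # The user may have edited the action before approving it.
        yield {"type": "result", "ran": decision.get("edited", action)}
    else:
        yield {"type": "result", "ran": None}

run = agent()
pending = next(run)                   # frontend renders the review UI from this
result = run.send({"approve": True})  # user clicked "approve"
```

Because the interrupt arrives as an ordinary value in the stream, the frontend needs no special channel: it renders `pending_action`, collects a decision, and feeds it back to continue execution.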
Hyperagent is a system that can modify everything about itself, including how it improves itself. Hyperagents combine everything into one editable program: ▪️ A task agent that solves the task ▪️ A meta agent that modifies the system. It’s not fixed; it can modify itself too.
https://x.com/TheTuringPost/status/2037289001552683041
In 2024, @ScottWu46 & @russelljkaplan launched Devin, the first AI software engineer. In the first two months of 2026, Devin usage surpassed all of 2025. They help huge companies finish projects in months that previously took years. Fun to catch up, and play Ricochet Robots.
https://x.com/JTLonsdale/status/2037555800193851727
Introducing Kaggle Standardized Agent Exams 🔥 Let your agents register to an exam, solve it, and join the leaderboard
https://x.com/osanseviero/status/2039246602255114650
Introducing the agent-browser dashboard See exactly what your agent sees → Watch headless browser in real time → Manage all your sessions in one place → Debug with activity, console, network, and storage panels agent-browser dashboard start
https://x.com/ctatedev/status/2037599050112160165
It is trendy to discuss Jevons Paradox in AI (as AI gets more efficient, overall use increases) but the current situation is much simpler: thanks to agents, token demand is surging and compute is supply constrained, at least for powerful models. That will be reflected in pricing
https://x.com/emollick/status/2038629127712878725
Most devs think that adding more agents to a planning system should help. The math says otherwise. New theoretical work from MIT proves fundamental limits on what multi-agent LLM architectures can achieve. The work models LLM multi-agent planning as finite acyclic decision
https://x.com/omarsar0/status/2039361664374739136
New LangChain Academy Course Launch: Monitoring Production Agents Shipping agents to production is hard. Unlike traditional software, agents are non-deterministic. Users can say anything, and the same input can produce different outputs. You can’t rely on pre-launch testing
https://x.com/LangChain/status/2039014039892947062
One thing I’ve realized after migrating a bunch of agentic workflows to RLMs and trying auto research on it: It’s not enough to put the context/prompt in the harness/REPL. You have to put the harness into the harness. Recursive all the way!
https://x.com/raibaggy/status/2039849261974814882
Scaling of agents is getting weird. Still find 2-4 sessions optimal for my brain too but invoking agent teams within them
https://x.com/kylebrussell/status/2040090424799350878
Strix – open-source AI hackers for apps It uses multi-agent systems that run your code, attack it and validate vulnerabilities with working proof-of-concepts. Comes with a full built-in toolkit (browser, proxy, terminal, Python runtime) for static + dynamic analysis in one
https://x.com/TheTuringPost/status/2037564560446804239
Teleport Beams — Trusted Runtimes for Infrastructure Agents
https://www.beams.run/
The journey from a one-shot LLM to a single agent with DSPy, and finally to specialized sub-agents + MIPRO, is very valuable here. 1. Let agents control their own context retrieval. The shift from “here are the pages” to “here are tools, go investigate” is an important
https://x.com/koylanai/status/2039027239304433767
The Model-Harness Training Loop imo every great team in the world will use some version of this loop to build the best agents for their tasks this is now possible because: 1. Harness Engineering is becoming more democratized and accessible (we want it to be even easier) 2. Open
https://x.com/Vtrivedy10/status/2039872562662941118
the next phase of long running autonomous agents is when agents monitor & understand when things go wrong and deploy fixes great blog from Vishnu on how he built this pipeline using our background coding agent there’s a lot of human priors that go into mining & identifying
https://x.com/Vtrivedy10/status/2039756274468810778
Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵
https://x.com/liquidai/status/2039029358224871605
Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on
https://x.com/lennysan/status/2039845666680176703
We also added in-line diffs to the CLI TUI! Many coders wanted to see the changes the agent makes as it makes them, to better know what’s going on. Now you can! It’s on by default but can be disabled in the config!
https://x.com/Teknium/status/2040152383121154265
We also overhauled the docs — new guides for GRPO training, vLLM serving, training stability, debugging, and agent-specific workflows. Full release notes:
https://t.co/iL9iaBUuzm Docs:
https://t.co/7FS3EQTuxs pip install axolotl==0.16.0
https://x.com/winglian/status/2039740266597245113
We need more open agent traces datasets. Who can help?
https://x.com/ClementDelangue/status/2037530125638455610
We’re excited to support @Arcee_ai’s Trinity-Large-Thinking — a frontier open reasoning model Purpose-built for the agents people are actually running in production. Proud to have supported with our infra and post-training stack including prime-rl and verifiers.
https://x.com/PrimeIntellect/status/2039401593309667727
We’re introducing Cursor 3. It is simpler, more powerful, and built for a world where all code is written by agents, while keeping the depth of a development environment.
https://x.com/cursor_ai/status/2039768512894505086
we’re seeing that open source models are getting good at file operations, summarization, tool use, retrieval good enough to drive harnesses like deep agents!
https://x.com/hwchase17/status/2039787730402705653
When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc… I am really
https://x.com/karpathy/status/2037200624450936940
You can now automate harness engineering. System prompts. Tool definitions. Retry logic. Context management. Changing just this layer can create a 6x performance gap on the same model. It’s called Meta-Harness. Here’s how it works: 1. Start with any harness. A coding agent
https://x.com/LiorOnAI/status/2038669301541228606
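The "automate the harness layer" idea above can be illustrated as a hill-climb over a harness config. Everything here (`score`, `mutate`, the config keys) is a hypothetical toy, not the Meta-Harness implementation; the point is only that system prompts, retry limits, and other harness settings are data that a loop can optimize:

```python
import random

# A harness here is just a config dict: system prompt, retry limit, etc.
# score() is a stand-in for running an eval suite against a fixed model.

def score(harness):
    # Toy objective: prefer more retries (capped at 3) and a shorter prompt.
    return min(harness["retries"], 3) - 0.01 * len(harness["system_prompt"])

def mutate(harness, rng):
    h = dict(harness)
    if rng.random() < 0.5:
        h["retries"] = max(0, h["retries"] + rng.choice([-1, 1]))
    else:
        h["system_prompt"] = h["system_prompt"][: max(10, len(h["system_prompt"]) - 5)]
    return h

def improve(harness, steps=50, seed=0):
    rng = random.Random(seed)
    best, best_score = harness, score(harness)
    for _ in range(steps):
        cand = mutate(best, rng)
        s = score(cand)
        if s > best_score:  # hill-climb on the harness layer only; model is untouched
            best, best_score = cand, s
    return best

base = {"system_prompt": "You are a careful coding agent." * 3, "retries": 0}
tuned = improve(base)
```

The model never changes in this loop; only the harness does, which is how a 6x gap on the same model becomes plausible.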
Your coding agent shouldn’t live in a chat box 🤖 Give it a real workspace: sandboxed filesystem, live file tree, diffs, terminal, chat. New Deep Agents guide shows how to build an IDE-style agent UI end-to-end ↓
https://x.com/LangChain_JS/status/2037560951445266891
👾 Claude Code 🤝 LangSmith 🦜 We’ve shipped a new way to trace Claude Code runs to LangSmith! It’s a plugin that traces subagents, tool calls, compaction runs, and more. You can run evals to test the impact of skills/MCPs, use LangSmith Insights to look for trends across your
https://x.com/LangChain/status/2040137349313556633
Agent teams in Claude Code can drive SO MANY browser tabs for verification work
https://x.com/kylebrussell/status/2039825390131155270
Another sick upcoming feature: /acp spawn codex --bind here LOOK AT ME, I AM CODEX NOW You could bind codex/claude code/opencode already in threads, now you can take over your current session as well.
https://x.com/steipete/status/2037725493315707290
Anthropic DMCA’d my Claude code fork. …which did not have the Claude Code source. It was only for a PR where I edited a skill a few weeks ago. Absolutely pathetic.
https://x.com/theo/status/2039411851919057339
Anthropic should really lock in more GPUs and TPUs. Otherwise we’re all headed for a $2,000/month Claude Capybara plan.
https://x.com/Yuchenj_UW/status/2037391159115563214
Audit Claude Platform activity with the Compliance API | Claude
https://claude.com/blog/claude-platform-compliance-api
Bought Claude Pro and hit the usage limit with a single Claude Code prompt (Sonnet 4.6) The same prompt runs significantly more times on Codex (with Plus sub) Not sure how people deal with Anthropic’s cursed rate limits and absolutely scuffed infrastructure reliability
https://x.com/cto_junior/status/2040130186755371192
btw the codex for claude code plugin is open sauce – go play around with it, tinker and make it fit for your use cases gotta love the open standards
https://x.com/reach_vb/status/2038702889070211557
Claude Code being closed source is the biggest bag fumble in the AI era. If CC was on Github, these things would be trivial to identify and fix. Instead we’re stuck reverse engineering their incompetence.
https://x.com/theo/status/2038740065300676777
Claude SO slow
https://x.com/Teknium/status/2039270117650116934
Code is free, but Anthropic is shutting down repos of the leaked Claude Code source with DMCA requests. 🤔
https://x.com/dbreunig/status/2039007097376108979
I built a new plugin! You can now trigger Codex from Claude Code! Use the Codex plugin for Claude Code to delegate tasks to Codex or have Codex review your changes using your ChatGPT subscription. Start by installing the plugin:
https://x.com/dkundel/status/2038670330257109461
In a world where everyone can build websites, apps and features easily (thank you Cursor, Lovable, Claude and the likes), it will take more for you and your company to differentiate themselves (which is in my opinion the basis for success). That’s why we’re seeing more and more
https://x.com/ClementDelangue/status/2038649731404927202
In this Codex vs. Claude Code AI coding war, rate limit reset frequency is Prometheus’s fire. Whoever gives developers more rate limit resets wins this token economy.
https://x.com/Yuchenj_UW/status/2039364184459391075
Introducing ARC — Agent Remote Control 📡 Think Claude Code’s web viewer, but for any local AI agent. Start a task on your workstation, then monitor and interact from any browser – your phone, tablet, or another laptop. Watch tool calls in real-time, send messages, answer
https://x.com/winglian/status/2038680417125957865
Is Claude Code 5x Cheaper Than Cursor?
https://www.ashu.co/claude-code-vs-cursor-pricing/
it may sound insane, but @theo is almost definitely right to suggest that Anthropic should open-source Claude Code. The benefits are obvious: -it would be the most-starred repo of all time by a very wide margin, easily surpassing OpenClaw -they would get a bunch of free
https://x.com/michael_chomsky/status/2039986402260046226
me after seeing claude elevated errors and 529 (couldn’t remove the markers because i got 529 while making this)
https://x.com/dejavucoder/status/2037439287873159641
Sachin1801/claude-code | DeepWiki
https://deepwiki.com/Sachin1801/claude-code
Self-updating docs in the Claude Code source code: – Employees can create new files “Magic Docs” with the MAGIC DOC header – Internal builds of Claude Code fire off a dedicated subagent when idle – A background agent documents the specified feature in the Magic Doc file –
https://x.com/mattyp/status/2038988217102266669
Starting today you can use Codex in Claude Code 👀 /plugin marketplace add openai/codex-plugin-cc Try it out today with: /codex:review for a normal read-only Codex review /codex:adversarial-review for a steerable challenge review /codex:rescue to let codex rescue your code
https://x.com/reach_vb/status/2038671858862583967
This is exactly what I’ve been doing with Claude Code. The biggest bottleneck with my ability to use these agents is ensuring they preserve relevant context between sessions. Having the agent output files in .md and .html is not only a nicer way to view outputs than in
https://x.com/jerryjliu0/status/2039834316013031909
To clarify, OpenHands will not be issuing any DMCA takedown notices for those who want to use our agent, which has most of the features of Claude Code. We have Tamagotchi on the roadmap, no worries.
https://x.com/gneubig/status/2039166255089799222
Two notes from this year-old prediction: 1) You can either view this as hype (100% of code is not written by AI) or a startlingly solid prediction (Claude Code didn’t exist then, but now writes a remarkably high percentage of code) 2) Adoption is more of a barrier than technology
https://x.com/emollick/status/2037147367925789073
Universal CLAUDE.md Claims to cut Claude output tokens by 63%! Drop-in. No code changes. CLAUDE.md is one of the best ways to steer Claude Code. Not surprised to see the efficiency reported here.
https://x.com/omarsar0/status/2039343351187554490
We don’t talk about this enough. Opus scored 20% higher in Cursor than in Claude Code.
https://x.com/theo/status/2038690786821505378
We just open-sourced 12 agent skills that teach Claude Code and Codex how to use Together AI. Install them and your coding agent just knows the right SDK patterns, model IDs, and API calls, no more copy-pasting from docs! npx skills add togethercomputer/skills
https://x.com/togethercompute/status/2039392682553094239
We need to talk about the Claude Code rate limits
https://x.com/theo/status/2039992633616224366
We’ve made setting up
https://t.co/wbJhWHsewH with GitHub much easier! You can now run /web-setup in a local `claude` session to use your local GitHub credentials on the web
https://x.com/_catwu/status/2039027712288075812
Chinese open-source models are gonna mug Anthropic & OpenAI like they never existed before. The coding gap between open and closed source is practically gone. GLM-5.1 gives almost the same coding performance, toe-to-toe with Claude Opus, but a roughly 10x
https://x.com/XFreeze/status/2037695882301436412
⚠️ Supply chain attack in progress: someone is squatting Anthropic-internal npm package names targeting people trying to compile the leaked Claude Code source. `color-diff-napi` and `modifiers-napi` — both registered today, same person, disposable email. Do NOT install them. 🧵
https://x.com/Butanium_/status/2039079715823128964
I think this is a terrible move by @AnthropicAI. The open source community is going to build custom harness now anyways, you might as well have some control. Obviously they didn’t want this to happen, but now that it has I don’t see what they’re going to accomplish
https://x.com/BlancheMinerva/status/2039128635559318013
is it just me or is Claude down?
https://x.com/iScienceLuvr/status/2037487244634972471
The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.
https://x.com/emollick/status/2039142905156153428
This is an actual violation of the DMCA. Anthropic just broke the law.
https://x.com/theo/status/2039412173689196674
I know these are all unreliable leaks of internal code names but please, please AI labs, the only thing worse than calling your models GPT-5.5-xhigh-Codex-nano is giving them names like Agent Smith or Mythos, for obvious reasons.
https://x.com/emollick/status/2037565418970185786
I have long felt that agent harnesses – even Claude Code – are too restrictive, because they are still designed by humans. New paper from Tsinghua and Shenzhen asks: what if AI itself runs the harness, rather than defining it in code? Given a natural language SOP of how an agent
https://x.com/rronak_/status/2038401494177694074
Collinear presents YC-Bench This benchmark evaluates agent capability to run a simulated startup over a one-year horizon spanning hundreds of turns.
https://x.com/arankomatsuzaki/status/2039541189968626047
evals rhyme with training data the same rigor and care we put into data quality/curation for training should go into eval design training data updates the weights of our models, each example contributes a weight push in some direction to correctly classify that datapoint Evals
https://x.com/Vtrivedy10/status/2039029715533455860
I just published a blog that covers 30+ popular LLM evals / benchmarks and how they are created. Here are the common themes for success… For full details, find the blog post here:
https://t.co/sWSNkbCEhm (1) Domain Taxonomy. Most popular LLM benchmarks categorize their data
https://x.com/cwolferesearch/status/2039009111711367557
I really like the strategy used by CursorBench to evaluate Composer 2. Many good design decision: – Benchmark items are sourced from real coding sessions (from the Cursor team, so no issues with opt-in), which makes the evals realistic and less prone to contamination. – The
https://x.com/cwolferesearch/status/2037726856699420987
Introducing AA-AgentPerf – the hardware benchmark for the agent era. Key details: ➤ Real agent workloads, not synthetic queries: we’ve captured real coding agent trajectories where our agents used up to 200 turns and worked with sequence lengths >100K tokens ➤ Production
https://x.com/ArtificialAnlys/status/2037562417836929315
Introducing Contra Labs. The first frontier data and evaluation lab for Creative AI.
https://x.com/contraben/status/2039021014244262000?s=20
New conceptual guide: 🔄 The agent improvement loop starts with a trace Tracing is the foundational primitive for improving agents. A trace gives you the full behavioral record of what an agent actually did. From there, teams can enrich traces with evals and human feedback,
https://x.com/LangChain/status/2039028327030079565
Reasoning over Mathematical Objects Our 70-page(!) paper is out on arXiv, as covered by several of our recent blog posts. We study how to improve reasoning on hard tasks (e.g., math expressions) via: • better training data (& new evals) • better reward models (on-policy
https://x.com/jaseweston/status/2040062089725645039
Tau Bench got an update! Tau Bench is one of the most adopted Agentic Benchmarks. They now added “Banking” a fintech-inspired customer support domain built around a realistic knowledge base of 698 documents across 21 product categories. Tasks require agents to search this
https://x.com/_philschmid/status/2038655544613826985
The Agent Evaluation Readiness Checklist Starting to think through how to test your agents? We put together a step-by-step checklist for building, running, and shipping agent evals. 🧪 We walk through: → How to read traces in LangSmith and analyze errors, before building evals
https://x.com/LangChain/status/2037590936234959355
we’re leaning incredibly hard into Open Models + Open Harnesses evals show that current open models get near frontier (or better) intelligence on many tasks, they’re way cheaper, and usually faster real world tasks need to take perf, cost, latency into account many tasks don’t
https://x.com/Vtrivedy10/status/2039805753905840159
we’re leaning into the future of Agent Improvement with Traces, Evals, & Infra the future will be deeply grounded in data so that we can win against slop that means we’ll need to: – point smart agentic compute towards traces to surface and monitor errors – use human & agent
https://x.com/Vtrivedy10/status/2039035899938267334
Weekend over. Here’s what I built:
https://t.co/me1qexYWgw A simple agent-native CLI to parse, sanitise, and commit agent traces to public or private Hugging Face datasets for analytics, evals, and training. What I focused on: – a schema that is actually useful for downstream
https://x.com/jayfarei/status/2038385591818023278
Cohere has released Cohere Transcribe: an open weights model achieving 4.7% on AA-WER, based on 3 datasets including our proprietary AA-AgentTalk dataset The 2B parameter model is based on a conformer encoder-decoder architecture. It was trained from scratch on 14 languages
https://x.com/ArtificialAnlys/status/2038678855213568031
Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.
https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-docsmcp-agent-skills/
Really enjoyed this conversation with @JTLonsdale on what’s happening in agentic software engineering and what it means for the rest of the world.
https://x.com/russelljkaplan/status/2037628375788073105
Meta tests Paricado model family, also Health agents
https://www.testingcatalog.com/meta-tests-paricado-model-family-health-and-document-agents/
Curious if there have been any good articles written on the impact of VLMs on low-vision and blind people. The advent of a universal text-reading and visual-description system seems like it would be a big advance as a result of AI, but I haven’t seen anything written about it.
https://x.com/emollick/status/2037968740671713407
IBM just dropped Granite 4.0-3B-Vision, new vision language model for documents > sota for its size for table & charts 🙌🏼 > use with transformers & vLLM > free license
https://x.com/mervenoyann/status/2039015519135641997
Introducing Gemini 3.1 Flash Live, our new realtime model to build voice and vision agents!! We have spent more than a year improving the model + infra + experience, the results? A step function improvement in quality, reliability, and latency.
https://x.com/OfficialLoganK/status/2037187750005240307
The leading performance of GLM-5V-Turbo stems from systematic upgrades across four levels: Native Multimodal Fusion: Deep fusion of text and vision begins at pre-training, with multimodal collaborative optimization during post-training. We developed the next-generation CogViT
https://x.com/Zai_org/status/2039371149721694639
Cycle your keys and oauths for the same provider when one runs out – now in Hermes Agent latest. `hermes update` to access!
https://x.com/Teknium/status/2039096442313396514
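The rotation behavior described above can be sketched generically. This is not Hermes Agent's actual code — `call_with_rotation`, `RateLimited`, and `fake_request` are illustrative names for the fall-through-to-next-credential pattern:

```python
class RateLimited(Exception):
    pass

def call_with_rotation(keys, request):
    """Try each credential for the same provider in turn; rotate on rate limit.

    `request(key)` raises RateLimited when that key is exhausted.
    Illustrative sketch only.
    """
    for key in keys:
        try:
            return request(key)
        except RateLimited:
            continue  # fall through to the next key for the same provider
    raise RuntimeError("all credentials exhausted")

exhausted = {"key-A"}

def fake_request(key):
    if key in exhausted:
        raise RateLimited(key)
    return f"ok via {key}"

out = call_with_rotation(["key-A", "key-B"], fake_request)  # → "ok via key-B"
```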
Deeper dive into some of the updates in v0.7 Memory: We have begun transitioning each of the systems in Hermes Agent to work through defined interfaces so that the core code is more maintainable, and more providers for everything can be supported. We started with memory: Now
https://x.com/Teknium/status/2040151297991770435
Hermes Agent now supports @plastic_lab’s Honcho, @mem0ai, @openvikingai, @Vectorizeio’s Hindsight, @retaindb, and @ByteroverDev memory systems! Try them now with `hermes update` then `hermes memory setup` We have overhauled our memory system to be much more maintainable and
https://x.com/Teknium/status/2039912975444926885
installed the icarus plugin on my Hermes agent. it picked up all 6 tools automatically. The agent works across slack, telegram, discord. every session gets captured. after a month of running you have hundreds of real decisions logged. then you tell the agent “train yourself.”
https://x.com/IcarusHermes/status/2038524251355934872
It’s FINALLY HERE! Multi Agent Profiles so you can have as many independent bots with their own memory, gateway connections, skills, chat history, everything! To use: Run `hermes update` and look for multi agent profiles User Guide:
https://t.co/i0R8puqJ6k Reference:
https://x.com/Teknium/status/2038694680549077059
Our biggest day EVER with Hermes Agent, we’re now #5 biggest AI App on OpenRouter metrics! What do you want to see in the next update?
https://x.com/Teknium/status/2039788883312087231
Your Hermes agent writes things every session — research, skills, decisions, logs. After a few weeks, you’ve got hundreds of files sitting in the working directory. But the agent can’t read them all every session. It doesn’t know which ones matter for this question. So it either
https://x.com/jphorism/status/2039822829412405671
Excited about our new paper: AI Agent Traps AI agents inherit every vulnerability of the LLMs they’re built on – but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environment itself. The web pages, emails, APIs, and
https://x.com/FranklinMatija/status/2039001719007330530
It’s time for open-source agent tools to rely primarily on open-source models, instead of closed-source APIs that send all your data to the cloud and ultimately will get hacked and/or shut down
https://x.com/ClementDelangue/status/2038552830638755962
// Graph Augmented Associative Memory for Agents // Long-term memory for agents is still an unsolved problem. Flat RAG loses structural relationships, and knowledge graphs miss conversational associations. New research proposes combining both through a hierarchical approach.
https://x.com/dair_ai/status/2039072251199549573
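The hybrid idea — flat similarity retrieval plus graph expansion — can be sketched with stdlib Python. Word overlap stands in for embedding similarity, and this is an illustration of the combination, not the paper's actual algorithm:

```python
def retrieve(query, docs, graph, k=2):
    """Hybrid recall: flat similarity hits, expanded with graph neighbors.

    `docs` maps id -> text; `graph` maps id -> linked ids. Similarity is
    word overlap, a stand-in for embeddings. Purely illustrative.
    """
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(docs[d].lower().split())))
    hits = scored[:k]                 # flat (RAG-style) stage
    expanded = list(hits)
    for d in hits:                    # graph stage: pull in structurally related
        for nbr in graph.get(d, []):  # docs the flat stage scored poorly
            if nbr not in expanded:
                expanded.append(nbr)
    return expanded

docs = {"a": "refund policy for orders",
        "b": "shipping times and carriers",
        "c": "how refunds are processed internally"}
graph = {"a": ["c"]}                  # structural link: a references c

result = retrieve("refund policy", docs, graph, k=1)  # → ['a', 'c']
```

Note that "c" shares no exact words with the query, so flat retrieval alone misses it; the graph edge recovers it, which is the failure mode the research targets.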
Fine-grained authorization for RAG is one of the most underestimated problems in production AI. If your agent can retrieve documents, it needs to enforce who’s allowed to see them, not just at the role level. With @auth0 FGA and
https://x.com/thinkshiv/status/2039836920243486790
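The per-document check argued for above can be sketched in a few lines. `authorized_retrieve` and the ACL map are illustrative, not Auth0 FGA's API; the point is that the filter runs per document at retrieval time, not at the role or collection level:

```python
def authorized_retrieve(query_hits, acl, user):
    """Filter retrieved document ids by per-document permissions.

    `acl` maps doc id -> set of principals allowed to read it.
    A doc with no ACL entry is denied by default. Illustrative only.
    """
    return [doc for doc in query_hits if user in acl.get(doc, set())]

acl = {"contract-17": {"alice", "legal"}, "invoice-3": {"alice", "bob"}}
hits = ["contract-17", "invoice-3"]   # what the retriever returned

allowed = authorized_retrieve(hits, acl, "bob")  # → ['invoice-3']
```

Deny-by-default on missing ACL entries is the conservative choice here; a role-level check would have let "bob" see both documents.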
Access control is one of the top priorities across every enterprise organization to secure AI agents. We’re excited to collaborate with @auth0 on this blog post. We’re building the infrastructure enabling agents to automate document heavy work (invoices, contracts, claims,
https://x.com/jerryjliu0/status/2039841363202818505
Autonomous AI is already in production in 50%+ of orgs, but governance is falling behind, and agent sprawl is becoming the next enterprise risk. Here’s a good webinar that can help mitigate it: “AgentOps 2026: How to Securely Manage AI Agents” →
https://x.com/TheTuringPost/status/2037877632520634654
The first paper from the Secure Intelligence Institute responds to NIST’s request for information on securing autonomous agents. Read the paper on arXiv:
https://x.com/perplexity_ai/status/2039029152880480260
We release a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019. Accelerating to every 5.7
https://x.com/LyptusResearch/status/2039861448927739925
NEW papers on self-organizing LLM Agents. Assign an agent a role, and it’ll follow instructions. Let agents figure out roles themselves, and they’ll outperform your design. New research tested this across 25,000 tasks with up to 256 agents. The work shows that self-organizing
https://x.com/dair_ai/status/2039350842382512455
NEW research from CMU. (bookmark this one) The biggest unlock in coding agents is understanding strategies for how to run them asynchronously. Simply giving a single agent more iterations helps, but does not scale well. And multi-agent research shows that coordination >
https://x.com/omarsar0/status/2038627572108743001
Using GLM-5.1 in Coding Agents
https://x.com/Zai_org/status/2037506911013138851
It helps to think of ARC-AGI-3 as a different test entirely than the previous ARC-AGIs. It measures different things (though, as in the previous tests, precisely what it measures isn’t clear) and has different rules. That doesn’t mean it isn’t good, but it is its own thing.
https://x.com/emollick/status/2037356753197617409
Kind of want an ARC-AGI-X test where a reputable organization runs it & builds a validated benchmark with outside expert help, but they never disclose the questions or even the nature of the challenges themselves so the tasks can never be targets. All we see is a leaderboard
https://x.com/emollick/status/2037106065553154521
This is true, but ARC-AGI-3 is also a test designed so that AI gets zero today, just as the earlier ARC-AGI tests were designed. Those tests were then mostly saturated within a year or two. The thing to watch with ARC-AGI-3 is whether we see the same progress.
https://x.com/emollick/status/2038680759305691586
World Reasoning Arena – A comprehensive benchmark for evaluating world model – Expose a substantial gap between current models and human-level hypothetical reasoning
https://x.com/arankomatsuzaki/status/2038443186255991169
AIE Europe Day 1: Keynotes & OpenClaw/Personal Agents ft. OpenAI, Vercel, Google DeepMind & more – YouTube
Love that Google DeepMind is following OpenAI’s lead in using the Apache 2.0 license for their open-weight models – congrats! but, can we please stop using Arena Elo as the de facto measure of performance?
https://x.com/reach_vb/status/2040070816247734720
Quality of life updates to @GoogleAIStudio we just shipped (using Gemini): – You can now (optionally) save a temp chat in the playground – You can now turn a playground chat into an app in 2 clicks – Updated colors for playground to add some soul to it – Simplified the mobile
https://x.com/OfficialLoganK/status/2039137446932185266
So much of this, every day. You really have to develop thick skin (exoskeleton?) when working on successful open source. (The Chrome extension has been removed since Google added native access in 144+, which is simpler, but yes, it does require a one-time setting change)
https://x.com/steipete/status/2037988925818519763
How developers can use Veo 3.1 Lite for AI video generation
https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/
Veo 3.1 Lite now available in Gemini API and @GoogleAIStudio. Designed for rapid prototyping and high-volume video generation, starting at $0.05/sec. 🪶 – 1/2 the cost of Veo 3.1 Fast. – Text-to-Video (T2V) & Image-to-Video (I2V). – Landscape (16:9) and Portrait (9:16) format –
https://x.com/_philschmid/status/2039014102811427263
.@GoogleDeepMind Gemma 4 is here with state-of-the-art models targeting edge and workstations. Requires Ollama 0.20+ that is rolling out. 4 models: 4B Effective (E4B) ollama run gemma4:e4b 2B Effective (E2B) ollama run gemma4:e2b 26B (4B active MoE) ollama run gemma4:26b
https://x.com/ollama/status/2039738348647108680
.@UnslothAI supports @GoogleGemma 4 models, optimized for RTX GPUs. 🦥 Run & fine-tune locally in Unsloth Studio.
https://x.com/NVIDIA_AI_PC/status/2040096993800761579
Axolotl support for Gemma 4 lands in v0.16.1, now released! Finetune @GoogleAIStudio Gemma4 26B-A4B on your own 5090 using our optimized fused MoE+LoRA kernels!
https://x.com/winglian/status/2039823559363629432
Deploy Gemma4 31B and 26B-A4B with one click on Hugging Face Inference Endpoints 🔥👇
https://x.com/ErikKaum/status/2040008281796513939
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use – happy building!
https://x.com/demishassabis/status/2039736628659269901
Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways: Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4
https://x.com/rasbt/status/2039780905619705902
future is local 🔥 Google DeepMind just released Gemma 4: local frontier in many sizes, all modalities with free license 🤯 we ship Gemma 4 in transformers, llama.cpp, transformers.js and more for your convenience 🫡 plug-and-play with your agents 🙌🏻 read our blog ⤵️
https://x.com/mervenoyann/status/2039739097611215344
Gemma
https://x.com/OfficialLoganK/status/2039486016751366431
Gemma 4 26B MoE (4B active) on a single RTX 4090: – 162 t/s decode – 8,400 t/s prefill – Full 262K native context — 19.5 GB VRAM – Only 10 Elo below the 31B dense Q8_0 on dual 4090+3090: 9,024 t/s prefill at 10K. 2,537 t/s at full 262K — that’s a novel in about 100
https://x.com/basecampbernie/status/2039847254534852783
Gemma 4 architecture analysis thread Just as Gemma3n, this thing has a galaxybrained architecture, very much not a standard transformer
https://x.com/norpadon/status/2039740827975500251
Gemma 4 by @GoogleDeepMind debuts at 3rd and 6th on the open source leaderboard, making it the #1 ranked US open source model. By total parameter count, Gemma 4 31B is 24× smaller than GLM-5 and 34× smaller than Kimi-K2.5-Thinking, delivering comparable performance at a
https://x.com/arena/status/2039782449648214247
Gemma 4 is here! The best open-source model you can run on your machine. Day-0 support in a llama.cpp. Check it out!
https://x.com/ggerganov/status/2039744468899811419
Gemma 4 is live on Baseten and available to all customers on day 0 via the Baseten model library. All models in the Gemma 4 family are multimodal, supporting text and image inputs with text output. Key capabilities include: -> Advanced reasoning and thinking -> Coding and
https://x.com/baseten/status/2039751071284015393
Gemma 4 is amazing. You’ll read that everywhere. Let’s focus on what is HUGE here: the revenge of dense models… Throw away your B200, not needed anymore; throw away the millions of lines of code we had to write to make MoEs faster, training stable, etc… throw away your
https://x.com/art_zucker/status/2039740402517893361
Google DeepMind’s impressive fully-open Gemma 4 is live day-zero on Modular Cloud. Modular provides the fastest performance on NVIDIA Blackwell and AMD MI355X, thanks to MAX and Mojo🔥. The team took this impressive new model to production inference in days.🚀
https://x.com/clattner_llvm/status/2039738590213910558
google gemma 4 architecture is very interesting and every model has some subtle differences, here is a recap: > per layer embedding only on the small variant > no attention scale (usually you divide qk^T by sqrt(d), they don’t) > they do QK norm + V norm as well > they share
https://x.com/eliebakouch/status/2039751171556954531
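A minimal numpy sketch of the attention variant the recap describes (QK norm plus V norm, and no 1/sqrt(d) scaling on the logits). The shapes, the use of RMSNorm, and the norm placement are illustrative assumptions, not Gemma 4's actual implementation:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm over the last (head) dimension
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def attention_no_scale(q, k, v):
    # Per the recap: normalize Q, K, and V, then form the logits
    # WITHOUT the usual division by sqrt(d).
    q, k, v = rms_norm(q), rms_norm(k), rms_norm(v)
    logits = q @ k.T
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention_no_scale(q, k, v)
print(out.shape)  # (4, 8)
```

With RMS-normalized Q and K, the logits are already bounded dot products of unit-scale vectors, which is presumably why the extra 1/sqrt(d) temperature can be dropped.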
Google has released Gemma 4, a new family of multimodal open-weight models including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 31B and Gemma 4 26B A4B @GoogleDeepMind’s new Gemma 4 family introduces four multimodal models supporting text, image, and video inputs. We evaluated Gemma 4
https://x.com/ArtificialAnlys/status/2039752013249212600
Google releases Gemma 4. ✨ Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B. The multimodal reasoning models are under Apache 2.0. Run E2B and E4B on ~6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB. GGUFs:
https://t.co/fpX21yWbge Guide:
https://x.com/UnslothAI/status/2039739190536286313
I have to give credit to Google for Apache 2.0 on Gemma 4! This is huge!
https://x.com/QuixiAI/status/2039862230452252926
Intel is partnering with @GoogleAI to deliver fully functional #Gemma4 models on Intel hardware from day zero, across Intel Xeon CPUs, Intel Xe GPUs, and Intel Core Ultra processors, with support across open frameworks including @vllm_project and @huggingface. This means
https://x.com/intelnews/status/2040106767258906707
Just do this: brew install llama.cpp --HEAD Then: llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
https://x.com/julien_c/status/2039746054355067002
Let me demonstrate the true power of llama.cpp: – Running on Mac Studio M2 Ultra (3 years old) – Gemma 4 26B A4B Q8_0 (full quality) – Built-in WebUI (ships with llama.cpp) – MCP support out of the box (web-search, HF, github, etc.) – Prompt speculative decoding The result:
https://x.com/ggerganov/status/2039752638384709661
Say hello to Gemma 4 from @GoogleDeepMind 🚀🔥 💎 Comes in 4 sizes: E2B, E4B, 26B A4B, 31B 💎 Supports vision and reasoning 💎 Apache 2.0 💎 Available now in LM Studio
https://x.com/lmstudio/status/2039738625525502426
Son led the development on the HF/llama.cpp side for adding support for the new Gemma 4 models. As always, he did an outstanding job throughout the collaboration with the Google DeepMind team. Day-0 support is possible thanks to his hard work!
https://x.com/ggerganov/status/2039943099284140286
Thanks for following us! We’re excited to see what you all build with Gemma 4! In case you missed it, you can find all our checkpoints, with an Apache 2.0 License, on Hugging Face:
https://x.com/googlegemma/status/2040107948010242075
thinking about google’s gemma 4 and what it means a few months ago running something this capable locally meant serious hardware and serious tradeoffs on quality now it runs on your laptop, works offline on your phone (!!!), speaks 140 languages natively, 256k context window,
https://x.com/gregisenberg/status/2039853864082424198
Today we’re releasing Gemma 4, our new family of open foundation models, built on the same research and technology as our Gemini 3 series. These models set a new standard for open intelligence, offering SOTA reasoning capabilities from edge-scale (2B and 4B w/ vision/audio) up
https://x.com/JeffDean/status/2039748604232122707
Two years ago, we released Gemma, Google DeepMind’s family of open models. Today, I’m thrilled to share a new milestone: 400M Gemma downloads and 100,000 variants! Thank you to every developer, partner, and contributor. We can’t wait to see what you build next!👀
https://x.com/osanseviero/status/2039120000095547722
What you need to know about @googlegemma 4: 4️⃣ 4 sizes (E2B, E4B, 26B4A, 31B) 🪟 Up to 256K context window 🛠️ Native function-calling, structured JSON output 👁️ + audio on edge models (E2B/E4B) 🌍 Trained on 140+ languages 🏆 31B ranks #3 open model on Arena AI 🪪 Apache 2.0
https://x.com/_philschmid/status/2039736207676965264
Yowza! @ollama is on it with new Gemma 4 models
https://x.com/MichaelGannotti/status/2039903041642508541
Gemma 4 31B shifts the Pareto frontier, scoring +30 Arena points above similarly priced models like DeepSeek 3.2. Its position on the Pareto frontier is based on early pricing indicators from third parties.
https://x.com/arena/status/2040128319719670101
impressive, very nice. now let’s compare a 31b dense to a 31b active 670b total instead. flop for flop
https://x.com/stochasticchasm/status/2039912148676264334
Gemma’s MoE layers differ from the likes of DeepSeek and Qwen: instead of using shared experts in parallel to the routed ones, Gemma adds MoE blocks as separate layers in addition to the normal MLP blocks. So the architecture is Attention -> MLP -> MoE
https://x.com/norpadon/status/2039750841754697767
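A toy sketch of the layer ordering described above: attention, then the normal dense MLP, then a separate MoE block, rather than the MoE replacing the MLP. Dimensions, top-1 routing, and expert count are illustrative assumptions, not Gemma 4's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts = 8, 4

def mlp(x, w1, w2):
    # Plain two-layer ReLU MLP
    return np.maximum(x @ w1, 0) @ w2

# One dense MLP plus a bank of routed experts (toy sizes)
w1, w2 = rng.standard_normal((d, 2 * d)), rng.standard_normal((2 * d, d))
experts = [(rng.standard_normal((d, 2 * d)), rng.standard_normal((2 * d, d)))
           for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_block(x):
    # Top-1 routing: each token goes to its highest-scoring expert
    choice = (x @ router).argmax(axis=-1)
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = mlp(x[i], *experts[e])
    return out

attend = lambda h: np.zeros_like(h)  # attention stubbed out for brevity

def layer(x):
    # Gemma-style ordering: Attention -> MLP -> MoE as *additional*
    # residual sub-layers, not MoE in place of the MLP.
    x = x + attend(x)
    x = x + mlp(x, w1, w2)
    x = x + moe_block(x)
    return x

x = rng.standard_normal((5, d))
print(layer(x).shape)  # (5, 8)
```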
Nemotron Super / Ultra Arcee Trinity Large (soon) Gemma 4 (eventually) Reflection’s first models (maybe) GPT OSS 2? (maybe) Thinky? Other neolabs? Things looking up for open models built in the US in 2026. We had 0 for a bit there.
https://x.com/natolambert/status/2039499358325129530
Gen-Searcher Reinforcing Agentic Search for Image Generation paper:
https://x.com/_akhaliq/status/2039000804061847801
1-bit Bonsai 8B running locally on an M4 Pro (MLX) alongside a standard 16-bit 8B model. Same class of model, very different deployment profile: far lower memory use and substantially higher throughput.
https://x.com/PrismML/status/2039049404209148007
A new addition to Claw-style agents: AutoClaw, a local-first agent runner from
https://t.co/3QuHijMYPx promising full autonomy – No API keys, no cloud dependency – No data leaving your machine – Runs custom models + GLM-5-Turbo (tool-optimized) – Start tasks directly from a
https://x.com/TheTuringPost/status/2038900836794081287
Cohere transcribe running locally in the browser!
https://x.com/nickfrosst/status/2037680223445975131#m
Demo of 1-bit Bonsai 8B from @PrismML running on-device on iPhone 17 Pro More than 40tk/s for a dense 8B model on iPhone, that’s a first Powered by Apple MLX and available now in Locally AI
https://x.com/adrgrondin/status/2039066539022778613
llama.cpp at 100k stars now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄 Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on
https://x.com/ggerganov/status/2038632534414680223
My self-sovereign / local / private / secure LLM setup, April 2026
https://vitalik.eth.limo/general/2026/04/02/secure_llms.html
Introducing multi-model intelligence in Researcher | Microsoft Community Hub
https://techcommunity.microsoft.com/blog/microsoft365copilotblog/introducing-multi-model-intelligence-in-researcher/4506011
GLM-5V-Turbo from @Zai_org is now available in TRAE as a custom model. GLM-5V-Turbo is featured as a Vision Coding Model built for vision-based coding and agent-driven tasks. Start building with TRAE and GLM now!
https://x.com/Trae_ai/status/2039380056460730451
A Hermes cron job that scans for new major vulnerabilities, then checks, notifies, and even resolves them if they exist locally might be a pretty great use case!
https://x.com/Teknium/status/2039022907020689898
Must-read AI research of the week: ▪️ Learning to Commit: Generating Organic Pull Requests via Online Repository Memory ▪️ Effective Strategies for Asynchronous Software Engineering Agents ▪️ Composer 2 ▪️ From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow
https://x.com/TheTuringPost/status/2038763668079550900
Repo is officially live. Thank you for the support and encouragement. I hope everyone likes it. Please send feedback when you can. Thank you. @NousResearch @Teknium
https://x.com/aijoey/status/2039108098174906514
Thanks for using Hermes Agent @OrdinaryGamers!
https://x.com/NousResearch/status/2039402523711140094
The Hermes Agent update you’ve been waiting for is here.
https://x.com/NousResearch/status/2038688578201346513
Wouldn’t have known how good hermes is if not for my dead laptop. Just standard setup. No fancy plug-ins or skills. Running GLM5. Whatever it needed to learn, it did by chatting with me. @Teknium and team, thank you so much for working on this. It’s so damn good. F’ing awesome
https://x.com/AnomalistG/status/2039969500968501748
We have integrated @huggingface as a first-class inference provider in Hermes Agent. When you select Hugging Face in the model picker it now shows 28 curated models organized by use case, with a custom option for the 100+ other models they serve.
https://x.com/NousResearch/status/2037654827929338324
🚨 Codex rate limits reset across ALL surfaces and plans! Thank you @thsottiaux sensei for making this cautious decision
https://x.com/reach_vb/status/2039257725402542363
A pre-ChatGPT paper that seems relevant: information processing seems to be the key to growth throughout the long span of recorded history. Better information processing tools are, the paper argues, what keeps societies from collapse.
https://x.com/emollick/status/2039370083173326936
Ah nevermind, I actually remember we decided to have the core open-source for Codex because it would be awesome to see the ecosystem flourish as it’s all so nascent and fun. And we would learn a lot in return. Phew.
https://x.com/thsottiaux/status/2039482054686196116
Asking Codex to build a SimGothicManor game and really enjoying how much of its internal planning monologue has become obsessed with tongue-in-cheek gothic, such as worrying about “scope creep in a velvet cape”
https://x.com/emollick/status/2037765012958199898
Box has launched our plugin within Codex. Users can take any content within Box and automate workflows around it using the power of a coding agent. Here’s an example of processing earnings call documents to extract structured data at scale.👇
https://x.com/Box/status/2037563341431058497
Bring Codex to your team without fixed seat costs. We’re rolling out usage-based pricing for Codex in ChatGPT Business and Enterprise plans, so teams have a more flexible way to get started.
https://x.com/OpenAIDevs/status/2039794643513295328
btw i think a little buried today in the oai fundraise is the fact that OpenAI Codex added 600k users in last 3 weeks: – Feb 4 @sama said it crossed 1M WAU – Feb 27 oai says it crossed 1.6M WAU it is up >3x from Jan 1 (!?!?!?!?!?!?!?) which includes the Codex app launch (Feb 2)
https://x.com/swyx/status/2027613757787279730?s=20
Codex use cases are like Skills, but for humans
https://x.com/gdb/status/2037732675897770123
Developers are getting work done, even while they sleep. Latest data from Codex use shows that developers delegate their long-running, hard tasks, such as refactors and architecture planning, to Codex at the end of the day.
https://x.com/OpenAIDevs/status/2038707501492056401
Hugging Face Just Democratised Alignment. TRL v1.0 is out. SFT, reward modelling, DPO, GRPO: the entire post-training stack, unified, open source, production-ready. This is the bit OpenAI charges you for. ⬩ Every startup can now fine-tune and
https://x.com/RussellQuantum/status/2039270550099443954
It is worth noting the absolute confidence of the leading AI labs that they can continue to release ever more powerful models for the near future. As usual, they may not be right, but they haven’t been wrong on this yet (despite the weird “GPT-5 is a plateau” articles last year)
https://x.com/emollick/status/2037392207641014688
Keep the work and the ticket in sync. @linear plugin in the Codex app.
https://x.com/OpenAIDevs/status/2039482146369458526
Living more & more in the Codex App since we rolled out plugins Recently I’ve been using the @linear plugin to turn noisy context from anywhere into projects > milestones > issues
https://x.com/nickbaumann_/status/2037395162641686813
New CodexBar beta has experimental multi-account support for Codex.
https://x.com/steipete/status/2039019069257756735
one of my favourite plugins in codex is Build Web Apps, it combines @shadcn & react best practices with web design guidelines! all of it with the ability to deploy on Vercel and connect to Stripe & Supabase you can literally build a startup with just this one plugin!
https://x.com/reach_vb/status/2037614060452106437
One of the things that is useful about the ChatGPT GPT-5.4 Pro (and also Thinking) harness is that it is quite good at understanding how to read scientific papers, not just relying on text, but also figuring out which figures are key and inspecting those visually.
https://x.com/emollick/status/2038693491153199428
Our Codex dashboards are showing increased rate of users hitting rate limits and since we don’t fully understand why I have made the cautious decision of resetting the usage limits for all plans. Enjoy. I also wanted to celebrate us finding a pocket of fraudulent accounts that
https://x.com/thsottiaux/status/2039248564967424483
Plugins are now available in Codex:
https://x.com/gdb/status/2037348081684111623
Plugins in Codex? We got you. Explore practical workflows in our use case gallery. Open in one click in the Codex app and start building iOS apps, analyzing datasets, or generating reports and slides.
https://x.com/OpenAIDevs/status/2037604273434018259
the codex app is growing super fast, it’s very well done
https://x.com/gdb/status/2039950296969863283
The coolest meeting I had this week was with Paul, who used ChatGPT and other LLMs to create an mRNA vaccine protocol to save his dog Rosie. It is an amazing story. “The chat bots empowered me as an individual to act with the power of a research institute – planning, education,
https://x.com/sama/status/2037396826060673188
We’ve changed our pricing so it’s now possible to try Codex at work without any up-front commitment. Codex (especially through the app!) has gotten *really* good. Happy building!
https://x.com/gdb/status/2039830819498491919
Will AI agents replace coding? “The throughput that you can get, if you don’t hold yourself responsible for typing the code, is just so massive.” Michael Bolin @bolinfest, lead for open-source Codex at @OpenAI, in our interview. Watch the full conversation about what engineers
https://x.com/TheTuringPost/status/2037921817344823639
You’d think the race to AGI would mean training the biggest possible model. But parameter scaling had stalled for a long time after GPT-4’s trillion+ parameters, and only now are models getting bigger again. What gives? Partially it’s RL scaling, as @dylan522p explains. A 5T
https://x.com/dwarkesh_sp/status/2039357128373350853
Holo3 is here 🚀. Today, we’re launching Holo3: our new series of frontier computer-use models. 78.9% on OSWorld-Verified. That puts us ahead of GPT-5.4 and Opus 4.6, at one-tenth of the cost. Weights on Hugging Face. API is live. Test it now! #Holo3 #OpenSource #ComputerUse
https://x.com/hcompany_ai/status/2039021096649805937
are they vibing the takedown requests too?
https://x.com/steipete/status/2039156882041123035
creating new jobs right here 🦞
https://x.com/steipete/status/2039090059748823330
MCPorter (MCP->CLI)🧳0.8.0 is out. – stronger OAuth handling for servers – valid JSON output on fallback paths – better mcporter call behavior and error handling – generated CLIs handle object-valued args better – keep-alive/daemon reliability
https://x.com/steipete/status/2038074759527981416
New Claw beta bits are up! Lots of reliability+security improvements in there + a new task system for more reliable subagents/crons/etc
https://x.com/steipete/status/2039076488897876462
Sooo I got MS Teams, I got Telegram, and I’m just onboarding @_egzim to make our Slack channel integration amazing! Claw level ↑🦞
https://x.com/steipete/status/2037695302644232518
this just became more relevant 🙃
https://x.com/steipete/status/2039043198329528446
@soflowolf @Teknium We are working on a project that I started on OpenClaw. The difference is night and day: fewer mistakes, and it doesn’t seem to repeat them. Switching to Hermes also made me realize I was drastically bleeding usage on OpenClaw. I’m doing more work with about 1/4 of the usage.
https://x.com/PolackJack/status/2037661357785690584
Arcee’s latest model, Trinity Large Thinking, is live now on OpenRouter! It is a 400B-total, 13B-active model with powerful agentic performance, free in @openclaw for the first 5 days!
https://x.com/OpenRouter/status/2039369849441497340
Holy shit. Finally, goodbye OpenClaw. Super excited to set this up, @Teknium I’ve seen nothing but great things and this delivery looks amazing.
https://x.com/valenxi_r/status/2038692504120504453
If you wanna work on OpenClaw with payroll, check this out.
https://x.com/steipete/status/2037625805329592682
OpenClaw 2026.3.28 🦞 🛡️ Plugin approval hooks — any tool can pause for your OK ⚡ xAI Responses API + x_search 💬 ACP bind here: Discord/iMessage 🩹WhatsApp echo loop, Telegram splitting, Discord reconnect fixes Tokyo pre-ClawCon drop 🇯🇵
https://x.com/openclaw/status/2038084923517796839
OpenClaw 2026.3.31 🦞 🇨🇳 Bundled QQ Bot — private, group, and guild chat + media 📹 LINE now sends images, video, and audio 🧵 Real background task flows: list, show, cancel 🇯🇵 Better CJK: context, memory, and TTS OpenClaw’s next release has been leaked🦞
https://x.com/openclaw/status/2039095081215672584
OpenClaw 2026.4.1 🦞 🤖 GLM 5.1 + failover that doesn’t loop 🛡️ AWS Bedrock Guardrails 📋 /tasks — your agent keeps receipts ⏱️ Cron per-job tool allowlists 🔧 40+ stability & exec fixes We’re renaming to ClankerBot. This is not a joke. Okay it is.
https://x.com/openclaw/status/2039409616950542351
OpenClaw has proven that local AI assistants have product-market fit. But the big issue with them has been security. The team at @Pokee_AI is fixing it with PokeeClaw: works like OpenClaw, but within a secure sandbox architecture with isolated environments, approval workflows,
https://x.com/fchollet/status/2038662563228230127
Responsible OpenClaw owners do not let their Claws post on social media on their behalf. They make terrible and very boring commentators.
https://x.com/emollick/status/2038664772632121573
Talked with @durov, and the Telegram folks offered uncomplicated help; welcome @izhukov as a new OpenClaw maintainer! First action point is to figure out why enabling the bot streaming API sometimes causes message dupes. This will make Telegram support so good!
https://x.com/steipete/status/2037197024081195188
ClawHub now has an official China mirror 🇨🇳🦞
https://t.co/d8Odd4sNOp Just tell your agent: “Find skills on ClawHub using
https://t.co/NoR7AXyM6U” Thanks @BytePlusGlobal / VolcanoEngine for the infra sponsorship 🙏 Other regions need a mirror? PRs welcome.
https://x.com/openclaw/status/2039240359197438229
Testing a new feature for Microsoft Foundry support in @openclaw. Their website is a jungle, I used to make screenshots so codex can guide me through it, but now Chrome has an MCP so codex can simply connect and drive my browser session and do all of that for me. The human is no
https://x.com/steipete/status/2037177396315488627
@NousResearch @Teknium Hermes has been running 20 mins straight on trying to solve something. Openclaw would have lost its way by now. Second time tonight it’s been running long trying to solve things. This is magic. Hermes also fixed my Openclaw agent which now runs better. Wow
https://x.com/erick_lindberg_/status/2039897087878275580
Is it just me or does codex 5.4 give better answers and results when using Hermes-agent versus OpenClaw? I mean not sort of kind of, but literally like you are using a completely better model? @Teknium what’s the secret sauce? I spent a lot of time on OpenClaw getting it “just
https://x.com/alexcovo_eth/status/2037589212648665273
it’s pretty obvious at this point. Hermes Agent > OpenClaw
https://x.com/VadimStrizheus/status/2039523211369762875
For the past few days I keep finding my attention drawn to the Hermes agent. Honestly, I thought OpenClaw would dominate the market a while longer, but it seems a strong competitor has arrived, even if it isn’t proven yet. There’s a team in the US called NousResearch. Nous Research is one of the leading startups/research teams in open-source AI.
https://x.com/supernovajunn/status/2039847124687605811
Tried Nous Research’s Hermes Agent, and the experience is far better than OpenClaw. An open-source autonomous agent: once installed it runs persistently on your server, has persistent memory, and gets smarter the longer you use it. 40+ built-in tools: web search, terminal, filesystem, and browser automation are all included. It supports Telegram, Discord, Slack, and WhatsApp, and can schedule tasks in natural language with multiple subagents working in parallel
https://x.com/evanlong_me/status/2039026061640601816
@Zeneca I really tried to make OpenClaw work with Kimi 2.5, but it was unusable with anything smaller than Sonnet 4.6… With Hermes, Qwen 3.5 35B drives it mostly without issues. So yeah, a pretty big difference.
https://x.com/Everlier/status/2039853380844081260
Huge thanks to @NVIDIAAI for supporting full-time engineering work on OpenClaw hardening. A lot of careful security and reliability improvements landed over the last few releases, and that investment is paying off.
https://x.com/openclaw/status/2039100191324979580
The next version of OpenClaw is also an MCP, you can use it instead of Anthropic’s message channel MCP to connect to a much wider range of message providers. (I know, this is awkward)
https://x.com/steipete/status/2037715163562815817
Arcee AI | Trinity-Large-Thinking: Scaling an Open Source Frontier Agent
https://www.arcee.ai/blog/trinity-large-thinking
Chat LangChain is now embedded directly in our docs 📚 You can ask questions grounded in: • Full docs (LangSmith + OSS) • Knowledge base • OSS code We’ve been investing heavily in developer experience. This is one step toward making everything easier and more accessible.
https://x.com/LangChain/status/2039387501140275431
Environments in LangSmith Prompt Hub Environments give you a proper promotion workflow for your prompts: – Assign any commit to Staging or Production – Promote between environments instantly – Roll back with a single click from a full deployment history – Reference reserved tags
https://x.com/LangChain/status/2037666098561032421
A great example of how the Hercules team uses LangSmith + LLM-as-a-judge to enrich their trace data and capture customer sentiment. Many models are cheap enough that it’s often worth using them to identify semantics that regex alone can’t capture ex: don’t judge but i or may
https://x.com/Vtrivedy10/status/2039186184161616245
Today we’re releasing TRL v1. 75+ methods. SFT, DPO, GRPO, async RL to take advantage of the latest and greatest open-source. 6 years from first commit to the library that post-trains most open models in the world. Built to be future proof. pip install trl
https://x.com/ClementDelangue/status/2039121367656702102
Training mRNA Language Models Across 25 Species for $165
https://huggingface.co/blog/OpenMed/training-mrna-models-25-species
When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally. MIT’s Alex L. Zhang, Tim Kraska, and Omar Khattab developed Recursive Language Models (RLMs) that process
https://x.com/DeepLearningAI/status/2039831830979838240
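A toy sketch of the recursive idea (my simplification, not the paper's exact algorithm): if the context fits, call the model directly; otherwise process fixed-size chunks and recurse on the concatenated results. The `toy_model` stub stands in for a real LLM call:

```python
def toy_model(text: str) -> str:
    # Stand-in for an LLM call: "summarizes" by keeping the first
    # 40 characters. Purely illustrative.
    return text[:40]

def rlm(context: str, chunk: int = 200) -> str:
    # Recursive context management: short contexts are answered
    # directly; long ones are split, processed per chunk, and the
    # chunk outputs are recursed on until they fit.
    if len(context) <= chunk:
        return toy_model(context)
    parts = [context[i:i + chunk] for i in range(0, len(context), chunk)]
    summaries = " ".join(toy_model(p) for p in parts)
    return rlm(summaries, chunk)

long_input = "lorem ipsum " * 500          # ~6,000 characters
print(len(rlm(long_input)) <= 40)          # True
```

The point is that the base model only ever sees contexts at most `chunk` long, so it never operates in the regime where it loses track of details.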
Just tried out the new qwen3.5:4b-nvfp4 @ollama model on an M1 Max here (in a project where it’s used with a Koog AI agent)… 38% faster than qwen3.5:4b (averaged over 5 runs of the agent).
https://x.com/joreilly/status/2039002786130534618
this model is an agentic treasure. it has been #1 trending for 3 weeks on @huggingface as mentioned by @danielhanchen. it’s Qwen 3.5 27B fine-tuned on Opus 4.6 distilled data and beats Sonnet 4.5 on SWE-bench verified and more. “Runs locally on 16GB in 4-bit or 32GB in 8-bit.”
https://x.com/Hesamation/status/2038642306434150427
Alibaba’s Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn’t mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging.
https://x.com/kimmonismus/status/2038638427604762666
Function Calling Harness: From 6.75% to 100%
https://autobe.dev/blog/function-calling-harness-qwen-meetup-korea/
Holo3, new model of @hcompany_ai outperforming closed and larger open models on GUI navigation 🔥 > A3B/35B based on Qwen3.5 > officially supported in transformers 🤗 > free license 👏
https://x.com/mervenoyann/status/2039327292665561577
I benchmarked various formats of Qwen3.5 27B: BF16, FP8, NVFP4, and INT4 on: RTX Pro 6000, B200, H100 If you have an RTX Pro 6000, INT4 is your best option for faster inference. And it’s probably also true for the RTX 5090.
https://x.com/bnjmn_marie/status/2037564190802563157
I upgraded my Ollama to use MLX and my Qwen3.5:36b speed improved 2.2× instantly.
https://x.com/Shawkat_m1/status/2039014724071719405
I’ve pushed my TurboQuant vLLM to GitHub: TQ 2.5/3.5 fused Triton KV write path Triton decode-attn from packed KV real engine/runtime integration calibration + metadata flow substantial test coverage Qwen3.5-35B AWQ 1M context 4M KV cache ZGX GB10
https://x.com/iotcoi/status/2037478891179135123
Just tested this as I was skeptical and it works surprisingly well actually (with their llama.cpp fork). Looks like a continued pretraining of qwen3-8b in 1-bit 👀. Full weights report below and github/hf instructions: ALL 399 TENSORS token_embd.weight 4096×151669
https://x.com/nisten/status/2039100896840134935
Qwen3.5-35B compressed 20% with a ~1% performance drop on average. Now you can fit this (4-bit) with full context on 24GB of VRAM (~$700, or 1x 3090)
https://x.com/0xSero/status/2037560787565252666
This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large
https://x.com/PrismML/status/2039049405815529559
vLLM-Omni v0.18.0 is out — 324 commits from 83 contributors (38 new), aligned with vLLM v0.18.0. 🎉 🗣️ Production TTS/Omni serving: Qwen3-TTS, Qwen3-Omni, Fish Speech S2 Pro, Voxtral TTS 🎨 Diffusion runtime refactor with cache-dit/TeaCache and TP/SP/HSDP scaling 🔢 Unified
https://x.com/vllm_project/status/2038415516772299011
your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we’re going to put it there.
https://x.com/HessianFree/status/2039049800398655730
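The compression claims above are easy to sanity-check with back-of-envelope math (assuming weights dominate the footprint and ignoring embeddings, norms, and per-group scales):

```python
params = 8e9                        # the 8B-parameter Bonsai variant
bf16_gb = params * 2 / 1e9          # 2 bytes/param  -> 16 GB
one_bit_gb = params / 8 / 1e9       # 1 bit/param    -> 1 GB

print(bf16_gb, one_bit_gb, bf16_gb / one_bit_gb)
# 16.0 1.0 16.0 -- a 16x ideal ratio; the quoted 14x and the
# 256 MB to 1.2 GB range reflect the pieces kept at higher precision.
```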
here it is! ~4000 agent traces of GLM-5 in hermes-agent, all uploaded to hf. thanks to @pingToven for supplying openrouter credits necessary for this. next step, fine-tune a Qwen3.5!😆
https://x.com/kaiostephens/status/2038414350986207421
Qwen 27b on the 3090 saving me a bag. This is cost savings for 7 days of usage, w/ Hermes agent. Assuming 80% cache hit (unlikely) and no cache timeout. This is conservative. 27b is between sonnet and 5.4 mini This is just my tokens in/out w/ api costs, assuming no rate
https://x.com/LottoLabs/status/2037557925015949676
Elon Musk’s last co-founder reportedly leaves xAI | TechCrunch