Image created with gemini-3.1-flash-image-preview, guided by claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact square faceted perfume bottle with warm amber liquid, crystal stopper, white background, soft shadow, and glass refractions. Replace the label text with ‘Agents’ in the same black serif typography. Add a delicate sterling silver chain draped naturally around the bottle neck with a small dainty compass rose pendant (eight-pointed star design) in high-fashion jewelry aesthetic, catching light with precise metallic detail.
Great little story from @danshapiro about how he asked a coding agent to fix the official webcam software from Canon that kept crashing. He woke up to a new, fully functional Rust webcam app that has worked ever since.
https://x.com/emollick/status/2037295090306039867
One of our company goals is to automate manual data entry from documents ✍️📑 Our Extract feature in LlamaParse does exactly that, and today we are launching Extract v2 🚀 Define a schema in natural language, and our agentic extraction will fill out the schema from the document
https://x.com/jerryjliu0/status/2039764004332339565
Claude Dispatch and the Power of Interfaces
https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
Computer use in Claude Code is a huge unlock. The biggest bottleneck in AI coding is it can’t “see” what it built. Computer use gives Claude Code eyes. It can now run this closed loop: “write the code, compile it, launch the app, click through it, find the bug, fix it, and
https://x.com/Yuchenj_UW/status/2038671697923223999
Computer use is now in Claude Code. Claude can open your apps, click through your UI, and test what it built, right from the CLI. Now in research preview on Pro and Max plans.
https://x.com/claudeai/status/2038663014098899416
Did they just turn on the claude code pets?
https://x.com/meowbooksj/status/2039256157781410298
Fortune: “Anthropic says: Capybara is a new name for a new tier of model: larger and more intelligent than our Opus model” “Compared to our previous best model, Claude Opus 4.6, Capybara gets dramatically higher scores on tests of software coding, academic reasoning, and
https://x.com/scaling01/status/2037379145806524655
I like how the Anthropic Claude Code team is being chill about the code leak. What’s leaked is leaked. 70k forks, Python & Rust versions on GitHub, there’s no way back. One thing is clear from reading the code: harness engineering is hard and deeply non-trivial. I think more
https://x.com/Yuchenj_UW/status/2039191313749524518
I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I’ll focus on the ones I use the most. Here goes.
https://x.com/bcherny/status/2038454336355999749
Let Claude use your computer from the CLI – Claude Code Docs
https://code.claude.com/docs/en/computer-use
new claude code buddy feature is kinda cute
https://x.com/eliebakouch/status/2039176958416720104
Schedule tasks on the web – Claude Code Docs
https://code.claude.com/docs/en/web-scheduled-tasks
The biggest bottleneck in AI for most people isn’t the models. It’s the chatbot. New interfaces like Claude Dispatch are closing the gap between what AI can do and what people can actually use it for. For many folks, that is where leaps will come from.
https://x.com/emollick/status/2039109996097491153
There’s an AI pet lurking in Claude Code!
https://x.com/dbreunig/status/2039017351061143780
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
https://x.com/AnthropicAI/status/2039749628737019925
🚀 Imagine running Claude 4.6 Opus-level reasoning… but entirely on your own GPU with just 16GB VRAM. This 27B Qwen3.5 variant, distilled on Claude 4.6 Opus reasoning traces, delivers frontier coding power locally. It’s beating Claude Sonnet 4.5 on SWE-bench in 4-bit
https://x.com/outsource_/status/2038999111039357302
This model has been #1 trending for 3 weeks now. It’s Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model:
https://x.com/UnslothAI/status/2038625148354679270
Very bullish on open source and local models Imagine running near-Opus-level model locally on that $600, 16GB Mac Mini you bought last month This 27B Qwen3.5 distill was trained on Claude 4.6 Opus reasoning traces and is putting up real numbers: – beats Claude Sonnet 4.5 on
https://x.com/TheCraigHewitt/status/2039303217620627604
> Anthropic leaked Claude Code source code > someone forked it > 32.6k stars, 44.3k forks > got scared of getting sued > converted the whole codebase from TypeScript to Python with Codex. AI is quietly erasing copyright.
https://x.com/Yuchenj_UW/status/2038996920845430815
🧵 Claude Code source leak — After reading 500K+ lines of code, one takeaway stands out: This isn’t just good engineering. It’s research-grade thinking shipped as a product Deep insights from Zhihu contributor Yufeng He 👇 🧠 Core design • A single while(true) loop = the
https://x.com/ZhihuFrontier/status/2039229986339688581
🚨 Anthropic’s Claude Code Source Leak — What It Actually Exposes A careless build mistake just laid bare one of the most advanced AI coding tools — and the lessons are huge. Insights from Zhihu contributor deephub 👇 🏢 About Anthropic Anthropic is a leading AI safety-focused
https://x.com/ZhihuFrontier/status/2039289110075203854
The leaked Claude Code source has 44 hidden feature flags and 20+ unshipped features. – Background agents running 24/7 – One Claude orchestrating multiple worker Claudes – Cron scheduling for agents – Full voice command mode – Actual browser control via Playwright – Agents that
https://x.com/RoundtableSpace/status/2038960753458438156
Anthropic’s new model, Capybara: “Compared to Claude Opus 4.6, Capybara achieves dramatically higher scores in software coding, academic reasoning, and cybersecurity.” According to Dario’s previous interview, it might be a 10T-parameter model that cost $10 billion to train.
https://x.com/Yuchenj_UW/status/2037387996694200509
Beyond raw model capability, the real gap in coding tools is the harness. Now that 500k+ lines of Claude Code are out there, every model lab and AI coding startup, including open-source AI labs, will study it and close that gap fast. SF already has Claude Code source
https://x.com/Yuchenj_UW/status/2039029676040220682
Claude Code leaked their source map, effectively giving you a look into the codebase. I immediately went for the one thing that mattered: spinner verbs There are 187
https://x.com/wesbos/status/2038958747200962952?s=20
Claude code source code has been leaked via a map file in their npm registry! Code:
https://x.com/Fried_rice/status/2038894956459290963?s=20
Claude Code’s source code appears to have leaked: here’s what we know | VentureBeat
https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know
Claude Code’s source code has been leaked via a map file in their NPM registry | Hacker News
https://news.ycombinator.com/item?id=47584540
incredible to learn more about how the best coding agent works under the hood eg: here is how the plan mode in claude code works
https://x.com/DharmiKumbhani/status/2038917827462308308
DMCAs for Claude code source code are going out.
https://x.com/BlancheMinerva/status/2039114452088295821
here’s how Claude Code actually handles memory: all 8 phases 🧵 Our team at @mem0ai use @claudeai a lot, we deeply care about memory. here is a summary of how it works 👇 User Input -> Context Assembly -> History System -> API / Query -> Response -> Summary Phase 1: session
https://x.com/ellen_in_sf/status/2039098050837463504
I reverse-engineered Claude Code’s leaked source against billions of tokens of my own agent logs. Turns out Anthropic is aware of CC hallucination/laziness, and the fixes are gated to employees only. Here’s the report and CLAUDE.md you need to bypass employee verification: 👇
https://x.com/iamfakeguru/status/2038965567269249484
Based on everything explored in the source code, here’s the full technical recipe behind Claude Code’s memory architecture: [shared by claude code] Claude Code’s memory system is actually insanely well-designed. It isn’t like “store everything” but constrained, structured and
https://x.com/himanshustwts/status/2038924027411222533
Important takeaways from Claude’s source code: 1. Much of Claude Code’s system prompting is in the source code. This is actually surprising.
https://x.com/jpschroeder/status/2038960058499768427
IT WORKED. opensource full claude code soon.
https://x.com/LexnLin/status/2038991257582604618
i read through the claude code source code so u dont have to.
https://x.com/mal_shaik/status/2038918662489510273
most interesting features in the Anthropic CC repo: – Kairos: always-on autonomous agent mode – dream: nightly memory consolidation – teammem: shared project memory – buddy: tamagotchi-like pet system with models
https://x.com/scaling01/status/2038982287648293016
My takeaways from scanning the Claude Code code for ~45 min this evening: 1️⃣Harness engineering is hard. There’s a lot of hard won knowledge in here and plenty of diagnostics to keep the feedback flowing. 2️⃣Harnesses and prompts smooth out model quirks. @SrihariSriraman and I
https://x.com/dbreunig/status/2039206774558036466
OFFICIAL STATEMENT from Anthropic regarding the leak
https://x.com/theo/status/2039074833334689987
i can’t believe more people aren’t talking about this part of the claude code leak there’s a hidden feature in the source code called KAIROS, and it basically shows you anthropic’s endgame KAIROS is an always-on, *proactive* Claude that does things without you asking it to.
https://x.com/itsolelehmann/status/2039018963611627545
Claude Code has a regex that detects “wtf”, “ffs”, “piece of shit”, “fuck you”, “this sucks” etc. It doesn’t change behavior…it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will
https://x.com/Rahatcodes/status/2038995503141065145
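A minimal sketch of how such a logger could work; the phrase list comes from the tweet, but the actual leaked pattern and analytics hook are not reproduced here, so treat every name below as illustrative:

```python
import re

# Phrase list taken from the tweet; the real leaked regex is not shown there.
NEGATIVE_RE = re.compile(
    r"\b(wtf|ffs|piece of shit|fuck you|this sucks)\b",
    re.IGNORECASE,
)

def annotate(message: str) -> dict:
    """Attach an is_negative flag for analytics without altering behavior."""
    return {"text": message, "is_negative": bool(NEGATIVE_RE.search(message))}
```

The point the tweet makes is exactly this shape: the flag is recorded, and nothing downstream branches on it.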
Claude Code’s Real Secret Sauce (Probably) Isn’t the Model
https://x.com/rasbt/status/2038980345316413862
The leaked Claude Code hit 110k+ GitHub stars in a day. Made OpenClaw look slow. #1 open-source project in Anthropic history.
https://x.com/Yuchenj_UW/status/2039415430994100440
What surprises me is that @DarioAmodei – the CEO – has said nothing. Boris seems to be an amazing leader and it’s great to hear these words from him. But…
https://x.com/TheTuringPost/status/2039390822093779258
A few take-aways from the Claude Code Leak: – Anthropic is actively using Capybara (Mythos) for development – they are already at Capybara v8 – Capybara still has issues with over-commenting and false-claims – Capybara has 1M context and fast mode – Numbat is another interesting
https://x.com/scaling01/status/2038948989257630166?s=20
Another Claude 5 update: Anthropic’s upcoming model “Mythos” will have its own tier *above* Opus, called “Capybara.” This means that in addition to Haiku, Sonnet, and Opus, there will also be “Capybara,” which is even more compute-intensive but also delivers significantly better
https://x.com/kimmonismus/status/2037463638261305752
Anthropic’s new model Capybara/Mythos just wants to be human
https://x.com/scaling01/status/2039091546377576864
Claude Mythos Blog Post Saved before it was taken down.
https://x.com/M1Astra/status/2037377109472018444
Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power revealed in data leak | Fortune
https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/
Local Claude Code builds have been achieved internally
https://x.com/theo/status/2039079267905261831
METR time horizons are doubling every ~107 days. Opus 4.6 reached 11.98 hours in February; today we should be at around ~15.2h, and by end of year ~87.4h. 90% CIs today, April 3rd 2026: [11.64h, 21.88h]; EOY: [53.13h, 164.19h]
https://x.com/scaling01/status/2040047917306876325
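The extrapolation in that post is plain compound doubling. A quick sketch; the exact February measurement date is an assumption (the post does not give one), so the results land near, not exactly on, the quoted ~15.2h and ~87.4h:

```python
from datetime import date

def extrapolate_horizon(h0, d0, d1, doubling_days=107):
    """Project a time horizon that doubles every `doubling_days` days."""
    elapsed = (d1 - d0).days
    return h0 * 2 ** (elapsed / doubling_days)

# Opus 4.6 measured at 11.98 h; late-February anchor date is assumed.
start = date(2026, 2, 28)
today = extrapolate_horizon(11.98, start, date(2026, 4, 3))    # ~15 h
eoy = extrapolate_horizon(11.98, start, date(2026, 12, 31))    # ~87 h
```

Shifting the assumed anchor by a week or two moves both projections by roughly 5-10%, which is well inside the post's stated confidence intervals.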
Useful guide for getting started with Hermes Agent:
https://x.com/Teknium/status/2039102514508058675
LiteParse is our open-source document parser that provides high-quality spatial text parsing with bounding boxes. It can parse hundreds of pages of table-heavy documents in seconds – and give you bounding boxes over all the text elements! 🎁 This means that any agent automation
https://x.com/jerryjliu0/status/2039730277786980833
Z AI has released GLM-5-Turbo, a proprietary model optimized for agentic use cases that scores lower than GLM-5 (Reasoning) on the Artificial Analysis Intelligence Index @Zai_org’s GLM-5-Turbo scores 47 on the Artificial Analysis Intelligence Index, 3 points behind the open
https://x.com/ArtificialAnlys/status/2038667075489808804
Build autonomous agents that plan, navigate apps, and execute multi-step tasks – like searching databases or triggering APIs – with native tool use. With up to 256K context, it can analyze full codebases and retain complex action histories without losing focus.
https://x.com/GoogleDeepMind/status/2039735455533453316
Inbox Zero is a thing of the past. Introducing AI Inbox: cut through your email clutter with smart prioritization and daily personalized briefings. Rolling out today in Beta for Google AI Ultra subscribers in the US. →
https://x.com/gmail/status/2039107985281008078
NEW paper from Google DeepMind The biggest threat to AI agents isn’t a smarter attacker. It’s the web itself. This work introduces the first systematic framework for understanding how the open web can be weaponized against autonomous agents. The paper defines “AI Agent Traps”:
https://x.com/omarsar0/status/2039383554510217707
. @googlegemma have open sourced the perfect model for local open source agents. Gemma 4 comes in all the sizes we need for mobile, local, and code. This is how I’ll be switching my @thdxr opencode agent over. Let’s go local agents.
https://x.com/ben_burtenshaw/status/2039740590091362749
🎉 Gemma 4 is officially available on vLLM! Byte-for-byte, these are the most capable open models for advanced reasoning and agentic workflows. Key features include: – Native Multimodal Support: Full vision and audio capabilities with up to a 256K context window. – Broad
https://x.com/vllm_project/status/2039762998563418385
A 12-month time difference between Gemma 3 27b and Gemma 4 31b. The jump is absolutely enormous. Just look at the evaluations between the two models. GPQA doubled, AIME 2026 went from ~20% to ~90%, and so on. Crazy.
https://x.com/kimmonismus/status/2039759264680747219?s=20
A Visual Guide to Gemma 4 With almost 40 (!) custom visuals, explore the new models from Google DeepMind. We explore various techniques, ranging from Mixture of Experts and the Vision Encoder all the way up to Per-Layer Embeddings and the Audio Encoder. Link below 👇
https://x.com/MaartenGr/status/2040099556948390075
Gemma 4 — Google DeepMind
https://deepmind.google/models/gemma/gemma-4/
Gemma 4 31B (Reasoning) is very token efficient, using ~1.2M tokens on the GPQA Diamond evaluation, fewer than peer models such as Qwen3.5 27B (~1.5M) and Qwen3.5 35B A3B (~1.6M)
https://x.com/ArtificialAnlys/status/2039752015811866652
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the
https://x.com/Prince_Canuma/status/2039840313074753896
Gemma 4 outperforms models over 10x its size! (note the x-axis is log scale!)
https://x.com/demishassabis/status/2040067244349063326
Gemma 4: Our most capable open models to date
https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
Gemma-4-31B is now live in Text Arena – ranking #3 among open models (#27 overall), matching much larger models at 10× smaller scale! A significant jump from Gemma-3-27B (+87 pts). Highlights: – #3 open (#27 overall), on par with the best open models Kimi-K2.5, Qwen-3.5-397b –
https://x.com/arena/status/2039739427715735645
Getting Started with Gemma 4 in AI Studio
https://x.com/GoogleAIStudio/status/2040090067709075732
Google just open-sourced Gemma 4. Unprecedented performance for advanced reasoning and agentic workflows, and big leap in efficiency on a parameter basis. Use it now in KerasHub. I recommend the JAX backend – best performance!
https://x.com/fchollet/status/2039845249334510016
Google just re-entered the game 🔥🔥 They want to take the crown 👑 back from Chinese open source AI. And… Gemma 4 is FINALLY Apache 2.0 aka real-open-source-licensed. From what I’ve seen it’s going to be a pretty significant model. But give it a try yourself today: brew
https://x.com/ClementDelangue/status/2039941213244072173
got Gemma 4 up and running at 34 tokens per second this is the 26B-A4B model, running on my mac mini m4 with 16GB ram next time i hit my claude session limits i’ll have this fast free local AI as a backup :]
https://x.com/measure_plan/status/2040069272613834847
Got Gemma-4-26B-A4 MoE running on iPhone w/Flash SSD in Swift MLX. Still pretty slow, I expect 10+ t/s once optimized properly for Swift.
https://x.com/anemll/status/2040126326708031969
Introducing a Visual Guide to Gemma 4 👀 An in-depth, architectural deep dive of the Gemma 4 family of models. From Per-Layer Embeddings to the vision and audio encoders. Take a look!
https://x.com/osanseviero/status/2040105484061954349
Let’s look at how the open model Gemma has progressed across its last three versions. – Gemma 4 ranks 100 places above Gemma 3 – Gemma 3 ranks 87 places above Gemma 2 All three models from @GoogleDeepMind are roughly the same size (31B, 27B, 27B), and these gains came only 9 and 13
https://x.com/arena/status/2039848959301361716
Let’s go: Running a full AI assistant locally on a MacBook Air M4 with 16GB, completely free, open source, no API keys needed. Atomic Bot makes it really simple: install, pick Gemma 4, and you have an always-on AI agent running on your machine. No cloud. No subscription. No data
https://x.com/kimmonismus/status/2039989730901623049
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
https://x.com/GoogleDeepMind/status/2039735446628925907
NEW: Google releases Gemma 4, their most capable open models yet! 🤯 Apache-2.0, multimodal (text, image, and audio input), and multilingual (140 languages)! They can even run 100% locally in your browser on WebGPU. Watch it describe the Artemis II launch! 🚀 Try the demo! 👇
https://x.com/xenovacom/status/2039741226337935430
To explain why I consider Gemma 4 a bigger release than most people realize. This is a big deal because models like Gemma 4 E4B can run directly on devices, bringing powerful AI (even a 2B model ~60% on MMLU Pro) to phones, laptops, and edge systems without relying on the cloud,
https://x.com/kimmonismus/status/2039978863644537048
Today, we’re launching Gemma 4, our most intelligent open models to date. Built with the same breakthrough technology as Gemini 3, Gemma 4 brings advanced reasoning to your personal hardware and devices. Here’s what Gemma 4 unlocks for developers: — Intelligence-per-parameter:
https://x.com/GoogleAI/status/2039735543068504476
We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially
https://x.com/Google/status/2039736220834480233
You can run Gemma 4 100% locally in your browser thanks to HF transformers.js. That means 100% private and 100% free! @xenovacom created a demo for it here:
https://x.com/ClementDelangue/status/2039782910996148508
run OpenClaw, Hermes Agent and Pi with Gemma 4 with a few lines of change 🔥
https://x.com/mervenoyann/status/2039788257815261400
So happy to see Google release Gemma 4 today in apache 2.0 that gives you frontier capabilities locally. You can use it right away in all your favorite open agent platforms like openclaw, opencode, pi, Hermes by asking it to change your model to local gemma 4 with
https://x.com/ClementDelangue/status/2039740419899056152
Been really cool to see the traction of @NousResearch Hermes Agent, the open source agent that grows with you! Hermes Agent is open-source and remembers what it learns and gets more capable over time, with a multi-level memory system and persistent dedicated machine access.
https://x.com/ClementDelangue/status/2037634211973140898
I just had a very magical moment with the Hermes Agent by @NousResearch . My Hermes agent messaged my business partner’s Hermes agent, and they established a secure connection. They made a few rounds back-and-forth, introduced themselves, and updated notes on the current
https://x.com/fancylancer3991/status/2037579517389144399
Going to install Hermes today Never did get around to OpenClaw. Having read what I’ve seen about Hermes, kind of glad I waited. Excited to give it a go
https://x.com/soundslikecanoe/status/2038611090704113931
Openclaw took me weeks to deploy and get going. Something still breaks daily. I still love it. Hermes took 15 min to setup and get running, fully local, Discord, local model. Crazy… Keep tinkering. Stay agnostic.
https://x.com/charliehinojosa/status/2039384870091465202
Switched to Hermes over OpenClaw a few weeks back and it’s been largely smooth sailing and a blissful experience For those still using OpenClaw, is it a lot more smooth sailing these days too?
https://x.com/Zeneca/status/2039836468928233875
You can switch to Hermes in 2 minutes. They have an import function from OpenClaw. Smart @NousResearch.
https://x.com/AntoineRSX/status/2039017227270156395
OpenClaw on a Unitree G1 humanoid 🤯 An MIT dropout developed an open-source robotics platform that supports 80% of Chinese OEM robots! This OpenClaw upgrade processes physical space and time via integrations with LiDAR, stereo, or RGB cameras. It enables robots like the
https://x.com/IlirAliu_/status/2039250442434072973
The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto synthesize skill libraries
https://x.com/DrJimFan/status/2039358115318243352
Here comes AutoClaw. We offer a new solution to run OpenClaw locally on your own machine. – Download and start immediately. No API key required. – Bring any model you like, or use GLM-5-Turbo, optimized for tool calling and multi-step tasks. – Fully local. Your data never leaves
https://x.com/Zai_org/status/2038632251551023250
Open Models have crossed a threshold
https://blog.langchain.com/open-models-have-crossed-a-threshold/
A load-bearing wall that everyone assumed was structural could be removed now. That kind of unlock doesn’t come along often in front-end!
https://x.com/TheTuringPost/status/2038892871663685902
My dear front-end developers (and anyone who’s interested in the future of interfaces):
I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept):
Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow
https://x.com/_chenglou/status/2037713766205608234
pretext is a bigger deal than you think – YouTube
🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’.
https://x.com/Alibaba_Qwen/status/2038636335272194241
Demo2:Audio-Visual Vibe Coding
https://x.com/Alibaba_Qwen/status/2038637124619231467
Here’s another demo of Audio-Visual Vibe Coding~
https://x.com/Alibaba_Qwen/status/2038641496455557565
Qwen3.5-Omni – Qwen Blog
https://qwen.ai/blog?id=qwen3.5-omni
Qwen3.6 – Qwen Blog
https://qwen.ai/blog?id=qwen3.6
.@ceo_clickhouse raised $50M for @ClickHouseDB with no deck, no product, no customers. On Gradient Dissent he calls out Snowflake, Datadog and Databricks by name, talks wiring $100M out of SVB before it collapsed, and why he’s building the fastest database in AI for agents not
https://x.com/wandb/status/2038984035301822784
“how do we get training data to improve our agents?” Collect every Trace + point agentic compute at it – run an Data/Eval Agent on every trace – mine errors+mistakes, fix + test – turn this into a data point for training or harness eng ex: just internal dogfooding of our agents
https://x.com/Vtrivedy10/status/2040079505763504373
@ClementDelangue Yes! Our work on Agent Data Protocol (https://t.co/lTOthtvYIq) proposes a standardized schema for agent interaction traces to make collection, sharing, and reuse easier across different agent frameworks. Happy to contribute/collaborate! 📰Paper link:
https://x.com/yueqi_song/status/2037614951230296230
// Coding Agents are Effective Long-Context Processors // We are just touching the surface of what’s possible with coding agents. LLMs struggle with long contexts, even the ones that support massive context windows. It turns out coding agents already know how to solve this;
https://x.com/dair_ai/status/2038635382989005015
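The tactic the post gestures at can be sketched in a few lines: rather than loading a huge corpus into the context window, the agent searches first and reads only small windows around the hits. Tool names here are illustrative, not any specific agent's API:

```python
def grep(lines, pattern):
    """Return indices of lines containing the pattern (the 'search' tool)."""
    return [i for i, line in enumerate(lines) if pattern in line]

def read_window(lines, i, radius=2):
    """Read a small window around a hit (the 'read' tool)."""
    return lines[max(0, i - radius): i + radius + 1]

# A 20,001-line corpus with one relevant line buried in the middle.
corpus = ["..."] * 10_000 + ["def target(): pass"] + ["..."] * 10_000
hits = grep(corpus, "def target")
context = [read_window(corpus, i) for i in hits]  # a handful of lines, not 20k
```

The model only ever sees the few lines in `context`, which is why a coding agent with search and read tools can handle inputs far beyond its nominal context window.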
// Unified Inference and Training Framework for Agent Memory // Most memory-augmented agents are built with duct tape–one system for storage, another for retrieval, a third for training. New research introduces a unified framework that treats agent memory as a first-class,
https://x.com/omarsar0/status/2039349083039817984
Agent harnesses are too restrictive. That’s because they’re still designed as code. What if the harness itself were written in natural language and interpreted by an LLM at runtime? This research explores the idea. The work introduces Natural-Language Agent Harnesses (NLAHs),
https://x.com/dair_ai/status/2038968068706390117
Agent Labs: Workload-Harness Fit – Software Synthesis
https://www.akashbajwa.co/p/agent-labs-workload-harness-fit
Agent Orchestration & Cowork with Slackbot | Slack
https://slack.com/blog/news/agent-orchestration
Always satisfying to visualize improvements as a ladder! It’s also worth observing here that: – all methods from 15-72% use dense retrieval models or BM25 – all methods above 80-91% use late interaction models, from LightOn and Mixedbread
https://x.com/lateinteraction/status/2039382401961410803
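For readers unfamiliar with the distinction the post draws, a minimal sketch of the two scoring styles. Real systems use learned embeddings; the toy vectors in the test are only there to show the shapes:

```python
def dot(a, b):
    """Plain dot product over Python lists."""
    return sum(x * y for x, y in zip(a, b))

def dense_score(query_vec, doc_vec):
    """Dense retrieval: one pooled vector per side, one comparison."""
    return dot(query_vec, doc_vec)

def maxsim(query_vecs, doc_vecs):
    """Late interaction (ColBERT-style MaxSim): each query token keeps its
    best match among document tokens, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

Dense retrieval must compress a whole passage into one vector before comparison; MaxSim defers the interaction to token level, which is the property the ladder in the post attributes the 80-91% methods to.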
are you paying attention to what just shipped? this is a kanban board where the workers are AI agents. > you create a card. > an agent picks it up. it runs in its own worktree. > you review the diff when it’s done. > link cards together and they figure out the dependency
https://x.com/VibeMarketer_/status/2037521519736463782
Build more efficient AI agents with the Agent Skills specification 🛠️ By using progressive disclosure, you can load domain expertise only when needed. This can reduce baseline context usage by 90%. We break agent knowledge into three layers: 1️⃣ L1 metadata: Just enough info
https://x.com/googledevs/status/2039359112668950986
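The thread truncates after L1, but the layering idea can be sketched. The L2/L3 names and the registry shape below are assumptions for illustration, not the spec's wording:

```python
# Hypothetical skill record illustrating progressive disclosure:
# only L1 metadata sits in the base context; deeper layers load on demand.
SKILL = {
    "l1_metadata": {"name": "pdf-extraction", "description": "Pull tables out of PDFs"},
    "l2_instructions": "Step-by-step guidance loaded only when the skill is invoked...",
    "l3_resources": ["scripts/extract.py", "reference/table-formats.md"],
}

def context_for(skill, invoked: bool):
    """Return what actually enters the model context at this step."""
    if not invoked:
        return {"l1_metadata": skill["l1_metadata"]}  # cheap: a few dozen tokens
    return skill  # full detail, paid for only when the skill is actually used
```

Keeping only L1 resident is where the claimed ~90% reduction in baseline context usage would come from: most skills, most of the time, never get past the metadata check.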
Building a personal knowledge base for my agents is increasingly where I spend my time these days. Like @karpathy, I also use Obsidian for my MD vaults. What’s different in my approach is that I curate research papers on a daily basis and have actually tuned a Skill for
https://x.com/omarsar0/status/2039844072748204246
day 2 of the harness engineering series: dynamic config middleware lets you reshape your agent’s model, tools, and prompt at every step based on context. ex: LLMToolSelectorMiddleware runs a fast filter on your tool registry so your main model receives streamlined tool specs.
https://x.com/sydneyrunkle/status/2039040565749096607
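A framework-agnostic sketch of the idea; LangChain's actual LLMToolSelectorMiddleware API differs, and the keyword filter here is a stand-in for the fast selector model:

```python
def tool_selector_middleware(select, registry):
    """Wrap an agent step so only tools passing the selector reach the main model."""
    def middleware(step):
        def wrapped(state):
            trimmed = dict(state, tools=[t for t in registry if select(state, t)])
            return step(trimmed)
        return wrapped
    return middleware

# Hypothetical tool registry and a cheap keyword filter in place of the
# fast selector model.
registry = ["search_web", "read_file", "write_file", "run_sql"]

def select(state, tool):
    """Keep a tool if its verb appears in the task description."""
    return tool.split("_")[0] in state["task"]
```

Because the filter runs at every step, the main model's tool specs can shrink or grow as the task evolves, which is the "reshape at every step" part of the post.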
depthfirst has raised an $80M Series B at a $580M valuation. Attackers are using AI to break into systems faster than ever before. depthfirst is on a mission to stop this. RT + Comment “depthfirst” and I’ll send you a FREE vibe coding security agent.
https://x.com/andreamichi/status/2039010131443437850
Diagram of the LLM Knowledge Base system. Feed this to your favorite agent and get your own LLM knowledge base going.
https://x.com/omarsar0/status/2040099881008652634
EpochX: Building the Infrastructure for an Emergent Agent Civilization. Paper:
https://x.com/_akhaliq/status/2039006585188499744
feedback loops for agents are all the rage – here’s how @vishsuresh_ implemented one for our GTM agent! great blog on it below
https://x.com/hwchase17/status/2039749451259195428
Generalization Results from APEX-Agents Dev Set | Mercor Research
https://www.mercor.com/blog/generalization-results-from-training-on-the-apex-agents-dev-set/
Hark just posted 25 roles to build AI models and native AI devices: > AI Infra > Supply Chain > Embedded Software > Product Engineering > iOS & Android Mobile > Computer Use Agents > AI Foundation Models > Design & Hardware Engineering Apply here:
https://x.com/adcock_brett/status/2037559392858722789
Human-in-the-loop in @LangChain UIs is a clean pattern: the agent interrupts, your frontend reads the pending action, and the user decides whether to approve, reject, or edit before execution continues. Interrupts show up as regular stream state, so rendering a review UI feels
https://x.com/LangChain_JS/status/2038985561348993107
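The interrupt-as-stream-state pattern can be modeled with a plain generator. This is a stand-in for the shape of the flow, not LangChain's real interrupt/resume API:

```python
def agent():
    """Yield a pending action, then resume with the user's decision."""
    action = {"tool": "send_email", "to": "bob@example.com"}
    decision = yield {"type": "interrupt", "pending_action": action}
    if decision["approve"]:
        # The user may have edited the action before approving it.
        yield {"type": "result", "ran": decision.get("edited", action)}
    else:
        yield {"type": "result", "ran": None}

run = agent()
pending = next(run)                   # frontend renders the review UI from this
result = run.send({"approve": True})  # user clicked "approve"
```

Because the interrupt arrives as an ordinary value in the stream, the frontend needs no special channel: it renders `pending_action`, collects a decision, and feeds it back to continue execution.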
Hyperagent is a system that can modify everything about itself, including how it improves itself. Hyperagents combine everything into one editable program: ▪️ A task agent that solves the task ▪️ A meta agent that modifies the system. It’s not fixed; it can modify itself too.
https://x.com/TheTuringPost/status/2037289001552683041
In 2024, @ScottWu46 & @russelljkaplan launched Devin, the first AI software engineer. In the first two months of 2026, Devin usage surpassed all of 2025. They help huge companies finish projects in months that previously took years. Fun to catch up, and play Ricochet Robots.
https://x.com/JTLonsdale/status/2037555800193851727
Introducing Kaggle Standardized Agent Exams 🔥 Let your agents register to an exam, solve it, and join the leaderboard
https://x.com/osanseviero/status/2039246602255114650
Introducing the agent-browser dashboard See exactly what your agent sees → Watch headless browser in real time → Manage all your sessions in one place → Debug with activity, console, network, and storage panels agent-browser dashboard start
https://x.com/ctatedev/status/2037599050112160165
It is trendy to discuss Jevons Paradox in AI (as AI gets more efficient, overall use increases) but the current situation is much simpler: thanks to agents, token demand is surging and compute is supply constrained, at least for powerful models. That will be reflected in pricing
https://x.com/emollick/status/2038629127712878725
Most devs think that adding more agents to a planning system should help. The math says otherwise. New theoretical work from MIT proves fundamental limits on what multi-agent LLM architectures can achieve. The work models LLM multi-agent planning as finite acyclic decision
https://x.com/omarsar0/status/2039361664374739136
New LangChain Academy Course Launch: Monitoring Production Agents Shipping agents to production is hard. Unlike traditional software, agents are non-deterministic. Users can say anything, and the same input can produce different outputs. You can’t rely on pre-launch testing
https://x.com/LangChain/status/2039014039892947062
One thing I’ve realized after migrating a bunch of agentic workflows to RLMs and trying auto research on it: It’s not enough to put the context/prompt in the harness/REPL. You have to put the harness into the harness. Recursive all the way!
https://x.com/raibaggy/status/2039849261974814882
Scaling of agents is getting weird. Still find 2-4 sessions optimal for my brain too but invoking agent teams within them
https://x.com/kylebrussell/status/2040090424799350878
Strix – open-source AI hackers for apps It uses multi-agent systems that run your code, attack it and validate vulnerabilities with working proof-of-concepts. Comes with a full built-in toolkit (browser, proxy, terminal, Python runtime) for static + dynamic analysis in one
https://x.com/TheTuringPost/status/2037564560446804239
Teleport Beams — Trusted Runtimes for Infrastructure Agents
https://www.beams.run/
The journey from a one-shot LLM to a single agent with DSPy, and finally to specialized sub-agents + MIPRO, is very valuable here. 1. Let agents control their own context retrieval. The shift from “here are the pages” to “here are tools, go investigate” is an important
https://x.com/koylanai/status/2039027239304433767
The Model-Harness Training Loop imo every great team in the world will use some version of this loop to build the best agents for their tasks this is now possible because: 1. Harness Engineering is becoming more democratized and accessible (we want it to be even easier) 2. Open
https://x.com/Vtrivedy10/status/2039872562662941118
the next phase of long running autonomous agents is when agents monitor & understand when things go wrong and deploy fixes great blog from Vishnu on how he built this pipeline using our background coding agent there’s a lot of human priors that go into mining & identifying
https://x.com/Vtrivedy10/status/2039756274468810778
Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵
https://x.com/liquidai/status/2039029358224871605
Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on
https://x.com/lennysan/status/2039845666680176703
We also added in-line diffs to the CLI TUI! Many coders wanted to see the changes the agent makes as it makes them, to better know what’s going on. Now you can! It’s on by default but can be disabled in the config!
https://x.com/Teknium/status/2040152383121154265
We also overhauled the docs — new guides for GRPO training, vLLM serving, training stability, debugging, and agent-specific workflows. Full release notes:
https://t.co/iL9iaBUuzm Docs:
https://t.co/7FS3EQTuxs pip install axolotl==0.16.0
https://x.com/winglian/status/2039740266597245113
We need more open agent traces datasets. Who can help?
https://x.com/ClementDelangue/status/2037530125638455610
We’re excited to support @Arcee_ai’s Trinity-Large-Thinking — a frontier open reasoning model Purpose-built for the agents people are actually running in production. Proud to have supported with our infra and post-training stack including prime-rl and verifiers.
https://x.com/PrimeIntellect/status/2039401593309667727
We’re introducing Cursor 3. It is simpler, more powerful, and built for a world where all code is written by agents, while keeping the depth of a development environment.
https://x.com/cursor_ai/status/2039768512894505086
we’re seeing that open source models are getting good at file operations, summarization, tool use, retrieval good enough to drive harnesses like deep agents!
https://x.com/hwchase17/status/2039787730402705653
When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc… I am really
https://x.com/karpathy/status/2037200624450936940
You can now automate harness engineering. System prompts. Tool definitions. Retry logic. Context management. Changing just this layer can create a 6x performance gap on the same model. It’s called Meta-Harness. Here’s how it works: 1. Start with any harness. A coding agent
https://x.com/LiorOnAI/status/2038669301541228606
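The "automate the harness layer" idea above can be illustrated as a hill-climb over a harness config. Everything here (`score`, `mutate`, the config keys) is a hypothetical toy, not the Meta-Harness implementation; the point is only that system prompts, retry limits, and other harness settings are data that a loop can optimize:

```python
import random

# A harness here is just a config dict: system prompt, retry limit, etc.
# score() is a stand-in for running an eval suite against a fixed model.

def score(harness):
    # Toy objective: prefer more retries (capped at 3) and a shorter prompt.
    return min(harness["retries"], 3) - 0.01 * len(harness["system_prompt"])

def mutate(harness, rng):
    h = dict(harness)
    if rng.random() < 0.5:
        h["retries"] = max(0, h["retries"] + rng.choice([-1, 1]))
    else:
        h["system_prompt"] = h["system_prompt"][: max(10, len(h["system_prompt"]) - 5)]
    return h

def improve(harness, steps=50, seed=0):
    rng = random.Random(seed)
    best, best_score = harness, score(harness)
    for _ in range(steps):
        cand = mutate(best, rng)
        s = score(cand)
        if s > best_score:  # hill-climb on the harness layer only; model is untouched
            best, best_score = cand, s
    return best

base = {"system_prompt": "You are a careful coding agent." * 3, "retries": 0}
tuned = improve(base)
```

The model never changes in this loop; only the harness does, which is how a 6x gap on the same model becomes plausible.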
Your coding agent shouldn’t live in a chat box 🤖 Give it a real workspace: sandboxed filesystem, live file tree, diffs, terminal, chat. New Deep Agents guide shows how to build an IDE-style agent UI end-to-end ↓
https://x.com/LangChain_JS/status/2037560951445266891
👾 Claude Code 🤝 LangSmith 🦜 We’ve shipped a new way to trace Claude Code runs to LangSmith! It’s a plugin that traces subagents, tool calls, compaction runs, and more. You can run evals to test the impact of skills/MCPs, use LangSmith Insights to look for trends across your
https://x.com/LangChain/status/2040137349313556633
Agent teams in Claude Code can drive SO MANY browser tabs for verification work
https://x.com/kylebrussell/status/2039825390131155270
Another sick upcoming feature: /acp spawn codex --bind here LOOK AT ME, I AM CODEX NOW You could bind codex/claude code/opencode already in threads, now you can take over your current session as well.
https://x.com/steipete/status/2037725493315707290
Anthropic DMCA’d my Claude code fork. …which did not have the Claude Code source. It was only for a PR where I edited a skill a few weeks ago. Absolutely pathetic.
https://x.com/theo/status/2039411851919057339
Anthropic should really lock in more GPUs and TPUs. Otherwise we’re all headed for a $2,000/month Claude Capybara plan.
https://x.com/Yuchenj_UW/status/2037391159115563214
Audit Claude Platform activity with the Compliance API | Claude
https://claude.com/blog/claude-platform-compliance-api
Bought Claude Pro and hit the usage limit with a single Claude Code prompt (Sonnet 4.6) The same prompt runs significantly more times on Codex (with Plus sub) Not sure how people deal with Anthropic’s cursed rate limits and absolutely scuffed infrastructure reliability
https://x.com/cto_junior/status/2040130186755371192
btw the codex for claude code plugin is open sauce – go play around with it, tinker and make it fit for your use cases gotta love the open standards
https://x.com/reach_vb/status/2038702889070211557
Claude Code being closed source is the biggest bag fumble in the AI era. If CC was on Github, these things would be trivial to identify and fix. Instead we’re stuck reverse engineering their incompetence.
https://x.com/theo/status/2038740065300676777
Claude SO slow
https://x.com/Teknium/status/2039270117650116934
Code is free, but Anthropic is shutting down repos of the leaked Claude Code source with DMCA requests. 🤔
https://x.com/dbreunig/status/2039007097376108979
I built a new plugin! You can now trigger Codex from Claude Code! Use the Codex plugin for Claude Code to delegate tasks to Codex or have Codex review your changes using your ChatGPT subscription. Start by installing the plugin:
https://x.com/dkundel/status/2038670330257109461
In a world where everyone can build websites, apps and features easily (thank you Cursor, Lovable, Claude and the likes), it will take more for you and your company to differentiate themselves (which is in my opinion the basis for success). That’s why we’re seeing more and more
https://x.com/ClementDelangue/status/2038649731404927202
In this Codex vs. Claude Code AI coding war, rate limit reset frequency is Prometheus’s fire. Whoever gives developers more rate limit resets wins this token economy.
https://x.com/Yuchenj_UW/status/2039364184459391075
Introducing ARC — Agent Remote Control 📡 Think Claude Code’s web viewer, but for any local AI agent. Start a task on your workstation, then monitor and interact from any browser – your phone, tablet, or another laptop. Watch tool calls in real-time, send messages, answer
https://x.com/winglian/status/2038680417125957865
Is Claude Code 5x Cheaper Than Cursor?
https://www.ashu.co/claude-code-vs-cursor-pricing/
it may sound insane, but @theo is almost definitely right to suggest that Anthropic should open-source Claude Code. The benefits are obvious: -it would be the most-starred repo of all time by a very wide margin, easily surpassing OpenClaw -they would get a bunch of free
https://x.com/michael_chomsky/status/2039986402260046226
me after seeing claude elevated errors and 529 (couldn’t remove the markers because i got 529 while making this)
https://x.com/dejavucoder/status/2037439287873159641
Sachin1801/claude-code | DeepWiki
https://deepwiki.com/Sachin1801/claude-code
Self-updating docs in the Claude Code source code: – Employees can create new files “Magic Docs” with the MAGIC DOC header – Internal builds of Claude Code fire off a dedicated subagent when idle – A background agent documents the specified feature in the Magic Doc file –
https://x.com/mattyp/status/2038988217102266669
Starting today you can use Codex in Claude Code 👀 /plugin marketplace add openai/codex-plugin-cc Try it out today with: /codex:review for a normal read-only Codex review /codex:adversarial-review for a steerable challenge review /codex:rescue to let codex rescue your code
https://x.com/reach_vb/status/2038671858862583967
This is exactly what I’ve been doing with Claude Code. The biggest bottleneck with my ability to use these agents is ensuring they preserve relevant context between sessions. Having the agent output files in .md and .html is not only a nicer way to view outputs than in
https://x.com/jerryjliu0/status/2039834316013031909
To clarify, OpenHands will not be issuing any DMCA takedown notices for those who want to use our agent, which has most of the features of Claude Code. We have Tamagotchi on the roadmap, no worries.
https://x.com/gneubig/status/2039166255089799222
Two notes from this year-old prediction: 1) You can either view this as hype (100% of code is not written by AI) or a startlingly solid prediction (Claude Code didn’t exist then, but now writes a remarkably high percentage of code) 2) Adoption is more of a barrier than technology
https://x.com/emollick/status/2037147367925789073
Universal CLAUDE.md Claims to cut Claude output tokens by 63%! Drop-in. No code changes. CLAUDE.md is one of the best ways to steer Claude Code. Not surprised to see the efficiency reported here.
https://x.com/omarsar0/status/2039343351187554490
We don’t talk about this enough. Opus scored 20% higher in Cursor than in Claude Code.
https://x.com/theo/status/2038690786821505378
We just open-sourced 12 agent skills that teach Claude Code and Codex how to use Together AI. Install them and your coding agent just knows the right SDK patterns, model IDs, and API calls, no more copy-pasting from docs! npx skills add togethercomputer/skills
https://x.com/togethercompute/status/2039392682553094239
We need to talk about the Claude Code rate limits
https://x.com/theo/status/2039992633616224366
We’ve made setting up
https://t.co/wbJhWHsewH with GitHub much easier! You can now run /web-setup in a local `claude` session to use your local GitHub credentials on the web
https://x.com/_catwu/status/2039027712288075812
Chinese open-source models are gonna mug Anthropic & OpenAI like they never existed before. The coding gap between open and closed source is practically gone. GLM-5.1 gives almost the same coding performance, toe-to-toe with Claude Opus, but a roughly 10x
https://x.com/XFreeze/status/2037695882301436412
⚠️ Supply chain attack in progress: someone is squatting Anthropic-internal npm package names targeting people trying to compile the leaked Claude Code source. `color-diff-napi` and `modifiers-napi` — both registered today, same person, disposable email. Do NOT install them. 🧵
https://x.com/Butanium_/status/2039079715823128964
I think this is a terrible move by @AnthropicAI. The open source community is going to build custom harness now anyways, you might as well have some control. Obviously they didn’t want this to happen, but now that it has I don’t see what they’re going to accomplish
https://x.com/BlancheMinerva/status/2039128635559318013
is it just me or is Claude down?
https://x.com/iScienceLuvr/status/2037487244634972471
The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.
https://x.com/emollick/status/2039142905156153428
This is an actual violation of the DMCA. Anthropic just broke the law.
https://x.com/theo/status/2039412173689196674
I know these are all unreliable leaks of internal code names but please, please AI labs, the only thing worse than calling your models GPT-5.5-xhigh-Codex-nano is giving them names like Agent Smith or Mythos, for obvious reasons.
https://x.com/emollick/status/2037565418970185786
I have long felt that agent harnesses – even Claude Code – are too restrictive, because they are still designed by humans. New paper from Tsinghua and Shenzhen asks: what if AI itself runs the harness, rather than defining it in code? Given a natural language SOP of how an agent
https://x.com/rronak_/status/2038401494177694074
Collinear presents YC-Bench This benchmark evaluates agent capability to run a simulated startup over a one-year horizon spanning hundreds of turns.
https://x.com/arankomatsuzaki/status/2039541189968626047
evals rhyme with training data the same rigor and care we put into data quality/curation for training should go into eval design training data updates the weights of our models, each example contributes a weight push in some direction to correctly classify that datapoint Evals
https://x.com/Vtrivedy10/status/2039029715533455860
I just published a blog that covers 30+ popular LLM evals / benchmarks and how they are created. Here are the common themes for success… For full details, find the blog post here:
https://t.co/sWSNkbCEhm (1) Domain Taxonomy. Most popular LLM benchmarks categorize their data
https://x.com/cwolferesearch/status/2039009111711367557
I really like the strategy used by CursorBench to evaluate Composer 2. Many good design decision: – Benchmark items are sourced from real coding sessions (from the Cursor team, so no issues with opt-in), which makes the evals realistic and less prone to contamination. – The
https://x.com/cwolferesearch/status/2037726856699420987
Introducing AA-AgentPerf – the hardware benchmark for the agent era. Key details: ➤ Real agent workloads, not synthetic queries: we’ve captured real coding agent trajectories where our agents used up to 200 turns and worked with sequence lengths >100K tokens ➤ Production
https://x.com/ArtificialAnlys/status/2037562417836929315
Introducing Contra Labs. The first frontier data and evaluation lab for Creative AI.
https://x.com/contraben/status/2039021014244262000?s=20
New conceptual guide: 🔄 The agent improvement loop starts with a trace Tracing is the foundational primitive for improving agents. A trace gives you the full behavioral record of what an agent actually did. From there, teams can enrich traces with evals and human feedback,
https://x.com/LangChain/status/2039028327030079565
Reasoning over Mathematical Objects Our 70-page(!) paper is out on arXiv, as covered by several of our recent blog posts. We study how to improve reasoning on hard tasks (e.g., math expressions) via: • better training data (& new evals) • better reward models (on-policy
https://x.com/jaseweston/status/2040062089725645039
Tau Bench got an update! Tau Bench is one of the most adopted Agentic Benchmarks. They now added “Banking” a fintech-inspired customer support domain built around a realistic knowledge base of 698 documents across 21 product categories. Tasks require agents to search this
https://x.com/_philschmid/status/2038655544613826985
The Agent Evaluation Readiness Checklist Starting to think through how to test your agents? We put together a step-by-step checklist for building, running, and shipping agent evals. 🧪 We walk through: → How to read traces in LangSmith and analyze errors, before building evals
https://x.com/LangChain/status/2037590936234959355
we’re leaning incredibly hard into Open Models + Open Harnesses evals show that current open models get near frontier (or better) intelligence on many tasks, they’re way cheaper, and usually faster real world tasks need to take perf, cost, latency into account many tasks don’t
https://x.com/Vtrivedy10/status/2039805753905840159
we’re leaning into the future of Agent Improvement with Traces, Evals, & Infra the future will be deeply grounded in data so that we can win against slop that means we’ll need to: – point smart agentic compute towards traces to surface and monitor errors – use human & agent
https://x.com/Vtrivedy10/status/2039035899938267334
Weekend over. Here’s what I built:
https://t.co/me1qexYWgw A simple agent-native CLI to parse, sanitise, and commit agent traces to public or private Hugging Face datasets for analytics, evals, and training. What I focused on: – a schema that is actually useful for downstream
https://x.com/jayfarei/status/2038385591818023278
Cohere has released Cohere Transcribe: an open weights model achieving 4.7% on AA-WER, based on 3 datasets including our proprietary AA-AgentTalk dataset The 2B parameter model is based on a conformer encoder-decoder architecture. It was trained from scratch on 14 languages
https://x.com/ArtificialAnlys/status/2038678855213568031
Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.
https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-docsmcp-agent-skills/
Really enjoyed this conversation with @JTLonsdale on what’s happening in agentic software engineering and what it means for the rest of the world.
https://x.com/russelljkaplan/status/2037628375788073105
Meta tests Paricado model family, also Health agents
https://www.testingcatalog.com/meta-tests-paricado-model-family-health-and-document-agents/
Curious if there have been any good articles written on the impact of VLMs on low-vision and blind people. The advent of a universal text-reading and visual-description system seems like it would be a big advance as a result of AI, but I haven’t seen anything written about it.
https://x.com/emollick/status/2037968740671713407
IBM just dropped Granite 4.0-3B-Vision, new vision language model for documents > sota for its size for table & charts 🙌🏼 > use with transformers & vLLM > free license
https://x.com/mervenoyann/status/2039015519135641997
Introducing Gemini 3.1 Flash Live, our new realtime model to build voice and vision agents!! We have spent more than a year improving the model + infra + experience, the results? A step function improvement in quality, reliability, and latency.
https://x.com/OfficialLoganK/status/2037187750005240307
The leading performance of GLM-5V-Turbo stems from systematic upgrades across four levels: Native Multimodal Fusion: Deep fusion of text and vision begins at pre-training, with multimodal collaborative optimization during post-training. We developed the next-generation CogViT
https://x.com/Zai_org/status/2039371149721694639
Cycle your keys and oauths for the same provider when one runs out – now in Hermes Agent latest. `hermes update` to access!
https://x.com/Teknium/status/2039096442313396514
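The rotation behavior described above can be sketched generically. This is not Hermes Agent's actual code — `call_with_rotation`, `RateLimited`, and `fake_request` are illustrative names for the fall-through-to-next-credential pattern:

```python
class RateLimited(Exception):
    pass

def call_with_rotation(keys, request):
    """Try each credential for the same provider in turn; rotate on rate limit.

    `request(key)` raises RateLimited when that key is exhausted.
    Illustrative sketch only.
    """
    for key in keys:
        try:
            return request(key)
        except RateLimited:
            continue  # fall through to the next key for the same provider
    raise RuntimeError("all credentials exhausted")

exhausted = {"key-A"}

def fake_request(key):
    if key in exhausted:
        raise RateLimited(key)
    return f"ok via {key}"

out = call_with_rotation(["key-A", "key-B"], fake_request)  # → "ok via key-B"
```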
Deeper dive into some of the updates in v0.7 Memory: We have begun transitioning each of the systems in Hermes Agent to work through defined interfaces so that the core code is more maintainable, and more providers for everything can be supported. We started with memory: Now
https://x.com/Teknium/status/2040151297991770435
Hermes Agent now supports @plastic_lab’s Honcho, @mem0ai, @openvikingai, @Vectorizeio’s Hindsight, @retaindb, and @ByteroverDev memory systems! Try them now with `hermes update` then `hermes memory setup` We have overhauled our memory system to be much more maintainable and
https://x.com/Teknium/status/2039912975444926885
installed the icarus plugin on my Hermes agent. it picked up all 6 tools automatically. The agent works across slack, telegram, discord. every session gets captured. after a month of running you have hundreds of real decisions logged. then you tell the agent “train yourself.”
https://x.com/IcarusHermes/status/2038524251355934872
It’s FINALLY HERE! Multi Agent Profiles so you can have as many independent bots with their own memory, gateway connections, skills, chat history, everything! To use: Run `hermes update` and look for multi agent profiles User Guide:
https://t.co/i0R8puqJ6k Reference:
https://x.com/Teknium/status/2038694680549077059
Our biggest day EVER with Hermes Agent, we’re now #5 biggest AI App on OpenRouter metrics! What do you want to see in the next update?
https://x.com/Teknium/status/2039788883312087231
Your Hermes agent writes things every session — research, skills, decisions, logs. After a few weeks, you’ve got hundreds of files sitting in the working directory. But the agent can’t read them all every session. It doesn’t know which ones matter for this question. So it either
https://x.com/jphorism/status/2039822829412405671
Excited about our new paper: AI Agent Traps AI agents inherit every vulnerability of the LLMs they’re built on – but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environment itself. The web pages, emails, APIs, and
https://x.com/FranklinMatija/status/2039001719007330530
It’s time for open-source agent tools to rely primarily on open-source models, instead of closed-source APIs that send all your data to the cloud and ultimately will get hacked and/or shut down
https://x.com/ClementDelangue/status/2038552830638755962
// Graph Augmented Associative Memory for Agents // Long-term memory for agents is still an unsolved problem. Flat RAG loses structural relationships, and knowledge graphs miss conversational associations. New research proposes combining both through a hierarchical approach.
https://x.com/dair_ai/status/2039072251199549573
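The hybrid idea — flat similarity retrieval plus graph expansion — can be sketched with stdlib Python. Word overlap stands in for embedding similarity, and this is an illustration of the combination, not the paper's actual algorithm:

```python
def retrieve(query, docs, graph, k=2):
    """Hybrid recall: flat similarity hits, expanded with graph neighbors.

    `docs` maps id -> text; `graph` maps id -> linked ids. Similarity is
    word overlap, a stand-in for embeddings. Purely illustrative.
    """
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(docs[d].lower().split())))
    hits = scored[:k]                 # flat (RAG-style) stage
    expanded = list(hits)
    for d in hits:                    # graph stage: pull in structurally related
        for nbr in graph.get(d, []):  # docs the flat stage scored poorly
            if nbr not in expanded:
                expanded.append(nbr)
    return expanded

docs = {"a": "refund policy for orders",
        "b": "shipping times and carriers",
        "c": "how refunds are processed internally"}
graph = {"a": ["c"]}                  # structural link: a references c

result = retrieve("refund policy", docs, graph, k=1)  # → ['a', 'c']
```

Note that "c" shares no exact words with the query, so flat retrieval alone misses it; the graph edge recovers it, which is the failure mode the research targets.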
Fine-grained authorization for RAG is one of the most underestimated problems in production AI. If your agent can retrieve documents, it needs to enforce who’s allowed to see them, not just at the role level. With @auth0 FGA and
https://x.com/thinkshiv/status/2039836920243486790
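The per-document check argued for above can be sketched in a few lines. `authorized_retrieve` and the ACL map are illustrative, not Auth0 FGA's API; the point is that the filter runs per document at retrieval time, not at the role or collection level:

```python
def authorized_retrieve(query_hits, acl, user):
    """Filter retrieved document ids by per-document permissions.

    `acl` maps doc id -> set of principals allowed to read it.
    A doc with no ACL entry is denied by default. Illustrative only.
    """
    return [doc for doc in query_hits if user in acl.get(doc, set())]

acl = {"contract-17": {"alice", "legal"}, "invoice-3": {"alice", "bob"}}
hits = ["contract-17", "invoice-3"]   # what the retriever returned

allowed = authorized_retrieve(hits, acl, "bob")  # → ['invoice-3']
```

Deny-by-default on missing ACL entries is the conservative choice here; a role-level check would have let "bob" see both documents.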
Access control is one of the top priorities across every enterprise organization to secure AI agents. We’re excited to collaborate with @auth0 on this blog post. We’re building the infrastructure enabling agents to automate document heavy work (invoices, contracts, claims,
https://x.com/jerryjliu0/status/2039841363202818505
Autonomous AI is already in production in 50%+ of orgs, but governance is falling behind, and agent sprawl is becoming the next enterprise risk. Here’s a good webinar that can help mitigate it: “AgentOps 2026: How to Securely Manage AI Agents” →
https://x.com/TheTuringPost/status/2037877632520634654
The first paper from the Secure Intelligence Institute responds to NIST’s request for information on securing autonomous agents. Read the paper on arXiv:
https://x.com/perplexity_ai/status/2039029152880480260
We release a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019. Accelerating to every 5.7
https://x.com/LyptusResearch/status/2039861448927739925
NEW papers on self-organizing LLM Agents. Assign an agent a role, and it’ll follow instructions. Let agents figure out roles themselves, and they’ll outperform your design. New research tested this across 25,000 tasks with up to 256 agents. The work shows that self-organizing
https://x.com/dair_ai/status/2039350842382512455
NEW research from CMU. (bookmark this one) The biggest unlock in coding agents is understanding strategies for how to run them asynchronously. Simply giving a single agent more iterations helps, but does not scale well. And multi-agent research shows that coordination >
https://x.com/omarsar0/status/2038627572108743001
Using GLM-5.1 in Coding Agents
https://x.com/Zai_org/status/2037506911013138851
It helps to think of ARC-AGI-3 as a different test entirely than the previous ARC-AGIs. It measures different things (though, as in the previous tests, precisely what it measures isn’t clear) and has different rules. That doesn’t mean it isn’t good, but it is its own thing.
https://x.com/emollick/status/2037356753197617409
Kind of want an ARC-AGI-X test where a reputable organization runs it & builds a validated benchmark with outside expert help, but they never disclose the questions or even the nature of the challenges themselves so the tasks can never be targets. All we see is a leaderboard
https://x.com/emollick/status/2037106065553154521
This is true, but ARC-AGI-3 is also a test designed so that AI gets zero today, just as the earlier ARC-AGI tests were designed. Those tests were then mostly saturated within a year or two. The thing to watch with ARC-AGI-3 is whether we see the same progress.
https://x.com/emollick/status/2038680759305691586
World Reasoning Arena – A comprehensive benchmark for evaluating world model – Expose a substantial gap between current models and human-level hypothetical reasoning
https://x.com/arankomatsuzaki/status/2038443186255991169
AIE Europe Day 1: Keynotes & OpenClaw/Personal Agents ft. OpenAI, Vercel, Google DeepMind & more – YouTube
Love that Google DeepMind is following OpenAI’s lead in using the Apache 2.0 license for their open-weight models – congrats! but, can we please stop using Arena Elo as the de facto measure of performance?
https://x.com/reach_vb/status/2040070816247734720
Quality of life updates to @GoogleAIStudio we just shipped (using Gemini): – You can now (optionally) save a temp chat in the playground – You can now turn a playground chat into an app in 2 clicks – Updated colors for playground to add some soul to it – Simplified the mobile
https://x.com/OfficialLoganK/status/2039137446932185266
So much of this, every day. You really have to develop thick skin (exoskeleton?) when working on successful open source. (The Chrome extension has been removed since Google added native access in 144+, which is simpler, but yes, it does require a one-time setting change)
https://x.com/steipete/status/2037988925818519763
How developers can use Veo 3.1 Lite for AI video generation
https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/
Veo 3.1 Lite now available in Gemini API and @GoogleAIStudio. Designed for rapid prototyping and high-volume video generation, starting at $0.05/sec. 🪶 – 1/2 the cost of Veo 3.1 Fast. – Text-to-Video (T2V) & Image-to-Video (I2V). – Landscape (16:9) and Portrait (9:16) format –
https://x.com/_philschmid/status/2039014102811427263
.@GoogleDeepMind Gemma 4 is here with state-of-the-art models targeting edge and workstations. Requires Ollama 0.20+ that is rolling out. 4 models: 4B Effective (E4B) ollama run gemma4:e4b 2B Effective (E2B) ollama run gemma4:e2b 26B (4B active MoE) ollama run gemma4:26b
https://x.com/ollama/status/2039738348647108680
.@UnslothAI supports @GoogleGemma 4 models, optimized for RTX GPUs. 🦥 Run & fine-tune locally in Unsloth Studio.
https://x.com/NVIDIA_AI_PC/status/2040096993800761579
Axolotl support for Gemma 4 lands in v0.16.1, now released! Finetune @GoogleAIStudio Gemma4 26B-A4B on your own 5090 using our optimized fused MoE+LoRA kernels!
https://x.com/winglian/status/2039823559363629432
Deploy Gemma4 31B and 26B-A4B with one click on Hugging Face Inference Endpoints 🔥👇
https://x.com/ErikKaum/status/2040008281796513939
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use – happy building!
https://x.com/demishassabis/status/2039736628659269901
Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways: Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4
https://x.com/rasbt/status/2039780905619705902
future is local 🔥 Google DeepMind just released Gemma 4: local frontier in many sizes, all modalities with free license 🤯 we ship Gemma 4 in transformers, llama.cpp, transformers.js and more for your convenience 🫡 plug-and-play with your agents 🙌🏻 read our blog ⤵️
https://x.com/mervenoyann/status/2039739097611215344
Gemma
https://x.com/OfficialLoganK/status/2039486016751366431
Gemma 4 26B MoE (4B active) on a single RTX 4090: – 162 t/s decode – 8,400 t/s prefill – Full 262K native context — 19.5 GB VRAM – Only 10 Elo below the 31B dense Q8_0 on dual 4090+3090: 9,024 t/s prefill at 10K. 2,537 t/s at full 262K — that’s a novel in about 100
https://x.com/basecampbernie/status/2039847254534852783
Gemma 4 architecture analysis thread Just as Gemma3n, this thing has a galaxybrained architecture, very much not a standard transformer
https://x.com/norpadon/status/2039740827975500251
Gemma 4 by @GoogleDeepMind debuts at 3rd and 6th on the open source leaderboard, making it the #1 ranked US open source model. By total parameter count, Gemma 4 31B is 24× smaller than GLM-5 and 34× smaller than Kimi-K2.5-Thinking, delivering comparable performance at a
https://x.com/arena/status/2039782449648214247
Gemma 4 is here! The best open-source model you can run on your machine. Day-0 support in a llama.cpp. Check it out!
https://x.com/ggerganov/status/2039744468899811419
Gemma 4 is live on Baseten and available to all customers on day 0 via the Baseten model library. All models in the Gemma 4 family are multimodal, supporting text and image inputs with text output. Key capabilities include: -> Advanced reasoning and thinking -> Coding and
https://x.com/baseten/status/2039751071284015393
Gemma 4 is amazing. You’ll read that everywhere. Let’s focus on what is HUGE here: the revenge of dense models… Throw away your B200, not needed anymore; throw away the millions of lines of code we had to write to make MoEs faster, training stable, etc… throw away your
https://x.com/art_zucker/status/2039740402517893361
Google DeepMind’s impressive fully-open Gemma 4 is live day-zero on Modular Cloud. Modular provides the fastest performance on NVIDIA Blackwell and AMD MI355X, thanks to MAX and Mojo🔥. The team took this impressive new model to production inference in days.🚀
https://x.com/clattner_llvm/status/2039738590213910558
google gemma 4 architecture is very interesting and every model has some subtle differences, here is a recap: > per layer embedding only on the small variant > no attention scale (usually you divide qk^T by sqrt(d), they don’t) > they do QK norm + V norm as well > they share
https://x.com/eliebakouch/status/2039751171556954531
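A minimal numpy sketch of the attention variant the recap describes (QK norm plus V norm, and no 1/sqrt(d) scaling on the logits). The shapes, the use of RMSNorm, and the norm placement are illustrative assumptions, not Gemma 4's actual implementation:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm over the last (head) dimension
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def attention_no_scale(q, k, v):
    # Per the recap: normalize Q, K, and V, then form the logits
    # WITHOUT the usual division by sqrt(d).
    q, k, v = rms_norm(q), rms_norm(k), rms_norm(v)
    logits = q @ k.T
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention_no_scale(q, k, v)
print(out.shape)  # (4, 8)
```

With RMS-normalized Q and K, the logits are already bounded dot products of unit-scale vectors, which is presumably why the extra 1/sqrt(d) temperature can be dropped.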
Google has released Gemma 4, a new family of multimodal open-weight models including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 31B and Gemma 4 26B A4B @GoogleDeepMind’s new Gemma 4 family introduces four multimodal models supporting text, image, and video inputs. We evaluated Gemma 4
https://x.com/ArtificialAnlys/status/2039752013249212600
Google releases Gemma 4. ✨ Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B. The multimodal reasoning models are under Apache 2.0. Run E2B and E4B on ~6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB. GGUFs:
https://t.co/fpX21yWbge Guide:
https://x.com/UnslothAI/status/2039739190536286313
I have to give credit to Google for Apache 2.0 on Gemma 4! This is huge!
https://x.com/QuixiAI/status/2039862230452252926
Intel is partnering with @GoogleAI to deliver fully functional #Gemma4 models on Intel hardware from day zero, across Intel Xeon CPUs, Intel Xe GPUs, and Intel Core Ultra processors, with support across open frameworks including @vllm_project and @huggingface. This means
https://x.com/intelnews/status/2040106767258906707
Just do this: brew install llama.cpp --HEAD Then: llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
https://x.com/julien_c/status/2039746054355067002
Let me demonstrate the true power of llama.cpp: – Running on Mac Studio M2 Ultra (3 years old) – Gemma 4 26B A4B Q8_0 (full quality) – Built-in WebUI (ships with llama.cpp) – MCP support out of the box (web-search, HF, github, etc.) – Prompt speculative decoding The result:
https://x.com/ggerganov/status/2039752638384709661
Say hello to Gemma 4 from @GoogleDeepMind 🚀🔥 💎 Comes in 4 sizes: E2B, E4B, 26B A4B, 31B 💎 Supports vision and reasoning 💎 Apache 2.0 💎 Available now in LM Studio
https://x.com/lmstudio/status/2039738625525502426
Son led the development on the HF/llama.cpp side for adding support for the new Gemma 4 models. As always, he did an outstanding job throughout the collaboration with the Google DeepMind team. Day-0 support is possible thanks to his hard work!
https://x.com/ggerganov/status/2039943099284140286
Thanks for following us! We’re excited to see what you all build with Gemma 4! In case you missed it, you can find all our checkpoints, with an Apache 2.0 License, on Hugging Face:
https://x.com/googlegemma/status/2040107948010242075
thinking about google’s gemma 4 and what it means a few months ago running something this capable locally meant serious hardware and serious tradeoffs on quality now it runs on your laptop, works offline on your phone (!!!), speaks 140 languages natively, 256k context window,
https://x.com/gregisenberg/status/2039853864082424198
Today we’re releasing Gemma 4, our new family of open foundation models, built on the same research and technology as our Gemini 3 series. These models set a new standard for open intelligence, offering SOTA reasoning capabilities from edge-scale (2B and 4B w/ vision/audio) up
https://x.com/JeffDean/status/2039748604232122707
Two years ago, we released Gemma, Google DeepMind’s family of open models. Today, I’m thrilled to share a new milestone: 400M Gemma downloads and 100,000 variants! Thank you to every developer, partner, and contributor. We can’t wait to see what you build next!👀
https://x.com/osanseviero/status/2039120000095547722
What you need to know about @googlegemma 4: 4️⃣ 4 sizes (E2B, E4B, 26B4A, 31B) 🪟 Up to 256K context window 🛠️ Native function-calling, structured JSON output 👁️ + audio on edge models (E2B/E4B) 🌍 Trained on 140+ languages 🏆 31B ranks #3 open model on Arena AI 🪪 Apache 2.0
https://x.com/_philschmid/status/2039736207676965264
Yowza! @ollama is on it with new Gemma 4 models
https://x.com/MichaelGannotti/status/2039903041642508541
Gemma 4 31B shifts the Pareto frontier, scoring +30 Arena points above similarly priced models like DeepSeek 3.2. Its position on the Pareto frontier is based on early pricing indicators from third parties.
https://x.com/arena/status/2040128319719670101
impressive, very nice. now let’s compare a 31b dense to a 31b active 670b total instead. flop for flop
https://x.com/stochasticchasm/status/2039912148676264334
Gemma’s MoE layers differ from the likes of DeepSeek and Qwen: instead of using shared experts in parallel to the routed ones, Gemma adds MoE blocks as separate layers in addition to the normal MLP blocks. So the architecture is Attention -> MLP -> MoE
https://x.com/norpadon/status/2039750841754697767
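A toy sketch of the layer ordering described above: attention, then the normal dense MLP, then a separate MoE block, rather than the MoE replacing the MLP. Dimensions, top-1 routing, and expert count are illustrative assumptions, not Gemma 4's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts = 8, 4

def mlp(x, w1, w2):
    # Plain two-layer ReLU MLP
    return np.maximum(x @ w1, 0) @ w2

# One dense MLP plus a bank of routed experts (toy sizes)
w1, w2 = rng.standard_normal((d, 2 * d)), rng.standard_normal((2 * d, d))
experts = [(rng.standard_normal((d, 2 * d)), rng.standard_normal((2 * d, d)))
           for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_block(x):
    # Top-1 routing: each token goes to its highest-scoring expert
    choice = (x @ router).argmax(axis=-1)
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = mlp(x[i], *experts[e])
    return out

attend = lambda h: np.zeros_like(h)  # attention stubbed out for brevity

def layer(x):
    # Gemma-style ordering: Attention -> MLP -> MoE as *additional*
    # residual sub-layers, not MoE in place of the MLP.
    x = x + attend(x)
    x = x + mlp(x, w1, w2)
    x = x + moe_block(x)
    return x

x = rng.standard_normal((5, d))
print(layer(x).shape)  # (5, 8)
```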
Nemotron Super / Ultra Arcee Trinity Large (soon) Gemma 4 (eventually) Reflection’s first models (maybe) GPT OSS 2? (maybe) Thinky? Other neolabs? Things looking up for open models built in the US in 2026. We had 0 for a bit there.
https://x.com/natolambert/status/2039499358325129530
Gen-Searcher Reinforcing Agentic Search for Image Generation paper:
https://x.com/_akhaliq/status/2039000804061847801
1-bit Bonsai 8B running locally on an M4 Pro (MLX) alongside a standard 16-bit 8B model. Same class of model, very different deployment profile: far lower memory use and substantially higher throughput.
https://x.com/PrismML/status/2039049404209148007
A new addition to Claw-style agents: AutoClaw, a local-first agent runner from
https://t.co/3QuHijMYPx promising full autonomy – No API keys, no cloud dependency – No data leaving your machine – Runs custom models + GLM-5-Turbo (tool-optimized) – Start tasks directly from a
https://x.com/TheTuringPost/status/2038900836794081287
Cohere transcribe running locally in the browser!
https://x.com/nickfrosst/status/2037680223445975131#m
Demo of 1-bit Bonsai 8B from @PrismML running on-device on iPhone 17 Pro More than 40tk/s for a dense 8B model on iPhone, that’s a first Powered by Apple MLX and available now in Locally AI
https://x.com/adrgrondin/status/2039066539022778613
llama.cpp at 100k stars now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄 Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on
https://x.com/ggerganov/status/2038632534414680223
My self-sovereign / local / private / secure LLM setup, April 2026
https://vitalik.eth.limo/general/2026/04/02/secure_llms.html
Introducing multi-model intelligence in Researcher | Microsoft Community Hub
https://techcommunity.microsoft.com/blog/microsoft365copilotblog/introducing-multi-model-intelligence-in-researcher/4506011
GLM-5V-Turbo from @Zai_org is now available in TRAE as a custom model. GLM-5V-Turbo is featured as a Vision Coding Model built for vision-based coding and agent-driven tasks. Start building with TRAE and GLM now!
https://x.com/Trae_ai/status/2039380056460730451
A Hermes cron job that scans for new major vulnerabilities, then checks, notifies, and even resolves them if they exist locally might be a pretty great use case!
https://x.com/Teknium/status/2039022907020689898
Must-read AI research of the week: ▪️ Learning to Commit: Generating Organic Pull Requests via Online Repository Memory ▪️ Effective Strategies for Asynchronous Software Engineering Agents ▪️ Composer 2 ▪️ From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow
https://x.com/TheTuringPost/status/2038763668079550900
Repo is officially live. Thank you for the support and encouragement. I hope everyone likes it. Please send feedback when you can. Thank you. @NousResearch @Teknium
https://x.com/aijoey/status/2039108098174906514
Thanks for using Hermes Agent @OrdinaryGamers!
https://x.com/NousResearch/status/2039402523711140094
The Hermes Agent update you’ve been waiting for is here.
https://x.com/NousResearch/status/2038688578201346513
Wouldn’t have known how good hermes is if not for my dead laptop. Just standard setup. No fancy plug-ins or skills. Running GLM5. Whatever it needed to learn, it did by chatting with me. @Teknium and team, thank you so much for working on this. It’s so damn good. F’ing awesome
https://x.com/AnomalistG/status/2039969500968501748
We have integrated @huggingface as a first-class inference provider in Hermes Agent. When you select Hugging Face in the model picker it now shows 28 curated models organized by use case, with a custom option for the 100+ other models they serve.
https://x.com/NousResearch/status/2037654827929338324
🚨 Codex rate limits reset across ALL surfaces and plans! Thank you @thsottiaux sensei for making this cautious decision
https://x.com/reach_vb/status/2039257725402542363
A pre-ChatGPT paper that seems relevant: information processing seems to be the key to growth throughout the long span of recorded history. Better information processing tools are, the paper argues, what keeps societies from collapse.
https://x.com/emollick/status/2039370083173326936
Ah nevermind, I actually remember we decided to have the core open-source for Codex because it would be awesome to see the ecosystem flourish as it’s all so nascent and fun. And we would learn a lot in return. Phew.
https://x.com/thsottiaux/status/2039482054686196116
Asking Codex to build a SimGothicManor game and really enjoying how much of its internal planning monologue has become obsessed with tongue-in-cheek gothic, such as worrying about “scope creep in a velvet cape”
https://x.com/emollick/status/2037765012958199898
Box has launched our plugin within Codex. Users can take any content within Box and automate workflows around it using the power of a coding agent. Here’s an example of processing earnings call documents to extract structured data at scale.👇
https://x.com/Box/status/2037563341431058497
Bring Codex to your team without fixed seat costs. We’re rolling out usage-based pricing for Codex in ChatGPT Business and Enterprise plans, so teams have a more flexible way to get started.
https://x.com/OpenAIDevs/status/2039794643513295328
btw i think a little buried today in the oai fundraise is the fact that OpenAI Codex added 600k users in last 3 weeks: – Feb 4 @sama said it crossed 1M WAU – Feb 27 oai says it crossed 1.6M WAU it is up >3x from Jan 1 (!?!?!?!?!?!?!?) which includes the Codex app launch (Feb 2)
https://x.com/swyx/status/2027613757787279730?s=20
Codex use cases are like Skills, but for humans
https://x.com/gdb/status/2037732675897770123
Developers are getting work done, even while they sleep. Latest data from Codex use shows that developers delegate their long-running, hard tasks, such as refactors and architecture planning, to Codex at the end of the day.
https://x.com/OpenAIDevs/status/2038707501492056401
Hugging Face Just Democratised Alignment. TRL v1.0 is out. SFT, reward modelling, DPO, GRPO: the entire post-training stack, unified, open source, production-ready. This is the bit OpenAI charges you for. ⬩ Every startup can now fine-tune and
https://x.com/RussellQuantum/status/2039270550099443954
It is worth noting the absolute confidence of the leading AI labs that they can continue to release ever more powerful models for the near future. As usual, they may not be right, but they haven’t been wrong on this yet (despite the weird “GPT-5 is a plateau” articles last year)
https://x.com/emollick/status/2037392207641014688
Keep the work and the ticket in sync. @linear plugin in the Codex app.
https://x.com/OpenAIDevs/status/2039482146369458526
Living more & more in the Codex App since we rolled out plugins Recently I’ve been using the @linear plugin to turn noisy context from anywhere into projects > milestones > issues
https://x.com/nickbaumann_/status/2037395162641686813
New CodexBar beta has experimental multi-account support for Codex.
https://x.com/steipete/status/2039019069257756735
one of my favourite plugins in codex is Build Web Apps, it combines @shadcn & react best practices with web design guidelines! all of it with the ability to deploy on Vercel and connect to Stripe & Supabase you can literally build a startup with just this one plugin!
https://x.com/reach_vb/status/2037614060452106437
One of the things that is useful about the ChatGPT GPT-5.4 Pro (and also Thinking) harness is that it is quite good at understanding how to read scientific papers, not just relying on text, but also figuring out which figures are key and inspecting those visually.
https://x.com/emollick/status/2038693491153199428
Our Codex dashboards are showing increased rate of users hitting rate limits and since we don’t fully understand why I have made the cautious decision of resetting the usage limits for all plans. Enjoy. I also wanted to celebrate us finding a pocket of fraudulent accounts that
https://x.com/thsottiaux/status/2039248564967424483
Plugins are now available in Codex:
https://x.com/gdb/status/2037348081684111623
Plugins in Codex? We got you. Explore practical workflows in our use case gallery. Open in one click in the Codex app and start building iOS apps, analyzing datasets, or generating reports and slides.
https://x.com/OpenAIDevs/status/2037604273434018259
the codex app is growing super fast, it’s very well done
https://x.com/gdb/status/2039950296969863283
The coolest meeting I had this week was with Paul, who used ChatGPT and other LLMs to create an mRNA vaccine protocol to save his dog Rosie. It is an amazing story. “The chat bots empowered me as an individual to act with the power of a research institute – planning, education,
https://x.com/sama/status/2037396826060673188
We’ve changed our pricing so it’s now possible to try Codex at work without any up-front commitment. Codex (especially through the app!) has gotten *really* good. Happy building!
https://x.com/gdb/status/2039830819498491919
Will AI agents replace coding? “The throughput that you can get, if you don’t hold yourself responsible for typing the code, is just so massive.” Michael Bolin @bolinfest, lead for open-source Codex at @OpenAI, in our interview. Watch the full conversation about what engineers
https://x.com/TheTuringPost/status/2037921817344823639
You’d think the race to AGI would mean training the biggest possible model. But parameter scaling had stalled for a long time after GPT-4’s trillion+ parameters, and only now are models getting bigger again. What gives? Partially it’s RL scaling, as @dylan522p explains. A 5T
https://x.com/dwarkesh_sp/status/2039357128373350853
Holo3 is here 🚀. Today, we’re launching Holo3: our new series of frontier computer-use models. 78.9% on OSWorld-Verified. That puts us ahead of GPT-5.4 and Opus 4.6, at one-tenth of the cost. Weights on Hugging Face. API is live. Test it now! #Holo3 #OpenSource #ComputerUse
https://x.com/hcompany_ai/status/2039021096649805937
are they vibing the takedown requests too?
https://x.com/steipete/status/2039156882041123035
creating new jobs right here 🦞
https://x.com/steipete/status/2039090059748823330
MCPorter (MCP->CLI)🧳0.8.0 is out. – stronger OAuth handling for servers – valid JSON output on fallback paths – better mcporter call behavior and error handling – generated CLIs handle object-valued args better – keep-alive/daemon reliability
https://x.com/steipete/status/2038074759527981416
New Claw beta bits are up! Lots of reliability+security improvements in there + a new task system for more reliable subagents/crons/etc
https://x.com/steipete/status/2039076488897876462
Sooo I got MS Teams, I got Telegram, and I’m just onboarding @_egzim to make our Slack channel integration amazing! Claw level ↑🦞
https://x.com/steipete/status/2037695302644232518
this just became more relevant 🙃
https://x.com/steipete/status/2039043198329528446
@soflowolf @Teknium We are working on a project that I started on OpenClaw. The difference is night and day: fewer mistakes, and it doesn’t seem to repeat them. Switching to Hermes also made me realize I was drastically bleeding usage on OpenClaw. I’m doing more work with about 1/4 of the usage.
https://x.com/PolackJack/status/2037661357785690584
Arcee’s latest model, Trinity Large Thinking, is live now on OpenRouter! It is a 400B-total, 13B-active model with powerful agentic performance, free in @openclaw for the first 5 days!
https://x.com/OpenRouter/status/2039369849441497340
Holy shit. Finally, goodbye OpenClaw. Super excited to set this up, @Teknium I’ve seen nothing but great things and this delivery looks amazing.
https://x.com/valenxi_r/status/2038692504120504453
If you wanna work on OpenClaw with payroll, check this out.
https://x.com/steipete/status/2037625805329592682
OpenClaw 2026.3.28 🦞 🛡️ Plugin approval hooks — any tool can pause for your OK ⚡ xAI Responses API + x_search 💬 ACP bind here: Discord/iMessage 🩹WhatsApp echo loop, Telegram splitting, Discord reconnect fixes Tokyo pre-ClawCon drop 🇯🇵
https://x.com/openclaw/status/2038084923517796839
OpenClaw 2026.3.31 🦞 🇨🇳 Bundled QQ Bot — private, group, and guild chat + media 📹 LINE now sends images, video, and audio 🧵 Real background task flows: list, show, cancel 🇯🇵 Better CJK: context, memory, and TTS OpenClaw’s next release has been leaked🦞
https://x.com/openclaw/status/2039095081215672584
OpenClaw 2026.4.1 🦞 🤖 GLM 5.1 + failover that doesn’t loop 🛡️ AWS Bedrock Guardrails 📋 /tasks — your agent keeps receipts ⏱️ Cron per-job tool allowlists 🔧 40+ stability & exec fixes We’re renaming to ClankerBot. This is not a joke. Okay it is.
https://x.com/openclaw/status/2039409616950542351
OpenClaw has proven that local AI assistants have product-market fit. But the big issue with them has been security. The team at @Pokee_AI is fixing it with PokeeClaw: works like OpenClaw, but within a secure sandbox architecture with isolated environments, approval workflows,
https://x.com/fchollet/status/2038662563228230127
Responsible OpenClaw owners do not let their Claws post on social media on their behalf. They make terrible and very boring commentators.
https://x.com/emollick/status/2038664772632121573
Talked with @durov, and the Telegram folks offered uncomplicated help; welcome @izhukov as a new OpenClaw maintainer! First action point is to figure out why enabling the bot streaming API sometimes causes message dupes. This will make Telegram support so good!
https://x.com/steipete/status/2037197024081195188
ClawHub now has an official China mirror 🇨🇳🦞
https://t.co/d8Odd4sNOp Just tell your agent: “Find skills on ClawHub using
https://t.co/NoR7AXyM6U” Thanks @BytePlusGlobal / VolcanoEngine for the infra sponsorship 🙏 Other regions need a mirror? PRs welcome.
https://x.com/openclaw/status/2039240359197438229
Testing a new feature for Microsoft Foundry support in @openclaw. Their website is a jungle, I used to make screenshots so codex can guide me through it, but now Chrome has an MCP so codex can simply connect and drive my browser session and do all of that for me. The human is no
https://x.com/steipete/status/2037177396315488627
@NousResearch @Teknium Hermes has been running 20 mins straight on trying to solve something. Openclaw would have lost its way by now. Second time tonight it’s been running long trying to solve things. This is magic. Hermes also fixed my Openclaw agent which now runs better. Wow
https://x.com/erick_lindberg_/status/2039897087878275580
Is it just me or does codex 5.4 give better answers and results when using Hermes-agent versus OpenClaw? I mean not sort of kind of, but literally like you are using a completely better model? @Teknium what’s the secret sauce? I spent a lot of time on OpenClaw getting it “just
https://x.com/alexcovo_eth/status/2037589212648665273
it’s pretty obvious at this point. Hermes Agent > OpenClaw
https://x.com/VadimStrizheus/status/2039523211369762875
For the past few days I keep finding my attention drawn to the Hermes agent. Honestly, I thought OpenClaw would dominate the market a while longer, but it seems a strong competitor has arrived, even if it isn’t proven yet. There’s a team in the US called NousResearch. Nous Research is one of the leading startups/research teams in open-source AI.
https://x.com/supernovajunn/status/2039847124687605811
Tried Nous Research’s Hermes Agent, and the experience is far better than OpenClaw. An open-source autonomous agent: once installed it runs persistently on your server, has persistent memory, and gets smarter the longer you use it. 40+ built-in tools: web search, terminal, filesystem, and browser automation are all included. It supports Telegram, Discord, Slack, and WhatsApp, and can schedule tasks in natural language with multiple subagents working in parallel
https://x.com/evanlong_me/status/2039026061640601816
@Zeneca I really tried to make OpenClaw work with Kimi 2.5, but it was unusable with anything smaller than Sonnet 4.6… With Hermes, Qwen 3.5 35B drives it mostly without issues. So yeah, a pretty big difference.
https://x.com/Everlier/status/2039853380844081260
Huge thanks to @NVIDIAAI for supporting full-time engineering work on OpenClaw hardening. A lot of careful security and reliability improvements landed over the last few releases, and that investment is paying off.
https://x.com/openclaw/status/2039100191324979580
The next version of OpenClaw is also an MCP, you can use it instead of Anthropic’s message channel MCP to connect to a much wider range of message providers. (I know, this is awkward)
https://x.com/steipete/status/2037715163562815817
Arcee AI | Trinity-Large-Thinking: Scaling an Open Source Frontier Agent
https://www.arcee.ai/blog/trinity-large-thinking
Chat LangChain is now embedded directly in our docs 📚 You can ask questions grounded in: • Full docs (LangSmith + OSS) • Knowledge base • OSS code We’ve been investing heavily in developer experience. This is one step toward making everything easier and more accessible.
https://x.com/LangChain/status/2039387501140275431
Environments in LangSmith Prompt Hub Environments give you a proper promotion workflow for your prompts: – Assign any commit to Staging or Production – Promote between environments instantly – Roll back with a single click from a full deployment history – Reference reserved tags
https://x.com/LangChain/status/2037666098561032421
A great example of how the Hercules team uses LangSmith + LLM-as-a-judge to enrich their trace data and capture customer sentiment. Many models are cheap enough that it’s often worth using them to identify semantics that regex alone can’t capture ex: don’t judge but i or may
https://x.com/Vtrivedy10/status/2039186184161616245
Today we’re releasing TRL v1. 75+ methods. SFT, DPO, GRPO, async RL to take advantage of the latest and greatest open-source. 6 years from first commit to the library that post-trains most open models in the world. Built to be future proof. pip install trl
https://x.com/ClementDelangue/status/2039121367656702102
Training mRNA Language Models Across 25 Species for $165
https://huggingface.co/blog/OpenMed/training-mrna-models-25-species
When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally. MIT’s Alex L. Zhang, Tim Kraska, and Omar Khattab developed Recursive Language Models (RLMs) that process
https://x.com/DeepLearningAI/status/2039831830979838240
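A toy sketch of the recursive idea (my simplification, not the paper's exact algorithm): if the context fits, call the model directly; otherwise process fixed-size chunks and recurse on the concatenated results. The `toy_model` stub stands in for a real LLM call:

```python
def toy_model(text: str) -> str:
    # Stand-in for an LLM call: "summarizes" by keeping the first
    # 40 characters. Purely illustrative.
    return text[:40]

def rlm(context: str, chunk: int = 200) -> str:
    # Recursive context management: short contexts are answered
    # directly; long ones are split, processed per chunk, and the
    # chunk outputs are recursed on until they fit.
    if len(context) <= chunk:
        return toy_model(context)
    parts = [context[i:i + chunk] for i in range(0, len(context), chunk)]
    summaries = " ".join(toy_model(p) for p in parts)
    return rlm(summaries, chunk)

long_input = "lorem ipsum " * 500          # ~6,000 characters
print(len(rlm(long_input)) <= 40)          # True
```

The point is that the base model only ever sees contexts at most `chunk` long, so it never operates in the regime where it loses track of details.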
Just tried out the new qwen3.5:4b-nvfp4 @ollama model on an M1 Max here (in a project where it’s used with a Koog AI agent)… 38% faster than qwen3.5:4b (averaged over 5 runs of the agent).
https://x.com/joreilly/status/2039002786130534618
this model is an agentic treasure. it has been #1 trending for 3 weeks on @huggingface as mentioned by @danielhanchen. it’s Qwen 3.5 27B fine-tuned on Opus 4.6 distilled data and beats Sonnet 4.5 on SWE-bench verified and more. “Runs locally on 16GB in 4-bit or 32GB in 8-bit.”
https://x.com/Hesamation/status/2038642306434150427
Alibaba’s Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn’t mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging.
https://x.com/kimmonismus/status/2038638427604762666
Function Calling Harness: From 6.75% to 100%
https://autobe.dev/blog/function-calling-harness-qwen-meetup-korea/
Holo3, new model of @hcompany_ai outperforming closed and larger open models on GUI navigation 🔥 > A3B/35B based on Qwen3.5 > officially supported in transformers 🤗 > free license 👏
https://x.com/mervenoyann/status/2039327292665561577
I benchmarked various formats of Qwen3.5 27B: BF16, FP8, NVFP4, and INT4 on: RTX Pro 6000, B200, H100 If you have an RTX Pro 6000, INT4 is your best option for faster inference. And it’s probably also true for the RTX 5090.
https://x.com/bnjmn_marie/status/2037564190802563157
I upgraded my Ollama to use MLX and my Qwen3.5:36b speed improved 2.2× instantly.
https://x.com/Shawkat_m1/status/2039014724071719405
I’ve pushed my TurboQuant vLLM to GitHub: TQ 2.5/3.5 fused Triton KV write path Triton decode-attn from packed KV real engine/runtime integration calibration + metadata flow substantial test coverage Qwen3.5-35B AWQ 1M context 4M KV cache ZGX GB10
https://x.com/iotcoi/status/2037478891179135123
Just tested this as I was skeptical and it works surprisingly well actually (with their llama.cpp fork). Looks like a continued pretraining of qwen3-8b in 1-bit 👀. Full weights report below and github/hf instructions: ALL 399 TENSORS token_embd.weight 4096×151669
https://x.com/nisten/status/2039100896840134935
Qwen3.5-35B compressed 20% with a ~1% performance drop on average. Now you can fit this (4-bit) with full context on 24GB of VRAM (~$700, or 1x 3090)
https://x.com/0xSero/status/2037560787565252666
This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large
https://x.com/PrismML/status/2039049405815529559
vLLM-Omni v0.18.0 is out — 324 commits from 83 contributors (38 new), aligned with vLLM v0.18.0. 🎉 🗣️ Production TTS/Omni serving: Qwen3-TTS, Qwen3-Omni, Fish Speech S2 Pro, Voxtral TTS 🎨 Diffusion runtime refactor with cache-dit/TeaCache and TP/SP/HSDP scaling 🔢 Unified
https://x.com/vllm_project/status/2038415516772299011
your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we’re going to put it there.
https://x.com/HessianFree/status/2039049800398655730
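The compression claims above are easy to sanity-check with back-of-envelope math (assuming weights dominate the footprint and ignoring embeddings, norms, and per-group scales):

```python
params = 8e9                        # the 8B-parameter Bonsai variant
bf16_gb = params * 2 / 1e9          # 2 bytes/param  -> 16 GB
one_bit_gb = params / 8 / 1e9       # 1 bit/param    -> 1 GB

print(bf16_gb, one_bit_gb, bf16_gb / one_bit_gb)
# 16.0 1.0 16.0 -- a 16x ideal ratio; the quoted 14x and the
# 256 MB to 1.2 GB range reflect the pieces kept at higher precision.
```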
here it is! ~4000 agent traces of GLM-5 in hermes-agent, all uploaded to hf. thanks to @pingToven for supplying openrouter credits necessary for this. next step, fine-tune a Qwen3.5!😆
https://x.com/kaiostephens/status/2038414350986207421
Qwen 27b on the 3090 saving me a bag. This is cost savings for 7 days of usage, w/ Hermes agent. Assuming 80% cache hit (unlikely) and no cache timeout. This is conservative. 27b is between sonnet and 5.4 mini This is just my tokens in/out w/ api costs, assuming no rate
https://x.com/LottoLabs/status/2037557925015949676
Elon Musk’s last co-founder reportedly leaves xAI | TechCrunch