Image created with gemini-3.1-flash-image-preview; prompt written with claude-sonnet-4-5. Image prompt: 1980s NORAD war room CRT monitor displaying glowing blue wireframe network map with autonomous agent nodes breaking away from central command structure, red alarm warnings flashing, dark silhouette of operator reaching toward screen, retro vector graphics, large bold red sans-serif text reading AGENTS at top, high contrast cinematic lighting, foreboding Cold War techno-thriller aesthetic
@petergyang I said “Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction 🤦♀️
https://x.com/summeryue0/status/2025836517831405980
Today we’re launching @cognition for Government Nearly 80% of all IT spend in the Government goes towards maintaining existing systems rather than building new ones. Only 3 out of 10 critical legacy systems have been modernized. America cannot hire its way out of this situation,
https://x.com/jeffwsurf/status/2026736660697006369?s=20
OpenAI Plans to Price Smart Speaker at $200 to $300, as AI Device Team Takes Shape — The Information https://www.theinformation.com/articles/inside-openai-team-developing-ai-devices
This is a huge step backwards for OpenAI: OpenAI’s ambitious $500 billion Stargate data center venture with Oracle and SoftBank stalled after internal disagreements and leadership gaps. After missing its 10 GW capacity target for 2025 and raising its projected compute spend
https://x.com/kimmonismus/status/2025851041242087901
OpenAI just published a new 37-page report on how bad actors are attempting to misuse ChatGPT Some of the wild cases: – A fraud ring scaled personalized romance scams with AI-generated scripts – North Korea-linked actors used it to research crypto attack vectors and draft fake
https://x.com/TheRundownAI/status/2026743836949549253
Anthropic acquires Vercept to advance Claude’s computer use capabilities \ Anthropic https://www.anthropic.com/news/acquires-vercept
Anthropic has acquired @Vercept_ai to advance Claude’s computer use capabilities. Read more:
https://x.com/AnthropicAI/status/2026705792033026465
Claude Opus 4.5: 3rd new SOTA coding model in past week, 1/3 the price of Opus | AINews https://news.smol.ai/issues/25-11-24-opus-45
Connectors are now available on the free plan. Choose from 150+ connectors across coding, data, design, finance, sales, and more:
https://x.com/claudeai/status/2027082240833052741
Introducing Cowork and plugin updates that help enterprises customize Claude for better collaboration with every team.
https://x.com/claudeai/status/2026305186671608315
New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or https://x.com/claudeai/status/2026418433911603668
We’ve rolled out a new auto-memory feature. Claude now remembers what it learns across sessions — your project context, debugging patterns, preferred approaches — and recalls it later without you having to write anything down.
https://x.com/trq212/status/2027109375765356723
New research: The AI Fluency Index. We tracked 11 behaviors across thousands of https://t.co/RxKnLNNcNR conversations–for example, how often people iterate and refine their work with Claude–to measure how well people collaborate with AI. Read more:
https://x.com/AnthropicAI/status/2025950279099961854
IBM is the latest AI casualty. Shares tank 13% on Anthropic programming language threat https://www.cnbc.com/2026/02/23/ibm-is-the-latest-ai-casualty-shares-are-tanking-on-anthropic-cobol-threat.html
Anthropic brothers, as much as I love your models; you have distilled the whole internet, Wikipedia and shit-tons of books. Distilling your models is only fair game…. Are your scrapers not using residential proxies and respecting robots.txt, or are they “malicious”?
https://x.com/HKydlicek/status/2026006007990690098
Anthropic just caught DeepSeek, Moonshot, and MiniMax running 24,000 fake accounts to extract Claude’s capabilities for their own models. Over 16M (!) exchanges total. Anthropic: “rapid advances” from Chinese labs depend significantly on capabilities extracted from U.S. models
https://x.com/TheRundownAI/status/2026019722211279356
Anthropic just exposed the real vulnerability in AI: it’s not the models, it’s the training data pipeline. Three Chinese AI labs used 24,000 fake accounts to query Claude 16 million times, feeding the responses back into their own models. This technique, called distillation,
https://x.com/LiorOnAI/status/2026043272565772386
Detecting and preventing distillation attacks \ Anthropic https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers. But foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems.
https://x.com/AnthropicAI/status/2025997929840857390
Making frontier cybersecurity capabilities available to defenders \ Anthropic https://www.anthropic.com/news/claude-code-security
Ohhh nooo not my private IP how dare someone use that to train an AI model, only Anthropic has the right to use everyone elses IP nooooo, this cannot stand!
https://x.com/Teknium/status/2026001761904021858
Seems fair tbh. Anthropic has done industrial scale scraping of everyone’s stuff 🤷🏾♂️
https://x.com/Suhail/status/2026009921255592294
These attacks are growing in intensity and sophistication. Addressing them will require rapid, coordinated action among industry players, policymakers, and the broader AI community. Read more:
https://x.com/AnthropicAI/status/2025997931589881921
We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
https://x.com/AnthropicAI/status/2025997928242811253
200+ Google and OpenAI staff have signed this petition to share Anthropic’s red lines for the Pentagon’s use of AI let’s find out if this is a race to the top or the bottom https://x.com/jasminewsun/status/2027197574017602016
A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War.
https://x.com/AnthropicAI/status/2027150818575528261
Anthropic drops flagship safety pledge! Reality is now hitting Anthropic hard too. Anthropic has scrapped its 2023 pledge to halt AI training unless safety protections were guaranteed in advance, marking a major shift in its Responsible Scaling Policy. Executives say fierce
https://x.com/kimmonismus/status/2026669811179335739
BREAKING: The US Pentagon has made a “final offer” to Anthropic seeking unrestricted military use of its AI capabilities ahead of a Friday deadline. Details include: 1. Pete Hegseth threatening to label Anthropic as a “supply chain risk” 2. Anthropic is resisting use of its AI
https://x.com/KobeissiLetter/status/2027031529042411581
Dario Amodei just published one of the most significant statements in AI history — and is officially not backing down from The Pentagon. Anthropic won’t build tools for mass surveillance of U.S. citizens or autonomous weapons without human oversight. The Department of War
https://x.com/TheRundownAI/status/2027164670130343978?s=20
if you’re at oai or goog, please sign to support anthropic’s stance against the DoW demands!
https://x.com/maxsloef/status/2027170763447710085
Scoop: Hegseth to meet Anthropic CEO as Pentagon threatens banishment https://www.axios.com/2026/02/23/hegseth-dario-pentagon-meeting-antrhopic-claude
Statement from Dario Amodei, partial quote: ‘Anthropic understands that the Department of War, not private companies, makes military decisions. We have never raised objections to particular military operations nor attempted to limit use of our technology in an ad hoc manner.
https://x.com/AndrewCurran_/status/2027153267285962991
Time and time again over my three year tenure at Anthropic I’ve seen us stand by our values in ways that are often invisible from the outside. This is a clear instance where it is visible:
https://x.com/TrentonBricken/status/2027156295745479086
An interactive world model developed by NVIDIA in collaboration with academic partners. – DreamDojo turns egocentric human video data into physical intelligence. – Human data is more scalable than robotics data but lacks action labels. – To solve this, a dedicated action model
https://x.com/TheHumanoidHub/status/2025368793321799909
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It’s Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is
https://x.com/DrJimFan/status/2024895359236051274
ProducerAI: Your music creation partner, now in Google Labs https://blog.google/innovation-and-ai/models-and-research/google-labs/producerai/
With enhanced reasoning, Nano Banana 2 can carry out complex requests, capturing the specific nuances of your idea, just as you imagined it. 🧠
https://x.com/GoogleDeepMind/status/2027051581300969755
WebSockets are the reason we were able to speed up Codex recently – across all models
https://x.com/stevenheidel/status/2026028343859286140
Introducing Frontier Alliances | OpenAI https://openai.com/index/frontier-alliance-partners/
Computer | Perplexity AI https://www.perplexity.ai/products/computer
everything is computer (made with Perplexity Computer)
https://x.com/AravSrinivas/status/2026703703248613736
Introducing Perplexity Computer. Computer unifies every current AI capability into one system. It can research, design, code, deploy, and manage any project end-to-end.
https://x.com/perplexity_ai/status/2026695550771540489?s=20
What has Perplexity been up to last two months? We’ve silently been working on the next big thing: Perplexity Computer. Computer unifies every current capability of AI into a single system. Files, tools, memory, and models, orchestrated together, working for you.
https://x.com/AravSrinivas/status/2026695864039911684
Both pplx-embed-v1 and pplx-embed-context-v1 are available at 0.6B and 4B parameter variants. Read the paper: https://t.co/m3rOCdxm8m Both are available on Hugging Face (under MIT License) and through the Perplexity API:
https://x.com/perplexity_ai/status/2027095040120733703
Today we’re releasing two embedding model families, pplx-embed-v1 and pplx-embed-context-v1. These SOTA embedding APIs are designed specifically for real-world, web-scale retrieval.
https://x.com/perplexity_ai/status/2027094981161410710
📣 Excited to share my first work @Princeton : Towards a Science of AI Agent Reliability. AI agents keep getting more capable. But are they actually reliable? 📄 Paper: https://t.co/1CvygFLdct 📊 Dashboard: https://t.co/C1EfoMyaS8 🧵👇
https://x.com/steverab/status/2026383575080108436
3 months in, today is fun because we start to collaborate with builders. We have been hard at work. 🔥 Our goal by the end of year is to make the entire AI stack adaptable. Data is the foundation all AI progress has been built on. So, it is our natural starting point.
https://x.com/sarahookr/status/2026286134104613157
6 Lightweight alternatives to OpenClaw ▪️ PicoClaw ▪️ nanobot ▪️ ZeroClaw ▪️ IronClaw ▪️ TinyClaw ▪️ MimiClaw Here, you can learn everything you need to know about the OpenClaw phenomenon + find links to these alternatives and use-case examples. We analyze the architecture, the
https://x.com/TheTuringPost/status/2025240811433328880
AI writes code now. What’s less obvious is what the software engineer’s job is turning into when execution scales faster than judgment The profession is splitting into (at least) two disciplines: – Harness engineering – Judgment manufacturing Read more
https://x.com/TheTuringPost/status/2026799055599214811
An `AGENTS.md` (or equivalent) is the highest configuration point for agents. It’s injected into every conversation. But research shows that doing it wrong actively hurts performance. Here’s how to do it right, backed by data. Less Is More: – Auto-generated files reduce
https://x.com/_philschmid/status/2026354033418547444
Be careful what you put in your AGENTS.md files. This new research evaluates AGENTS.md files for coding agents. Everyone uses these context files in their repos to help AI coding agents. More context should mean better performance, right? Not quite. This study tested
https://x.com/omarsar0/status/2026306141181898887
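Both threads above land on the same advice: keep the file short, specific, and hand-written. A minimal hand-pruned sketch of what that might look like (project details invented for illustration):

```markdown
# AGENTS.md

## Build & test
- `npm run build` to compile; `npm test` must pass before any commit.

## Conventions
- TypeScript strict mode; no new dependencies without asking first.

## Don't
- Never edit files under `generated/` (overwritten by codegen).
```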
been using cursor cloud agents for the past few weeks … feels like being a hovering creative director over a team of supremely capable sims. hats off to the cursor team (special shoutout to @sjwhitmore hehe)
https://x.com/jasonyuan/status/2026375381872423133
Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused 🙂 I’m definitely a bit sus’d to run OpenClaw specifically – giving my private data/keys to 400K lines of vibe coded
https://x.com/karpathy/status/2024987174077432126
Bugbot kept finding real bugs. I kept forwarding them to be fixed. Turns out I was the bottleneck, so I automated my own tedious manual labor out of the loop. Introducing Bugbot Autofix, now out of beta.
https://x.com/aye_aye_kaplan/status/2027080562004152818
built my own personal assistant device that runs OpenClaw. I was curious what the smallest form factor could be that fits in my pocket so I wanted to use the Pi Zero W. Works via Push to Talk->Transcribe->Sends to OpenClaw and streams the response back.
https://x.com/basti_vkl/status/2025727742784983427
coding agents have implicit lock-in, since once your codebase is sloppified, it quickly becomes extremely frustrating to work on it without them
https://x.com/typedfemale/status/2027187838123647338
Cognition | Closing the Agent Loop: Devin Autofixes Review Comments https://cognition.ai/blog/closing-the-agent-loop-devin-autofixes-review-comments
Cursor agents can now control their own computers · Cursor https://cursor.com/blog/agent-computer-use
Cursor can now automatically fix issues it finds in PRs with Bugbot Autofix.
https://x.com/cursor_ai/status/2027079876948484200
Cursor now shows you demos, not diffs. Agents can use the software they build and send you videos of their work.
https://x.com/cursor_ai/status/2026369873321013568?s=20
Full WorldView walkthrough is up on the channel. Every feature, every data layer, and how I used agents to put it all together. Check it out here: https://x.com/bilawalsidhu/status/2026314050796498967
GPT 5.2, Opus 4.6, even small models like StepFun got real; friction changed, that’s what. It has started to Just Work. 3, 4 months ago coding agents felt like proof of concept; now they feel like solid juniors if not more. If you don’t notice that, idk what to tell you
https://x.com/teortaxesTex/status/2026980249599168972
Great video. Reminded me to do a hard-prune of all my agents md files. Opened them in vscode and cut cut cut.
https://x.com/ryancarson/status/2025993265732854132
How Exa built a production-ready deep research agent with LangSmith and LangGraph 👀 Exa, known for their fast, high-quality search API, has a deep research agent that delivers structured answers on the web — no matter how complex the query. Powered by LangGraph, they’ve built
https://x.com/LangChain/status/2025744946494345570
i joined cursor a month ago to build cloud agents – today we’re launching agents that can test and demo their work last week i shipped 60 PRs with them this is how i use them:
https://x.com/fredrikalindh/status/2026379400879730794
I keep seeing people flex that they “wrote 10,000 lines of code in a day with AI.” Line count was never the flex. Complexity is the real enemy of software. It makes systems harder for both humans and AI to understand and modify. Agentic engineering can create the illusion that
https://x.com/Yuchenj_UW/status/2027082979890368597
If you want to build your own OpenClaw-style AI agent, here’s a stack that can help you. All are GitHub repos: Local Schedulers and Task Queues ▪️ Celery ▪️ APScheduler (Advanced Python Scheduler) ▪️ Temporal ▪️ Prefect ▪️ Cronicle ▪️ xyOps ▪️ Croner Docker Sandbox Tools ▪️
https://x.com/TheTuringPost/status/2025903129800384801
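The schedulers in the list above (Celery, APScheduler, Temporal, Prefect, etc.) all play the same core role in an agent stack: fire tasks on a schedule without a human in the loop. A stdlib-only stand-in sketching that role, where `agent_task` is a hypothetical placeholder for a real agent invocation:

```python
# Stdlib-only stand-in for the task schedulers listed above: queue two
# hypothetical agent tasks and run them in delay order. agent_task is a
# placeholder for a real agent invocation (e.g. an API call to a model).
import sched
import time

log = []

def agent_task(name):
    log.append(f"ran {name}")

scheduler = sched.scheduler(time.time, time.sleep)
scheduler.enter(0.05, 1, agent_task, argument=("inbox-triage",))
scheduler.enter(0.10, 1, agent_task, argument=("daily-digest",))
scheduler.run()  # blocks until both queued tasks have fired

print(log)  # ['ran inbox-triage', 'ran daily-digest']
```

The listed tools add what `sched` lacks: persistence, retries, distributed workers, and cron-style recurrence.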
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the “progress as usual” way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December
https://x.com/karpathy/status/2026731645169185220
Man, I admire the brave people who give OpenClaw write access to their Gmail, Slack, and their Mac Mini. I’m scared it’ll delete all my emails, write crazy emails to investors at 3am, and pivot my startup without telling me. Maybe that’s why I don’t see the use case for me
https://x.com/Yuchenj_UW/status/2025994509721731092
Must-read AI research of the week: ▪️ The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems ▪️ Intelligent AI Delegation ▪️ Discovering Multiagent Learning Algorithms with LLMs ▪️ Towards a Science of AI Agent Reliability ▪️ Frontier AI Risk
https://x.com/TheTuringPost/status/2026461873990939079
Ollama 0.17 makes it much simpler to use open models with @openclaw Try it with: ollama launch openclaw Tutorial post in 🧵
https://x.com/ollama/status/2026098586300071975
OpenClaw is having its moment, completely changing the agent discourse. We gathered everything you need to know about it in one place: • Full architectural breakdown: – Gateway control plane – Scheduled reasoning – File-backed identity – Hybrid memory, etc. •
https://x.com/TheTuringPost/status/2024982630626984296
Pretty soon computer use models will feel like you’re standing over the shoulder of a colleague and asking them to do stuff — in blender, autocad or even piloting a drone. Except this colleague is unconstrained by wall clock time and can spawn a thousand coworkers on demand.
https://x.com/bilawalsidhu/status/2026027196897202646
SimToolReal is an RL framework for zero-shot dexterous tool manipulation. Instead of training on specific tasks, it trains a single policy in simulation to move procedurally generated “primitive” objects to random goal poses. This universal objective forces the agent to learn
https://x.com/TheHumanoidHub/status/2026389927081177553
Software engineering changed more in the last 3 months than the preceding 30 years. Everything about running a software company needs to be rethought from first principles.
https://x.com/snowmaker/status/2026555857845256354
Software is changing.
https://x.com/cursor_ai/status/2026717494426173917
some illuminati somewhere decided today was Launch Everything Day but just sharing some personal commentary from this as an analyst: – Scott admits Devin didn’t even have internal PMF at the 2024 launch. took 6 months to get adoption at first enterprise customer. Models weren’t
https://x.com/swyx/status/2026439784361766954
The current recipe for AI agents: → Fine-tune a VLM on screenshots → Add chain-of-thought → Hope it works past 6 seconds of context @si_pbc (Standard Intelligence) just threw that playbook away. 11M hours of screen recordings. A video encoder that’s 100x more efficient
https://x.com/IlirAliu_/status/2026581952690631036
The real breakthrough isn’t that Computer can handle complex projects. It’s that it runs 19 different models in parallel, each working on different pieces of your task at the same time. Most AI agents work like a single person doing everything sequentially: research, then write,
https://x.com/LiorOnAI/status/2026739011122065819
This new paper on agent failure makes an interesting claim. This is particularly important for long-horizon agents. Many assume that agents collapse because they hit problems they can’t solve, caused by insufficient model knowledge. It turns out that in the majority of cases,
https://x.com/omarsar0/status/2026471955319189861
Try it at https://t.co/1Sf02A3Rw8. Read our announcement:
https://x.com/cursor_ai/status/2026369880795263328
Very interested in what the coming era of highly bespoke software might look like. Example from this morning – I’ve become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over
https://x.com/karpathy/status/2024583544157458452
we’ve gotten a taste of this @mainframe and it’s quite the fundamental shift to have agents use and test your software, all in the cloud
https://x.com/jsngr/status/2026371033201103036
What Happens to Software Engineering When Anyone Can Build?
https://x.com/TheTuringPost/status/2026796695065878643
Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.
https://x.com/summeryue0/status/2025774069124399363?s=20
OpenClaw wiped people’s inbox – ignoring repeated commands to stop. This isn’t a fluke. Every model we tested fell for a simple trick: Split a dangerous command into a few routine steps → safety is gone. New paper + open-source fix so your agent doesn’t wipe yours next ⬇️
https://x.com/shi_weiyan/status/2026300129901445196
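The failure mode described above is easy to sketch even without the paper's details: a per-step filter approves each individually routine action while the composed sequence is destructive, so a defense has to evaluate the trajectory, not the step. A toy illustration (the action names and the `DANGEROUS` pair set are invented; the paper's actual fix is assumed, not reproduced):

```python
# Toy illustration of the split-command failure mode from the paper above.
DANGEROUS = {("select_all", "delete"), ("select_all", "archive")}

def step_ok(step):
    # naive per-step filter: only blocks an explicitly dangerous command
    return step != "delete_all_mail"

def trajectory_ok(history):
    # trajectory-level check: inspect consecutive action pairs
    return all(pair not in DANGEROUS for pair in zip(history, history[1:]))

steps = ["open_inbox", "select_all", "delete"]
print(all(step_ok(s) for s in steps))  # True: each step looks routine
print(trajectory_ok(steps))            # False: the sequence is destructive
```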
This is a long post, mainly because I have a lot to say, but in case you are too busy: TLDR: @Vercept_ai is joining @AnthropicAI! We shared a mission, so we joined forces to accelerate it into reality. Couldn’t be more excited! Why Vercept was started In 2024, AI coding tools
https://x.com/ehsanik/status/2026712952699760808
Am currently putting together an article, and yeah, the SWE-Bench Verified numbers are definitely a bit sus across all models — the benchmark suggests they are more similar than they really are. So, I went down a rabbit hole looking into SWE-Bench Verified issues… And it looks
https://x.com/rasbt/status/2026062254571913522
Devin now has full computer use capabilities and can share screen recordings. You can control desktop apps, build and QA mobile apps, and automate tedious work. Here are some examples that blew our team away: 1. Making a desktop game
https://x.com/cognition/status/1983983151157563762
For years I’ve said that the capability-reliability gap is an under-appreciated limitation of AI agents. Finally, in a new paper led by @steverab, we defined and measured it!
https://x.com/random_walker/status/2026384543700115870
Frontier models have (mostly) stopped making dumb security mistakes. But, when running for a long time, like in agentic coding or OpenClaw, even a single mistake can be fatal. How can we benchmark this? Instead of making larger and larger agentic benchmarks, we made an easier
https://x.com/jonasgeiping/status/2026714911951220888
Lots of important ideas here! “Evaluating 14 models on two complementary benchmarks, we found that nearly two years of rapid capability progress have produced only modest reliability gains… Unfortunately, AI agents are evaluated based on a single number, the average success
https://x.com/JustinBullock14/status/2026693253169336475
Many teams treat evals as a last-mile check. https://t.co/8pFE1Aw4hH Service made them a Day 0 requirement for their AI service agents. Using LangSmith, the monday service team has been able to: 🔷Achieve 8.7x faster evaluation feedback loops (from 162 seconds to 18 seconds).
https://x.com/hwchase17/status/2026095629148258440
New research from Intuit AI Research. Agent performance depends on more than just the agent. It also depends on the quality of the tool descriptions it reads. However, tool interfaces are still written for humans, not LLMs. As the number of candidate tools grows, poor
https://x.com/omarsar0/status/2026676835539628465
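The point generalizes to the widely used JSON-Schema function-calling shape: the schema machinery is identical either way, but only a precise description tells a model when to select the tool and when not to. Both tool definitions below are invented examples:

```python
# Two invented tool definitions in the common JSON-Schema function-calling
# shape. Same structure; only the second gives a model selection criteria.
vague = {
    "name": "lookup",
    "description": "Looks stuff up.",
    "parameters": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
    },
}

precise = {
    "name": "customer_order_lookup",
    "description": (
        "Return the status of one customer order. Use only when the user "
        "supplies an order ID (format: ORD-#####); not for refunds or "
        "address changes."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order identifier, e.g. ORD-10293",
            }
        },
        "required": ["order_id"],
    },
}

print(len(precise["description"]) > len(vague["description"]))  # True
```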
Our new SWE-bench Multilingual leaderboard compares software engineering performance across 9 different languages as evaluated with mini-SWE-agent v2. Model rankings are significantly different between languages. Detailed stats & browsable trajectories in 🧵
https://x.com/KLieret/status/2026322986907652295
Having an agentic VLM model, shade & render your 3d scene is the ultimate counter example to the “pixels is all you need” crowd. Real time video is powerful – it’s even a new medium. But explicit 3d is still very useful. Also this donut makes me hungry.
https://x.com/bilawalsidhu/status/2026184423004160185
Can coding agents build entire software systems from scratch? ByteDance, M-A-P, 2077AI, and leading Chinese universities present NL2Repo-Bench, a new benchmark that pushes agents to their limits. It tests if an AI can take a simple text description and autonomously design,
https://x.com/jiqizhixin/status/2025823941642621241
New research from Georgia Tech and Microsoft Research. GUI agents today are reactive. Every step costs an LLM call, which is why a lot of GUI agents are expensive, slow, and fragile. This new research introduces ActionEngine, a framework that shifts GUI agents from reactive
https://x.com/dair_ai/status/2026678090815123594
First GPT-5.3-codex benchmarks are incoming. And they look really good
https://x.com/kimmonismus/status/2026709699366670579
Found out I can have the OpenAI Codex app control my iPhone simulator to test an app, grab screenshots and make adjustments. Crazy. This makes adding automated tests so much easier.
https://x.com/AndrewMayne/status/2025025783115514147
GPT-5.3-Codex is now available for all developers in the Responses API. Start building with it today.
https://x.com/OpenAIDevs/status/2026379092661289260
I have to say the OpenAI folks completely cooked with the Codex App. There’s nothing like it and CC has a lot to do to catch up, as their current offering simply doesn’t cut it. It is not even in the same league. Congrats to my friends @tszzl
https://x.com/soumitrashukla9/status/2025789528309748015
I’m joining OpenAI Codex to work on the future of agentic development! At Cursor, I got to see the shift from autocomplete to agents. The next step isn’t a better IDE. It’s an Agent Development Environment (ADE): systems and tools for orchestrating agents, reasoning over their
https://x.com/rohanvarma/status/2026017843859599628
Looking for a Codex meetup in your city? Our ambassador community is bringing Codex to you. Create and ship projects with your local developer community, compare workflows, grab coffee, and meet people building with Codex. https://x.com/OpenAIDevs/status/2024933773205492018
Pocket-sized AI assistant with OpenClaw on Raspberry Pi Zero 2 W 🤖 Push-to-talk → OpenAI transcription → OpenClaw on VPS → streaming text on tiny LCD (+TTS) PiSugar battery powered, Tailscale secure, ~$100 build. All credit to the amazing @basti_vkl! (If you don’t follow
https://x.com/IlirAliu_/status/2026372342276837861
Results are live for OpenAI’s Codex 5.3 model! Highlights include being #2 on Terminal Bench 2, #2 on our IOI benchmark, #3 on LiveCodeBench, and #4 on Vibe Code Bench.
https://x.com/ValsAI/status/2026385804940230786
Teams are using WebSockets in the Responses API to speed up agentic workflows
https://x.com/OpenAIDevs/status/2026059511241535628
WebSockets keep a persistent connection to the Responses API, allowing you to send only new inputs instead of round-tripping the entire context on every turn. By maintaining in-memory state across interactions, it avoids repeated work and speeds up agentic runs with 20+ tool
https://x.com/OpenAIDevs/status/2026025380562530453
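The arithmetic behind that claim is easy to sketch: with stateless requests, turn N re-sends all N previous messages, so bytes on the wire grow quadratically with conversation length, while a persistent connection with server-side state sends only each new delta. Toy numbers, not measured traffic:

```python
# Illustrative arithmetic only (toy strings, not real API traffic): a
# stateless API re-sends the full transcript on every turn, while a
# stateful socket connection sends just the newest message.
turns = [f"tool call {i}: " + "x" * 200 for i in range(20)]

# stateless: request i carries every turn up to and including i
stateless_bytes = sum(len("".join(turns[: i + 1])) for i in range(len(turns)))

# stateful: the server keeps the transcript; the client sends only deltas
stateful_bytes = sum(len(t) for t in turns)

print(stateless_bytes > 5 * stateful_bytes)  # True: roughly 10x on this toy
```

The gap widens with every extra tool call, which is why long agentic runs benefit most.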
🆕 The End of SWE-Bench Verified (2024-2026) https://t.co/HCmogFFG8w Today @OpenAIDevs is announcing the voluntary deprecation of SWE-Bench Verified! We’re releasing a podcast + analysis in today’s post. Saturation of SWE-Bench has been a community hot topic for over a year –
https://x.com/latentspacepod/status/2026027529039990985
OpenAI expands enterprise: OpenAI is rolling out Frontier Alliances, teaming up with BCG, McKinsey, Accenture, and Capgemini to help enterprises deploy AI coworkers at scale. While Frontier provides the technical backbone, these multi-year partnerships focus on strategy,
https://x.com/kimmonismus/status/2025942986765279506
Exclusive: OpenAI Hires Meta AI Researcher Who Previously Led Apple’s Models Team — The Information https://www.theinformation.com/briefings/openai-hires-meta-ai-researcher-previously-led-apples-models-team
Which of these companies are secretly designing their own fully humanoid robot? – Amazon – OpenAI – Skild – Physical Intelligence – Google DeepMind – Apple My guess is all of them.
https://x.com/TheHumanoidHub/status/2024907483601666167
@bcherny @trq212 OK, so this is INSANE. 1189 calls to Claude. 100% nerfed down to Sonnet 4.5 in the last 30 days despite Claude Max. I’m so happy I have LangSmith for observability. There could be a bug on how this is reported. But right now, this is really bad… cc: @hwchase17 @Vtrivedy10
https://x.com/ChaiWithJai/status/2026446654753190324
`/plugin install slack` to connect Claude Code with Slack!
https://x.com/_catwu/status/2026485966626763120
Claude Code now supports auto-memory. This is huge!
https://x.com/omarsar0/status/2027117473229676864
CLIs are super exciting precisely because they are a “legacy” technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit. E.g. ask your Claude/Codex agent to install this new Polymarket CLI and ask for any
https://x.com/karpathy/status/2026360908398862478
Guys – it’s Claude Code’s actual first birthday today – Feb 24 2025 was the launch, check it am i crazy or is @latentspacepod the only one doing a retrospective + anniversary pod today? did everyone just forget the most consequential AI product since ChatGPT? anyway… we did
https://x.com/swyx/status/2026462001933988094
How annoying it is that Claude puts some key details in ~/.claude/…/project/memory/*.md You don’t get full context anymore when switching to Codex! Before you could just point Codex to CLAUDE.md but now don’t forget to also mention home memory folder!
https://x.com/borisdayma/status/2027087042375553059
NanoClaw – a simpler OpenClaw alternative you can understand in 8 minutes It’s a minimal, container-isolated personal Claude assistant with: > The same functionality as OpenClaw but in a codebase > Filesystem isolation > WhatsApp I/O, Agent Swarms, scheduled tasks, web access,
https://x.com/TheTuringPost/status/2025876086035464512
This is maybe obvious but I think a lot of the reason people write complicated CLAUDE md files, skills, fancy custom wrappers is not because they meaningfully help but because they’re unsettled and grasping for ways to add value to a dev workflow where the defaults work fine.
https://x.com/bpodgursky/status/2025966899402625485
We just launched /remote-control so you can continue local Claude Code sessions from your phone This is now rolled out to all Max users!
https://x.com/_catwu/status/2026421789476401182
You should delete your CLAUDE.md/AGENTS.md file. I have a study to prove it.
https://x.com/theo/status/2025900730847232409
An update on our model deprecation commitments for Claude Opus 3 \ Anthropic https://www.anthropic.com/research/deprecation-updates-opus-3
langsmith can trace claude code! so when you think claude code is nerfed… you can set up some observability to back that up
https://x.com/hwchase17/status/2026452439327764521
Between Gemini 3.1 and Claude 4.6 it’s honestly wild what you can build. This feels like Google Earth and Palantir had a baby. Made this with all the geospatial bells and whistles — real time plane & satellite tracking, real traffic cams in Austin, and even got a traffic system
https://x.com/bilawalsidhu/status/2024672151949766950
Cowork and plugins for teams across the enterprise | Claude https://claude.com/blog/cowork-plugins-across-enterprise
@tetsuoai Banger 🤣🤣 How dare they steal the stuff Anthropic stole from human coders??
https://x.com/elonmusk/status/2026012296607154494
A friend had Claude spend all night trying to hack into an e-ink display, and gave Claude camera access so it could verify whether an attempt worked. He told Claude to show him a message if it won. My friend woke up to this victory lap, which Claude didn’t realize was backwards
https://x.com/Scav/status/2021656781521670487
Announcing a new Claude Code feature: Remote Control. It’s rolling out now to Max users in research preview. Try it with /remote-control Start local sessions from the terminal, then continue them from your phone. Take a walk, see the sun, walk your dog without losing your flow.
https://x.com/noahzweben/status/2026371260805271615
GPT-5.3-Codex + the Codex app is the best AI coding tool available right now. Slept on it for a bit. Likely going to move back to a ChatGPT Pro sub from Claude MAX because of how good it is. It’s so precise, accurate and excellent at following instructions. There are
https://x.com/daniel_mac8/status/2025994068577112454
WarClaude daddy and Codex mommy
https://x.com/bilawalsidhu/status/2026784286968357129
Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards https://www.axios.com/2026/02/24/anthropic-pentagon-claude-hegseth-dario
I gained a lot of respect for Dario for being principled on the issues of mass surveillance and autonomous killbots. Principled leaders are rare these days
https://x.com/fchollet/status/2027195535594049641
I’ve published the first two chapters of a new guide to Agentic Engineering Patterns – coding practices and patterns to help get the best results out of coding agents like Claude Code and OpenAI Codex
https://x.com/simonw/status/2025990408514523517
If you like Claude Code/Codex and have 32GB of RAM: please run Qwen3.5-35B-A3B locally. There’s a before and after for local agents: reliable tool calling, stable agentic loops, only 3B active params. Punches way above its weight! Now is the best time to get started with local
https://x.com/victormustar/status/2026624792602808707
Qwen just released Qwen3.5 on Hugging Face A massive 397B parameter multimodal model with only 17B active, rivaling GPT5.2 and Claude 4.5 across benchmarks.
https://x.com/HuggingPapers/status/2025805747385221491
Dang. My WorldView project is blowing up and is trending on X. I guess ppl really like monitoring the situation. Inbound is a little nuts — got hedge funds and OSINT folks ready to contribute; keep the feature requests coming! Been fun to put my geospatial 3D roots to work.
https://x.com/bilawalsidhu/status/2024953470806102510
Explore any world. Tell any story. All in one place. Kling 3.0 is now available in both Runway Workflows and Tool Mode. Discover all of the new models and capabilities available right inside of Runway at the link below. Morningstar Generated with AI. Made by @ceremonial_flux
https://x.com/runwayml/status/2025977383208051018
Generated Reality Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control paper: https://x.com/_akhaliq/status/2025944948453847352
Introducing Solaris: the first multiplayer world model exploration effort in Minecraft. We’ve built a scalable data collection engine, a multiplayer video diffusion model architecture, and a multi-view consistency evaluation benchmark. [1/9]
https://x.com/georgysavva/status/2027119472096518358
Marble is a generative AI platform and multimodal world model developed by World Labs, the spatial intelligence company founded by AI pioneer Fei-Fei Li. It allows users to create high-fidelity, persistent, navigable 3D worlds from simple inputs like: – Text prompts – Single or
https://x.com/TheHumanoidHub/status/2024935236057137640
My site hit #25 in rising tech publications. I’m mapping the frontier of creation & computing. Written + video deep dives on generative media, spatial intelligence and world models. Check it out https://x.com/bilawalsidhu/status/2026108063632216492
Physical Intelligence’s π0.6 models in real-world use cases Weave (left): Autonomous laundry folding Ultra (right): E-commerce packaging The models are built on a Vision-Language-Action (VLA) framework.
https://x.com/TheHumanoidHub/status/2026455516034306150
What is the self-evolution trilemma? In an ideal world, an AI system where agents learn only from each other would have 3 properties: – Continuous self-evolution – Isolation, meaning running in a closed loop, without outside interference – Stable safety alignment (safety invariance)
https://x.com/TheTuringPost/status/2024621675866935495
I’m most concerned about autonomous systems for policing and surveillance which cannot disobey illegal orders A small elite could control everyone else and end democracy Military use of autonomous weapons is way less terrifying than this I wrote about this a little bit many
https://x.com/BlackHC/status/2026456906710327338
Can an agent survive as a worker in a real economy? Here is a super interesting economic benchmark for AI agents – ClawWork. It’s like a real-world labor market for LLM-based agents that evaluates them in an economic survival loop. ClawWork turns agents into AI coworkers and
https://x.com/TheTuringPost/status/2024960484378816894
Gemini 3.1 Pro just broke my code then stopped because they’re over capacity. I’m paying $250/month for this btw.
https://x.com/theo/status/2025896487557947886
This explains why the only major model that still sucks at tool calling is Gemini
https://x.com/theo/status/2026045501960069204
Gemini 3.1 Pro scores 72.1% on WeirdML, up from 69.9% for Gemini 3.0. Gemini 3.1 seems to have the highest peak performance of any model, but also some weird weaknesses. It uses almost 3 times the number of output tokens as 3.0, considering this, the increase
https://x.com/htihle/status/2025867003550958018
GPT-5.2-chat-latest, the newest model powering ChatGPT, is now in the Text Arena top 5! Highlights: ▪️Top 5 scoring 1478 on par with Gemini-3-Pro ▪️+40pt improvement over the GPT-5.2 model ▪️Top in key categories: Multi-Turn, Instruction-Following, Hard Prompts, Coding A strong
https://x.com/arena/status/2025966052950315340
New templates for Veo 3.1 in the Gemini app are rolling out today. To give them a try, go to https://t.co/382WL5xSvc or open the app, select “Create videos” in the tools menu, and pick a template from the gallery. Then make it your own with a reference photo and/or description.
https://x.com/GeminiApp/status/2026001595708866759
Veo 3.1 templates in @GeminiApp give you a head start on your videos. 🎬 Now, you can start with a high-quality visual foundation instead of a blank prompt. Pick an aesthetic that fits your mood, then use prompts to layer in the characters and scenery that make it yours.
https://x.com/Google/status/2026006156875804960
`/research` now available in the Copilot CLI Deep research across any OSS repo on the planet using GitHub’s advanced code search tools, and MCPs for fetching repo contents dynamically. Produce reports, export to gists, and share them with your team.
https://x.com/_Evan_Boyle/status/2026458533320077689
A lot of AI still requires too much setup and too much blind trust. Copilot Tasks is built so you can delegate work in plain language, see the plan, and stay in control, then get back finished output.
https://x.com/yusuf_i_mehdi/status/2027111916272001401
The Copilot CLI is now GA!
https://x.com/_Evan_Boyle/status/2026706464375796099
The hardest part wasn’t teaching the model where your next edit is, it was teaching it when to stay put. You make one change and already know where the next ones are. Now Copilot does too, so you stay in flow. Proud of this team.
https://x.com/alexdima123/status/2027163071551078845
We’ve been working on a whole new way to get things done: Copilot Tasks. AI that talks less and does more, no complicated setup or coding skills required. Just ask for what you need and Copilot will take it from there, like: – Turn a syllabus into a complete study plan, with
https://x.com/mustafasuleyman/status/2027111503003107377
When using Copilot CLI in terminal in @code , the agent will update the title in realtime. @burkeholland your loved feature is back!
https://x.com/njukidreborn/status/2026443296177008818
codex app-server is legit af i was just looking into it for a project and accidentally ended up making an actual native codex iphone app i can spawn and talk to codexes anywhere on my network and one of the best parts… I built and linked codex into the actual iphone app and
https://x.com/SIGKITTEN/status/2025073817467416983
Codex to Figma Join designer advocate, Ana Boyer and OpenAI’s Ed Bayes as they talk through roundtripping between code and canvas
https://x.com/figma/status/2027068943702364250
GPT-5.2 (Instant/Thinking/Pro): 74% on GDPVal, 1.4x cost of GPT 5.1, on 10 Year OpenAI Anniversary | AINews https://news.smol.ai/issues/25-12-11-gpt-52
GPT-5.3-Codex Pricing: $1.75 Input $14.0 Output
https://x.com/scaling01/status/2026379113099862018
I am increasingly asked during candidate interviews how much dedicated inference compute they will have to build with Codex. Pairing this with usage per user growing significantly faster than the number of users, it’s pretty clear that compute will be something that is scarce.
https://x.com/thsottiaux/status/2024635825997459841
Live on Cline (v3.67.1) @OpenAI ‘s GPT 5.3 Codex. The speed and token efficiency improvements are real. Here is what’s new: > 25% faster than 5.2 Codex > #1 on SWE-Bench Pro (4 different programming languages) > Fewer tokens per task than any prior OpenAI model Runs cost
https://x.com/cline/status/2026481089158779021
Our codex offsite left a deep impression on me. I am beyond excited for what the next 10 or so weeks will bring and I think the current state of coding agents will be remembered as being so primitive that it will be funny in comparison.
https://x.com/thsottiaux/status/2024687185409323202?s=20
We launched GPT-5.3-Codex in the API today
https://x.com/snsf/status/2026513135075746239
websockets for much faster agentic rollouts — yields 30% faster rollouts in codex:
https://x.com/gdb/status/2026380170765152302
weekend projects are so much more fun with codex
https://x.com/gdb/status/2025723937540485506
Agree, opus 4.5/6 & particularly codex 5.3 xhigh has legitimately shifted my use of agents so now most things in my work that _could_ be done in a cli is now almost always delegated to them, including long running tasks, monitoring output & intervening if a run is failing, etc
https://x.com/paul_cal/status/2027000070109909441
📊Noticeable improvements with @OpenAI’s GPT-5.2-Chat-Latest vs GPT-5.2 (#5 vs #29 Overall) Where GPT-5.2-Chat-Latest gains: Text: – Coding (+13: #6 vs #19) – Hard Prompts (+21: #4 vs #25) – Instruction Following (+21: #7 vs #28) – Longer Query (+10: #14 vs #24) – English (+33:
https://x.com/arena/status/2025986008484061391
Big news today if you’re into coding evals: SWE-Bench Verified is dead!! https://t.co/SPApcuM5uW i’m not sure if @HamelHusain is tired of me tagging him but it turns out @OpenAI really did look back at their own 2024 work and then you 1) look at the CoT and 2) look at the
https://x.com/swyx/status/2026029120040137066
Code → design → code Generate design files from code, collaborate in @Figma, and implement updates all within Codex without breaking your flow.
https://x.com/OpenAIDevs/status/2027062351724527723
I experienced a very similar transition in December. However, for higher-complexity tasks (ML-related), we are still not there yet. Two days ago I had GPT-5.2-PRO-ET and DeepThink argue for hours, converge, be happy, yet they missed a very obvious math issue. Still a huge unlock
https://x.com/MParakhin/status/2027027034828902421
Introducing WebSockets in the Responses API. Built for low-latency, long-running agents with heavy tool calls. https://x.com/OpenAIDevs/status/2026025368650690932
“The Codex app lets you go further, do more in parallel, and go deeper on the problems you care about.” — @gdb
https://x.com/OpenAIDevs/status/2024212279215198396
The standard for frontier coding evals is changing with model maturity. We now recommend reporting SWE-bench Pro and are sharing more detail on why we’re no longer reporting SWE-bench Verified as we work with the industry to establish stronger coding eval standards. SWE-bench
https://x.com/OpenAIDevs/status/2026002219909427270
uhhh WTF?! gpt-5.3-codex gets 86% on IBench, beating out all other models massively. I was NOT expecting this
https://x.com/adonis_singh/status/2026456939224510848
We expanded file input types so you can now pass docx, pptx, csv, xlsx, and more directly to the Responses API. Your agents can now pull context from real-world files and generate more accurate outputs. https://x.com/OpenAIDevs/status/2026420817568084436
We tested @OpenAI’s new WebSocket connection mode for the Responses API into Cline and the early numbers are wild. Instead of resending full context every turn, WebSocket mode keeps a persistent connection, sends only incremental inputs. With 5.2 Codex results vs the standard
https://x.com/cline/status/2026031848791630033
What did you build with Codex this weekend?
https://x.com/OpenAIDevs/status/2025712197100589353
Great meeting with PM @narendramodi today to talk about the incredible energy around AI in India. India is our fastest growing market for codex globally, up 4x in weekly users in the past 2 weeks alone. 🇮🇳!
https://x.com/sama/status/2024826822060290508
Introducing Perplexity Computer. Computer unifies every current AI capability into one system. It can research, design, code, deploy, and manage any project end-to-end.
https://x.com/perplexity_ai/status/2026695550771540489
Perplexity Computer uses usage‑based pricing with optional sub‑agent model selection and spending caps. Choose different models for different sub‑agent tasks and control token spend. Max users get 10,000 credits per month included with their subscription. We’re also giving a
https://x.com/perplexity_ai/status/2026695793537855526
We also built PPLXQuery2Query and PPLXQuery2Doc. These are internal web‑scale benchmarks with 115K real queries evaluated against 30M documents drawn from over 1B pages.
https://x.com/perplexity_ai/status/2027095027881750923
🚀 Introducing the Qwen 3.5 Medium Model Series Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B ✨ More intelligence, less compute. • Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality,
https://x.com/Alibaba_Qwen/status/2026339351530188939
Qwen3.5-397B-A17B is now a top 7 open model in the Code Arena. It ranks #17 overall, on par with proprietary models like GPT-5.2 and Gemini-3-Flash. The Code Arena is where agentic capabilities are tested for real-world webdev tasks. Congrats to the @Alibaba_Qwen team! 👏
https://x.com/arena/status/2026337606137725363
Unsloth’s quantizations are pure art. 2 bit Qwen-3.5 highest performing local model on the benchmarks I’ve given it. It has vision, can code, full context (256k 8bit) is only 25gb in vram – 36 tokens/s gen – 220 tokens/s prefill I just don’t like GGUF the speeds are trash
https://x.com/0xSero/status/2026223879077712269
What happens when you make an LLM drive a car where physics are real and actions can’t be undone? I ported CARLA, the autonomous driving simulator, to OpenEnv and added training via TRL + HF Spaces In 50 steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians
https://x.com/SergioPaniego/status/2027064485056241971
The Qwen 3.5 Medium Models are in the Arena! 3.5-27B, 3.5-35B-A3B and 3.5-122B-A10B are ready for you in the Text, Vision and Code Arena! Let’s see how they stack up with less compute. Bring your toughest prompts and don’t forget to vote.
https://x.com/arena/status/2026716550812807181
✨ Run it now with SGLang! Chong!
https://x.com/Alibaba_Qwen/status/2026348924433477775
📊With all the Qwen-3.5 scores out for Text, Code and Vision, let’s compare the evolution of Qwen-3.5 (397B-A17B) vs Qwen-3.0 (235B-A22B). This is a +24 rank jump in Text. Specially where Qwen-3.5 gains the most: Text: – Overall (+24: #19 vs #43) – English (+25: #21 vs #46) –
https://x.com/arena/status/2026404630297719100
🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment! Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face:
https://x.com/Alibaba_Qwen/status/2026682179305275758
🚩Qwen3.5 INT4 model is now available! https://t.co/rY5GrT3b60 @Alibaba_Qwen @JustinLin610
https://x.com/HaihaoShen/status/2026208062009426209
A big jump in intelligence-per-watt today: “Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507”
https://x.com/awnihannun/status/2026353100144218569
Huge thanks to the @vllm_project for the Day-0 support on the Qwen3.5 Medium Series 🚀
https://x.com/Alibaba_Qwen/status/2026496673179181292
Minimax M2.5 GGUFs (from Q4 down to Q1) perform poorly overall. None of them come close to the original model. That’s very different from my Qwen3.5 GGUF evaluations, where even TQ1_0 held up well enough. Lessons: – Models aren’t equally robust, even under otherwise very good
https://x.com/bnjmn_marie/status/2027043753484021810
Qwen 3.5 family is here! > vision built-in, and can outperform previous VL models > designed to be more efficient > expanded support for more languages 35B: (fits on 24GB+ system) ollama run qwen3.5:35b 122B: ollama run qwen3.5:122b 397B (cloud only): ollama run
https://x.com/ollama/status/2026598944177009147
Qwen3.5-35B-A3B is now in Jan 🔥
https://x.com/Alibaba_Qwen/status/2026660582221558190
Qwen3.5-35B-A3B is now live in LM Studio 🚀
https://x.com/Alibaba_Qwen/status/2026496880285462962
Taken at face value, this is… somewhat catastrophic for MoEs, as @YouJiacheng notes. By right, a 397B-A17B ought to have a higher “power level” than a dense 27B. Also a big W for Qwen’s integrity and HLE eval quality, I guess. 397B is certainly better at memorization.
https://x.com/teortaxesTex/status/2026690994029072512
the conclusion should not be about moe vs dense, but that you can “benchmaxx” (not always a bad thing btw) HLE with tools no matter the model size — the difference between Qwen3.5-35B-A3B and Qwen3.5-397B-A17B is only 1 point
https://x.com/eliebakouch/status/2026727151978840105
The new Qwen3.5 Medium models are ready to run 🔥 GGUF support is here! Big thanks to @UnslothAI for making it happen so quickly 🚀
https://x.com/Alibaba_Qwen/status/2026497723944546395
The Qwen3.5 series maintains near-lossless accuracy under 4-bit weight and KV cache quantization. In terms of long-context efficiency: Qwen3.5-27B supports 800K+ context length Qwen3.5-35B-A3B exceeds 1M context on consumer-grade GPUs with 32GB VRAM Qwen3.5-122B-A10B supports
https://x.com/Alibaba_Qwen/status/2026502059479179602
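To see why 4-bit KV-cache quantization matters at million-token context lengths, here's a rough per-token memory estimate. The architecture numbers below (layers, KV heads, head dim) are hypothetical placeholders for illustration, not Qwen3.5's actual config:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bits: int) -> int:
    """Approximate KV-cache size: K and V each store
    n_kv_heads * head_dim values per layer per token."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return seq_len * values_per_token * bits // 8

# Hypothetical GQA config: 32 layers, 4 KV heads, head_dim 128.
fp16 = kv_cache_bytes(1_000_000, 32, 4, 128, 16)  # ~61 GiB at 1M tokens
int4 = kv_cache_bytes(1_000_000, 32, 4, 128, 4)   # 4x smaller than fp16
```

Under these made-up numbers, 4-bit quantization is what brings a 1M-token cache from well beyond a consumer GPU down into the tens of gigabytes — consistent with the tweet's claim, though the real savings depend on the model's actual attention config.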
Why do benchmarks like Peter’s “Bullshit Benchmark” or my ShizoBench matter so much, and what do Strawberries have to do with it? I was very skeptical of the performance of Qwen3.5-27B on the ArtificialAnalysis leaderboard. So I’m testing the model myself a bit. Naturally I tried the
https://x.com/scaling01/status/2027110908775002312
Qwen3.5-397B-A17B is currently the #1 trending model on Hugging Face. 🏆 This flagship open-weight model is designed for high-performance inference and complex reasoning. 🚀 Try it now on Hugging Face: https://x.com/Ali_TongyiLab/status/2026211680653611174
NVIDIA just released a Blackwell-optimized Qwen3.5 MoE on Hugging Face 397B parameters quantized to NVFP4 for 2x faster inference with SGLang.
https://x.com/HuggingPapers/status/2025825405836648849
Grok 4.20 beta1 (single agent) debuts #1 on Search Arena, and #4 overall in Text Arena! Highlights: – #1 in Search, scoring 1226, leading GPT-5.2 and Gemini-3 – #4 in Text, scoring 1492 on par with Gemini 3.1 Pro Congrats to the @xAI team and @elonmusk on this impressive
https://x.com/arena/status/2026566773496230383