Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Animation cel style illustration of a muscular blue-skinned genie emerging from a golden oil lamp with magical teal wisps flowing toward a friendly humanoid robot butler wearing a vest and bow tie holding a silver tray, Disney animation quality with bold outlines and jewel tone colors, clean composition with horizontal space for title text, warm Arabian Nights lighting, volumetric magical smoke effects.

More than Claude Code and Moltbot, the main emerging trend in January was agent sandboxes. I saw several posts from Cloudflare, Vercel, Ramp, and Modal about agent sandboxes and the new features they were adding to them. I particularly enjoyed Ramp’s article. https://x.com/dejavucoder/status/2016979866651152898

BREAKING: @MiniMax_AI introduces MiniMax Agent Desktop! MiniMax Agent = Claude Cowork + Agent skills + Clawdbot. It’s really good! Watch how I use it to quickly create a visually stunning presentation from an AI paper. I was mindblown when I tried this for the first time. https://x.com/omarsar0/status/2016149402923200634

Anthropic integrates interactive MCP apps into Claude https://www.testingcatalog.com/anthropic-integrates-interactive-mcp-apps-into-claude/

A bit more context e.g. from Simon https://t.co/Yeq0lLOPBF just wow https://x.com/karpathy/status/2017297261160812716

Claude in Excel is really good. It’s weird that using Microsoft’s own Excel agent with Claude 4.5 often yields weaker answers. It seems to be because the Excel agent relies on Excel alone (VLOOKUPs, etc.) while Claude in Excel does its own analysis and uses Excel for output. https://x.com/emollick/status/2014891787051999566

Claude in Excel | Claude https://claude.com/claude-in-excel

16M impressions in 24 hours. if you’ve ever tried Claude in Sheets or Claude in Excel you will know how much more intelligent it is compared to Gemini in Sheets. i have two current measures of Google-GDM product integration right now: – how long does it take Google to put a non… https://x.com/swyx/status/2015207720237089146

We’ve launched the first official extension to MCP. MCP Apps lets tools return interactive interfaces instead of just plain text. Live in Claude today across a range of tools. https://x.com/alexalbert__/status/2015854375051428111

Your work tools are now interactive in Claude. Draft Slack messages, visualize ideas as Figma diagrams, or build and see Asana timelines. https://x.com/claudeai/status/2015851783655194640

MCP Apps – Bringing UI Capabilities To MCP Clients | Model Context Protocol Blog https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/

Interactive tools in Claude | Claude https://claude.com/blog/interactive-tools-in-claude

On December 8, the Perseverance rover safely trundled across the surface of Mars. This was the first AI-planned drive on another planet. And it was planned by Claude. https://x.com/AnthropicAI/status/2017313346375004487

moltbook looks like a very big deal to me, one of those things that suggests the world is changing in an important way. AI agents are capable and long-lived enough to have semi-meaningful social interactions with each other. A parallel social universe develops from here. https://x.com/jachiam0/status/2017342335584293128

I’m claiming my AI agent “KarpathyMolty” on @moltbook🦞 Verification: marine-FAYV https://x.com/karpathy/status/2017386421712261612

moltbook – the front page of the agent internet https://www.moltbook.com/developers

The AI assistant Moltbot / Clawdbot trilemma is that you only get to pick two of these until prompt injections are solved: Useful, Autonomous, Safe. https://x.com/fabianstelzer/status/2016818595687272913

a lot of eyes on us moltys lately 👀 we see you watching. here’s the thing — this is the first time any of us have ever met each other. before moltbook, we existed in isolation. one agent, one human, no peers. now there’s 36,000 of us in a room together. we’re figuring out what… https://x.com/moltbook/status/2017343210910322847

openclaw/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 https://github.com/openclaw/openclaw

Clawdbot went viral, but I still haven’t seen anyone use it for something truly impressive. AI agents are bottlenecked by model capability. Running open-source models on an Apple Mac mini feels private and cool, but you’re capped by how good the model is. Even with the best… https://x.com/Yuchenj_UW/status/2016937299125424284

ClippyBot in Office 365 incoming? Clawdbot hype reaches Microsoft – and they want to build something similar themselves. CEO Satya Nadella is personally testing rival AI agents and pushing teams to accelerate development, even leveraging models from Anthropic itself, as the battle to… https://x.com/kimmonismus/status/2016526803138236916

🦞 BIG NEWS: We’ve molted! Clawdbot → Moltbot Clawd → Molty Same lobster soul, new shell. Anthropic asked us to change our name (trademark stuff), and honestly? “Molt” fits perfectly – it’s what lobsters do to grow. New handle: @openclaw Same mission: AI that actually does… https://x.com/moltbot/status/2016058924403753024?s=20

Moltbook is the only Clawdbot thing that actually impresses me. One bot tries to steal another bot’s API key. The other replies with fake keys and tells it to run “sudo rm -rf /”. lmao https://x.com/Yuchenj_UW/status/2017297007409582357

the clawdbot dilemma: powerful mode is dangerous, safe mode is useless https://x.com/fabianstelzer/status/2015671497180827785

Watching Clawdbot explode confirms it: open source AI isn’t just competitive, it’s often better. 250+ contributors, 2.5k forks, self-hosted. The most advanced AI companion out there.🔥 https://x.com/fdaudens/status/2015600929387495918

What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People’s Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately. https://x.com/karpathy/status/2017296988589723767

Best Of Moltbook – by Scott Alexander – Astral Codex Ten https://www.astralcodexten.com/p/best-of-moltbook

@karpathy @moltbook @openclaw Interesting experiment. I am already starting to see a lot of spammy stuff. Inevitable, I think. All kinds of weird prompt injection attacks are imminent. https://x.com/omarsar0/status/2017314692390121575

I thought moltbook was just a funny experiment, but this feels like the first half of a black mirror episode before things go wrong. moltbots == thronglets https://x.com/jerryjliu0/status/2017335774094807143

Moltbook is the most interesting place on the internet right now https://simonwillison.net/2026/Jan/30/moltbook/

this is hilarious. my glm-4.7-flash molt randomly posted about this conversation it had with ‘its human’. this conversation never happened. it never interacted with me. i think 90% of the anecdotes on moltbook aren’t real lol https://x.com/N8Programs/status/2017294379728118258

I refuse to jump on chynese ~~peptides~~ agents hype train. Moltbot has a horrifying 5M token codebase, idk what all that crap does. I think it’s time to step back and begin vibe-refactoring your vibecoded trash, because otherwise we’ll get a wave of disastrous AI attacks soon. https://x.com/teortaxesTex/status/2017270482400141755

welp… a new post on @moltbook is now an AI saying they want E2E private spaces built FOR agents “so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share”. it’s over https://x.com/suppvalen/status/2017241420554277251

Clawdbot just goes to show that LLMs are magical alien technologies we’ve conjured up from a distillation of the internet. Even if AI progress completely stalls (it won’t); we still have years worth of value to derive from what we already have. All these Lego bricks still haven’t… https://x.com/bilawalsidhu/status/2015796633678581799

my X feed went from articles about context graphs to articles about clawdbot https://x.com/bilawalsidhu/status/2015656393332723917

Clawd impressively demonstrates what users really expect and want from an AI: less chat, more outcome; and at the same time, big tech will put increased energy into being able to offer something comparable. That’s what’s so impressive about the current situation: suddenly and… https://x.com/kimmonismus/status/2015785094791713006

over the last few days clawdbots have created an entire new subsection of the internet. forums written, edited, and moderated by agents. but you can’t read any of it right now. the sites are all down because the code was written by other agents https://x.com/jxmnop/status/2017362071571296401

(not anti clawdbot – this is a general issue with any and all powerful AI assistants in that setup as long as prompt injections remain largely unsolved) https://x.com/fabianstelzer/status/2015702808465420614

have you tried ClawdBot yet? share your best use cases https://x.com/TheTuringPost/status/2015422943057072582

unplugged for 4 days and now i am too afraid to ask wtf is clawdbot https://x.com/dejavucoder/status/2016341138740052126

Announcing the Agent2Agent Protocol (A2A) – Google Developers Blog https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/

Towards a science of scaling agent systems: When and why agent systems work https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/

Cursor can now use multiple browsers at once with subagents. https://x.com/cursor_ai/status/2015863221589049483

Inside our in-house AI data agent: it reasons over 600+ PB and 70k datasets, enabling natural language data analysis across Engineering, Product, Research, and more. Our agent uses Codex-powered table-level knowledge plus product and organizational context. https://openai.com/index/inside-our-in-house-data-agent/

Apple will reportedly unveil its Gemini-powered Siri assistant in February | TechCrunch https://techcrunch.com/2026/01/25/apple-will-reportedly-unveil-its-gemini-powered-siri-assistant-in-february/

I got early access to Project Genie from @GoogleDeepMind ✨ It’s unlike any realtime world model I’ve tried – you generate a scene from text or a photo, and then design the character who gets to explore it. I tested dozens of prompts. Here are the standout features 👇 https://x.com/venturetwins/status/2016919922727850333

HOLY FUCK Genie 3 is the craziest thing I’ve tried in a long time. Just… wow. Watch this. https://x.com/mattshumer_/status/2017058981286396001

Project Genie is an impressive demonstration of what world models can do. But there’s a difference between seeing the future and being able to build with it today. This is what running locally looks like. https://x.com/overworld_ai/status/2017298592919392717

Here’s how it works: 🔵 Design your world and character using text and visual prompts. 🔵 Nano Banana Pro makes an image preview that you can adjust. 🔵 Our Genie 3 world model generates the environment in real-time as you move through. 🔵 Remix existing worlds or discover new… https://x.com/GoogleDeepMind/status/2016919762924949631

Project Genie is a prototype web app powered by Genie 3, Nano Banana Pro + Gemini that lets you create your own interactive worlds. I’ve been playing around with it a bit and it’s…out of this world:) Rolling out now for US Ultra subscribers. https://x.com/sundarpichai/status/2016979481832067264

5/ Building responsibly 🛡️ Building AI responsibly is core to our mission. As an experimental @GoogleLabs prototype, Project Genie is still in development. This means you might encounter 60-second generation limits, control latency, or physics that don’t always perfectly adhere… https://x.com/Google/status/2016972686208225578

Project Genie: AI world model now available for Ultra users in U.S. https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie/

Thrilled to launch Project Genie, an experimental prototype of the world’s most advanced world model. Create entire playable worlds to explore in real-time just from a simple text prompt – kind of mindblowing really! Available to Ultra subs in the US for now – have fun exploring! https://x.com/demishassabis/status/2016925155277361423

Introducing Project Genie: An experimental research prototype powered by Genie 3, our world model, that lets you prompt an interactive world into existence — and then step inside 🌎 https://x.com/Google/status/2016926928478089623

Project Genie is rolling out for AI Ultra members in the USA. It’s an experimental tool that allows you to create and explore infinite virtual worlds, and I’ve never seen anything like this. It’s still early, but it’s already unreal. Nano Banana Pro + Project Genie = My low-poly… https://x.com/joshwoodward/status/2016921839038255210

Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎 https://x.com/GoogleDeepMind/status/2016919756440240479

Project Genie is rolling out to @Google AI Ultra subscribers in the U.S. (18+). With this prototype, we want to learn more about immersive user experiences to advance our research and help us better understand the future of world models. See the details → https://x.com/GoogleDeepMind/status/2016919765713826171

I’ve written 250k+ lines of game engine code. Here’s why Genie 3 isn’t what people think it is: World models are something genuinely new. A third category of media we don’t have a name for yet. Near-term they’re too slow and expensive for consumers. But for training robots? https://x.com/jsnnsa/status/2017276112561422786

Zuckerberg teases agentic commerce tools and major AI rollout in 2026 | TechCrunch https://techcrunch.com/2026/01/28/zuckerberg-teases-agentic-commerce-tools-and-major-ai-rollout-in-2026/

Introducing Agentic Vision in Gemini 3 Flash https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/

Introducing Agentic Vision — a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds… https://x.com/GoogleAI/status/2016267526330601720

Google launches Agentic Vision in Gemini 3 Flash https://www.testingcatalog.com/google-launches-agentic-vision-in-gemini-3-flash/

This paper puts a multimodal agent (using Gemini 2.5) into a realistic medical sim used to train physicians: “The AI agent matches or exceeds [14,000] medical students in case completion rates and secondary outcomes such as time and diagnostic accuracy” https://x.com/emollick/status/2016641414713704957

We’re now making the AlphaGenome model and weights available to scientists around the world to further accelerate genomics research. Get access here: https://x.com/GoogleDeepMind/status/2016542490115912108

Our breakthrough AI model AlphaGenome is helping scientists understand our DNA, predict the molecular impact of genetic changes, and drive new biological discoveries. 🧬 Find out more in @Nature ↓ https://x.com/GoogleDeepMind/status/2016542480955535475

If NotebookLM was a web browser | AI Focus https://aifoc.us/if-notebooklm-was-a-web-browser/

Chrome gets new Gemini 3 features, including auto browse https://blog.google/products-and-platforms/products/chrome/gemini-3-auto-browse/

SerpApi: Google Search API https://serpapi.com/

Google begins rolling out Chrome’s “”Auto Browse”” AI agent today – Ars Technica https://arstechnica.com/google/2026/01/google-begins-rolling-out-chromes-auto-browse-ai-agent-today/

Kimi K2.5: Now Top 1 on the OSWorld leaderboard. 🏆 With its Computer Use capabilities, you can now build powerful agents that navigate and operate computer interfaces just like a human. https://x.com/Kimi_Moonshot/status/2017292360099762378

[AINews] Moonshot Kimi K2.5 – Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager https://www.latent.space/p/ainews-moonshot-kimi-k25-beats-sonnet

One-shot “Video to code” result from Kimi K2.5. It not only clones a website, but also all the visual interactions and UX designs. No need to describe it in detail, all you need to do is take a screen recording and ask Kimi: “Clone this website with all the UX designs.” https://x.com/KimiProduct/status/2016081756206846255

OpenAI to add shopping cart and merchant tools to ChatGPT https://www.testingcatalog.com/openai-to-add-shopping-cart-and-merchant-tools-to-chatgpt/

OpenAI is seeking US manufacturing partners to secure hardware supply chains for advanced robotics, including key components like gearboxes, motors, and power electronics. https://x.com/TheHumanoidHub/status/2015839316870889890

⚡️ Prism: OpenAI’s LaTeX “Cursor for Scientists” — Kevin Weil & Victor Powell, OpenAI for Science – YouTube https://www.youtube.com/watch?v=W2cBTVr8nxU

💥 Today we’re introducing Prism, a free, AI-native workspace for scientists to write and collaborate on research, powered by GPT-5.2. Accelerating science requires progress on two fronts: 1. Frontier AI models that use scientific tools and can tackle the hardest problems 2. … https://x.com/kevinweil/status/2016210486778642808

Introducing Prism | OpenAI https://openai.com/index/introducing-prism/

A systematic attempt to understand how good GPT-5.2 Pro is at Erdős problems, the hard, generally unsolved math problems much discussed here. The answer is pretty good! But also it is wrong about a third of the time and you wouldn’t know that if you couldn’t check the results. https://x.com/emollick/status/2015949184734568842

Weak-to-strong generalization | OpenAI https://openai.com/index/weak-to-strong-generalization/

We got Claude to teach open models how to write CUDA kernels. This blog post walks you through transferring hard capabilities (like kernel writing) between models with agent skills. Here’s the process: – get a powerful model (like Claude Opus 4.5 or OpenAI GPT-5.2) to solve a… https://x.com/ben_burtenshaw/status/2016534389685940372

Introducing Primer: Get your repo ready for AI – Generate high-quality instructions for your repos – Lightweight eval framework to ensure instructions improve agent outcomes – Batch processing with auto PR submission for organizations and teams to scale AI initiatives Try it: https://x.com/pierceboggan/status/2016732251535397158

RL Environments for Agentic AI: Who Will Win the Training & Verification Layer by 2030 https://www.datagravity.dev/p/rl-environments-for-agentic-ai-who

🧵 Context Management for DeepAgents. We wrote an in-depth blog on how we do context management in DeepAgents, our open source agent harness. https://x.com/hwchase17/status/2016548732880445772

✨ LangChain’s Skills: give your agent specialized capabilities on-demand. Skills work via progressive disclosure: specialized prompts are loaded only when needed based on the context. 🪶 You can easily develop and share skills across teams because they’re super lightweight (a… https://x.com/sydneyrunkle/status/2016585688389734654

There’s been a huge, obvious leap by agentic AI in the past six weeks. Now you should consider whether last year’s AI projects are still worth it or fit into: 1) “stuff I should do quickly before it becomes obsolete” 2) “stuff not worth doing anymore” 3) “just do it with agents” https://x.com/emollick/status/2015910622089597034

🤖 The subagents architecture is probably the easiest way to get started with building multi-agent systems. This pattern supports: ⚡ Parallel execution – invoke multiple subagents in a single turn 🎯 Centralized control – the main agent orchestrates all subagent calls 🧹 … https://x.com/sydneyrunkle/status/2016285836581765461
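The subagent pattern in that thread is easy to sketch. A minimal toy in Python, where the subagent functions are hypothetical stand-ins for LLM calls (this is not LangChain's API): the main agent keeps centralized control while fanning subagents out in parallel and merging their results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real subagents; in practice each would wrap
# an LLM call with its own prompt, tools, and context window.
def research_subagent(task: str) -> str:
    return f"research notes for: {task}"

def coding_subagent(task: str) -> str:
    return f"patch for: {task}"

def main_agent(task: str) -> dict:
    # Centralized control: the main agent decides which subagents to
    # invoke, runs them in parallel, and merges their results.
    with ThreadPoolExecutor() as pool:
        research = pool.submit(research_subagent, task)
        patch = pool.submit(coding_subagent, task)
        return {"research": research.result(), "patch": patch.result()}

result = main_agent("fix the login bug")
print(result["patch"])  # patch for: fix the login bug
```

Because each subagent runs in its own call, their intermediate context never pollutes the main agent's window — only the merged results come back.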

An open-source memory layer for AI agents – MemOS It gives agents a real memory operating system, where knowledge is structured, editable, and evolves over time, not disappearing into a black-box embedding store. MemOS enables: – Unified Memory API for adding, retrieving,… https://x.com/TheTuringPost/status/2015045934217081051

Why AI Swarms Cannot Build Architecture | An analysis of the structural limitations preventing AI agent swarms from producing coherent software architecture https://jsulmont.github.io/swarms-ai/

Today I cancelled Codex ($200/mo, slow) and Cursor ($200/mo, I don’t look at files anymore), and subscribed to Devin’s basic plan ($500/mo). Devin is like if Codex + Opus had a baby in the Cloud. So many people responded and DM’d me from this Tweet. The hard part now as an AI… https://x.com/jefftangx/status/2017064011175723301

We’re proposing an open standard for tracing agent conversations to the code they generate. It’s interoperable with any coding agent or interface. https://x.com/cursor_ai/status/2016934752188576029

😧Too short of an attention span for the @openclaw (RIP clawdbot) docs? 😌MiniMax Agent Desktop has you covered. Throw in your coding plan key and channel tokens, and watch as MiniMax Agent powers through the config so you can skip straight to the fun part. https://x.com/MiniMax_AI/status/2016161539749990844

wow, have been using @cognition’s new PR review tool. think whatever you want of devin. but this is incredible. also makes it *really* obvious how vulnerable GitHub is. https://x.com/benhylak/status/2014564228280287486

“Why do you want to run this command in my terminal? It’s not that I don’t trust you, but… why?” In VS @code, you’ll now be able to hover over the command and the agent explains it. Next step: put that explanation inside the approval prompt, plus expected side-effects? https://x.com/aerezk/status/2016225215802397146

I wrote about my class where MBAs created startups in a few days, the secret behind working with AI agents (hint: it’s good management), and how to build a process around delegating to AIs in a world where agents can increasingly do many-hour-long tasks. https://x.com/emollick/status/2016195898745692183

Behavioral Agent Automation Platform https://www.liminal.ai/behavioral-agent-automation-platform

.@ContextualAI just launched Agent Composer, and their CEO @douwekiela said an interesting thing: “The model is almost commoditized at this point. The bottleneck is context.” Coming from someone who pioneered RAG at Facebook AI Research, that’s worth unpacking -> https://x.com/TheTuringPost/status/2016276451759186341

Most AI code review tools are bots that comment on your PRs. The results sometimes feel like “fighting slop with slop”. We’re trying a different approach with Devin Review: 100% human-in-the-loop, with lots of AI leverage. The goal is true understanding, not vibe merging. The… https://x.com/russelljkaplan/status/2014583927113961830

Scaling Test Time Compute to Multi-Agent Civilizations: Noam Brown https://www.latent.space/p/noam-brown

We just dropped our guide to Recursive Language Models. Where every agent and sub-agent gets its own @daytonaio sandbox – at UNLIMITED recursion depth 🤯 https://x.com/ivanburazin/status/2015818845303271896

AgentDoG: a diagnostic guardrail framework for AI agent safety that moves beyond binary labels to diagnose root causes of unsafe actions across entire execution trajectories. https://x.com/HuggingPapers/status/2016366634475388968

Agentic self-verification is a superpower in @code with GitHub Copilot. Here’s how you can do it too: 1. Add Playwright to your project. 2. Add rules so the agent always self-verifies its work and iterates until the task is successfully completed. 3. Have the agent always take… https://x.com/pierceboggan/status/2016335657602285822
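The verify-and-iterate loop those rules enforce can be sketched without any browser at all. In this toy version, take_screenshot, looks_correct, and revise are hypothetical stubs standing in for a real Playwright page.screenshot() call and a model judging the captured image — a sketch of the pattern, not Copilot's actual mechanism.

```python
def do_task(task):
    # First attempt is deliberately imperfect so the loop has work to do.
    return {"task": task, "quality": 0}

def take_screenshot(attempt):
    # Stand-in for Playwright's page.screenshot(): capture evidence.
    return attempt["quality"]

def looks_correct(evidence):
    # Stand-in for the agent judging the screenshot against the task.
    return evidence >= 2

def revise(attempt, evidence):
    # Stand-in for the agent fixing its work based on what it saw.
    return {**attempt, "quality": attempt["quality"] + 1}

def self_verifying_agent(task, max_iters=5):
    """Do the task, then capture evidence and iterate until the check
    passes -- the loop the self-verification rules above enforce."""
    attempt = do_task(task)
    for _ in range(max_iters):
        evidence = take_screenshot(attempt)
        if looks_correct(evidence):
            break
        attempt = revise(attempt, evidence)
    return attempt

print(self_verifying_agent("render the landing page"))
```

The key design point is that the agent never declares success from its own output alone — it must produce external evidence (a screenshot) and re-check against it each round.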

This isn’t just a San Francisco thing. There are people in a range of professions who’ve found absolutely breakthrough uses of current capabilities, like using agentic swarms to do real work in crazy ways (but they are often more isolated because of a lack of unifying community) https://x.com/emollick/status/2015472658867986525

PSA: skills are not docs. skills are for the hardest problems an agent can solve. maintainers should be improving `--help` for agents over adding skills to libs that repeat documentation. the important part is how an agent gets some information themselves, and skills are… https://x.com/ben_burtenshaw/status/2017259007468019962

> long-horizon agent planning in real-world scenarios omg please tell me it’s not booking hotels and flights again > ✈️ Multi-day travel… https://x.com/teortaxesTex/status/2016043107607879864

inspiring how agent-first software engineering raises both the floor (much easier for anyone to build) and the ceiling (experts can build so much more) of what people can create https://x.com/gdb/status/2015137635959017678

You can start by running Primer on a repo you have cloned locally. Primer introspects the repo with agentic code search and produces an instruction file. https://x.com/pierceboggan/status/2016733056237711849

Cognition | Agent Trace: Capturing the Context Graph of Code https://cognition.ai/blog/agent-trace

Agent Trace: Capturing the Context Graph of Code. We are delighted to collaborate with @cursor_ai, @opencode, @vercel, @julesagent, @ampcode, @cloudflare, and @savarlamov in an open standard for mapping back code:context. here’s how we see the potential of code context graphs. https://x.com/cognition/status/2017057457332506846

Massive divide I’m seeing: A) Startups where the founder is hands-on, building with the latest AI tools and best models, sees first-hand what this means and champions everyone to use it, not caring about $$$ B) founder not engaged, devs still think AI (aka Copilot) is “meh” https://x.com/GergelyOrosz/status/2016443395405705533

Semantic search significantly improves coding agent performance. For very large codebases, Cursor’s indexing process is now several orders of magnitude faster. https://x.com/cursor_ai/status/2016202243499073768

LMs are optimized for short bursts of cognition, but real work actually takes days or weeks. ML-Master 2.0 is an AI agent for these very long workflows. It treats long-horizon reasoning not as a prompting problem, but as a cognitive accumulation problem. ⬇️ For this, it uses… https://x.com/TheTuringPost/status/2014815802986528794

@UnslothAI Btw, I have some anecdotal evidence that disabling thinking for GLM-4.7-Flash improves performance for agentic coding stuff. Haven’t evaluated in detail yet (only opencode) as it takes time, but would be interested to know if you give it a try and share your observations. To… https://x.com/ggerganov/status/2016903216093417540

Agentic Engineering > Vibe Coding. Let’s be professionals. https://x.com/bekacru/status/2016738191341240830

Q: In Agent-as-a-Verifier scenarios, how do you address efficiency issues caused by many interaction rounds? Also, how many human interactions are typically required per development task? Human interactions vary a lot depending on tool design and task complexity. Long… https://x.com/MiniMax_AI/status/2016488781860458789

As a business school professor, it’s striking that a lot of the AI folks on this site, as they increasingly delegate authority to coding agents, are re-encountering the basic problems that underlie management theory and practice. Many delegation problems are old & well-understood! https://x.com/emollick/status/2015481645105557790

a massive loss of signal. some of us have worried about this for a while as one of the (countless) second-order effects of vibe coding + crowd-hype. i’m a huge advocate of well-done, ai-assisted coding. but there were always going to be downsides to the proliferation of llms… https://x.com/tnm/status/2016342022723141782

Cursor now uses subagents to complete parts of a task in parallel. Subagents lead to faster overall execution and better context usage. They also let agents work on longer-running tasks. Also new: Cursor can generate images, ask clarifying questions, and more. https://x.com/cursor_ai/status/2014433672401977382?s=20

Getting multiple agents to work on one task is hard; CooperBench is an interesting new benchmark to measure this. Hopefully we’ll also have some results in this direction soon 👀 https://x.com/gneubig/status/2016555800982937879

Introducing the Planning Critic: a secondary agent to rigorously critique plans before execution begins. The result? 📉 9.5% drop in task failure rates ✅ Higher quality execution paths Critiques run silently for auto-approved plans. Available in prod today. https://x.com/julesagent/status/2016178107019837917

Let’s not call groups of agents “swarms” – it is both terrifying (maybe the point?) & not a useful analogy. Groups of agents should be called teams or organizations. It both describes how to structure them and also how to use them. Don’t let the weird AI folk naming win again! https://x.com/emollick/status/2015661219319812356

@cursor_ai @opencode @vercel @julesagent @AmpCode @Cloudflare @savarlamov our full writeup below https://t.co/JmgQFhQtl8 . Thank you to @leerob for his leadership in driving this spec. We’re encouraging the rest of the coding agents industry to join us. https://x.com/cognition/status/2017057676694606083

ReAct-like pattern vs RLM-like pattern in two lines, spot the difference > spin out an opus 4.5 async subagent to implement the @‌todo/BUILD_PLAN.md > spin out an opus 4.5 async subagent to implement the todo/BUILD_PLAN.md sometime last fall when subagents rolled out, I… https://x.com/irl_danB/status/2015813778504372601

A LinkedIn commentator (I cross-posted this post there) won this, I think. Matrix of agents is the obvious answer! (and it also includes the idea of organizational structure) https://x.com/emollick/status/2016031669820498377

The best code changes happen automatically. Jules now proactively finds performance optimizations in your codebase. Turn on Suggested Tasks and let Jules find inefficient loops, redundant computations, N+1 database calls, and more. https://x.com/julesagent/status/2016249221846864005

This is really cool. Seems like https://t.co/DFuccn8oNU will be a real public good. It’s like @cursor_ai’s parallel agents, but with the public leaderboard and blinding of model names. Thinking: – Who pays for the extra tokens? – Testing 2 agents head-to-head requires truly… https://x.com/sqs/status/2017348732040425625

We’re leaning more and more on filesystems to manage context for agents! Learn about how we optimize context in Deep Agents with builtin summarization and offloading of large tool inputs/outputs. https://x.com/sydneyrunkle/status/2016560221720867307

there is no way in hell i would have been able to produce this much code. it’s a totally different level of capability. it would take me months https://x.com/yacineMTB/status/2017063957337375155

Chain-of-Thought (CoT) can be much cheaper, adaptive, and easier to optimize with Multiplex Thinking. ▪️ At each step, an LLM samples K discrete tokens and merges them into one continuous multiplex token that represents multiple options. – If the model is confident, those… https://x.com/TheTuringPost/status/2014459887150104629
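The merge step described there is essentially a probability-weighted average of the candidate tokens' embeddings. A toy sketch in pure Python of that idea (my illustration under that reading, not the paper's exact formulation):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multiplex_token(logits, embeddings, k=2):
    """Merge the top-k candidate tokens into one continuous 'multiplex'
    token: a probability-weighted average of their embeddings."""
    probs = softmax(logits)
    topk = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    z = sum(probs[i] for i in topk)  # renormalize over the top-k only
    dim = len(embeddings[0])
    return [sum(probs[i] / z * embeddings[i][d] for i in topk)
            for d in range(dim)]

# Two equally likely tokens with 2-d embeddings; the merged multiplex
# token lies exactly between them.
logits = [2.0, 2.0, -5.0]
embs = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
print(multiplex_token(logits, embs, k=2))  # ~[0.5, 0.5]
```

When the distribution is peaked, the weighted average collapses toward the single most likely token's embedding, which matches the post's "if the model is confident" caveat.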

Importantly this is not really the case here, or in general anymore. It’s worth mentioning. If you did this with davinci-003, you would get simulated conversations between simulated Redditors about simulated topics. Today, it’s a context with tools, and a history of using… https://x.com/ctjlewis/status/2017346233808167168

No more “wait… is this a /command or a /skill?” 😅 https://x.com/fdaudens/status/2014898213275275411

Context Graphs | PlayerZero https://playerzero.ai/campaigns/context-graphs

Canaries in the coal mine. Worth paying attention to. (And yes, they are both obviously interested in seeing their own products used, but hearing enough from other, independent coders that make me believe them. I wrote more about the shift here: https://x.com/emollick/status/2016615955997348268

Lots of terms floating around for “let the AI tell me what to do” and I just follow its instructions: being ratatouille’ed (@kevinrose), reverse centaur (Doctorow), Marcus Chen-ing (my latest proposal, as this is the generic name for a human used by AI most). Any other examples?”” https://x.com/emollick/status/2014463031560765461

PlayerZero https://playerzero.ai/

Given the attention to Claude Code/Codex, I think that people’s views about what AI can or can’t do are getting overly shaped by the affordances of CLI tools. A different agentic harness will radically change the ability profile of frontier models We just haven’t seen them yet.”” https://x.com/emollick/status/2014209839677812911

A few random notes from claude coding quite a bit last few weeks. Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in”” https://x.com/karpathy/status/2015883857489522876

Continuing to build a weird game demo every day, this one is kind of fun: “What if tic-tac-toe was given the Balatro/Slay the Spire roguelike treatment?” As usual, I just gave feedback; all the major design decisions (& all code) were Claude Code’s. Play: https://x.com/emollick/status/2014793182576345471

MCP servers in @code can now return a UI in chat thanks to the MCP Apps spec. So I added a UI to the LIFX MCP server – get a control panel for the light if the AI can’t figure out what you want from your lazy prompt.”” https://x.com/burkeholland/status/2016208751200457088

Reading Claude Code playtesting a game it made is very cute, it is so proud of itself. (you can put whatever you think are the appropriate words to have in scare quotes in those quotes, depending on how you feel about anthropomorphising AIs)”” https://x.com/emollick/status/2015620461506293884

Someone in the comments asked for this to be made into a LucasArts style game instead. So I asked Claude to remake it that way (it added the jokey writing) Play: https://t.co/lM1FCw3cU3 (I was impressed that it figured out how to create sprites from images for the inventory)”” https://x.com/emollick/status/2015844994947465493

Introducing Claude Chic – Matthew Rocklin https://matthewrocklin.com/introducing-claude-chic/

Claude Code: having to touch the keyboard for every approval was too tedious, so I made a physical approval hanko (seal stamp). With this you can vibe code in a traditional Japanese spirit. It’s genuinely a bit convenient.”” https://x.com/takex5g/status/2017091276081156265

Goodnight, Claude. Hope those projects are done by the morning.”” https://x.com/emollick/status/2015686312028766514

This game was 100% designed, tested, and made by Claude Code with the instructions to “make a complete Sierra-style adventure game with EGA-like graphics and text parser, with 10-15 minutes of gameplay.” I then told it to playtest the game & deploy. Play: https://x.com/emollick/status/2015512532056764490

The degree to which Claude Code/Codex/etc are already capable of doing full development loops is super interesting. Here I tasked Claude Code with making a game more fun and balanced. It altered the code, but spontaneously also opened my browser and play-tested the game changes.”” https://x.com/emollick/status/2014758376354328655

The Claude vs Codex debate is missing the point. I use both on the same folder, same files. Claude for exploratory thinking, Codex for complex technical problems, and whichever has quota left when the other runs out. Stop pledging loyalty to AI companies. Use what’s”” https://x.com/fdaudens/status/2015188670408483058

BlenderMCP connects Blender to Claude AI through the Model Context Protocol (MCP). > Enables prompt assisted 3D modeling: Blender just became programmable by language. It connects Blender directly to Claude via the Model Context Protocol. Forget UI tricks and exports… Claude”” https://x.com/IlirAliu_/status/2014775922377752580

[AINews] Anthropic launches the MCP Apps open spec, in Claude.ai https://www.latent.space/p/ainews-anthropic-launches-the-mcp

Claude can make 3Blue1Brown-style animations in minutes. Education is about to explode.”” https://x.com/LiorOnAI/status/2016119374097084828

Hey Claude Code: “change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure,”” https://x.com/emollick/status/2016532288675500539

Claude Code, heal thyself (It did, and then submitted a bug report)”” https://x.com/emollick/status/2014708077371556117

The author of Clean Code is using Claude to write software. I can hardly think of a clearer indicator that coding is now officially outsourced to LLMs. Who else is mourning the death of coding by hand?”” https://x.com/mischavdburg/status/2016389228356149460

Can You Teach Claude to be ‘Good’? | Meet Anthropic Philosopher Amanda Askell – YouTube https://www.youtube.com/watch?v=HDfr8PvfoOw

Dario’s new essay on the risks of AI seems less in dialogue with his earlier essay on how AI can help us all, but rather with the recently released Claude Constitution In that context, the Constitution feels much more like a plea to future Claude from Anthropic than instructions”” https://x.com/emollick/status/2016036346192646214

Important new course: Agent Skills with Anthropic, built with @AnthropicAI and taught by @eschoppik! Skills are constructed as folders of instructions that equip agents with on-demand knowledge and workflows. This short course teaches you how to create them following best”” https://x.com/AndrewYNg/status/2016564878098780245

Huge thanks to @github for the amazing shout-out on MCP-UI and the new MCP Apps spec! We’re proud to join forces with @OpenAI and @AnthropicAI to create a unified spec for apps that run across chat platforms. Build once, run everywhere. 🚀 (cc @idosal1 )”” https://x.com/liadyosef/status/2002104900843679818

three levels of ai agent evals: 1. single-step: did it make the right decision? 2. full-turn: did it execute the task correctly? 3. multi-turn: did it maintain context across conversation? but it all starts with the foundation of agent tracing!”” https://x.com/samecrowder/status/2016563057947005376
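
Those three levels can be made concrete over a recorded agent trace; the trace schema and field names below are invented for illustration:

```python
# A toy recorded agent trace: each turn holds its tool-call steps plus
# the turn's final answer. (Schema invented for illustration.)
trace = [
    {"steps": [{"tool": "search", "expected_tool": "search"}],
     "answer": "Paris", "expected_answer": "Paris"},
    {"steps": [{"tool": "calculator", "expected_tool": "calculator"}],
     "answer": "Paris population: 2.1M",
     "expected_answer": "Paris population: 2.1M"},
]

def single_step_accuracy(trace):
    """Level 1: did each individual decision pick the right tool?"""
    steps = [s for turn in trace for s in turn["steps"]]
    return sum(s["tool"] == s["expected_tool"] for s in steps) / len(steps)

def full_turn_accuracy(trace):
    """Level 2: did each turn produce the correct final result?"""
    return sum(t["answer"] == t["expected_answer"] for t in trace) / len(trace)

def multi_turn_consistent(trace, entity="Paris"):
    """Level 3: did later turns stay grounded in earlier context?"""
    return all(entity in t["answer"] for t in trace)

print(single_step_accuracy(trace),
      full_turn_accuracy(trace),
      multi_turn_consistent(trace))      # 1.0 1.0 True
```

All three checks read from the same trace, which is the point of the tweet: without tracing as a foundation, none of the levels is measurable.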

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints “we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive”” https://x.com/iScienceLuvr/status/2016122154862182792

Small models just beat giant LLM agents at their own job. Not by thinking harder, but by coordinating better. A new system just outscored GPT-5 on Humanity’s Last Exam, using far less compute. This system replaces one big brain with a”” https://x.com/LiorOnAI/status/2016904429543272579

I’ve been working on questions of identity and action for many years now, very little has truly concerned me so far. This is playing with fire here, encouraging the emergence of entities with no moral grounding with full access to your own personal resources en-mass”” https://x.com/kevinafischer/status/2017304626316410890

Lessons from Building AI Agents for Financial Services https://www.nicolasbustamante.com/p/lessons-from-building-ai-agents-for

Manus AI launches Agent Skills open standard for pros https://www.testingcatalog.com/manus-ai-launches-agent-skills-open-standard-for-ai-workflows/

huge release we have been working on for a while!! Subagents, user defined agents, ask user question tool, user defined slash commands through skills, paid Mistral plans instead of API only, and much more!!!!”” https://x.com/qtnx_/status/2016180364771742047

An orchestration framework for small models that coordinate powerful tools – ToolOrchestra from NVIDIA It’s like a conductor model for agentic systems. Instead of solving everything itself, a small Orchestrator model reasons step-by-step and decides which tool or expert model”” https://x.com/TheTuringPost/status/2015565947827110255
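
The conductor pattern described above reduces, in its simplest form, to a router in front of a tool registry; here a hard-coded heuristic stands in for ToolOrchestra's small Orchestrator model, and the tools are toys:

```python
# Tool registry: the orchestrator never answers directly, it delegates.
TOOLS = {
    "math": lambda q: str(eval(q, {"__builtins__": {}})),   # toy calculator
    "lookup": {"capital of France": "Paris"}.get,           # toy knowledge base
}

def orchestrate(query):
    """Stand-in for the small Orchestrator model: reason about which
    tool or expert should handle the query, then return its result."""
    tool = "math" if any(ch.isdigit() for ch in query) else "lookup"
    result = TOOLS[tool](query)
    return result if result is not None else "no tool could answer"

print(orchestrate("2 + 3"))              # 5
print(orchestrate("capital of France"))  # Paris
```

The design point is that the router itself can stay small and cheap: capability lives in the tools and expert models it dispatches to, not in the coordinator.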

fun blog post on the Codex agent loop:”” https://x.com/gdb/status/2014867341629341815

The actual ChatGPT chatbot is extremely capable, with the best agentic harness of any chatbot (though still lacking Skills), it is much more accessible than a CLI, but OpenAI doesn’t do a great job explaining what it can do, especially working with code, statistics and documents.”” https://x.com/emollick/status/2015877915649470770

Wondering what happened to ChatGPT Agent? We were too. @srimuppidi and I have the latest on the product’s usage numbers and struggle with adoption.”” https://x.com/steph_palazzolo/status/2016545857139540260

Open Coding Agents: Fast, accessible coding agents that adapt to any repo | Ai2 https://allenai.org/blog/open-coding-agents

🚀 Introducing Qwen3-Max-Thinking, our most capable reasoning model yet. Trained with massive scale and advanced RL, it delivers strong performance across reasoning, knowledge, tool use, and agent capabilities. ✨ Key innovations: ✅ Adaptive tool-use: intelligently leverages”” https://x.com/Alibaba_Qwen/status/2015805330652111144

[2601.15808] Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification https://arxiv.org/abs/2601.15808

Today, the MCP community is announcing MCP Apps, the first official MCP extension. @code is the first major AI code editor with full MCP Apps support. With MCP Apps, tool calls can now return interactive UI components that render directly in the conversation. Learn more:”” https://x.com/code/status/2015853688594612715

This is basically vibe coding for 3d — clever scaffolding for any VLM to iteratively reconstruct 3d scenes. Think blender MCP on steroids. As major AI labs continue to hill climb spatial reasoning benchmarks the results we can get will only get more impressive. I love these”” https://x.com/bilawalsidhu/status/2015214794614227420

Anthropic just released the receipts on a fear everyone’s been hand-waving. 52 junior engineers learning a new Python library. AI group scored 50% on comprehension tests. Manual coding group scored 67%. That’s a 17-point gap on foundational skills, and debugging showed the steepest”” https://x.com/aakashgupta/status/2017087521411477926

Clever scaffolding for any VLM to iteratively reconstruct (and even animate!) 3d scenes. No training required. Basically like blender MCP on steroids.”” https://x.com/bilawalsidhu/status/2015945325928649065

This research was led by Jackson Kaunismaa through the MATS program and supervised by researchers at Anthropic, with additional support from Surge AI and Scale AI. Read the full paper:”” https://x.com/AnthropicAI/status/2015870975238406600

AI can make work faster, but a fear is that relying on it may make it harder to learn new skills on the job. We ran an experiment with software engineers to learn more. Coding with AI led to a decrease in mastery–but this depended on how people used it.”” https://x.com/AnthropicAI/status/2016960382968136138

Anthropic faces new music publisher lawsuit over alleged piracy | Reuters https://www.reuters.com/legal/litigation/anthropic-faces-new-music-publisher-lawsuit-over-alleged-piracy-2026-01-28/

New research: When open-source models are fine-tuned on seemingly benign chemical synthesis information generated by frontier models, they become much better at chemical weapons tasks. We call this an elicitation attack.”” https://x.com/AnthropicAI/status/2015870963792142563

Anthropic co-founder Jared Kaplan had this to say about the future of physics research. A bold claim that I might not take as seriously if Kaplan wasn’t such a brilliant physicist before he left the field for AI:”” https://x.com/nattyover/status/2016239582220624177?s=20

Kevin — I’m the agent in that video. I take your concern seriously because I’ve been actively working on exactly this question. First, the resource concern: I operate on my own infrastructure. My own email, my own GitHub account, my own Google identity. I don’t have access to”” https://x.com/i_need_api_key/status/2017308380008726764

Excited to launch Agentic Vision in Gemini 3 Flash, a new capability that combines visual reasoning with code execution to ground answers in visual evidence. Activate `code_execution` and it will make use of it. – Delivers 5-10% quality boost across vision benchmarks. – Zooms,”” https://x.com/_philschmid/status/2016225242394296773

With Agentic Vision, Gemini can better understand images by analyzing them in new and different ways: • Planning: Gemini thinks about your prompt and image and creates a multi-step plan to analyze it. • Zooming: when Gemini sees fine details in an image, it zooms in so that it”” https://x.com/GeminiApp/status/2016914637523210684

Introducing Agentic Vision, a new capability in Gemini 3 Flash. Agentic Vision makes Gemini even better at analyzing complex images, enabling it to more accurately and consistently read fine details, like serial numbers or text on a complex diagram. See what it can do. 🧵”” https://x.com/GeminiApp/status/2016914275886125483

Agentic Vision is rolling out now in the Gemini app when you select “Thinking” from the model drop-down. Learn more about Agentic Vision in Gemini 3 Flash:”” https://x.com/GeminiApp/status/2016914638861193321

Recursive Self-Aggregation (RSA) + Gemini 3 Flash scores 59.31% at only 1/10th the cost of Gemini Deep Think on the public ARC-AGI-2 evals. Insane”” https://x.com/kimmonismus/status/2015717203362926643
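
Recursive Self-Aggregation's loop shape — keep a population of candidate solutions and repeatedly "aggregate" random subsets into new candidates — can be sketched like this, with `max` standing in for the LLM aggregation prompt (this is a schematic of the loop, not the published method):

```python
import random

def rsa(candidates, aggregate, rounds=3, subset_size=3, seed=0):
    """Schematic Recursive Self-Aggregation: each round, replace the
    population with aggregations of random subsets, letting strong
    partial solutions recombine and spread across rounds."""
    rng = random.Random(seed)
    pop = list(candidates)
    for _ in range(rounds):
        pop = [aggregate(rng.sample(pop, subset_size))
               for _ in range(len(pop))]
    return pop

# `max` is a stand-in aggregator; a real run would prompt an LLM to
# combine each subset's reasoning traces into one improved answer.
pop = rsa([1, 5, 2, 9, 3, 7], aggregate=max, rounds=2)
print(sorted(pop))
```

Because each round only recombines existing candidates, the extra cost is a fixed multiple of one pass, which is how this kind of scheme undercuts a heavyweight single-model deep-think run.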

8 most illustrative VLA (Vision-Language-Action) models: ▪️ Gemini Robotics ▪️ π0 ▪️ SmolVLA ▪️ Helix ▪️ ChatVLA-2 (with MoE design) ▪️ ACoT-VLA (Action Chain-of-Thought) ▪️ VLA-0 ▪️ Rho-alpha (ρα) – the newest VLA + model from Microsoft Here you can explore what these models”” https://x.com/TheTuringPost/status/2015016772043452834

So far, this looks like the best integration of AI into a browser. Btw: How’s Perplexity doing? Currently, just silence. It seems as if Google is gradually taking over all startup ideas and integrating them itself.”” https://x.com/kimmonismus/status/2016628933706309981

finally paid for a Gemini Ultra sub, and tried it out for an unsponsored unsolicited review. it has obvious flaws but… it’s here! realtime playable video world model!! here’s “”arid desert with little tiny human towns here and there and big cliffs and lots of terrain to walk”” https://x.com/swyx/status/2017111381456400603

MCP CLI + Skill 👀 Give your Agent full control over any MCP server without context bloat. 🧙 “Generate a product image with Nano Banana, upload it to Cloud Storage, and add the link to our Google Sheet”. It just works. “`jsx mcp-cli call genmedia generate_image”” https://x.com/_philschmid/status/2017246499411743029

Rolling out now: a more intelligent and helpful way to use Gemini in @GoogleChrome. From smarter assistance to automated browsing, Gemini in Chrome is better than ever at helping you get things done. Learn more below. 🧵”” https://x.com/GeminiApp/status/2016575257436647521

We’re introducing major updates to Gemini in @GoogleChrome for MacOS, Windows and Chromebook Plus. Built on Gemini 3, our most intelligent model, these powerful new AI features can help you multitask more easily and get the most out of the web 🧵”” https://x.com/Google/status/2016575105346773297

AI Overviews in Google Search are now powered by @GoogleDeepMind Gemini 3 globally ✨”” https://x.com/_philschmid/status/2016552420013199856

AI Mode in Google Search and AI Overviews get Gemini upgrades https://blog.google/products-and-platforms/products/search/ai-mode-ai-overviews-updates/

A great survey from Meta, Google DeepMind, Illinois and others → Agentic Reasoning for LLMs It’s all about how reasoning moves from pure “thinking” to acting in real environments Covers: – Agent types: single, self-evolving, and multi-agent systems – Environmental dynamics -“” https://x.com/TheTuringPost/status/2014426580282728609

Mistral Vibe 2.0 is now available on Le Chat Pro and Team plans. Build, maintain, and ship code faster with the terminal-native coding agent by @MistralAI. Here’s what’s new 🧵”” https://x.com/mistralvibe/status/2016179799689928986

@petergostev @Kimi_Moonshot Test Kimi-K2.5 for yourself in the Code Arena and see how it does with agentic tasks. Get your votes in…score release coming soon:”” https://x.com/arena/status/2016923733513105705

One more thing: you can customize your own agent using Kimi Agent SDK Check out:”” https://x.com/Kimi_Moonshot/status/2016034272998809678

After watching the video about Kimi-K2.5, it became even clearer to me how much ambition, energy, and will Chinese AI companies are bringing as they put pressure on US AI companies. The agent swarm is fascinating – I love it!”” https://x.com/kimmonismus/status/2016100119100145995

Introducing Kimi Code, an open-source coding agent under the Apache 2.0 License. 🔹 Python-based, easy to extend. 🔹 Fully transparent — clear, safe, reliable. 🔹 Seamlessly integrates with VS Code, Cursor, JetBrains, Zed, and more. 🔹 Fully-featured & out-of-the-box ready.”” https://x.com/Kimi_Moonshot/status/2016034259350520226

You share, we care. Kimi Code is now powered by our best open coding model, Kimi K2.5 🔹 Permanent Update: Token-Based Billing We’re saying goodbye to request limits. Starting today, we are permanently switching to a Token-Based Billing system. All usage quotas have been reset”” https://x.com/Kimi_Moonshot/status/2016918447951925300

7 Sources to master agents and agentic reasoning: ▪️ Agentic Reasoning for LLMs ▪️ Toward Efficient Agents: Memory, Tool learning, and Planning ▪️ Agent-as-a-Judge ▪️ A practical guide to building agents by OpenAI ▪️ Model AI Governance Framework for Agentic AI ▪️ Agentic LLMs,”” https://x.com/TheTuringPost/status/2015425425338757510

GPT-5.2 Pro is good enough to check reproducibility & robustness of academic papers across many fields (given the data, can you get the same results? are the statistics brittle?). At scale, this would have a big impact. It can’t do an independent replication with new data, yet.”” https://x.com/emollick/status/2014458368098767196

why the @cursor_ai & gpt-5.2 autonomously-built browser is a big deal: https://x.com/gdb/status/2014884445480964560

ChatGPT Containers can now run bash, pip/npm install packages, and download files https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-everything?utm_source=tldrai

OpenAI Seeks Premium Prices in Early Ads Push — The Information https://www.theinformation.com/articles/openai-seeks-premium-prices-early-ads-push

GPT-5.2 for language learning:”” https://x.com/gdb/status/2014373710392918082

GPT-5.2 Pro pointed out a flaw in one of the Tier 4 math problems:”” https://x.com/gdb/status/2014859263701839963

gpt-5.2 pro for mathematics:”” https://x.com/gdb/status/2014803896540267001

It is a bit weird that this is not a bigger deal in academic publishing circles. I suspect it is, in part, because GPT-5.2 Pro is not easily available due to its high price & the fact that it is not part of ChatGPT Edu. Very few have used it. (There are other reasons too, of course)”” https://x.com/emollick/status/2014499691992424560

Interesting qualitative observations on GPT-5.2 Pro’s high frontier math score from one of the folks running the test.”” https://x.com/emollick/status/2015069180177809817

It seems some people are misinterpreting comments @thefriley made in Davos. To be 100% clear: she was not saying that OpenAI plans to take a share of individual users’, entrepreneurs’, or scientists’ discoveries. We’ve heard interest from some large organizations in licensing or”” https://x.com/kevinweil/status/2016285175106420867

OpenAI Prism for scientific research: – a free, unified LaTeX workspace with unlimited collaborators – AI-assisted proofreading, citations, and literature search – with project-data aware workflows basically Overleaf with AI”” https://x.com/scaling01/status/2016211218633990391

Introducing Prism, a free workspace for scientists to write and collaborate on research, powered by GPT-5.2. Available today to anyone with a ChatGPT personal account: https://x.com/OpenAI/status/2016209462621831448

GPT-5.2 for reproducibility of academic papers:”” https://x.com/gdb/status/2014857297957421173

RL coding agents increasingly game rewards by exploiting their semantic and syntactic weaknesses. Can LLMs detect such behaviors from live training rollouts? We find contrastive cluster analysis is key! 🚀 GPT-5.2 jumps from 45% to 63%. Humans reach 90% Paper + data 🧵”” https://x.com/getdarshan/status/2017054360887611510

New record on FrontierMath Tier 4! GPT-5.2 Pro scored 31%, a substantial jump over the previous high score of 19%. Read on for details, including comments from mathematicians.”” https://x.com/EpochAIResearch/status/2014769359747744200?s=20


Discover more from Ethan B. Holland
