Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Top-down isometric pixel art scene of a city intersection with chunky sprite delivery robots, drones, and autonomous vehicles moving in coordinated patterns, PS1 era graphics with saturated neon trail lines showing movement paths, small glowing UI elements above each agent sprite, dithered asphalt texture with bright yellow crosswalks, high contrast lighting with hot pink and electric blue highlights, CRT screen glow aesthetic, emergent traffic patterns visible from bird’s eye view.
Chrome’s Gemini is getting “Skills” as it moves toward becoming a full AI agent https://chromeunboxed.com/gemini-in-chrome-is-getting-skills-as-it-moves-toward-becoming-a-full-ai-agent/
Anthropic keeps inventing new approaches to how agents work that then get adopted universally. First MCPs now Skills (which are an excellent tool). Also good to see fast adoption of good standards. https://x.com/emollick/status/2011890972121293235
AI agents have gotten good enough at long horizon tasks that it is an inflection point in the impact of AI at work. Agreement on this from METR, GDPval & now Anthropic. If you have a tool that saves 8 hours 65% of the time, that changes work, even counting potential error rates. https://x.com/emollick/status/2012237630411292859
Organizations have started catching up to the prior best practices for employee AI use (access to frontier models, training, champion programs, prompt libraries)… …except almost all of this now has to be re-invented for a world of agents driven by skills that do long tasks. https://x.com/emollick/status/2013726596901769715
Erdos problems are a definite example of models breaching a threshold. The idea that an AI could solve one, let alone many, on its own would have been insane a year ago (o1 was brand new), and now we have multiple Erdos problems solved by GPT-5.2 Pro in the last couple weeks. https://x.com/emollick/status/2012729680667750812
I’ve solved a second Erdos problem (#281) using only GPT 5.2 Pro – no prior solutions found. Terence Tao calls it “perhaps the most unambiguous instance” of AI solving an open problem: https://x.com/neelsomani/status/2012695714187325745
3 approaches to building worlds for AI agents 1. Traditional web apps: They store state in databases and operate under fixed rules. They are stable and controllable, but limited to what developers specify in advance. 2. Fully generative worlds: They place AI models at the https://x.com/TheTuringPost/status/2011941946886082837
A must-read→ A Survey on Agent-as-a-Judge Covers: – Why LLM-as-a-Judge breaks on complex tasks – Limitations like bias, single-pass reasoning, lack of real-world verification – LLM-as-a-judge vs. agent-as-a-judge – How agentic judges add planning, tools, and memory - https://x.com/TheTuringPost/status/2011962160910336330
As AI and cloud infra scale, managing privileged access with long-lived credentials and shared secrets becomes harder to maintain and understand. @goteleport implements Zero Trust PAM using cryptographic identity instead of stored secrets. Each human, machine, or AI agent https://x.com/TheTuringPost/status/2012625525152653533
As someone who made one of those cute gamey agent interfaces that went viral, I do think it is likely that they are a dead end. Even if tasks are not fully automated, as roon suggests, we already have tools for delegating long-running tasks. It will look like project management. https://x.com/emollick/status/2012750371458834728
Building a better Bugbot · Cursor https://cursor.com/blog/building-bugbot
Building agentic AI https://www.algolia.com/resources/asset/building-agentic-ai
Cognition | Devin Review: AI to Stop Slop https://cognition.ai/blog/devin-review
Continuing to have AI build a weird game demo a day. Here is: “Make a game where you have to prevent the apocalypse, but the interface is just Jira tickets” Pretty fun/funny branching storyline, all text is AI created with minor feedback from me. Play: https://x.com/emollick/status/2012606086889558437
For most of the past two years, the story of agentic AI has been told as a story about reasoning. Better chain-of-thought. Better planning. More tools. More tokens. My context is longer than yours! etc. That framing is starting to change. The limiting factor is no longer how https://x.com/TheTuringPost/status/2014078927875309612
Framer: The fastest way to launch your startup site https://www.framer.com/startups/?dub_id=rhXRk3fo9VucvDvf%2F2%2F0100019beb3afab0-f7a5e9c5-ea9e-4bc2-9b08-116713479e81-000000%2Ftbkm8ER1TWhzpdNZVGZR8NyE-Vsr0AxjlKr-IMgVtMg%3D441
How to build a Frontend for LangChain Deep Agents with CopilotKit! | Blog | CopilotKit https://www.copilotkit.ai/blog/how-to-build-a-frontend-for-langchain-deep-agents-with-copilotkit
Import AI 441: My agents are working. Are yours? https://importai.substack.com/p/import-ai-441-my-agents-are-working
Introducing FastMCP 3.0 🚀 https://www.jlowin.dev/blog/fastmcp-3
Introducing VibeCon – the world’s largest vibe coding conference. Register today to lock in early bird pricing: http://127.0.0.1:8080/register https://x.com/bilawalsidhu/status/2013327740963791238
MCP is Not the Problem, It’s your Server: Best Practices for Building MCP Servers https://www.philschmid.de/mcp-best-practices
Notion working on custom MCPs, Workers, and Computer Use https://www.testingcatalog.com/notion-testing-custom-mcps-workers-and-computer-use-agent/
The big change in casual AI coding is that you can ask for stuff & you usually get it with no big errors. When you find anything wrong it is often corrected one-shot You can build a lot of stuff by asking: here is a game where you control only the org chart of a fantasy kingdom https://x.com/emollick/status/2012333820951867880
The Code-Only Agent • Rijnard van Tonder https://rijnard.com/blog/the-code-only-agent
The Tragedy of the Agentic Commons https://www.strangeloopcanon.com/p/the-tragedy-of-the-agentic-commons
The work systems that were built to manage human teams using well-understood processes and procedures are going to fall apart in all sorts of ways when AI agents are brought into the loops without changes. https://x.com/emollick/status/2011884610150355229
This has been said a thousand times before, but allow me to add my own voice: the era of humans writing code is over. Disturbing for those of us who identify as SWEs, but no less true. That’s not to say SWEs don’t have work to do, but writing syntax directly is not it. https://x.com/rough__sea/status/2013280952370573666
This turned out to be true. Many software projects should have waited until agentic coding tools got to today’s level Given trajectory in coding, agents, video, etc. there are lots of other projects that you should be lazy about today since it is faster to wait for AI to improve https://x.com/emollick/status/2013349245563077036
Triple inverted pendulum in transition control A classic control problem, done in real time. This setup moves smoothly between all eight equilibrium points of a triple inverted pendulum. The system reacts every 1 millisecond, which shows how fast modern control loops can be. https://x.com/IlirAliu_/status/2013326370592403499
Very fast Codex coming! https://x.com/sama/status/2012243893744443706
Which open model is the best? This question no longer makes sense. Models’ capabilities are now distributed across jobs, modalities, and deployment environments. We’re now in the specialization era. Here is what you should know about the main trends: • Agentic models ≠ https://x.com/TheTuringPost/status/2013388699241763321
--dangerously-skip-agency https://x.com/emollick/status/2011909846581346759
Still commanding AI to make one weird game demo a day. This one: “A game where you need to stop a spaceship from crashing into the sun and your only tool is future Claude Code.” The entire design & all puzzles 100% AI, I gave minor UX feedback. Play: https://x.com/emollick/status/2013386588831535338
Not to repeat this, but the fact the Gemini chatbot can’t seem to deliver files (or even consistently run code) is a huge gap compared to ChatGPT or Claude. It makes a very smart model (Gemini 3) much less useful, especially for people trying to get AI to do real tasks and work. https://x.com/emollick/status/2013657294366478514
“NitroGen: A Foundation Model for Generalist Gaming Agents” TL;DR: vision-to-action model trained on 40k+ hrs of gameplay across 1,000+ games, mapping raw pixels to gamepad actions for generalist agents. https://x.com/Almorgand/status/2011847937899589672
Benchmarking AI Agent Memory: Is a Filesystem All You Need? | Letta https://www.letta.com/blog/benchmarking-ai-agent-memory
Salesforce ships higher-quality code across 20,000 developers with Cursor · Cursor https://cursor.com/blog/salesforce
Agent Skills are now available in Google Antigravity! Skills are an open standard to extend what your agent can do. Whether it’s project-specific workflows or global utilities, you can now package knowledge into reusable skills. https://x.com/antigravity/status/2011248170299498637
How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning | NVIDIA Technical Blog https://developer.nvidia.com/blog/how-to-train-an-ai-agent-for-command-line-tasks-with-synthetic-data-and-reinforcement-learning/
Supply-chain risk of agentic AI – infecting infrastructures via skill worms https://blog.lukaszolejnik.com/supply-chain-risk-of-agentic-ai-infecting-infrastructures-via-skill-worms/
Does anyone know of a “intro to filesystems for smart non-computer people who never had to use a terminal or even really folders” that I could give to students who grew up largely not having to know this stuff? Everything 101 I can find is too easy & doesn’t explain concepts. https://x.com/emollick/status/2013094876389187949
Anthropic works on Knowledge Bases for Claude Cowork https://www.testingcatalog.com/anthropic-works-on-knowledge-bases-for-claude-cowork/
“Claude Code, make me a version of the 1984 abandonware Apple IIe game Rescue Raiders, look it up” (A few pieces of feedback later) https://x.com/emollick/status/2011989590295421221
Opus 4.5: “you need to build a game that is coherent, fun and story driven. There are precisely two controls. One is slider, in which one side is labelled Maximum Potato and the other is labelled Formalware, there are four positions for the slider. The other is a dial that goes https://x.com/emollick/status/2013466698121113981
When my students were creating initial demos with Claude Code & Antigravity, the AI would often spontaneously decide to do Wizard of Oz demos. The AI would build an interface, but not underlying logic. Code would (live!) run the interface behind the scenes to make it look working https://x.com/emollick/status/2012212148915654753
Anthropic’s Claude Code Has the AI World Buzzing: ‘It’s Amazing and Also Scary’ – WSJ https://www.wsj.com/tech/ai/anthropic-claude-code-ai-7a46460e
Apple plans to make Siri an AI chatbot, report says | TechCrunch https://techcrunch.com/2026/01/21/apple-plans-to-make-siri-an-ai-chatbot-report-says/
I started the “vibefounding” MBA class by giving students my Voight-Kampff Quiz: Which work do you have deep experience in? Which skills are you considered world class in? What do you do outside work that you LOVE & have knowledge about? Their AI startups had to build on these. https://x.com/emollick/status/2012174434996519176
Google brings Personal Intelligence to AI Mode in Search https://blog.google/products-and-platforms/products/search/personal-intelligence-ai-mode-search/
Google’s Personal Intelligence, when connected to your phone, is … holy fk! https://x.com/TheTuringPost/status/2013971432523333988
Continuing to build a game a day by just asking the AI. This is SocFight: a fighting game where historical sociologists beat each other up. Will Max Weber’s Iron Cage attack defeat Durkheim’s dangerous Anomie? (Asked Claude Code to use OpenAI’s image generator when needed). https://x.com/emollick/status/2012955926542577921
There have been a few break points in AI development where sudden jumps in capability mean that prior research and wisdom about what models can do suddenly lags actual ability by a large margin: GPT-4, o1/o3, and now the long-task-horizon agents (we need a better name for them). https://x.com/emollick/status/2013979113183133779
Well… there is this paper on GPT-4 that found it was good at setting prices on behalf of the user (as opposed to against them, as in this joke post) In fact, if it is possible to establish an oligarchy, LLM agents secretly collude on your behalf to the detriment of customers! https://x.com/emollick/status/2013080195255676965




