Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Top-down isometric pixel art scene of a city intersection with chunky sprite delivery robots, drones, and autonomous vehicles moving in coordinated patterns, PS1 era graphics with saturated neon trail lines showing movement paths, small glowing UI elements above each agent sprite, dithered asphalt texture with bright yellow crosswalks, high contrast lighting with hot pink and electric blue highlights, CRT screen glow aesthetic, emergent traffic patterns visible from bird’s eye view.

Chrome’s Gemini is getting “Skills” as it moves toward becoming a full AI agent https://chromeunboxed.com/gemini-in-chrome-is-getting-skills-as-it-moves-toward-becoming-a-full-ai-agent/

Anthropic keeps inventing new approaches to how agents work that then get adopted universally. First MCPs now Skills (which are an excellent tool). Also good to see fast adoption of good standards. https://x.com/emollick/status/2011890972121293235

AI agents have gotten good enough at long horizon tasks that it is an inflection point in the impact of AI at work. Agreement on this from METR, GDPval & now Anthropic. If you have a tool that saves 8 hours 65% of the time, that changes work, even counting potential error rates. https://x.com/emollick/status/2012237630411292859

Organizations have started catching up to the prior best practices for employee AI use (access to frontier models, training, champion programs, prompt libraries)… except almost all of this now has to be re-invented for a world of agents driven by skills that do long tasks. https://x.com/emollick/status/2013726596901769715

Erdos problems are a definite example of models breaching a threshold. The idea that an AI could solve one, let alone many, on its own would have been insane a year ago (o1 was brand new), and now we have multiple Erdos problems solved by GPT-5.2 Pro in the last couple weeks. https://x.com/emollick/status/2012729680667750812

I’ve solved a second Erdos problem (#281) using only GPT 5.2 Pro – no prior solutions found. Terence Tao calls it “perhaps the most unambiguous instance” of AI solving an open problem: https://x.com/neelsomani/status/2012695714187325745

3 approaches to building worlds for AI agents: 1. Traditional web apps: They store state in databases and operate under fixed rules. They are stable and controllable, but limited to what developers specify in advance. 2. Fully generative worlds: They place AI models at the https://x.com/TheTuringPost/status/2011941946886082837

A must-read → A Survey on Agent-as-a-Judge. Covers: – Why LLM-as-a-Judge breaks on complex tasks – Limitations like bias, single-pass reasoning, lack of real-world verification – LLM-as-a-judge vs. agent-as-a-judge – How agentic judges add planning, tools, and memory - https://x.com/TheTuringPost/status/2011962160910336330

As AI and cloud infra scale, managing privileged access with long-lived credentials and shared secrets becomes harder to maintain and understand. @goteleport implements Zero Trust PAM using cryptographic identity instead of stored secrets. Each human, machine, or AI agent https://x.com/TheTuringPost/status/2012625525152653533

As someone who made one of those cute gamey agent interfaces that went viral, I do think it is likely that they are a dead end. Even if tasks are not fully automated, as roon suggests, we already have tools for delegating long-running tasks. It will look like project management. https://x.com/emollick/status/2012750371458834728

Building a better Bugbot · Cursor https://cursor.com/blog/building-bugbot

Building agentic AI https://www.algolia.com/resources/asset/building-agentic-ai

Cognition | Devin Review: AI to Stop Slop https://cognition.ai/blog/devin-review

Continuing to have AI build a weird game demo a day. Here is: “Make a game where you have to prevent the apocalypse, but the interface is just Jira tickets” Pretty fun/funny branching storyline, all text is AI created with minor feedback from me. Play: https://x.com/emollick/status/2012606086889558437

For most of the past two years, the story of agentic AI has been told as a story about reasoning. Better chain-of-thought. Better planning. More tools. More tokens. My context is longer than yours! etc. That framing is starting to change. The limiting factor is no longer how https://x.com/TheTuringPost/status/2014078927875309612

Framer: The fastest way to launch your startup site https://www.framer.com/startups/?dub_id=rhXRk3fo9VucvDvf%2F2%2F0100019beb3afab0-f7a5e9c5-ea9e-4bc2-9b08-116713479e81-000000%2Ftbkm8ER1TWhzpdNZVGZR8NyE-Vsr0AxjlKr-IMgVtMg%3D441

How to build a Frontend for LangChain Deep Agents with CopilotKit! | Blog | CopilotKit https://www.copilotkit.ai/blog/how-to-build-a-frontend-for-langchain-deep-agents-with-copilotkit

Import AI 441: My agents are working. Are yours? https://importai.substack.com/p/import-ai-441-my-agents-are-working

Introducing FastMCP 3.0 🚀 https://www.jlowin.dev/blog/fastmcp-3

Introducing VibeCon – the world’s largest vibe coding conference. Register today to lock in early bird pricing: http://127.0.0.1:8080/register https://x.com/bilawalsidhu/status/2013327740963791238

MCP is Not the Problem, It’s your Server: Best Practices for Building MCP Servers https://www.philschmid.de/mcp-best-practices

Notion working on custom MCPs, Workers, and Computer Use https://www.testingcatalog.com/notion-testing-custom-mcps-workers-and-computer-use-agent/

The big change in casual AI coding is that you can ask for stuff & you usually get it with no big errors. When you find anything wrong it is often corrected one-shot. You can build a lot of stuff by asking: here is a game where you control only the org chart of a fantasy kingdom https://x.com/emollick/status/2012333820951867880

The Code-Only Agent • Rijnard van Tonder https://rijnard.com/blog/the-code-only-agent

The Tragedy of the Agentic Commons https://www.strangeloopcanon.com/p/the-tragedy-of-the-agentic-commons

The work systems that were built to manage human teams using well-understood processes and procedures are going to fall apart in all sorts of ways when AI agents are brought into the loops without changes. https://x.com/emollick/status/2011884610150355229

This has been said a thousand times before, but allow me to add my own voice: the era of humans writing code is over. Disturbing for those of us who identify as SWEs, but no less true. That’s not to say SWEs don’t have work to do, but writing syntax directly is not it. https://x.com/rough__sea/status/2013280952370573666

This turned out to be true. Many software projects should have waited until agentic coding tools got to today’s level. Given trajectory in coding, agents, video, etc. there are lots of other projects that you should be lazy about today since it is faster to wait for AI to improve https://x.com/emollick/status/2013349245563077036

Triple inverted pendulum in transition control. A classic control problem, done in real time. This setup moves smoothly between all eight equilibrium points of a triple inverted pendulum. The system reacts every 1 millisecond, which shows how fast modern control loops can be. https://x.com/IlirAliu_/status/2013326370592403499

Very fast Codex coming! https://x.com/sama/status/2012243893744443706

Which open model is the best? This question no longer makes sense. Models’ capabilities are now distributed across jobs, modalities, and deployment environments. We’re now in the specialization era. Here is what you should know about the main trends: • Agentic models ≠ https://x.com/TheTuringPost/status/2013388699241763321

--dangerously-skip-agency https://x.com/emollick/status/2011909846581346759

Still commanding AI to make one weird game demo a day. This one: “A game where you need to stop a spaceship from crashing into the sun and your only tool is future Claude Code.” The entire design & all puzzles 100% AI, I gave minor UX feedback. Play: https://x.com/emollick/status/2013386588831535338

Not to repeat this, but the fact the Gemini chatbot can’t seem to deliver files (or even consistently run code) is a huge gap compared to ChatGPT or Claude. It makes a very smart model (Gemini 3) much less useful, especially for people trying to get AI to do real tasks and work. https://x.com/emollick/status/2013657294366478514

“NitroGen: A Foundation Model for Generalist Gaming Agents” TL;DR: vision-to-action model trained on 40k+ hrs of gameplay across 1,000+ games, mapping raw pixels to gamepad actions for generalist agents. https://x.com/Almorgand/status/2011847937899589672

Benchmarking AI Agent Memory: Is a Filesystem All You Need? | Letta https://www.letta.com/blog/benchmarking-ai-agent-memory

Salesforce ships higher-quality code across 20,000 developers with Cursor · Cursor https://cursor.com/blog/salesforce

Agent Skills are now available in Google Antigravity! Skills are an open standard to extend what your agent can do. Whether it’s project-specific workflows or global utilities, you can now package knowledge into reusable skills. https://x.com/antigravity/status/2011248170299498637

How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning | NVIDIA Technical Blog https://developer.nvidia.com/blog/how-to-train-an-ai-agent-for-command-line-tasks-with-synthetic-data-and-reinforcement-learning/

Supply-chain risk of agentic AI – infecting infrastructures via skill worms https://blog.lukaszolejnik.com/supply-chain-risk-of-agentic-ai-infecting-infrastructures-via-skill-worms/

Does anyone know of an “intro to filesystems for smart non-computer people who never had to use a terminal or even really folders” that I could give to students who grew up largely not having to know this stuff? Everything 101 I can find is too easy & doesn’t explain concepts. https://x.com/emollick/status/2013094876389187949

Anthropic works on Knowledge Bases for Claude Cowork https://www.testingcatalog.com/anthropic-works-on-knowledge-bases-for-claude-cowork/

“Claude Code, make me a version of the 1984 abandonware Apple IIe game Rescue Raiders, look it up” (A few pieces of feedback later) https://x.com/emollick/status/2011989590295421221

Opus 4.5: “you need to build a game that is coherent, fun and story driven. There are precisely two controls. One is slider, in which one side is labelled Maximum Potato and the other is labelled Formalware, there are four positions for the slider. The other is a dial that goes https://x.com/emollick/status/2013466698121113981

When my students were creating initial demos with Claude Code & Antigravity, the AI would often spontaneously decide to do Wizard of Oz demos. The AI would build an interface, but not underlying logic. Code would (live!) run the interface behind the scenes to make it look working https://x.com/emollick/status/2012212148915654753

Anthropic’s Claude Code Has the AI World Buzzing: ‘It’s Amazing and Also Scary’ – WSJ https://www.wsj.com/tech/ai/anthropic-claude-code-ai-7a46460e

Apple plans to make Siri an AI chatbot, report says | TechCrunch https://techcrunch.com/2026/01/21/apple-plans-to-make-siri-an-ai-chatbot-report-says/

I started the “vibefounding” MBA class by giving students my Voight-Kampff Quiz: Which work do you have deep experience in? Which skills are you considered world class in? What do you do outside work that you LOVE & have knowledge about? Their AI startups had to build on these. https://x.com/emollick/status/2012174434996519176

Google brings Personal Intelligence to AI Mode in Search https://blog.google/products-and-platforms/products/search/personal-intelligence-ai-mode-search/

Google’s Personal Intelligence, when connected to your phone, is … holy fk! https://x.com/TheTuringPost/status/2013971432523333988

Continuing to build a game a day by just asking the AI. This is SocFight: a fighting game where historical sociologists beat each other up. Will Max Weber’s Iron Cage attack defeat Durkheim’s dangerous Anomie? (Asked Claude Code to use OpenAI’s image generator when needed). https://x.com/emollick/status/2012955926542577921

There have been a few break points in AI development where sudden jumps in capability mean that prior research and wisdom about what models can do suddenly lags actual ability by a large margin: GPT-4, o1/o3, and now the long-task-horizon agents (we need a better name for them). https://x.com/emollick/status/2013979113183133779

Well… there is this paper on GPT-4 that found it was good at setting prices on behalf of the user (as opposed to against them, as in this joke post). In fact, if it is possible to establish an oligarchy, LLM agents secretly collude on your behalf to the detriment of customers! https://x.com/emollick/status/2013080195255676965


Discover more from Ethan B. Holland
