“I got to play with the new Claude model that controls a mouse and keyboard last week. Full post shortly, but I had it play Paperclip Clicker (of course) and it did well over a hundred moves executing a coherent strategy without any intervention. Agents start to come into view.
“Computer use API: We’ve built an API that allows Claude to perceive and interact with computer interfaces. You feed in a screenshot to Claude, and Claude returns the next action to take on the computer (e.g. move mouse, click, type text, etc).
Show HN: Agent.exe, a cross-platform app to let 3.5 Sonnet control your machine | Hacker News
“The most significant gains are in coding. The new 3.5 Sonnet sets a new state-of-the-art on SWE-bench Verified with a score of 49% (using no complex scaffolding)—besting all models including reasoning models like OpenAI o1-preview and specialized models for agentic coding.
“I can’t tell you the last time I was so excited to see a new AI capability in action. We plugged in Claude computer use in @Replit Agent as a human feedback replacement. And… it just works! I feel it won’t take long until our agent will become fully autonomous.
“Claude 3.5 Haiku: 3.5 Haiku replaces 3.0 Haiku as our fastest and least expensive model. It outperforms many state-of-the-art models on coding tasks, including the original Claude 3.5 Sonnet and GPT-4o. 3.5 Haiku will be made available in the coming weeks.
“The new Claude 3.5 Sonnet has *insane* capabilities when used as a Minecraft agent. It’s powered by a project called Mindcraft. Running this code allows you to spawn AI bots that will follow your instructions, build, and play the game. Here’s how to set it up in <15min.
“🚨Anthropic just released the most amazing AI technology I’ve ever used. I’m not kidding. AI agents are here and you can now build your own personal army of AI’s that will do work for you. Here is your demo and complete beginner’s guide: (trust me, you want to bookmark this)
“New @AnthropicAI Computer Use feels surreal. But don’t take my word for it. We made a template on Replit for you to try. Watch me fork the template, ask the agent to go to YouTube, find a video, and even skip the ads — all in a few minutes.
Anthropic announces AI agents for complex tasks, racing OpenAI
“Claude 3.5 Sonnet’s current ability to use computers is imperfect. Some actions that people perform effortlessly (scrolling, dragging, zooming) currently present challenges. So we encourage exploration with low-risk tasks. We expect this to rapidly improve in the coming months.”
“Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
“Playing with Claude Computer Use is very worthwhile. It’s obvious that it’s something that’ll be used in the future, much like when you first try ChatGPT or amazing tech like AirPods. BUT, it’s clear its integration will take some serious time. Here’s an example web task,
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku \ Anthropic
Claude | Computer use for coding – YouTube
Claude | Computer use for automating operations – YouTube
“Claude’s “computer use” beta is wild because you don’t need to make custom tools for LLMs to use — automation is about to look a lot more like screen recording a task/workflow involving any desktop apps, and asking Claude to take control and do it for you.
“Anthropic’s computer use can operate mobile devices including iOS, Android, and mobile browsers 📱 Here it is ordering me an Uber and posting for me on X.
“Just got our first AI-ordered pizza with Lindy + Claude computer use 🙂
Computer use (beta) – Anthropic
“We’re trying something fundamentally new. Instead of making specific tools to help Claude complete individual tasks, we’re teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people.
Introducing the analysis tool in Claude.ai \ Anthropic
anthropic-quickstarts/computer-use-demo at main · anthropics/anthropic-quickstarts · GitHub
“One of those “AI feels like a superpower” moments. I went to an old tweet about a diagram of city streets by entropy, and pasted the scientific paper and image into Claude and asked it to create code to replicate it. It built the code in one shot, even replicated color scheme
Claude | Computer use for orchestrating tasks – YouTube
“The ability of multimodal AI to “understand” images is underrated. I just took these. Given the first photo Claude guesses where I am. Given the second it identifies the type of plane. These aren’t obvious.
“Anthropic computer use API + iPhone mirroring to a Mac = AI controlled phone. Watch Claude control my phone and successfully look up stats in my Sports app. I even got it to play a game in the Chess app against another AI – pretty crazy. And this is the worst it’ll ever be.
“The new Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. While groundbreaking, computer use is still experimental—at times error-prone. We’re releasing it early for feedback from developers.
“We’ve built an API that allows Claude to perceive and interact with computer interfaces. This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research.
Initial explorations of Anthropic’s new Computer Use capability
“Last week in AI was 🔥 Does @nvidia’s Llama 3.1 fine-tune outperform @OpenAI GPT-4o and @AnthropicAI Claude 3.5? New Zyphra’s Zamba2 challenges the Transformer architecture? > NVIDIA Llama 3.1 Nemotron 70B topped Arena Hard (85.0) & AlpacaEval 2 LC (57.6) > Zamba2 7B matched
Sabotage evaluations for frontier models \ Anthropic
Claude 3.5 Opus is no longer mentioned at all on https://docs.anthropic.com/en/d… | Hacker News
Evaluating feature steering: A case study in mitigating social biases \ Anthropic
Claude 3 Model Card October Addendum.pdf
https://assets.anthropic.com/m/1cd9d098ac3e6467/original/Claude-3-Model-Card-October-Addendum.pdf
“The new Sonnet also sets SOTA on aider’s more demanding refactoring benchmark with a score of 92.1%! 92% Sonnet 10/22, 75% o1-preview, 72% Opus, 64% Sonnet 06/20, 49% GPT-4o 08/06, 45% o1-mini
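Several of the quotes above describe the computer use API as a loop: a screenshot goes in, and the model returns the next action (move mouse, click, type text). The following is a minimal sketch of that loop only, not Anthropic's actual API. Every name here (`Action`, `run_agent_loop`, `make_scripted_model`) is hypothetical, and a scripted stand-in takes the place of the model so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action record; the real API's schema may differ."""
    kind: str                  # "move_mouse", "click", "type_text", or "done"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: capture the screen, ask the model for one action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = choose_action(screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


def make_scripted_model(script: List[Action]):
    """Stand-in for the model: replays a fixed script, then signals 'done'."""
    actions = iter(script)

    def choose_action(screenshot: bytes, history: List[Action]) -> Action:
        return next(actions, Action(kind="done"))

    return choose_action


script = [
    Action("move_mouse", x=100, y=200),
    Action("click", x=100, y=200),
    Action("type_text", text="hello"),
]
executed: List[Action] = []  # records what a real executor would perform
history = run_agent_loop(
    take_screenshot=lambda: b"fake-png-bytes",  # placeholder screenshot
    choose_action=make_scripted_model(script),
    execute=executed.append,
)
print([a.kind for a in history])  # ['move_mouse', 'click', 'type_text']
```

The `max_steps` cap is a design choice for this sketch: since the quotes note that computer use is still error-prone, bounding the loop keeps a confused agent from running indefinitely.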