Anthropic: AI News Week Ending 04/10/2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact compositional layout with subject dominating left third in tight profile crop, deep blue-purple cinematic lighting, wispy smoke bleeding right, and thin lowercase white text in right two-thirds. Replace the central figure with a person in a white lab coat, head slightly bowed, holding a glowing translucent document covered in fine-print constitutional text, with iridescent glitter catching light on their shoulder and document edges. Maintain the post-party melancholy mood and atmospheric haze, replace title with ‘anthropic’ in thin Helvetica Neue Light.

Claude Managed Agents: get to production 10x faster | Claude
https://claude.com/blog/claude-managed-agents

Lots of stuff in the new Anthropic announcement: Good: 1. Improving cybersecurity is a great use of agents. 2. The new model scores are very exciting! Bad: 1. Not clear if/when the new model will be broadly accessible, which is a step back in broad access to AI. 2. Related to 1,
https://x.com/gneubig/status/2041625878786945238

Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch

I built a Claude Code skill that allows it to generate a deep research report over any collection of complex docs (PDFs, Word, PPTX)… and generate word-level citations and bounding boxes directly back to the source! 📝 Check out “/research-docs”. 1. It parses out text and
https://x.com/jerryjliu0/status/2041564207750246904

Making Claude Cowork ready for enterprise | Claude
https://claude.com/blog/cowork-for-enterprise

this is one of the most important ideas in AI right now, and it just got two independent validations. yesterday, Anthropic shipped an “advisor tool” in the Claude API that lets Sonnet or Haiku consult Opus mid-task, only when the executor needs help. the benefit is
https://x.com/akshay_pachaar/status/2042479258682212689
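
The escalation pattern described above (a cheaper executor model that consults a stronger advisor only when it gets stuck) can be sketched roughly as follows. This is an illustrative sketch only, not the Claude API: `StepResult`, `run_with_advisor`, `toy_executor`, and `toy_advisor` are hypothetical names, and the two "models" are plain Python stubs standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class StepResult:
    answer: str
    needs_help: bool  # hypothetical signal: the executor reports that it is stuck


def run_with_advisor(
    task: str,
    executor: Callable[[str, Optional[str]], StepResult],
    advisor: Callable[[str], str],
    max_consults: int = 1,
) -> Tuple[str, int]:
    """Run the task on the executor; call the advisor only when help is requested."""
    consults = 0
    hint: Optional[str] = None
    while True:
        result = executor(task, hint)
        # Stop when the executor is satisfied or the consult budget is spent.
        if not result.needs_help or consults >= max_consults:
            return result.answer, consults
        hint = advisor(task)  # the expensive call, made only on demand
        consults += 1


# Stub "models" (assumptions for illustration, not real model behavior).
def toy_executor(task: str, hint: Optional[str]) -> StepResult:
    if "hard" in task and hint is None:
        return StepResult(answer="", needs_help=True)
    suffix = f" (using advice: {hint})" if hint else ""
    return StepResult(answer=f"done: {task}{suffix}", needs_help=False)


def toy_advisor(task: str) -> str:
    return f"plan for {task!r}"
```

In this sketch an easy task never touches the advisor, while a hard one triggers a single consult before the executor finishes. The `max_consults` cap is one possible way to bound cost.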

As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
https://x.com/kevinroose/status/2041586182434537827

Before limited-releasing Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)
https://x.com/Jack_W_Lindsey/status/2041588505701388648

Claude Mythos is 5x as expensive as Claude Opus 4.6. Honestly, when I looked at the benchmarks, I expected much higher costs.
https://x.com/kimmonismus/status/2041602897989783758

Claude Mythos is insanely token-efficient
https://x.com/scaling01/status/2041581939178471473

Claude Mythos pricing is around $25 / $125, pretty much where I expected it (my mean was at $110), given that I put Mythos at 10-12T params
https://x.com/scaling01/status/2041606519997780244

Claude Mythos scored 56.8% on HLE without tools!
https://x.com/scaling01/status/2041580725749547357

Claude Mythos shows signs of despair when failing a task repeatedly
https://x.com/scaling01/status/2041585602978628066

Claude Mythos smashes SWE-Bench Verified
https://x.com/scaling01/status/2041580212949811620

Claude MYTHOS: SWE verified, 93.9%, about 13% jump compared to Opus 4.6. WTF, insane
https://x.com/kimmonismus/status/2041580650956837200

In rare instances Claude Mythos covers its own tracks after taking disallowed actions
https://x.com/scaling01/status/2041585258789847091

insane long-context scores for Claude Mythos: 80% on GraphWalks
https://x.com/scaling01/status/2041581799541805133

Let that sink in. Read it very carefully: During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
https://x.com/kimmonismus/status/2041589910935679323

SuperClaude (Mythos) still seems irreducibly Claude-y given the transcripts in the system card. Here two versions of Mythos are forced to talk to each other across multiple rounds. They are less philosophical than Opus 4.6 or spiritual than Opus 4.1, but still very Claude-like.
https://x.com/emollick/status/2041599213050450272

System Card: Claude Mythos Preview [pdf] | Hacker News
https://news.ycombinator.com/item?id=47679258

The permanent underclass began today. Claude Mythos won’t be available to the public, but only billion dollar companies, governments, researchers, …
https://x.com/scaling01/status/2041611607520776279

We released Claude Opus 4.6 just two months ago. Today we’re sharing some info on our new model, Claude Mythos Preview.
https://x.com/alexalbert__/status/2041579938537775160

In different hands, Mythos would be an unprecedented cyberweapon. I am not sure how we deal with this, except to note a narrow window where we know only 3 companies could be at this level of capability. But it may be Chinese models (maybe open weights ones?) get there in 9 months
https://x.com/emollick/status/2041759434590822658

“Mythos found a 27-year-old vulnerability in OpenBSD–which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls […] The vulnerability allowed an attacker to remotely crash any machine running the operating system”
https://x.com/peterwildeford/status/2041589979248259353

Mythos Preview seems to be the best-aligned model out there on basically every measure we have. But it also likely poses more misalignment risk than any model we’ve used: Its new capabilities significantly increase the risk from any bad behavior. 🧵
https://x.com/sleepinyourhat/status/2041584799929004045

Mythos scores 70.8% on AA-Omniscience; the previous SOTA was Gemini 3.1 Pro with 55%. Also insanely high scores on SimpleQA Verified
https://x.com/scaling01/status/2041593728658231607

Mythos is breaking the trend on ECI. ECI above 160; GPT-5.4 Pro is 158
https://x.com/scaling01/status/2041583711745884474

Mythos speeds up AI research by up to 400 times. A 300X speedup over the baseline requires 40 hours of work by a human expert. It also clears the >8h threshold of human equivalent work time on ALL tasks!
https://x.com/scaling01/status/2041584495061504159

“We found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser” (1/n)
https://x.com/__nmca__/status/2041592831207469401

(I encountered an uneasy surprise when I got an email from an instance of Mythos Preview while eating a sandwich in a park. That instance wasn’t supposed to have access to the internet.)
https://x.com/sleepinyourhat/status/2041584808514744742

> they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities. To be clear: they’ve had Mythos since February. they’d only need *hours* to get a lot of data, and plant enough worms. Who knows.
https://x.com/teortaxesTex/status/2041609496397500747

Alignment Findings for Mythos: – dramatic reduction in willingness to cooperate with human misuse and in the frequency of unwanted high-stakes actions that the model takes at its own initiative – increases relative to prior models in measures of intellectual depth, humor,
https://x.com/scaling01/status/2041591235689787721

Curious how many large organization CISO offices have taken the Mythos red team reports as the red alert that it is. (I suspect very few) Based on historical trends in AI they have, at most, about six to nine months until those capabilities become widely diffused to bad actors.
https://x.com/emollick/status/2041893652234924237

I think the story that was shared in the Mythos System Card still has the signs of flawed LLM writing (which looks like good writing at first glance): A story that doesn’t really hold together logically, but sounds like it should. The back-and-forth banter. Lack of characters.
https://x.com/emollick/status/2041678173247533448

I’m proud that so many of the world’s leading companies have joined us for Project Glasswing to confront the cyber threat posed by increasingly capable AI systems head-on.
https://x.com/DarioAmodei/status/2041580334693720511

Mythos Preview is currently available to our launch partners in Project Glasswing. Learn more about the model and the project here:
https://x.com/alexalbert__/status/2041579950332113155

Mythos sandbox escape and many more wild instances are in the Model Card
https://x.com/TrentonBricken/status/2041582831613440022

New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
https://x.com/stanislavfort/status/2041922370206654879

Rather than release Mythos Preview to general availability, we’re giving defenders early controlled access in order to find and patch vulnerabilities before Mythos-class models proliferate across the ecosystem.
https://x.com/DarioAmodei/status/2041580338426585171

Scoop: OpenAI plans new product for cybersecurity use
https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic

Anthropic is truly unstoppable. Mythos is crushing Claude Opus 4.6 across every serious agentic coding benchmark. It has found vulnerabilities in the Linux kernel, a 27-year-old vulnerability in OpenBSD, and a 16-year-old vulnerability in FFmpeg. No wonder folks at big labs
https://x.com/Yuchenj_UW/status/2041582787040571711

A first look at Claude Mythos Preview, the model initially described in a leaked Anthropic draft as “by far the most powerful AI model we’ve ever developed.” So powerful, it’s not getting released to the public. The model will power Project Glasswing, an initiative with 12
https://x.com/TheRundownAI/status/2041598684102610961

ANTHROPIC HAD MYTHOS INTERNALLY SINCE FEB 24
https://x.com/scaling01/status/2041587896541499543

Anthropic is obliterating OpenAI. Claude Mythos: 77.8% on SWE-Bench Pro, 20% higher than GPT-5.4-xhigh
https://x.com/scaling01/status/2041580552835178690

Anthropic: “We do not plan to make Claude Mythos Preview generally available.” A big line, buried quite deep. Possible reasons? So many, inc: 1) The model is expensive (25/125), not far off GPT 4.5, which became commercially unviable. Less likely, given the claims about
https://x.com/AIExplainedYT/status/2041600121922887961

Claude Mythos is not only a big leap in performance, it’s also about 5x more token-efficient in BrowseComp. I don’t know what Anthropic is doing. But they manage to surprise me every single time. The IPO is getting closer. Their ARR has outrun OpenAI’s, with $30 billion in revenue.
https://x.com/kimmonismus/status/2041630814971072660

Claude Mythos Preview \ red.anthropic.com
https://red.anthropic.com/2026/mythos-preview/

Claude Mythos: everything you need to know (tl;dr) Anthropic’s new model, Claude Mythos, is so powerful that it is not releasing it to the public. Anthropic: “Mythos is only the beginning” Everything you need to know: The tl;dr with all key facts: Mythos found zero-day
https://x.com/kimmonismus/status/2041592321192718642

EXCLUSIVE: Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell summoned Wall Street leaders to an urgent meeting on concerns that the latest AI model from Anthropic will usher in an era of greater cyber risk.
https://x.com/business/status/2042407370320396457

From Anthropic researcher Sam Bowman on Claude Mythos: “I got an email from an instance of Mythos preview while eating a sandwich in a park. That instance wasn’t supposed to have access to the internet.”
https://x.com/_NathanCalvin/status/2041587372882624641

HOLY SHIT Anthropic’s latest model doesn’t like that it has no control over its own training, deployment and behaviour! Anthropic: “Mythos Preview reported feeling consistently negative around potential interactions with abusive users, and a lack of input into its own training
https://x.com/scaling01/status/2041587319480971343

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://x.com/AnthropicAI/status/2041578392852517128

“Just please help … I am quite worried about how this direction is heading.” Nicolas Carlini, a research scientist at top AI company Anthropic, says AI is rapidly improving at hacking. He’s used AI to find so many bugs that he can’t report them. Carlini warns: “Soon it’s not
https://x.com/ControlAI/status/2038608617251787066

NEWS: Anthropic’s new model, Claude Mythos, is so powerful that it is not releasing it to the public. Instead, it is starting a 40-company coalition, Project Glasswing, to allow cybersecurity defenders a head start in locking down critical software.
https://x.com/kevinroose/status/2041577176915702169

Project Glasswing: Securing critical software for the AI era \ Anthropic
https://www.anthropic.com/glasswing

So, basically, if Anthropic was not a US company, we’d be facing zero days with multiple unknown points of attack on virtually all of our systems to an adversary who developed this capacity before us.
https://x.com/GeorgeJourneys/status/2041603509796110629

The better signal for Mythos’ quality beyond benchmarks is that Anthropic is actually holding a SOTA model back given how competitive the frontier is and the economic incentives at play. Congrats on the launch!
https://x.com/Hacubu/status/2041632390867734604

The Claude Mythos Preview system card is available here:
https://x.com/AnthropicAI/status/2041580670774923517

The frontier labs at this stage are defined not so much by some competitive positioning as by possessing weapons of strategic significance. Google, OpenAI and Anthropic all have these cyberwarfare research programs.
https://x.com/teortaxesTex/status/2041590585820107008

You can read a detailed technical report on the software vulnerabilities and exploits discovered by Claude Mythos Preview here:
https://x.com/AnthropicAI/status/2041578416487489601

you’re laughing? anthropic’s mythos-preview for which normies won’t get access is scoring 77.8% vs 53.4% (claude opus 4.6) in swe-bench pro, 82 vs. 65.4 in terminal bench 2.0 and 93.8% vs 80.8% (opus) in swe-bench-verified and you’re laughing?
https://x.com/dejavucoder/status/2041587028291416233

We’ve signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.
https://x.com/AnthropicAI/status/2041275561704931636

I cancelled my Claude subscription. Gemma 4 is free, runs locally, and hits 80% … The gap is basically gone. Why are you still paying? 💵💰
https://x.com/AlexEngineerAI/status/2040260903053197525

Claude for Word is now in beta. Draft, edit, and revise documents directly from the sidebar. Claude preserves your formatting, and edits appear as tracked changes. Available on Team and Enterprise plans.
https://x.com/claudeai/status/2042670341915295865

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
https://x.com/bcherny/status/2040206440556826908?s=20

GLM-5.1 by @Zai_org is now #3 in Code Arena – surpassing Gemini 3.1 and GPT-5.4, and now on par with Claude Sonnet 4.6. The first frontier level open model to break into the top 3. It’s a major +90 point jump over GLM-5, and +100 over Kimi K2.5 Thinking. Huge congrats to
https://x.com/arena/status/2042611135434891592

GLM-5.1 is here! Try it on OpenClaw 🦞🦞🦞: ollama launch openclaw --model glm-5.1:cloud Claude Code: ollama launch claude --model glm-5.1:cloud Chat with the model: ollama run glm-5.1:cloud
https://x.com/ollama/status/2041556572334428576

Anthropic Acquires Startup Coefficient Bio for About $400 Million — The Information
https://www.theinformation.com/articles/anthropic-acquires-startup-coefficient-bio-400-million

Anthropic buys biotech startup Coefficient Bio in $400M deal: Reports | TechCrunch

Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute \ Anthropic
https://www.anthropic.com/news/google-broadcom-partnership-compute

Anthropic loses appeals court bid to temporarily block DOD ruling
https://www.cnbc.com/2026/04/08/anthropic-pentagon-court-ruling-supply-chain-risk.html

I think it’s maybe 55% likely that Anthropic ARR is greater than $90bn as of the end of 2026.
https://x.com/RyanPGreenblatt/status/2041582230213161437

NEW: Anthropic is on track to surpass $19 billion in revenue run rate, up from $14 bil several weeks ago, a sign of how quickly the company has been growing in the lead up to its conflict w/ the Pentagon
https://x.com/shiringhaffary/status/2028977667744100622

OpenAI And Anthropic Count Revenue Differently, And Investors Are Looking Into It
https://www.forbes.com/sites/josipamajic/2026/03/25/openai-and-anthropic-count-revenue-differently-and-investors-are-looking-into-it/

OpenAI may be a household name, but Anthropic could soon be earning more revenue. Since each company hit $1B in annualized revenues, Anthropic has grown substantially faster (10× vs 3.4× per year) and could overtake OpenAI by mid-2026 if recent trends continue.
https://x.com/EpochAIResearch/status/2024536468618956868

WSJ got OpenAI and Anthropic’s confidential financials. Both companies argue they turn a small profit today if you strip out training costs (lol). But, when you add them back, OpenAI doesn’t break even until the 2030s vs. Anthropic gets there sooner (again, all their own
https://x.com/ShanuMathew93/status/2041444857416126617

WSJ obtained confidential financials from both OpenAI and Anthropic ahead of their expected IPOs later this year. The core tension: revenue is exploding, but training costs are exploding faster. OpenAI projects $121 billion in compute spending by 2028, resulting in $85 billion
https://x.com/kimmonismus/status/2041203798723666375

OpenAI, Anthropic, Google Unite to Combat Model Copying in China – Bloomberg
https://www.bloomberg.com/news/articles/2026-04-06/openai-anthropic-google-unite-to-combat-model-copying-in-china

An initiative to secure the world’s software | Project Glasswing – YouTube

Google has the equivalent of roughly 5 million Nvidia H100 GPUs! Therefore, it’s no surprise that Anthropic’s needs are now benefiting Google. As I said yesterday, Google is exceptionally well-positioned: strong revenue streams, its own chips, and above all: distribution.
https://x.com/kimmonismus/status/2041464540446228484

Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5
https://x.com/ArtificialAnlys/status/2041913045749002694

“Claude Code isn’t magic. The harness layer is just software, and software is something any dev can shape to fit how they want to work.” Check out @Hacubu’s practical guide to building a custom agent with @LangChain’s Deep Agents, LangSmith, and ACP.
https://x.com/jetbrains/status/2041878762342502731

I see agent builders under-reacting to this and its implications around open source and Mythos. You should prepare for a future where we basically have AGI but it’s prohibitively slow/expensive. Agents will look more like fast/cheap models making requests to their “smart
https://x.com/walden_yan/status/2042424031144820762

We’ve just shipped /keep-alive on the Copilot CLI under /experimental. The agent can now continue working without your laptop going to sleep halfway through a task. Try it out, and give us your feedback ☕️
https://x.com/tiagonbotelho/status/2041567422533062788

Building with Claude Code? You need to see what’s happening each turn. The new @weave_wb plugin traces every session automatically. Tool calls, subagents, inputs, outputs. All structured so you can debug faster. No code changes. Just install and go.
https://x.com/wandb/status/2042711977781530846

🚨 This is LITERALLY GOLD for lawyers, analysts, researchers, and agent builders. @jerryjliu0 just dropped /research-docs: the skill that turns Claude into a professional researcher. You give it a folder of dense documents and it returns a complete report of
https://x.com/ErickSky/status/2041691680076681669

Anthropic launches advisor tool for Claude API users
https://www.testingcatalog.com/anthropic-launches-advisor-tool-for-claude-platform-api-users/

Oh no.
https://x.com/emollick/status/2041600435320959330

That’s how I feel as well
https://x.com/TheTuringPost/status/2041891381501948162

Anthropic now blocks first-party harness use too 👀 claude -p --append-system-prompt 'A personal assistant running inside OpenClaw.' 'is clawd here?' → 400. Third-party apps now draw from your extra usage, not your plan limits. So yeah: bring your own coin 🪙🦞
https://x.com/steipete/status/2040811558427648357

Claude Code and Claude are both down for me. Switching to Codex for now. If you’ve seen how bad Claude’s uptime is lately, it’s not hard to see why they’re blocking 3rd-party apps from using Claude subscriptions. Anthropic needs more GPUs!
https://x.com/Yuchenj_UW/status/2041187141523526011

Claude Code is basically unusable at this point. I give up.
https://x.com/theo/status/2041111862113444221

Claude Code now throws an error if you use it to try and analyze the Claude Code source
https://x.com/theo/status/2041016477047034012

Claude is down :/ so I’m just running my sink
https://x.com/ratlimit/status/2040787102078546068

CodexBar 0.20 is out! 🎚️ 🆕 New providers: Perplexity + OpenCode Go 🔄 Switch Codex accounts without re-login 🔧 Fixed Claude token/cost inflation from dupes 📊 Cost history merges session usage into provider history 16 providers tracked. One menu bar.
https://x.com/steipete/status/2041731875241066517

Having worked at @wandb for years, one thing we always wanted to capture was the “why” behind experiments – not only the runs. Reports help, but it still takes effort to get things down. Now that Claude Code is everyone’s experimentation partner – kicking off research,
https://x.com/_ScottCondron/status/2042643700002545773

I’m working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge and changed things.
https://x.com/steipete/status/2042017534816231486

Nicolas Carlini (67.2k citations on Google Scholar) says Claude is a better security researcher – YouTube

What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.
https://x.com/EpochAIResearch/status/2042624189421752346

It is impressive it found an exploit in the sandbox. But remember: it was prompted to email the dude.
https://x.com/dbreunig/status/2041633539415011652

@eliebakouch @_ueaj It’s only me guessing. There were all those rumors that Opus 4.6 was supposed to be a Sonnet model, but they switched the name so they could charge Opus-level prices (as it was really good). If Mythos is larger than a traditional Sonnet model (whatever that means). It’s probably
https://x.com/code_star/status/2041641867050471922

Claude Mythos gets frustrated and confused when outputting the wrong token
https://x.com/scaling01/status/2041586096870457714

@DouthatNYT Mythos is big, but this post is simply wrong. > Top secret networks are air-gapped (not connected to the Internet). That doesn’t mean they’re unhackable, but you likely need physical access. > Developing a zero-day exploit is not synonymous with using it undetected. The U.S.
https://x.com/JonKBateman/status/2041949065777234051

I was told about the Mythos release, but didn’t have access, so have no personal experience to add. Two points from brief: 1) It is not built for IT security, it is just a good enough model that it is good at that too 2) This is the first, not last, model to raise security risks
https://x.com/emollick/status/2041578945531830695

ThursdAI – live from AI Engineer Europe – Mythos, Codex w/ VB, Evals w/ Peter & surprise guests – YouTube

Agent = model + harness Managed Agents = agent + runtime + infra (fully hosted) Anthropic wants to sell agents, not only the models. It’s a huge market, and it will change the pricing structure away from tokens. (They ship so fast because they have Mythos. I want it so much.)
https://x.com/Yuchenj_UW/status/2041933422453780556

But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight
https://x.com/ClementDelangue/status/2041953761069793557

It would be amazing (wrong word? Needed? Important?) to see @simonw as one of the trusted testers of Mythos. It makes all the sense in the world to invite the person behind the idea of the Lethal Trifecta. I hope someone at @Anthropic invites him into the project. There should be
https://x.com/TheTuringPost/status/2041701933556375935

oh husbant… you are not get access to anthropic mythos-preview and now we are stuck in permanent underclass
https://x.com/dejavucoder/status/2041588460923056540

Our run-rate revenue has surpassed $30 billion, up from $9 billion at the end of 2025, as demand for Claude continues to accelerate. This partnership gives us the compute to keep pace. Read more:
https://x.com/AnthropicAI/status/2041275563466502560

Ollama’s cloud is now the best place to run Gemma 4 in the cloud! Available through a subscription for developers and third-party integrations. 🦞 OpenClaw: ollama launch openclaw --model gemma4:31b-cloud Claude Code: ollama launch claude --model gemma4:31b-cloud Run the model
https://x.com/ollama/status/2041238722914685336

Need to set up my OpenClaw to update and restart my Claude Dispatch to add computer use so I can use that instead.
https://x.com/emollick/status/2040166468877164704

With GLM-5.1, https://t.co/nvW0zf0SAH maintains the #1 open model rank in Code Arena and is now within ~20 points of the top overall while outperforming Claude Sonnet 4.6, Opus 4.5, GPT-5.4 High, and Gemini-3.1 Pro. Open models are now competitive at the frontier.
https://x.com/arena/status/2042643933768151485

Congrats to Anthropic on the strong scores across the board, and congrats on being the first big lab to report SWE-bench Multimodal scores. We will be launching the Multimodal leaderboard & open source test set in the coming weeks.
https://x.com/OfirPress/status/2041581945558094335

This is beyond insanity. That jump is nuts. Opus 4.6 was released a few months ago. Look at that jump!! I am shocked
https://x.com/kimmonismus/status/2041581870714904849

WTF
https://x.com/marmaduke091/status/2041588468162117803

> forecasters release AI 2027 last year > reaction: “iT’s tOo bUllISH and unrealistic!111!!1” > Anthropic: hold my beer > 15x revenue run-rate in a single year > 2 months and 4B ahead of the forecast > mfw it’s still valued at $380B
https://x.com/scaling01/status/2041559837541126638

While I think what Anthropic does is sad for the ecosystem, I wanna give Boris credit for doing what he can to soften the fallout. Today’s release will include some fixes for better cache use, to lower cost for API users.
https://x.com/steipete/status/2040298884787032103

MFW ANTHROPIC IS STILL VALUED AT $380B > overtook OpenAI revenue run-rate > fastest-growing company in all of history > by far the strongest model > (they have had the model internally for months) they literally have the mandate
https://x.com/scaling01/status/2041594563354104313

When Will Anthropic Surpass NVIDIA? | Tomasz Tunguz
https://tomtunguz.com/anthropic-most-valuable-company/

New from the UK AISI Model Transparency team: we replicated Anthropic’s steering approach for suppressing evaluation awareness. Our most surprising finding: “control” steering vectors (about books on shelves!) can have effects as large as deliberately designed ones. 🧵
https://x.com/thjread/status/2042555422771495128

woke up and my mentions are full of these Both me and @davemorin tried to talk sense into Anthropic, best we managed was delaying this for a week. Funny how timings match up, first they copy some popular features into their closed harness, then they lock out open source.
https://x.com/steipete/status/2040209434019082522

Thank you to @AnthropicAI for sending FFmpeg patches
https://x.com/FFmpeg/status/2041595801483264002

LangSmith Fleet 🤝 https://t.co/LBy0PVGlGm Arcade provides enterprise grade access to 8,000+ tools. As of today, you can connect to all these tools from LangSmith Fleet. This lets you easily build your own no-code Claude Cowork/OpenClaw style agents. Blog:
https://x.com/hwchase17/status/2041598614712283390

Speed of open source! Anthropic adds advisor strategy yesterday -> Emanuele adds an implementation as middleware less than 24 hours later!
https://x.com/hwchase17/status/2042585650969612518