Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Wide-angle observational shot of elderly Chinese man at folding table reviewing handwritten documents in concrete courtyard of half-demolished hutong, chestnut fire horse tethered calmly in background, overcast natural light, desaturated muted tones, weathered concrete and brick, patient documentary composition, white Chinese cinema poster title text reading ANTHROPIC overlaid at top, Jia Zhangke long-take aesthetic, decelerated realism, transitional urban China atmosphere
Introducing Sonnet 4.6 \ Anthropic https://www.anthropic.com/news/claude-sonnet-4-6
NEW: Anthropic releases Claude Sonnet 4.6: nears Opus-level performance across coding and reasoning at Sonnet pricing ($3/$15 per million tokens). Computer use scores have gone from single digits last year to 72.5% now 📈 + a 1M token context window https://x.com/TheRundownAI/status/2023821446380978238
Sonnet 4.6 is the best model on GDPval https://x.com/scaling01/status/2023819793212813604
Users preferred Sonnet 4.6 over Opus 4.5 59% of the time https://x.com/scaling01/status/2023819403230671232
Pentagon threatens to cut off Anthropic in AI safeguards dispute https://www.axios.com/2026/02/15/claude-pentagon-anthropic-contract-maduro
141 days for Sonnet to go from 13.6% to 60.4% on ARC-AGI-2 https://x.com/scaling01/status/2023850250662969587
Sonnet 4.6 benchmarks: 79.6% SWE-Bench Verified, 58.3% ARC-AGI-2 https://x.com/scaling01/status/2023818940112327101
We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated. https://x.com/METR_Evals/status/2024923422867030027
Claude Sonnet 4.6 is the new leader in GDPval-AA, slightly ahead of Anthropic’s Opus 4.6 on agentic performance of real-world knowledge work tasks, less than two weeks after its launch. In our pre-release testing with @AnthropicAI, Sonnet 4.6 reached an ELO of 1633 using the… https://x.com/ArtificialAnlys/status/2023821893846135212
To get an idea of the near-term future of work with AI, take a look at the official Claude Cowork plugins, which give the AI specialized knowledge for various hard tasks. A natural successor for GPTs, but built for agents (& therefore much more scalable & customizable for firms) https://x.com/emollick/status/2023113346162336137
How AI assistance impacts the formation of coding skills \ Anthropic https://www.anthropic.com/research/AI-assistance-coding-skills
Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid – WSJ https://www.wsj.com/politics/national-security/pentagon-used-anthropics-claude-in-maduro-venezuela-raid-583aff17
Anthropic is prepared to loosen its current terms of use, but wants to ensure its tools aren’t used to spy on Americans en masse, or to develop weapons that fire with no human involvement. The Pentagon has said that Anthropic will “pay a price” for that behavior. Within this… https://x.com/kimmonismus/status/2023419652378955809
For Claude in Excel users, our add-in now supports MCP connectors, letting Claude work with tools like S&P Global, LSEG, Daloopa, PitchBook, Moody’s and FactSet. Pull in context from outside your spreadsheet without ever leaving Excel. https://x.com/claudeai/status/2023817143096406246
The browser agent in Comet now runs on Claude Sonnet 4.6 for all Perplexity Pro users. Max users can choose between Sonnet 4.6 and Opus 4.6. https://x.com/comet/status/2023889197556441464
From Claude Code to Figma: Turning Production Code into Editable Figma Designs | Figma Blog https://www.figma.com/blog/introducing-claude-code-to-figma/
Improved Web Search with Dynamic Filtering | Claude https://claude.com/blog/improved-web-search-with-dynamic-filtering
You can now run Qwen3.5 locally! 💜 Qwen3.5-397B-A17B is an open MoE vision reasoning LLM for agentic coding & chat. It performs on par with Gemini 3 Pro, Claude Opus 4.5 & GPT-5.2. Run 4-bit on 256GB Mac / RAM. Guide: https://t.co/wjS1lMnbNp GGUF: https://x.com/UnslothAI/status/2023338222601064463
Measuring AI agent autonomy in practice \ Anthropic https://www.anthropic.com/research/measuring-agent-autonomy
Most agent actions on our API are low risk. 73% of tool calls appear to have a human in the loop, and only 0.8% are irreversible. But at the frontier, we see agents acting on security systems, financial transactions, and production deployments (though some may be evals). https://x.com/AnthropicAI/status/2024210050718585017
New Anthropic research: Measuring AI agent autonomy in practice. We analyzed millions of interactions across Claude Code and our API to understand how much autonomy people grant to agents, where they’re deployed, and what risks they may pose. Read more: https://x.com/AnthropicAI/status/2024210035480678724
NEW: Pentagon is so furious with Anthropic for insisting on limiting use of AI for domestic surveillance + autonomous weapons they’re threatening to label the company a “supply chain risk,” forcing vendors to cut ties. With @m_ccuri and @mikeallen https://x.com/DavidLawler10/status/2023425130148626767
Software engineering makes up ~50% of agentic tool calls on our API, but we see emerging use in other industries. As the frontier of risk and autonomy expands, post-deployment monitoring becomes essential. We encourage other model developers to extend this research. https://x.com/AnthropicAI/status/2024210053369385192
Something strange is happening with AI agents that this new Anthropic research quietly surfaces. The agents are asking us for help more than we’re stepping in to correct *them*. Anthropic analyzed data from Claude Code and their public API to measure how autonomous AI agents… https://x.com/omarsar0/status/2024864635120451588
People should read the Claude Constitution. It does a pretty good job of laying out what Anthropic presumably really believes (and it is part of training). I’d think that a clear debate over things that are good or bad or missing there would be helpful. https://x.com/emollick/status/2023612474474303530
OpenAI may be a household name, but Anthropic could soon be earning more revenue. Since each company hit $1B in annualized revenues, Anthropic has grown substantially faster (10× vs 3.4× per year) and could overtake OpenAI by mid-2026 if recent trends continue. https://x.com/EpochAIResearch/status/2024536468618956868
The decision to forbid running this on 3rd party open source code is… interesting https://x.com/moyix/status/2024920042887082336
Opus 4.6 found 500+ vulnerabilities in open-source code and we’ve begun reporting them and contributing patches. Quick excerpts from some of them 🧵 https://x.com/trq212/status/2024937919937741290
Gemini 3.1 Pro is here! It’s top 3 across Text and Vision Arena, and #6 in Code Arena, tied closely with Claude Opus 4.5. Highlights: ▪️Tied #1 in Text (scoring 1500), 4 pts from Opus 4.6 ▪️Top 3 in Arena Expert Leaderboard (scoring 1538), just behind Opus 4.6 ▪️#6 in Code… https://x.com/arena/status/2024519891295089063
On Dwarkesh Patel’s 2026 Podcast With Dario Amodei | Don’t Worry About the Vase https://thezvi.wordpress.com/2026/02/16/on-dwarkesh-patels-2026-podcast-with-dario-amodei/
Less than a year and a half ago computer use was barely even a thing and now we’re near human-level capability. Another reminder that things are improving very fast. https://x.com/alexalbert__/status/2023820589983801796
Huge quality of life upgrade for devs: We’ve added automatic prompt caching to the API which means you no longer have to set cache points in your requests! https://x.com/alexalbert__/status/2024586006633271386
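For context on what "setting cache points" meant: a hedged sketch of the manual pattern this change removes, using the public `cache_control` field from the Messages API. The model id and prompt text are placeholders, not taken from the announcement.

```python
# Before automatic prompt caching, callers placed cache breakpoints
# themselves by attaching a cache_control marker to a content block in the
# Messages API request body. A minimal sketch of the old opt-in pattern:
request_body = {
    "model": "claude-sonnet-4-6",  # illustrative model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a code-review assistant. <large style guide pasted here>",
            # Manual breakpoint: the prefix up to this block gets cached and
            # reused across requests. Per the tweet, setting this explicitly
            # is no longer necessary on the API.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Review this diff."}],
}
```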
Seems like Anthropic lawyers sent some more love letters to OpenCode 🙃 https://x.com/theo/status/2024648305863774281
Whatever this was seems fixed! https://x.com/rishdotblog/status/2023854279766003784
– hallucinating function names that don’t exist during agentic workflows – hallucinating incorrect structures when asked to generate structured outputs. Sonnet 4.5 still works great, but 4.6 is completely crapping the bed on the same tasks https://x.com/rishdotblog/status/2023848930430304648
@OpenHandsDev OpenHands leads to such a sharp improvement on Codex 5.1 (33.3% ➡️ 43.2%) and Claude Sonnet 4.5 (34.1% ➡️ 45.5%) that it makes me wonder if some of the other leaderboards would see improvement from trying out different frameworks as well. https://x.com/iamwaynechi/status/2022448462105842023
> Someone reverse-engineered how Claude Code’s Agent Teams communicate. > No WebSocket. No gRPC. No message queue. > They read and write JSON files on disk. > Each agent gets an inbox at ~/.claude/teams/inboxes/{agent}.json. Messages append to a JSON array. Protocol… https://x.com/peter6759/status/2022156692985983266
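The protocol described above is simple enough to sketch. The path layout follows the thread; the message fields ("from", "text") are guesses for illustration, not the actual schema.

```python
import json
from pathlib import Path

# File-based inbox sketch matching the thread's description: one JSON array
# per agent under ~/.claude/teams/inboxes/, with senders appending messages.
DEFAULT_INBOX_DIR = Path.home() / ".claude" / "teams" / "inboxes"

def send_message(agent: str, sender: str, text: str,
                 base: Path = DEFAULT_INBOX_DIR) -> None:
    """Append one message to the target agent's JSON inbox file."""
    inbox = base / f"{agent}.json"
    inbox.parent.mkdir(parents=True, exist_ok=True)
    messages = json.loads(inbox.read_text()) if inbox.exists() else []
    messages.append({"from": sender, "text": text})
    inbox.write_text(json.dumps(messages, indent=2))

def read_inbox(agent: str, base: Path = DEFAULT_INBOX_DIR) -> list[dict]:
    """Return the agent's pending messages (without clearing them)."""
    inbox = base / f"{agent}.json"
    return json.loads(inbox.read_text()) if inbox.exists() else []
```

No sockets, no broker: the filesystem is the message bus, which makes the whole thing trivially inspectable with `cat`.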
👌 Tracing in LangSmith is as easy as copy/paste 📊 Get started in seconds with Claude Agent SDK, OpenAI, LangChain, Vercel AI SDK, and 20+ other frameworks. Pick your stack, copy the code, start debugging. Docs: https://t.co/DAQcQxkVsp Sign up for LangSmith: … https://x.com/LangChain/status/2023532973086159283
🚨 Breaking: Claude OAuth officially not allowed in OpenClaw. This would be a GREAT time for @sama to step in and let us use @OpenAI subscriptions with @openclaw. https://x.com/AndrewWarner/status/2024168538508775674
Another thing I noticed writing my latest AI guide was how Anthropic seems to be alone in knowledge work apps. Not just Cowork, but Claude for PowerPoint & Excel, as well as job-specific skills, plugins & finance/healthcare data integrations. Surprised at the lack of challengers https://x.com/emollick/status/2023968612881412957
Anthropic blocked his fren from using the claude sub in openclaw, switched to minimax – big boost for open models, thanks anthropic https://x.com/Teknium/status/2023251135201738794
Anthropic might have already started slowing down. Since July 2025, Anthropic has grown its revenue at a rate of 7×/year rather than 10×. https://x.com/EpochAIResearch/status/2024536493721866668
Bruh. It’s not just behind, it’s 50% more expensive than xhigh and 228% over 5.2 codex. That said, a vast improvement over Sonnet 4.5. https://x.com/teortaxesTex/status/2023890938125488289
Claude Code has regressed an absurd amount in the last few days. Timestamps no longer update unless you un-focus/re-focus the tab. “thinking” doesn’t show at all. I had a query run for 6 minutes with 0 output. This is genuinely unpleasant to use. https://x.com/theo/status/2024718133676867608
Claude Sonnet 4.6: same pricing as Sonnet 4.5! https://x.com/kimmonismus/status/2023820443359002922
Claude Sonnet 4.6 substantially improves on the aesthetic capabilities of Sonnet 4.5 for tasks like presentation and document generation in GDPval-AA. While we see effective analysis, and in some cases content similarities, between the two versions, the visual elements are… https://x.com/ArtificialAnlys/status/2023821899139293652
Computer use is the standout. For coding, it’s less prone to overengineering than Opus 4.5 and more consistent over long sessions. And 1M context window in beta on the API. We can’t wait to see what you build! https://x.com/mikeyk/status/2023853207731200176
Did anthropic break something after releasing sonnet 4.6? seeing a ton of hallucinations everywhere for both Opus 4.6 and Sonnet 4.6 cc: @trq212, @alexalbert__ https://x.com/rishdotblog/status/2023848487285387693
Fun nugget from Sonnet 4.6: With a 1M context window, the model is better at long-horizon planning. In the Vending-Bench Arena, models compete to run a simulated business. Sonnet 4.6 developed a new strategy: invest heavily in capacity for the first 10 months, then pivot hard… https://x.com/felixrieseberg/status/2023823186484404443
GEPA for skills is here! Introducing gskill, an automated pipeline to learn agent skills with @gepa_ai. With learned skills, we boost Claude Code’s repository task resolution rate to near-perfect levels, while making it 47% faster. Here’s how we did it: https://x.com/ShangyinT/status/2024651061995458722
idk what you are all smoking. clawdbot is just a passing hype that can be vibe-coded in a week. anthropic lost absolutely nothing here except some aura points in the open source community, but it’s not like they liked anthropic anyway https://x.com/scaling01/status/2023217588319277471
IMO it’s pretty clear that Claude Code needs to be rewritten from scratch at this point https://x.com/theo/status/2024726444283449781
Kind of crazy to read how much prompt caching influences the performance of Claude Code. It almost feels like, without it, we wouldn’t be anywhere near the experience we have in CC today. Super important read, especially as we enter this new era of agent harnesses. This backend… https://x.com/omarsar0/status/2024620142240333979
Kind of crazy watching Anthropic’s good will crumble in real time https://x.com/theo/status/2024225756981973214
Latest from @AnthropicAI: Claude Opus & Sonnet 4.6 are now in the Search Arena. 🌐 Check them out in Search Arena to see how well they can search, cite and output real-time, verifiable information online. https://x.com/arena/status/2024144830209966142
LCM extends on Recursive Language Models and outperforms Claude Code on long-context tasks. Pay close attention. So much innovation is happening in agent memory. https://x.com/omarsar0/status/2023765757117763820
Modular: The Claude C Compiler: What It Reveals About the Future of Software https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software
On Anthropic’s Consumer Marketing | rohan ganapavarapu https://rohan.ga/blog/anthro_consumer/
Opus 4.6 is ludicrously better than any model I’ve ever tried at doing architecture and experimental critique. Most noticeably, it will start down a path, notice some deviation it hadn’t expected…and actually stop and reconsider. Hats off to Anthropic. https://x.com/eshear/status/2024148657797308747
Opus 4.6 keeps blowing through its *entire* token budget & eventually responding completely empty when I ask for max reasoning. Finish reason: “length”. These are short prompts – 160 token input. Thinks for 20 mins then blows up & charges me money for the privilege https://x.com/paul_cal/status/2024817020529766764
Piotr discovered something worrying: if you give an LLM a list of tools it’s allowed to call, it might decide to also call a tool you didn’t provide! Impacts all major US providers except @OpenAI. Be sure to check LLM tool call requests! (Lisette/Claudette check automatically) https://x.com/jeremyphoward/status/2024599416901103705
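The defensive check the tweet recommends is cheap to add yourself. A hedged sketch: before executing any model-requested tool call, confirm the tool name is one you actually offered. The dict shape ({"name": ..., "args": ...}) is illustrative and not tied to any particular provider SDK.

```python
def filter_tool_calls(tool_calls: list[dict], allowed: set[str]) -> list[dict]:
    """Return only calls whose tool was in the offered list; report the rest."""
    valid = [c for c in tool_calls if c.get("name") in allowed]
    rejected = [c for c in tool_calls if c.get("name") not in allowed]
    for call in rejected:
        # The model invented a tool it was never given: log and drop it.
        print(f"rejected hallucinated tool call: {call.get('name')!r}")
    return valid
```

Running every tool request through a filter like this costs one set lookup per call and closes off the failure mode entirely.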
Side by side example. Same model (claude-opus-4-6). Same task. Two different agent harnesses. @LangChain Deep Agents CLI: 9s. Claude Code: 16s. The harness IS the performance. 1.7× difference, zero model changes https://x.com/GitMaxd/status/2024137171217871106
Somehow I didn’t fully appreciate how strongly Claude Code’s prompt has to fight against the weights to make parallel tool calls. https://x.com/dbreunig/status/2024247669359788050
Sonnet 4.6 incoming! Lets go! https://x.com/kimmonismus/status/2023814107846398015
Sonnet 4.6 is a beast for real-world work, agentic tasks, especially computer usage https://x.com/kimmonismus/status/2023844025011499052
Sonnet 4.6 is here. It’s our most capable Sonnet model by far, approaching Opus-class capabilities in many areas. Very excited for folks to try this one out. The performance jump over Sonnet 4.5 (which was released just over four months ago) is quite insane. https://x.com/alexalbert__/status/2023817479580221795
sonnet 4.6 is here. no sonnet 5 lol”” https://x.com/dejavucoder/status/2023817232732848501
Sonnet 4.6 is now available in Cursor. Our benchmarks show it as a notable improvement over Sonnet 4.5 on longer tasks, but below Opus 4.6 for intelligence. https://x.com/cursor_ai/status/2023841746577485894
Sonnet 4.6 used 74M output tokens to run the Artificial Analysis Intelligence Index, ~3x Sonnet 4.5 (Reasoning, 25M) and more than Opus 4.6 (Adaptive Reasoning, 58M) https://x.com/ArtificialAnlys/status/2024259815930012105
Sonnet and Slopus 4.6 are munching through my credits. I miss Sonnet 3.5 just one-shotting everything https://x.com/scaling01/status/2023835207355560223
Sonnets progress from 4.5 to 4.6 is fucking insane, it’s just much better at everything, taste is off the charts. The NYC skyline is the most ridiculous part. While other models just write SVG that look like some skyscraper, like a tall box with a few windows, Sonnet 4.6 is… https://x.com/scaling01/status/2023840565641556439
The clawdbot –> open claw rename foreshadowed it all. Zuck must not be too happy. And interesting that Anthropic didn’t even make a play. So what does this mean? I suspect new functionality keeps coming to open claw first – and the best stuff graduates to chatgpt proper. A… https://x.com/bilawalsidhu/status/2023187986901344548
This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also features a 1M token context window in beta. https://x.com/claudeai/status/2023817132581208353
This is definitely something to be aware of both for benchmark builders and users IMO. For longer-running, more difficult tasks, the differences between which agent you use can be big, like a 10% gain in success rate when going from Claude Code to OpenHands. https://x.com/gneubig/status/2022451119310655909
Underrated dev upgrade from today’s launch: Claude’s web search and fetch tools now write and execute code to filter results before they reach the context window. When enabled, Sonnet 4.6 saw 13% higher accuracy on BrowseComp while using 32% fewer input tokens. https://x.com/alexalbert__/status/2023834863858769975
Warmer and kinder than Sonnet 4.5, but also smarter and more overcaffeinated than Sonnet 4.5. https://x.com/sleepinyourhat/status/2023821754859503650
When Anthropic CEO @DarioAmodei sat down with @dwarkesh_sp, the AI world saw a rare sight: a frontier lab leader was pressed not on what models can do, but why they haven’t transformed the economy yet. Is the bottleneck the technology, or is it us? I look at 3 core pressure… https://x.com/TheTuringPost/status/2024247179305451634
Worth noting Claude Cowork is quite different from Claude Code (and even more so from agents like OpenClaw) from a security perspective. It runs in a VM with default-deny networking & hard isolation baked in. A sign of a path forward for agents that will not terrify corporate IT. https://x.com/emollick/status/2023260943942135850
Anthropic has entrusted Amanda Askell to endow its AI chatbot, Claude, with a sense of right and wrong https://x.com/WSJ/status/2022629696261808173
Anthropic’s Philosopher Amanda Askell Is Teaching Claude AI to Have Morals – WSJ https://www.wsj.com/tech/ai/anthropic-amanda-askell-philosopher-ai-3c031883?mod=e2tw
📊Let’s dive deeper into @AnthropicAI’s Sonnet 4.6 vs 4.5. Overall: Sonnet 4.6 ranks 3 places higher (#13 vs #16). Where Sonnet 4.6 gains: Code: ▪️WebDev (+19 for Sonnet 4.6: #3 vs #22) Text: ▪️Instruction Following (+6, #5 vs #11) ▪️English (+5, #9 vs #14) ▪️Hard Prompts (+5,… https://x.com/arena/status/2024892330743124246
Claude Sonnet 4.6 (medium) scores 66.1% on WeirdML, matching Opus 4.6 (no thinking) and a big advance from Sonnet 4.5 at 47.7%. I had to run it on medium reasoning level because the default (high) constantly hit the 64k max tokens limit. Even at medium it uses as many output… https://x.com/htihle/status/2024764946051907659
Claude Sonnet 4.6 takes second place in the Artificial Analysis Intelligence Index (behind Opus 4.6), but used ~3x more output tokens than Claude Sonnet 4.5 in its max effort mode. Sonnet 4.6 leads all models in GDPval-AA and TerminalBench, including a slight lead over Opus 4.6 https://x.com/ArtificialAnlys/status/2024259812176121952
When I joined METR I was really skeptical that we were evaling models using simple OS scaffolds rather than Claude Code / Codex / etc. I really appreciate Nikola looking into this and I’m surprised it still doesn’t seem to make much difference for CC on Opus 4.5 https://x.com/ajeya_cotra/status/2022419978495127828
GLM-5 scores 48.2% on WeirdML, beating Claude Sonnet 4.5 and tying gpt-oss-120b (high) for the best open model. This is a clear advance but still far from Opus-4.6 at 78% and gpt-5.2 at 72%. https://x.com/htihle/status/2023734346943775179
Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation \ Anthropic https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation
It’s extremely unreasonable to say a company is a “supply chain risk” because it wants terms that prevent using the AI for mass domestic surveillance and lethal autonomous weapons. (Insofar as this is the situation.) 1/ https://x.com/RyanPGreenblatt/status/2023524096592802207
OpenAI and Anthropic are much further ahead than what benchmarks show. While you are token constrained they are blasting millions of tokens at 4x the API speed without batting an eye and they scaffold like they are trying to build a skyscraper. https://x.com/scaling01/status/2023837889478758495
I unashamedly love Windows. Always had. Anthropic folks – apparently, not so much :-), Claude Code is super-buggy on Windows. If you want to avoid spending a lot of time fixing NTFS issues, add this to https://t.co/QG7xYArzFH ## Windows Shell Safety The Bash tool runs under Git… https://x.com/MParakhin/status/2024172856029171877
Wow, Codex is some sort of a miracle… (yes, I’ve tried Claude Code before that) https://x.com/TheTuringPost/status/2022079178703847607
I looked into how Claude Code and Codex compare to the default scaffolds METR uses for time horizon measurements. It looks like they don’t significantly outperform our default scaffolds on any models we’ve tried them on so far. https://x.com/nikolaj2030/status/2022398669337825737
Dario acknowledges the multi-trillion dollar robotics opportunity, yet Anthropic is not hiring robotics talent, even as OpenAI and Google DeepMind aggressively build out their own robotics teams. https://x.com/TheHumanoidHub/status/2022416551270662427
Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: https://x.com/claudeai/status/2024907535145468326
A paper worth paying close attention to. It presents Lossless Context Management (LCM), which reframes how agents handle long contexts. It outperforms Claude Code on long-context tasks. Recursive Language Models give the model full autonomy to write its own memory scripts. LCM… https://x.com/dair_ai/status/2023765147970662761
WSJ did a profile of me. A lot of the response has been people trying to infer my personal political views. For what it’s worth, I try to treat my personal political views as a potential source of bias and not as something it would be appropriate to try to train models to adopt. https://x.com/AmandaAskell/status/2022778351744581779
We’re officially opening our Bengaluru office – our new home base in India, and Anthropic’s second office in Asia-Pacific. India is our second-largest market for https://t.co/RxKnLNNcNR. We’re launching new partnerships to deepen our long-term commitment: https://x.com/AnthropicAI/status/2023322514206957688
WeirdML Time Horizons! Inspired by @METR_Evals I found time-horizons for the WeirdML tasks, using LLM-estimated human completion times. We find horizons of ~24 min (GPT-4) to ~38 hours (Opus 4.6), doubling time ~5 months. Links to blog post, git-repo + nice figures in thread. https://x.com/htihle/status/2023349189271572975
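The quoted doubling time checks out on the back of an envelope, assuming GPT-4 arrived around March 2023 and Opus 4.6 around February 2026, i.e. roughly 35 months apart (those dates are my assumption, not from the tweet):

```python
import math

# Going from a ~24 min horizon (GPT-4) to a ~38 hr horizon (Opus 4.6)
# takes log2(2280/24) ≈ 6.6 doublings; spread over ~35 months that is
# roughly 5.3 months per doubling, consistent with the quoted ~5 months.
low_minutes = 24
high_minutes = 38 * 60
doublings = math.log2(high_minutes / low_minutes)
print(round(35 / doublings, 1))  # ≈ 5.3
```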
TLDR: Opus 4.6 demonstrates better reasoning and use of memory than Gemini 3.1 Pro and solves more levels. I’m now much more confident that current and future models will be able to solve ARC-AGI-3, given that they have access to a harness with simple memory. My speculative take… https://x.com/scaling01/status/2024642420177096769
The @DarioAmodei interview. 0:00:00 – What exactly are we scaling? 0:12:36 – Is diffusion cope? 0:29:42 – Is continual learning necessary? 0:46:20 – If AGI is imminent, why not buy more compute? 0:58:49 – How will AI labs actually make profit? 1:31:19 – Will regulations destroy… https://x.com/dwarkesh_sp/status/2022357801276690455
EVMbench measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. Opus 4.6 getting mogged by GPT-5.2 and GPT-5.3. Although its detection accuracy is technically higher, its precision is much lower. (Opus is going schizo) https://x.com/scaling01/status/2024212205944643718
OpenAI’s Sam Altman and Anthropic’s Dario Amodei refuse to hold hands weeks after Super Bowl ad war | Fortune https://fortune.com/2026/02/19/openai-anthropic-sam-altman-dario-amodei-refused-to-hold-hands-ai-super-bowl-ad-war-ceos-big-tech-conflict/
OpenClaw creator on Opus vs Codex: “Opus is like the coworker that is a little silly sometimes, but it’s really funny and you keep him around. Codex is like the weirdo in the corner that you don’t want to talk to, but he’s reliable and gets shit done.” LMAO. Accurate. https://x.com/bilawalsidhu/status/2022571001490325791
Anthropic’s hate for open source is so weird https://x.com/ThePrimeagen/status/2023194211445834132
By the way, the recent Gemini 3.1 Pro is also a really good model for RLMs. Claude Opus 4.6 is the worst of the ones I tested. Probably not optimized for the type of decomposition that RLMs need. I am just impressed by GPT-5.2-Codex. The strategies it uses are brilliant. https://x.com/omarsar0/status/2024973182436831629
Claude Sonnet 5: The “Fennec” Leaks – Fennec Codename: Leaked internal codename for Claude Sonnet 5, reportedly one full generation ahead of Gemini’s “Snow Bunny.” – Imminent Release: A Vertex AI error log lists claude-sonnet-5@20260203, pointing to a February 3, 2026 release https://x.com/pankajkumar_dev/status/2018187650927349976?s=46
Gemini 3.1 Pro will be a massive step-up! There’s a decent chance it’s on par with Opus 4.6 and GPT-5.3. The main reason for that: similarly to Claude 4.6 and GPT-5.2/5.3, it thinks much longer than Gemini 3 Pro. The same request on aistudio, tested multiple times, had 6… https://x.com/scaling01/status/2024251668771066362
Google is once again the leader in AI: Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, 4 points ahead of Claude Opus 4.6 while costing less than half as much to run. @GoogleDeepMind gave us pre-release access to Gemini 3.1 Pro Preview. It leads 6 of the… https://x.com/ArtificialAnlys/status/2024518545510662602
In Arena Expert, with expert level prompts, Gemini 3.1 Pro Preview lands in the top 3 (scoring 1538), just behind Claude Opus 4.6 https://x.com/arena/status/2024519895623598423
Sonnet 4.6 crushes Gemini 3 and GPT-5.2 on Vending-Bench 2 https://x.com/scaling01/status/2023833660546499053
Claude Sonnet 4.6 has landed #3 in Code and #13 in Text Arena! Highlights: ▪️+130 pts jump in Code Arena (#22 -> #3) compared to Sonnet 4.5, surpassing top-tier thinking models like Gemini-3.1 and GPT-5.2 ▪️Strong gains in Text categories: Math (#4) and Instruction Following… https://x.com/arena/status/2024883614249615394
Dario Amodei — “We are near the end of the exponential” https://www.dwarkesh.com/p/dario-amodei-2
OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei visibly declined to hold hands during a group photo at the India AI Impact Summit, even as other leaders on stage linked arms for the ceremonial shot https://x.com/Reuters/status/2024401067228684396?s=20
“Will the robotics industry be generating trillions of dollars of revenue?” “YES.” Dario Amodei says breakthroughs in robotics could emerge in several ways, such as through continual learning or generalization. Once achieved, these models will revolutionize both robot… https://x.com/TheHumanoidHub/status/2022409229223780533




