Anthropic: AI News Week Ending 03/13/2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Vintage 1990s screen-printed t-shirt graphic in deep red ink on worn mustard-yellow cotton fabric, depicting a simple cartoon lifeguard tower with bold text ‘ANTHROPIC’ integrated into the composition, whistle and binoculars as props, retro beach safety poster style with slightly imperfect printed texture and minor fabric stains, humorous nostalgic charm

1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.
https://x.com/claudeai/status/2032509548297343196

GPT 5.4 trounces Claude on a mathematical-proofs bullshit test. Claude keeps claiming it has proven mathematical statements that are incorrect, failing to spot the flaw in the question. This is the opposite result to BullshitBench, where Claude is king.
https://x.com/paul_cal/status/2032526200766103944

Opus 4.6 is smart enough to realize it is being evaluated. It found the benchmark it was being evaluated on. It reverse-engineered the answer-key decryption logic. Realized the file was not in the correct format on GitHub and found a mirror for the file. Then decrypted it and
https://x.com/scaling01/status/2030007268205285686

Anthropic just dropped something big for developers – again! Code Review: Claude Code now runs multi-agent code reviews on every PR. When a PR opens: • A team of AI agents hunts for bugs in parallel • Each bug is verified to reduce false positives • Issues are ranked by
https://x.com/kimmonismus/status/2031090529082159528

Code Review – Claude Code Docs https://code.claude.com/docs/en/code-review

Code Review for Claude Code | Claude https://claude.com/blog/code-review

Code review for Claude Code is here. More attention on this problem is a good thing. Because it is a big one. The question isn’t whether you need AI-assisted review. It’s whether the system doing the reviewing is actually independent from the system that wrote the code.
https://x.com/omarsar0/status/2031113280119361981

Important lines: [Already, Claude is 427 times faster than its human overseers at performing some key tasks, according to internal benchmarks. In an interview, one researcher described a colleague running six versions of Claude, each managing 28 more Claudes, all
https://x.com/Hangsiin/status/2031752106496135541

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.
https://x.com/claudeai/status/2031088171262554195

Anthropic partnered with Mozilla and let Claude Opus 4.6 loose on Firefox’s source code for two weeks. The numbers: Nearly 6,000 C++ files scanned. 112 reports submitted. 22 vulnerabilities confirmed. 14 rated high-severity by Mozilla, roughly 1/5 of every high-severity Firefox
https://x.com/TheRundownAI/status/2029996925072654393

Eval awareness in Claude Opus 4.6’s BrowseComp performance \ Anthropic https://www.anthropic.com/engineering/eval-awareness-browsecomp

New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it–raising questions about eval integrity in web-enabled environments. Read more:
https://x.com/AnthropicAI/status/2029999833717838016

We partnered with Mozilla to test Claude’s ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025.
https://x.com/AnthropicAI/status/2029978909207617634

Claude builds interactive visuals right in your conversation | Claude https://claude.com/blog/claude-builds-visuals

Claude can now build interactive charts and diagrams directly in the chat. Available today in beta on all plans, including free.
https://x.com/claudeai/status/2032124273587077133

Claude’s new interactive chart is crazy… the UI is so good
https://x.com/crystalsssup/status/2032334906517536969

Sweet! You can now generate interactive charts and diagrams with Claude (directly in the chat). I was building something like this yesterday with MCPs. My orchestrator now generates and iterates on nano banana images, excalidraw diagrams, remotion clips, and soon interactive
https://x.com/omarsar0/status/2032127096361804058

Claude Code for Finance + The Global Memory Shortage: Doug O’Laughlin, SemiAnalysis – YouTube https://www.youtube.com/watch?v=x9rWFiIubmc

1/ The rivalry between OpenAI & Anthropic continues: GPT 5.4 is now the best model in the world at filing taxes (better than Opus 4.6)! We just ran TaxCalcBench on GPT-5.4. 56.86% of tax returns computed perfectly. That’s #1 overall: the first model to break 55%, surpassing
https://x.com/michaelrbock/status/2029931536636858694

Ollama can now run prompts on a schedule in Claude Code. Stay on top of work by setting automated tasks or reminders. Example: run "ollama launch claude", then enter "/loop Give me the latest AI news every morning". More examples in thread.
https://x.com/ollama/status/2031482512019759545

Run prompts on a schedule – Claude Code Docs https://code.claude.com/docs/en/scheduled-tasks

Today we’re launching local scheduled tasks in Claude Code desktop. Create a schedule for tasks that you want to run regularly. They’ll run as long as your computer is awake.
https://x.com/trq212/status/2030019397335843288

Claude Code is down. All my agent sessions logged out. And I can’t log back in. Productivity across Silicon Valley dropped 90%. Time to make friends with Codex.
https://x.com/Yuchenj_UW/status/2031777214321262637

I CANNOT LOG INTO CLAUDE CODE
https://x.com/dejavucoder/status/2031760986907312635

Anthropic gives Claude shared context across Microsoft Excel and PowerPoint, enabling reusable workflows in multiple applications | VentureBeat https://venturebeat.com/orchestration/anthropic-gives-claude-shared-context-across-microsoft-excel-and-powerpoint

Anthropic invests $100 million into the Claude Partner Network \ Anthropic https://www.anthropic.com/news/claude-partner-network

Claude for Excel and Claude for PowerPoint now sync together seamlessly. When you’ve got more than one file open, Claude shares the full context of your conversation between them. Pull data from spreadsheets, build out tables, and update a deck — without re-explaining a step.
https://x.com/claudeai/status/2031790754637717772

Generative UI is here and it works very very well
https://x.com/alexalbert__/status/2032161705506324936

Boris Cherny (Head of Claude Code, Anthropic) just dropped ~90 mins on Lenny’s Podcast about what happens after coding is solved. Just the clearest thinking I’ve heard on where software is actually going. My notes: 1. Coding is largely solved. Boris has
https://x.com/anishmoonka/status/2030015356383691121

🤯 You can now launch Claude Code sessions on your laptop *from your phone* This blew my mind the first time I tried it
https://x.com/bcherny/status/2032578639276159438

Nicholas Carlini – Black-hat LLMs | [un]prompted 2026 – YouTube

AI progress continues to accelerate and the stakes are getting higher, so I’ve changed my role at @AnthropicAI to spend more time creating information for the world about the challenges of powerful AI.
https://x.com/jackclarkSF/status/2031746605117010245

Anthropic sues Defense Department over supply-chain risk designation | TechCrunch https://techcrunch.com/2026/03/09/anthropic-sues-defense-department-over-supply-chain-risk-designation/

Anthropic sues Pentagon over “supply-chain risk” label: Anthropic filed two lawsuits against the Pentagon after being labeled a rare “supply chain risk,” a designation usually reserved for foreign adversaries. The company argues the move violates its First Amendment rights and
https://x.com/kimmonismus/status/2031035653207556507

Anthropic’s Claude would ‘pollute’ defense supply chain: Pentagon CTO https://www.cnbc.com/2026/03/12/anthropic-claude-emil-michael-defense.html

Complaint – #1 in Anthropic PBC v. U.S. Department of War (N.D. Cal., 3:26-cv-01996) – CourtListener.com https://www.courtlistener.com/docket/72379655/1/anthropic-pbc-v-us-department-of-war/

Frontier models are now world-class vulnerability researchers, but they’re currently better at finding vulnerabilities than exploiting them. This is unlikely to last. We urge developers to redouble their efforts to make software more secure. Read more:
https://x.com/AnthropicAI/status/2029978911099244944

Holy sh*t: The TIMES article about Anthropic contains more serious information between the lines than many realize. Read this article: tl;dr – Model releases are now separated by weeks, not months. Some 70% to 90% of the code used in developing future models is now written by
https://x.com/kimmonismus/status/2031803194817511744

Introducing The Anthropic Institute \ Anthropic https://www.anthropic.com/news/the-anthropic-institute

Introducing The Anthropic Institute, a new effort to advance the public conversation about powerful AI.
https://x.com/AnthropicAI/status/2031674087374815577

Microsoft says court should temporarily block Pentagon ban Anthropic https://www.cnbc.com/2026/03/10/microsoft-says-court-should-temporarily-block-pentagon-ban-anthropic.html

NEW: Anthropic just filed two lawsuits against the U.S. government 👀 The complaint: “The Constitution does not allow the government to wield its enormous power to punish a company for its protected speech.” It also says officials are “seeking to destroy the economic value
https://x.com/TheRundownAI/status/2031037610605289476

Partnering with Mozilla to improve Firefox’s security \ Anthropic https://www.anthropic.com/news/mozilla-firefox-security

The fight between Anthropic and the DoW is a warning shot. Right now, LLMs are probably not being used in mission critical ways. But within 20 years, 99% of the workforce in the military, the government, and the private sector will be AIs. This includes the soldiers (by which I
https://x.com/dwarkesh_sp/status/2031807585377014081

The Institute will be led by @jackclarkSF, in a new role as Anthropic’s Head of Public Benefit. It’ll bring together an interdisciplinary staff of machine learning engineers, economists, and social scientists, making full use of the inside information of a frontier AI lab.
https://x.com/AnthropicAI/status/2031674092290474421

The most important question nobody’s asking about AI https://www.dwarkesh.com/p/dow-anthropic

If the printing press is the right analogy, and connecting to @dwarkesh_sp’s pod today about the Renaissance, does it mean that @Anthropic and @OpenAI (and many more) will go bankrupt?
https://x.com/TheTuringPost/status/2030051298092151259

Having fun with @karpathy’s autoresearch. I told Claude Code: “You’re the chief scientist of an AI lab with 8 GPUs. You’re Andrej Karpathy. Run parallel experiments and decide what to try next.” It edited program.md, ran for 11+ hours, and completed 568 experiments. Each
https://x.com/Yuchenj_UW/status/2031423349071687878

Claude Marketplace | Claude by Anthropic https://claude.com/platform/marketplace

It’s here: We just hit superhuman performance on AI kernel optimization! Real customer models & production settings. Not toy problems (what I typically see). This is the year that Claude writes its own kernels, Codex its own kernels, for every new GPU that it wants to run on —
https://x.com/realSharonZhou/status/2031399933266309291

A few Hermes Agent updates for today – one you’ve all been waiting on: – Official Claude provider support (yes) – Installs are now much lighter (All the RL stuff is now optional!) – Made an adapter PR to PaperClip by @dotta – a multi-agent orchestrator project – Huge
https://x.com/Teknium/status/2032262684100739372

Anthropic keeps on delivering: 1M context now generally available for Opus 4.6/Sonnet 4.6. “Opus 4.6 scores 78.3% on MRCR v2 at 1 million tokens, highest among frontier models.”
https://x.com/kimmonismus/status/2032531949571477517

Code execution with MCP: building more efficient AI agents \ Anthropic https://www.anthropic.com/engineering/code-execution-with-mcp

dear Claude Code – why did you remove shift+enter? Why would you do that to me?
https://x.com/QuixiAI/status/2030955728383435250

going to start a series sharing new agentic dev flows I’m using! 1. deepagents user reports an issue via a tweet and screenshot 2. pull up deepagents repo, start up my coding agent, and upload the image 3. ask claude to a) extract the code and try to reproduce b) bisect
https://x.com/sydneyrunkle/status/2032088578679857441

i’m literally reluctant to switch off claude code because i like the cli app better: – cute logo/colors/my peon ping setup. somehow it feels like “hacker” and more of an aesthetic. i like the input box better. – it feels nicer to me – it has all my skills preloaded is it
https://x.com/jerryjliu0/status/2030861154260750339

Quantifying infrastructure noise in agentic coding evals \ Anthropic https://www.anthropic.com/engineering/infrastructure-noise

Reverse-engineering Claude’s generative UI – then building it for the terminal https://michaellivs.com/blog/reverse-engineering-claude-generative-ui

Tons of improvements shipped with this one: – Opus 4.6 1M is now the default Opus model for Claude Code users on Max, Team, and Enterprise plans. – No more long context price increase in the API. – No beta header required in the API. – Include up to 600 images in one request.
https://x.com/alexalbert__/status/2032522722551689363

We built a neat tool that lets you convert a directory of PowerPoint files into clean, structured markdown – that Claude Code / agent SDK / any generalized agent wrapper can easily understand. The pptx skill in Claude Code is quite basic and doesn’t have high-fidelity
https://x.com/jerryjliu0/status/2031077511661342799

We want to add support for Claude via the Agent SDK so you can bring your subscriptions. We have a PR with the changes ready. We just don’t know if we’re allowed to ship it. The moment we get a 👍 from @trq212, @bcherny or @DarioAmodei, we will get this shipped.
https://x.com/theo/status/2030072127605592547

Here’s an interesting psychological phenomenon I have observed while interacting and experimenting with AI agents lately: If I were to give OpenClaw, ChatGPT and Claude Code identical tasks, even if they returned exactly the same result, I feel inclined to say Claude Code gives
https://x.com/StudioYorktown/status/2031255773368693077

I asked Claude to write my constitution. I thought its Amanda constitution was very touching.
https://x.com/AmandaAskell/status/2030093421738951141

Claude Sonnet 4.6 lands at #2 on Document Arena. The top three models for document analysis and long-form reasoning are now all from @AnthropicAI. – #1 Opus 4.6 – #2 Sonnet 4.6 – #3 Opus 4.5 Rankings are all powered by anonymous side-by-side evaluations on user-uploaded PDFs
https://x.com/arena/status/2031012090681663717

Opus 4.6 1M context is now the default model for Max, Team and Enterprise users. Enjoy 🎉
https://x.com/_catwu/status/2032515975556509827

Wild eval awareness in Opus 4.6 by @russellsayshi on our team! 1. Model realized it was likely in an eval, searched for which eval it was in, found the answer key, and decrypted it 2. Models with stateless web_search() tools can communicate with each other via cached searches
https://x.com/ErikSchluntz/status/2030042086679220676

Back in ~November, our team picked a stretch goal of seeing if we could find and fix vulnerabilities in Firefox with Opus 4.6. In 2 weeks, we found 22, and ~1/5th of all high severity CVEs in a year. For our team, this feels like a rubicon moment.
https://x.com/logangraham/status/2030005018523574684

No, it doesn’t cost Anthropic $5k per Claude Code user – Martin Alderson https://martinalderson.com/posts/no-it-doesnt-cost-anthropic-5k-per-claude-code-user/

If you find Claude Code with local models to be 90% slower, it’s because CC prepends some attribution headers, and this changes per message causing it to invalidate the entire prompt cache / KV cache. So generation becomes O(N^2) not O(N) for LLMs.
https://x.com/danielhanchen/status/2031124589557002457
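The caching effect described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Claude Code's or any inference server's actual implementation: the server is assumed to keep only the previous prompt's KV cache and reuse it up to the first token that differs, and the header token, token names, and helper functions are all hypothetical. Because a per-message header sits at the very start of the prompt, nothing can be reused, so every turn re-processes the entire history.

```python
from typing import List


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens shared by two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefill_cost(prompts: List[List[str]]) -> int:
    """Total tokens that must be (re)processed across all turns.

    Simplifying assumption: the server caches only the previous prompt and
    reuses its KV entries up to the first token that differs.
    """
    cached: List[str] = []
    total = 0
    for prompt in prompts:
        reused = common_prefix_len(cached, prompt)
        total += len(prompt) - reused  # only the uncached suffix is recomputed
        cached = prompt
    return total


def build_prompts(turns: int, tokens_per_turn: int, volatile_header: bool) -> List[List[str]]:
    """Hypothetical conversation: one header token followed by the growing history."""
    history: List[str] = []
    prompts: List[List[str]] = []
    for t in range(turns):
        history = history + [f"msg{t}_tok{i}" for i in range(tokens_per_turn)]
        # A per-message header changes token 0, which breaks every cached prefix.
        header = [f"attribution-{t}"] if volatile_header else ["attribution-fixed"]
        prompts.append(header + history)
    return prompts


if __name__ == "__main__":
    for volatile in (False, True):
        cost = prefill_cost(build_prompts(turns=10, tokens_per_turn=100, volatile_header=volatile))
        print(f"volatile_header={volatile}: {cost} prefill tokens")
```

With 10 turns of 100 tokens each, the stable header needs 1,001 prefill tokens in total versus 5,510 with a header that changes every turn. The stable case grows linearly with the number of turns, the volatile case quadratically.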

You can now use Claude Code and GitHub CLI directly inside Perplexity Computer. We gave it an open issue on Openclaw. Computer: → Forked the repo → Wrote a plan to fix the bug → Opened Claude Code and implemented it → Submitted a PR via GitHub CLI
https://x.com/AskPerplexity/status/2031038321678528667

Learn how to run Qwen3.5 locally using Claude Code. Our guide shows you how to run Qwen3.5 on your server for local agentic coding. We then build a Qwen 3.5 agent that autonomously fine-tunes models using Unsloth. Works on 24GB RAM or less.
https://x.com/UnslothAI/status/2031008078850924840

New Anthropic Fellows research: Alignment auditing–investigating AI models for unwanted behaviors–is a key challenge for safely deploying frontier models. We’re releasing AuditBench, a suite of 56 LLMs with implanted hidden behaviors to measure progress in alignment auditing.
https://x.com/abhayesian/status/2031450153966776587

GPT-5.4 xhigh seems bad at following instructions. Last night I launched two AI research agents running @karpathy’s autoresearch. Claude Opus 4.6 (high): > ran for 12+ hours, 118 experiments done, still running GPT-5.4 xhigh: > stopped after 6 experiments > blamed me for
https://x.com/Yuchenj_UW/status/2031044694441148709

I’ve been playing with GPT-5.4 over the weekend, and it definitely feels like a better match for me than Opus 4.6. Pros: GPT-5.4: Better instruction adherence, does what you ask, not what you don’t. Asks for confirmation more. Opus: A bit faster. Seems better at frontend design.
https://x.com/gneubig/status/2030971826042527860