Alignment: AI News Week Ending 02/20/2026

Alignment: AI News Week Ending 02/20/2026

February 20, 2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Wide observational shot of a Chinese worker in faded jacket standing beside a chestnut horse at a concrete industrial canal edge, both facing the same direction, half-demolished buildings and overcast sky in background, muted desaturated colors, documentary realism, flat natural light, large red text overlay reading ALIGNMENT positioned like a Chinese cinema poster title, Jia Zhangke aesthetic, patient composition, human-scale intimacy, weathered surfaces.

HumanLM https://humanlm.stanford.edu/

Pentagon threatens to cut off Anthropic in AI safeguards dispute https://www.axios.com/2026/02/15/claude-pentagon-anthropic-contract-maduro

Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid – WSJ https://www.wsj.com/politics/national-security/pentagon-used-anthropics-claude-in-maduro-venezuela-raid-583aff17

Anthropic is prepared to loosen its current terms of use, but wants to ensure its tools aren’t used to spy on Americans en masse, or to develop weapons that fire with no human involvement. The Pentagon has aid, that Anthropic will “”pay a price”” for that behavior. Within this”” https://x.com/kimmonismus/status/2023419652378955809

Measuring AI agent autonomy in practice \ Anthropic https://www.anthropic.com/research/measuring-agent-autonomy

Most agent actions on our API are low risk. 73% of tool calls appear to have a human in the loop, and only 0.8% are irreversible. But at the frontier, we see agents acting on security systems, financial transactions, and production deployments (though some may be evals).”” https://x.com/AnthropicAI/status/2024210050718585017

New Anthropic research: Measuring AI agent autonomy in practice. We analyzed millions of interactions across Claude Code and our API to understand how much autonomy people grant to agents, where they’re deployed, and what risks they may pose. Read more:”” https://x.com/AnthropicAI/status/2024210035480678724

NEW: Pentagon is so furious with Anthropic for insisting on limiting use of AI for domestic surveillance + autonomous weapons they’re threatening to label the company a “supply chain risk,” forcing vendors to cut ties. With @m_ccuri and @mikeallen”” https://x.com/DavidLawler10/status/2023425130148626767

Pentagon threatens to cut off Anthropic in AI safeguards dispute https://www.axios.com/2026/02/15/claude-pentagon-anthropic-contract-maduro?amp%3Butm_medium=newsletter&amp%3Butm_campaign=ai-s-new-physics-discovery&amp%3B_bhlid=147fc2fb115d35bbc6b2211e9bcebfff031af136

Software engineering makes up ~50% of agentic tool calls on our API, but we see emerging use in other industries. As the frontier of risk and autonomy expands, post-deployment monitoring becomes essential. We encourage other model developers to extend this research.”” https://x.com/AnthropicAI/status/2024210053369385192

Something strange is happening with AI agents that this new Anthropic research quietly surfaces. The agents are asking us for help more than we’re stepping in to correct *them*. Anthropic analyzed data from Claude Code and their public API to measure how autonomous AI agents”” https://x.com/omarsar0/status/2024864635120451588

People should read the Claude Constitution. It does a pretty good job of laying out what Anthropic presumably really believes (and it is part of training). I’d think that a clear debate over things that are good or bad or missing there would be helpful.”” https://x.com/emollick/status/2023612474474303530

We’re committing $7.5M to @AISecurityInst’s Alignment Project to fund independent research on mitigations for safety and security risks from misaligned AI.”” https://x.com/OpenAINewsroom/status/2024546609485533442

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT | OpenAI https://openai.com/index/introducing-lockdown-mode-and-elevated-risk-labels-in-chatgpt/

Every few months, I write an updated, idiosyncratic guide on which AIs to use right now. My new version has the most changes ever, since AI is no longer just about chatbots. To use AI you need to understand how to think about models, apps, and harnesses.”” https://x.com/emollick/status/2023937967044046949

Have to respect the long game of whoever incorrectly listed the first 500 primes on their “”primefan”” website 20+ years ago and somehow kept the site live long enough to pollute generative AI models in 2026″” https://x.com/skominers/status/2024078964667396342

Will reward-seekers respond to distant incentives? — AI Alignment Forum https://www.alignmentforum.org/posts/8cyjgrTSxGNdghesE/will-reward-seekers-respond-to-distant-incentives

The crazy part is that the AI Labs have generally been right. Like, the stuff they hyped in 2023 turned out to be real and working today. That doesn’t mean that the stuff they are predicting for 2028 will also be real, but it is probably worth noting those predictions & watching.”” https://x.com/emollick/status/2023257496069046563

Dean W. Ball on X: “I continue to think the notion of mass unemployment from AI is overrated. There may be shocks in some fields–big ones perhaps!–but anyone who thinks AI means the imminent demise of knowledge work has just not done enough concrete thinking about the mechanics of knowledge work.” / X
https://x.com/deanwball/status/2023204167146222059

The transition from “AI can’t do novel science” to “of course AI does novel science” will be like every other similar AI transition. First the over-enthusiastic claims, then smart people use AI to help them, then AI starts to do more of the work, then minor discoveries, & then…”” https://x.com/emollick/status/2022676591596515728

Don’t read the replies”” has taken on an entirely different context as the replies are now from AIs who write meaning-shaped comments that you actually have to spend a split second thinking about. Gets around the defenses of those of us used to filtering bad & stupid comments”” https://x.com/emollick/status/2022472041514041525

Anthropic has entrusted Amanda Askell to endow its AI chatbot, Claude, with a sense of right and wrong”” https://x.com/WSJ/status/2022629696261808173

Anthropic’s Philosopher Amanda Askell Is Teaching Claude AI to Have Morals – WSJ https://www.wsj.com/tech/ai/anthropic-amanda-askell-philosopher-ai-3c031883?mod=e2tw

WSJ did a profile of me. A lot of the response has been people trying to infer my personal political views. For what it’s worth, I try to treat my personal political views as a potential source of bias and not as something it would be appropriate to try to train models to adopt.”” https://x.com/AmandaAskell/status/2022778351744581779

TLDR: Opus 4.6 demonstrates better reasoning and use of memory than Gemini 3.1 Pro and solves more levels. I’m now much more confident that current and future models will be able to solve ARC-AGI-3, given that they have access to harness with simple memory. My speculative take”” https://x.com/scaling01/status/2024642420177096769

If you haven’t hit the little plus button on your favorite chatbot recently, it is a complete hodgepodge mess that nobody not on X would understand: Canvas and Web search and Learning and confusing icons. I am not even calling out any of the Big Three directly, its all of them.”” https://x.com/emollick/status/2023478306960814573

The dark side of reinforcement learning @olive_jy_song, senior researcher at @MiniMax_AI, about RL models that try to hack rewards and why alignment fails in practice This conversation is an inside look at how Chinese AI labs move fast – testing new models overnight, debugging”” https://x.com/TheTuringPost/status/2022961676799398337

lol what: Researchers found that repeating the exact same prompt twice dramatically improves LLM performance (one model improved from 21% to 97% accuracy on a name-search task) without longer outputs, slower responses, fine-tuning, or fancy prompt engineering. Because models”” https://x.com/kimmonismus/status/2024069380162936992

Looking Inside: a Maliciousness Classifier Based on the LLM’s Internals https://labs.zenity.io/p/looking-inside-a-maliciousness-classifier-based-on-the-llm-s-internals

Repeating Prompts https://daoudclarke.net/2026/02/19/repeating-prompt

Grok 4.20 is BASED. The only AI that doesn’t equivocate when asked if America is on stolen land. The others are weak sauce.”” https://x.com/elonmusk/status/2023880206721970544