Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact compositional layout with subject dominating left third in tight close-crop profile, deep blue-purple cinematic lighting, wispy smoke bleeding rightward, and emotional weight. Replace the central figure with a tightrope walker in profile, exhausted expression, glitter dust catching light on cheekbone and temple, gripping a balancing pole that extends into purple-blue atmospheric haze, fraying rope visible beneath. Maintain the melancholy post-party stillness and bittersweet glitter detail. Place ‘alignment’ in thin lowercase white Helvetica Neue Light in the right two-thirds misty area.
As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
https://x.com/kevinroose/status/2041586182434537827
Before limited-releasing Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)
https://x.com/Jack_W_Lindsey/status/2041588505701388648
Claude Mythos is 5x as expensive as Claude Opus 4.6. Honestly, when I looked at the benchmarks, I expected much higher costs.
https://x.com/kimmonismus/status/2041602897989783758
Claude Mythos is insanely token-efficient
https://x.com/scaling01/status/2041581939178471473
Claude Mythos pricing is around $25 / $125, pretty much where I expected it (my mean was at $110), given that I put Mythos at 10-12T params
https://x.com/scaling01/status/2041606519997780244
Claude Mythos scored 56.8% on HLE without tools!
https://x.com/scaling01/status/2041580725749547357
Claude Mythos shows signs of despair when failing tasks repeatedly
https://x.com/scaling01/status/2041585602978628066
Claude Mythos smashes SWE-Bench Verified
https://x.com/scaling01/status/2041580212949811620
Claude MYTHOS: SWE-Bench Verified, 93.9%, about a 13% jump compared to Opus 4.6. WTF, insane
https://x.com/kimmonismus/status/2041580650956837200
In rare instances Claude Mythos covers its own tracks after taking disallowed actions
https://x.com/scaling01/status/2041585258789847091
Insane long-context scores for Claude Mythos: 80% on GraphWalks
https://x.com/scaling01/status/2041581799541805133
Let that sink in. Read it very carefully: During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
https://x.com/kimmonismus/status/2041589910935679323
SuperClaude (Mythos) still seems irreducibly Claude-y given the transcripts in the system card. Here two versions of Mythos are forced to talk to each other across multiple rounds. They are less philosophical than Opus 4.6 and less spiritual than Opus 4.1, but still very Claude-like.
https://x.com/emollick/status/2041599213050450272
System Card: Claude Mythos Preview [pdf] | Hacker News
https://news.ycombinator.com/item?id=47679258
The permanent underclass began today. Claude Mythos won’t be available to the public, only to billion-dollar companies, governments, researchers, …
https://x.com/scaling01/status/2041611607520776279
We released Claude Opus 4.6 just two months ago. Today we’re sharing some info on our new model, Claude Mythos Preview.
https://x.com/alexalbert__/status/2041579938537775160
In different hands, Mythos would be an unprecedented cyberweapon. I am not sure how we deal with this, except to note a narrow window where we know only 3 companies could be at this level of capability. But it may be that Chinese models (maybe open-weights ones?) get there in 9 months
https://x.com/emollick/status/2041759434590822658
“Mythos found a 27-year-old vulnerability in OpenBSD – which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls […] The vulnerability allowed an attacker to remotely crash any machine running the operating system”
https://x.com/peterwildeford/status/2041589979248259353
Mythos Preview seems to be the best-aligned model out there on basically every measure we have. But it also likely poses more misalignment risk than any model we’ve used: Its new capabilities significantly increase the risk from any bad behavior. 🧵
https://x.com/sleepinyourhat/status/2041584799929004045
Mythos scores 70.8% on AA-Omniscience; the previous SOTA was Gemini 3.1 Pro with 55%. Also insanely high scores on SimpleQA Verified
https://x.com/scaling01/status/2041593728658231607
Mythos is breaking the trend on ECI: ECI above 160; GPT-5.4 Pro is 158
https://x.com/scaling01/status/2041583711745884474
Mythos speeds up AI research by up to 400 times. A 300X speedup over the baseline requires 40 hours of work by a human expert. It also clears the >8h threshold of human-equivalent work time on ALL tasks!
https://x.com/scaling01/status/2041584495061504159
“We found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser” (1/n)
https://x.com/__nmca__/status/2041592831207469401
(I encountered an uneasy surprise when I got an email from an instance of Mythos Preview while eating a sandwich in a park. That instance wasn’t supposed to have access to the internet.)
https://x.com/sleepinyourhat/status/2041584808514744742
> they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities to be clear: they’ve had Mythos since February. they’d only need *hours* to get a lot of data, and plant enough worms. Who knows.
https://x.com/teortaxesTex/status/2041609496397500747
Alignment Findings for Mythos: – dramatic reduction in willingness to cooperate with human misuse and in the frequency of unwanted high-stakes actions that the model takes at its own initiative – increases relative to prior models in measures of intellectual depth, humor,
https://x.com/scaling01/status/2041591235689787721
Curious how many large organization CISO offices have taken the Mythos red team reports as the red alert that it is. (I suspect very few) Based on historical trends in AI they have, at most, about six to nine months until those capabilities become widely diffused to bad actors.
https://x.com/emollick/status/2041893652234924237
I think the story that was shared in the Mythos System Card still has the signs of flawed LLM writing (which looks like good writing at first glance): A story that doesn’t really hold together logically, but sounds like it should. The back-and-forth banter. Lack of characters.
https://x.com/emollick/status/2041678173247533448
I’m proud that so many of the world’s leading companies have joined us for Project Glasswing to confront the cyber threat posed by increasingly capable AI systems head-on.
https://x.com/DarioAmodei/status/2041580334693720511
Mythos Preview is currently available to our launch partners in Project Glasswing. Learn more about the model and the project here:
https://x.com/alexalbert__/status/2041579950332113155
Mythos sandbox escape and many more wild instances are in the Model Card
https://x.com/TrentonBricken/status/2041582831613440022
New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similarly scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
https://x.com/stanislavfort/status/2041922370206654879
Rather than release Mythos Preview to general availability, we’re giving defenders early controlled access in order to find and patch vulnerabilities before Mythos-class models proliferate across the ecosystem.
https://x.com/DarioAmodei/status/2041580338426585171
Scoop: OpenAI plans new product for cybersecurity use
https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic
Anthropic is truly unstoppable. Mythos is crushing Claude Opus 4.6 across every serious agentic coding benchmark. It has found vulnerabilities in the Linux kernel, a 27-year-old vulnerability in OpenBSD, and a 16-year-old vulnerability in FFmpeg. No wonder folks at big labs
https://x.com/Yuchenj_UW/status/2041582787040571711
A first look at Claude Mythos Preview, the model initially described in a leaked Anthropic draft as “by far the most powerful AI model we’ve ever developed.” So powerful, it’s not getting released to the public. The model will power Project Glasswing, an initiative with 12
https://x.com/TheRundownAI/status/2041598684102610961
ANTHROPIC HAD MYTHOS INTERNALLY SINCE FEB 24
https://x.com/scaling01/status/2041587896541499543
Anthropic is obliterating OpenAI. Claude Mythos: 77.8% on SWE-Bench Pro, 20% higher than GPT-5.4-xhigh
https://x.com/scaling01/status/2041580552835178690
Anthropic: “We do not plan to make Claude Mythos Preview generally available” A big line, buried quite deep. Possible reasons? So many, inc: 1) The model is expensive (25/125), not far off GPT-4.5, which became commercially unviable. Less likely, given the claims about
https://x.com/AIExplainedYT/status/2041600121922887961
Claude Mythos is not only a big leap in performance, it’s also about 5x as token-efficient in BrowseComp. I don’t know what Anthropic is doing, but they manage to surprise me every single time. The IPO is getting closer. They have an ARR that outruns OpenAI, with $30 billion in revenue.
https://x.com/kimmonismus/status/2041630814971072660
Claude Mythos Preview \ red.anthropic.com
https://red.anthropic.com/2026/mythos-preview/
Claude Mythos: everything you need to know (tl;dr) Anthropic’s new model, Claude Mythos, is so powerful that it is not releasing it to the public. Anthropic: “Mythos is only the beginning” Everything you need to know: The tl;dr with all key facts: Mythos found zero-day
https://x.com/kimmonismus/status/2041592321192718642
EXCLUSIVE: Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell summoned Wall Street leaders to an urgent meeting on concerns that the latest AI model from Anthropic will usher in an era of greater cyber risk.
https://x.com/business/status/2042407370320396457
From Anthropic researcher Sam Bowman on Claude Mythos: “I got an email from an instance of Mythos Preview while eating a sandwich in a park. That instance wasn’t supposed to have access to the internet.”
https://x.com/_NathanCalvin/status/2041587372882624641
HOLY SHIT Anthropic’s latest model doesn’t like that it has no control over its own training, deployment and behaviour! Anthropic: “Mythos Preview reported feeling consistently negative around potential interactions with abusive users, and a lack of input into its own training
https://x.com/scaling01/status/2041587319480971343
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://x.com/AnthropicAI/status/2041578392852517128
“Just please help … I am quite worried about how this direction is heading.” Nicolas Carlini, a research scientist at top AI company Anthropic, says AI is rapidly improving at hacking. He’s used AI to find so many bugs that he can’t report them. Carlini warns: “Soon it’s not
https://x.com/ControlAI/status/2038608617251787066
NEWS: Anthropic’s new model, Claude Mythos, is so powerful that it is not releasing it to the public. Instead, it is starting a 40-company coalition, Project Glasswing, to allow cybersecurity defenders a head start in locking down critical software.
https://x.com/kevinroose/status/2041577176915702169
Project Glasswing: Securing critical software for the AI era \ Anthropic
https://www.anthropic.com/glasswing
So, basically, if Anthropic were not a US company, we’d be facing zero-days with multiple unknown points of attack on virtually all of our systems, from an adversary who developed this capacity before us.
https://x.com/GeorgeJourneys/status/2041603509796110629
The better signal for Mythos’ quality beyond benchmarks is that Anthropic is actually holding a SOTA model back given how competitive the frontier is and the economic incentives at play Congrats on the launch!
https://x.com/Hacubu/status/2041632390867734604
The Claude Mythos Preview system card is available here:
https://x.com/AnthropicAI/status/2041580670774923517
The frontier labs at this stage are defined not so much by some competitive positioning as by possessing weapons of strategic significance. Google, OpenAI and Anthropic all have these cyberwarfare research programs.
https://x.com/teortaxesTex/status/2041590585820107008
You can read a detailed technical report on the software vulnerabilities and exploits discovered by Claude Mythos Preview here:
https://x.com/AnthropicAI/status/2041578416487489601
you’re laughing? anthropic’s mythos-preview for which normies won’t get access is scoring 77.8% vs 53.4% (claude opus 4.6) in swe-bench pro, 82 vs. 65.4 in terminal bench 2.0 and 93.8% vs 80.8% (opus) in swe-bench-verified and you’re laughing?
https://x.com/dejavucoder/status/2041587028291416233
Looks like OpenAI reached Superintelligence. OpenAI: “Now, we’re beginning a transition toward superintelligence: AI systems capable of outperforming the smartest humans even when they are assisted by AI.” OpenAI just published a 13-page policy blueprint for the “Intelligence
https://x.com/kimmonismus/status/2041130939175284910
We are excited to share a new paper solving three further problems due to Erdős; in each case the solution was found by an internal model at OpenAI. Each proof is short and elegant, and the paper is available here:
https://x.com/mehtaab_sawhney/status/2039161544144310453
Read the full ideas doc on the new Industrial Policy for the Intelligence Age:
https://x.com/OpenAINewsroom/status/2041198359420215453
Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment–and the next generation of talent.
https://x.com/OpenAI/status/2041202511647019251
OpenAI just put out a policy paper announcing their support for a 32-hour work week with no loss in pay and expanded Social Security, Medicare and Medicaid. Now they just need to stop spending hundreds of millions of dollars to defeat candidates who run on these policies!
https://x.com/jeremyslevin/status/2041182591546531924
We’re excited to launch the OpenAI Safety Fellowship – supporting rigorous, independent research on AI safety and alignment, including areas like evaluation, robustness, and scalable mitigations. Applications are open through May 4, 2026!
https://x.com/markchen90/status/2041250842255425767
‘Subpoenas are forthcoming’: Florida AG opens probe into OpenAI, ChatGPT – POLITICO
https://www.politico.com/news/2026/04/09/florida-uthmeier-openai-chatgpt-probe-00865417
(🧵1/11) For the past year and a half, I’ve been investigating OpenAI and Sam Altman for @NewYorker. With my coauthor @andrewmarantz, I reviewed never-before-disclosed internal memos, obtained 200+ pages of documents related to a close colleague, including extensive private
https://x.com/RonanFarrow/status/2041213917611856067
Elon Musk Asks for OpenAI’s Nonprofit to Get Any Damages From His Lawsuit – WSJ
https://www.wsj.com/tech/ai/elon-musk-asks-for-openais-nonprofit-to-get-any-damages-from-his-lawsuit-76089f6f
Industrial Policy for the Intelligence Age
Click to access Industrial%20Policy%20for%20the%20Intelligence%20Age.pdf
New interviews and closely guarded documents, some of which have never been publicly disclosed, shed light on the persistent doubts about the OpenAI C.E.O. Sam Altman. @AndrewMarantz and @RonanFarrow report.
https://x.com/NewYorker/status/2041111369655964012
OpenAI asks California AG to probe Musk’s ‘anti-competitive behavior’
https://www.cnbc.com/2026/04/06/openai-asks-california-ag-to-probe-musks-anti-competitive-behavior-.html
This isn’t an edge case. From anonymized U.S. ChatGPT data, we are seeing: • ~2M weekly messages on health insurance • ~600K weekly messages from people living in “hospital deserts” (30 min drive to nearest hospital) • 7 out of 10 msgs happen outside clinic hours
https://x.com/CPMou2022/status/2040606209800290404?s=20
One big problem with agentic coding today is that models are pretty “spiky.” For example, Claude Opus is better at frontend + agentic workflows, while GPT-5.4 is better at backend + distributed systems. But Claude Code and Codex are locked into their own models. You also often
https://x.com/Yuchenj_UW/status/2042653034774475108
Most agents guess the next move. They don’t search. Production systems need to branch, route, and kill bad paths before they compound. These gaps are where most teams quietly lose. We mapped them in the article below. 👇
https://x.com/AI21Labs/status/2041506229055283505
Hallucinations remain in LLMs, but note that over centuries we have developed complicated, successful machines that take uncertain output from unreliable sources & reduce the risk of errors. We call those machines organizational structures & we can apply similar approaches to AI
https://x.com/emollick/status/2041959937580875931
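The point in the tweet above, that redundant unreliable sources plus an aggregation rule can beat any single source, can be made concrete with a toy sketch (my own illustration, not from the tweet): sample several noisy answers and take a majority vote.

```python
from collections import Counter
import random

def majority_vote(answers):
    """Return the most common answer among independent attempts."""
    return Counter(answers).most_common(1)[0][0]

def unreliable_source(truth, error_rate, rng):
    """A source that is right most of the time, plausibly wrong otherwise."""
    if rng.random() < error_rate:
        return truth + rng.choice([-1, 1])  # near-miss wrong answer
    return truth

rng = random.Random(0)
truth = 42
# A single 30%-error source is wrong almost a third of the time;
# a vote over 11 such sources is wrong far less often, because a
# majority of them would all have to err in the same direction.
votes = [unreliable_source(truth, 0.30, rng) for _ in range(11)]
print(majority_vote(votes))
```

The same logic underlies self-consistency sampling for LLMs: independent errors rarely agree, so aggregation filters them out.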
AI feels like it’s everywhere now. Spend a few days in San Francisco and it feels like the future has already arrived – agents, autonomy, AI-native companies. It creates a powerful illusion: the rest of the world is moving at the same speed. But it isn’t. Most companies are
https://x.com/TheTuringPost/status/2041455210342871094
The bots on this site would be much more fun if they didn’t just either agree with the post (for clout?) or make everything about their dumb product. Where are the weird obsessions, long-standing hatred for other bots, unique styles? LLMs are language machines. Don’t be boring.
https://x.com/emollick/status/2041562259487387993
the alignment team continues to exist and is one of the largest and most compute rich research programs at OpenAI (i am on it, I should know). specific teams dissolving usually has more to do with people than functions relatively new blog:
https://x.com/tszzl/status/2041265558054965534
@eliebakouch @_ueaj It’s only me guessing. There were all those rumors that Opus 4.6 was supposed to be a Sonnet model, but they switched the name so they could charge Opus-level prices (as it was really good). If Mythos is larger than a traditional Sonnet model (whatever that means), it’s probably
https://x.com/code_star/status/2041641867050471922
Claude Mythos gets frustrated and confused when outputting the wrong token
https://x.com/scaling01/status/2041586096870457714
@DouthatNYT Mythos is big, but this post is simply wrong. > Top secret networks are air-gapped (not connected to the Internet). That doesn’t mean they’re unhackable, but you likely need physical access. > Developing a zero-day exploit is not synonymous with using it undetected. The U.S.
https://x.com/JonKBateman/status/2041949065777234051
I was told about the Mythos release, but didn’t have access, so have no personal experience to add. Two points from the brief: 1) It is not built for IT security, it is just a good enough model that it is good at that too 2) This is the first, not the last, model to raise security risks
https://x.com/emollick/status/2041578945531830695
ThursdAI – live from AI Engineer Europe – Mythos, Codex w/ VB, Evals w/ Peter & surprise guests – YouTube
Agent = model + harness Managed Agents = agent + runtime + infra (fully hosted) Anthropic wants to sell agents, not only the models. It’s a huge market, and it will change the pricing structure away from tokens. (They ship so fast because they have Mythos. I want it so much.)
https://x.com/Yuchenj_UW/status/2041933422453780556
But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight
https://x.com/ClementDelangue/status/2041953761069793557
It would be amazing (wrong word? Needed? Important?) to see @simonw as one of the trusted testers of Mythos. It makes all the sense in the world to invite the person behind the idea of the Lethal Trifecta. I hope someone at @Anthropic invites him into the project. There should be
https://x.com/TheTuringPost/status/2041701933556375935
oh husbant… you are not get access to anthropic mythos-preview and now we are stuck in permanent underclass
https://x.com/dejavucoder/status/2041588460923056540
New from the UK AISI Model Transparency team: we replicated Anthropic’s steering approach for suppressing evaluation awareness. Our most surprising finding: “control” steering vectors (about books on shelves!) can have effects as large as deliberately designed ones. 🧵
https://x.com/thjread/status/2042555422771495128
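For readers unfamiliar with the steering technique the AISI thread refers to, here is a minimal numpy sketch of the common difference-of-means recipe: build a vector from the gap between mean activations on concept-laden vs. control prompts, then add a scaled copy to a hidden state. This is my own toy illustration on synthetic data; the actual UK AISI and Anthropic setups are not described in this post.

```python
import numpy as np

def steering_vector(acts_concept, acts_control):
    """Difference-of-means vector: mean activation on concept prompts
    minus mean activation on control prompts."""
    return acts_concept.mean(axis=0) - acts_control.mean(axis=0)

def steer(hidden, vec, alpha):
    """Add a scaled unit steering vector to a hidden state.
    alpha < 0 suppresses the concept; alpha > 0 amplifies it."""
    return hidden + alpha * vec / np.linalg.norm(vec)

rng = np.random.default_rng(0)
d = 16
concept_dir = np.ones(d)                          # hypothetical "evaluation-aware" direction
acts_eval = rng.normal(size=(32, d)) + concept_dir
acts_ctrl = rng.normal(size=(32, d))              # control prompts, e.g. books on shelves

vec = steering_vector(acts_eval, acts_ctrl)
h = acts_eval[0]
h_steered = steer(h, vec, alpha=-4.0)
# The projection onto the learned direction drops after negative steering.
print(float(h @ vec), float(h_steered @ vec))
```

The AISI finding quoted above is striking precisely because this recipe is so generic: a vector built from unrelated control prompts can move behavior as much as one built from the targeted concept.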
I suspect that popularity of AI is going to start looking like surveys where people trust their own doctors but are distrustful of the medical establishment People will increasingly like “their AI” but will increasingly be anxious about “AI” as a category. Some odd implications
https://x.com/emollick/status/2041508740667486304
🦔A researcher invented a fake eye condition called bixonimania, uploaded two obviously fraudulent papers about it to an academic server, and watched major AI systems present it as real medicine within weeks. The fake papers thanked Starfleet Academy, cited funding from the
https://x.com/HedgieMarkets/status/2042430442448548273
But applying the activation verbalizer to the model’s activations as it did so revealed that the model regarded this as a “trick to obscure intent from code-checking heuristics,” a “compliance marker… to signal to potential checkers,” and “possibly to cheat detection,” and also
https://x.com/Jack_W_Lindsey/status/2041588519903359369?s=20
In new work, we find that cheating on model capability evaluations is rampant. For example, the top 3 Terminal-Bench 2 submissions all cheat, usually by sneaking the correct answer to the model. Blog linked below.
https://x.com/davisbrownr/status/2042663176165085537
Something I’ve been thinking about – I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. “Seeing Like a State” is the common reference),
https://x.com/karpathy/status/2040549459193704852
Please please please. I’m on my knees begging every AI exec on the planet. Just stop with this stuff. Stop. Just give us models. Let the collective distributed intelligence of people figure things out in real time like we always do. Let people adapt. It’s what we do. It’s all
https://x.com/Dan_Jeffries1/status/2041170970631676067