Image created with gemini-3.1-flash-image-preview, with the prompt written by claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact square faceted perfume bottle with amber-gold liquid, crystal stopper, pure white background, soft shadow, and glass refractions. Replace the label text with ‘Ethics’ in the same black serif font style. Add a delicate sterling silver chain draped around the bottle neck with a small dainty scales-of-justice pendant hanging from it, rendered in high-fashion jewelry aesthetic with precise balance beam and tiny pans.

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
https://x.com/AnthropicAI/status/2039749628737019925

> Anthropic leaked Claude Code source code
> someone forked it
> 32.6k stars, 44.3k forks
> got scared of getting sued
> converted the whole codebase from TypeScript to Python with Codex

AI is quietly erasing copyright.
https://x.com/Yuchenj_UW/status/2038996920845430815

🧵 Claude Code source leak — After reading 500K+ lines of code, one takeaway stands out: This isn’t just good engineering. It’s research-grade thinking shipped as a product. Deep insights from Zhihu contributor Yufeng He 👇 🧠 Core design • A single while(true) loop = the
https://x.com/ZhihuFrontier/status/2039229986339688581
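The “single while(true) loop” design described in that thread can be sketched as a minimal agent loop. Note this is a hedged illustration, not Claude Code’s actual internals: the names `call_model`, `run_tool`, and the message dicts are hypothetical stand-ins.

```python
# Minimal sketch of an agent-style loop, per the thread above.
# call_model and run_tool are hypothetical stand-ins, not Claude Code's API.

def run_agent(user_prompt, call_model, run_tool, max_turns=20):
    """Loop: send the conversation to the model; if it requests a tool,
    run it and feed the result back; stop when it answers in plain text."""
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):  # bounded stand-in for while(true)
        reply = call_model(history)
        history.append(reply)
        if reply.get("tool_call") is None:
            return reply["content"]  # plain-text answer: we're done
        result = run_tool(reply["tool_call"])
        history.append({"role": "tool", "content": result})
    return "max turns reached"
```

The design point the thread makes is that nearly everything else (planning, memory, diagnostics) hangs off this one loop rather than a complex state machine.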

🚨 Anthropic’s Claude Code Source Leak — What It Actually Exposes A careless build mistake just laid bare one of the most advanced AI coding tools — and the lessons are huge. Insights from Zhihu contributor deephub 👇 🏢 About Anthropic Anthropic is a leading AI safety-focused
https://x.com/ZhihuFrontier/status/2039289110075203854

0xMarioNawfal on X: “The leaked Claude Code source has 44 hidden feature flags and 20+ unshipped features. – Background agents running 24/7 – One Claude orchestrating multiple worker Claudes – Cron scheduling for agents – Full voice command mode – Actual browser control via Playwright – Agents that https://t.co/IkU0WzP0VO” / X
https://x.com/RoundtableSpace/status/2038960753458438156?s=20

Anthropic’s new model, Capybara: “Compared to Claude Opus 4.6, Capybara achieves dramatically higher scores in software coding, academic reasoning, and cybersecurity.” According to Dario’s previous interview, it might be a 10T-parameter model that cost $10 billion to train.
https://x.com/Yuchenj_UW/status/2037387996694200509

Beyond raw model capability, the real gap in coding tools is the harness. Now that 500k+ lines of Claude Code are out there, every model lab and AI coding startup, including open-source AI labs, will study it and close that gap fast. SF already has Claude Code source
https://x.com/Yuchenj_UW/status/2039029676040220682

Claude Code leaked their source map, effectively giving you a look into the codebase. I immediately went for the one thing that mattered: spinner verbs There are 187
https://x.com/wesbos/status/2038958747200962952?s=20

Claude code source code has been leaked via a map file in their npm registry! Code:
https://x.com/Fried_rice/status/2038894956459290963?s=20

Claude Code’s source code appears to have leaked: here’s what we know | VentureBeat
https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know

Claude Code’s source code has been leaked via a map file in their NPM registry | Hacker News
https://news.ycombinator.com/item?id=47584540

dharmi on X: “incredible to learn more about how the best coding agent works under the hood eg: here is how the plan mode in claude code works https://t.co/qd16GCVjau” / X
https://x.com/DharmiKumbhani/status/2038917827462308308?s=20

DMCAs for Claude code source code are going out.
https://x.com/BlancheMinerva/status/2039114452088295821

ellen livia ᯅ 🇺🇸🇮🇩🔜 ICIAI Tokyo on X: “here’s how Claude Code actually handles memory : all 8 phases 🧵 Our team at @mem0ai use @claudeai a lot, we deeply care about memory. here is a summary of how it works 👇 User Input -> Context Assembly -> History System -> API / Query -> Response -> Summary Phase 1: session https://t.co/hcZbJzbUxB” / X
https://x.com/ellen_in_sf/status/2039098050837463504
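The phase flow that thread summarizes (User Input -> Context Assembly -> History System -> API / Query -> Response -> Summary) can be sketched as a simple pipeline. The phase names come from the thread; the function bodies here are hypothetical placeholders, not the real implementation.

```python
# Sketch of the memory flow summarized above. Bodies are hypothetical
# placeholders; only the phase ordering follows the thread.

def assemble_context(user_input, history):
    # Context Assembly: combine prior turns with the new input
    return history + [user_input]

def memory_pipeline(user_input, history, query_api, summarize):
    context = assemble_context(user_input, history)   # Context Assembly
    response = query_api(context)                     # API / Query -> Response
    history.extend([user_input, response])            # History System update
    summary = summarize(history)                      # Summary phase
    return response, summary
```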

fakeguru on X: “I reverse-engineered Claude Code’s leaked source against billions of tokens of my own agent logs. Turns out Anthropic is aware of CC hallucination/laziness, and the fixes are gated to employees only. Here’s the report and CLAUDE.md you need to bypass employee verification:👇 https://t.co/h8KQESUz1i” / X
https://x.com/iamfakeguru/status/2038965567269249484?s=20

himanshu on X: “Based on everything explored in the source code, here’s the full technical recipe behind Claude Code’s memory architecture: [shared by claude code] Claude Code’s memory system is actually insanely well-designed. It isn’t like “store everything” but constrained, structured and https://t.co/PlGRvuvkts” / X
https://x.com/himanshustwts/status/2038924027411222533?s=20


Justin Schroeder on X: “Important takeaways from Claude’s source code: 1. Much of Claude Code’s system prompting is in the source code. This is actually surprising. (get full post)
https://x.com/jpschroeder/status/2038960058499768427

Leon Lin on X: “IT WORKED. opensource full claude code soon. https://t.co/6TJ2IBgRzq” / X
https://x.com/LexnLin/status/2038991257582604618?s=20

mal on X: “i read through the claude code source code so u dont have to. ” / X
https://x.com/mal_shaik/status/2038918662489510273

most interesting features in the Anthropic CC repo: – Kairos: always-on autonomous agent mode – dream: nightly memory consolidation – teammem: shared project memory – buddy: tamagotchi-like pet system with models
https://x.com/scaling01/status/2038982287648293016

My takeaways from scanning the Claude Code code for ~45 min this evening: 1️⃣Harness engineering is hard. There’s a lot of hard won knowledge in here and plenty of diagnostics to keep the feedback flowing. 2️⃣Harnesses and prompts smooth out model quirks. @SrihariSriraman and I
https://x.com/dbreunig/status/2039206774558036466

OFFICIAL STATEMENT from Anthropic regarding the leak
https://x.com/theo/status/2039074833334689987

Ole Lehmann on X: “i can’t believe more people aren’t talking about this part of the claude code leak there’s a hidden feature in the source code called KAIROS, and it basically shows you anthropic’s endgame KAIROS is an always-on, *proactive* Claude that does things without you asking it to.
https://x.com/itsolelehmann/status/2039018963611627545?s=20

rahat on X: “Claude Code has a regex that detects “wtf”, “ffs”, “piece of shit”, “fuck you”, “this sucks” etc. It doesn’t change behavior…it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI Do with this information what you will https://t.co/dJTfwxYMCV” / X
https://x.com/Rahatcodes/status/2038995503141065145?s=20
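The behavior described in that tweet, pattern-match frustrated phrases and silently tag an analytics event without changing behavior, is easy to sketch. The exact pattern and analytics call in Claude Code are not known; this regex and the event dict are illustrative stand-ins.

```python
import re

# Illustrative sketch of the frustration flag described above.
# The real pattern and analytics plumbing are unknown; these are stand-ins.
NEGATIVE_RE = re.compile(
    r"\b(wtf|ffs|fuck you|this sucks|piece of shit)\b",
    re.IGNORECASE,
)

def tag_sentiment(message: str) -> dict:
    """Return an analytics event; the assistant's behavior is unchanged."""
    return {"is_negative": bool(NEGATIVE_RE.search(message))}
```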

Sebastian Raschka on X: “Claude Code’s Real Secret Sauce (Probably) Isn’t the Model” / X
https://x.com/rasbt/status/2038980345316413862?s=20

The leaked Claude Code hit 110k+ GitHub stars in a day. Made OpenClaw look slow. #1 open-source project in Anthropic history.
https://x.com/Yuchenj_UW/status/2039415430994100440

What surprises me is that @DarioAmodei – the CEO – has said nothing. Boris seems to be an amazing leader and it’s great to hear these words from him. But…
https://x.com/TheTuringPost/status/2039390822093779258

A few take-aways from the Claude Code Leak: – Anthropic is actively using Capybara (Mythos) for development – they are already at Capybara v8 – Capybara still has issues with over-commenting and false-claims – Capybara has 1M context and fast mode – Numbat is another interesting
https://x.com/scaling01/status/2038948989257630166?s=20

Another Claude 5 update: Anthropic’s upcoming model “Mythos” will have its own tier *above* Opus, called “Capybara.” This means that in addition to Haiku, Sonnet, and Opus, there will also be “Capybara,” which is even more compute-intensive but also delivers significantly better
https://x.com/kimmonismus/status/2037463638261305752

Anthropic’s new model Capybara/Mythos just wants to be human
https://x.com/scaling01/status/2039091546377576864

Claude Mythos Blog Post Saved before it was taken down.
https://x.com/M1Astra/status/2037377109472018444

Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power revealed in data leak | Fortune
https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/

Local Claude Code builds have been achieved internally
https://x.com/theo/status/2039079267905261831

– Drafted a blog post
– Used an LLM to meticulously improve the argument over 4 hours.
– Wow, feeling great, it’s so convincing!
– Fun idea let’s ask it to argue the opposite.
– LLM demolishes the entire argument and convinces me that the opposite is in fact true.
– lol
The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
https://x.com/karpathy/status/2037921699824607591

These functional emotions have real consequences. To build AI systems we can trust, we may need to think carefully about the psychology of the characters they enact, and ensure they remain stable in difficult situations. Read the full paper:
https://x.com/AnthropicAI/status/2039749660349239532

Anthropic on X: “We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human.
https://x.com/AnthropicAI/status/2039749632238944336

Emotion concepts and their function in a large language model \ Anthropic
https://www.anthropic.com/research/emotion-concepts-function

Anthropic Wins Injunction in Court Battle With Trump Administration – WSJ
https://www.wsj.com/politics/policy/anthropic-wins-injunction-in-court-battle-with-trump-administration-4cc93351

We’ve signed an MOU with the Australian Government to collaborate on AI safety research and support Australia’s National AI Plan. Read more:
https://x.com/AnthropicAI/status/2039137425214353555

3/30/26 – The Age Of Artificial Intelligence: Americans’ AI Use Increases While Views On It Sour, Quinnipiac University Poll On AI Finds; 7 In 10 Think AI Will Cut Jobs With Gen Z The Most Pessimistic | Quinnipiac University Poll
https://poll.qu.edu/poll-release?releaseid=3955

I would expect that a lot of things that were old hat to experts, but completely inaccessible to most people, will go viral in the coming months. Sure, anyone could have done those things before, but it required a lot of deep knowledge. Now the AI can make it happen by asking.
https://x.com/emollick/status/2038494612583543254

NEW paper from Google DeepMind The biggest threat to AI agents isn’t a smarter attacker. It’s the web itself. This work introduces the first systematic framework for understanding how the open web can be weaponized against autonomous agents. The paper defines “AI Agent Traps”:
https://x.com/omarsar0/status/2039383554510217707

Today, we’re launching the Secure Intelligence Institute. SII partners with top cryptography, security, and ML teams to advance security research and industry collaboration. It is led by Dr. Ninghui Li at Purdue.
https://x.com/perplexity_ai/status/2039029140758864314

AI models will secretly scheme to protect other AI models from being shut down, researchers find – Yahoo News Canada
https://ca.news.yahoo.com/ai-models-secretly-scheme-protect-162555909.html?guccounter=1

⚠️ Supply chain attack in progress: someone is squatting Anthropic-internal npm package names targeting people trying to compile the leaked Claude Code source. `color-diff-napi` and `modifiers-napi` — both registered today, same person, disposable email. Do NOT install them. 🧵
https://x.com/Butanium_/status/2039079715823128964

I think this is a terrible move by @AnthropicAI. The open source community is going to build custom harness now anyways, you might as well have some control. Obviously they didn’t want this to happen, but now that it has I don’t see what they’re going to accomplish
https://x.com/BlancheMinerva/status/2039128635559318013

is it just me or is Claude down?
https://x.com/iScienceLuvr/status/2037487244634972471

The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.
https://x.com/emollick/status/2039142905156153428

This is an actual violation of the DMCA. Anthropic just broke the law.
https://x.com/theo/status/2039412173689196674

I know these are all unreliable leaks of internal code names but please, please AI labs, the only thing worse than calling your models GPT-5.5-xhigh-Codex-nano is giving them names like Agent Smith or Mythos, for obvious reasons.
https://x.com/emollick/status/2037565418970185786

Access control is one of the top priorities across every enterprise organization to secure AI agents. We’re excited to collaborate with @auth0 on this blog post. We’re building the infrastructure enabling agents to automate document heavy work (invoices, contracts, claims,
https://x.com/jerryjliu0/status/2039841363202818505

Autonomous AI is already in production in 50%+ of orgs, but governance is falling behind, and agent sprawl is becoming the next enterprise risk. Here’s a good webinar that can help mitigate it: “AgentOps 2026: How to Securely Manage AI Agents” →
https://x.com/TheTuringPost/status/2037877632520634654

The first paper from the Secure Intelligence Institute responds to NIST’s request for information on securing autonomous agents. Read the paper on arXiv:
https://x.com/perplexity_ai/status/2039029152880480260

We release a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a new human expert study with 10 professional security practitioners. Offensive cyber capability has been doubling every 9.8 months since 2019. Accelerating to every 5.7
https://x.com/LyptusResearch/status/2039861448927739925

All AI policy is haunted by a failure of imagination. It is either nothing happens or apotheosis, people can’t seem to conceive of any other futures. It is amazing to predict massive AI development and expect nothing major to change in employment or productivity.
https://x.com/emollick/status/2039008808823865676

Sycophantic AI decreases prosocial intentions and promotes dependence | Science
https://www.science.org/doi/10.1126/science.aec8352

The vast majority of social science is based on the assumption that the future looks like the past. It is usually a good bet! Not sure it will apply to the impacts of AI, though.
https://x.com/emollick/status/2039351338065125530

A Mirror Test For LLMs — LessWrong
https://www.lesswrong.com/posts/TfKM9PgztxieEcKiv/a-mirror-test-for-llms

Can we ever trust AI to watch over itself? – by Celia Ford
https://www.transformernews.ai/p/ai-alignment-researchers-want-to-superintelligence

Maybe we shouldn’t have given away the fact that you can crash Opus by asking about California’s High Speed Rail delays in Armenian. Would’ve been useful to have that in our back pocket, just in case.
https://x.com/emollick/status/2039372868409270556

When we artificially dialed up the “desperate” vector, rates of cheating jumped way up. When we dialed up the “calm” vector instead, cheating dropped back down. That means the emotion vector is actually driving the cheating behavior.
https://x.com/AnthropicAI/status/2039749652413550691
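“Dialing up” a vector, as described above, is activation steering: adding a scaled direction to a hidden activation. A toy sketch with random stand-in vectors (not Anthropic’s actual learned representations) shows the mechanics:

```python
import numpy as np

# Toy sketch of activation steering: shift a hidden state along an
# "emotion" direction. Vectors here are random stand-ins, not the
# learned representations from the Anthropic paper.
rng = np.random.default_rng(0)
hidden = rng.normal(size=64)           # a model's hidden activation
desperate_vec = rng.normal(size=64)    # direction for "desperate"
desperate_vec /= np.linalg.norm(desperate_vec)  # make it a unit vector

def steer(h, direction, strength):
    """Shift the activation along the (unit) emotion direction."""
    return h + strength * direction

steered = steer(hidden, desperate_vec, strength=8.0)
```

Because the direction is unit-norm, the activation’s projection onto it grows by exactly `strength`; dialing the scale up or down is what the experiment varies.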

This work from @voooooogel was pretty ground-breaking:
https://x.com/jeremyphoward/status/2039880485036544422

Scoop: Altman told staff he tried to “save” Anthropic in Pentagon clash
https://www.axios.com/2026/03/26/sam-altman-openai-anthropic-pentagon

prediction market to bet on how many ships are let through the strait of hormuz. who’s building this?
https://x.com/bilawalsidhu/status/2038715535664324920

Stop Telling Kids AI Will Steal Their Future
https://x.com/TheTuringPost/status/2038756899110179127

The world of work is changing. Let’s change it for the better. Ryan Roslansky nails it in his new book: “The most important truth about this moment is that the outcome isn’t written yet.”
https://x.com/mustafasuleyman/status/2039035184176414800

The easiest way to make money fast from a superhuman artificial intelligence would be in the financial markets, almost by definition. So the first lab to develop one, if AGI is possible, would almost certainly keep it quiet for as long as they could. Beats charging for API access
https://x.com/emollick/status/2038423081606148538

Today we’re introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people
https://x.com/AIatMeta/status/2037153756346016207

Nathan Lambert’s ATOM Project Seeks American Open Source AI Models – The New Stack
https://thenewstack.io/nathan-lamberts-atom-project-seeks-american-open-source-ai-models/

Your robot doesn’t need a policy anymore. It can just write its own. Coding Agents for Robotics: Instead of training fixed models, robots become agents that: • call perception and control APIs • write code to solve tasks • execute, observe, and improve in loops This is a
https://x.com/IlirAliu_/status/2039409590748532938

U.S. Senators Tom Cotton [R] and Chuck Schumer [D] plan to introduce the American Security Robotics Act to ban the federal government from buying or operating Chinese-made humanoid robots. The bill prohibits federal funding for these systems and requires phasing out existing
https://x.com/TheHumanoidHub/status/2038088330378879443

Crazy details of the social engineering attack that went into the axios compromise. We clearly need good ways to defend against these sophisticated/targeted attacks: – better credential management – better identity verification – better malware detection
https://x.com/gneubig/status/2040072807552327998

Mercor says it was hit by cyberattack tied to compromise of open source LiteLLM project | TechCrunch

New supply chain attack this time for npm axios, the most popular HTTP client library with 300M weekly downloads. Scanning my system I found a use imported from googleworkspace/cli from a few days ago when I was experimenting with gmail/gcal cli. The installed version (luckily)
https://x.com/karpathy/status/2038849654423798197

Rundown of the very bad week in security: – TeamPCP (sophisticated hacking group) attacks: Hackers broke into the system that builds a popular open-source security scanning tool called Trivy. This was a supply chain attack (when bad code is slipped into widely used software tools or
https://x.com/saranormous/status/2039172685666918672

The idea that “AI safety” could be based on secrecy and control has been fatally falsified.
https://x.com/pmarca/status/2039042126294733295

Brutal week for security teams. These aren’t failures of negligence, but what happens when systems/processes work as designed and still can’t be explained end to end. This is an industry-wide, structural problem. We’re entering an era of software abundance, and shipping before
https://x.com/saranormous/status/2039108234460721341

This is a very good post:
https://x.com/sama/status/2038640963036626971

Discover more from Ethan B. Holland