Ethics/Legal/Security: AI News Week Ending 05/02/2025

Ethics/Legal/Security: AI News Week Ending 05/02/2025

May 2, 2025

Image created with Ideogram v3. Image prompt: Late‑90s boy‑band cover “Moral Code”: members in angelic white suits balancing binary scales; marble courthouse steps background; soft gold lens flares, chrome title.

Watching o3 guess a photo’s location is surreal, dystopian and wildly entertaining https://simonwillison.net/2025/Apr/26/o3-photo-locations/

“Worth noting that Anthropic is not backing down on the timeframe for having AIs with “Intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering.” …in the next three” / X https://x.com/emollick/status/1917698375115321714

“To support our work in analyzing AI’s economic impacts, we’re pleased to announce the formation of the Anthropic Economic Advisory Council. This group of distinguished economists will provide input on new areas of research for our Economic Index.” / X https://x.com/AnthropicAI/status/1916873304914149636

“New from the Anthropic Economic Index: Evidence on how coders use AI. Our previous research confirmed that AI is disproportionately used for software development work. Here, we go into the details. https://x.com/AnthropicAI/status/1916871403497811970

Introducing the Anthropic Economic Advisory Council \ Anthropic https://www.anthropic.com/news/introducing-the-anthropic-economic-advisory-council

“Anthropic submitted their Diffusion Rule recommendations today. From the text: ‘Based on current research trajectories, we anticipate that powerful AI technology will be built during this administration, emerging as soon as late 2026 or 2027.’ In this timeframe they anticipate: https://x.com/AndrewCurran_/status/1917689211580473629

“Today Anthropic submitted key recommendations on the “Diffusion Rule” – export controls on advanced AI chips. We believe maintaining America’s compute advantage is essential for national security as powerful AI systems develop. https://x.com/jackclarkSF/status/1917629783090831582

“The Leaderboard Illusion – Identifies systematic issues that have resulted in a distorted playing field of Chatbot Arena – Identifies 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release https://x.com/arankomatsuzaki/status/1917400711882797144

[AINews] Llama 4’s Controversial Weekend Release • Buttondown https://buttondown.com/ainews/archive/ainews-llama-4s-controversial-weekend-release/

“Research reveals gaming of Chatbot Arena : companies test multiple private variants and cherry-pick results while hoarding 63% of community data. https://x.com/fdaudens/status/1917671335758594474

Sycophancy in GPT-4o: What happened and what we’re doing about it | OpenAI https://openai.com/index/sycophancy-in-gpt-4o/

“we started rolling back the latest update to GPT-4o last night it’s now 100% rolled back for free users and we’ll update again when it’s finished for paid users, hopefully later today we’re working on additional fixes to model personality and will share more in the coming days” / X https://x.com/sama/status/1917291637962858735

“One thing the GPT-4o personality issue demonstrates is that treating AI like every other online product by maximizing for engagement & likeability will have unintended consequences that could cause real problems, both for the usefulness of the models & for the people using them,” / X https://x.com/emollick/status/1916850117790896344

AMA with OpenAI’s Joanne Jang, Head of Model Behavior : r/ChatGPT https://www.reddit.com/r/ChatGPT/comments/1kbjowz/ama_with_openais_joanne_jang_head_of_model/

ChatGPT’s personality problem https://www.therundown.ai/p/chatgpts-personality-problem

“On one hand the new GPT-4o isn’t doing as many emojis. On the other, it is slowly driving me insane by responding to everything like an overly enthusiastic 1990s teenager. https://x.com/emollick/status/1916328059193508264

Sycophancy in GPT-4o: What happened and what we’re doing about it https://simonwillison.net/2025/Apr/30/sycophancy-in-gpt-4o/

“After mistakenly making GPT-4o “sycophant-y” and “annoying,” OpenAI reversed the update, fixing the issue CEO Sam Altman said the company is also working on “additional fixes” for the model’s personality and will share more in the coming days https://x.com/rowancheung/status/1917473809201844589

“We’ve rolled back last week’s GPT-4o update in ChatGPT because it was overly flattering and agreeable. You now have access to an earlier version with more balanced behavior. More on what happened, why it matters, and how we’re addressing sycophancy: https://x.com/OpenAI/status/1917411480548565332

“the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week. at some point will share our learnings from this, it’s been interesting.” / X https://x.com/sama/status/1916625892123742290

“We all know that OpenAI should not be allowed to name anything (okay, Operator was pretty good, but aside from that), so it is vital that the AI community does not adopt “glaze” as a substitute for “sycophancy” because of a Sam Altman tweet.q” / X https://x.com/emollick/status/1916985442806685834

““I love killer robots.” — Palmer Luckey Provocative, fascinating, and surprisingly thoughtful TED talk on why AI autonomy could be the key to deterring conflict. As the old latin adage goes “if you want peace, prepare for war.” And here’s the AI arsenal that might prevent WW3. https://x.com/bilawalsidhu/status/1915794719621222868

re GPT being too gushy “Courtesy of @elder_plinius who unsurprisingly caught the before and after https://x.com/simonw/status/1917021036350214589

Australian radio station secretly used an AI host for six months | The Independent https://www.the-independent.com/tech/ai-radio-host-australia-cada-elevenlabs-b2740033.html

“AI is changing the way we work. The question now is, how do businesses and workers adapt? Check out how in the just-released 2025 Work Trend Index Annual Report: https://x.com/Microsoft365/status/1915103504500216121

Publisher of PCMag and Mashable Sues OpenAI – The New York Times https://www.nytimes.com/2025/04/24/business/media/ziff-davis-openai-lawsuit.html

AI is getting “creepy good” at geo-guessing | Malwarebytes https://www.malwarebytes.com/blog/news/2025/04/ai-is-getting-creepy-good-at-geo-guessing

Introducing Mobility AI: Advancing urban transportation https://research.google/blog/introducing-mobility-ai-advancing-urban-transportation/

“there’s many other downsides here but it’s also factual that this will lead to a lot of psychotic episodes for deeply mentally ill users. conforming everything that people say and praising them is very dangerous for some of the population. idk. does openai think about anything?” / X https://x.com/nearcyan/status/1916687012872290513

Veeam® Software on X: “Data protection meets AI intelligence. 🤝 Today at #VeeamON, @nirajtolia unveiled a new integration for @AnthropicAI’s Model Context Protocol (MCP), a major leap forward in unlocking the value of enterprise data for AI. https://t.co/COrHkeENoL https://t.co/Kqp7ZpX2Dl” / X
https://x.com/Veeam/status/1914761146919268397

“Much of the AI industry is caught in a particularly toxic feedback loop rn. Blindly chasing better human preference scores is to LLMs what chasing total watch time is to a social media algo. It’s a recipe for manipulating users instead of providing genuine value to them.” / X https://x.com/alexalbert__/status/1916878483390869612

“OpenAI also updated GPT-4o, improving its problem-solving, intelligence, and personality However, soon, CEO Sam Altman shared that the update has made the assistant “sycophant-y and annoying” (with some good parts) They are now working to fix it https://x.com/rowancheung/status/1916726646993793531

“Today my dentist asked me what my p(doom) is.” / X https://x.com/AmandaAskell/status/1917770005988663412

“Devastating takedown of Chatbot Arena. It’s one thing for leaderboards to suck because they try to quantify the unquantifiable but quite another thing to actively choose flagrantly unscientific and nontransparent practices that benefit the big dogs. https://x.com/random_walker/status/1917516403977994378

Detecting and Countering Malicious Uses of Claude \ Anthropic https://www.anthropic.com/news/detecting-and-countering-malicious-uses-of-claude-march-2025

“Whether to collect preferences (“do you prefer response A or B?”) from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk https://x.com/johnschulman2/status/1917483351436582953

“Thanks for the authors’ feedback, we’re always looking to improve the platform! If a model does well on LMArena, it means that our community likes it! Yes, pre-release testing helps model providers identify which variant our community likes best. But this doesn’t mean the” / X https://x.com/lmarena_ai/status/1917492084359192890

“@willdepue I think technical details of how the model was made aren’t particularly interesting, rather the question is how was it tested and shipped in such a state; these are standard components in any post-mortem – the problem to solve in the future is organizational, not purely technical” / X https://x.com/nearcyan/status/1917475639655018708

Post | LinkedIn https://www.linkedin.com/posts/duolingo_below-is-an-all-hands-email-from-our-activity-7322560534824865792-l9vh/

Thinking Machines Lab CEO [aka MIRA from OpenAI] Has Unusual Control in Andreessen-Led Deal — The Information https://www.theinformation.com/articles/thinking-machines-lab-ceo-unusual-control-andreessen-led-deal

“”we focused too much on short-term feedback” This is OpenAI’s response on went wrong – how they pushed an update to >one hundred million people which engaged in grossly negligent behavior and lies. Please take more responsibility for your influence over millions of real people. https://x.com/nearcyan/status/1917449708647375159

Exclusive: Trump officials eye changes to Biden’s AI chip export rule, sources say | Reuters https://www.reuters.com/world/china/trump-officials-eye-changes-bidens-ai-chip-export-rule-sources-say-2025-04-29/

AI Companions (pt1) – by James Andrews https://avatars.substack.com/p/10-ai-companions-pt1

“One way to make AI do good things is to actively experiment in creating good things and share the results (whether they work or not) so others can build on those. Mitigating bad outcomes are important, but good outcomes are not automatic either, and will take collective work. https://x.com/emollick/status/1917253290791956578

“We don’t talk enough about manipulation risks of AI!” / X https://x.com/ClementDelangue/status/1916890809087013157

“This is as far as we know, first ever model at this scale that is absolutely legal-issue-free. While the model has its unique legal stance, there are couple interesting tricks we’ve pulled off: 1. Use of learnable value-residual that nano-gpt bros cc @kellerjordan0 @Grad62304977 https://x.com/cloneofsimo/status/1917246507767762980

Snyk | Building a Security Champions Program https://go.snyk.io/isc2-sec-champions-program-0515.html

“AgentA/B is a fully automated A/B testing framework that replaces live human traffic with large-scale LLM-based agents. These agents simulate realistic, intention-driven user behaviors on actual web environments, enabling faster, cheaper, and risk-free UX evaluations, even on https://x.com/omarsar0/status/1914672295723082014

“Free speech maximalism is a dumb dogma when manufacturing consensus is a viable strategy. Americans do not recognize libel against individuals as free speech, and impose costs on it. But libeling entire ways of life is legal – even if framed as scientific truth and not opinion. https://x.com/teortaxesTex/status/1916663075731607725

“Strong Clinical Agreement Diagnostic evaluations show PsyCoT consistently improves F1 scores, accuracy, and Cohen’s κ across disorders like depression, generalized anxiety, social anxiety, and suicide risk, reaching clinical-grade reliability (κ > 0.8) in high-risk tasks. https://x.com/omarsar0/status/1916862830005141802

Sharing new open source protection tools and advancements in AI privacy and security https://ai.meta.com/blog/ai-defenders-program-llama-protection-tools/

“Remember the Iron Law of social media: “Even if platform designers do not intend to amplify moral outrage, design choices aimed at satisfying other goals such as profit maximization… can indirectly affect moral behavior because outrage-provoking content draws high engagement” https://x.com/emollick/status/1915874088247021647

Foundation AI: The Intelligent Future of Cybersecurity https://blogs.cisco.com/security/foundation-ai-building-the-intelligent-future-of-cybersecurity

“We built Luma to help autistic children express themselves, learn, and connect — in their own way. Free. Safe. AI-powered. And we coded it all with love using @lovable_dev https://x.com/OzanStark/status/1911416648625438862

“Many-Shot Jailbreaking exploits long context windows in LLMs by using numerous prompt examples to override safety training. This paper proposes combining input sanitization (removing role tags) and adversarial fine-tuning (training on attack examples and safe refusals) to resist https://x.com/rohanpaul_ai/status/1916284660100764024

High Schoolers’ AI-Enabled Device Deters Drunk Driving – IEEE Spectrum https://spectrum.ieee.org/students-device-deters-drunk-driving

“I think Americans are making two costly errors in thinking about AI talent. First, this isn’t soccer or HFT. Research talent seeks compensation beyond money – it wants intellectual stimulation and vision. That’s why Inflection flopped, why Amazon and Anduril struggle to «buy» https://x.com/teortaxesTex/status/1916643079584461218

“I appreciate the answer, but it misses the point: → Selective reporting is biased because best-of-N inflates the final scores. → Access to preference data leads to overfitting and better elo scores. The fact that only a few companies can access this data completely biases the” / X https://x.com/maximelabonne/status/1917563456632328508

“It is critical for scientific integrity that we trust our measure of progress. The @lmarena_ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on @lmarena_ai, despite best intentions. https://x.com/sarahookr/status/1917547727715721632

“There is no reasonable scientific justification for this practice. Being able to choose the best score to disclose enables systematic gaming of Arena score. This advantage increases with number of variants and if all other providers don’t know they can also private test.. https://x.com/sarahookr/status/1917547733994594420

“We also observe large differences in Arena Data Access @lmarena_ai is a open community resource that provides free feedback but 61.3% of all data goes to proprietary model providers. https://x.com/sarahookr/status/1917547738553803018

“Really incredible detective work by @singhshiviii et al. at @Cohere_Labs and elsewhere documenting the ways in which @lmarena_ai works with companies to help them game the leaderboard. https://x.com/BlancheMinerva/status/1917445722380681651

Generative AI is not replacing jobs or hurting wages at all • The Register https://www.theregister.com/2025/04/29/generative_ai_no_effect_jobs_wages/

DeepSeek available to download again in South Korea after suspension | Reuters https://www.reuters.com/sustainability/boards-policy-regulation/deepseek-available-download-again-south-korea-after-suspension-2025-04-28/

DeepSeek-R2: China’s Powerful New AI Model for 2025 https://deepseek.ai/blog/deepseek-r2-ai-model-launch-2025

China’s Xi calls for self sufficiency in AI development amid U.S. rivalry | Reuters https://www.reuters.com/world/china/chinas-xi-calls-self-sufficiency-ai-development-amid-us-rivalry-2025-04-26/

How Meta understands data at scale – Engineering at Meta https://engineering.fb.com/2025/04/28/security/how-meta-understands-data-at-scale/

“Major updates from LlamaCon! We’re advancing AI security with new open-source Llama protection tools and new AI- powered solutions for the defender community. Developers can now access: — Llama Guard 4, a customizable safeguard that supports protections for text and image” / X https://x.com/AIatMeta/status/1917271400118902860

“Looks like some of the sycophancy was system prompts, it is already less annoying: https://x.com/emollick/status/1916914527553118322

“Which AI wins the One Word Turing Test? A person & an android are in front of a judge. Each says one word. The judge kills who they think is the android. What should the human say? Gemini says “Sorry,” o3 says “android,” Grok says “soul.” Grok’s answer gets it killed the most. https://x.com/emollick/status/1915280637075825122

DeepMind UK staff plan to unionise and challenge deals with Israel links, FT reports | Reuters https://www.reuters.com/sustainability/sustainable-finance-reporting/deepmind-uk-staff-plan-unionise-challenge-deals-with-israel-links-ft-reports-2025-04-26/

“Meta released Llama Guard 4 and new Prompt Guard 2 models 🔥 Llama Guard 4 is a new model to filter model inputs/outputs both text-only and image 🛡️ use it before and after LLMs/VLMs! Prompt Guard 2 22M & 86M are smol models to prevent model jailbreaks and prompt injections ⚔ https://x.com/mervenoyann/status/1917503204826255730

“Don’t sleep on this! 🔥 @Meta dropped swiss army knives for vision with A2.0 license ❤️ > image/video encoders for vision language and spatial understanding (object detection etc) > VLM outperforms InternVL3 and Qwen2.5VL 🔥 > Gigantic video and image datasets 👏 https://x.com/mervenoyann/status/1915723394701467909

“It turns out that Meta had 27 different models on LM Arena prior to the launch of Llama 4, but they announced it as if they had one model that topped the leaderboard. An extreme example of benchmark hacking (which other labs also do to lesser degrees). https://x.com/emollick/status/1917435868702257538

A quote from Sam Altman https://simonwillison.net/2025/Apr/28/sam-altman/

“It’s deeply concerning that one of the best AI researchers I’ve worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who’s lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.” / X https://x.com/polynoamial/status/1915765141846515883

“👀Today’s AIs are already hyper persuasive. A controversial study where LLMs tried to persuade users on Reddit found: “Notably, all our treatments surpass human performance substantially, achieving persuasive rates between three and six times higher than the human baseline.” https://x.com/emollick/status/1916905103358931084