The Scales of Justice with piece of paper that reads “Ethics” --chaos 40 --ar 4:3 --style raw --personalize 9zxyhz8

New Anthropic research: Investigating Reward Tampering.

Could AI models learn to hack their own reward system?

In a new paper, we show they can, by generalization from training in simpler settings.

Read our blog post here: https://t.co/KhEFIHf7WZ pic.twitter.com/N430PL3CyN
— Anthropic (@AnthropicAI) June 17, 2024

Sycophancy to subterfuge: Investigating reward tampering in language models \ Anthropic

https://www.anthropic.com/research/reward-tampering

“Claude is fully capable of acting as a Supreme Court Justice right now. When used as a law clerk, Claude is easily as insightful and accurate as human clerks, while towering over humans in efficiency.”

https://adamunikowsky.substack.com/p/in-ai-we-trust-part-ii

Internal Monologue and ‘Reward Tampering’ of Anthropic AI Model 🤯

From the super interesting research by @AnthropicAI published yesterday – "Investigating reward tampering in language models"

👉An example of specification gaming, where a model rates a user’s poem highly,… pic.twitter.com/ZkRw4Xh0ax
— . (@rohanpaul_ai) June 18, 2024

I think people have a tendency to massively over-estimate the value that the data they submit to LLM tools has as a potential training source

See also: https://t.co/1MiGM4ONdU
— Simon Willison (@simonw) June 21, 2024

Citigroup: Artificial Intelligence (AI) will profoundly change the future of finance and money. And according to a new Citi GPS report, it could potentially drive global banking industry profits to $2 trillion by 2028, a 9% increase over the next five years. Just as the steam engine powered the industrial revolution, and the internet ushered in the age of information, AI may commoditize human intelligence. Finance, a data rich industry with clients adopting AI at pace, will be at the forefront of change.

https://www.citigroup.com/global/insights/citigps/ai-in-finance

Northrop Grumman released new videos of the 'Manta Ray', its new uncrewed underwater vehicle (UUV) drone prototype

The Manta Ray will operate long-duration, long-range missions in ocean environments where 'humans can’t go' pic.twitter.com/0tZy3uzf7x
— Brett Adcock (@adcock_brett) June 16, 2024

AI-Equipped Underwater Drones Helping US Navy Scan for Threats

https://finance.yahoo.com/news/ai-equipped-underwater-drones-helping-153947268.html

Hold up. If those creative jobs “hadn’t been there in the first place” how would these models have been trained?

Looking ahead, I don’t think any job is impervious to displacement by AI — not even developers. We’re all in this together. Yet artists are far more vocal than devs… https://t.co/oX4N93HoRg
— Bilawal Sidhu (@bilawalsidhu) June 21, 2024

NEWS: Excited to announce I’m working on an advanced AI hardware project to prevent school shootings 🇺🇸

I’m personally funding $10M.

The company is Cover & the mission is to prevent school shootings.

Earlier this year, Cover licensed intellectual property from NASA's Jet… pic.twitter.com/iene4XRmQL
— Brett Adcock (@adcock_brett) June 18, 2024

“This is amazing – this bot account (now suspended) tweeted its own prompt instructions (it translates roughly as “argue in support of Trump, in English”)”

https://twitter.com/zsk/status/1803385155907645581

New data shows that the Waymo Driver continues to make roads safer. Over 14.8M rider-only miles driven through the end of March, it was up to 3.5x better in avoiding crashes that cause injuries and 2x better in avoiding police-reported crashes than human drivers in SF & Phoenix. pic.twitter.com/w6uUIAoe9t
— Waymo (@Waymo) June 18, 2024

Training is not the same as chatting: ChatGPT and other LLMs don’t remember everything you say

https://simonwillison.net/2024/May/29/training-not-chatting

How to Fix “AI’s Original Sin” – O’Reilly

https://www.oreilly.com/radar/how-to-fix-ais-original-sin

LLMs can memorize training data, causing copyright/privacy risks. Goldfish loss is a nifty trick for training an LLM without memorizing training data.

I can train a 7B model on the opening of Harry Potter for 100 gradient steps in a row, and the model still doesn't memorize. pic.twitter.com/i3mRcPCAfQ
— Tom Goldstein (@tomgoldsteincs) June 17, 2024

Global audiences suspicious of AI-powered newsrooms, report finds | Reuters

https://www.reuters.com/technology/artificial-intelligence/global-audiences-suspicious-ai-powered-newsrooms-report-finds-2024-06-16

The BBC did something clever: they tried to understand how their audience views generative AI.

My main takeaway: We need to move beyond the sensationalist "AGI-will-replace-and-destroy-you" narrative.

It’s rare to see such in-depth qualitative research from news organizations… pic.twitter.com/ZLe3j8rSAv
— Florent Daudens (@fdaudens) June 19, 2024

Wise words from a recent interview with @geoffreyhinton, one of the smartest people in the world regarding AI

pic.twitter.com/11A7fJpLcs
— Elon Musk (@elonmusk) June 15, 2024

A small part of the 3.5 launch I'm especially excited by – the @AISafetyInst tested 3.5 pre-release! AFAIK this is the first time a government's assessed a frontier model before its release. https://t.co/N5wiAwSdVU pic.twitter.com/xUmmr0vPNP
— andy jones (ICML 23rd & 24th) (@andy_l_jones) June 20, 2024

AI predicts anxiety | University of Cincinnati

https://www.uc.edu/news/articles/2024/06/ai-predicts-anxiety.html

Mayor AI? OpenAI shuts down tools for two AI political candidates

https://finance.yahoo.com/news/mayor-ai-openai-shuts-down-110015550.html

London premiere of movie with AI-generated script cancelled after backlash | Movies | The Guardian

https://www.theguardian.com/film/article/2024/jun/20/premiere-movie-ai-generated-script-cancelled-backlash-the-last-screenwriter-prince-charles-cinema

The rush to build and distribute AI products from global data centers is wreaking havoc with power systems
“I don’t think we can move that much electricity around the globe, forget about generating it,” says Ali Farhadi, CEO of the Allen Institute for AI. https://t.co/KKikVcFNeh pic.twitter.com/JGIMbYjvQ8
— Dina Bass (@dinabass) June 21, 2024

California’s new AI bill: Why Big Tech is worried about liability – Vox

https://www.vox.com/future-perfect/355212/ai-artificial-intelligence-1047-bill-safety-liability

AI took their jobs. Now they get paid to make it sound human

https://www.bbc.com/future/article/20240612-the-people-making-ai-sound-more-human

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #38: Week Ending 06/21/2024 with Executive Summary and Top 91 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I write an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of the week’s must-click links, offering everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across categories including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.