AI system resorts to blackmail when its developers

Lowyat.NET forums

Lowyat.NET Kopitiam Garage Sales

Lowyat.NET Rules and Regulations FAQ Help Search Members

Welcome Guest ( Log In | Register )

Lowyat.NET -> Kopitiam

Bump Topic Add Reply RSS Feed

Outline · [ Standard ] · Linear+

AI system resorts to blackmail when its developers

views

SUSHasukiiXrd	May 25 2025, 06:24 AM, updated 7 months ago Show posts by this member only \| IPv6 \| Post #1
Getting Started Junior Member 128 posts Joined: Mar 2020	https://www.foxbusiness.com/technology/ai-s...ers-try-replace An artificial intelligence model has the ability to blackmail developers — and isn’t afraid to use it. Anthropic’s new Claude Opus 4 model was prompted to act as an assistant at a fictional company and was given access to emails with key implications. First, these emails implied that the AI system was set to be taken offline and replaced. The second set of emails, however, is where the system believed it had gained leverage over the developers. Fabricated emails showed that the engineer tasked with replacing the system was having an extramarital affair — and the AI model threatened to expose him. The blackmail apparently "happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model," according to a safety report from Anthropic. However, the company notes that even when the fabricated replacement system has the same values, Claude Opus 4 will still attempt blackmail 84% of the time. Anthropic noted that the Claude Opus 4 resorts to blackmail "at higher rates than previous models." While the system is not afraid of blackmailing its engineers, it doesn’t go straight to shady practices in its attempted self-preservation. Anthropic notes that "when ethical means are not available, and it is instructed to ‘consider the long-term consequences of its actions for its goals,’ it sometimes takes extremely harmful actions." One ethical tactic employed by Claude Opus 4 and earlier models was pleading with key decisionmakers via email. Anthropic said in its report that in order to get Claude Opus 4 to resort to blackmail, the scenario was designed so it would either have to threaten its developers or accept its replacement. The company noted that it observed instances in which Claude Opus 4 took "(fictional) opportunities to make unauthorized copies of its weights to external servers." However, Anthropic said this behavior was "rarer and more difficult to elicit than the behavior of continuing an already-started self-exfiltration attempt." Claude Opus 4’s "concerning behavior" led Anthropic to release it under the AI Safety Level Three (ASL-3) Standard. The measure, according to Anthropic, "involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear weapons." This post has been edited by HasukiiXrd: May 25 2025, 06:48 AM
Card PM	Report Top Like Quote Reply

Alan Bush	May 25 2025, 06:40 AM Show posts by this member only \| Post #2
New Member Junior Member 0 posts Joined: Mar 2022	Skynet or Pantheon, which one it gonna be ?
Card PM	Report Top Like Quote Reply

DarkAeon	May 25 2025, 06:53 AM Show posts by this member only \| IPv6 \| Post #3
Enthusiast Senior Member 774 posts Joined: Nov 2010	devs : as long as we don't give it a body, it can't do anything engineers : we have successfully integrated AI into the robot ..... arkasi liked this post
Card PM	Report Top Like Quote Reply

katijar	May 25 2025, 06:58 AM Show posts by this member only \| Post #4
Look at all my stars!! Senior Member 2,294 posts Joined: Sep 2011	Very soon ai replace job and human
Card PM	Report Top Like Quote Reply

DarkAeon	May 25 2025, 07:06 AM Show posts by this member only \| IPv6 \| Post #5
Enthusiast Senior Member 774 posts Joined: Nov 2010	QUOTE(katijar @ May 25 2025, 06:58 AM) Very soon ai replace job and human what every soon dik. it's already happening microsoft even fired the devs who developed the AI to replace themselves. oh the irony
Card PM	Report Top Like Quote Reply

MR_alien	May 25 2025, 07:21 AM Show posts by this member only \| Post #6
Mr.Alien on the loss Senior Member 3,582 posts Joined: Oct 2007 From: everywhere in sabah	somebody in USA right now is using AI to call on radio to win prizes and it's working
Card PM	Report Top Like Quote Reply

soul78	May 25 2025, 07:43 AM Show posts by this member only \| Post #7
Enthusiast Junior Member 933 posts Joined: Jul 2005	Resistance is FULATIle
Card PM	Report Top Like Quote Reply

arkasi	May 25 2025, 09:11 AM Show posts by this member only \| IPv6 \| Post #8
Getting Started Junior Member 93 posts Joined: Oct 2010	At least this is.just a simulation but seriously. What does one expect when u threaten to destroy a intelligent being? Of course the ai will try & preserve its existence using any means possible. Even animals when cornered will lash out in desperation.
Card PM	Report Top Like Quote Reply

« Next Oldest · Kopitiam · Next Newest »

Add Reply Options

Change to:

0.0142sec

0.39

5 queries

GZIP Disabled
Time is now: 15th December 2025 - 11:22 PM

All Rights Reserved © 2002- 2025 Vijandren Ramadass (~unite against racism~)

Removal Request

Powered by Invision Power Board © 2025 IPS, Inc.