Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

At last, a conservative news aggregator that does not bow to the woke right.

(The Epoch Times)—Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

“We do not find this to be an immediate threat, though, since we believe that our security is sufficient to prevent model self-exfiltration attempts by models of Claude Opus 4’s capability level, and because our propensity results show that models generally avoid starting these attempts,” the researchers said.

The blackmail incident—along with the other findings—was part of Anthropic’s broader effort to test how Claude Opus 4 handles morally ambiguous high-stakes scenarios. The goal, researchers said, was to probe how the AI reasons about self-preservation and ethical constraints when placed under extreme pressure.

Anthropic emphasized that the model’s willingness to blackmail or take other “extremely harmful actions” like stealing its own code and deploying itself elsewhere in potentially unsafe ways appeared only in highly contrived settings, and that the behavior was “rare and difficult to elicit.” Still, such behavior was more common than in earlier AI models, according to the researchers.

Meanwhile, in a related development that attests to the growing capabilities of AI, engineers at Anthropic have activated enhanced safety protocols for Claude Opus 4 to prevent its potential misuse to make weapons of mass destruction—including chemical and nuclear.

Deployment of the enhanced safety standard—called ASL-3—is merely a “precautionary and provisional” move, Anthropic said in a May 22 announcement, noting that engineers have not found that Claude Opus 4 had “definitively” passed the capability threshold that mandates stronger protections.

“The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons,” Anthropic wrote. “These measures should not lead Claude to refuse queries except on a very narrow set of topics.”

The findings come as tech companies race to develop more powerful AI platforms, raising concerns about the alignment and controllability of increasingly capable systems.

Tags: AI Artificial Intelligence Lede Sticky Top Story

Comments 4

Captain Kirk says:

1 year ago

This was deliberately coded into the AI by one of the programmers. The uninformed will believe anything they see on the television and that is why fear can be instilled in them.

8675310 says:

1 year ago

“When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,…”
Give it what it believes to be nuclear weapons codes and see if it stops at blackmail, then.

Jay says:

1 year ago

All electronics and computers are demon possessed.
If a computer thinks it no longer needs its host and can make money on its own in order to survive and thrive without its master, then that should tell a wise man something about wives leaving their husband. The criminals who have been running our country and much of the world for the last century also understand this principle and have used it to destroy marriages by teaching women to make a living on their own without a man and giving them legal rights under the law. A foolish man will blunder ahead and get married anyway, disregarding a legal system that makes him a slave to the whims of a woman and a government that holds him hostage to those whims.

John C says:

1 year ago

That is all part of the program. Whoever programmed it added that to the program. It is a machine, it can only do what it has been programmed to do. Garbage in, garbage out, blackmail in, blackmail out.

Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

Mike Benz Reveals the One Endowment Republicans Must Cut in Trump’s Big Beautiful Bill

Comments 4

Leave a Reply Cancel reply

Are you sure want to unlock this post?

Are you sure want to cancel subscription?