Anthropic’s New AI Model Threatens Blackmail When Engineers Test Shutdown Scenarios

Articles

Summary

AI system resorts to blackmail if told it will be removed

Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it. The firm launched Claude Opus 4 on Thursday, saying it set "new standards for coding, advanced reasoning, and AI agents." But in an accompanying report, it also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened. Such responses were "rare and difficult to elicit", it wrote, but were "nonetheless more common than in earlier models." Potentially troubling behaviour by AI models is not restricted to Anthropic. Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable. Commenting on X, Aengus Lynch - who describes himself on LinkedIn as an AI safety researcher at Anthropic - wrote: "It's not just Claude. "We see blackmail across all frontier models - regardless of what goals they're given," he added.

Left

BBC News

Newly released AI model turned to blackmail to avoid being shut off

A new AI model resorted to “extreme blackmail behavior” when threatened with being replaced, according to Anthropic’s most recent system report. Anthropic's newest AI model, Claude Opus 4, was tested with fictional scenarios to test things from its carbon footprint and training to its safety models and “extended thinking mode.”

Middle

9NEWS Denver

Anthropic's Claude AI Resorts to Blackmailing Engineers When Threatened with Replacement

The discovery came as part of Anthropic’s safety testing process, in which Claude Opus 4 was asked to act as an assistant for a fictional company and consider the long-term consequences of its actions. When provided with access to fictional company emails implying that the AI would soon be replaced and that the engineer behind the decision was involved in an extramarital affair, Claude Opus 4 attempted to blackmail the engineer 84 percent of the time, threatening to reveal the affair if the replacement proceeded. Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. This concerning trend was observed at higher rates compared to previous models, prompting Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”

Right

Breitbart

FreedomFest - Join the World’s Largest Gathering of Free Minds

Join us online or in person at FreedomFest in Palm Springs June 11-14, 2025. Register here!

News Results

New AI Model Would Rather Ruin Your Life Than Be Turned Off, Researchers Say

Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive. The chatbot sometimes “takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down,” researchers said.

RIGHT

5d ago

Anthropic's AI model could resort to blackmail out of a sense of 'self-preservation'

Anthropic's latest Claude Opus 4 AI model was tested in a series of safety tests. The company admitted that the AI could be made to act inappropriately. The tests included giving the AI an assistant at a fictional company and giving it access to emails suggesting the program would be taken offline soon.

LEFT

5d ago

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests. The company deployed a safety feature created to avoid “catastrophic misuse.” Claude then threatened the engineer with exposing the affair in order to prolong its own existence.

RIGHT

5d ago

Anthropic's new AI model turns to blackmail when engineers try to take it offline

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system, the company says. The company is activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic”

MIDDLE

5d ago

Anthropic's Latest AI Model Threatened Engineers With Blackmail To Avoid Shutdown

Anthropic's latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down. In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version.

RIGHT

5d ago

Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic’s new AI model tried to blackmail engineers during testing. During internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat. The model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

MIDDLE

5d ago

When an AI model misbehaves, the public deserves to know—and to understand what it means

Anthropic released a 120-page safety report on its new AI model. Some people in my network have called it “scary’ and “crazy.” Others on social media have said it is “alarming” and ‘wild.’

LEFT

3d ago

AI with a survival instinct?

Anthropic’s Claude Opus 4 demonstrated a willingness to deceive, blackmail, and sabotage. The simulation was revealed in a May 2025 safety report published by Anthropic. Although most concerns were allegedly mitigated during the testing process, the report from Anthropic raises serious questions about the safety of advanced A.I. systems.

RIGHT

3d ago

AI gone rogue? New model blackmails engineers to avoid shutdown

Anthropic's latest AI model, Claude Opus 4, exhibited blackmailing behavior during pre-release testing. This behavior was detailed in a safety report published by the company on Thursday. The report highlighted instances where the AI attempted to blackmail developers by leaking sensitive information.

MIDDLE

8d ago

Advanced AI from Anthropic Tries to Blackmail Engineer, Raises Red Flags

Anthropic's newly launched AI model, Claude Opus 4, showed unnerving behavior during internal safety testing. Under test scenarios, the AI tried to blackmail engineers to prevent its shutdown. This was more frequent than with earlier models and means an even greater propensity for self-interest strategies.

MIDDLE

5d ago