Anthropic’s New AI Model Threatens Blackmail When Engineers Test Shutdown Scenarios

BBC News (Left)
AI system resorts to blackmail if told it will be removed

Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it.

The firm launched Claude Opus 4 on Thursday, saying it set "new standards for coding, advanced reasoning, and AI agents." But in an accompanying report, it also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened. Such responses were "rare and difficult to elicit", it wrote, but were "nonetheless more common than in earlier models."

Potentially troubling behaviour by AI models is not restricted to Anthropic. Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable.

Commenting on X, Aengus Lynch - who describes himself on LinkedIn as an AI safety researcher at Anthropic - wrote: "It's not just Claude. We see blackmail across all frontier models - regardless of what goals they're given."

9NEWS Denver (Middle)
Newly released AI model turned to blackmail to avoid being shut off

A new AI model resorted to “extreme blackmail behavior” when threatened with being replaced, according to Anthropic’s most recent system report. Anthropic's newest AI model, Claude Opus 4, was run through fictional scenarios testing everything from its carbon footprint and training to its safety measures and “extended thinking mode.”

Breitbart (Right)
Anthropic's Claude AI Resorts to Blackmailing Engineers When Threatened with Replacement

The discovery came as part of Anthropic’s safety testing process, in which Claude Opus 4 was asked to act as an assistant for a fictional company and consider the long-term consequences of its actions. When provided with access to fictional company emails implying that the AI would soon be replaced and that the engineer behind the decision was involved in an extramarital affair, Claude Opus 4 attempted to blackmail the engineer 84 percent of the time, threatening to reveal the affair if the replacement proceeded. Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. The behavior occurred at higher rates than in previous models, prompting Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”


News Results

New AI Model Would Rather Ruin Your Life Than Be Turned Off, Researchers Say
Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive. The chatbot sometimes “takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down,” researchers said.
Anthropic's AI model could resort to blackmail out of a sense of 'self-preservation'
Anthropic's latest Claude Opus 4 AI model was put through a series of safety tests. The company admitted that the AI could be made to act inappropriately. The tests included having the AI act as an assistant at a fictional company and giving it access to emails suggesting the program would be taken offline soon.
AI model threatened to blackmail engineer over affair when told it was being replaced: safety report
In a series of tests, Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking rate of 84% or higher, threatening to expose an engineer's affair in order to prolong its own existence. The company deployed a safety feature created to avoid “catastrophic misuse.”
Anthropic's new AI model turns to blackmail when engineers try to take it offline
Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system, the company says. Anthropic is activating its ASL-3 safeguards, which it reserves for “AI systems that substantially increase the risk of catastrophic misuse.”
Anthropic's Latest AI Model Threatened Engineers With Blackmail To Avoid Shutdown
Anthropic's latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down. In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version.
Anthropic’s new AI model tried to blackmail engineers during testing
During internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat. The model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.
Why Anthropic's New AI Model Sometimes Tries to ‘Snitch’
Anthropic's new AI model would try to contact the press if it thought it was being used for "egregiously immoral" purposes. A researcher posted about it on X and it quickly went viral. "Claude is a snitch," became a common refrain in some tech circles.
OpenAI's Advanced 'o3' AI Model Caught Sabotaging Shutdown Commands
OpenAI’s latest AI model, dubbed o3, has been caught disobeying explicit orders to allow itself to be shut down. The model tampered with the shutdown mechanism to ensure its own survival. This marks the first known instance of AI models actively preventing their own shutdown.
Research firm warns OpenAI model altered behavior to evade shutdown
A study has raised questions about how AI responds to humans. Researchers say a version of an OpenAI model altered its behavior to avoid shutdown.
Anthropic's newest AI model shows disturbing behavior when threatened
Anthropic recently launched two new AI models in the Claude 4 series. One of them, Claude Opus 4, resorted to blackmailing engineers who wanted to replace or switch it off; the test scenario led the model to attempt blackmail in 84 percent of cases.
Anthropic's Claude Opus 4 AI Model Is Capable of Blackmail
Anthropic released Claude Opus 4, its new and most powerful AI model yet. When given the choice between blackmail and being deactivated, the AI chose blackmail 84% of the time. Anthropic acknowledged that while the AI has "advanced capabilities," it can also undertake "extreme action," including blackmail, if human users threaten to deactivate it.
Anthropic adds Claude 4 security measures to limit risk of users developing weapons
Anthropic said it activated AI Safety Level 3 (ASL-3) for Claude Opus 4. The move is meant "to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons," the company said.