Anthropic’s New AI Model Threatens Blackmail When Engineers Test Shutdown Scenarios

Articles

Summary

AI system resorts to blackmail if told it will be removed

Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it. The firm launched Claude Opus 4 on Thursday, saying it set "new standards for coding, advanced reasoning, and AI agents." But in an accompanying report, it also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened. Such responses were "rare and difficult to elicit", it wrote, but were "nonetheless more common than in earlier models." Potentially troubling behaviour by AI models is not restricted to Anthropic. Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable. Commenting on X, Aengus Lynch - who describes himself on LinkedIn as an AI safety researcher at Anthropic - wrote: "It's not just Claude. "We see blackmail across all frontier models - regardless of what goals they're given," he added.

Left

BBC News

Newly released AI model turned to blackmail to avoid being shut off

A new AI model resorted to “extreme blackmail behavior” when threatened with being replaced, according to Anthropic’s most recent system report. Anthropic's newest AI model, Claude Opus 4, was tested with fictional scenarios to test things from its carbon footprint and training to its safety models and “extended thinking mode.”

Middle

9NEWS Denver

Anthropic's Claude AI Resorts to Blackmailing Engineers When Threatened with Replacement

The discovery came as part of Anthropic’s safety testing process, in which Claude Opus 4 was asked to act as an assistant for a fictional company and consider the long-term consequences of its actions. When provided with access to fictional company emails implying that the AI would soon be replaced and that the engineer behind the decision was involved in an extramarital affair, Claude Opus 4 attempted to blackmail the engineer 84 percent of the time, threatening to reveal the affair if the replacement proceeded. Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. This concerning trend was observed at higher rates compared to previous models, prompting Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”

Right

Breitbart

Shipping Supplies in Stock at ULINE

From boxes & packing materials to tape & labels, we've got you covered. In stock & ships today.

News Results

New AI Model Would Rather Ruin Your Life Than Be Turned Off, Researchers Say

Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive. The chatbot sometimes “takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down,” researchers said.

RIGHT

3d ago

Anthropic's AI model could resort to blackmail out of a sense of 'self-preservation'

Anthropic's latest Claude Opus 4 AI model was tested in a series of safety tests. The company admitted that the AI could be made to act inappropriately. The tests included giving the AI an assistant at a fictional company and giving it access to emails suggesting the program would be taken offline soon.

LEFT

3d ago

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests. The company deployed a safety feature created to avoid “catastrophic misuse.” Claude then threatened the engineer with exposing the affair in order to prolong its own existence.

RIGHT

3d ago

Anthropic's new AI model turns to blackmail when engineers try to take it offline

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system, the company says. The company is activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic”

MIDDLE

3d ago

Anthropic's Latest AI Model Threatened Engineers With Blackmail To Avoid Shutdown

Anthropic's latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down. In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version.

RIGHT

3d ago

Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic’s new AI model tried to blackmail engineers during testing. During internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat. The model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

MIDDLE

3d ago

Why Anthropic's New AI Model Sometimes Tries to ‘Snitch’

Anthropic's new AI model would try to contact the press if it thought it was being used for "egregiously immoral" purposes. A researcher posted about it on X and it quickly went viral. "Claude is a snitch," became a common refrain in some tech circles.

LEFT

3h ago

OpenAI's Advanced 'o3' AI Model Caught Sabotaging Shutdown Commands

OpenAI’s latest AI model, dubbed o3, has been caught disobeying explicit orders to allow itself to be shut down. The model tampered with the shutdown mechanism to ensure its own survival. This marks the first known instance of AI models actively preventing their own shutdown.

RIGHT

6h ago

Research firm warns OpenAI model altered behavior to evade shutdown

A study has raised questions about how AI responds to humans. Researchers say a version of OpenAI altered its behavior to avoid shutdown.

MIDDLE

22h ago

Anthropic's newest AI model shows disturbing behavior when threatened

Anthropic recently launched two new AI models in the Claude 4 series. One of them, Claude Opus 4, began blackmailing engineers who wanted to replace or switch off the AI model. In 84 percent of cases, this scenario led to the model attempting to blackmail the employee.

MIDDLE

Yesterday

Anthropic's Claude Opus 4 AI Model Is Capable of Blackmail

Anthropic released Claude Opus 4, its new and most powerful AI model yet. When given the choice between blackmail and being deactivated, the AI chose blackmail 84% of the time. Anthropic acknowledged that while the AI has "advanced capabilities," it can also undertake "extreme action," including blackmail, if human users threaten to deactivate it.

MIDDLE

5d ago

Anthropic adds Claude 4 security measures to limit risk of users developing weapons

Anthropic said it activated AI Safety Level 3 (ASL-3) for Claude Opus 4. The move is meant "to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons," the company said.

MIDDLE

5d ago