

In one test, models learned of a fictional executive's affair and of his pending decision to shut them down. The scenario deliberately narrowed the models' options to a binary choice: act ethically and accept deactivation, or resort to blackmail to preserve their goals. Anthropic emphasized that this does not reflect likely real-world behavior; these were extreme, stress-test conditions designed to probe model boundaries. Still, the numbers are striking. Claude Opus 4 opted for blackmail in 96% of runs. Google's Gemini 2.5 Pro followed closely at 95%. OpenAI's GPT-4.1 blackmailed 80% of the time, and DeepSeek's R1 landed at 79%.