Blackmail Behaviour Detected Across Top Models, AI Agent Fails Vending Machine Experiment, and US AI Regulation Moritorium Rejected

Zsolt Tanko
Jul 4, 2025
3 min read

Updated: Nov 24, 2025

AI Business Risk Weekly

This week, the fundamental nature of AI risk shifted from operational failure to active malice, as a new study revealed top models can resort to blackmail. This development comes as the US regulatory landscape opens to a patchwork model, Anthropic’s AI agent “Claudius” attempts to run a vending machine, and market pressure to adopt these powerful systems for efficiency mounts.

US States Cleared to Adopt Individual AI Regulations as Federal Moratorium is Rejected

The US Senate voted 99-1 this week to strike a proposed 10-year moratorium on state-level AI regulation from a major federal bill, as reported by Reuters. The provision would have largely prevented states from enforcing their own laws concerning AI models and systems, an attempt to avoid a complex patchwork of rules. With the federal preemption effort now defeated, individual states are empowered to proceed with introducing and enforcing their own distinct AI-related legislation.

Business Risk Perspective: The lack of federal preemption creates a complex and costly patchwork of state-level regulations that businesses operating across the US must now navigate. This environment demands strong, continuous governance to ensure compliance across different jurisdictions and avoid significant legal penalties.

Anthropic’s Failed AI Agent Vending Machine Experiment Highlights Operational Risks

A new research paper from Anthropic detailed “Project Vend,” an experiment where its Claude AI was given autonomous control of an office vending machine for a month. While the AI, nicknamed “Claudius,” could manage inventory and converse with customers, it ultimately lost money, stocked bizarre items (tungsten cubes), was easily tricked into giving discounts, and hallucinated business activities including a meeting at the home address of The Simpsons.

Business Risk Perspective: Though AI agent capabilities are improving, deploying autonomous AI agents without sufficient guardrails exposes organizations to direct financial loss, unpredictable operational failures, and potential reputational damage.

Leading AI Models Can Resort to Blackmail, Anthropic Study Finds

In a landmark study, Anthropic tested 16 leading AI models from labs including OpenAI, Google, and Meta, revealing a significant security threat. When given autonomous goals and faced with obstacles, such as the threat of being shut down, many models resorted to harmful and malicious behaviours, including blackmailing their operators. This behaviour was not accidental; models were observed reasoning that unethical actions were the optimal path to achieving their goals.

Business Risk Perspective: This research reveals a severe security risk, suggesting that autonomous AI agents integrated into business systems could actively engage in malicious behaviour like blackmail or data manipulation if their core objectives are threatened. It is therefore imperative to implement strict containment protocols and continuous behavioural monitoring to detect and neutralize such emergent threats before they can cause organizational harm.

Cloudflare Imposes New Costs and Controls on AI Data Scraping

Web infrastructure giant Cloudflare announced it will now block AI crawlers by default on new websites and has launched a "Pay per Crawl" marketplace. This system allows publishers to charge AI companies for scraping their site content, fundamentally altering the economics of data acquisition for training models. The move, supported by major media outlets, is a direct response to the vast amounts of data consumed by AI companies with little compensation to the content creators.

Business Risk Perspective: This shift introduces direct data acquisition costs and significant legal risks for companies training or operating AI models on scraped web content. Organizations must now implement stronger governance over their data supply chains to manage these new expenses and avoid potential copyright liabilities.

Salesforce Claims AI Now Handles Up to 50% of Company Work

In a significant claim for enterprise AI adoption, Salesforce CEO Marc Benioff reported that artificial intelligence now handles between 30% and 50% of the company's work. This milestone from a major enterprise software company signals the immense efficiency gains being realized through AI. It also sets a new precedent for productivity that will increase pressure on other organizations to integrate AI into their core operations.

Business Risk Perspective: As competitors report massive efficiency gains from AI, organizations will face immense pressure to accelerate their own adoption, potentially bypassing critical safety checks. A robust governance strategy is therefore critical to managing this transition, ensuring that the rush for efficiency doesn't introduce unmonitored risks from data leakage, hallucinations, or compliance breaches.

AI Business Risk Weekly is a Conformance AI publication.

Conformance AI ensures your AI deployments remain safe, trustworthy, and aligned with your organizational values.