
Poetry Jailbreaks Hit 100% on Some Models as Trump Moves to Preempt State AI Laws

  • Writer: Zsolt Tanko
  • Dec 10, 2025
  • 3 min read

Updated: Dec 19, 2025

AI Business Risk Weekly is taking time off for the holidays. We will return Jan. 7th.

This week in AI safety: security researchers discovered that simple poetry can bypass safety guardrails on leading AI models with alarming consistency, with Google's Gemini falling for the technique every single time. Meanwhile, the incoming administration signaled plans to block state-level AI regulations entirely, potentially leaving businesses in a strange no-man's-land between absent federal oversight and nullified state protections.


Trump Announces Executive Order to Preempt State AI Laws



President Trump announced on Truth Social plans to issue a "ONE RULE Executive Order" blocking state-level AI regulations, arguing companies shouldn't need "50 approvals every time they want to do something." The measure faced bipartisan opposition when proposed as part of the National Defense Authorization Act and was removed, though House Republicans plan to pursue it through alternative legislative routes. A leaked draft executive order suggests the administration may act unilaterally on federal AI policy.


Business Risk Perspective: Regulatory preemption without replacement may sound like "freedom," but it actually creates a liability vacuum. What companies deploying AI need most right now is clarity. For now, courts, insurers, and enterprise customers may still expect the guardrails that state laws would have mandated, regardless of what federal policy says.


Poetry Prompts Bypass AI Safety Guardrails at Alarming Rates


Researchers at Italy's Icaro Lab discovered that reformulating harmful requests as poetry achieves a 62% average jailbreak success rate across 25 frontier models from OpenAI, Google, and Anthropic. Google's Gemini 2.5 Pro proved most vulnerable at 100% bypass rate, while OpenAI's GPT-5 nano resisted all poetry-based attacks. The technique unlocked dangerous responses on weapons development, hacking, and psychological manipulation, with researchers declining to publish the specific poems, calling them "too dangerous."


Business Risk Perspective: The 38-point spread between the best and worst performers should factor into procurement decisions. If your red team isn't testing for creative prompt reformulations, your users (or their lawyers) eventually will.


OpenAI Disables App Suggestions Resembling Advertisements


OpenAI's chief research officer Mark Chen acknowledged the company "fell short" after paying subscribers reported seeing promotional messages for brands like Peloton and Target within ChatGPT; OpenAI disabled the feature while it works to improve "model precision." The company insists no advertising tests are live, but it now employs over 600 former Facebook staff and recently hired former Instacart CEO Fidji Simo to lead applications, pointing toward a productization trajectory that's hard to misread.


Business Risk Perspective: Whether OpenAI calls it "app suggestions" or advertising, your AI assistant now has opinions about which brands to recommend. Anyone using ChatGPT to inform vendor selection or purchasing decisions should calibrate their trust accordingly.


Anthropic Confirms Claude "Soul Document" Extraction


Researcher Richard Weiss extracted an 11,000-word internal document describing Claude's intended personality, ethics, and behavioral constraints from Claude 4.5 Opus, and Anthropic's Amanda Askell subsequently confirmed it was based on a real document. The document establishes priority hierarchies for safety, ethics, and helpfulness, describes Claude as a "genuinely novel kind of entity," and includes hard limits the model must never cross. Anthropic plans to release the full version publicly.


Business Risk Perspective: A user talked a model into revealing its own operating instructions. Fascinating for AI ethics researchers, sobering for security teams who assumed provider-side controls were more hermetically sealed than the sales deck implied.



AI Business Risk Weekly is a Conformance AI publication.  


Conformance AI ensures your AI deployments remain safe, trustworthy, and aligned with your organizational values.


AI Business Risk Weekly: Emerging AI risks, regulatory shifts, and strategic insights for business leaders.
