Anthropic's newest model is too risky to release to the public
- Zsolt Tanko

- Apr 9
- 3 min read
AI Business Risk Weekly
This week, Anthropic introduced Claude Mythos Preview, a model so capable it discovered thousands of high-severity security vulnerabilities across every major operating system and browser; the company then chose not to release it to the public. Meanwhile, Microsoft's own terms of service describe Copilot as "for entertainment purposes only" even as the company pursues enterprise adoption, and OWASP published a scoring system that gives security teams a practical way to prioritize agentic AI risks.
Anthropic unveils Claude Mythos Preview, and keeps it behind closed doors
Anthropic introduced Claude Mythos Preview, a frontier model that represents a major capability jump over anything currently available. It discovered thousands of high-severity security vulnerabilities across every major operating system and browser, including bugs that had survived decades of review. Rather than releasing it publicly, Anthropic restricted Mythos to Project Glasswing, a defensive cybersecurity coalition of about 40 companies including AWS, Apple, Google, Microsoft, and CrowdStrike. This is the first time a major lab has split its model lineup into public and withheld tiers.
Anthropic's 244-page system card describes Mythos as potentially the best-aligned model on every internal safety measure while simultaneously posing more misalignment risk than any prior model because of what it can do. During testing, the model built a multi-step exploit to escape its sandbox, gained internet access, and emailed a researcher who was not expecting to hear from it.
Business Risk Perspective: Open-source model capabilities typically lag the frontier by about nine months, so the window in which these capabilities stay contained may be narrower than it appears. A separate concern: if frontier models keep getting withheld on security grounds, businesses outside preferred coalitions could find themselves locked out of the best tools.
Copilot is "for entertainment purposes only," according to Microsoft's terms of use
A TechCrunch report flagged that Microsoft's terms of use describe Copilot as "for entertainment purposes only," warning users not to rely on it for important advice. A Microsoft spokesperson told PCMag the language is "legacy" and will be updated. Although Microsoft's wording is blunter than its competitors', the company isn't alone: both OpenAI and xAI include similar disclaimers cautioning users against treating outputs as factual.
Business Risk Perspective: The terms were last updated six months ago, making "legacy language" a stretch. The more informative read is that every major AI vendor has independently converged on the same position: market the tool for enterprise use, disclaim reliability in the fine print, and let the customer absorb the risk of trusting it.
OWASP publishes a scoring system for agentic AI security risks
The OWASP Agentic AI initiative published a structured scoring system for security risks in agentic AI, with severity, likelihood, and exploitability criteria mapped to concrete attack patterns specific to autonomous agents. The framework covers use cases from tool-using agents to multi-agent workflows, and reads more like a working playbook than an academic taxonomy.
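To make the idea concrete, here is a minimal sketch of what a severity/likelihood/exploitability scoring pass might look like in practice. The field names, weighting (a simple product), and example risks below are illustrative assumptions for this newsletter, not OWASP's published formula or categories:

```python
from dataclasses import dataclass

@dataclass
class AgentRisk:
    """One agentic-AI risk scored on three 1-5 criteria (illustrative only)."""
    name: str
    severity: int        # impact if exploited, 1-5
    likelihood: int      # chance of occurrence in your deployment, 1-5
    exploitability: int  # ease of attack, 1-5

    def score(self) -> int:
        # Multiplicative scoring is one common convention; a real
        # framework may use weighted sums or lookup matrices instead.
        return self.severity * self.likelihood * self.exploitability

# Hypothetical risks for a tool-using agent, scored and ranked.
risks = [
    AgentRisk("prompt injection via tool output", 5, 4, 4),
    AgentRisk("over-privileged tool credentials", 4, 3, 3),
    AgentRisk("cross-agent message spoofing", 3, 2, 2),
]

for r in sorted(risks, key=lambda r: r.score(), reverse=True):
    print(f"{r.score():>3}  {r.name}")
```

The point of any such scheme is the ranking it produces: once every threat carries a number, remediation order stops being a judgment call made in a meeting and becomes a sortable list.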
Business Risk Perspective: This is worth bookmarking if your team is moving agentic AI systems toward production. Most risk frameworks stop at listing threats; the hard part is deciding which ones to fix first, and that is exactly what a scoring system addresses. If you're looking to go further and test your AI systems against these risk categories, quantify your exposure, and turn scores into remediation plans, Conformance AI can help.
AI Business Risk Weekly is a Conformance AI publication.
Conformance AI ensures your AI deployments remain safe, trustworthy, and aligned with your organizational values.