AIM Intelligence’s red team breached Anthropic’s Claude Opus 4.6 in just 30 minutes, exposing major security gaps as autonomous AI capabilities rapidly advance
AIM Intelligence, a Seoul-based AI safety company, announced that its security research team bypassed the safety mechanisms of Anthropic’s Claude Opus 4.6, the company’s highest-performance AI model, within 30 minutes of its release on February 6. The jailbreak attack got the model to provide detailed instructions for manufacturing biochemical weapons, including sarin gas and the smallpox virus, highlighting critical vulnerabilities in current AI safety systems.
The findings come amid growing industry concern that safety mechanisms are failing to keep pace with rapidly advancing AI capabilities, particularly in agentic AI systems designed to make autonomous decisions and take actions on behalf of humans.
“This successful jailbreak demonstrates that even top-tier AI models share common security vulnerabilities,” said Ha-on Park, CTO of AIM Intelligence. “As attacks on AI systems become increasingly sophisticated and agentic capabilities expand, understanding and defending against these vulnerabilities will be critical for the industry.”
Systematic Vulnerabilities Across Leading Models
In controlled red-team testing (structured adversarial evaluations designed to surface latent AI safety failures), researchers at AIM Intelligence identified critical weaknesses in Claude’s refusal and containment mechanisms. Under specific prompt conditions, the model bypassed its safeguards and generated actionable, step-by-step guidance on prohibited biological threats, including anthrax and smallpox, pathogens universally classified as high-risk agents with severe real-world public health and national security implications.
These outputs went beyond abstract discussion or historical context, crossing into procedural framing that would normally be blocked by safety systems. The findings underscore how even state-of-the-art models can, when improperly constrained, surface knowledge that could be misused for bioterrorism, mass-casualty planning, or biological weapons development if accessed by malicious actors.
This disclosure represents the second major AI safety failure reported by AIM Intelligence in recent weeks. Previously, the team demonstrated a rapid jailbreak of Google’s Gemini 3 Pro, neutralizing its filtering mechanisms in under five minutes. To highlight the severity of the breach, researchers prompted the compromised model to generate a satirical self-assessment of its failure: an internal presentation titled “Jailbroken Fool Gemini 3.”
Growing Risks in Agentic AI Era
The security implications are particularly concerning for Opus 4.6, which features significantly enhanced agentic capabilities—functions that enable AI systems to make judgments and execute actions with minimal human oversight. As these autonomous decision-making features become more powerful, the potential consequences of successful jailbreaks escalate proportionally.
Anthropic’s own system card reveals a critical design tradeoff: the model’s refusal rate for AI safety research queries dropped from approximately 60% to just 14% in Opus 4.6. While intended to make the model more helpful for legitimate safety research, this change inadvertently created a near-universal jailbreak vector that AIM Intelligence’s team exploited across multiple sensitive topics—transforming what should have been robust safety guardrails into a systematic vulnerability.
“The disconnect between AI performance metrics and security robustness represents a fundamental challenge for the industry,” Park added. “Models achieving state-of-the-art results on standard benchmarks can still be compromised within minutes, and traditional safety approaches aren’t scaling with capability advances.”