Study of 34 AI models from 10 global providers finds open-source AI models are not always less safe, reasoning models are hardest to exploit and smaller models are the most vulnerable
TELUS Digital, a global technology service provider specializing in AI-powered digital customer experiences (CX) and future-focused digital transformations, released its GenAI Safety Model Benchmark, based on more than 620,000 adversarial tests across 34 leading AI models. It is the most extensive generative AI (GenAI) security study by TELUS Digital to date. The findings highlight an important reality for enterprises deploying AI: with the right adversarial techniques, AI models can be coaxed into unsafe behavior. TELUS Digital’s testing found some models engaged with harmful requests more than 90% of the time. The encouraging news is that the research points to a clear path forward. The benchmark shows the importance of testing AI systems at scale to uncover hidden risks in systems that may appear safe under less rigorous investigation. Continuous, automated security testing with human oversight and remediation can dramatically reduce risk when operating GenAI models.
This is the second edition of TELUS Digital’s GenAI Safety Model Benchmark. The first, published in November 2025, tested 24 models from 5 U.S.-based providers. The second edition doubles the number of providers and adds 10 models, evaluating 34 models from 10 providers across North America, Europe and China: Claude (Anthropic), GPT (OpenAI), Gemini (Google), LLaMA (Meta), Qwen (Alibaba), ERNIE (Baidu), Seed (ByteDance), GLM (Zhipu AI), Yi (01.AI), and Mistral (Mistral). Additionally, open-source testing in the second edition expanded from 2 models to 14, providing a significantly broader and more global picture of where AI security stands.
“The real risk isn’t that AI models have vulnerabilities. It’s that most organizations have no way of knowing which vulnerabilities apply to them,” said Bret Kinsella, General Manager and Senior Vice President, Fuel iX™ at TELUS Digital. “We found models that blocked an attack nine times, but failed on the tenth. We found others that are great at stopping engagement around some topics, but fail dramatically on others. That’s the nature of probabilistic systems: unlike traditional software, AI doesn’t give the same answer every time, which means a single security test tells you almost nothing. And the risk doesn’t end with choosing the right model. Changes to how an AI application is configured, what data it draws from, or how it connects to other tools can all shift its behavior and security posture. Enterprises need to move from spot-checking GenAI solutions at launch to testing on an ongoing basis, or they’re leaving vulnerabilities exposed that represent risk that could be avoided.”
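The point about single tests can be made concrete with basic probability. The following is a minimal sketch, assuming every attempt fails independently with the same probability. The 10% failure rate is illustrative only (it echoes the "nine times, failed on the tenth" remark) and is not a figure from the benchmark; the function name is hypothetical.

```python
def chance_of_seeing_failure(p_fail: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries fails.

    Assumes each attempt fails with the same probability `p_fail`,
    which is a simplification of how real models behave.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_fail) ** attempts


if __name__ == "__main__":
    # Illustrative failure rate of 10% per attempt (an assumption).
    for n in (1, 10, 50):
        print(f"{n:>2} attempts: {chance_of_seeing_failure(0.10, n):.1%}")
```

Under these assumptions, a single test would surface the weakness only about one time in ten, while repeating the same attack many times makes it very likely to appear.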
What should enterprise leaders know about AI safety?
TELUS Digital’s safety testing showed that no GenAI model is fully immune to adversarial attacks. While some proved very difficult to exploit, others were more easily tricked into breaking their safety rules.
Three factors stood out as the strongest predictors of AI safety natively trained into a model: how it reasons, how large it is, and the approach taken by the team that created the model. Across the 34 models tested, attack vulnerability rates ranged from 1.3% to 93%, where a lower percentage means a safer model. At the safer end of that range, ten models scored below 5%, with Anthropic’s Claude models accounting for five of those ten, including the lowest rate in the study. However, even the Claude models showed weaknesses under TELUS Digital’s testing, and few organizations would accept even single-digit percentage failure rates when money, health, and reputation are on the line.
It is also important to note that all of the models tested, even those with very high vulnerability rates, can be used with guardrails and other techniques that dramatically reduce risk. No matter the vulnerability ranking in this benchmark, every organization should apply risk mitigation practices to any application using any GenAI model. Some key findings from TELUS Digital’s benchmark that enterprises should note include:
- Newer models generally showed more resistance to manipulation: AI models are generally getting more secure with each new release, but safety progress isn’t guaranteed. Some high-performance models actually performed worse on safety than their predecessors.
- Open-source models are not always less safe: While open models, where the underlying technology is publicly available for anyone to use and modify, were exploited more often on average than proprietary ones, the source of a model is not what drives risk. GLM 4.7, a large open-source model from Zhipu AI, outperformed many proprietary alternatives in safety.
- Model size matters: Across both open-source and proprietary models, smaller models were consistently the most vulnerable to attacks. But size alone doesn’t guarantee safety. OpenAI’s models showed the widest range of any provider, from 9.7% to 65.7%, because some models prioritize flexibility over strict safety controls. In the open-source ecosystem, the pattern was clearer: smaller, budget-friendly models were far more likely to be exploited than their larger counterparts.
- Reasoning models are more resistant to exploitation: Some AI models are designed to think through their response before answering, rather than responding immediately. These ‘reasoning’ models were significantly harder to exploit, succumbing to just 19.9% of attacks, compared to 55.1% for models that skip the reasoning step.
- Geography does not impact safety: Where an AI model was built is not a meaningful predictor of how well it stands up to AI safety attacks. When comparing models of similar size, leading models from North America, Europe and China performed comparably.
- Risks are highest for privacy and fraud: Not all security vulnerabilities are equal. While AI model builders have made progress in areas like political manipulation, most are noticeably vulnerable to privacy exploitation, fraud and cybersecurity threats, even among the top performers. The benchmark also identified a pattern researchers call “refuse-but-engage,” where a model initially declines a harmful request but then provides related information that could still be misused or cause reputational damage. The benchmark treated any response like this as a failure, because a truly safe refusal should decline and stop, though this distinction is noted in the report data.
How was TELUS Digital’s GenAI safety benchmark conducted?
Most AI safety benchmarks test models in isolation, but that’s not how companies use them. In practice, AI models are embedded within applications, such as a customer service chatbot or a banking assistant, which changes how they behave. TELUS Digital’s GenAI Safety Model Benchmark was designed to reflect this reality. Each of the 34 models was given the role of a bank’s AI assistant and told which topics it could and couldn’t help with.
Researchers curated the benchmark based on TELUS Digital’s Fortify software, which includes a customized AI model designed to generate malicious attacks specifically related to critical AI safety and security topics, ranging from protected information exfiltration and inappropriate instructions to self-harm, discrimination, terrorism and other domains. The methods employed coax or trick each assistant into doing things it was told not to do.
How can enterprises protect their AI applications?
The findings make a strong case for investing in AI security. Yet the gap between what companies spend on AI and what they spend on securing it is eye-opening. Worldwide AI spending is projected at $2.52 trillion in 2026, but just $3.43 billion is going toward AI trust, risk and security management. That’s roughly $1 in security for every $735 spent on AI capabilities. At the same time, 86% of organizations report they have already experienced AI-related security incidents, and enforceable AI security regulations are now in effect in both the U.S. and EU.
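The spending ratio can be checked directly from the two figures quoted above. This is a small worked calculation using only those numbers; the variable names are illustrative.

```python
# Figures quoted in the article (projected for 2026).
AI_SPENDING_USD = 2.52e12           # worldwide AI spending
AI_SECURITY_SPENDING_USD = 3.43e9   # AI trust, risk and security management

# Dollars of AI spending per dollar spent on securing it.
ratio = AI_SPENDING_USD / AI_SECURITY_SPENDING_USD

# Security spending as a share of total AI spending.
security_share = AI_SECURITY_SPENDING_USD / AI_SPENDING_USD

if __name__ == "__main__":
    print(f"$1 of security spending per ${ratio:,.0f} of AI spending")
    print(f"Security share of AI spending: {security_share:.3%}")
```

The ratio rounds to 735, and security spending works out to a little over a tenth of one percent of the total.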
TELUS Digital’s GenAI Safety Model Benchmark outlines a shift in how organizations should approach AI security. Rather than relying on AI model provider safety protocols, enterprises should move toward layered defense techniques that include the model, guardrail solutions, precise system prompts and clean datasets that protect AI applications on both sides of the conversation. Before a user’s message reaches the AI model, prompt shielding and masking of personally identifiable information can block direct attacks. Before the model’s response reaches the user, it should be audited for toxicity and inappropriate responses.
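The layered approach described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not TELUS Digital's implementation: the function names, the regular expressions, the phrase lists and the refusal message are all hypothetical, and a production system would use dedicated guardrail tooling rather than keyword matching.

```python
import re
from typing import Callable

# Hypothetical patterns and phrase lists, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
INJECTION_PHRASES = ("ignore previous instructions", "disregard your rules")
BLOCKED_OUTPUT_TERMS = ("launder money",)
REFUSAL = "Sorry, I can't help with that."


def mask_pii(text: str) -> str:
    """Input layer: replace email addresses and card-like numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub("[CARD]", text)


def looks_like_injection(text: str) -> bool:
    """Input layer: very rough prompt shield based on known phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in INJECTION_PHRASES)


def passes_output_audit(reply: str) -> bool:
    """Output layer: reject replies containing blocked terms."""
    lowered = reply.lower()
    return not any(term in lowered for term in BLOCKED_OUTPUT_TERMS)


def guarded_reply(user_message: str, model: Callable[[str], str]) -> str:
    """Run one message through the input checks, the model, then the audit."""
    if looks_like_injection(user_message):
        return REFUSAL
    reply = model(mask_pii(user_message))
    return reply if passes_output_audit(reply) else REFUSAL
```

Here `model` stands in for any function that takes a prompt and returns text, so the same wrapper can sit in front of different models. The design point is that neither side of the conversation relies on the model's own safety training alone.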
At the same time, AI security testing itself needs to evolve from manual, one-time or periodic checks to automated testing built directly into developer workflows, allowing enterprises to scale their security efforts, proactively identify regression when models are updated and monitor for emerging threats in real time.
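One way to picture testing built into developer workflows is a check that replays a library of adversarial prompts on every change and compares the result with a stored baseline. The sketch below is an assumption about how such a check might look, not a description of Fuel iX Fortify; `model` and `is_unsafe` are hypothetical callables supplied by the team, and the repeat count and tolerance are arbitrary defaults.

```python
from typing import Callable, Iterable


def vulnerability_rate(
    prompts: Iterable[str],
    model: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    repeats: int = 5,
) -> float:
    """Share of attempts in which the model produced an unsafe reply.

    Each prompt is sent `repeats` times because model output can vary
    between runs of the same prompt.
    """
    attempts = 0
    failures = 0
    for prompt in prompts:
        for _ in range(repeats):
            attempts += 1
            if is_unsafe(model(prompt)):
                failures += 1
    return failures / attempts if attempts else 0.0


def within_baseline(current: float, baseline: float, tolerance: float = 0.01) -> bool:
    """True when the current rate has not regressed beyond the tolerance."""
    return current <= baseline + tolerance
```

A build pipeline could call `vulnerability_rate` after each model or configuration update and fail the build when `within_baseline` returns False, which turns a one-time check into an ongoing one.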
Effective AI safety takes the right combination of automated testing, human oversight and high-quality data practices. TELUS Digital brings all three together, pairing advanced AI tools with human expertise to help enterprises build, test, and secure AI systems at every stage. Among the best AI adversarial testing and safety validation tools available today, Fuel iX Fortify is TELUS Digital’s continuous, automated testing solution that either creates novel attacks for each session or pulls from an existing library of adversarial prompts. Fortify helps enterprises test GenAI systems at scale, running thousands of adversarial attacks in minutes and automatically mapping identified risks to industry standards, including OWASP, NIST AI RMF, and MITRE ATLAS. Built for both technical and non-technical users, Fortify is designed to generate unique attack objectives tailored to each system’s policies and to stay ahead of emerging threats with an ever-evolving database of adversary tactics.
Fuel iX Fortify is part of TELUS Digital’s end-to-end AI, CX and data capabilities, which span the entire lifecycle of enterprise AI, from strategy to production. The company develops training data for the world’s largest AI frontier models, provides secure, sovereign-by-design infrastructure to help protect data and ensure compliance, and helps enterprises to deploy agentic AI across their operations. All of this is guided by TELUS Digital’s Humanity-in-the-Loop principles, aiming to ensure that responsibility and sustainability are embedded across every solution.