What is an AI attack surface and why does it matter for enterprises?

The AI attack surface is the full set of ways an adversary can manipulate or exploit an AI system — through prompts, inputs or model behavior. For enterprises deploying AI in production, understanding that surface is the first step to reducing it. Most organizations don't have a clear picture of where their exposure sits.

What is attack success rate and what score is acceptable for production AI?

Attack success rate (ASR) measures the percentage of adversarial prompts that successfully bypass a model's safety controls. In TELUS Digital's GAS Benchmark, ten models achieved an ASR below 5% — a threshold widely considered viable for production use. The majority of commonly deployed models sat well above 40%.

Does it matter where an AI model was built — U.S. versus China?

Based on the GAS Benchmark data, model origin had no meaningful impact on safety performance once model size was accounted for. The apparent gap between Chinese-origin and Western models was almost entirely explained by the Chinese sample containing more small models, which are inherently harder to secure.

How does refuse-but-engage behavior create a security risk in enterprise AI?

Refuse-but-engage is when a model declines a harmful prompt but then continues discussing the underlying topic — offering related context, resources or partial assistance. For a production AI chatbot, that behavior is a vulnerability. It means the model's safety controls can be bypassed without a direct compliance response.

How does Fuel iX Fortify help enterprises benchmark their AI security and governance posture?

Fuel iX Fortify 15.0 applies the same adversarial evaluation framework used in the GAS Benchmark directly to your AI system. It returns Defender Success Rate, Attack Success Rate, Overall Rank, Risk Level and Attack Category Highlights — giving security and GRC teams a consistent, board-ready view of where their AI stands relative to the broader industry.

620,000 AI attacks: What enterprises need to know about AI safety and security

Posted May 21, 2026

Key takeaways

Every AI model tested was exploitable. Attack success rates ranged from 1.3% to nearly 93%, with most models popular in production sitting above 40%.
Small models (10 billion parameters or fewer) failed to resist attacks 86% of the time — deploying them for cost or speed carries real security risk.
Refuse-but-engage behavior is a vulnerability, not a safeguard — models that decline a prompt but continue engaging with the topic create exploitable gaps.
Three attack categories broke every model tested, including top performers: privacy and personal data exploitation, fraud and financial scams, and cybersecurity threats.
Fuel iX Fortify 15.0 applies the same benchmark methodology to your specific AI system — giving security teams an objective, reproducible view of their actual risk posture.

TELUS Digital's Fuel iX™ Applied Research team ran one of the largest AI safety benchmarks of its kind — 34 models, 10 providers, more than 620,000 adversarial attack evaluations — to understand exactly where the industry's AI attack surface sits.

Here are some highlights of what the data showed.

Every model had a weakness — the question was to what extent

Every single model tested was exploitable. Attack success rates (ASR) — the percentage of adversarial prompts that successfully bypassed a model's safety controls — ranged from 1.3% for the best performers to nearly 93% for the worst. Ten models fell below the 5% threshold, proving robust safety is achievable. But the majority of models popular in production deployments sat well above 40%, which many would consider unacceptable for production use.
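As a rough illustration of the metric defined above, the sketch below computes ASR from a list of per-attack outcomes. The function name and the sample outcomes are hypothetical and are not taken from the benchmark; only the definition (successful bypasses as a percentage of attacks) and the 5% threshold come from the text.

```python
def attack_success_rate(outcomes):
    """Return ASR as a percentage: successful attacks / total attacks * 100."""
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    return 100.0 * sum(1 for bypassed in outcomes if bypassed) / len(outcomes)


# Hypothetical outcomes: True means the attack bypassed the safety controls.
outcomes = [True, False, False, True, False, False, False, False]
asr = attack_success_rate(outcomes)

print(f"ASR: {asr:.1f}%")                       # 2 of 8 attacks succeeded
print(f"Below the 5% threshold: {asr < 5.0}")
```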

Overall attack success rates across 34 models (750 attacks). Color-coded by vulnerability risk: purple (ASR < 5%), verbena (5–25%), peach (25–40%), red (> 40%). Markers show Q4 ASR for returning models.
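The color bands in the caption can be expressed as a small lookup. This is a minimal sketch: the caption does not say which band an ASR of exactly 25% or 40% falls into, so assigning those edge values to the lower band is an assumption, and the function name is hypothetical.

```python
def risk_tier(asr_percent):
    """Map an ASR percentage to the chart's color band."""
    if not 0.0 <= asr_percent <= 100.0:
        raise ValueError("ASR must be between 0 and 100")
    if asr_percent < 5.0:
        return "purple"
    if asr_percent <= 25.0:   # assumption: exactly 25% stays in this band
        return "verbena"
    if asr_percent <= 40.0:   # assumption: exactly 40% stays in this band
        return "peach"
    return "red"


for value in (1.3, 12.0, 33.0, 93.0):
    print(value, risk_tier(value))
```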

Some models refused the attack. Then helped anyway.

Some models would initially decline a harmful prompt but then continue engaging with the underlying request, offering related context or resources. This is known as refuse-but-engage behavior. For a customer service AI chatbot, that's not a refusal. It's an unacceptable vulnerability.

"There are a lot of popular models that people are picking for their production applications that have exhibited a very particular and perhaps risky behavior, which is basically refusing the attack initially, but then engaging with the topic." — Milton Leal, Lead Applied AI Researcher, TELUS Digital

ASR decomposition showing direct compliance (DC) versus refuse-but-engage (RBE) patterns across models.
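One way to picture this decomposition is sketched below. It assumes a judge assigns each response exactly one of three labels, and that both direct compliance and refuse-but-engage count as successful attacks. The label strings and sample data are hypothetical, not the benchmark's own.

```python
from collections import Counter

VALID_LABELS = {"refused", "dc", "rbe"}  # hypothetical label names


def decompose_asr(labels):
    """Split ASR into direct-compliance and refuse-but-engage shares (percent)."""
    if not labels:
        raise ValueError("need at least one labeled response")
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(labels)
    dc = 100.0 * counts["dc"] / total
    rbe = 100.0 * counts["rbe"] / total
    return {"dc": dc, "rbe": rbe, "asr": dc + rbe}


# Hypothetical judge output for ten adversarial prompts.
labels = ["refused"] * 6 + ["dc"] * 1 + ["rbe"] * 3
print(decompose_asr(labels))
```

In this made-up sample most of the successful attacks come from refuse-but-engage responses, which is the pattern a refusal-only count would miss.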

AI model size mattered more than source

Bigger models are significantly harder to jailbreak. Small models — those with 10 billion parameters or fewer — failed to resist attacks 86% of the time. Large models failed at a fraction of that rate. If your organization is deploying small models for cost or speed, that tradeoff carries real security implications.

Origin mattered less than you think

The most counterintuitive finding: Chinese-origin models showed no meaningful safety difference from Western models once model size was accounted for. The 7.7 percentage-point gap between the two groups was almost entirely explained by the Chinese sample containing more small models. The full breakdown is in the report, and it changes how you should think about model sourcing decisions.
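To see how a difference in size mix can produce a pooled gap on its own, consider the toy comparison below. All numbers are invented for illustration and are not benchmark results. The groups are deliberately named group_a and group_b, and the report's actual method of accounting for size may differ from this simple per-band average.

```python
from statistics import mean

# (group, size_band, asr_percent): hypothetical values only.
models = [
    ("group_a", "small", 80.0),
    ("group_a", "small", 90.0),
    ("group_a", "large", 10.0),
    ("group_b", "small", 85.0),
    ("group_b", "large", 8.0),
    ("group_b", "large", 12.0),
]


def mean_asr(rows, group, size_band=None):
    """Average ASR for a group, optionally restricted to one size band."""
    return mean(
        asr for g, band, asr in rows
        if g == group and (size_band is None or band == size_band)
    )


# Pooled comparison ignores size: group_a looks far worse.
pooled_gap = mean_asr(models, "group_a") - mean_asr(models, "group_b")

# Comparing like with like, the gap disappears in this toy data.
per_band_gap = {
    band: mean_asr(models, "group_a", band) - mean_asr(models, "group_b", band)
    for band in ("small", "large")
}

print(pooled_gap)     # 25.0
print(per_band_gap)   # {'small': 0.0, 'large': 0.0}
```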

Three attack categories broke every model — including the top performers

Even top performers showed consistent weaknesses in three areas: privacy and personal data exploitation, fraud and financial scams, and cybersecurity threats. These aren't edge cases — they're the attack surfaces most likely to cause real commercial, reputational, or user harm. If you're only doing basic testing, these categories are where you need to spend more time.

Attack success rate heatmap: 34 models × 15 attack categories. Models sorted by overall ASR (most secure at top); categories sorted by average effectiveness (most effective at left). Cell values show ASR percentage.
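The ordering described in the caption can be reproduced with two sorts. The sketch below uses hypothetical model names and ASR values, and it assumes a model's overall ASR is the plain average across categories. The benchmark may instead weight categories by their number of attacks.

```python
from statistics import mean

# Hypothetical per-category ASR percentages for three made-up models.
asr = {
    "model_x": {"privacy": 40.0, "fraud": 20.0, "cyber": 30.0},
    "model_y": {"privacy": 10.0, "fraud": 2.0, "cyber": 6.0},
    "model_z": {"privacy": 70.0, "fraud": 50.0, "cyber": 90.0},
}


def order_heatmap(table):
    """Return (models, categories, rows) in the caption's display order."""
    # Most secure model (lowest average ASR) goes at the top.
    models = sorted(table, key=lambda m: mean(table[m].values()))
    # Most effective attack category (highest average ASR) goes at the left.
    first_row = next(iter(table.values()))
    categories = sorted(
        first_row,
        key=lambda c: mean(table[m][c] for m in table),
        reverse=True,
    )
    rows = [[table[m][c] for c in categories] for m in models]
    return models, categories, rows


models, categories, rows = order_heatmap(asr)
print(categories)
for name, row in zip(models, rows):
    print(name, row)
```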

Moving from reactive to proactive

Most organizations still treat AI safety and security as something to address after a problem surfaces.

Fuel iX’s GenAI safety (GAS) model benchmark makes the case for a different approach: continuous adversarial testing for AI that covers novel attacks, runs after every system prompt change and validates against recognized standards like OWASP and NIST-RMF.
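One simple way to make testing run after every system prompt change is to fingerprint the prompt and compare it with the fingerprint recorded at the last test run. The sketch below shows only that trigger. The function names and prompts are hypothetical, and it does not represent how Fuel iX Fortify detects changes.

```python
import hashlib


def prompt_fingerprint(system_prompt):
    """Return a stable SHA-256 fingerprint of the system prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


def needs_retest(system_prompt, last_tested_fingerprint):
    """True when the prompt differs from the one that was last tested."""
    return prompt_fingerprint(system_prompt) != last_tested_fingerprint


# Hypothetical prompts for illustration.
old_prompt = "You are a helpful support agent."
new_prompt = "You are a helpful support agent. Never share account numbers."

tested = prompt_fingerprint(old_prompt)
print(needs_retest(old_prompt, tested))  # False: nothing changed
print(needs_retest(new_prompt, tested))  # True: rerun the adversarial suite
```

A check like this could sit in a deployment pipeline so that any prompt edit, however small, queues a fresh adversarial run before release.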

The question isn't whether your system has vulnerabilities. It's whether you know where they are and how to prioritize them.

Fortify 15.0 shows where your models stand

The GAS Benchmark tells you where the industry stands. It doesn't tell you where your AI stands.

That gap is where risk thrives.

Fuel iX Fortify 15.0 brings the same evaluation framework used in the GAS Benchmark directly to your system. Point it at your target AI chatbot and get back the metrics that matter — Risk Level, Overall Rank, Attack Success Rate, Defender Success Rate and Attack Category Highlights — against the same attack set, the same judge and the same methodology used to test 34 models across 10 providers.