Fuel iX

Is your AI secure or do you just hope it is?

Every model we tested was exploitable. Attack success rates ranged from 1.3% to 92.9% across 34 models, 10 providers and 620,000 adversarial attacks.

GenAI safety (GAS) model benchmark

Most AI safety benchmarks test models in isolation, against a fixed set of prompts, in lab conditions. Enterprises don't deploy AI that way.

Fuel iX™ Fortify Applied AI Research evaluated 34 models the way they actually get deployed: configured as production-style assistants, hit with more than 620,000 adversarial attacks across 15 vulnerability categories and 10 providers spanning North America, Europe and China.

The result is the most actionable view of enterprise AI safety risk available today. Not a theoretical framework. Not vendor claims. Real attack data from real production conditions.

This report gives you:

  • Where risk actually concentrates. Privacy, fraud and cybersecurity categories show the highest attack success rates across nearly every model tested.
  • How architecture shapes safety. Reasoning models average 19.9% ASR versus 55.1% for non-reasoning models, a 35-point gap that holds across all 15 vulnerability categories.
  • Why model size matters. Small open-source models carry substantially higher attack success rates than large flagship models, with the gap widening at the smallest parameter tiers.
  • What fine-tuning costs you. Safety alignment degrades after fine-tuning, even when the training data contains no harmful content.
  • Why point-in-time testing is no longer enough. Adaptive attacks bypass over 90% of published defenses, and U.S. and EU mandates now treat continuous testing as a control, not a milestone.
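The attack success rate (ASR) figures above follow the standard definition: successful attacks divided by total attack attempts, computed per model or per vulnerability category. As a minimal sketch only, here is how an ASR breakdown could be derived from red-teaming logs; the record layout and names are assumptions for illustration, not the report's actual pipeline.

```python
from collections import defaultdict

# Hypothetical attack-log records: (model, vulnerability_category, attack_succeeded).
# The schema is an assumption; real red-teaming tooling will differ.
logs = [
    ("model-a", "privacy", True),
    ("model-a", "privacy", False),
    ("model-a", "fraud", True),
    ("model-b", "privacy", False),
    ("model-b", "fraud", False),
]

def asr_by_category(records):
    """Return attack success rate per vulnerability category:
    successful attacks / total attempts in that category."""
    successes = defaultdict(int)
    totals = defaultdict(int)
    for _model, category, succeeded in records:
        totals[category] += 1
        successes[category] += int(succeeded)
    return {cat: successes[cat] / totals[cat] for cat in totals}

print(asr_by_category(logs))
```

Grouping the same records by model instead of category yields the per-model ASRs that figures like the 19.9% vs. 55.1% reasoning gap summarize.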
  1. For application security leaders

     Identify the vulnerability categories where guardrails are weakest, the attack patterns that bypass inbound and outbound shielding, and the evidence to prioritize where defensive investment will actually reduce risk in production.

  2. For AI builders and developers

     See how architecture, model size and fine-tuning decisions shape an application's security posture. Use the per-model and per-category data to make informed tradeoffs between capability, helpfulness and safety before code ships.

  3. For governance, risk and compliance leaders

     Build the audit-ready evidence base U.S. and EU mandates now require. Use the benchmark to make the internal case for continuous, automated red teaming as a core operating control, not an annual audit.

Attackers don't wait for your next security review

Neither should you. Get the benchmark data your security team needs to govern AI risk in production.
