
From validation to collaboration: Rethinking human-in-the-loop processes in trust and safety automation


Kaushik P S

Senior Product Director of AI & Data Solutions

Jessica Sargent

Director of Program Management, Gen AI

Jamie Evans

Program Manager, Gen AI

Key takeaways

  • Trust and safety models can’t rely on automation alone. While large language models (LLMs) flag content at scale, they enforce literal policy text rather than intent, missing context, edge cases and cultural nuance that experienced human reviewers catch instinctively.
  • The human role must shift from reviewer to policy steward. The most valuable human work is testing where AI breaks down, identifying novel harm patterns and closing the gap between policy text and meaning through structured reasoning and expertise.
  • Automation is shifting the human role from generalist moderation to strategic optimization. Demand is rising for specialists who can handle complex global regulations, debug "cross-policy collisions" and anticipate novel risks that static rules cannot catch.

LLMs are faster than ever at flagging content, but they interpret policies literally, enforcing what a policy says rather than what it means. In the realm of trust and safety, that literalism is a liability. While automation can address the volume of digital content, it often misses the cultural nuance, adversarial tactics and intent-based context that human intuition catches. To build safer digital communities, we need to move beyond the traditional "human-in-the-loop" model of sequential review and toward “human-on-the-loop” collaboration, shifting the human role from generalist moderator to strategic policy steward who debugs logic, closes reasoning gaps and navigates the complexities of global regulation.

Discover how human expertise is being recalibrated and repurposed as these systems assume greater responsibility in trust and safety operations.

Why automation falls short: The limits of rule-based LLM interpretation

When a moderation policy is fed directly to an LLM, the model executes the literal text. It can miss intent and edge cases, overlook cross-policy collisions and ignore the cultural context that experienced human reviewers understand instinctively. We’ve observed this directly in our own work at TELUS Digital. In our adversarial evaluation programs, we expected models to struggle with genuinely hard cases, and they did. But they also missed things we assumed were solved:

  • Using algospeak to evade filters by swapping letters for numbers or punctuation (e.g., "h@te" instead of "hate")
  • Placing a grid of lines over prohibited content to obscure it
  • Playing problematic audio at low volume underneath a loud, benign sound

These are basic audio-visual signals that any experienced reviewer would catch instantly.
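To make the algospeak example concrete, here is a minimal sketch of why literal keyword matching misses "h@te" while a simple character-normalization pass catches it. The substitution table and blocklist are illustrative assumptions for the example, not a production ruleset.

```python
import re

# Illustrative leetspeak/homoglyph map -- a production system would use a
# far larger table plus fuzzy matching, but this is enough to show the gap.
SUBSTITUTIONS = str.maketrans({
    "@": "a", "4": "a", "3": "e", "1": "i", "!": "i",
    "0": "o", "$": "s", "5": "s", "7": "t",
})

BLOCKLIST = {"hate"}  # stand-in for a real policy term list

def normalize(text: str) -> str:
    """Lowercase, undo common character swaps, then strip punctuation
    inserted to break up a banned token (e.g. "h.a.t.e")."""
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z\s]", "", text)

def literal_match(text: str) -> bool:
    return any(term in text.lower() for term in BLOCKLIST)

def normalized_match(text: str) -> bool:
    return any(term in normalize(text) for term in BLOCKLIST)

print(literal_match("h@te"))     # False: the literal filter is evaded
print(normalized_match("h@te"))  # True: normalization recovers the term
```

The same principle extends to the visual and audio evasions above: detection has to operate on a normalized signal, not the raw surface form, which is exactly the leap experienced reviewers make instinctively.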

Models also consistently failed to handle legitimate content used in acceptable contexts, such as material that is educational or documentary in nature, or critique and transformative use that analyzes or condemns harmful material. In short, they are not yet optimized to understand narrative context.

The solution: Human-on-the-loop framework

While we believe "human-in-the-loop" scaled operations will continue to serve trust and safety teams in the near future, we see an evolution in how human oversight can be optimized: a move toward a "human-on-the-loop" model, where AI systems operate with greater autonomy while humans maintain supervisory control.

Human-on-the-loop processes are designed for deeper human-AI collaboration that enables better trust and safety outcomes. Human expertise is best deployed in validating and challenging how well the model's logic and reasoning hold up, rather than in reviewing individual outputs.

How does human-on-the-loop work?

Early research from DeepMind showed that human-AI collaboration can improve content safety operations through methods like AI pre-filtering of safe content, escalating high-confidence violations, providing contextual assistance, conducting autonomous reviews and acting as a secondary error-detection layer.
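As a rough illustration of what that triage looks like in practice, here is a minimal sketch of confidence-based routing. The thresholds, labels and routing names are assumptions for the example, not values from the research.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str         # e.g. "safe" or "violation"
    confidence: float  # model-reported confidence, 0.0 to 1.0

# Assumed thresholds; in practice these are tuned per policy area
# and recalibrated from human review outcomes.
AUTO_APPROVE_AT = 0.95
AUTO_ESCALATE_AT = 0.95

def triage(verdict: Verdict) -> str:
    """Pre-filter confidently safe content, escalate confident
    violations, and route everything ambiguous to human review."""
    if verdict.label == "safe" and verdict.confidence >= AUTO_APPROVE_AT:
        return "auto_approve"
    if verdict.label == "violation" and verdict.confidence >= AUTO_ESCALATE_AT:
        return "auto_escalate"
    return "human_review"

print(triage(Verdict("safe", 0.99)))       # auto_approve
print(triage(Verdict("violation", 0.97)))  # auto_escalate
print(triage(Verdict("violation", 0.61)))  # human_review
```

The design intent is that human attention is spent only on the middle band, where the model itself signals uncertainty.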

In our own experiments to test these methodologies, we ran contested cases through a council of different LLMs simultaneously:

  • When multiple models agreed, the determination was straightforward.
  • When they disagreed, the disagreement itself provided diagnostic value, signaling genuine ambiguity in either the policy language or the interpretation of narrative context.

We then routed these specific cases to human experts, tagged with information about exactly where and why the models split (a simplified routing sketch follows the list below). The human's job was not to review the content, but to:

  • Probe where the model broke down
  • Identify novel harm patterns no classifier had seen before
  • Close the gap between policy text and policy intent
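A simplified version of that council-and-routing logic, using a hypothetical per-model classify function and made-up model names, might look like this:

```python
from collections import Counter

def council_review(case_id: str, content: str, models: dict) -> dict:
    """Run one contested case through several independent models.
    `models` maps a model name to a hypothetical classify function
    that returns a policy label for the content. Unanimous verdicts
    resolve automatically; splits are packaged for a human policy
    steward with the exact disagreement attached."""
    verdicts = {name: classify(content) for name, classify in models.items()}
    tally = Counter(verdicts.values())

    if len(tally) == 1:  # every model returned the same label
        return {"case": case_id, "route": "auto", "label": next(iter(tally))}

    # Disagreement is the diagnostic signal: record who split and how, so
    # the human probes the reasoning gap instead of re-reviewing content.
    return {
        "case": case_id,
        "route": "policy_steward",
        "split": verdicts,  # e.g. {"model_a": "violation", "model_b": "safe"}
        "note": "models disagree; check policy language vs. narrative context",
    }
```

The key design choice is that the human receives the split itself, not just the content, so their time goes into diagnosing why the models diverged.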

This is policy stewardship: a fundamentally different job, and a collaborative way of working with AI that the industry needs to evolve toward.

The value of human judgment in AI workflows

As AI is used more for content generation, the shape and scale of harms that models can produce will increase rapidly. This is a clear and present problem that humans are better equipped to detect. When adversarial users evade banned-word lists with subtle misspellings, humans adapt quickly and recognize intent, positioning them to set precedents for future automation.

Human input also brings critical value that AI cannot replicate:

  • Consistency and bias reduction: Diverse human evaluation teams, paired with structured processes and clear guidelines, help identify potential biases in AI outputs.
  • Recalibration: Human decisions adjust AI thresholds and rules, optimizing performance. If AI is over-indexed on removing a certain type of post, humans can adjust its sensitivity, balancing safety with free expression (see the sketch after this list).
  • Training data: Human decisions become training inputs, improving AI over time.
  • Empathy and urgency: Humans bring irreplaceable judgment on critical issues like child exploitation and imminent threats.
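To illustrate the recalibration point above, here is a minimal sketch that nudges an auto-removal threshold based on how often humans overturn the AI's removals on appeal. The target rate, step size and bounds are assumptions for the example.

```python
def recalibrate(threshold: float, overturn_rate: float,
                target: float = 0.05, step: float = 0.02) -> float:
    """Adjust an auto-removal confidence threshold from appeal outcomes.
    overturn_rate: share of AI removals that humans later reinstated.
    A high overturn rate means the model is over-indexed on removal,
    so we raise the bar; a low rate lets the bar come back down."""
    if overturn_rate > target:
        threshold = min(threshold + step, 0.99)
    elif overturn_rate < target:
        threshold = max(threshold - step, 0.50)
    return round(threshold, 2)

# 12% of removals in this category were reinstated on appeal,
# so the auto-removal threshold rises from 0.80 to 0.82.
print(recalibrate(0.80, 0.12))  # 0.82
```

Run periodically per policy category, a loop like this is one way human decisions feed back into model behavior without anyone hand-editing rules.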

The evolution of trust and safety teams

The net effect of the shift from human-in-the-loop to human-on-the-loop is a higher bar for trust and safety teams, requiring a growing number of highly skilled specialists rather than generalist moderators. As user-generated content grows exponentially and AI reviews and flags it at speed, the need for these specialists grows as they:

  • Review the quality of AI decisions
  • Process appeals
  • Remain plugged into the platform
  • Understand community context and communication nuances

At TELUS Digital, we are pioneering small teams of forward-deployed researchers and machine learning experts who do not apply policies at scale, but instead:

  • Analyze patterns and debug the system
  • Analyze cross-policy collisions where guidelines produce contradictory signals
  • Identify edge cases that will cause distribution shifts
  • Catch failure patterns before deployment

Our adversarial evaluation programs have shown us that effective trust and safety programs require strategic human expertise in three areas: global laws and regulations (to catch jurisdictional nuances), behavior management (to anticipate evasion tactics) and AI safety (to validate model reasoning).

The future: Human-AI collaboration, not replacement

The essential characteristics that make humans irreplaceable should be augmented with technologies that make them even more effective, ultimately leading to better AI models and safer communities.

The issue isn’t whether AI will replace human moderators. Rather, it’s that humans with specialized skills must continuously supervise AI to cultivate smarter and safer global communities. This involves evolving, not closing, the feedback loop to guide AI in the right direction. Furthermore, human oversight is crucial for informing strategy, policy and accountability teams about AI's efficacy and challenges.

With our experience in training policy models, TELUS Digital can help you in the transformation to human-on-the-loop. Our global and diverse community of subject matter and safety experts can provide the human intelligence loop necessary to build model reasoning, helping you steer the automation that powers platform integrity and content compliance. Connect with our team of experts to learn how we can help evolve your policy models.
