
From validation to collaboration: Rethinking human-in-the-loop processes in trust and safety automation


Kaushik P S

Senior Product Director of AI & Data Solutions

Jessica Sargent

Director of Program Management, Gen AI

Jamie Evans

Program Manager, Gen AI

Key takeaways

  • Trust and safety models can’t rely on automation alone. While large language models (LLMs) flag content at scale, they enforce literal policy text rather than intent, missing context, edge cases and cultural nuance that experienced human reviewers catch instinctively.
  • The human role must shift from reviewer to policy steward. The most valuable human work is testing where AI breaks down, identifying novel harm patterns and closing the gap between policy text and meaning through structured reasoning and expertise.
  • Automation is shifting the human role from generalist moderation to strategic optimization. Demand is rising for specialists who can handle complex global regulations, debug "cross-policy collisions" and anticipate novel risks that static rules cannot catch.

LLMs are faster than ever at flagging content, but they interpret policies literally, enforcing what a policy says rather than what it means. In the realm of trust and safety, that literalism is a liability. While automation can address the volume of digital content, it often misses the cultural nuance, adversarial tactics and intent-based context that human intuition catches. To build safer digital communities, we need to move beyond the traditional "human-in-the-loop" model of sequential review and toward “human-on-the-loop” collaboration, shifting the human role from generalist moderator to strategic policy steward who debugs logic, closes reasoning gaps and navigates the complexities of global regulation.

Discover how human expertise is being recalibrated and repurposed as these systems assume greater responsibility in trust and safety operations.

Why automation falls short: The limits of rule-based LLM interpretation

When a moderation policy is fed directly to an LLM, the model executes the literal text. It can miss intent and edge cases, overlook cross-policy collisions and ignore the cultural context that experienced human reviewers understand instinctively. We’ve observed this directly in our own work at TELUS Digital. In our adversarial evaluation programs, we expected models to struggle with genuinely hard cases, and they did. But they also missed things we assumed were solved:

  • Using algospeak to evade filters by swapping letters for numbers or punctuation (e.g., "h@te" instead of "hate")
  • Placing a grid of lines over prohibited content to obscure it
  • Playing problematic audio at low volume underneath a loud, benign sound

These are basic audio-visual signals that any experienced reviewer would catch instantly.
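To make the algospeak example concrete, here is a minimal sketch of why literal keyword matching misses "h@te" while a simple character-normalization pass catches it. The substitution table and blocklist are illustrative assumptions for the example, not a production ruleset.

```python
import re

# Illustrative leetspeak/homoglyph map -- a production system would use a
# far larger table plus fuzzy matching, but this is enough to show the gap.
SUBSTITUTIONS = str.maketrans({
    "@": "a", "4": "a", "3": "e", "1": "i", "!": "i",
    "0": "o", "$": "s", "5": "s", "7": "t",
})

BLOCKLIST = {"hate"}  # stand-in for a real policy term list

def normalize(text: str) -> str:
    """Lowercase, undo common character swaps, then strip punctuation
    inserted to break up a banned token (e.g. "h.a.t.e")."""
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z\s]", "", text)

def literal_match(text: str) -> bool:
    return any(term in text.lower() for term in BLOCKLIST)

def normalized_match(text: str) -> bool:
    return any(term in normalize(text) for term in BLOCKLIST)

print(literal_match("h@te"))     # False: the literal filter is evaded
print(normalized_match("h@te"))  # True: normalization recovers the term
```

The same principle extends to the visual and audio evasions above: detection has to operate on a normalized signal, not the raw surface form, which is exactly the leap experienced reviewers make instinctively.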

Models also consistently failed to handle legitimate content used in acceptable contexts, such as material that is educational or documentary in nature, or critique and transformative use that analyzes or condemns harmful material. In short, they are not yet optimized to understand narrative context.

The solution: Human-on-the-loop framework

While we believe "human-in-the-loop" scaled operations will continue to serve trust and safety teams in the near future, we see an evolution in how human oversight can be optimized: a move toward a "human-on-the-loop" model, where AI systems operate with greater autonomy while humans maintain supervisory control.

Human-on-the-loop processes are designed for deeper human-AI collaboration that enables better trust and safety outcomes. Human expertise is best deployed in validating and challenging how well the model's logic and reasoning hold up, rather than in reviewing individual outputs.

How does human-on-the-loop work?

Early research from DeepMind showed that human-AI collaboration can improve content safety operations through methods like AI pre-filtering of safe content, escalating high-confidence violations, providing contextual assistance, conducting autonomous reviews and acting as a secondary error-detection layer.
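As a rough illustration of what that triage looks like in practice, here is a minimal sketch of confidence-based routing. The thresholds, labels and routing names are assumptions for the example, not values from the research.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str         # e.g. "safe" or "violation"
    confidence: float  # model-reported confidence, 0.0 to 1.0

# Assumed thresholds; in practice these are tuned per policy area
# and recalibrated from human review outcomes.
AUTO_APPROVE_AT = 0.95
AUTO_ESCALATE_AT = 0.95

def triage(verdict: Verdict) -> str:
    """Pre-filter confidently safe content, escalate confident
    violations, and route everything ambiguous to human review."""
    if verdict.label == "safe" and verdict.confidence >= AUTO_APPROVE_AT:
        return "auto_approve"
    if verdict.label == "violation" and verdict.confidence >= AUTO_ESCALATE_AT:
        return "auto_escalate"
    return "human_review"

print(triage(Verdict("safe", 0.99)))       # auto_approve
print(triage(Verdict("violation", 0.97)))  # auto_escalate
print(triage(Verdict("violation", 0.61)))  # human_review
```

The design intent is that human attention is spent only on the middle band, where the model itself signals uncertainty.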

In our own experiments to test these methodologies, we ran contested cases through a council of different LLMs simultaneously:

  • When multiple models agreed, the determination was straightforward.
  • When they disagreed, the disagreement itself provided diagnostic value, signaling genuine ambiguity in either the policy language or the interpretation of narrative context.

We then routed these specific cases to human experts, tagged with information about exactly where and why the models split (a simplified routing sketch follows the list below). The human's job was not to review the content, but to:

  • Probe where the model broke down
  • Identify novel harm patterns no classifier had seen before
  • Close the gap between policy text and policy intent
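A simplified version of that council-and-routing logic, using a hypothetical per-model classify function and made-up model names, might look like this:

```python
from collections import Counter

def council_review(case_id: str, content: str, models: dict) -> dict:
    """Run one contested case through several independent models.
    `models` maps a model name to a hypothetical classify function
    that returns a policy label for the content. Unanimous verdicts
    resolve automatically; splits are packaged for a human policy
    steward with the exact disagreement attached."""
    verdicts = {name: classify(content) for name, classify in models.items()}
    tally = Counter(verdicts.values())

    if len(tally) == 1:  # every model returned the same label
        return {"case": case_id, "route": "auto", "label": next(iter(tally))}

    # Disagreement is the diagnostic signal: record who split and how, so
    # the human probes the reasoning gap instead of re-reviewing content.
    return {
        "case": case_id,
        "route": "policy_steward",
        "split": verdicts,  # e.g. {"model_a": "violation", "model_b": "safe"}
        "note": "models disagree; check policy language vs. narrative context",
    }
```

The key design choice is that the human receives the split itself, not just the content, so their time goes into diagnosing why the models diverged.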

This is policy stewardship: a fundamentally different job, and a collaborative way of working with AI that the industry needs to evolve toward.

The value of human judgment in AI workflows

As AI is used more for content generation, the shape and scale of harms that models can produce will increase rapidly. This is a clear and present problem that humans are better equipped to detect. When adversarial users evade banned-word lists with subtle misspellings, humans adapt quickly and recognize intent, positioning them to set precedents for future automation.

Human input also brings critical value that AI cannot replicate:

  • Consistency and bias reduction: Diverse human evaluation teams, paired with structured processes and clear guidelines, help identify potential biases in AI outputs.
  • Recalibration: Human decisions adjust AI thresholds and rules, optimizing performance. If AI is over-indexed on removing a certain type of post, humans can adjust its sensitivity, balancing safety with free expression (see the sketch after this list).
  • Training data: Human decisions become training inputs, improving AI over time.
  • Empathy and urgency: Humans bring irreplaceable judgment on critical issues like child exploitation and imminent threats.
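To illustrate the recalibration point above, here is a minimal sketch that nudges an auto-removal threshold based on how often humans overturn the AI's removals on appeal. The target rate, step size and bounds are assumptions for the example.

```python
def recalibrate(threshold: float, overturn_rate: float,
                target: float = 0.05, step: float = 0.02) -> float:
    """Adjust an auto-removal confidence threshold from appeal outcomes.
    overturn_rate: share of AI removals that humans later reinstated.
    A high overturn rate means the model is over-indexed on removal,
    so we raise the bar; a low rate lets the bar come back down."""
    if overturn_rate > target:
        threshold = min(threshold + step, 0.99)
    elif overturn_rate < target:
        threshold = max(threshold - step, 0.50)
    return round(threshold, 2)

# 12% of removals in this category were reinstated on appeal,
# so the auto-removal threshold rises from 0.80 to 0.82.
print(recalibrate(0.80, 0.12))  # 0.82
```

Run periodically per policy category, a loop like this is one way human decisions feed back into model behavior without anyone hand-editing rules.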

The evolution of trust and safety teams

The net effect of the shift from human-in-the-loop to human-on-the-loop is a higher bar for trust and safety teams, requiring a growing number of highly skilled specialists rather than generalist moderators. As user-generated content grows exponentially and AI reviews and flags it at speed, the need for these specialists grows as they:

  • Review the quality of AI decisions
  • Process appeals
  • Remain plugged into the platform
  • Understand community context and communication nuances

At TELUS Digital, we are pioneering small teams of forward-deployed researchers and machine learning experts who do not apply policies at scale, but instead:

  • Analyze patterns and debug the system
  • Analyze cross-policy collisions where guidelines produce contradictory signals
  • Identify edge cases that will cause distribution shifts
  • Catch failure patterns before deployment

Our adversarial evaluation programs have shown us that effective trust and safety programs require strategic human expertise in three areas: global laws and regulations (to catch jurisdictional nuances), behavior management (to anticipate evasion tactics) and AI safety (to validate model reasoning).

The future: Human-AI collaboration, not replacement

The essential characteristics that make humans irreplaceable should be augmented with technologies that make them even more effective, ultimately leading to better AI models and safer communities.

The issue isn’t whether AI will replace human moderators. Rather, it’s that humans with specialized skills must continuously supervise AI to cultivate smarter and safer global communities. This involves evolving, not closing, the feedback loop to guide AI in the right direction. Furthermore, human oversight is crucial for informing strategy, policy and accountability teams about AI's efficacy and challenges.

With our experience in training policy models, TELUS Digital can help you in the transformation to human-on-the-loop. Our global and diverse community of subject matter and safety experts can provide the human intelligence loop necessary to build model reasoning, helping you steer the automation that powers platform integrity and content compliance. Connect with our team of experts to learn how we can help evolve your policy models.
