How to evaluate conversational AI for politeness

As generative AI and voice technology converge, the potential of conversational AI becomes increasingly apparent. Think: banks replacing branches with apps or hospitals replacing data entry with voice capture to power real-time care delivery.

To seize these use cases, businesses need a framework to evaluate conversational AI for politeness and other hard-to-measure dimensions of communication, such as empathy, attentiveness and compassion. Ultimately, these dimensions determine how well conversational AI meets users’ needs.

The more businesses turn to generative AI, the more they need a framework to evaluate responses for different conversational attributes. That way, they can monitor and fine-tune their generative AI systems to have more engaging, satisfying conversations with customers, thereby enhancing the overall customer experience.

Such a framework makes even more sense when we consider how subtle changes in prompts can substantially influence the behavior of large language models (LLMs) and the responses they generate. That makes prompt evaluation a crucial part of evaluating conversational attributes.

In this guide, you’ll learn:

How to build attribute classifiers for evaluating conversational AI
How to build and label datasets for testing these attribute classifiers
How to build a prompts dataset for measuring target attributes on different LLMs
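As a rough illustration of how these three pieces could fit together, here is a minimal Python sketch. It is not TELUS Digital's method: the marker lists, `politeness_score`, `LabeledExample`, `compare_prompts`, the canned responses and `fake_llm` are all hypothetical stand-ins. A real attribute classifier would be a trained model, and `fake_llm` would be replaced by calls to the LLMs under test.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical marker lists; a real classifier would be learned from labeled data.
POLITE_MARKERS = ("please", "thank you", "happy to help", "sorry")
IMPOLITE_MARKERS = ("dude", "whatever", "obviously")


@dataclass
class LabeledExample:
    """One row of a hand-labeled test set for the attribute classifier."""
    text: str
    polite: bool  # human judgment for the target audience


def politeness_score(text: str) -> float:
    """Toy stand-in for an attribute classifier; returns a score in [0, 1]."""
    lowered = text.lower()
    polite_hits = sum(marker in lowered for marker in POLITE_MARKERS)
    impolite_hits = sum(marker in lowered for marker in IMPOLITE_MARKERS)
    total = polite_hits + impolite_hits
    # With no markers at all, fall back to a neutral score.
    return 0.5 if total == 0 else polite_hits / total


def classifier_accuracy(examples: List[LabeledExample], threshold: float = 0.5) -> float:
    """Share of labeled examples where the classifier agrees with the human label."""
    correct = sum((politeness_score(e.text) > threshold) == e.polite for e in examples)
    return correct / len(examples)


def compare_prompts(prompts: Dict[str, str],
                    generate: Callable[[str], List[str]]) -> Dict[str, float]:
    """Average politeness score of the responses each prompt variant produces."""
    return {name: mean(politeness_score(r) for r in generate(prompt))
            for name, prompt in prompts.items()}


# Canned responses standing in for a real model (assumption for this sketch).
CANNED = {
    "You are a courteous insurance assistant.":
        ["Thank you for reaching out. Please share your policy number."],
    "You are a laid-back insurance buddy.":
        ["Dude, just send your policy number."],
}


def fake_llm(prompt: str) -> List[str]:
    return CANNED[prompt]


if __name__ == "__main__":
    prompts = {"formal": "You are a courteous insurance assistant.",
               "casual": "You are a laid-back insurance buddy."}
    print(compare_prompts(prompts, fake_llm))
```

The same `compare_prompts` loop could be run against several models by swapping the `generate` callable, which is one way to measure a target attribute across different LLMs with a single prompts dataset.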

Why we created a guide on evaluating conversational AI for politeness

How your conversational AI talks to users has a direct impact on customer satisfaction, retention and loyalty. Imagine an insurance company’s chatbot addressing customers as “dude” or an interactive educational app for children that speaks to them like Ph.D. students.

In each case, customers bring different expectations of politeness. “Dude” might be 100% appropriate language for a new auto insurance company aimed at young drivers. With another audience, the same word could cost the company its customers’ trust.

Businesses that understand this make continuous evaluation of their conversational AI systems a standard practice. This guide presents TELUS Digital's proven method for building, testing and implementing such a framework.

Learn how to measure hard-to-define attributes in conversational AI

Explore how particular attributes of conversation, even inherently nuanced dimensions such as politeness, can be monitored in generative AI systems.
