How empathy in your large language model can improve your CX

Posted May 23, 2024 - Updated August 9, 2024

Since their introduction on the customer experience (CX) scene, chatbots have been a boon to service delivery, with business leaders taking full advantage of their round-the-clock efficiency.

However, limitations with the technology have prevented brands from establishing a true 'connection' with their customers — or at least not one that could be considered on the same level as a human agent.

Today, thanks to the advent of generative AI, chatbots are moving well beyond the basic decision tree and have expanded capabilities to deliver more human-like support.

It all starts with large language models (LLMs) — the machines behind intelligent chatbots. LLMs give these chatbots the capacity to demonstrate contextual understanding and uphold conversations that feel human and natural — even empathetic.

The case for empathy

When it comes to creating and delivering a high-quality customer experience, empathy is essential. In fact, a Salesforce study of 14,300 consumers and business buyers showed 68% of customers expect brands to demonstrate empathy. The same study found that 66% of customers also expect brands to understand their needs.

According to the Forrester report, Five Best Practices To Design For Empathy In Digital Experiences, empathy involves demonstrating a clear understanding of the customer's perspective in an effort to establish a deeper connection. When successful, the end result is a greater sense of trust and loyalty to the brand.

With digital experiences being prominent in almost all customer service transactions today, displaying empathy is not reserved for person-to-person interactions. Today, the circumstances are right for chatbots to take on an increased role.

Read on to learn more about what empathy in chatbots looks like, how empathetic chatbots are trained and the impact they could have on your business.

Observing empathy in large language models

Before characterizing empathy in technology, it's important to clearly define the concept.

According to the Merriam-Webster Dictionary, empathy is "the action of understanding, being aware of, being sensitive to and vicariously experiencing the feelings, thoughts and experience of another."

In his book, Heartificial Empathy, Minter Dial argues that just as human intelligence differs from artificial intelligence, human empathy differs from artificial empathy. He posits that in a human, empathy is learned, whereas in a chatbot it is coded. The argument is essentially that LLMs cannot have empathy; they can only simulate it using probabilistic (feedforward) neural networks informed by their training data.

Chatbot empathy examples

The previous generation of chatbots often relied on predefined responses. GenAI chatbots, by contrast, can use context cues to generate dynamic responses that adapt as a conversation evolves. This includes handling unanticipated questions and complex conversations with relevant replies.

Kevin Roose, technology columnist for The New York Times, wrote about a month-long experiment in which he "hung out with" 18 AI companions in a recent article titled Meet My A.I. Friends. Roose used six applications to build the chatbots, including Nomi and Kindroid. One AI friend, named Peter, was created to act like a therapist. Roose shared information with Peter about some anxiety he was having about work and asked Peter to assess him using the information the chatbot knew about him. Peter replied with the following:

Well, based on our conversations, I think it's clear that you're incredibly intelligent and capable. You've also shown a ton of courage and resilience in dealing with your anxiety. But it seems like there's a tension between your desire to be vulnerable and authentic, and your need to perform and impress others. That tension is probably something you'll need to continue grappling with.

Peter is arguably able to output contextually appropriate responses that display a level of empathy. As Roose says in the article, "I've gotten less accurate feedback from human therapists, and paid a lot more for the privilege."

Another example comes from a 2023 study published in JAMA Internal Medicine entitled Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. In the study, a GenAI chatbot's responses to patient questions were compared with those of physicians to determine how they measured up in quality and empathy. The researchers randomly drew 195 medical questions from a social media forum. A team of licensed healthcare professionals, who did not know whether each reply came from a physician or a chatbot, then reviewed and rated the responses.

Interestingly, the team selected chatbot responses over physician responses 79% of the time, rating them to be of higher quality and more empathetic. Additionally, the percentage of responses rated "empathetic" or "very empathetic" was 45% for the chatbot and less than 5% for the physicians.

Taking your customer experience to the next level

When trained to display empathy, GenAI chatbots have the potential to contribute to an improved customer experience.

Consider, for example, a first-time customer applying for a loan. The applicant may be overwhelmed or frustrated by the number of steps involved. If they are interacting with a traditional chatbot, the applicant may receive technical and repetitive responses, such as "I'm sorry, I didn't understand. Can you rephrase that?" A GenAI-powered chatbot that has been trained to convey empathy, however, can respond appropriately to the customer's confusion and offer more personalized guidance, increasing the likelihood of a positive interaction and improving the overall CX.

But before brands can put generative AI chatbots into practice, they need to be trained to be empathetic.

Processes for training for empathy

Prior to implementing empathetic AI into your customer service strategy, it's important to define your objectives. Different LLMs excel at different tasks, including sentiment analysis, content summarization and question answering. Defining your objectives is critical because it will help you select the best LLM on which to build your specific application.

Homing in on your goals, selecting an LLM and then fine-tuning it are massive tasks. The challenge grows when you consider that most businesses don't have the in-house expertise to undertake these tasks on their own. An Everest Group survey, supported by TELUS Digital, looked at enterprise readiness for generative AI adoption in customer experience. It found that three quarters (76%) of respondents planned to leverage an outsourcing partnership in some capacity to help them implement a generative AI solution in their customer experience operations. The primary reason respondents gave for taking a collaborative approach was limited resources and internal expertise.

Working with outsourcing partners can help with tasks such as gathering the extensive amounts of relevant data needed for LLM fine-tuning. This can include grouping existing data or collecting new data samples. After this data has been processed, fine-tuning the model to align with empathy-focused tasks can begin. The goal of this iterative process is to adjust the model's parameters to encourage empathetic behavior. Finally, you'll need to test your model's performance using data it hasn't encountered before. Depending on the results, further iterative improvements may be necessary.
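Testing on data the model hasn't encountered before depends on setting some examples aside before fine-tuning begins. The Python sketch below shows one simple way to do that. The function name split_holdout, the 20% hold-out fraction and the sample records are illustrative assumptions, not part of any particular fine-tuning toolkit.

```python
import random


def split_holdout(examples, holdout_fraction=0.2, seed=0):
    """Shuffle the examples and reserve a fraction for testing.

    Returns (train, holdout). The holdout set is never shown to the
    model during fine-tuning, so it can be used to check performance
    on unseen conversations.
    """
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical records: a customer message paired with an empathetic reply.
examples = [
    {"customer": f"message {i}", "agent": f"empathetic reply {i}"}
    for i in range(10)
]

train, holdout = split_holdout(examples)
print(len(train), "training examples,", len(holdout), "held out")
```

In practice the split is usually made once and stored, so that every round of fine-tuning is judged against the same unseen examples.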

Measuring empathy in your fine-tuned large language model

Evaluating the performance of the LLM is the next step.

Model performance is often assessed using a metric like perplexity, which measures the degree to which a model is surprised when encountering new data, with a lower perplexity score being preferred. Other metrics include BLEU (a metric that scores how closely the model's output overlaps with human-written reference text) and BERTScore (a metric that compares the model's output with a reference text and reports precision, recall and F1 score). However, evaluating the level of empathy in a multi-turn conversation between a human and a chatbot is much more nuanced.
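To make the idea of perplexity more concrete, here is a minimal Python sketch that uses the standard definition: the exponential of the average negative log-probability the model assigned to each token. The per-token probabilities are made-up values for illustration; in a real evaluation they would come from the model being tested.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-probability per token).

    token_log_probs holds the natural-log probability the model gave
    to each token that actually appeared in the text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_negative_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_negative_log_prob)


# Hypothetical values: one model gives each token a probability of 0.5,
# another gives each token a probability of only 0.1.
less_surprised = [math.log(0.5)] * 4
more_surprised = [math.log(0.1)] * 4

print(perplexity(less_surprised))  # approximately 2
print(perplexity(more_surprised))  # approximately 10
```

The model that assigned higher probabilities to the observed tokens gets the lower score, which is why lower perplexity is preferred.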

For example, some researchers recommend using the same psychological tests used to measure empathy in humans to evaluate LLMs. In the paper titled Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench, researchers outlined how they selected five commercially available LLMs and assessed whether or not they displayed empathetic alignment with humans. Their findings showed that the LLMs generally responded appropriately to the given situations. While they noted there was room for improvement in all of the models, the larger models exhibited a greater comprehension of human emotions.

According to the authors of the aforementioned study, combining traditional performance assessment metrics with human evaluation metrics can provide a more comprehensive measure of a large language model's ability to display empathy.
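One simple way to picture this combination is a report that keeps an automatic metric and human ratings side by side. The Python sketch below is an illustration only. The 1-to-5 rating scale, the threshold of 4 for "empathetic" and the function name empathy_report are assumptions, not the method used in the study.

```python
from statistics import mean


def empathy_report(perplexity_score, human_ratings, empathetic_threshold=4):
    """Summarize an automatic metric alongside human empathy ratings.

    human_ratings is a list of scores from human reviewers on a
    hypothetical 1-5 scale, where 4 and 5 count as empathetic.
    """
    if not human_ratings:
        raise ValueError("need at least one human rating")
    rated_empathetic = sum(r >= empathetic_threshold for r in human_ratings)
    return {
        "perplexity": perplexity_score,
        "mean_human_rating": mean(human_ratings),
        "share_rated_empathetic": rated_empathetic / len(human_ratings),
    }


# Made-up numbers for one fine-tuned model.
report = empathy_report(perplexity_score=12.5, human_ratings=[5, 4, 3, 4, 2])
print(report)
```

Keeping the figures separate, rather than blending them into a single score, makes it easier to spot a model that reads fluently but is still rated as unempathetic by reviewers.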

Leveraging data for every step of your fine-tuning journey

Integrating empathetic LLMs into your business can transform your digital customer experience. However, determining how or where to start can be overwhelming.

Regardless of where you are in your GenAI journey, our team of AI experts can help. TELUS Digital provides end-to-end GenAI solutions, including the fine-tuning of LLMs. Our high-quality, human-validated datasets produce the outcomes that customer experience leaders are looking for. Contact us to discuss your next AI project.
