Translating Manual Scorecards Into AI-Driven Auto Scorecards

MiaRec explains how to transition from manual scorecards to AI-driven contact centre evaluations.

When evaluating your contact centre agents’ performance, you have to tread a fine line between judging how well someone has performed and remaining objective.

This can sometimes lead to supervisor bias (we tend to judge the people we like less harshly) or inconsistent scoring as we tire after evaluating dozens of calls.

The beauty of AI-driven Quality Management is that it never has a bias, never gets tired, and can score all your calls automatically, making it extremely scalable.

However, it cannot answer subjective and qualitative questions as accurately as a human can. It also has no insight beyond the information it is given.

AI-driven scorecards break down actions into measurable steps, making evaluations standardized and unbiased.

To achieve highly accurate results, you need to translate the manual evaluation questionnaire carefully into an AI-based scorecard. This transition involves several strategic adjustments to ensure objectivity, precision, and effective automation.

Here are key pieces of expert advice to guide this process:

1. Quantify and Standardize Evaluation Criteria

Manual scorecards often contain qualitative and subjective assessments. To translate them into an AI-based scorecard, you need to break your evaluation into quantifiable and standardized steps.

Focus on specificity over generality by replacing general questions (e.g., “Did the agent introduce themselves properly?”) with specific, measurable actions (e.g., “Did the agent state their first and last name?” and “Did the agent mention the company’s name?”).

Also, frame questions to yield binary responses wherever possible to eliminate ambiguity. Often, this means turning a How question (e.g., “How effectively did the agent restate the issue to confirm understanding?”) into a question that can be answered with Yes or No (“Did the agent restate the caller’s issue to confirm understanding?”).
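To make this concrete, here is a minimal sketch of how one section of such a scorecard could be expressed as structured data made up of binary questions. The field names are purely illustrative, not any particular product’s schema:

```python
# Minimal sketch of one scorecard section as structured data of binary questions.
# Field names are illustrative only, not any specific product's schema.
greeting_section = {
    "section": "Greeting",
    "questions": [
        # Specific, measurable actions instead of "Did the agent introduce themselves properly?"
        {"id": "greet_name", "text": "Did the agent state their first and last name?", "answers": ["Yes", "No"]},
        {"id": "greet_company", "text": "Did the agent mention the company's name?", "answers": ["Yes", "No"]},
        # A "How effectively..." question reframed as a Yes/No check.
        {"id": "restate_issue", "text": "Did the agent restate the caller's issue to confirm understanding?", "answers": ["Yes", "No"]},
    ],
}
```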

2. Rewrite Questions for Clarity and Precision

As anyone who has ever worked with ChatGPT, Gemini, or another Generative AI tool knows, it matters how you formulate your questions and requests.

Any prompt needs to be written clearly and precisely. The same applies here. Ensure your AI scorecard questions are clear and precise by removing any potential for misinterpretation. Let’s have a look at these two questions:

  • “Did the agent confirm the reason for calling?”
  • “Did the agent confirm the reason why the customer was calling?”

The second question is better than the first because it is more specific, clearer, and carries more context.

“Why the customer was calling” focuses on the customer’s intent. It helps the AI to look for specific interactions where the agent directly addresses the customer’s problem.

It also indicates that the agent should confirm the customer’s specific reason for contacting the centre. It reduces ambiguity by explicitly stating that the agent needs to understand the customer’s particular issue or need.

“The reason for calling,” on the other hand, is more general and can be interpreted in various ways, potentially leading to confusion or misinterpretation by the AI.
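The exact wording matters because the question text typically goes into the model’s prompt verbatim. As a rough sketch (the prompt format and sample transcript below are assumptions for illustration, not a specific vendor’s template):

```python
# Rough sketch: the question wording is inserted into the evaluation prompt verbatim,
# so a vaguer question produces a vaguer instruction for the model.
SAMPLE_TRANSCRIPT = (
    "Agent: Thank you for calling Acme, this is Jane Smith. How can I help?\n"
    "Customer: My last invoice was charged twice."
)

def build_prompt(transcript: str, question: str) -> str:
    return (
        "You are evaluating a contact centre call.\n"
        f"Call transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer Yes or No and quote the supporting line from the transcript."
    )

vague = build_prompt(SAMPLE_TRANSCRIPT, "Did the agent confirm the reason for calling?")
specific = build_prompt(SAMPLE_TRANSCRIPT, "Did the agent confirm the reason why the customer was calling?")
```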

3. Ask One Question At A Time

Manual evaluation questionnaires often include checklist-type questions, like “Did the agent verify the customer’s date of birth, street address, account number, and reason for calling?”

While the question itself is clear and objective, AI will often provide inconsistent results with questions like these because it needs to verify four different things, and the agent will fail if even one is not completed.

However, in reality, the AI will often respond with something like this: “Yes, the agent verified the customer’s date of birth but did not verify their street address, account number, or reason for calling.”

Or: “No, the agent did not verify the customer’s date of birth or street address. However, the agent did verify their account number and reason for calling.” In cases like these, the AI may still award points, resulting in a false positive.

Questions like these should be broken down into separate questions for AI. For example:

  • Question 1: Did the agent verify the customer’s date of birth?
  • Question 2: Did the agent verify the customer’s street address?
  • Question 3: Did the agent verify the customer’s account number?
  • Question 4: Did the agent verify the reason why the customer was calling in?

This way, you guarantee that the AI checks for every point on your checklist and significantly increase the probability of getting a correct result.
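A sketch of how that decomposition might look in practice, where `ask_model` stands in for whatever LLM call your QA platform makes and is assumed to return “Yes” or “No”:

```python
# Sketch: score the verification checklist one question at a time.
# `ask_model(transcript, question)` is a stand-in for your platform's LLM call,
# assumed to return "Yes" or "No".
VERIFICATION_QUESTIONS = [
    "Did the agent verify the customer's date of birth?",
    "Did the agent verify the customer's street address?",
    "Did the agent verify the customer's account number?",
    "Did the agent verify the reason why the customer was calling in?",
]

def score_verification(transcript: str, ask_model) -> dict:
    answers = {q: ask_model(transcript, q) for q in VERIFICATION_QUESTIONS}
    # Points are awarded only when every individual check passes, which avoids
    # the false positives a single compound question tends to produce.
    return {"answers": answers, "passed": all(a == "Yes" for a in answers.values())}
```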

4. Show the AI What “Good” Looks Like

Artificial Intelligence can identify a specific dog breed because it has seen and classified millions of pictures of dogs.

Similarly, AI needs clear indicators to evaluate performance. By adding specific examples of phrases or keywords that align with desired behaviors, you can help the AI make the right decision on whether your agent meets your quality criteria or not.

For example, the question “Did the agent greet the customer in a professional and friendly manner?” is highly subjective.

Provide the AI with phrases associated with specific actions. Here is an example: “Did the agent use phrases indicating politeness or positive engagement, such as ‘please,’ ‘thank you,’ ‘glad to help,’ ‘happy to assist,’ etc.?”

Another example is whether or not the agent expressed empathy. By adding examples such as “I’m sorry to hear that happened to you” or “I can see why this situation is frustrating for you” to your prompt, you provide clear decision guidelines.
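One way to capture this, sticking with the illustrative structure used earlier, is to attach the example phrases to the question definition and fold them into the prompt:

```python
# Sketch: attach example phrases to a question so the model has concrete
# indicators of what "good" looks like. Structure and wording are illustrative.
empathy_question = {
    "text": "Did the agent express empathy for the customer's situation?",
    "answers": ["Yes", "No"],
    "examples": [
        "I'm sorry to hear that happened to you",
        "I can see why this situation is frustrating for you",
    ],
}

def question_prompt(q: dict) -> str:
    examples = "; ".join(f'"{e}"' for e in q["examples"])
    return f'{q["text"]} Treat phrases such as {examples} as positive indicators.'
```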

5. Incorporate Context and Process Details

Many scorecards include questions like “Did the agent follow the processes provided?” The AI scorecard can only answer this accurately if it has all the relevant information.

In this case, it cannot make that decision based on the call recording transcript alone; it needs to know what the specific processes relevant to the evaluation look like.

If possible, describe the step in an example or description section to give the AI the required context, leading to more accurate and relevant assessments.
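For instance, a process description can be embedded directly in the prompt alongside the transcript. The refund steps below are invented purely for illustration:

```python
# Sketch: give the model the process it is checking against, rather than asking
# "Did the agent follow the processes provided?" with no reference material.
# The refund steps are invented for illustration.
REFUND_PROCESS = (
    "1. Verify the customer's identity (name and account number).\n"
    "2. Confirm the order number and the reason for the refund.\n"
    "3. State the expected refund timeframe (5-7 business days).\n"
)

def process_prompt(transcript: str) -> str:
    return (
        f"Refund handling process:\n{REFUND_PROCESS}\n"
        f"Call transcript:\n{transcript}\n\n"
        "Did the agent complete every step of the refund handling process above? "
        "Answer Yes or No and list any missing steps."
    )
```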

6. Allow for Contextual Flexibility

Not all customer conversations follow an exact script. In fact, most don’t. Some are disconnected, interrupted, or based on a misunderstanding.

In any case, to evaluate all relevant customer interactions accurately, we need to account for questions that simply do not apply to a given call and so avoid false positives or negatives. In other words, we need to include N/A options where necessary.
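Here is a small sketch of how an N/A answer might be kept out of the score, so a question that did not apply neither helps nor hurts the agent:

```python
# Sketch: allow an explicit "N/A" answer and exclude it from the score, so
# interrupted or off-script calls don't produce false negatives (or positives).
def compute_score(answers: list[str]) -> float | None:
    applicable = [a for a in answers if a != "N/A"]
    if not applicable:
        return None  # nothing on this scorecard applied to the call
    return sum(a == "Yes" for a in applicable) / len(applicable)

# One question did not apply to this call, so the score is based on the other three.
print(compute_score(["Yes", "No", "N/A", "Yes"]))  # ~0.67
```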

7. Optimize and Customize Prompts

Great prompts are rarely written in the first iteration. They usually need some optimizing and customizing to get exactly right.

Here is where a Prompt Designer is incredibly useful. A Prompt Designer allows you to write and test a prompt in your live environment without impacting your reporting or analytics. It is like a sandbox environment to try out the effectiveness of your prompts.

The neat thing is that you can apply all the tips above and run one prompt, tweak it a bit, run it again, and see how the results vary. This will yield much more accurate results tailored to your organization’s context.
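In spirit, that iteration loop amounts to something like the sketch below: run each wording over a handful of calls you have already scored by hand and keep whichever agrees with your human evaluations most often. Again, `ask_model` is a stand-in for the platform’s LLM call:

```python
# Sketch: compare two prompt wordings against a small, hand-labelled sample.
# `ask_model(transcript, question)` is a stand-in for your platform's LLM call.
def accuracy(question: str, labelled_calls: list[tuple[str, str]], ask_model) -> float:
    """labelled_calls is a list of (transcript, expected_answer) pairs."""
    hits = sum(
        ask_model(transcript, question) == expected
        for transcript, expected in labelled_calls
    )
    return hits / len(labelled_calls)

variant_a = "Did the agent confirm the reason for calling?"
variant_b = "Did the agent confirm the reason why the customer was calling?"
# e.g. best = max([variant_a, variant_b], key=lambda q: accuracy(q, labelled_calls, ask_model))
```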

8. Regularly Review and Update

Just as your manual scorecards should be reviewed and updated regularly, so should your AI-based ones. Based on feedback and evolving AI capabilities, continuously improve your AI prompts and processes. Our tip: Start with one section, refine it, and measure effectiveness before moving on to the next.

Conclusion

Although translating manual evaluation questionnaires into AI-based scorecards is relatively straightforward, it does require a systematic approach to ensure accuracy and objectivity.

You can create effective automated scorecards by quantifying actions, defining specific keywords, rewriting questions for clarity, and leveraging AI insights.

At the same time, regular reviews and updates ensure continuous improvement and alignment with evolving AI capabilities.

Stay tuned for one of our upcoming articles covering how to run scorecards conditionally against calls based on specific metadata.

For example, if you have just created a sales scorecard, you can run it against only your sales calls longer than two minutes. This way, you optimize your entire QA process end to end.

This blog post has been re-published by kind permission of MiaRec – View the Original Article

For more information about MiaRec - visit the MiaRec Website

About MiaRec

MiaRec is a global provider of Conversation Intelligence and Auto QA solutions, helping contact centres save time and cost through AI-based automation and customer-driven business intelligence.


Call Centre Helper is not responsible for the content of these guest blog posts. The opinions expressed in this article are those of the author, and do not necessarily reflect those of Call Centre Helper.

Author: MiaRec

Published On: 8th Aug 2024