Making Sure Your Chatbot Is Working As Intended? Scoring Metrics For Chatbots

I am an expert data scientist and statistician living and working in London with experience in multiple domains including (but not limited to): deep learning, natural language processing, recommender systems, statistical modelling and research design. I am running my own consultancy, and I can take up work with companies of all sizes. I am also offering education services in the areas of data science, AI, machine learning and blockchain through my company Tesseract Academy (http://tesseract.academy.). The seminal event is a half-day workshop taking place every few months, but we also provide in-house training services. Finally, I am also involved in the blockchain space and I have been an advisor to many ICOs. My main specialties include white paper review and modelling token economies.

CHATBOTS IN THE MODERN ERA

I really like chatbots. Recently I finished a project with Artelligen, where we had to create a chatbot that helps with compliance and can connect to different data sources. However, one question that always comes up is: how can you make sure that the chatbot is working as intended? This is an interesting topic, first and foremost because there is no straightforward answer.

In other text-related fields, like information retrieval, performance can be measured through metrics such as precision and recall. However, things are not as easy with chatbots. The reason is that chatbots can have various uses, so the goal can differ from business to business. Some example chatbot use cases are:

  1. Replacement for customer support.
  2. Personal assistants.
  3. Intelligent interfaces for more standard functionalities (e.g. checking the news or the weather).
  4. Replacement for professional services, such as medical or financial advice.
  5. Tools to improve sales, e.g. a chatbot salesman or a chatbot that makes smart recommendations.

If your goal is to replace a professional (e.g. a doctor), then you should be using metrics that measure how closely the chatbot simulates a human’s skills. In the doctor example, accuracy of prediction might be one such metric. Another good metric in this case would be the length of conversation. The shorter the conversation, the more efficient the chatbot.
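
As a rough sketch of these two metrics, the snippet below computes prediction accuracy and average conversation length over a handful of logged conversations. The `Conversation` record, its field names and the sample data are all hypothetical, made up for illustration; a real chatbot log will look different.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    # Hypothetical log record: the field names are assumptions for illustration.
    predicted: str   # what the chatbot concluded (e.g. a diagnosis)
    actual: str      # what a human professional concluded
    num_turns: int   # number of messages exchanged


def accuracy(conversations):
    """Fraction of conversations where the chatbot matched the professional."""
    correct = sum(1 for c in conversations if c.predicted == c.actual)
    return correct / len(conversations)


def average_length(conversations):
    """Mean number of turns per conversation."""
    return mean(c.num_turns for c in conversations)


# Made-up sample data, not real results.
logs = [
    Conversation("flu", "flu", 6),
    Conversation("cold", "flu", 10),
    Conversation("allergy", "allergy", 8),
    Conversation("cold", "cold", 4),
]

print("accuracy:", accuracy(logs))
print("average length:", average_length(logs))
```

Looking at the two numbers together matters here: a short conversation only counts as efficient if the chatbot also reached the right conclusion.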

USEFUL CHATBOT METRICS

However, in other cases, the length of conversation might not be related to performance in a straightforward manner. Let’s say that you built a chatbot that makes recommendations for clothes. A long conversation might mean that the user is engaged with the chatbot and wants to chat, or that the user is confused. Similarly, a short conversation might mean that the user lost interest, or it might mean that the chatbot is very efficient at making good recommendations.

Here are some metrics which you might want to consider:

  1. Length of conversation: Do you want a chatbot that engages with the user? Then the longer the better. Do you want a chatbot that simply delivers a service? Then the shorter the better.
  2. Confusion triggers: These are expressions the user might use to show that they are confused, e.g. “I don’t understand.” or “I want to restart the dialogue.”
  3. Sentiment analysis of the user’s dialogue: Angry words can clearly indicate that something is going wrong.
  4. Business-related metrics: E.g. if you are using your chatbot to make product recommendations to your customers, then the sales that took place because of the chatbot are one such metric.
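
To make the list above a bit more concrete, here is a minimal sketch that computes all four metrics for a single conversation. The trigger phrases, the list of angry words and the `made_purchase` flag are assumptions chosen for illustration; in practice you would probably replace the word list with a proper sentiment model and take the sales signal from your own systems.

```python
import re

# Hypothetical phrase and word lists, chosen only for illustration.
CONFUSION_TRIGGERS = ("i don't understand", "restart the dialogue", "what do you mean")
ANGRY_WORDS = {"useless", "terrible", "stupid", "annoying", "hate"}


def conversation_metrics(user_messages, made_purchase):
    """Compute simple per-conversation metrics from the user's messages."""
    # Lowercase and normalise curly apostrophes so the phrase matching is robust.
    normalised = [m.lower().replace("’", "'") for m in user_messages]

    # Number of messages that contain at least one confusion trigger.
    confusion_count = sum(
        1 for m in normalised if any(t in m for t in CONFUSION_TRIGGERS)
    )

    # Very crude sentiment proxy: the share of words found in the angry-word list.
    words = [w for m in normalised for w in re.findall(r"[a-z']+", m)]
    angry_count = sum(1 for w in words if w in ANGRY_WORDS)

    return {
        "length": len(user_messages),            # counts user messages only
        "confusion_triggers": confusion_count,
        "angry_word_ratio": angry_count / len(words) if words else 0.0,
        "converted": made_purchase,              # business-related metric
    }


messages = [
    "Show me a blue jacket",
    "I don't understand.",
    "This is useless",
]
metrics = conversation_metrics(messages, made_purchase=False)
print(metrics)
```

Aggregating these dictionaries over many conversations would let you see, for example, whether long conversations tend to go together with confusion triggers or with completed sales.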

These are just a few of the metrics you might want to take into account when using a chatbot. I expect that, over time, metrics will become more standardised for the different scenarios. So, one day we might get the chatbot equivalent of precision and recall in information retrieval.


Dr. Stylianos Kampakis
