Using machine learning to analyze customer support conversations

Conversation topic clustering

The importance of data science lies in its ability to tell a data-driven story, using insights to inform decision makers and produce better outcomes.

In this post, you will get a high-level understanding of how a data scientist might solve an open-ended, human-language problem: extracting important insights from customer service conversations. You will also see real-world results that came from applying this process with machine learning.

Can computers understand human language?

Imagine for a moment you are a data scientist working for the most promising conversation design startup in the world, and your wonderful Customer Support team has just forwarded you a massive list of support conversations between them and their customers. They hand it over to you and ask, “Can you let us know what the most frequently asked questions were?”

Sounds easy enough!

First, you start by finding the most common sentences and get this list:

After looking at this table, you come to a stunning realization: “My computer does not understand human language!” It may be able to tell that “Hello, can I have help?” and “May I have some assistance” contain different characters, but it does not recognize that they mean the same thing. With so many conversations, you could not possibly find every sentence with the same meaning by hand and merge their counts, so the questions your computer reports as most common are not necessarily the ones customers actually ask most.
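To make the problem concrete, here is a minimal sketch of exact-match counting. The messages are invented for illustration and are not taken from the real support data.

```python
from collections import Counter

# Hypothetical messages, invented for illustration.
messages = [
    "Hello, can I have help?",
    "May I have some assistance",
    "Hello, can I have help?",
    "How much does the Pro plan cost?",
]

# Exact-match counting: two strings only count as the same question
# if every character is identical.
counts = Counter(messages)

for sentence, n in counts.most_common():
    print(n, sentence)
```

The two requests for help end up in separate rows, even though a human reader would merge them without a second thought.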

This brings forth a difficult question: “How do I get my computer to understand when two or more phrases have similar meanings?” Of course, being the intelligent data scientist you are, you know the answer to this problem is document embeddings!

What is a document embedding?

You can think of a document embedding as a translation from a document (like a sentence, a paragraph, or multiple paragraphs) in human language to a vector of numbers that a computer can process. These numbers are determined by machine learning embedding models, such as BERT and the Universal Sentence Encoder (USE). You can think of these models as translators from human language to numbers.

High-level diagram depicting what embedding models do.

The ability to translate documents to vectors is useful because once we have vectors, we can calculate document similarity using metrics like Euclidean distance and cosine similarity. This means we have a quantitative, automated way of finding out whether multiple sentences have a similar meaning. Eureka!
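Here is a small sketch of those two metrics in plain Python. The three-dimensional vectors are made up for illustration. Real embedding models produce vectors with hundreds of dimensions, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (0.0 means identical)."""
    return math.dist(a, b)  # available in Python 3.8+

# Made-up "embeddings": two requests for help and one pricing question.
help_1 = [0.9, 0.1, 0.2]
help_2 = [0.8, 0.2, 0.1]
pricing = [0.1, 0.9, 0.3]

print(cosine_similarity(help_1, help_2), cosine_similarity(help_1, pricing))
print(euclidean_distance(help_1, help_2), euclidean_distance(help_1, pricing))
```

With these toy numbers, the two help requests score a higher cosine similarity and a smaller Euclidean distance with each other than either does with the pricing question, which is the behavior we want from real embeddings.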

Back to our thought experiment as a data scientist working for an excellent conversation design company: we would take all of our customer support conversations and translate them into vectors using a machine learning embedding model. Now we have a whole lot of vectors, but we need to group the ones with similar meanings together. How should we do that?

Semi-supervised learning using BERTopic

Thankfully, being an astute data scientist who is up to date with the latest developments in document clustering, you are well aware of the phenomenal BERTopic open source software package by fellow data scientist Maarten Grootendorst.


This package takes your documents, converts them into vectors using machine learning embedding models, groups similar documents together using unsupervised clustering techniques, and then tells you which words are the most important or indicative for a given cluster.
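In code, the steps described above can be sketched roughly as follows. This uses BERTopic's default settings, which is an assumption since the exact configuration behind the results in this post is not shown, and `fit_topics` and `conversations` are hypothetical names.

```python
def fit_topics(docs):
    """Fit BERTopic with default settings on a list of strings.

    Returns the fitted model and one topic id per input document.
    """
    # Imported inside the function so the heavy dependency is only
    # needed when the function is actually called.
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, _ = topic_model.fit_transform(docs)
    return topic_model, topics

# Usage (requires `pip install bertopic` and a reasonably large corpus):
# topic_model, topics = fit_topics(conversations)
# print(topic_model.get_topic_info())  # one row per topic with its size
# print(topic_model.get_topic(0))      # (word, score) pairs for topic 0
```

The `get_topic` call is what surfaces the most indicative words for a cluster.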

After running your conversations through BERTopic, you can finally report your results to your Customer Support team, who will use this to make your company's customers even happier.


As you may have suspected, the hypothetical data scientist was actually me working for Voiceflow, and below I will detail my results working with BERTopic, and the insights I was able to extract.

After doing some pre-processing (like removing certain pre-written responses, and prioritizing exchanges started by customers) we get the following results:
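The pre-processing mentioned above might look something like this sketch. The data layout (a list of `(sender, text)` tuples per conversation), the canned reply, and the function name are all assumptions made for illustration. Here "prioritizing" customer-started exchanges is simplified to keeping only those.

```python
# Hypothetical canned reply; the real list of pre-written responses
# is not given in this post.
CANNED_RESPONSES = {
    "Thanks for reaching out! We'll get back to you shortly.",
}

def preprocess(conversations):
    """Return one text document per customer-initiated conversation.

    Each conversation is assumed to be a list of (sender, text) tuples,
    where sender is either "customer" or "agent".
    """
    docs = []
    for messages in conversations:
        # Keep only exchanges that the customer started.
        if not messages or messages[0][0] != "customer":
            continue
        # Drop pre-written responses, which carry no topic information.
        kept = [text for _, text in messages if text not in CANNED_RESPONSES]
        if kept:
            docs.append(" ".join(kept))
    return docs
```

Removing canned replies matters because identical boilerplate would otherwise pull unrelated conversations toward each other in embedding space.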

Topics Found

A total of 65 clusters were found, the top 12 of which can be seen below as described by their most significant words. The topics are sorted in order of frequency of appearance.

Wow, you can already see some general themes: topic 0 is when clients report bugs, topic 1 is when clients are asking for a feature, and so on!

Most popular topics

Below we can see the degree to which some topics are more popular than others, with reporting a bug having the most occurrences.

NOTE: n = 3719. The largest group was actually unclassified: around 1000 of the roughly 3500 conversations considered were not assigned to any topic.
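BERTopic labels documents it cannot place in any cluster with the topic id -1, so the unclassified share can be read straight off the topic assignments. A minimal sketch, using toy assignments rather than the real results:

```python
from collections import Counter

def topic_shares(topics):
    """Map each topic id to its fraction of all documents, largest first."""
    counts = Counter(topics)
    total = len(topics)
    return {topic: count / total for topic, count in counts.most_common()}

# Toy topic assignments, invented for illustration; -1 means unclassified.
topics = [-1, 0, 0, 1, -1, 0, 2, -1]

shares = topic_shares(topics)
print("unclassified share:", shares.get(-1, 0.0))
```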

Representative Documents

Representative documents are conversations that are highly representative of a particular topic. These are what the models consider the prime examples of each topic.

This gives us great insight into how each cluster is created, which can help us better understand them.

Even more important, it provides literal examples of frequently asked questions.
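With a fitted BERTopic model, these examples can be pulled out with `get_representative_docs`. A small sketch, where the helper name and the choice of topic ids are hypothetical:

```python
def representative_examples(topic_model, topic_ids):
    """Collect the representative documents for each requested topic id."""
    return {
        topic_id: topic_model.get_representative_docs(topic_id)
        for topic_id in topic_ids
    }

# Usage, assuming `topic_model` is an already fitted BERTopic model:
# for topic_id, docs in representative_examples(topic_model, range(5)).items():
#     print(topic_id, docs)
```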

Here are some examples of the representative documents of the top 5 topics:

Topics Over Time

Below we can see how topic frequencies changed over time, and this can give us an idea of how customer demands have been changing.

Top 4 most frequent topics over time.
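One simple way to get counts like these is to bucket the topic assignments by month. The sketch below uses only the standard library and invented dates, and it is not necessarily how the chart above was produced. BERTopic also provides its own `topics_over_time` method, so check its documentation for the arguments your installed version expects.

```python
from collections import Counter, defaultdict
from datetime import date

def monthly_topic_counts(topics, dates):
    """Count how often each topic appears in each (year, month) bucket."""
    counts = defaultdict(Counter)
    for topic, day in zip(topics, dates):
        if topic == -1:  # skip unclassified conversations
            continue
        counts[(day.year, day.month)][topic] += 1
    return dict(sorted(counts.items()))

# Toy data, invented for illustration.
topics = [0, 0, 1, -1, 0]
dates = [
    date(2022, 1, 5),
    date(2022, 1, 20),
    date(2022, 1, 9),
    date(2022, 2, 1),
    date(2022, 2, 3),
]

for month, counter in monthly_topic_counts(topics, dates).items():
    print(month, dict(counter))
```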

Increasing Granularity

Some topics are a bit general, which makes it hard to extract specific questions from them. In these cases, we can run further clustering on a topic to get more specific information. Below is an example for topic 9.


Running further clustering on this, we get two more topics:

Two sub-clusters found in order of frequency.
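The sub-clustering step can be sketched as: select the documents assigned to one topic, then fit a fresh model on just that subset. The helper names are hypothetical and default BERTopic settings are assumed. On a small subset you may need to lower `min_topic_size` for any clusters to form.

```python
def docs_in_topic(docs, topics, topic_id):
    """Return the documents that were assigned to the given topic."""
    return [doc for doc, topic in zip(docs, topics) if topic == topic_id]

def subcluster(docs, topics, topic_id):
    """Fit a second BERTopic model on the documents of a single topic."""
    # Imported inside the function so the heavy dependency is only
    # needed when the function is actually called.
    from bertopic import BERTopic

    subset = docs_in_topic(docs, topics, topic_id)
    sub_model = BERTopic()  # consider BERTopic(min_topic_size=...) for small subsets
    sub_topics, _ = sub_model.fit_transform(subset)
    return sub_model, subset, sub_topics

# Usage, assuming `docs` and `topics` come from the first clustering run:
# sub_model, subset, sub_topics = subcluster(docs, topics, topic_id=9)
# print(sub_model.get_topic_info())
```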

Representative Documents

It seems the sub-clustering identified two pain points: issues with copying and pasting, and issues with flows.


Thanks to the amazing power of document embedding and unsupervised clustering, we were able to find numerous frequently asked questions for our product.

Customers can often be found asking for pricing, reporting bugs, or asking about building chatbots. With this knowledge, the customer success team can help create robust documentation to help the customers, as well as work with the engineering and design teams to help improve Voiceflow. Wow, the power of data insights!

Future Work

Believe it or not, even more useful information can be extracted out of this data!

This includes:

  • Increasing granularity on more topics to find more frequently asked questions
  • Running sentiment analysis on the conversations to find in which clusters customers had their best and worst experiences
  • Checking to see which conversation clusters hold more subjective or objective problems (using machine learning) to find out which questions may be more difficult to answer
