How to classify customer support tickets using Naive Bayes

In ecommerce, customer service staff are among the busiest people in the organisation, handling hundreds of tasks every day, often simultaneously. However, CS managers can get so bogged down in handling their team’s hefty workloads that they have little time left to spend on improving their team’s efficiency.

This means that as the site traffic goes up, the team just gets busier and busier. The service for customers deteriorates, staff leave because of the workload, and the headcount needs to go up to handle the pressure.

Support ticket classification

One common approach to improving customer service efficiency is support ticket classification. By getting the customer service team to classify the tickets, it’s possible to analyse why customers are making contact so problems can be fixed, content clarified, and processes automated to allow customers to self-serve.

In the long term, this means that CS staff have less work to do, service quality goes up, staff don’t leave as often, and headcount doesn’t need to be increased.

The other really useful thing you can do once support tickets are classified is to allocate them to specialists within the team. For example, technical issues, returns, and courier issues could each be handled by specific team members to improve efficiency.

Classifying tickets manually is fine, but it’s time-consuming to do properly, so automating the process is better. In this project, I’ll cover the basics of building a model to classify support tickets using Natural Language Processing, and a model called Multinomial Naive Bayes.

Load the packages

Open a Jupyter notebook and import the packages below. If you don’t have pandas, numpy, or scikit-learn (which is imported as sklearn), you can install them by entering pip3 install package-name into your terminal, for example pip3 install scikit-learn.

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score

Load the data

Next, load up your dataset of customer support tickets. Most customer service platforms allow you to download the data on your tickets, but if you don’t have a dataset you can use this anonymised support ticket dataset from Microsoft.

df = pd.read_csv('all_tickets.csv')

After loading the data, I fill in the NaN values with a placeholder so the later steps can run, join the title and body columns together into a single text field, and then drop the original columns.

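# Replace missing values with a placeholder so string concatenation works
df = df.fillna(value='x')
# Combine the title and body into a single text field
df['text'] = df['title'] + ' ' + df['body']
# Drop the original columns now that they are merged
df = df.drop(columns=['title', 'body'])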

df.head()

	ticket_type	category	sub_category1	sub_category2	business_service	urgency	impact	text
0	1	4	2	21	71	3	4	x hi since recruiter lead permission approve r...
1	1	6	22	7	26	3	4	connection with icon icon dear please setup ic...
2	1	5	13	7	32	3	4	work experience user work experience user hi w...
3	1	5	13	7	32	3	4	requesting for meeting requesting meeting hi p...
4	1	4	2	76	4	3	4	reset passwords for external accounts re expir...
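
One side effect of filling with 'x' is that the placeholder ends up in the text itself, as in row 0 above. The sketch below shows an alternative on a tiny made-up DataFrame (the toy data and variable names are my own, not part of the dataset): filling with an empty string and stripping the leftover space. Treat it as one possible variation rather than the approach used in the rest of this post.

```python
import pandas as pd

# Made-up rows with a missing title and a missing body (illustrative only)
toy = pd.DataFrame({
    "title": [None, "reset passwords"],
    "body": ["hi please approve", None],
})

# Without filling, concatenating a string with a missing value gives NaN
unfilled = toy["title"] + " " + toy["body"]
print(unfilled.isna().tolist())

# Filling with an empty string avoids adding a placeholder token to the text
filled = toy.fillna("")
filled["text"] = (filled["title"] + " " + filled["body"]).str.strip()
print(filled["text"].tolist())
```

With the 'x' placeholder, the vectorizer would simply ignore it anyway, because its default tokenizer only keeps tokens of two or more characters.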

Select your target variable

There are quite a few columns in this dataset that you could classify with your model. To keep things simple, we’ll select just one of them for now - the ticket_type column. This contains two ticket types: 1 with 34,621 tickets assigned and 0 with 13,928 assigned.

df.ticket_type.value_counts()

1    34621
0    13928
Name: ticket_type, dtype: int64
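
Because the classes are imbalanced, it helps to know what accuracy a model would get by always predicting the most common ticket type. The short sketch below works that out from the counts shown above, which gives a baseline to compare any model against. The variable names are my own.

```python
# Class counts copied from the value_counts() output above
counts = {1: 34621, 0: 13928}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a "model" that always predicts the majority class
baseline_accuracy = counts[majority_class] / total

print(f"Total tickets: {total}")
print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

Any classifier worth using on this column needs to beat roughly 71% accuracy.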

Process the data

There are several text processing techniques that can be used to convert text into numeric representations that can be interpreted by a model. I’ve used count vectorization. This “bag of words” approach builds a vocabulary of the unique words found across all the tickets, then represents each ticket as a vector containing the number of times each of those words occurs in it.

count_vec = CountVectorizer()
bow = count_vec.fit_transform(df['text'])
bow = np.array(bow.todense())

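We then convert this bag of words to a dense NumPy array using todense(). This step is optional, since MultinomialNB also accepts the sparse matrix that fit_transform() returns, and on a large vocabulary the sparse form uses far less memory.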
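
To make the bag of words idea concrete, here is a minimal sketch on three made-up ticket snippets (the sentences and variable names are illustrative assumptions, not taken from the dataset). It prints the learned vocabulary and the count matrix, where each row is a ticket and each column is a word.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up ticket snippets
docs = [
    "reset password please",
    "password reset not working",
    "please setup new laptop",
]

toy_vec = CountVectorizer()
toy_bow = toy_vec.fit_transform(docs)  # returns a SciPy sparse matrix

# Vocabulary terms ordered by their column index
vocab = sorted(toy_vec.vocabulary_, key=toy_vec.vocabulary_.get)
print(vocab)

# Dense view is fine here because the example is tiny
print(toy_bow.toarray())
```

Word order is discarded: "reset password" and "password reset" produce the same counts, which is the main simplification this representation makes.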

Define X and y

Now that we’ve created the bow containing our count vectors, we’ll assign this to X to use as our feature set, and set y to the ticket_type column we want our model to predict. If you want to create models for the other columns, you’ll obviously need to change y accordingly.

X = bow
y = df['ticket_type']

Split the test and training data

With X and y created, we can now use the train_test_split() function to split off our training data from our test data. I’ve assigned 30% of the data to the test dataset and have used the remaining 70% of the data for training purposes.

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.3, 
                                                    stratify=y)
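
The stratify=y argument keeps the class proportions the same in both splits, which matters with imbalanced labels like these. The sketch below illustrates this on made-up labels with a similar 71/29 imbalance. The toy arrays and the random_state value are my own assumptions, added so the example gives the same split every time it runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels with roughly the same imbalance as ticket_type
y_toy = np.array([1] * 710 + [0] * 290)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    test_size=0.3,
    stratify=y_toy,    # preserve the class ratio in both splits
    random_state=42,   # fixed seed so the split is repeatable
)

print(len(y_tr), len(y_te))
# Share of class 1 in each split should be close to 0.71
print(y_tr.mean(), y_te.mean())
```

Setting random_state on the real split would also make the scores later in this post repeatable between runs.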

Fit a Multinomial Naive Bayes model

Finally, we can fit our Multinomial Naive Bayes model to the X_train and y_train data, and then we’ll make some predictions on the unseen X_test data and assign them to y_pred. Multinomial Naive Bayes is fast, fairly simple, and works really well on text data, which is why it’s commonly used for spam classification.

model = MultinomialNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
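
Under the hood, Multinomial Naive Bayes estimates, for each class, how likely each word is, using smoothed word counts. The sketch below recomputes those smoothed log probabilities by hand on a tiny made-up count matrix and checks them against what scikit-learn stores in feature_log_prob_. The toy data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up count matrix: 4 documents, 3 vocabulary terms
X_toy = np.array([[2, 1, 0],
                  [1, 2, 0],
                  [0, 1, 3],
                  [0, 0, 2]])
y_toy = np.array([0, 0, 1, 1])

alpha = 1.0  # Laplace smoothing, the scikit-learn default
nb = MultinomialNB(alpha=alpha).fit(X_toy, y_toy)

# Total count of each term within each class
class_counts = np.array([X_toy[y_toy == c].sum(axis=0) for c in (0, 1)])

# Smoothed estimate: (count + alpha) / (class total + alpha * vocabulary size)
n_terms = X_toy.shape[1]
manual = np.log(
    (class_counts + alpha)
    / (class_counts.sum(axis=1, keepdims=True) + alpha * n_terms)
)

print(np.allclose(manual, nb.feature_log_prob_))
```

The smoothing term is what stops a word that never appeared in a class from forcing that class’s probability to zero.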

Assess model performance

To assess the performance of the model we’ll use accuracy and the macro-averaged F1 score, along with a classification report. For a quick first attempt the results are good, with 97.9% accuracy and a macro F1 score of 0.975 on the test data. With some further tweaks and tuning it should be possible to make it even better.

print('Accuracy:', accuracy_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred, average="macro"))

Accuracy: 0.9790593889461037
F1 score: 0.9748273745771491

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.94      0.99      0.96      4178
           1       1.00      0.97      0.99     10387

    accuracy                           0.98     14565
   macro avg       0.97      0.98      0.97     14565
weighted avg       0.98      0.98      0.98     14565
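
The macro F1 score is the unweighted mean of the per-class F1 scores, which is why the 0.96 and 0.99 in the report average out to roughly 0.975. The sketch below shows this on a small set of made-up labels and predictions, alongside a confusion matrix, which is often the quickest way to see where a classifier goes wrong. The label lists are illustrative assumptions, not results from the ticket model.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Made-up true labels and predictions
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
y_hat = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)

# F1 for each class separately, then the macro average
per_class = f1_score(y_true, y_hat, average=None)
macro = f1_score(y_true, y_hat, average="macro")

print(per_class)
print(macro, per_class.mean())
```

On imbalanced data, macro F1 gives the smaller class as much weight as the larger one, so it is a stricter check than accuracy alone.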

With a bit more work, and the creation of other models to classify the other columns, it would be possible to create a web service which takes the text within a support ticket and then returns the classifications for each value, thereby automating the process of ticket classification and speeding things up for the CS team.
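
As a rough idea of what the prediction side of such a service might look like, here is a minimal sketch that wraps the vectorizer and model in a scikit-learn Pipeline, so raw ticket text can be passed straight in. The training sentences, labels, and the classify_ticket() helper are all hypothetical and only there to make the example runnable. A real service would load a model trained on the full dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only
train_texts = [
    "cannot login password reset",
    "password expired reset account",
    "request new laptop order",
    "order new monitor request",
]
train_labels = [0, 0, 1, 1]

# The pipeline vectorizes the text and then applies Naive Bayes
ticket_clf = make_pipeline(CountVectorizer(), MultinomialNB())
ticket_clf.fit(train_texts, train_labels)


def classify_ticket(text):
    """Hypothetical helper: return the predicted label and its probability."""
    probs = ticket_clf.predict_proba([text])[0]
    best = probs.argmax()
    return {
        "ticket_type": int(ticket_clf.classes_[best]),
        "probability": float(probs[best]),
    }


result = classify_ticket("please reset my password")
print(result)
```

Using a pipeline also means the vocabulary is learned only from the data passed to fit(), so new tickets are always encoded with the same columns the model was trained on.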

The same sort of technique has also been applied to identify tickets that are likely to require escalation to a manager (Montgomery and Damian, 2017), since escalations are particularly expensive for businesses. While you have access to the data, it’s also worth examining sentiment by ticket type via sentiment analysis (Werner et al., 2018).