How to create a neural network for sentiment analysis

Learn how to use a recurrent neural network and the Long Short-Term Memory model to analyse sentiment with Keras and TensorFlow in Python.

Sentiment analysis, or opinion mining, is a form of emotion AI and uses natural language processing and computational linguistics to analyse text and infer the sentiment. Sentiment analysis has loads of practical uses.

If you use Grammarly, you may have noticed it predicting the tone of your emails when you reply, which is a great way to ensure they aren’t misinterpreted. I recently used sentiment analysis to examine product and service reviews from a range of online retailers to identify the service and product issues that matter most to consumers. It’s very powerful.

In the tutorial below, we’ll be training a recurrent neural network model using Keras and TensorFlow. Keras is a Python package for deep learning that runs on top of a number of backends and makes interfacing with Google’s TensorFlow deep learning platform a little easier. TensorFlow is really doing the hard work here: it’s an immensely powerful system, widely used in business and research, as well as behind the scenes at Google.

TensorFlow can be used to run a range of models, but the one we’ll be using is Long Short-Term Memory or LSTM, which is a type of recurrent neural network or RNN.

Install Keras and TensorFlow and load the packages

Before you start, you’ll probably need to install Keras and TensorFlow, which you can do by entering the commands pip3 install keras and pip3 install tensorflow. This may take a short while as the TensorFlow libraries come in at around 300MB. If you have an Nvidia GPU, I’d recommend installing the tensorflow-gpu package instead of the bog standard TensorFlow as it runs much, much faster.
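
For reference, the install commands look like this (swap tensorflow for tensorflow-gpu if you have a suitable Nvidia GPU):

pip3 install keras
pip3 install tensorflow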

Once they’re installed, load up the packages we’ll be using. We’ll be using the Long Short-Term Memory (LSTM) layer from keras.layers, plus the Dense and Embedding layers to handle the text analysis, the Sequential model, the sequence preprocessing module and the built-in imdb dataset.

from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential
from keras.preprocessing import sequence
from keras.datasets import imdb

Load the data

Next, we’ll load up the data. Rather helpfully, Keras comes with some built-in datasets. We’re using the IMDB movie review sentiment classification dataset. This contains 25,000 movie reviews from the IMDB website for training and a further 25,000 for testing, each labelled with its sentiment (positive or negative). The model will examine the text in the training data and learn which characteristics define positive or negative sentiments.

The really handy thing about the IMDB data set provided in Keras is that the data have already been preprocessed. Before you give text to an RNN, you need to preprocess it to turn it into numeric data.

Rather than containing the actual text of the reviews, the dataset therefore contains sequences of integer word indices that can be used by the neural network, and rather than the usual Pandas dataframe, the load_data() function returns a tuple of Numpy arrays. If we set the num_words argument, we limit the vocabulary to the most frequent words, which saves time.

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=20000)

If you print the X_train data, you’ll see that it just contains lists of numbers. Each number represents a word, indexed by how frequently it appears in the dataset, so lower numbers broadly correspond to more common words. Any words outside the num_words vocabulary are replaced with a special “unknown” token, while zero is reserved for padding. The labels stored in y_train are just 1s and 0s denoting whether the sentiment of the text was positive or negative.
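
To see this for yourself, you can print the first encoded review and its label; this is just a quick sanity check of what load_data() returns.

print(X_train[0][:10])  # the first ten word indices of the first review
print(y_train[0])       # the sentiment label: 1 for positive, 0 for negative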

As recurrent neural networks can take a long time to train, and this dataset is fairly large, we can use the Keras preprocessing sequence package’s pad_sequences() function to modify the data and speed things up. The pad_sequences() function makes all of the sequences the same length by padding them with zeros (at the beginning, by default). The maxlen argument truncates any sequences that are over a particular length. We’ll limit our sequences to 100 tokens to see if this improves the speed.

X_train = sequence.pad_sequences(X_train, maxlen=100)
X_test = sequence.pad_sequences(X_test, maxlen=100)
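
If you check the shape of the arrays after padding, you should see 25,000 rows of 100 tokens in each set, confirming that every review is now the same length.

print(X_train.shape)
print(X_test.shape)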

Create the neural network

Now that we have our data, which has rather helpfully been preprocessed for us, we can move on to creating the neural network. The specific neural network we’re using to analyse review sentiment is a recurrent neural network called LSTM or Long Short-Term Memory. This model is really useful and can be used for a variety of things, including time series analysis, anomaly detection and speech and handwriting recognition.

First, we’ll create the Sequential model and add the embedding layer. This maps each integer word index to a dense vector of a fixed size, which gives the network a much richer representation of each word than the raw index. The embedding layer has a vocabulary size of 20,000 words (because that’s the num_words argument we passed when we loaded up the data), while the 128 value sets the size of each word’s embedding vector.

We then add the LSTM layer, set the dropout rates and finally use a Dense layer with a sigmoid activation, which outputs a probability between 0 and 1 that we can interpret as negative or positive sentiment.

model = Sequential()
model.add(Embedding(20000, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
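
At this point it’s worth calling model.summary(), which prints the layers and their parameter counts and is a handy way to confirm the architecture before training.

model.summary()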

Compile the model

The next step is to use compile() to determine how we run the model. Just as in scikit-learn, Keras lets you define the settings the model uses. There are three main things to configure with the compile() function: the losses, the metrics and the optimizers.

Losses, or loss functions, tell the model what it should try to reduce during the training process. For regression problems that might be the mean squared error or mean absolute error, while for classification and other probabilistic problems, like this one, you’d use something like binary cross-entropy, categorical cross-entropy or the Poisson loss.

We’ll use binary_crossentropy here. Keras supports several different optimizers, which are suited to different tasks; we’ll stick with adam, though alternatives such as nadam are worth trying and can sometimes perform a little better. Finally, as with scikit-learn, you can also define the metric by which you judge your model’s performance - accuracy is fine for this task.

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
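
If you want to experiment with the nadam optimizer mentioned above, you can simply swap the optimizer string when compiling; this is just an illustrative variant and your results may differ.

model.compile(loss='binary_crossentropy',
              optimizer='nadam',
              metrics=['accuracy'])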

Fit the model

With everything now set up and ready to go, we can fit our model to the training data. The batch_size argument tells the model how many samples to propagate through the network before the weights are updated, while the epochs argument tells Keras how many complete passes over the training data to make. With 25,000 training reviews and a batch size of 32, each epoch works through 782 batches, which is the 782/782 you’ll see in the training log below.

The verbose argument prints out the results as they happen, which is handy, because this is going to take a while to run. On my overclocked 4GHz Ryzen 3700X data science workstation with 64GB of RAM this takes just over 10 minutes to complete.

model.fit(X_train, y_train,
          batch_size=32,
          epochs=10,
          verbose=2,
          validation_data=(X_test, y_test))
    Epoch 1/10
    782/782 - 64s - loss: 0.4305 - accuracy: 0.7977 - val_loss: 0.3509 - val_accuracy: 0.8442
    Epoch 2/10
    782/782 - 63s - loss: 0.2433 - accuracy: 0.9036 - val_loss: 0.4023 - val_accuracy: 0.8279
    Epoch 3/10
    782/782 - 63s - loss: 0.1547 - accuracy: 0.9427 - val_loss: 0.4512 - val_accuracy: 0.8388
    Epoch 4/10
    782/782 - 63s - loss: 0.1077 - accuracy: 0.9623 - val_loss: 0.6039 - val_accuracy: 0.8344
    Epoch 5/10
    782/782 - 63s - loss: 0.0766 - accuracy: 0.9731 - val_loss: 0.6015 - val_accuracy: 0.8319
    Epoch 6/10
    782/782 - 63s - loss: 0.0499 - accuracy: 0.9835 - val_loss: 0.6356 - val_accuracy: 0.8315
    Epoch 7/10
    782/782 - 63s - loss: 0.0445 - accuracy: 0.9844 - val_loss: 0.7427 - val_accuracy: 0.8304
    Epoch 8/10
    782/782 - 63s - loss: 0.0319 - accuracy: 0.9895 - val_loss: 0.7467 - val_accuracy: 0.8119
    Epoch 9/10
    782/782 - 63s - loss: 0.0243 - accuracy: 0.9924 - val_loss: 0.8662 - val_accuracy: 0.8254
    Epoch 10/10
    782/782 - 63s - loss: 0.0176 - accuracy: 0.9943 - val_loss: 0.9792 - val_accuracy: 0.8232
    <tensorflow.python.keras.callbacks.History at 0x7fe551a5f1f0>

Evaluate the results

Assuming your computer didn’t catch fire under the strain, you should now have seen the results of each epoch appear on your screen, showing you the loss, the accuracy, the val_loss and the val_accuracy for each round.

The training accuracy tends to keep climbing with each epoch while the validation scores plateau or drift, so to assess the overall performance we evaluate the trained model on the held-out test data. Keras includes an evaluate() function which makes this very straightforward. Simply pass it the X_test and y_test data and it will return the loss and accuracy.

We get an overall accuracy of 0.8232 and a loss of 0.9792, which looks pretty good for a first attempt, although the way the validation loss rises over the epochs suggests the model is overfitting. You can try tweaking the compile() and fit() settings to see if you can generate any further improvements.

score, accuracy = model.evaluate(X_test, y_test, batch_size=32, verbose=2)
    782/782 - 7s - loss: 0.9792 - accuracy: 0.8232

To examine the predictions we can use model.predict(). Here, we’ll get predictions for the first five rows in the X_test data. The predictions are returned as probabilities (so this is effectively a bit like predict_proba() in scikit-learn), so anything under 0.5 is negative in sentiment and anything above 0.5 is positive. As the dataset ships pre-encoded we can’t read the review text directly, but as shown after the output below, you can map the indices back to words if you want to spot check the predictions.

predictions = model.predict(X_test[:5])
predictions
    array([[0.8680382 ],
           [0.9999167 ],
           [0.6412297 ],
           [0.07330499],
           [0.9999981 ]], dtype=float32)
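
To turn these probabilities into hard 0 or 1 labels you can threshold them at 0.5, and although the dataset ships pre-encoded, you can map the word indices back to text using imdb.get_word_index() if you want to read the underlying review. The snippet below is a rough sketch of both steps; the offset of 3 accounts for the reserved padding, start and unknown tokens that load_data() uses by default.

# Convert the predicted probabilities into 0/1 sentiment labels
labels = (predictions > 0.5).astype(int)
print(labels.ravel())

# Decode the first (padded and truncated) test review back into words
word_index = imdb.get_word_index()
reverse_index = {index + 3: word for word, index in word_index.items()}
reverse_index.update({0: '<pad>', 1: '<start>', 2: '<unk>'})
print(' '.join(reverse_index.get(i, '<unk>') for i in X_test[0]))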

In the next article, I’ll explain the steps you can follow to preprocess data to get it ready for your machine learning model.

Matt Clarke, Tuesday, March 02, 2021

Matt Clarke is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.