How to use Natural Language Understanding models

Learn how to use Natural Language Understanding (NLU) models via PyTorch and Hugging Face Transformers to understand sentiment, perform named entity recognition (NER), and more.

Hugging Face Transformers is a collection of state-of-the-art (SOTA) natural language processing models produced by Hugging Face. The company takes the latest models covered in current natural language processing (NLP) research and turns them into working, pre-trained models that can be used with its simple framework. Its aim is to “democratize” the models so that anyone can use them in their projects.

What are Transformers?

Transformers, formerly known as PyTorch Transformers, provides general-purpose implementations of most of the recent cutting-edge models, such as BERT, RoBERTa, DistilBERT, and GPT-2. These span Natural Language Understanding (NLU), so they can perform reading comprehension and answer questions, and Natural Language Generation (NLG), so they can produce new text instead of just cutting snippets out of the training data. Over 250 models are available, each pre-trained to perform certain tasks, and they can be fine-tuned on specific datasets using an approach known as “transfer learning”.

What is transfer learning?

Transfer learning is the key reason that most Natural Language Understanding and Natural Language Generation models have improved so much in recent years. In a typical machine learning problem, you’d create a set of training data and then train your model. If the dataset changes, you’d re-train your model from scratch, so it would have to re-learn absolutely everything.

In language processing tasks, some things a model must learn will be the same across each problem or dataset. Sentences typically have a similar structure and certain words follow others - linguistic representations, syntax, semantics, and structure are common across language. Therefore, rather than learning from scratch every time, transfer learning models are first trained on massive datasets on powerful multi-GPU systems to create an initial generalised model, which can then be fine-tuned on a specific task.

This means that a model originally built for one purpose can easily be adapted for another while still benefiting from what its predecessor learned, without the need to train it from scratch. An analogy makes this clearer: if you had to learn the alphabet, learn English, and learn how to read every time you picked up a book, reading books wouldn’t be very quick or easy. The ability to be pre-trained and then fine-tuned is what gives these models the edge. Training from scratch for every task would take huge amounts of experience, GPU power, electricity, and time.
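The idea can be illustrated with a deliberately tiny sketch in plain Python. It is a toy, not how Transformers works internally: the word vectors in PRETRAINED_VECTORS, the four-sentence TASK_DATA set, and all function names are hypothetical stand-ins invented for this illustration. The "pre-trained" encoder stays frozen, and only a small task-specific head is trained.

```python
import math

# Hypothetical stand-in for a pre-trained model: fixed word vectors that
# fine-tuning never updates. Real models learn these from huge corpora.
PRETRAINED_VECTORS = {
    "great": [1.0, 0.0],
    "good": [0.8, 0.1],
    "lousy": [0.0, 1.0],
    "bad": [0.1, 0.9],
    "service": [0.5, 0.5],
}


def encode(text):
    """Frozen encoder: average the pre-trained vectors of the known words."""
    vectors = [PRETRAINED_VECTORS[w] for w in text.lower().split()
               if w in PRETRAINED_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(weights, bias, features):
    """Task head: logistic regression on top of the frozen features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, data):
    """Mean cross-entropy of the head over (text, label) pairs."""
    total = 0.0
    for text, label in data:
        p = predict(weights, bias, encode(text))
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(data)


def fine_tune(data, epochs=200, lr=0.5):
    """Train only the head with gradient descent; the encoder is untouched."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in data:
            features = encode(text)
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Tiny, made-up task dataset: 1 = positive, 0 = negative.
TASK_DATA = [
    ("great service", 1),
    ("good service", 1),
    ("lousy service", 0),
    ("bad service", 0),
]

weights, bias = fine_tune(TASK_DATA)
print("loss before fine-tuning:", log_loss([0.0, 0.0], 0.0, TASK_DATA))
print("loss after fine-tuning: ", log_loss(weights, bias, TASK_DATA))
```

Only three numbers (two weights and a bias) are learned here, because the encoder already supplies useful features. Fine-tuning a real pre-trained model is far more involved, but the division of labour is similar.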

How much data has been used to train these models?

Many of the SOTA NLP models have been trained on vast quantities of data, making them time-consuming and expensive to create. Many are trained on Nvidia Tesla V100 GPU compute cards, often with huge numbers of them in use for lengthy periods of time. Nvidia’s latest model at the time of writing employed over a thousand powerful GPUs.

These models aren’t something you could easily create on typical PC hardware, not least because the datasets used for training are so big. RoBERTa, Facebook’s AI model, used a 160GB dataset. Nvidia’s transformer model is 24 times larger than BERT and five times larger than OpenAI’s GPT-2 model. Because the models are so large, one common task for AI developers is to create smaller or “distilled” versions that are easier to put into production.

Distilled models can run on lower-end hardware and don’t need extensive re-training, which is costly in terms of energy, hardware, and the environment. Many of the distilled models offer around 80-90% of the performance of the larger parent models, with less of the bulk.

What Python frameworks does Hugging Face support?

Hugging Face supports both TensorFlow and PyTorch. Judging by the stats shown below each model it offers, usage of the PyTorch versions massively outweighs usage of the TensorFlow versions. However, both are pretty easy to set up and use. The current version of Transformers works with Python 3.6+, PyTorch 1.1.0+, and TensorFlow 2.0+. As you’d expect, Hugging Face recommends installing it within a Python virtual environment for the best results.

How do I install Hugging Face Transformers?

After you’ve installed either PyTorch 1.1.0+ or TensorFlow 2.0+, you can install the Hugging Face Transformers package from the Python Package Index (PyPI) by entering pip3 install transformers. The first time you load one of the Hugging Face Transformers models, it will download the pre-trained model to your machine and cache it for later use. These downloads are often quite large, ranging in size from a few hundred MB to several GB. On the plus side, you don’t have to spend weeks or months training them yourself!
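Before running the examples, it can help to confirm what is actually installed. The sketch below uses only the Python standard library; the function name check_environment is made up for this illustration, and it only checks that the packages can be found, not their versions.

```python
import importlib.util
import sys


def check_environment():
    """Report whether Python is new enough and which packages can be found."""
    return {
        "python_3_6_plus": sys.version_info >= (3, 6),
        "transformers": importlib.util.find_spec("transformers") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

If transformers is reported as missing, or neither torch nor tensorflow is found, the pipeline examples later in this article are unlikely to run.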

Some simple Hugging Face Transformers examples

To show how quick and easy it is to use the pre-trained models, let’s look at some really simple examples of using Hugging Face Transformers for some Natural Language Processing and Natural Language Understanding tasks. In practice, you’d rarely use these models straight out of the box and would usually fine-tune them on your own data first, but the examples show how easy it is to get decent results from the general pre-trained models with no additional training, an approach loosely known as “zero-shot” learning.

1. Sentiment analysis

from transformers import pipeline

nlp = pipeline('sentiment-analysis')

text = 'Your business provides lousy service. You should be ashamed.'

output = nlp(text)
print(output)
# [{'label': 'NEGATIVE', 'score': 0.9959068298339844}]
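The list of label and score dictionaries is easy to post-process. As a minimal sketch, the hypothetical helper below (signed_sentiment is not part of Transformers) turns each result into a single number between -1 and 1. It assumes the labels are exactly 'POSITIVE' and 'NEGATIVE', as in the output above; other models may use different label names.

```python
def signed_sentiment(results):
    """Map each {'label', 'score'} dict to a value between -1 and 1."""
    signed = []
    for result in results:
        # Assumption: any label other than 'POSITIVE' is treated as negative.
        sign = 1.0 if result["label"] == "POSITIVE" else -1.0
        signed.append(sign * result["score"])
    return signed


# Output copied from the sentiment analysis example above.
sample_output = [{'label': 'NEGATIVE', 'score': 0.9959068298339844}]
print(signed_sentiment(sample_output))
```

A signed score like this is convenient when you want to average sentiment across many reviews or plot it over time.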

2. Question answering

from transformers import pipeline

nlp = pipeline('question-answering')

question = "How much cover is provided?"
context = """
Send a parcel to Ireland from only £5.81 exc VAT
Collection and drop-off service available
Get parcel cover worth up to £50 included
Protect your parcel up to £5000 with extra cover
Send up to 70kg on selected services
"""

output = nlp(question=question, context=context)
print(output)
# {'score': 0.21761408342346614, 'start': 115, 'end': 124, 'answer': 'up to £50'}
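The start and end values in the result are character offsets into the context string, and the score here is fairly low. The sketch below shows one way to use both: it slices the answer out of the context and flags whether the score clears a threshold. The helper name extract_answer and the 0.5 threshold are arbitrary choices for illustration, not part of the library.

```python
def extract_answer(result, context, threshold=0.5):
    """Return (answer_span, is_confident) from a question-answering result."""
    span = context[result["start"]:result["end"]]
    return span, result["score"] >= threshold


# First lines of the context from the example above, including the
# leading newline that follows the opening triple quote.
context = (
    "\n"
    "Send a parcel to Ireland from only £5.81 exc VAT\n"
    "Collection and drop-off service available\n"
    "Get parcel cover worth up to £50 included\n"
)

# Result copied from the question answering example above.
result = {'score': 0.21761408342346614, 'start': 115, 'end': 124,
          'answer': 'up to £50'}

span, is_confident = extract_answer(result, context)
print(span, is_confident)
```

With this threshold the answer is extracted correctly but flagged as low confidence, which is a reasonable cue to review it or fall back to a default response.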

3. Named entity recognition

from transformers import pipeline

nlp = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english')

text = """ANNE PRO 2 is a 60% size Bluetooth RGB mechanical keyboard. It’s smart and saves the space of the desk. Although there are only 61 keys, it can be used like a standard 104 -key keyboard by the key combination"""

output = nlp(text)

for item in output:
    print(item)

# {'word': '<s>', 'score': 0.9837348461151123, 'entity': 'I-MISC', 'index': 0}
# {'word': '▁', 'score': 0.9950863718986511, 'entity': 'I-MISC', 'index': 1}
# {'word': 'ANNE', 'score': 0.9994246959686279, 'entity': 'I-MISC', 'index': 2}
# {'word': '▁PRO', 'score': 0.9999248385429382, 'entity': 'I-MISC', 'index': 3}
# {'word': '▁2', 'score': 0.9999505281448364, 'entity': 'I-MISC', 'index': 4}
# {'word': '▁Bluetooth', 'score': 0.9998236298561096, 'entity': 'I-MISC', 'index': 9}
# {'word': '▁RGB', 'score': 0.9945111870765686, 'entity': 'I-MISC', 'index': 10}
# {'word': '</s>', 'score': 0.993192732334137, 'entity': 'I-MISC', 'index': 52}
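The output is per token rather than per entity, so the product name is split across several rows. The sketch below merges runs of consecutive tokens with the same label into readable entities. It is a hand-rolled illustration based on the output shape shown above: group_entities is a hypothetical helper, and it assumes that '▁' marks the start of a word and that '<s>' and '</s>' are special tokens to ignore. The library may offer its own grouping options, so check its documentation before relying on this.

```python
SPECIAL_TOKENS = {"<s>", "</s>"}
WORD_START = "▁"  # marker the tokenizer places at the start of a word


def group_entities(tokens):
    """Merge consecutive same-label tokens into whole entity strings."""
    runs, current = [], []
    for token in tokens:
        if token["word"] in SPECIAL_TOKENS:
            continue
        # Start a new run when the index jumps or the label changes.
        if current and (token["index"] != current[-1]["index"] + 1
                        or token["entity"] != current[-1]["entity"]):
            runs.append(current)
            current = []
        current.append(token)
    if current:
        runs.append(current)

    entities = []
    for run in runs:
        text = "".join(t["word"] for t in run).replace(WORD_START, " ").strip()
        label = run[0]["entity"].split("-", 1)[-1]  # 'I-MISC' becomes 'MISC'
        entities.append({"text": text, "entity": label})
    return entities


# Tokens copied from the output above (scores omitted for brevity).
sample_tokens = [
    {'word': '<s>', 'entity': 'I-MISC', 'index': 0},
    {'word': '▁', 'entity': 'I-MISC', 'index': 1},
    {'word': 'ANNE', 'entity': 'I-MISC', 'index': 2},
    {'word': '▁PRO', 'entity': 'I-MISC', 'index': 3},
    {'word': '▁2', 'entity': 'I-MISC', 'index': 4},
    {'word': '▁Bluetooth', 'entity': 'I-MISC', 'index': 9},
    {'word': '▁RGB', 'entity': 'I-MISC', 'index': 10},
    {'word': '</s>', 'entity': 'I-MISC', 'index': 52},
]

for entity in group_entities(sample_tokens):
    print(entity)
```

On the tokens above this yields two MISC entities, "ANNE PRO 2" and "Bluetooth RGB", which is much easier to work with than the raw token list.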

4. Text summarization

from transformers import pipeline

summarizer = pipeline("summarization")

description = """
Amazon's smart home security division Ring has unveiled a flying camera that launches if sensors detect a potential home break-in.
It is designed to only activate when residents are out, works inside, and is limited to one floor of a building.
Owners will be given a smartphone alert to let them see the footage.
The company is not calling it a drone, but to all intents and purposes it is. The device is likely to spark fresh privacy concerns about the brand.
"The Always Home Cam is an incredibly ambitious device that will seem like something from a science fiction movie for many consumers," commented Ben Wood from the consultancy CCS Insight.
"I expect it to generate a huge amount of interest from technology enthusiasts who are typically the people who embrace smart home technology first. However, it is also likely to provoke a huge discussion around privacy and the future role of technology in the home."
"""

summary = summarizer(description, max_length=100, min_length=50)
print(summary)
# [{'summary_text': ' Always Home Cam is designed to only activate when residents are out, works inside, and is limited to one floor of a building . Owners will be given a smartphone alert to let them see the footage . Device is likely to spark fresh privacy concerns about the brand .'}]
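The generated summary has a leading space and a stray space before each full stop. A small clean-up step, sketched below with the standard library, tidies this before display. The helper name tidy_summary is made up for this illustration, and the sample text is shortened from the output above.

```python
import re


def tidy_summary(text):
    """Strip outer whitespace and remove spaces left before punctuation."""
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()


# Shortened from the summary_text value in the output above.
raw = (" Owners will be given a smartphone alert to let them see the footage ."
       " Device is likely to spark fresh privacy concerns about the brand .")
print(tidy_summary(raw))
```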

Matt Clarke, Tuesday, March 02, 2021

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.