How to auto-generate meta descriptions with EcommerceTools

Learn how to use EcommerceTools to create automated meta descriptions via deep learning and the Bart text summarisation model.


Meta descriptions are short strings of text added to the head of an HTML document to describe its content to search engines and their users, and they are an important part of technical SEO. The meta description often forms the snippet shown in the search results, so it pays to ensure that it’s present, contains the relevant target keywords, and entices users to click through to your site.

The downside of meta descriptions is that some website administrators don’t take the time to craft them carefully, and on less well-maintained websites they’re often missing altogether. In these cases, putting some effort into creating meta descriptions can improve how each page is presented in the search results and, importantly, increase the likelihood that users will click through when the page appears.
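If you’re not sure which pages are missing a meta description, you can check the HTML directly. The sketch below is an illustrative helper rather than part of EcommerceTools: it uses only Python’s built-in html.parser module to pull the content attribute out of the first <meta name="description"> tag, and returns None when the tag is absent or empty. Fetching the pages (for example from a crawl export) is assumed to happen elsewhere.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "meta" or self.description is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "description":
            self.description = attr_map.get("content", "").strip()


def get_meta_description(html: str) -> Optional[str]:
    """Return the meta description, or None if the tag is absent or empty."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description or None


page = '<html><head><title>Whey</title><meta name="description" content="Our best whey."></head></html>'
print(get_meta_description(page))                          # Our best whey.
print(get_meta_description("<html><head></head></html>"))  # None
```

Running this over each page’s HTML and keeping the pages that return None gives you a list of candidates for the summarisation steps that follow.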

Automatically generating meta descriptions

Ordinarily, I’d recommend getting humans to craft meta descriptions carefully, since writing them well involves more than simply summarising the content of the page. However, when faced with a site where hundreds or thousands of them are missing, and with no access to trained copywriters, it’s also possible to auto-generate meta descriptions using deep learning text summarisation models such as Bart.

While the resulting meta descriptions may sometimes lack your target keywords, and may not encourage click-throughs as well as human-written ones, they’re perfectly usable, especially after some minor edits. If you have thousands to write, the deep learning approach is definitely worth considering. Here’s how to do it.

Load the packages

First, open a Jupyter notebook and import the pandas and ecommercetools packages. You’ll likely have pandas pre-installed, but you’ll probably need to install my EcommerceTools package, which you can do easily via pip, Python’s package manager.

EcommerceTools includes a wide range of features for ecommerce, marketing, and SEO, including a text summarisation feature that uses Bart, a large model pre-trained on an enormous dataset. The model weighs in at well over 1.2GB, so the first run will take some time while it downloads.

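# In a Jupyter cell, the leading ! runs the command in the shell
!pip3 install --upgrade ecommercetools

import pandas as pd
from ecommercetools import nlp

# Show up to 200 characters per cell so long descriptions stay readable
pd.set_option('display.max_colwidth', 200)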

Load the data

Next, load your data into a pandas dataframe. For demonstration purposes, I’m using a dataset of product descriptions from the GoNutrition website, which I originally wrote when I worked there many years ago. You’ll need a decent amount of text for each product, but the model will only use the first 512 characters when creating the summary.

df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/gonutrition.csv')
df.head()
  | product_name | product_description
0 | Whey Protein Isolate 90 | What is Whey Protein Isolate? Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. This whey protein isolate powder is 90% protein and ex...
1 | Whey Protein 80 | What is Whey Protein 80? Whey Protein 80 is an ultra premium quality 80% whey protein powder exclusively from free range, grass fed cows providing an unrivalled combination of taste, value and res...
2 | Volt Preworkout™ | What is Volt™? Our Volt pre workout formula includes 12 advanced active ingredients that work together to increase energy, mental focus and muscular pump. Volt enables you to achieve the ultimate ...
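Because the model only reads the opening of each description (the first 512 characters, as noted above), a target keyword that appears later in the text is unlikely to make it into the summary. Here is a minimal sketch for spotting those rows, assuming that 512-character window; the WINDOW constant and both helper names are hypothetical, not EcommerceTools functions.

```python
WINDOW = 512  # assumed input window, based on the 512-character limit described above


def model_window(text: str, limit: int = WINDOW) -> str:
    """Return the leading slice of text that the summariser is assumed to read."""
    return text[:limit]


def keyword_in_window(text: str, keyword: str, limit: int = WINDOW) -> bool:
    """Check, case-insensitively, whether keyword falls inside that window."""
    return keyword.lower() in model_window(text, limit).lower()


# A made-up description whose last phrase sits beyond the window
description = "What is Whey Protein Isolate? " + "Filler sentence. " * 40 + "Grass fed."
print(len(model_window(description)))                  # 512
print(keyword_in_window(description, "whey protein"))  # True
print(keyword_in_window(description, "grass fed"))     # False
```

With pandas, df['product_description'].apply(keyword_in_window, keyword='whey protein') would flag each row, since Series.apply() forwards extra keyword arguments to the function.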

Auto-generate meta descriptions

To auto-generate our meta descriptions using the text summarisation in EcommerceTools, we’ll use the get_summaries() function from the nlp module. This takes six arguments: the dataframe (here, df), the name of the column to summarise (product_description), the name of the new column that will hold the summary (e.g. meta_description), the minimum summary length (min_length), the maximum summary length (max_length), and a setting defining whether the model should use sampling when generating the text (do_sample). Bart counts min_length and max_length in tokens rather than words, and punctuation and longer words can take up tokens of their own, so a summary usually contains fewer words than its token count.
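It can help to picture what a function with this signature does row by row. The following is a hypothetical stand-in, not the EcommerceTools source: add_summaries() and the toy first_words() summariser are illustrative names, the rows are plain dictionaries instead of a dataframe to keep the sketch dependency-free, and in the real function the Bart model takes the place of the summarise callable.

```python
from typing import Callable, Dict, List


def add_summaries(rows: List[Dict[str, str]],
                  text_column: str,
                  summary_column: str,
                  summarise: Callable[..., str],
                  min_length: int = 20,
                  max_length: int = 32,
                  do_sample: bool = False) -> List[Dict[str, str]]:
    """Hypothetical stand-in for get_summaries(): add a summary to every row."""
    for row in rows:
        # Pass the text plus the three generation settings to the summariser
        row[summary_column] = summarise(row[text_column],
                                        min_length=min_length,
                                        max_length=max_length,
                                        do_sample=do_sample)
    return rows


def first_words(text: str, min_length: int, max_length: int, do_sample: bool) -> str:
    # Toy summariser for illustration: keeps the first max_length words only
    return " ".join(text.split()[:max_length])


rows = [{"product_description": "Whey Protein 80 is an ultra premium quality 80% whey protein powder."}]
add_summaries(rows, "product_description", "meta_description", first_words, max_length=5)
print(rows[0]["meta_description"])  # Whey Protein 80 is an
```

Keeping the summariser as a parameter makes the plumbing easy to test without downloading the 1.2GB model.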

Sampled text summarisation

First, we’ll try the sampled approach by passing True to do_sample, which EcommerceTools hands on to the Bart model. This analyses the product_description field for each page and generates a summary between 20 and 32 tokens long. With sampling switched on, the model picks each next token at random, weighted by how likely it thinks each candidate is, so the wording can vary each time you run it.

Since we need our meta descriptions to fall within a target length (ideally at least 50 characters and fewer than 160 characters), we’ll also count the words and characters in each one, using str.split() and str.len(), so we can tweak the parameters. Because the model stops generating at max_length, many summaries end mid-sentence, so we’ll append an ellipsis (...) to each one to signal the truncation to the user.

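# Summarise each product description with sampling switched on
df = nlp.get_summaries(df,
                       'product_description',
                       'meta_description_sampled',
                       min_length=20,
                       max_length=32,
                       do_sample=True)

# Count words and characters before the ellipsis is added
df['words'] = df['meta_description_sampled'].str.split().str.len()
df['characters'] = df['meta_description_sampled'].str.len()

# Append an ellipsis to flag that the text may have been cut off
df['meta_description_sampled'] = df['meta_description_sampled'] + '...'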

This approach works fairly well. We get back perfectly usable meta descriptions that summarise each product page’s content and, by adjusting min_length and max_length, we can keep them within our target character limit. The downside is that most of them stop mid-sentence and all of them end in an ellipsis (even the one that finishes on a full stop), so they would benefit from some human editing.

df[['meta_description_sampled', 'words', 'characters']].head()
  | meta_description_sampled | words | characters
0 | Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. It's a pure... | 22 | 123
1 | GN Whey Protein 80 is an 80% whey protein powder exclusively from free range, grass fed cows. Contains 20g of premium grade protein... | 23 | 131
2 | Volt is ideal for those who want to push themselves further when they're working out. However, it's potent and has a high caffeine content.... | 24 | 139
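To check the results against the 50 to 160 character target automatically, a small classifier function is enough. Below is a minimal sketch using the thresholds from this article (meta_length_status() is an illustrative name). Bear in mind that the characters column above is calculated before the three-character ellipsis is appended, so a summary just under the limit can tip over it once "..." is added.

```python
ELLIPSIS = "..."  # the three full stops appended to each summary in this article


def meta_length_status(text: str, min_chars: int = 50, max_chars: int = 160) -> str:
    """Classify a meta description against the 50 to 160 character target."""
    length = len(text)
    if length < min_chars:
        return "too short"
    if length >= max_chars:
        return "too long"
    return "ok"


print(meta_length_status("Whey protein powder."))  # too short (20 characters)
summary = "x" * 158                                  # a 158-character stand-in summary
print(meta_length_status(summary))                   # ok
print(meta_length_status(summary + ELLIPSIS))        # too long (161 characters)
```

In the notebook, df['meta_description_sampled'].apply(meta_length_status) would label every row.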

Unsampled text summarisation

The second approach, and the one I’d recommend, is unsampled text summarisation. If you pass False to do_sample, the Bart model no longer picks tokens at random; instead it decodes deterministically, choosing the highest-scoring output it finds, so the same input always produces the same meta description.

The text is still generated by the model rather than cut and pasted from the page, although, as the results show, it often reuses whole phrases from the original description. It still summarises the page content and returns the desired meta description length.
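To see why the do_sample setting changes the output, it helps to look at a single decoding step in isolation. This is a toy illustration, not how Bart is implemented: a real model scores tens of thousands of candidate tokens at every step and, depending on its configuration, may use beam search rather than a single greedy pick. The probabilities below are made up for the example.

```python
import random
from typing import Dict


def greedy_pick(probs: Dict[str, float]) -> str:
    """do_sample=False style: always take the most likely next token."""
    return max(probs, key=probs.get)


def sampled_pick(probs: Dict[str, float], rng: random.Random) -> str:
    """do_sample=True style: draw the next token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up probabilities for the word after "Whey Protein 80 is an"
next_token = {"ultra": 0.6, "premium": 0.3, "excellent": 0.1}

print(greedy_pick(next_token))  # ultra, on every run
rng = random.Random()
print([sampled_pick(next_token, rng) for _ in range(5)])  # varies from run to run
```

Greedy selection is why unsampled runs are repeatable, while the weighted random draw is why sampled runs can phrase the same product differently each time.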

# Summarise again, this time with sampling switched off
df = nlp.get_summaries(df,
                       'product_description',
                       'meta_description_unsampled',
                       min_length=20,
                       max_length=30,
                       do_sample=False)

# Count words and characters, then append the ellipsis
df['words'] = df['meta_description_unsampled'].str.split().str.len()
df['characters'] = df['meta_description_unsampled'].str.len()
df['meta_description_unsampled'] = df['meta_description_unsampled'] + '...'
df[['meta_description_unsampled', 'words', 'characters']].head()
  | meta_description_unsampled | words | characters
0 | Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. This whe... | 21 | 120
1 | GN Whey Protein 80 is an ultra premium quality 80% whey protein powder. Contains 20g of premium grade protein per 25g... | 21 | 117
2 | Volt enables you to achieve the ultimate workout so you can maximise your lean muscle, power and strength gains by training harder. Volt is... | 24 | 139

As with the previous approach, we still end up with some truncated sentences, but the ellipsis signals this to the reader, and a human could easily trim or extend them to fit. If you wanted something completely automated, you could also adjust the code to remove any truncated sentences.
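One way to automate that clean-up is to cut each raw summary back to its last complete sentence, falling back to the ellipsis only when that would leave too little text. The sketch below is a simple heuristic with illustrative names (trim_to_sentence() and SENTENCE_END), and it assumes you apply it to the raw summaries in place of the line that appends "...". Treating a full stop, question mark or exclamation mark followed by whitespace as a sentence end avoids splitting on decimals such as 2.5g, but abbreviations like "e.g." will still fool it.

```python
import re

# A sentence ends at . ! or ? followed by whitespace or the end of the text
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_to_sentence(text: str, min_chars: int = 50) -> str:
    """Drop a trailing partial sentence, keeping at least min_chars characters.

    Falls back to appending an ellipsis if trimming would leave too little text.
    """
    text = text.strip()
    if text.endswith((".", "!", "?")):
        return text  # already ends on a complete sentence
    ends = [match.end() for match in SENTENCE_END.finditer(text)]
    if ends and ends[-1] >= min_chars:
        return text[:ends[-1]]
    return text + "..."


raw = ("Whey Protein Isolate 90 is our highest quality whey protein powder "
       "and provides 23g of protein per 25g serving. This whe")
print(trim_to_sentence(raw))
# Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving.
print(trim_to_sentence("Volt is potent. It contains"))
# Volt is potent. It contains...
```

Setting min_chars to the same 50-character floor used earlier means a summary is only trimmed when what remains is still long enough to use.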

Matt Clarke, Sunday, May 23, 2021

Matt Clarke is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
