Meta descriptions are strings of text added to the head of an HTML document to describe its content to search engines and their users, and they’re of critical importance in technical SEO. The meta description often forms the snippet shown in the search results, so it pays to ensure that it’s present, contains the relevant target keywords, and entices users to click through to your site.
The downside of meta descriptions is that some website administrators don’t take the time to craft them carefully, and on less well-maintained websites they’re often missing altogether. In these cases, putting some effort into creating meta descriptions can greatly increase each page’s chances of ranking and, importantly, increase the likelihood that users will click through when the page appears in search engine results.
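For reference, here is a minimal sketch of what the tag itself looks like when built in Python. The build_meta_description_tag() helper is a hypothetical name used only for illustration, and the description text is made up. It simply escapes the text so that quotes or angle brackets can't break the HTML attribute.

```python
from html import escape


def build_meta_description_tag(description: str) -> str:
    """Return an HTML meta description tag with the content safely escaped."""
    # quote=True also escapes double and single quotes inside the attribute value
    safe_text = escape(description.strip(), quote=True)
    return f'<meta name="description" content="{safe_text}">'


tag = build_meta_description_tag(
    'Whey Protein Isolate 90 provides 23g of protein per 25g serving.'
)
print(tag)
```

The resulting tag belongs inside the head element of the page.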
Ordinarily, I’d always recommend getting humans to carefully craft meta descriptions, since writing them well is more than simply summarising the content of the page. However, when faced with a site in which there are hundreds or thousands of them missing, and no access to trained copywriters, it’s also possible to auto-generate meta descriptions using deep learning text summarisation models, such as Bart.
While the resulting meta descriptions may sometimes be missing your target keywords, and may not encourage click-throughs as well as a human-written meta description, they’re perfectly usable, especially after some minor edits. If you have thousands to write, the deep learning approach to auto-generating them is definitely worthy of consideration. Here’s how to do it.
First, open a Jupyter notebook and import the pandas and ecommercetools packages. You’ll likely have Pandas pre-installed, but you’ll probably need to install my EcommerceTools package, which you can do easily via the Pip package manager bundled with Python.
EcommerceTools includes a wide range of features for ecommerce, marketing, and SEO, including a text summarisation model based on Bart, which has been pre-trained on an enormous dataset. The model is large, at well over 1.2GB, so the first run will take some time.
!pip3 install --upgrade ecommercetools
import pandas as pd
from ecommercetools import nlp
pd.set_option('max_colwidth', 200)
Next, load up your data into a Pandas dataframe. For demonstration purposes I am using a dataset of product descriptions from the GoNutrition website, which were originally written by me when I worked there many years ago. You’ll need a decent amount of text, but the model will only use the first 512 characters when creating the summary.
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/gonutrition.csv')
df.head()
| | product_name | product_description |
|---|---|---|
| 0 | Whey Protein Isolate 90 | What is Whey Protein Isolate? Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. This whey protein isolate powder is 90% protein and ex... |
| 1 | Whey Protein 80 | What is Whey Protein 80? Whey Protein 80 is an ultra premium quality 80% whey protein powder exclusively from free range, grass fed cows providing an unrivalled combination of taste, value and res... |
| 2 | Volt Preworkout™ | What is Volt™? Our Volt pre workout formula includes 12 advanced active ingredients that work together to increase energy, mental focus and muscular pump. Volt enables you to achieve the ultimate ... |
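Before summarising, it can help to check how much text each row actually contains. The sketch below uses a small, made-up dataframe (not the GoNutrition data) and the 512-character limit mentioned above to flag empty descriptions and show which ones would lose text. The column names source_characters, model_input_preview, is_empty, and exceeds_limit are illustrative choices, not part of EcommerceTools.

```python
import pandas as pd

CHARACTER_LIMIT = 512  # limit quoted in the article; treated as an assumption here

# Hypothetical stand-in for the real product dataframe
sample = pd.DataFrame({
    'product_name': ['Short product', 'Long product', 'Empty product'],
    'product_description': ['A short description.', 'x' * 600, None],
})

# Treat missing descriptions as empty strings so the string methods work
text = sample['product_description'].fillna('')

sample['source_characters'] = text.str.len()
sample['model_input_preview'] = text.str.slice(0, CHARACTER_LIMIT)
sample['is_empty'] = sample['source_characters'] == 0
sample['exceeds_limit'] = sample['source_characters'] > CHARACTER_LIMIT

print(sample[['product_name', 'source_characters', 'is_empty', 'exceeds_limit']])
```

Rows flagged as empty are worth filtering out before summarising, since there is nothing for the model to work from.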
To auto-generate our meta descriptions using the text summarisation in EcommerceTools, we’ll use the get_summaries() function from the nlp module. This takes six arguments: the dataframe (i.e. df), the name of the column to summarise (i.e. product_description), the name of the new column that will hold the summary (i.e. meta_description), the minimum number of words to include, the maximum number of words to include, and a setting defining whether the model should use sampling when generating the summary.
First, we’ll try the sampled text summarisation approach by passing True to the do_sample argument of the Bart model EcommerceTools is using. This will analyse the original product_description field for each page and create a new summary between 20 and 32 words in length. With sampling switched on, the model chooses each next word at random in proportion to its predicted probability, rather than always favouring the most likely candidate, so the summaries can vary from run to run.
Since we need our meta descriptions to be a specific length (ideally at least 50 characters and fewer than 160 characters), we’ll also calculate the length using str.len() so we can tweak the parameters. Longer sentences will be truncated, so we’ll add an ellipsis “…” afterwards to indicate truncation to the user.
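# Summarise each product description with sampling switched on
df = nlp.get_summaries(df,
                       'product_description',
                       'meta_description_sampled',
                       min_length=20,
                       max_length=32,
                       do_sample=True)

# Count words and characters before the ellipsis is appended
df['words'] = df['meta_description_sampled'].str.split().str.len()
df['characters'] = df['meta_description_sampled'].str.len()

# Append an ellipsis to signal possible truncation
df['meta_description_sampled'] = df['meta_description_sampled'] + '...'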
This approach works fairly well. We get back some perfectly usable meta descriptions that contain a summary of each product page’s content and, by adjusting the min_length and max_length arguments, we get meta descriptions that are within our target character limit. The downside is that they’re all truncated with the ellipsis, so would benefit from some human editing.
df[['meta_description_sampled', 'words', 'characters']].head()
| | meta_description_sampled | words | characters |
|---|---|---|---|
| 0 | Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. It's a pure... | 22 | 123 |
| 1 | GN Whey Protein 80 is an 80% whey protein powder exclusively from free range, grass fed cows. Contains 20g of premium grade protein... | 23 | 131 |
| 2 | Volt is ideal for those who want to push themselves further when they're working out. However, it's potent and has a high caffeine content.... | 24 | 139 |
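To see at a glance which summaries fall outside the target range, a small helper can label each one. This is a minimal sketch: check_length() is a hypothetical function, and the bounds simply restate the 50 and 160 character guideline given earlier.

```python
MIN_CHARS = 50   # at least 50 characters, per the guideline above
MAX_CHARS = 160  # fewer than 160 characters, per the guideline above


def check_length(description: str,
                 min_chars: int = MIN_CHARS,
                 max_chars: int = MAX_CHARS) -> str:
    """Label a meta description as 'too short', 'too long', or 'ok'."""
    length = len(description)
    if length < min_chars:
        return 'too short'
    if length >= max_chars:
        return 'too long'
    return 'ok'


print(check_length('Far too brief.'))
```

On the dataframe, this could be applied with df['meta_description_sampled'].apply(check_length). Bear in mind that the column already includes the appended ellipsis at this point, so each value is three characters longer than the figure in the characters column.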
The second approach, and the one I’d recommend, is to use unsampled text summarisation. If you pass False to the do_sample argument when the Bart model runs, it will instead generate the summary deterministically, favouring the most probable wording, so the same input gives the same summary each time.
The text is still machine generated using Natural Language Generation, so it can differ from the wording on the page, although it often reuses phrases from it, and it still summarises the page content and returns the desired meta description length.
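# Summarise each product description with sampling switched off
df = nlp.get_summaries(df,
                       'product_description',
                       'meta_description_unsampled',
                       min_length=20,
                       max_length=30,
                       do_sample=False)

# Count words and characters before the ellipsis is appended
df['words'] = df['meta_description_unsampled'].str.split().str.len()
df['characters'] = df['meta_description_unsampled'].str.len()

# Append an ellipsis to signal possible truncation
df['meta_description_unsampled'] = df['meta_description_unsampled'] + '...'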
df[['meta_description_unsampled', 'words', 'characters']].head()
| | meta_description_unsampled | words | characters |
|---|---|---|---|
| 0 | Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. This whe... | 21 | 120 |
| 1 | GN Whey Protein 80 is an ultra premium quality 80% whey protein powder. Contains 20g of premium grade protein per 25g... | 21 | 117 |
| 2 | Volt enables you to achieve the ultimate workout so you can maximise your lean muscle, power and strength gains by training harder. Volt is... | 24 | 139 |
As with the previous approach, we still end up with some truncated sentences, but the ellipsis flags these and a human could easily trim or extend them to fit. You could also adjust the code to automatically remove any truncated sentences if you wanted something completely automated.
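One way to remove truncated sentences automatically is sketched below. It assumes that a complete sentence ends in a full stop, exclamation mark, or question mark followed by whitespace or the end of the text, which is a simplification (abbreviations such as "e.g." would confuse it). The drop_truncated_sentence() helper is a hypothetical name, not part of EcommerceTools.

```python
import re

# A sentence end: ., ! or ? followed by whitespace or the end of the string.
# Decimals such as "2.5g" are not matched because a digit follows the dot.
SENTENCE_END = re.compile(r'[.!?](?=\s|$)')


def drop_truncated_sentence(summary: str, ellipsis: str = '...') -> str:
    """Remove the appended ellipsis and any unfinished final sentence."""
    text = summary.strip()
    if text.endswith(ellipsis):
        text = text[:-len(ellipsis)].rstrip()

    matches = list(SENTENCE_END.finditer(text))
    if not matches:
        # No complete sentence found: return the input untouched for human review
        return summary

    # Keep everything up to and including the last sentence terminator
    return text[:matches[-1].end()]


example = ('Whey Protein Isolate 90 is our highest quality whey protein powder '
           "and provides 23g of protein per 25g serving. It's a pure...")
print(drop_truncated_sentence(example))
```

This could be applied to the dataframe with df['meta_description_unsampled'].apply(drop_truncated_sentence). Dropping a sentence shortens the text, so it is worth re-checking the character counts afterwards.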
Matt Clarke, Sunday, May 23, 2021