How to assess product copy using EQA models

Picture by Andrea Piacquadio, Pexels.

10 minutes to read

In ecommerce, writing good product copy is both an art and a science. Not only does product copy need to be written in the correct tone and style for your brand and audience, it also needs to clearly convey the product features and benefits to encourage site visitors to purchase.

Less experienced ecommerce copywriters often focus more on the style of their copy than on what customers will want to know before they buy. As a result, they write lovely, readable copy that fails to communicate product features and benefits.

When customers can’t find the answers to their pre-purchase questions, not only is the product conversion rate lower, but customers will also be much more likely to contact your customer service or sales team for the answers, which increases your costs. Or, they just go elsewhere…

The rise of questions and answers in SERPs

These days, one increasingly important element of product copy is not just how well it answers the questions customers have, but also how well search engines can interpret the copy and use it to answer the questions their users type into the search box.

Search engines have increasingly adopted Extractive Question Answering or EQA to handle this. This deep learning technique uses Natural Language Understanding (NLU) models, such as BERT, to read and understand product content and extract the answers to the users’ questions.

Product copy now needs to include the right keywords to ensure customers can find it via on-site and off-site search, include the answers to any pre-purchase questions, sell the features and benefits, and format the information to allow Extractive Question Answering to do its stuff.

In this project…

In this project, we’re going to test a hunch I have. One of the things I always get my ecommerce copywriters to do is clearly explain what the product does, how it benefits the customer, and why they should buy that particular product. These are the key things that sell technical products, and they reduce the likelihood of customers abandoning their order and calling customer services for help.

If we use EQA to find the answers to these common questions, the answers provided should give an indication of how well the copy is written to tackle these key issues. If the EQA model can extract the answers, then I suspect customers and search engines will be able to do the same.

Load the packages

To get started, open up a Jupyter notebook and import pandas and the pipeline function from transformers. You’ll likely want to turn off warnings, as the pipeline currently throws one because a feature it uses is soon to be deprecated.

import pandas as pd
from transformers import pipeline
import warnings
warnings.simplefilter("ignore")

Load the pipeline

Hugging Face has already created a pre-built model pipeline to handle question answering. This has been trained on the Stanford Question Answering Dataset (SQuAD). It’s a pretty massive model that was trained on an even bigger dataset, and it can handle Extractive Question Answering out-of-the-box. Define the pipeline and then wait for it to download.

nlp = pipeline("question-answering")
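If you want repeatable results, it may help to name the model explicitly rather than relying on whatever default your version of transformers picks. Here's a minimal sketch. The helper name load_qa_pipeline is my own, and the checkpoint shown is simply one SQuAD-tuned model I'm assuming is available on the Hugging Face Hub, so it's not necessarily the one used for the results in this post.

```python
def load_qa_pipeline(model_name="distilbert-base-cased-distilled-squad", device=-1):
    """Create a question-answering pipeline with a pinned model.

    model_name is an assumed checkpoint (swap in whichever one you prefer).
    device=-1 runs on the CPU, while device=0 would use the first GPU.
    """
    # Imported here so that simply defining the helper triggers no download
    from transformers import pipeline

    return pipeline("question-answering", model=model_name, device=device)
```

Calling nlp = load_qa_pipeline() should then behave like the one-liner above, but with the model choice written down in your notebook.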

Load your dataset

Next, load up your dataset. I’ve used the GoNutrition dataset I created for my post on How to auto-generate product summaries using deep learning. This includes a few product descriptions I wrote when I worked on the GoNutrition launch many years ago.

df = pd.read_csv('gonutrition.csv')
df.head()

	product_name	product_description
0	Whey Protein Isolate 90	What is Whey Protein Isolate? Whey Protein Iso...
1	Whey Protein 80	What is Whey Protein 80? Whey Protein 80 is an...
2	Volt Preworkout™	What is Volt™? Our Volt pre workout formula in...
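If your own product export has gaps, a missing description will break the string slicing used later on, so it's worth tidying the data first. This is only a sketch: the prepare_descriptions helper and the placeholder products are made up for illustration, and the column name is assumed to match the dataset above.

```python
import pandas as pd


def prepare_descriptions(df, column="product_description"):
    """Drop rows with missing or blank descriptions and tidy whitespace."""
    cleaned = df.dropna(subset=[column]).copy()
    cleaned[column] = cleaned[column].astype(str).str.strip()
    cleaned = cleaned[cleaned[column] != ""]
    return cleaned.reset_index(drop=True)


# Placeholder data standing in for a real product export
sample = pd.DataFrame({
    "product_name": ["Product A", "Product B", "Product C"],
    "product_description": ["What is Product A? A placeholder description.", None, "   "],
})

prepared = prepare_descriptions(sample)
print(prepared)
```

Only Product A survives here, because the other two rows have nothing for the model to read.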

Define your questions

To see if EQA can find the key points in the product copy, I’ve devised a series of questions. If you’re trying this out on your own products, you’ll likely want to make them specific to your product range and perhaps modify them based on the product category you’re examining.

questions = [
    "What is this product for?",
    "Why will it benefit me?",
    "What is it made from?",
    "What is special about this product?"
]
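One way to vary the questions by category is to keep the general list and bolt on a few extras per category. This is a rough sketch rather than anything I've tested on real copy. The category names, the extra questions, and the questions_for helper are all hypothetical, and the GoNutrition dataset above has no category column, so you'd need to supply one.

```python
# The four general questions from the list above
BASE_QUESTIONS = [
    "What is this product for?",
    "Why will it benefit me?",
    "What is it made from?",
    "What is special about this product?",
]

# Hypothetical category-specific extras
CATEGORY_QUESTIONS = {
    "protein": ["How much protein does it contain?"],
    "pre-workout": ["When should I take it?"],
}


def questions_for(category=None):
    """Return the general questions plus any extras for the category."""
    extras = CATEGORY_QUESTIONS.get((category or "").lower(), [])
    return BASE_QUESTIONS + extras


print(questions_for("protein"))
```

Unknown or missing categories simply fall back to the general list.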

Run the Extractive Question Answering model

Now that’s set up, we can run the EQA model. We’ll use iterrows() to iterate over each row in our product dataset and truncate the description to its first 512 characters. BERT-style models are actually limited to 512 tokens rather than characters, so this is a crude but conservative cut that keeps the input comfortably within bounds. Arguably, that’s not a bad thing for us, as it means the copywriters need to put the key points at the start.

In each iteration we’ll loop over the questions in our list and run them through the transformer model. We’ll then create a dictionary of the results, including the product name, question, answer, and confidence score, and add it to a list, which we convert to a dataframe of results at the end. It only takes about a second per product, so it’s fairly quick when you run it on a decent GPU.

# Collect one dictionary per product/question pair, then build the dataframe once.
# (DataFrame.append() was removed in pandas 2.0.)
items = []

for index, row in df.iterrows():

    # Only use the first 512 characters of the description
    text = row['product_description'][:512]

    for question in questions:
        result = nlp(question=question, context=text)

        items.append({
            'product_name': row['product_name'],
            'question': question,
            'answer': result['answer'],
            'score': round(result['score'], 4)
        })

df_items = pd.DataFrame(items, columns=['product_name', 'question', 'answer', 'score'])
df_items.head(10)
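A plain slice can chop the description off mid-word, which leaves the model with a dangling fragment at the end. If that bothers you, here's a small sketch of a helper that cuts at the last complete word instead. The function name truncate_on_word_boundary is my own invention, and you'd swap it in for the [:512] slice above.

```python
def truncate_on_word_boundary(text, max_chars=512):
    """Trim text to at most max_chars without leaving a partial last word."""
    if len(text) <= max_chars:
        return text

    cut = text[:max_chars]

    # If the next character is whitespace, the slice already ends on a whole word
    if text[max_chars].isspace():
        return cut.rstrip()

    # Otherwise drop the partial word at the end of the slice
    head, sep, _ = cut.rpartition(" ")
    return head.rstrip() if sep else cut


print(truncate_on_word_boundary("alpha beta gamma", 12))
```

With a limit of 12 characters the example prints "alpha beta" rather than "alpha beta g".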

Inspect the results

Finally, we can inspect the results of the EQA. I suspect, from a sports nutrition perspective, the answers could probably be better. However, this could imply that the copy could be further improved to generate better scores. My completely untested hypothesis is that if EQA can find the answer easily, then humans and search engines using EQA and Product Knowledge Graphs probably can too.

	product_name	question	answer	score
0	Whey Protein Isolate 90	What is this product for?	those looking to lose fat	0.2035
1	Whey Protein Isolate 90	Why will it benefit me?	those looking to lose fat and develop a more t...	0.1991
2	Whey Protein Isolate 90	What is it made from?	whey protein powder	0.0848
3	Whey Protein Isolate 90	What is special about this product?	90% protein and extremely low in fat and carbo...	0.0430
4	Whey Protein 80	What is this product for?	80% whey protein powder	0.1268
5	Whey Protein 80	Why will it benefit me?	fuel recovery and lean muscle gains.	0.0830
6	Whey Protein 80	What is it made from?	free range,	0.1208
7	Whey Protein 80	What is special about this product?	an ultra premium quality 80% whey protein powder	0.1406
8	Volt Preworkout™	What is this product for?	those who want to push themselves further when...	0.5253
9	Volt Preworkout™	Why will it benefit me?	training harder.	0.0206
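To pick out the questions the copy answers least convincingly, you could filter the results on the confidence score. The sketch below rebuilds the ten scores shown in the table above (answers left out for brevity) so that it runs on its own. The 0.1 threshold is an arbitrary assumption of mine, not a recommended value.

```python
import pandas as pd

# Scores copied from the results table above
rows = [
    ("Whey Protein Isolate 90", "What is this product for?", 0.2035),
    ("Whey Protein Isolate 90", "Why will it benefit me?", 0.1991),
    ("Whey Protein Isolate 90", "What is it made from?", 0.0848),
    ("Whey Protein Isolate 90", "What is special about this product?", 0.0430),
    ("Whey Protein 80", "What is this product for?", 0.1268),
    ("Whey Protein 80", "Why will it benefit me?", 0.0830),
    ("Whey Protein 80", "What is it made from?", 0.1208),
    ("Whey Protein 80", "What is special about this product?", 0.1406),
    ("Volt Preworkout™", "What is this product for?", 0.5253),
    ("Volt Preworkout™", "Why will it benefit me?", 0.0206),
]
df_scores = pd.DataFrame(rows, columns=["product_name", "question", "score"])

THRESHOLD = 0.1  # hypothetical cut-off, tune it for your own copy

# Keep only the weak answers, lowest score first
weak = (
    df_scores[df_scores["score"] < THRESHOLD]
    .sort_values("score")
    .reset_index(drop=True)
)
print(weak)
```

On your own data you'd run the same filter directly on df_items.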

The verdict

I’m not sure the approach is perfect, but the prototype seems to work. Aggregating mean scores for the questions asked on each product gives a breakdown of performance, which could be used to prioritise which products have copy most in need of improvement.

When I rewrote the copy to optimise it for EQA, I was able to generate text that covered the key points much more succinctly, which should work for humans too. I’d imagine there may be a correlation between better scores from the EQA and stronger product conversion. If you try it on your own product copy, I’d be really interested to hear if you think it works.

df_items.groupby('product_name').agg(
    avg_score=('score', 'mean')
)

	avg_score
product_name
Volt Preworkout™	0.207225
Whey Protein 80	0.117800
Whey Protein Isolate 90	0.132600
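To turn the averages into a to-do list, you could sort them so the weakest copy comes first, and pivot the scores to see which question lets each product down. This sketch uses only the two whey products, since all four of their scores appear in the results table earlier. The variable names are mine.

```python
import pandas as pd

# Scores for the two products with all four answers shown in the results table
rows = [
    ("Whey Protein Isolate 90", "What is this product for?", 0.2035),
    ("Whey Protein Isolate 90", "Why will it benefit me?", 0.1991),
    ("Whey Protein Isolate 90", "What is it made from?", 0.0848),
    ("Whey Protein Isolate 90", "What is special about this product?", 0.0430),
    ("Whey Protein 80", "What is this product for?", 0.1268),
    ("Whey Protein 80", "Why will it benefit me?", 0.0830),
    ("Whey Protein 80", "What is it made from?", 0.1208),
    ("Whey Protein 80", "What is special about this product?", 0.1406),
]
df_scores = pd.DataFrame(rows, columns=["product_name", "question", "score"])

# Lowest mean score first, i.e. the copy most in need of attention
ranking = df_scores.groupby("product_name")["score"].mean().sort_values()

# One row per product, one column per question
breakdown = df_scores.pivot_table(
    index="product_name", columns="question", values="score"
)

print(ranking)
print(breakdown)
```

On the full results you'd run the same two steps on df_items instead of the rebuilt rows.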


Further reading

How to create an ABC XYZ inventory classification model

How to create a fake review detection model

How to create ecommerce sales forecasts using Prophet

How to analyse Google Analytics demographics and interests with GAPandas

How to calculate CLV using BG/NBD and Gamma-Gamma

How to auto-generate product summaries using deep learning
