Most ecommerce websites use review platforms, such as Feefo, Trustpilot, and Google Reviews, to allow customers to give feedback on their service and the products they sell. The reviews help other customers understand whether a company or product is any good, and crucially, how the business deals with customers who experience an issue when ordering.
While the reviews left can sometimes be painful for ecommerce retailers to read, especially when the reasons for service failure are caused by third parties, such as couriers or other internal departments who just don’t get the impact of their behaviour upon customer retention, they can be extremely useful to analyse.
Ecommerce businesses can learn a great deal about customer satisfaction and the causes of customer churn by examining their own product and service reviews, and those of their rivals.
In this project, I’ll show how you can use some data science skills to query the Feefo API and obtain product and service reviews, analyse them, and even predict what your rivals are selling. To take this to the next step, check out my guide to analysing customer reviews using NLP which is based on Trustpilot data.
To get started, open a Jupyter notebook and import the pandas
, requests
, and json
packages. We’ll be using the Requests package to fetch data from Feefo’s API, the JSON module for parsing the data returned by the API, and the Pandas package to display and analyse the data.
In order to see more of the rows and columns returned, and to avoid data truncation, we can use the Pandas set_option()
function to define the max_rows
to 1000, the max_columns
to 1000, and set the max_colwidth
to 1000. This makes the data much easier to read.
import pandas as pd
import requests
import json
pd.set_option('max_rows', 1000)
pd.set_option('max_columns', 1000)
pd.set_option('max_colwidth', 1000)
Next, we’ll create a function to connect to the Feefo API and fetch the merchant ratings or reviews for each merchant over a given time period. When fetching merchant data from Feefo, there’s no need for you to authenticate, because the data is in the public domain.
You can simply pass the merchant_identifier for your chosen business (i.e. vauxhall
) to the Feefo Reviews API using the requests.get()
function. The response
object returned contains JSON that can be parsed as a Python dictionary, allowing you to extract the data you require and easily import it into Pandas and any other format you wish.
def _get_merchant_ratings(merchant_identifiers, since_period='year'):
"""Return a list of dictionaries containing the product and service ratings for each merchant.
Args:
merchant_identifiers (list): List of Feefo merchant identifiers
since_period (string, optional): Time since review (24_hours, week, month, year, all)
Returns:
Python list containing dictionaries of merchant ratings and addresses.
"""
results = []
endpoint = "https://api.feefo.com/api/10/reviews/summary/all"
for merchant_identifier in merchant_identifiers:
response = requests.get(endpoint + "?merchant_identifier=" + merchant_identifier + "&since_period=" + since_period)
results.append(json.loads(response.text))
return results
If you run the _get_merchant_ratings()
function on a single business, i.e. vauxhall
, you’ll be able to see the contents of the dictionary returned. This contains the name and address of the merchant, the total number of reviews in the meta
count
, as well as the spread of ratings.
In Feefo, merchants can choose whether customers are allowed to review their service or their products. Many ecommerce businesses prefer to control their own product reviews, to avoid vendor lock-in with review platforms, so you often see only service reviews returned.
As well as the rating
, which shows the current rating for the business on Feefo over the period selected, you also get a breakdown of the individual star ratings for 1, 2, 3, 4 and 5 stars for service reviews, and product reviews if they’re enabled by the merchant.
results = _get_merchant_ratings(['vauxhall'])
results
[{'merchant': {'identifier': 'vauxhall',
'name': 'Vauxhall',
'url': 'https://www.vauxhall.co.uk/',
'logo': 'vauxhall-logo.png',
'review_url': 'https://www.feefo.com/en-GB/reviews/vauxhall',
'addressCountry': 'GB',
'addressLocality': 'Luton',
'addressRegion': 'Greater London',
'postalCode': 'LU1 3YT',
'streetAddress': 'Griffin House',
'telephone': '01582426569'},
'meta': {'count': 1479, 'pages': 74, 'page_size': 20},
'rating': {'min': 1.0,
'max': 5.0,
'rating': 4.7,
'service': {'count': 281,
'1_star': 3,
'2_star': 7,
'3_star': 10,
'4_star': 26,
'5_star': 235},
'product': {'count': 1479,
'1_star': 24,
'2_star': 28,
'3_star': 81,
'4_star': 273,
'5_star': 1073}}}]
In order to make the Feefo API review data easier to read and use, our next step will be to convert the response
dictionary to a Pandas dataframe. From there we can export the dataframe to CSV or another format, store it locally use SQLite, or push it into a larger database such as MySQL or Google BigQuery.
We’ll make a function called get_merchant_ratings()
which will take a list of Feefo merchant identifiers, and an optional time period. The function will run the _get_merchant_ratings()
function we created above, create a Pandas dataframe in which to store the review data, and then loop over each result and store the values in a row of the dataframe.
def get_merchant_ratings(merchant_identifiers, since_period='year'):
"""Return a Pandas dataframe containing the product and service ratings for each merchant.
Args:
merchant_identifiers (list): List of Feefo merchant identifiers
since_period (string, optional): Time since review (24_hours, week, month, year, all)
Returns:
Python list containing dictionaries of merchant ratings and addresses.
"""
results = _get_merchant_ratings(merchant_identifiers, since_period)
df = pd.DataFrame(columns=['merchant_identifier','reviews','rating',
'service_count','service_1_star','service_2_star','service_3_star','service_4_star','service_5_star',
'product_count','product_1_star','product_2_star','product_3_star','product_4_star','product_5_star',
])
for result in results:
row = {
'merchant_identifier': result.get('merchant').get('name'),
'reviews': result.get('meta').get('count'),
'rating': result.get('rating').get('rating'),
'service_count': result.get('rating').get('service').get('count'),
'service_1_star': result.get('rating').get('service').get('1_star'),
'service_2_star': result.get('rating').get('service').get('2_star'),
'service_3_star': result.get('rating').get('service').get('3_star'),
'service_4_star': result.get('rating').get('service').get('4_star'),
'service_5_star': result.get('rating').get('service').get('5_star'),
'product_count': result.get('rating').get('product').get('count'),
'product_1_star': result.get('rating').get('product').get('1_star'),
'product_2_star': result.get('rating').get('product').get('2_star'),
'product_3_star': result.get('rating').get('product').get('3_star'),
'product_4_star': result.get('rating').get('product').get('4_star'),
'product_5_star': result.get('rating').get('product').get('5_star'),
}
df = df.append(row, ignore_index=True)
return df
Now, if we want to periodically fetch review ratings for a selection of the competitors in your market, we can simply create a list of merchant identifiers and loop over them to collect and display the information together, then parse the dictionaries to display the contents in a single Pandas dataframe. From discussions with Feefo, they consider 1-2 as negative, 3 as neutral, and 4-5 as positive.
If you want to find all the merchants in your market who are listed on the Feefo reviews platform, the easiest way to find them is to load the Feefo sitemap XML into my EcommerceTools package and analyse it in Pandas. This shows us that Feefo has 5015 merchants listed, but quite a few of them are test accounts or from companies that have since closed.
merchant_identifiers = ['vauxhall', 'expedia-com', 'axa-direct', 'tui-uk']
df = get_merchant_ratings(merchant_identifiers, since_period='year')
df.head(10).sort_values(by='rating', ascending=False)
merchant_identifier | reviews | rating | service_count | service_1_star | service_2_star | service_3_star | service_4_star | service_5_star | product_count | product_1_star | product_2_star | product_3_star | product_4_star | product_5_star | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Vauxhall | 1479 | 4.7 | 281 | 3 | 7 | 10 | 26 | 235 | 1479 | 24 | 28 | 81 | 273 | 1073 |
2 | AXA Insurance | 30360 | 4.6 | 30360 | 962 | 402 | 1136 | 4829 | 23031 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | TUI | 165 | 4.3 | 165 | 9 | 8 | 11 | 32 | 105 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | Expedia.com | 101913 | 4.1 | 101913 | 13849 | 3912 | 5815 | 15692 | 62645 | 0 | 0 | 0 | 0 | 0 | 0 |
Another beneficial analysis to undertake is to examine all the product reviews generated by a merchant, when the merchant has them enabled in Feefo. This can show you what they’re selling, and in what volumes, and what customers like and dislike. For ecommerce businesses where you’re selling products that aren’t your own brand, it can be a good way of identifying what is popular in the market and may be a range gap worth filling in your product catalogue.
The process for obtaining all reviews for a business listed on Feefo is more complex, because we have to deal with paginated sets of results. First, we’ll create a couple of helper functions called _count_results()
which returns the number of results in the paginated result set, and _count_pages()
which returns the number of pages.
def _count_results(results):
"""Return the number of results in a paginated result set."""
if results:
return results.get('summary').get('meta').get('count')
else:
return
def _count_pages(results):
"""Return the number of pages in a paginated result set."""
if results:
return results.get('summary').get('meta').get('pages')
else:
return
Next, we’ll create a function called _get_merchant_product_reviews()
to fetch the product reviews for a merchant over a specific time period. This uses the two helper functions created above to fetch the results, then loop through them and return a big list of dictionaries containing the reviews.
def _get_merchant_product_reviews(merchant_identifier, since_period='year'):
"""Return the product reviews for a given Feefo merchant.
Args:
merchant_identifier (string): Feefo merchant identifier
since_period (string, optional): Time since review (24_hours, week, month, year, all)
Return:
Python list of dictionaries of product reviews
"""
response = requests.get("https://api.feefo.com/api/10/reviews/product?page_size=100&merchant_identifier="+merchant_identifier+ "&since_period=" + since_period)
results = json.loads(response.text)
total_results = _count_results(results)
total_pages = _count_pages(results)
reviews = results['reviews']
i = 2
while(i <= total_pages):
response = requests.get("https://api.feefo.com/api/10/reviews/product?page_size=100&merchant_identifier="+merchant_identifier+ "&since_period=" + since_period)
results = json.loads(response.text)
reviews.extend(results['reviews'])
i += 1
return reviews
I’m not going to print the output of _get_merchant_product_reviews()
here, as the output will be massive. However, if you run the below code in your notebook, you’ll be able to see a huge list of dictionaries containing the useful data we need to parse from the Feefo API results.
reviews = _get_merchant_product_reviews('vauxhall')
Finally, we can wrap up the above functions and convert the massive list of dictionaries into a Pandas dataframe. We’ll use the get()
function to extract the specific values we want to extract from the dictionaries and we’ll put them in a new dictionary called row
which we’ll then append to our newly created Pandas dataframe.
def get_merchant_product_reviews(merchant_identifier, since_period='year'):
"""Return the product reviews for a given Feefo merchant in a Pandas dataframe.
Args:
merchant_identifier (string): Feefo merchant identifier
Return:
Pandas dataframe of product reviews
"""
reviews = _get_merchant_product_reviews(merchant_identifier, since_period)
df = pd.DataFrame(columns=['merchant_identifier', 'product_title', 'sku', 'created_at', 'rating', 'helpful_votes', 'review'])
for result in reviews:
for product in result['products']:
row = {
'merchant_identifier': result.get('merchant').get('identifier'),
'product_title': product.get('product').get('title'),
'sku': product.get('product').get('sku'),
'created_at': product.get('created_at'),
'rating': product.get('rating').get('rating'),
'helpful_votes': product.get('helpful_votes'),
'review': product.get('review'),
}
df = df.append(row, ignore_index=True)
df['rating'] = df['rating'].astype(int)
df['helpful_votes'] = df['helpful_votes'].astype(int)
df['created_at'] = pd.to_datetime(df['created_at'])
return df
Running the get_merchant_product_reviews()
function on a given business listed on Feefo will return a massive dataframe containing all their reviews. We can save the output of this to a CSV using the to_csv()
function. You may wish to use Natural Language Processing to analyse the text itself, as I’ve previously shown for Trustpilot reviews.
df = get_merchant_product_reviews('vauxhall', since_period='all')
df.to_csv('vauxhall-product-reviews.csv')
df.sample(5)
merchant_identifier | product_title | sku | created_at | rating | helpful_votes | review | |
---|---|---|---|---|---|---|---|
934 | vauxhall | Astra | 9B68 | 2021-09-14 14:21:22.713000+00:00 | 5 | 0 | Comfortable drive plenty of space great |
469 | vauxhall | Astra | 9B68 | 2021-09-07 13:21:40.499000+00:00 | 5 | 0 | Love th styling of the black roof and the black mirrors and the colour red is better than the red that’s on other cars I’ve seen |
431 | vauxhall | Grandland X | 92 | 2021-09-14 14:54:15.008000+00:00 | 5 | 0 | Everything 😊 apart from having to use the indicator everytime you change lane |
930 | vauxhall | Grandland X | 92 | 2021-09-14 15:06:55.971000+00:00 | 4 | 0 | There are alot of things, I like. |
1111 | vauxhall | Crossland X | 9G | 2021-09-17 09:26:52.482000+00:00 | 1 | 0 | The fuel consumption the look and the colour nothing left to say |
To see which products are most commonly reviewed by Vauxhall customers, we can groupby()
the product_title
and then use agg()
to calculate some aggregate statistics. This shows us that the Corsa is Vauxhall’s most reviewed car and has a solid rating of 4.54 out of 5. Both the Corsa-e and the Vivaro get perfect 5 star ratings from Feefo customers, but are selling in far lower volumes than the Corsa.
If you applied this approach to an ecommerce retailer, you’d be able to see a list of the products you probably should be selling, and perhaps also a list of poorly rated products you may want to leave out of your product range to avoid upsetting customers, thus helping increase your overall customer satisfaction and your overall Feefo scores.
reviews_by_product = df.groupby('product_title').agg(
reviews=('merchant_identifier', 'count'),
avg_rating=('rating', 'mean'),
).sort_values(by='reviews', ascending=False)
reviews_by_product
reviews | avg_rating | |
---|---|---|
product_title | ||
Corsa | 360 | 4.541667 |
Crossland X | 330 | 4.454545 |
Grandland X | 300 | 4.400000 |
Mokka | 180 | 4.833333 |
Astra | 150 | 4.500000 |
Corsa-e | 105 | 5.000000 |
Mokka-e | 60 | 4.500000 |
Vivaro | 15 | 5.000000 |
You can use exactly the same approach to obtain merchant or company reviews. The _get_merchant_reviews()
is effectively the same as the one we created for products. It runs the Feefo API query using Requests, determines the number of reviews and pages, and paginates through them to return a big list of Python dictionaries containing the review data.
def _get_merchant_reviews(merchant_identifier, since_period='year'):
"""Return the reviews for a given Feefo merchant.
Args:
merchant_identifier (string): Feefo merchant identifier
since_period (string, optional): Time since review (24_hours, week, month, year, all)
Return:
Python list of dictionaries of product reviews
"""
response = requests.get("https://api.feefo.com/api/10/reviews/all?page_size=100&merchant_identifier="+merchant_identifier+"&since_period="+since_period)
results = json.loads(response.text)
total_results = _count_results(results)
total_pages = _count_pages(results)
reviews = results['reviews']
i = 2
while(i <= total_pages):
response = requests.get("https://api.feefo.com/api/10/reviews/all?page_size=100&merchant_identifier="+merchant_identifier+"&since_period="+since_period)
results = json.loads(response.text)
reviews.extend(results['reviews'])
i += 1
return reviews
The get_merchant_reviews()
function runs the above function, but creates a Pandas dataframe containing the merchant reviews parsed out of the dictionaries, rather than the product reviews data. Since the Feefo API returns duplicates, it’s important to drop any that are identical, which we can do by using drop_duplicates()
and setting the subset
to review
and setting keep='last'
.
def get_merchant_reviews(merchant_identifier, since_period='year'):
"""Return the reviews for a given Feefo merchant in a Pandas dataframe.
Args:
merchant_identifier (string): Feefo merchant identifier
since_period (string, optional): Time since review (24_hours, week, month, year, all)
Return:
Pandas dataframe of product reviews
"""
reviews = _get_merchant_reviews(merchant_identifier, since_period)
df = pd.DataFrame(columns=['merchant_identifier', 'customer',
'created_at', 'review', 'service_rating'])
for review in reviews:
row = {
'merchant_identifier': review.get('merchant').get('identifier'),
'customer': review.get('customer', {}).get('display_name', {}),
'created_at': review.get('service', {}).get('created_at', {}),
'review': review.get('service', {}).get('review', {}),
'service_rating': review.get('service', {}).get('rating', {}).get('rating', {}),
}
df['service_rating'] = df['service_rating'].astype(int)
df['created_at'] = pd.to_datetime(df['created_at'], utc=True)
df = df.append(row, ignore_index=True).fillna(0)
df = df.drop_duplicates(subset='review', keep='last')
return df
If you run get_merchant_reviews('vauxhall')
you’ll now generate a Pandas dataframe containing all the unique service reviews Vauxhall customers have left. Since there are dates in here, you can even plot the Vauxhall sales velocity, and even estimate the number of each type of car they sold within given periods.
df = get_merchant_reviews('vauxhall')
df.head()
merchant_identifier | customer | created_at | review | service_rating | |
---|---|---|---|---|---|
1 | vauxhall | {} | 2021-09-17 18:50:06.733000+00:00 | Excellent service and very friendly staff who go out of their way to help and answer questions. | 5 |
2 | vauxhall | Alan Buckingham | 2021-09-17 17:14:41.508000+00:00 | Sold well and helpfull | 4 |
3 | vauxhall | {} | 2021-09-17 13:47:38.204000+00:00 | Nice sales people | 5 |
4 | vauxhall | Mr. Fletcher | 2021-09-17 13:04:42.204000+00:00 | Excellent sales person in the showroom . | 5 |
5 | vauxhall | {} | 2021-09-17 11:55:07.526000+00:00 | Was very professional and polite | 5 |
We can see from using the value_counts()
function that most Vauxhall customers are very happy, but there are a handful of customers who left a lower score. You can examine the reviews from the more disgruntled customers by filtering the dataframe on the service_rating
column.
This is a great way to understand what annoys your customers, or those of your rivals, so you can take steps to try and avoid the issues. You might also want to visualise the text of the reviews using word clouds to help understand what annoys customers most.
df.service_rating.value_counts().to_frame()
service_rating | |
---|---|
5 | 77 |
4 | 11 |
3 | 5 |
1 | 4 |
2 | 2 |
df[df['service_rating'] <=3].head()
merchant_identifier | customer | created_at | review | service_rating | |
---|---|---|---|---|---|
7 | vauxhall | Tony Bingham | 2021-09-17 10:38:22.805000+00:00 | I do not recommend buying a car on line it is the worst experience I have ever had buying a car | 3 |
11 | vauxhall | Michael Brammer | 2021-09-17 09:26:52.517000+00:00 | Not very good didn't get the car I wanted and what they promised me and the car I've got I could have had with a lot more spec for the same payment not happy never offers me a test drive either | 1 |
12 | vauxhall | Paul Mellor | 2021-09-17 08:14:50.094000+00:00 | make sure the car is prepared properly for pick up and put the right personal details down for the finance company finally the arrogance of the staff at the dealership the attitude is dont come back once the car as left site were not bothered | 2 |
27 | vauxhall | Kelvin Read | 2021-09-14 17:27:58.753000+00:00 | The dealership were superb. I had cause to contact Vauxhall Customer Service as the “free” 3 year Nav Plus entitlement as a Grandland X customer (no mention of model requirements, or exceptions) did not get authorised, or activated for use. At the time of writing this review almost a month later it is still not resolved. | 3 |
30 | vauxhall | {} | 2021-09-14 15:06:56.008000+00:00 | They need to listen to what their customers say in surveys, after an experience I've had, and these surveys are never looked at, and let's face it, do Vauxhall really care? I don't think so! | 1 |
Finally, the other neat thing you can do with Feefo data is use the volume of customer reviews to extrapolate the potential order volume each business might be generating. If the business is very similar to your own, you could even extrapolate their potential revenue by including Average Order Value (AOV) in your calculations.
According to my Feefo account manager, Feefo sees a typical response rate from review emails of around 8-9%, depending on how reminder emails are configured. On my sites, this figure has been anywhere from 2-9%. This means that for every 100 orders placed with a business, about 9 of them leave a service review.
We’ll make another function called extrapolate_performance()
to handle this task. This takes the dataframe from get_merchant_ratings()
, the estimated response_rate
you think merchants in the market probably generate, and the aov
or Average Order Value in the market.
def extrapolate_performance(df, response_rate, aov):
"""Given a dataframe of Feefo merchant reviews, an estimated
review email response rate, and an AOV, this function returns
a dataframe containing the estimated order volume and value.
Args:
df (dataframe): Pandas dataframe returned by get_merchant_ratings()
response_rate (float): Feefo review email response rate, i.e. 8
aov (float): Average Order Value for this market.
Returns:
df (dataframe): Pandas dataframe showing results.
"""
df['estimated_orders'] = ((df['reviews'] / response_rate ) * 100).astype(int)
df['estimated_revenue'] = df['estimated_orders'] * aov
df = df[['merchant_identifier', 'service_count', 'reviews', 'estimated_orders', 'estimated_revenue']].sort_values(by='estimated_orders')
return df
To run the function you first need to fetch the merchant ratings using the get_merchant_ratings()
function we created above, and then pass it to extrapolate_performance()
along with your estimated email review response rate (i.e. 7%) and your estimated AOV for the merchants, i.e. £90.
This isn’t going to be accurate, but it should give you a rough steer on their probable position within the market, providing you’re comparing very similar businesses (which I am not in the rough example below).
df = get_merchant_ratings(merchant_identifiers, since_period='year')
df_performance = extrapolate_performance(df, response_rate=7, aov=90)
df_performance
merchant_identifier | service_count | reviews | estimated_orders | estimated_revenue | |
---|---|---|---|---|---|
3 | TUI | 165 | 165 | 2357 | 212130 |
0 | Vauxhall | 281 | 1479 | 21128 | 1901520 |
2 | AXA Insurance | 30360 | 30360 | 433714 | 39034260 |
1 | Expedia.com | 101913 | 101913 | 1455900 | 131031000 |
Matt Clarke, Monday, September 06, 2021