ABC analysis originally came from the field of inventory management, where it’s used by procurement staff to classify inventory items into three categories - A, B, and C - to help them control their inventory and avoid costly stock-outs. However, the ABC analysis technique is extremely powerful in other fields too. I’ve regularly used ABC analysis for customer segmentation, and for the segmentation or classification of site content, and typically add a fourth class - D - to denote items with zero contribution.
In this post, I’ll show you how to use ABC analysis to classify Google Search Console data using my EcommerceTools library. EcommerceTools is a Python data science toolkit for ecommerce, marketing, and SEO tasks, and it makes it really easy to query the Google Search Console API.
ABC analysis works by calculating a cumulative sum and then assigning the first 80% to Class A, the next 10% to Class B, and the final 10% to Class C. The cumulative sum is calculated by sorting the data in descending order and then adding up the values. We’ll be calculating the cumulative sum of the number of clicks each page generates, so our Class D will contain those pages generating zero clicks.
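To make the logic above concrete, here’s a minimal sketch of that cumulative-sum classification in Pandas. This is an illustrative implementation only, not EcommerceTools’ own code, and the `abc_classify` function name and the sample page data are hypothetical:

```python
import pandas as pd

def abc_classify(df, metric="clicks"):
    """Assign ABCD classes based on the cumulative share of a metric.

    Class A = items producing the first 80% of the cumulative total,
    Class B = the next 10%, Class C = the final 10%, and
    Class D = items contributing zero.
    """
    # Sort descending so the biggest contributors accumulate first
    df = df.sort_values(metric, ascending=False).reset_index(drop=True)
    df["clicks_cumsum"] = df[metric].cumsum()
    df["clicks_running_pc"] = df["clicks_cumsum"] / df[metric].sum() * 100

    def label(row):
        if row[metric] == 0:
            return "D"
        if row["clicks_running_pc"] <= 80:
            return "A"
        if row["clicks_running_pc"] <= 90:
            return "B"
        return "C"

    df["class"] = df.apply(label, axis=1)
    return df

# Hypothetical example data
pages = pd.DataFrame({
    "page": ["/page-1", "/page-2", "/page-3", "/page-4", "/page-5"],
    "clicks": [800, 100, 60, 40, 0],
})
print(abc_classify(pages))
```

With these numbers, the first page alone accounts for 80% of clicks and lands in Class A, the next 10% falls to Class B, the remainder to Class C, and the zero-click page to Class D.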
While this might seem quite trivial, it’s surprisingly useful for helping SEOs and content teams understand a site’s traffic and prioritise their work. The Class A pages are the ones to watch carefully, since they generate the bulk of the site’s clicks and therefore carry the most risk.
The Class D group could include pages that you don’t intend to rank for, or pages that are poorly optimised and need to be fixed. The Class B and Class C groups both include pages that could have room for improvement. Combine the ABCD class data with impressions, and you’ve got a powerful dataset for SEOs to use.
To get started, open a Jupyter notebook and import Pandas and my EcommerceTools package. If you don’t have EcommerceTools installed, you can install it via the PyPi Python package repository by entering the command pip3 install --upgrade ecommercetools.
!pip3 install --upgrade ecommercetools
import pandas as pd
from ecommercetools import seo
We’re going to use EcommerceTools to fetch data from the Google Search Console API and then classify the data using the EcommerceTools SEO module. In order to connect to Google Search Console you’ll need to create a service account and download the JSON credentials file. You’ll also need the name of your Google Search Console API site URL. This will usually be in the format of https://www.example.com, but if you have a domain property it will be in the sc-domain:example.com format instead. You’ll also need to define the start and end date for your analysis.
key = "pds-client-secrets.json"
site_url = "sc-domain:practicaldatascience.co.uk"
start_date = '2022-10-01'
end_date = '2022-10-31'
Next, we’ll run the classify_pages() function. This will query the Google Search Console API, fetch all the pages within your start and end date range, and then classify them into the four categories of A, B, C, and D. Class A will comprise the pages that generate the first 80% of cumulative clicks, Class B will comprise the next 10%, Class C will comprise the final 10%, and Class D will comprise all those pages generating zero clicks.
To run the function, you simply need to pass the key variable containing the path to your JSON client secrets key file, the site_url variable containing the URL of your website, and the start_date and end_date variables containing the start and end date of the period you want to classify, and set the output argument to 'summary'.
df_summary = seo.classify_pages(key, site_url, start_date, end_date, output='summary')
For my site, I get 63 pages classified as Class A, which generate 80% of my clicks. 46 Class B pages generate the next 10% of clicks, and 190 Class C pages generate the final 10% of clicks. I have 36 pages in Class D that generate no clicks.
df_summary
|   | class | pages | impressions | clicks | avg_ctr | avg_position | share_of_clicks | share_of_impressions |
|---|---|---|---|---|---|---|---|---|
| 0 | A | 63 | 747643 | 36980 | 5.126349 | 22.706825 | 79.7 | 43.7 |
| 1 | B | 46 | 639329 | 4726 | 3.228043 | 31.897826 | 10.2 | 37.4 |
| 2 | C | 190 | 323385 | 4698 | 2.393632 | 38.259368 | 10.1 | 18.9 |
| 3 | D | 36 | 1327 | 0 | 0.000000 | 25.804722 | 0.0 | 0.1 |
To view the pages and their ABCD classes we can run the same function but change the output parameter to ‘classes’. We get back a dataframe containing the raw Google Search Console data on each page on the site, plus the ABCD classes and their underlying metrics.
df_classes = seo.classify_pages(key, site_url, start_date, end_date, output='classes')
df_classes.head()
|   | page | clicks | impressions | ctr | position | clicks_cumsum | clicks_running_pc | pc_share | class | class_rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://practicaldatascience.co.uk/machine-lea... | 3890 | 36577 | 10.64 | 12.64 | 3890 | 8.382898 | 8.382898 | A | 1 |
| 1 | https://practicaldatascience.co.uk/data-scienc... | 2414 | 16618 | 14.53 | 14.30 | 6304 | 13.585036 | 5.202138 | A | 2 |
| 2 | https://practicaldatascience.co.uk/data-scienc... | 2378 | 71496 | 3.33 | 16.39 | 8682 | 18.709594 | 5.124558 | A | 3 |
| 3 | https://practicaldatascience.co.uk/data-scienc... | 1942 | 14274 | 13.61 | 15.02 | 10624 | 22.894578 | 4.184984 | A | 4 |
| 4 | https://practicaldatascience.co.uk/data-scienc... | 1738 | 23979 | 7.25 | 11.80 | 12362 | 26.639945 | 3.745367 | A | 5 |
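Once you have the classes dataframe, it’s easy to slice it for action. For instance, Class D pages that still earn impressions are natural candidates for re-optimisation. This is an illustrative sketch using hypothetical data with the same column names as the classes output:

```python
import pandas as pd

# Hypothetical stand-in for the classes dataframe returned by classify_pages()
df_classes = pd.DataFrame({
    "page": ["/page-1", "/page-2", "/page-3"],
    "clicks": [500, 12, 0],
    "impressions": [4000, 900, 150],
    "class": ["A", "C", "D"],
})

# Class D pages that appear in search results but win no clicks
to_fix = df_classes[(df_classes["class"] == "D") & (df_classes["impressions"] > 0)]
print(to_fix["page"].tolist())
```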
We can use the to_csv method to export the classifications to a CSV file and pass it on to our SEO team. They’ll be able to use the page classifications to improve the SEO of the website by focusing on the pages that need the most attention.
df_classes.to_csv('google_search_console_classifications.csv', index=False)
Matt Clarke, Friday, November 18, 2022