How to create a Google rank checker tool using Python

Learn how to create a Google rank checker tool using the Python EcommerceTools package so you can monitor your search rankings across a range of keywords.


Most off-the-shelf SEO tools come with a rank checker that allows you to monitor your position for given phrases in the Google search engine results. If you want to create your own rank checker tool, it’s fairly easy to do using some Python web scraping skills and a little of your time. Here’s a quick and easy way to do it.

Import the packages

First, open a Jupyter notebook and install my EcommerceTools Python package. EcommerceTools includes a range of modules for performing ecommerce and marketing tasks, including various technical SEO tasks, such as web scraping and accessing data from the Google Search Console API. You can install it by entering pip3 install --upgrade ecommercetools in your terminal.

!pip3 install --upgrade ecommercetools

You’ll also need to import the time and datetime modules, so we can create and manipulate dates in Python, as well as the Pandas library for creating dataframes, and the seo module from EcommerceTools, which includes a tool to scrape Google search engine results in a single line of code.

import time
import datetime
import pandas as pd
from ecommercetools import seo

Scrape the Google search engine results

Rather than building a Google search engine results scraper from scratch, we’ll use the get_serps() function from EcommerceTools. To use this, all you need to do is pass the search term to get_serps() and define the number of pages of search engine results to return. The function will return a Pandas dataframe containing the title, snippet, URL, and position of each page in the SERPs.

df = seo.get_serps("contractual churn model", pages=5)
df.head(10)
position title link text
0 1 How to create a contractual churn model in sci... https://practicaldatascience.co.uk/machine-lea... To help maximise retention, contractual or sub...
1 2 How to create a non-contractual churn model fo... https://practicaldatascience.co.uk/machine-lea... The model most commonly used to tackle the pre...
2 3 How to Define Churn in a Non-Contractual Busin... https://forecast.global/insight/how-to-define-... 25 Feb 2020 — Customer churn is a significant ...
3 4 Non-Contractual Customer Churn - Medium https://medium.com/rond-blog/non-contractual-c... 20 Apr 2021 — In contractual churn analysis we...
4 5 Modelling Customer Churn When Churns Are Not E... https://towardsdatascience.com/modelling-custo... Thus, churn modelling in non-contractual busin...
5 6 Customer Churn Analysis - Towards Data Science https://towardsdatascience.com/customer-churn-... Contractual Churn, which is applicable to busi...
6 7 Modeling Churn and Usage Behavior in Contractu... https://www0.gsb.columbia.edu/mygsb/faculty/re... by E Ascarza · 2009 · Cited by 4 — explored th...
7 8 A Joint Model of Usage and Churn in Contractua... https://www.hbs.edu/faculty/Pages/download.asp... by E Ascarza · 2013 · Cited by 95 — Key words:...
8 9 What is Customer Churn Modeling? Why is it val... https://www.kdnuggets.com/2017/03/datascience-... In its simplest form, churn rate is calculated...
9 10 Non-Contractual Churn Prediction with Limited ... https://kth.diva-portal.org/smash/get/diva2:13... This report will investigate how data analysis...

Get the top ranking page for a domain

Next, we’ll create a function to identify the top ranking page for a given domain from the dataframe returned by get_serps(). This takes the dataframe above, plus a string containing the domain of your site, and returns a dictionary of data for the domain’s highest ranking page, or None if the domain doesn’t appear in the results.

def get_top_rank(df, domain):
    """Return the rank of the domain's top ranking page for a search term."""
    
    # Filter the SERPs to pages on the target domain and take the first,
    # i.e. highest ranking, row as a dictionary
    data = df[df['link'].str.startswith(domain)].head(1).to_dict('records')
    
    if len(data) > 0:
        return data[0]
    else:
        return None
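
To see how it works, we can run the function on the dataframe of search results we scraped above. Since the position 1 result in the example SERPs is on practicaldatascience.co.uk, the function returns that row as a dictionary:

# Returns the dictionary for the highest ranking practicaldatascience.co.uk
# page, which in the SERPs above is the position 1 result
get_top_rank(df, 'https://practicaldatascience.co.uk')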

Create the Google rank checker function

Now we’ll create a function called rank_checker() to bulk check the rankings of a Python list of keywords you want to monitor. This will loop over each of the keywords in the list, run each one through the get_serps() function, extract the highest ranking page using get_top_rank(), and assemble the results into a Pandas dataframe.

It returns a dataframe containing the keyword, date, position or rank, title, link, and snippet from the Google SERPs, so you can store it, write it to a database, or analyse the data as you wish.

def rank_checker(keywords, domain, sleep=2, verbose=False):
    """Scrapes Google Search Engine Results Pages (SERPs) and returns the highest
    ranking page for each keyword phrase on a specified domain. Note that larger
    volumes of keywords may result in the application being temporarily blocked. 
    
    Args:
        keywords (list): List of keyword phrases for which to check Google rankings.
        domain (string): Fully qualified domain name, e.g. https://practicaldatascience.co.uk
        sleep (int, optional): Seconds to sleep between requests to avoid blocking.
        verbose (bool, optional): Set to True to print each keyword as it is fetched.
    
    Returns:
        df (dataframe): Pandas dataframe containing highest rank for each keyword.
    """
    
    results = []

    for keyword in keywords:

        if verbose:
            print('Fetching:', keyword)
            
        # Scrape the SERPs and extract the domain's highest ranking page
        data = get_top_rank(seo.get_serps(keyword, pages=5), domain)

        # Pause between requests to reduce the risk of being blocked
        time.sleep(sleep)

        if data:
            data['keyword'] = keyword
        else:
            # The domain didn't rank for this keyword, so return empty values
            data = {'keyword': keyword, 'position': '', 'title': '', 'link': '', 'text': ''}

        # Record the date the ranking was checked
        data['date'] = datetime.datetime.now().strftime('%Y-%m-%d')

        results.append(data)
        
    return pd.DataFrame(results, columns=['keyword', 'date', 'position', 'title', 'link', 'text'])

Run the rank checker on your target keywords

Finally, we can run the Google rank checker on a Python list containing your target keywords. This will work fine for relatively small volumes of keywords, but you are likely to get temporarily blocked by Google if you try to check larger volumes.

keywords = [
    'contractual churn model', 
    'non contractual churn model', 
    'product matching model',
    'marketing mix model',
    'purchase intention model',
    'inventory classification model',
    'rfm model',
    'causal impact model',
    'product classification model', 
    'learning to rank model',
    'customer segmentation model',
    'bg/nbd model',
    'abc xyz inventory classification model', 
    'abc inventory classification model',
    'isolation forest model',
    'response model',
    'product attribute extraction model',
    'next product to buy model',
    'xgboost classification model',
    'scikit learn linear regression model'
]

To run the code, we’ll pass in the list above containing the keywords whose rank we wish to check, the domain of the site we want to monitor, a sleep parameter to avoid battering Google with numerous consecutive requests, and a boolean argument to turn on verbose mode so we can watch the SERPs being scraped.

domain = 'https://practicaldatascience.co.uk'
df = rank_checker(keywords, domain, sleep=10, verbose=True)
Fetching: contractual churn model
Fetching: non contractual churn model
Fetching: product matching model
Fetching: marketing mix model
Fetching: purchase intention model
Fetching: inventory classification model
Fetching: rfm model
Fetching: causal impact model
Fetching: product classification model
Fetching: learning to rank model
Fetching: customer segmentation model
Fetching: bg/nbd model
Fetching: abc xyz inventory classification model
Fetching: abc inventory classification model
Fetching: isolation forest model
Fetching: response model
Fetching: product attribute extraction model
Fetching: next product to buy model
Fetching: xgboost classification model
Fetching: scikit learn linear regression model

I’ve been using this to periodically fetch rankings for a bunch of keywords I want to monitor and it works fine. However, if you overdo it, Google will temporarily block you, leaving the tool out of action for a short period. This is a common problem when scraping the SERPs, and one that can be complex to resolve without resorting to proxies or paid scraping APIs.
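
If you’re running the checker periodically like this, one simple way to build up a history of rankings over time is to append each run’s dataframe to a CSV file. Here’s a minimal sketch; the rankings.csv filename is an arbitrary choice for illustration, and you could equally write the data to a database:

import os

# Append this run's results to a CSV, writing the header row only if the
# file doesn't already exist
output_file = 'rankings.csv'
df.to_csv(output_file, mode='a', index=False, header=not os.path.exists(output_file))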

df
keyword date position title link text
0 contractual churn model 2021-10-04 1 How to create a contractual churn model in sci... https://practicaldatascience.co.uk/machine-lea... To help maximise retention, contractual or sub...
1 non contractual churn model 2021-10-04 1 How to create a non-contractual churn model fo... https://practicaldatascience.co.uk/machine-lea... Unlike contractual churn models, non-contractu...
2 product matching model 2021-10-04 1 How to create a dataset for product matching m... https://practicaldatascience.co.uk/machine-lea... Product matching (or data matching) is a compu...
3 marketing mix model 2021-10-04 20 How to create a basic Marketing Mix Model in s... https://practicaldatascience.co.uk/machine-lea... A Marketing Mix Model (also called a Media Mix...
4 purchase intention model 2021-10-04 36 How to create an ecommerce purchase intention ... https://practicaldatascience.co.uk/machine-lea... 29 May 2021 — Ecommerce purchase intention mod...
5 inventory classification model 2021-10-04 3 How to create an ABC inventory classification ... https://practicaldatascience.co.uk/data-scienc... Learn how to create an ABC inventory classific...
6 rfm model 2021-10-04
7 causal impact model 2021-10-04 8 How to infer the effects of marketing using th... https://practicaldatascience.co.uk/machine-lea... 12 Aug 2021 — The Causal Impact model lets you...
8 product classification model 2021-10-04 2 How to create a Naive Bayes product classifica... https://practicaldatascience.co.uk/machine-lea... 13 Mar 2021 — Learn how to use NLP techniques ...
9 learning to rank model 2021-10-04 4 A quick guide to Learning to Rank models - Pra... https://practicaldatascience.co.uk/machine-lea... Learning to Rank models are designed specifica...
10 customer segmentation model 2021-10-04
11 bg/nbd model 2021-10-04 3 How to calculate CLV using BG/NBD and Gamma-Gamma https://practicaldatascience.co.uk/data-scienc... 14 Mar 2021 — Fit the BG/NBD model ... Lifetim...
12 abc xyz inventory classification model 2021-10-04 3 How to create an ABC XYZ inventory classificat... https://practicaldatascience.co.uk/data-scienc... The ABC XYZ inventory classification model is ...
13 abc inventory classification model 2021-10-04 7 How to create an ABC inventory classification ... https://practicaldatascience.co.uk/data-scienc... ABC inventory classification has been one of t...
14 isolation forest model 2021-10-04 7 How to use the Isolation Forest model for outl... https://practicaldatascience.co.uk/machine-lea... As the name suggests, the Isolation Forest is ...
15 response model 2021-10-04
16 product attribute extraction model 2021-10-04 1 A quick guide to Product Attribute Extraction ... https://practicaldatascience.co.uk/machine-lea... Learn why ecommerce retailers and marketplaces...
17 next product to buy model 2021-10-04 1 A quick guide to Next-Product-To-Buy models - ... https://practicaldatascience.co.uk/machine-lea...
18 xgboost classification model 2021-10-04 3 How to create a classification model using XGB... https://practicaldatascience.co.uk/machine-lea... As we're building a classification model, it's...
19 scikit learn linear regression model 2021-10-04 9 How to create a linear regression model using ... https://practicaldatascience.co.uk/machine-lea... Want to get started with sklearn linear regres...
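
Since unranked keywords come back with an empty position value, you can also quickly pull out any phrases for which the site didn’t rank in the pages checked:

# List the keywords for which no ranking page was found in the pages scraped
unranked = df[df['position'] == '']['keyword'].tolist()
print(unranked)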

Matt Clarke, Monday, October 04, 2021

Matt Clarke is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.