How to create monthly Google Search Console API reports with EcommerceTools

Learn how to use EcommerceTools to create monthly Google Search Console API reports that let you see changes in your SEO performance.


Google Search Console is a really useful tool for search marketers because it shows what happens before organic search visitors reach your website. Google Analytics only shows you the results you’re getting after someone has clicked a search engine result, so you miss out on seeing how many impressions you’re getting, what sort of click-through rate (CTR) you’re obtaining, and what your average position in the Search Engine Results Pages (SERPs) is.

By monitoring the number of pages and queries that are generating impressions and clicks, and by optimising your content using search engine optimisation (SEO) techniques, you can increase the number of people who see your pages in the SERPs, increase the number of people who click them, and generate more traffic for your site, which should help increase conversions.

While Google Search Console is useful, its API is much more basic than the feature-rich Google Analytics API, so getting these results out in a format you can use for long-term monitoring and reporting is harder. In this project, I’ll show you how you can use the EcommerceTools Python package to query Google Search Console for these monthly metrics and create a single Pandas dataframe containing the historic performance for each month of a 13-month period.

Import the packages

First, open up a Jupyter notebook and install my EcommerceTools Python package. This can be used to perform a range of ecommerce, marketing, and SEO related tasks, including querying the Google Search Console API.

You can install it using the Pip package manager by entering the command below in a code cell and then executing it. Then, import the Pandas, Numpy, and Datetime packages, including timedelta, as well as the seo module from ecommercetools.

!pip3 install ecommercetools
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
from ecommercetools import seo

Set up EcommerceTools

First, you will need to create a client secrets JSON key using the Google API Console to allow EcommerceTools to authenticate and retrieve your data. To save putting hard-coded values in the rest of the project, we’ll create some variables to store these values.

The key variable holds the path to your JSON keyfile, site_url contains the URL of the Google Search Console property you want to query, and start_date and end_date define the time period you wish to examine. Importantly, the end date must refer to the last day of a calendar month. I’ve selected a 13-month period to allow me to observe the year-on-year change for the site.

key = "client_secrets.json"
site_url = "http://example.com"
start_date = "2020-05-01"
end_date = "2021-06-30" # This must be the last day
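Since the queries we build later cover whole calendar months, it’s worth guarding against an end date that isn’t actually a month end. This little check isn’t part of the original code, but it’s a simple way to catch the mistake early, using the timedelta import from above:

```python
import datetime as dt
from datetime import timedelta

end_date = "2021-06-30"
end = dt.datetime.strptime(end_date, "%Y-%m-%d")

# The end date is the last day of its month if, and only if, adding one
# day rolls over into the next month
assert (end + timedelta(days=1)).month != end.month, \
    "end_date must be the last day of a calendar month"
```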

Calculate the start date and end date of each month

Since Google Search Console doesn’t provide a way to group data by month, we have to do this manually. The easiest way to do this is to assemble multiple API queries and then join the output. However, to do this, we need to know the start date and end date for each month that fell within the larger time period we defined in the variables above.

Since this is the sort of code snippet I might want to re-use in another project, I’ve wrapped things up in a little function called get_month_start_and_end_dates(). This takes the start_date (i.e. 2020-05-01) and the end_date (i.e. 2021-06-30) and identifies all of the calendar months within, returning a Pandas dataframe containing the month_start and month_end for each month, as well as a period which shows the year-month value (i.e. 2020-05 for May 2020).

Since the Google Search Console API requires dates to be passed in as strings, not datetimes, we need to do a bit of data munging and use the datetime strftime() function to ensure that the final dates we return in the dataframe are strings, which will show as the object data type when you run info() on the dataframe.
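As a quick illustration of the building blocks used in the function below (with example dates of my own), pd.date_range() with the MS frequency returns the first day of each month in a range, and each matching month end can be recovered with a MonthEnd(0) offset, which rolls a date forward to the last day of its month. The function itself uses the M frequency instead, which produces the same month ends:

```python
import pandas as pd

# Month starts between two dates via the 'MS' (month start) frequency
starts = pd.date_range('2020-05-01', '2020-07-31', freq='MS')
print(starts.strftime('%Y-%m-%d').tolist())
# ['2020-05-01', '2020-06-01', '2020-07-01']

# The matching month ends: MonthEnd(0) rolls each month start forward
# to the last day of the same month
ends = (starts + pd.offsets.MonthEnd(0)).strftime('%Y-%m-%d').tolist()
print(ends)
# ['2020-05-31', '2020-06-30', '2020-07-31']
```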

def get_month_start_and_end_dates(start_date, end_date):
    """Return a Pandas dataframe containing the month start and end dates between a start and end date"""
    
    start_date_object = dt.datetime.strptime(start_date, '%Y-%m-%d')
    end_date_object = dt.datetime.strptime(end_date, '%Y-%m-%d')

    month_start = pd.date_range(start_date_object, end_date_object, freq='MS')
    month_end = pd.date_range(start_date_object, end_date_object, freq='M')

    df = pd.DataFrame({'period': month_start.strftime('%Y-%m'),
                       'month_start': month_start.strftime('%Y-%m-%d'),
                       'month_end': month_end.strftime('%Y-%m-%d')})
    return df

Running get_month_start_and_end_dates() with our start_date and end_date variables returns a dataframe I’ve called df_dates, which contains the period, month_start, and month_end for each month detected between the two dates defined earlier. Running df_dates.info() confirms that these are of the object data type, so they should work correctly when passed to the Google Search Console API.

df_dates = get_month_start_and_end_dates(start_date, end_date)
df_dates.head()
period month_start month_end
0 2020-05 2020-05-01 2020-05-31
1 2020-06 2020-06-01 2020-06-30
2 2020-07 2020-07-01 2020-07-31
3 2020-08 2020-08-01 2020-08-31
4 2020-09 2020-09-01 2020-09-30
df_dates.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   period       14 non-null     object
 1   month_start  14 non-null     object
 2   month_end    14 non-null     object
dtypes: object(3)
memory usage: 464.0+ bytes

Fetch Google Search Console data for each month

Next, we can create another function to fetch our data. This takes four values: key is a string holding the path to your Google API Console client secrets JSON keyfile, which is used to authenticate; site_url is the URL of the property in GSC you want to query; and start_date and end_date refer to the longer 13-month period we defined at the beginning.

The first step of the function is to run the get_month_start_and_end_dates() function to create a dataframe called df_dates which will be used internally to create each API query. We then create an empty Pandas dataframe called df_metrics, which contains columns for the metrics we’re going to fetch from Search Console.

The next bit is somewhat inefficient, so it will be slow, but the Search Console API gives us no other option. It iterates over each row in df_dates using the iterrows() function and runs a separate GSC query for each calendar month, passing the month_start and month_end for the given month into a standard payload, which is then passed to the query_google_search_console() function of EcommerceTools.

When the data are returned by query_google_search_console(), we extract the bits we need, calculate the metrics, and add them to a dictionary called row. In the first column we’ll store the year-month in period, then we’ll count the number of unique pages and queries using nunique(), then calculate the total number of impressions and clicks using sum(), and calculate the mean of the average click through rate and position using mean(). (You can, of course, make this more accurate by calculating the CTR manually, instead of averaging an average.)
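To see why averaging an average can mislead, here’s a toy example with made-up figures (not from the Search Console data in this article). A page with very few impressions but a high CTR drags the mean of the per-row CTRs far away from the overall CTR calculated from the totals:

```python
import pandas as pd

# Two pages: one with lots of impressions and a low CTR, one with very
# few impressions and a high CTR (made-up figures)
df = pd.DataFrame({'impressions': [1000, 10], 'clicks': [10, 5]})
df['ctr'] = df['clicks'] / df['impressions'] * 100

avg_of_avgs = df['ctr'].mean()  # (1.0 + 50.0) / 2 = 25.5, skewed by the tiny page
overall_ctr = df['clicks'].sum() / df['impressions'].sum() * 100  # 15 / 1010, about 1.49

print(round(avg_of_avgs, 1), round(overall_ctr, 2))
```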

Finally, we add the row dictionary to the df_metrics dataframe as a new row, passing the important ignore_index=True argument, and return the final dataframe when we’re done.

def get_monthly_google_search_console_data(key, site_url, start_date, end_date): 
    """Return a Pandas dataframe containing monthly data from Google Search Console.
    
    Args:
        key (string): Path to client secrets JSON keyfile. 
        site_url (string): URL of site to query in Google Search Console. 
        start_date (string): Date string in YYYY-MM-DD format. 
        end_date (string): Date string in YYYY-MM-DD format. 
    
    Returns: 
        df (dataframe): Pandas dataframe of monthly data. 
    """
    
    df_dates = get_month_start_and_end_dates(start_date, end_date)
    df_metrics = pd.DataFrame(columns=['period', 'pages', 'queries', 'impressions', 'clicks', 'avg_ctr', 'avg_position'])

    for index, row in df_dates.iterrows():
    
        payload = {
            'startDate': row['month_start'], 
            'endDate': row['month_end'],
            'dimensions': ["date", "page", "query"],  
            'rowLimit': 25000,
            'startRow': 0
        }

        df = seo.query_google_search_console(key, site_url, payload)

        row = {
            'period': row['period'],
            'pages': df['page'].nunique(),
            'queries': df['query'].nunique(),
            'impressions': df['impressions'].sum(),
            'clicks': df['clicks'].sum(),
            'avg_ctr': round(df['ctr'].mean(), 1),
            'avg_position': round(df['position'].mean(), 1),
        }

        # DataFrame.append() was removed in pandas 2.0, so pd.concat() is used here
        df_metrics = pd.concat([df_metrics, pd.DataFrame([row])], ignore_index=True)
    
    return df_metrics

Run the code

Finally, we can run our function. By passing in the key, site_url, and start_date and end_date, we can run a load of separate Google Search Console API queries and join the results together to create a single dataframe which shows the site’s organic search performance over time. Viewing data in this format is handy for reporting, because you can see at a glance what you did the previous month and the previous year.

df = get_monthly_google_search_console_data(key, site_url, start_date, end_date)
df.sort_values(by='period', ascending=False)
period pages queries impressions clicks avg_ctr avg_position
13 2021-06 1379 8211 65677 4869 9.0 23.9
12 2021-05 1396 9259 91022 5899 10.2 23.6
11 2021-04 1320 8283 91010 6707 11.3 21.7
10 2021-03 1333 8678 75536 5949 9.9 26.7
9 2021-02 1304 8949 63273 3970 7.2 30.3
8 2021-01 1231 7939 67214 4345 8.0 28.5
7 2020-12 1252 7936 62778 4163 8.0 27.8
6 2020-11 1278 8435 63144 4814 8.8 28.0
5 2020-10 1334 9017 62904 5587 10.8 25.0
4 2020-09 1309 8923 65400 6710 12.8 21.9
3 2020-08 1285 8739 68198 7938 14.9 18.9
2 2020-07 1252 8484 72982 9166 16.3 18.6
1 2020-06 1200 7709 84767 9368 15.4 17.2
0 2020-05 1122 7277 74350 7449 11.4 21.1
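With 13 months of data in this shape, a natural follow-up step (not part of the original code, and sketched here with just a few of the clicks figures from the table above) is to compute the year-on-year change by aligning each period with the same month a year earlier:

```python
import pandas as pd

# A few monthly clicks totals from the table above; a real run would use
# the full dataframe returned by get_monthly_google_search_console_data()
df = pd.DataFrame({
    'period': ['2020-05', '2020-06', '2021-05', '2021-06'],
    'clicks': [7449, 9368, 5899, 4869],
})

# Derive the period twelve months earlier, then self-join so each month
# sits alongside the same month from the previous year
df['prior_period'] = (pd.PeriodIndex(df['period'], freq='M') - 12).astype(str)
yoy = df.merge(df[['period', 'clicks']], left_on='prior_period',
               right_on='period', suffixes=('', '_last_year'))
yoy['clicks_yoy_pct'] = ((yoy['clicks'] - yoy['clicks_last_year'])
                         / yoy['clicks_last_year'] * 100).round(1)
```

Only the months with a matching prior-year period survive the join, so the first twelve rows drop out and the remaining rows show the percentage change in clicks versus a year earlier.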

Matt Clarke, Thursday, July 08, 2021

Matt Clarke is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.