Google Search Console is a really useful tool for search marketers because it shows what is happening before organic search visitors reach your website. Google Analytics only shows what happens after someone has clicked a search engine result, so you miss out on how many impressions you're getting, what click-through rate (CTR) you're obtaining, and what your average position in the Search Engine Results Pages (SERPs) is.
By monitoring the number of pages and queries that generate impressions and clicks, and by optimising your content using search engine optimisation (SEO) techniques, you can increase the number of people who see your pages in the SERPs and click through to your site, generating more traffic and, ultimately, more conversions.
While the Google Analytics API is much more feature-rich, the Google Search Console API is fairly basic, so getting these results out in a format you can use for long-term monitoring and reporting is harder. In this project, I'll show you how to use the EcommerceTools Python package to query Google Search Console for monthly metrics and create a single Pandas dataframe containing the historic performance for each month of a 13-month period.
First, open up a Jupyter notebook and install my EcommerceTools Python package. This can be used to perform a range of ecommerce, marketing, and SEO-related tasks, including querying the Google Search Console API.
You can install it using the Pip package manager by entering the command below in a code cell and executing it. Then, import the Pandas, Numpy, and Datetime packages, including `timedelta`, as well as the `seo` module from `ecommercetools`.
```python
!pip3 install ecommercetools

import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
from ecommercetools import seo
```
First, you will need to create a client secrets JSON key using the Google API Console, so EcommerceTools can authenticate and retrieve your data. To avoid hard-coding values throughout the rest of the project, we'll create some variables to store them.
The `key` variable holds the path to your JSON keyfile, `site_url` contains the URL of the Google Search Console property you want to query, and `start_date` and `end_date` define the time period you wish to examine. Importantly, the end date must be the last day of a calendar month. I've selected a 13-month period, to allow me to observe the year-on-year change for the site.
```python
key = "client_secrets.json"
site_url = "http://example.com"
start_date = "2020-05-01"
end_date = "2021-06-30"  # Must be the last day of a calendar month
```
Since Google Search Console doesn't provide a way to group data by month, we have to do this manually. The easiest approach is to run multiple API queries, one per month, and then join the output. To do this, we need to know the start and end date of each month that falls within the larger time period we defined in the variables above.
Since this is the sort of code snippet I might want to reuse in another project, I've wrapped things up in a little function called `get_month_start_and_end_dates()`. This takes the `start_date` (i.e. 2020-05-01) and the `end_date` (i.e. 2021-06-30), identifies all of the calendar months within, and returns a Pandas dataframe containing the `month_start` and `month_end` for each month, as well as a `period` column which holds the year-month value (i.e. 2020-05 for May 2020).
Since the Google Search Console API requires dates to be passed in as strings, not datetimes, we need to do a bit of data munging and use the datetime `strftime()` function to ensure that the final dates we return in the dataframe are strings, which will show as the `object` data type when you run `info()` on the dataframe.
```python
def get_month_start_and_end_dates(start_date, end_date):
    """Return a Pandas dataframe containing the month start and end dates between a start and end date."""

    start_date_object = dt.datetime.strptime(start_date, '%Y-%m-%d')
    end_date_object = dt.datetime.strptime(end_date, '%Y-%m-%d')

    # 'MS' yields month-start dates, 'M' yields month-end dates
    month_start = pd.date_range(start_date_object, end_date_object, freq='MS')
    month_end = pd.date_range(start_date_object, end_date_object, freq='M')

    df = pd.DataFrame({'month_start': month_start, 'month_end': month_end})
    df['period'] = df['month_start'].dt.strftime('%Y-%m')

    # The API expects dates as strings, so convert the datetimes
    df['month_start'] = df['month_start'].dt.strftime('%Y-%m-%d')
    df['month_end'] = df['month_end'].dt.strftime('%Y-%m-%d')

    return df[['period', 'month_start', 'month_end']]
```
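The heavy lifting here is done by the `pd.date_range()` frequency aliases: `'MS'` generates month-start dates and `'M'` generates month-end dates. A quick standalone check of what each produces over a small range:

```python
import pandas as pd

# 'MS' yields the first day of each month in the range,
# while 'M' yields the last day of each month
starts = pd.date_range('2020-05-01', '2020-07-31', freq='MS')
ends = pd.date_range('2020-05-01', '2020-07-31', freq='M')

print(starts.strftime('%Y-%m-%d').tolist())  # ['2020-05-01', '2020-06-01', '2020-07-01']
print(ends.strftime('%Y-%m-%d').tolist())    # ['2020-05-31', '2020-06-30', '2020-07-31']
```

Because both aliases are evaluated over the same range, the two arrays line up month-for-month, which is what lets the function zip them into one dataframe.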
Running `get_month_start_and_end_dates()` with our `start_date` and `end_date` variables returns a dataframe I've called `df_dates`, which contains the `period`, `month_start`, and `month_end` for each month detected between the two dates defined earlier. Running `df_dates.info()` confirms that these are stored as the `object` data type, so they should work correctly when passed to the Google Search Console API.
```python
df_dates = get_month_start_and_end_dates(start_date, end_date)
df_dates.head()
```
| | period | month_start | month_end |
|---|---|---|---|
| 0 | 2020-05 | 2020-05-01 | 2020-05-31 |
| 1 | 2020-06 | 2020-06-01 | 2020-06-30 |
| 2 | 2020-07 | 2020-07-01 | 2020-07-31 |
| 3 | 2020-08 | 2020-08-01 | 2020-08-31 |
| 4 | 2020-09 | 2020-09-01 | 2020-09-30 |
```python
df_dates.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   period       14 non-null     object
 1   month_start  14 non-null     object
 2   month_end    14 non-null     object
dtypes: object(3)
memory usage: 464.0+ bytes
```
Next, we can create another function to fetch our data. This takes four values: `key` is a string holding the path to your Google API Console client secrets JSON keyfile, which is used to authenticate; `site_url` is the URL of the Google Search Console property you want to query; and `start_date` and `end_date` refer to the 13-month period we defined at the beginning.
The first step of the function is to run the `get_month_start_and_end_dates()` function to create a dataframe called `df_dates`, which is used internally to construct each API query. We then create an empty Pandas dataframe called `df_metrics`, which contains columns for the metrics we're going to fetch from Search Console.
The next bit is somewhat inefficient, so it will be slow, but since Search Console can't group data by month there's no way around it. It iterates over each row in `df_dates` using the `iterrows()` function and runs a separate GSC query for each calendar month, passing the `month_start` and `month_end` for the given month into a standard `payload`, which is then passed to the `query_google_search_console()` function of EcommerceTools.
When the data are returned by `query_google_search_console()`, we extract the bits we need, calculate the metrics, and add them to a dictionary called `row`. In the first column we store the year-month in `period`, then count the number of unique pages and queries using `nunique()`, calculate the total number of impressions and clicks using `sum()`, and calculate the mean of the average click-through rate and position using `mean()`. (You can, of course, make this more accurate by calculating the CTR manually from the totals, instead of averaging an average.)
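To see why averaging an average can mislead, consider that `mean()` weights every page/query row equally, regardless of how many impressions each received, while a weighted CTR is computed from the totals. A minimal sketch using made-up numbers (the `impressions`, `clicks`, and `ctr` column names match those in the GSC output):

```python
import pandas as pd

# Two hypothetical rows: one high-impression page, one tiny one
df = pd.DataFrame({
    'impressions': [10000, 100],
    'clicks': [100, 50],
})
df['ctr'] = df['clicks'] / df['impressions'] * 100

# Naive mean of per-row CTRs: (1.0 + 50.0) / 2 = 25.5
naive_ctr = round(df['ctr'].mean(), 1)

# Weighted CTR from the totals: 150 / 10100 * 100 ≈ 1.5
weighted_ctr = round(df['clicks'].sum() / df['impressions'].sum() * 100, 1)

print(naive_ctr, weighted_ctr)  # 25.5 1.5
```

The low-impression row dominates the naive mean, so if you want a figure that reflects overall site performance, the weighted version is the one to report.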
Finally, we append the `row` dictionary to the `df_metrics` dataframe as a new row, and return the final dataframe when we're done.
```python
def get_monthly_google_search_console_data(key, site_url, start_date, end_date):
    """Return a Pandas dataframe containing monthly data from Google Search Console.

    Args:
        key (string): Path to client secrets JSON keyfile.
        site_url (string): URL of site to query in Google Search Console.
        start_date (string): Date string in YYYY-MM-DD format.
        end_date (string): Date string in YYYY-MM-DD format.

    Returns:
        df (dataframe): Pandas dataframe of monthly data.
    """

    df_dates = get_month_start_and_end_dates(start_date, end_date)
    df_metrics = pd.DataFrame(columns=['period', 'pages', 'queries', 'impressions',
                                       'clicks', 'avg_ctr', 'avg_position'])

    # Run one Search Console query per calendar month
    for index, row in df_dates.iterrows():
        payload = {
            'startDate': row['month_start'],
            'endDate': row['month_end'],
            'dimensions': ["date", "page", "query"],
            'rowLimit': 25000,
            'startRow': 0
        }

        df = seo.query_google_search_console(key, site_url, payload)

        row = {
            'period': row['period'],
            'pages': df['page'].nunique(),
            'queries': df['query'].nunique(),
            'impressions': df['impressions'].sum(),
            'clicks': df['clicks'].sum(),
            'avg_ctr': round(df['ctr'].mean(), 1),
            'avg_position': round(df['position'].mean(), 1),
        }

        # DataFrame.append() was removed in Pandas 2.0, so use concat() instead
        df_metrics = pd.concat([df_metrics, pd.DataFrame([row])], ignore_index=True)

    return df_metrics
```
Finally, we can run our function. By passing in the `key`, `site_url`, `start_date`, and `end_date`, we run a series of separate Google Search Console API queries and join the results together into a single dataframe which shows the site's organic search performance over time. Viewing data in this format is handy for reporting, because you can see at a glance how you performed in the previous month and the previous year.
```python
df = get_monthly_google_search_console_data(key, site_url, start_date, end_date)
df.sort_values(by='period', ascending=False)
```
| | period | pages | queries | impressions | clicks | avg_ctr | avg_position |
|---|---|---|---|---|---|---|---|
| 13 | 2021-06 | 1379 | 8211 | 65677 | 4869 | 9.0 | 23.9 |
| 12 | 2021-05 | 1396 | 9259 | 91022 | 5899 | 10.2 | 23.6 |
| 11 | 2021-04 | 1320 | 8283 | 91010 | 6707 | 11.3 | 21.7 |
| 10 | 2021-03 | 1333 | 8678 | 75536 | 5949 | 9.9 | 26.7 |
| 9 | 2021-02 | 1304 | 8949 | 63273 | 3970 | 7.2 | 30.3 |
| 8 | 2021-01 | 1231 | 7939 | 67214 | 4345 | 8.0 | 28.5 |
| 7 | 2020-12 | 1252 | 7936 | 62778 | 4163 | 8.0 | 27.8 |
| 6 | 2020-11 | 1278 | 8435 | 63144 | 4814 | 8.8 | 28.0 |
| 5 | 2020-10 | 1334 | 9017 | 62904 | 5587 | 10.8 | 25.0 |
| 4 | 2020-09 | 1309 | 8923 | 65400 | 6710 | 12.8 | 21.9 |
| 3 | 2020-08 | 1285 | 8739 | 68198 | 7938 | 14.9 | 18.9 |
| 2 | 2020-07 | 1252 | 8484 | 72982 | 9166 | 16.3 | 18.6 |
| 1 | 2020-06 | 1200 | 7709 | 84767 | 9368 | 15.4 | 17.2 |
| 0 | 2020-05 | 1122 | 7277 | 74350 | 7449 | 11.4 | 21.1 |
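If you want the year-on-year comparison as an explicit column rather than something you eyeball from the table, you can join the dataframe to a shifted copy of itself. A minimal sketch, assuming a `df` with `period` and `clicks` columns like the one above (the values here are taken from the table, trimmed to four months):

```python
import pandas as pd

# Hypothetical monthly totals in the same shape as the GSC output
df = pd.DataFrame({
    'period': ['2020-05', '2020-06', '2021-05', '2021-06'],
    'clicks': [7449, 9368, 5899, 4869],
})

# Derive the same-month-last-year period, e.g. 2021-06 -> 2020-06
periods = pd.PeriodIndex(df['period'], freq='M')
df['period_last_year'] = (periods - 12).strftime('%Y-%m')

# Self-join to put last year's clicks alongside this year's
df = df.merge(
    df[['period', 'clicks']].rename(
        columns={'period': 'period_last_year', 'clicks': 'clicks_last_year'}),
    on='period_last_year', how='left')

df['yoy_change_pct'] = round((df['clicks'] - df['clicks_last_year'])
                             / df['clicks_last_year'] * 100, 1)

print(df[['period', 'clicks', 'clicks_last_year', 'yoy_change_pct']])
```

Months with no matching prior-year period (the first twelve rows of a 13-month window) simply get `NaN`, which is usually what you want in a report. The same pattern works for impressions, CTR, or any other column.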
Matt Clarke, Thursday, July 08, 2021