How to query the Google Search Console API with EcommerceTools

EcommerceTools makes it quick and easy to query the Google Search Console API and display the data in Pandas. Here's how to do it.

The Google Search Console (GSC) API is a great source of information for those working in SEO, marketing, or ecommerce. It can tell you which of your pages are appearing in Google search results, how many people are clicking through to your site, and which keywords are driving the impressions and clicks.

While you can query the Google Search Console API using Python, the standard approach is a bit long-winded, and requires quite a bit of code to authenticate, fetch, and display the data in a Pandas dataframe.

To make querying GSC data much easier, I added this functionality to my EcommerceTools package, which lets you fetch the data with very little code. In this post, I’ll explain how you can use it to run some common GSC queries and display and manipulate the data in Pandas, all from the comfort of a Jupyter notebook.

Load the packages

First, open a Jupyter notebook and import the pandas package and the seo module from my ecommercetools Python package. If you don’t have EcommerceTools installed, you can install it by entering !pip3 install ecommercetools into a cell in your notebook and executing it with shift and enter. Since some of the columns returned are quite wide, it’s also worth using the Pandas set_option() function to increase the maximum column width.

!pip3 install ecommercetools
import pandas as pd
from ecommercetools import seo
pd.set_option('display.max_colwidth', 100)

Define some constants

To avoid typing the same values repeatedly throughout your Jupyter notebook, you can create a few constants to refer to in the code. You’ll need one called key containing the path to your client secrets JSON keyfile, which you can obtain from the Google API Console. You’ll also need constants to hold the URL of the site you want to query and the start and end dates of the period you wish to extract via the GSC API.

key = "google-search-console.json"
site_url = "http://flyandlure.org"
start_date = "2019-01-01"
end_date = "2020-12-31"

Fetch clicks, impressions, CTR, and average position by page

One of the most common Google Search Console API queries you’ll want to perform is one that extracts the number of clicks and impressions each page has generated, plus the click-through rate (CTR) and the average position or rank in the Google search results. You don’t need to tell the GSC API which metrics you want: clicks, impressions, ctr, and position are returned by default with every API query. You only need to specify the dimension to break them down by, in this case page.

To run the query, we simply create a Python dictionary called payload containing the API query parameters we want to use. In the example below, we’re requesting the standard metrics for each page over our chosen period.

payload = {
    'startDate': start_date, 
    'endDate': end_date,
    'dimensions': ["page"],  
    'rowLimit': 10000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()

If you want to control the number of results returned, you can adjust the rowLimit value. Changing the startRow value lets you paginate your API query, so you can retrieve more rows than the per-request maximum by spreading the request over multiple queries (see the sketch after the output table below).

page clicks impressions ctr position
0 http://flyandlure.org/articles/fly_fishing/still_water_trout_flies_how_and_when_to_fish_them 4582 66651 6.87 18.01
1 http://flyandlure.org/articles/fly_fishing/how_to_fly_fish_with_a_strike_indicator 3164 71893 4.40 11.00
2 http://flyandlure.org/articles/fly_fishing/how_to_fly_fish_with_droppers 2849 57266 4.98 15.86
3 http://flyandlure.org/articles/fly_tying/7_of_the_best_boobies_youll_ever_see 2816 45005 6.26 9.10
4 http://flyandlure.org/articles/fly_fishing_gear_reviews/loop_cross_sx_fly_rod_review 2702 26830 10.07 8.80
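
If you need to page through a large property, here’s a minimal pagination sketch. It assumes seo.query_google_search_console() returns an empty dataframe once startRow moves past the last available row, so check the behaviour of your EcommerceTools version before relying on it.

import pandas as pd
from ecommercetools import seo

row_limit = 10000
start_row = 0
chunks = []

while True:
    payload = {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ["page"],
        'rowLimit': row_limit,
        'startRow': start_row
    }

    # Fetch the next batch of rows and stop when nothing comes back
    chunk = seo.query_google_search_console(key, site_url, payload)
    if chunk is None or len(chunk) == 0:
        break

    chunks.append(chunk)
    start_row += row_limit

# Stitch the batches together into a single dataframe
df_all_pages = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()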

Fetch clicks, impressions, CTR, and average position by query

The other really common report you’ll want to run is one that fetches clicks, impressions, CTR, and position for each query keyword. The code is exactly the same as the example above, but this time we pass query instead of page as the dimension. We get back a Pandas dataframe containing the data for each keyword, irrespective of whether it came from a single page or several pages.

payload = {
    'startDate': start_date, 
    'endDate': end_date,
    'dimensions': ["query"],
    'rowLimit': 10000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
query clicks impressions ctr position
0 fly fishing near me 1041 9790 10.63 6.80
1 bombarda float 609 4484 13.58 2.14
2 emerger fly patterns 582 5400 10.78 1.28
3 fly and lure 529 2130 24.84 1.06
4 pheasant tail nymph 407 21736 1.87 8.32
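
Note that the ctr column here is expressed as a percentage; the raw Search Analytics API returns CTR as a fraction, so the values shown have evidently been scaled. If you ever want to recompute it yourself from the raw counts, it’s a one-liner:

# Recompute click-through rate as a percentage from clicks and impressions
df['ctr_check'] = (df['clicks'] / df['impressions'] * 100).round(2)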

Fetch data by query and page

The other thing that you can do is query GSC for multiple dimensions simultaneously. For example, you might want to create a Pandas dataframe that contains the clicks, impressions, CTR, and average ranking position for each keyword and page. To do this, all you need to do is provide both the query and page arguments in the list you pass to dimensions.

payload = {
    'startDate': start_date, 
    'endDate': end_date,
    'dimensions': ["query", "page"],
    'rowLimit': 10000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
query page clicks impressions ctr position
0 bombarda float http://flyandlure.org/articles/lure_fishing/how_to_use_a_bombarda_float 609 4484 13.58 2.14
1 emerger fly patterns http://flyandlure.org/articles/fly_tying/5_emerger_fly_patterns 582 5400 10.78 1.29
2 fly and lure http://flyandlure.org/ 438 2114 20.72 1.04
3 pheasant tail nymph http://flyandlure.org/articles/fly_tying/5_pheasant_tail_nymph_patterns_to_tie 407 21735 1.87 8.33
4 loop cross sx review http://flyandlure.org/articles/fly_fishing_gear_reviews/loop_cross_sx_fly_rod_review 376 1461 25.74 1.81

Fetch keyword data for a specific page

Let’s say you want to find all the keywords for which a given page on your website is ranking. This is also fairly easy to do using the Google Search Console API. Here we’d define the query dimension and then use the more advanced dimensionFilterGroups option. This takes a list of dictionaries, each containing the specific filters you want to apply to the query.

In the example below, I’m filtering by the page dimension and passing in the URL of a specific page on my website. Please note that, in this case, the “boobies” mentioned in the keyword data are a type of trout fly, and not actually anything more saucy!

payload = {
    'startDate': start_date, 
    'endDate': end_date,
    'dimensions': ["query"],
    'dimensionFilterGroups': [{
        'filters': [{
            'dimension': 'page',
            'expression': 'http://flyandlure.org/articles/fly_tying/7_of_the_best_boobies_youll_ever_see'}]
    }],   
    'rowLimit': 10000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
query clicks impressions ctr position
0 booby fly 298 4310 6.91 1.80
1 best boobies 175 3476 5.03 9.23
2 booby flies 146 1516 9.63 3.96
3 boobie fly 104 1785 5.83 1.63
4 boobie flies 53 478 11.09 2.14
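
The filters also support an operator field; the underlying Search Analytics API accepts values such as equals (the default), contains, and notContains. As a sketch, here’s how you could restrict the report to queries containing a particular term, keeping the rest of the payload the same:

payload = {
    'startDate': start_date,
    'endDate': end_date,
    'dimensions': ["query"],
    'dimensionFilterGroups': [{
        'filters': [{
            'dimension': 'query',
            'operator': 'contains',
            'expression': 'booby'}]
    }],
    'rowLimit': 10000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)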

Comparing changes in search ranking over different periods

Next, we’ll fetch the Google Search Console API keyword data for two separate periods and create a dataframe showing how the rankings for each keyword have changed. To do this, we’ll first run a query for April 2021 and store it in df_april, and then do the same for May and store it in df_may.

payload = {
    'startDate': '2021-04-01', 
    'endDate': '2021-04-30',
    'dimensions': ["query"],
    'rowLimit': 10000,
    'startRow': 0
}

df_april = seo.query_google_search_console(key, site_url, payload)
df_april.sort_values(by='clicks', ascending=False).head()
query clicks impressions ctr position
0 fly fishing near me 104 1001 10.39 9.80
1 emerger fly patterns 84 705 11.91 1.57
2 swinsty reservoir fishing 52 157 33.12 1.01
3 sunray fly line review 48 130 36.92 2.17
4 fewston reservoir fishing 42 126 33.33 1.70

payload = {
    'startDate': '2021-05-01', 
    'endDate': '2021-05-31',
    'dimensions': ["query"],
    'rowLimit': 10000,
    'startRow': 0
}

df_may = seo.query_google_search_console(key, site_url, payload)
df_may.sort_values(by='clicks', ascending=False).head()
query clicks impressions ctr position
0 emerger fly patterns 82 698 11.75 1.52
1 fly fishing near me 78 784 9.95 10.35
2 bombarda float 44 349 12.61 2.57
3 washing line fly fishing 39 118 33.05 2.52
4 best stillwater trout flies uk 38 220 17.27 2.82

We need to merge the df_april and df_may dataframes together with a left join. However, doing this on its own would leave Pandas to distinguish the overlapping column names by appending its default _x and _y suffixes, which aren’t very informative. To prevent this, we’ll add a suffix to each column first, so the two dataframes contain unique column names.

df_april.columns = [str(col) + '_april' for col in df_april.columns]
df_may.columns = [str(col) + '_may' for col in df_may.columns]
df_april.columns
Index(['query_april', 'clicks_april', 'impressions_april', 'ctr_april',
       'position_april'],
      dtype='object')
df_may.columns
Index(['query_may', 'clicks_may', 'impressions_may', 'ctr_may',
       'position_may'],
      dtype='object')

Now we can use the Pandas merge() function to left join the two dataframes on the query column in each one. Since these columns now have different names, we need to use the left_on and right_on arguments to specify them.

df_all = df_april.merge(df_may, how='left', left_on='query_april', right_on='query_may')
df_all.head()
query_april clicks_april impressions_april ctr_april position_april query_may clicks_may impressions_may ctr_may position_may
0 fly fishing near me 104 1001 10.39 9.80 fly fishing near me 78.0 784.0 9.95 10.35
1 emerger fly patterns 84 705 11.91 1.57 emerger fly patterns 82.0 698.0 11.75 1.52
2 swinsty reservoir fishing 52 157 33.12 1.01 swinsty reservoir fishing 25.0 116.0 21.55 2.26
3 sunray fly line review 48 130 36.92 2.17 sunray fly line review 38.0 108.0 35.19 3.44
4 fewston reservoir fishing 42 126 33.33 1.70 fewston reservoir fishing 27.0 114.0 23.68 2.86
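
As an aside, if you skipped the manual renaming, the Pandas merge() function could append the suffixes for you via its suffixes argument, provided you join on the shared query column. The df_april_raw and df_may_raw names below are hypothetical, un-renamed copies of the April and May dataframes:

# Hypothetical un-renamed dataframes, as returned by the two API queries
df_all_alt = df_april_raw.merge(df_may_raw, how='left', on='query',
                                suffixes=('_april', '_may'))

Joining on query also means only one query column appears in the output, so there’s nothing to drop afterwards.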

The merged dataframe works, but it could do with some tidying up, so we’ll next drop the redundant query_may column, rename query_april to remove the now-unnecessary suffix, and reorder the columns so the matching pairs sit alongside each other.

df_all = df_all.rename(columns={'query_april':'query'})
df_all = df_all[['query', 'impressions_april', 'impressions_may', 'clicks_april','clicks_may',
                 'ctr_april', 'ctr_may', 'position_april', 'position_may'
                ]]
df_all.head()
query impressions_april impressions_may clicks_april clicks_may ctr_april ctr_may position_april position_may
0 fly fishing near me 1001 784.0 104 78.0 10.39 9.95 9.80 10.35
1 emerger fly patterns 705 698.0 84 82.0 11.91 11.75 1.57 1.52
2 swinsty reservoir fishing 157 116.0 52 25.0 33.12 21.55 1.01 2.26
3 sunray fly line review 130 108.0 48 38.0 36.92 35.19 2.17 3.44
4 fewston reservoir fishing 126 114.0 42 27.0 33.33 23.68 1.70 2.86

Finally, we can calculate the change in the average search position by subtracting the average rank for April from the average rank for May. Since a lower position number means a higher ranking, a negative position_change indicates the keyword has moved up the results, while a positive value means it has dropped.

df_all['position_change'] = df_all['position_may'] - df_all['position_april']
df_all.head()
query impressions_april impressions_may clicks_april clicks_may ctr_april ctr_may position_april position_may position_change
0 fly fishing near me 1001 784.0 104 78.0 10.39 9.95 9.80 10.35 0.55
1 emerger fly patterns 705 698.0 84 82.0 11.91 11.75 1.57 1.52 -0.05
2 swinsty reservoir fishing 157 116.0 52 25.0 33.12 21.55 1.01 2.26 1.25
3 sunray fly line review 130 108.0 48 38.0 36.92 35.19 2.17 3.44 1.27
4 fewston reservoir fishing 126 114.0 42 27.0 33.33 23.68 1.70 2.86 1.16
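
One thing to watch: because we used a left join, any keyword that appeared in April but not in May has NaN values in its _may columns, so its position_change is also NaN. A minimal sketch for excluding those rows before sorting (you could equally keep them and analyse them separately):

# Drop keywords with no May data, so position_change is always populated
df_all = df_all.dropna(subset=['position_may'])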

By changing the sort order on the position_change column, it’s then easy to see which keyword queries showed the biggest increases and the biggest decreases in ranking between the two periods. The latest version of EcommerceTools performs out-of-the-box period comparisons of Google Search Console data and even allows you to run SEO tests via the Causal Impact model.

df_all.sort_values(by='position_change', ascending=True).head()
query impressions_april impressions_may clicks_april clicks_may ctr_april ctr_may position_april position_may position_change
2332 alwen reservoir 3 1.0 0 0.0 0.0 0.0 95.67 2.00 -93.67
3186 best fly line for indicator nymphing 5 1.0 0 0.0 0.0 0.0 94.80 3.00 -91.80
6511 fishing waist waders 1 2.0 0 0.0 0.0 0.0 97.00 6.00 -91.00
6713 fly fish 3 3.0 0 0.0 0.0 0.0 98.33 7.33 -91.00
4543 click pawl fly reel 2 2.0 0 0.0 0.0 0.0 98.00 7.50 -90.50

df_all.sort_values(by='position_change', ascending=False).head()
query impressions_april impressions_may clicks_april clicks_may ctr_april ctr_may position_april position_may position_change
8265 grayling 3 2.0 0 0.0 0.0 0.0 2.67 101.5 98.83
9736 lake flies 9 1.0 0 0.0 0.0 0.0 4.11 99.0 94.89
9387 indicator leader fly fishing 1 1.0 0 0.0 0.0 0.0 2.00 93.0 91.00
9588 kid-friendly fishing lakes near me 1 1.0 0 0.0 0.0 0.0 13.00 102.0 89.00
6701 fly casting technique 1 1.0 0 0.0 0.0 0.0 3.00 91.0 88.00

Matt Clarke, Friday, June 11, 2021

Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master’s degree in Internet Retailing (plus two other Master’s degrees in different fields) and specialises in the technical side of ecommerce and marketing.