The Google Search Console (GSC) API is a great source of information for those working in SEO, marketing, or ecommerce. It can tell you which of your pages are appearing in Google search results, how many people are clicking through to your site, and which keywords are driving the impressions and clicks.
While you can query the Google Search Console API using Python, the standard approach is a bit long-winded, and requires quite a bit of code to authenticate, fetch, and display the data in a Pandas dataframe.
To make querying GSC data much easier, I added this functionality to my EcommerceTools package. EcommerceTools makes querying GSC very easy, and requires very little code. In this project, I’ll explain how you can use it to run some common GSC queries and display and manipulate the data in Pandas, all inside the comfort of a Jupyter notebook.
First, open a Jupyter notebook and import the `pandas` package and the `seo` module from my `ecommercetools` Python package. If you don’t have EcommerceTools, you can install it by entering `!pip3 install ecommercetools` into a cell in your notebook and then executing it with Shift + Enter. As some columns are quite wide, you can use the Pandas `set_option()` function to increase the maximum column width.
!pip3 install ecommercetools
import pandas as pd
from ecommercetools import seo
pd.set_option('display.max_colwidth', 100)
To save typing lots of repetitive values throughout your Jupyter notebook, you can create a few constants to refer to in the code. You’ll need one called `key` containing the path to your client secrets JSON keyfile, which you can obtain from the Google API Console. You’ll need others to hold the URL of the site you want to query, plus the start and end dates for the period you wish to extract via the GSC API.
key = "google-search-console.json"
site_url = "http://flyandlure.org"
start_date = "2019-01-01"
end_date = "2020-12-31"
One of the most common Google Search Console API queries you’ll want to perform is one to extract the number of clicks and impressions each page has generated, plus the click-through rate (CTR) and the average position or rank in the Google search results. You don’t need to tell the GSC API which metrics to return: the clicks, impressions, ctr, and position metrics come back by default with every query, so you only need to specify the dimensions, such as page.
To run the query, we simply create a Python dictionary called `payload` which contains the API query parameters we want to run. In the example below, we’re requesting the standard metrics for each `page` over our chosen period.
payload = {
'startDate': start_date,
'endDate': end_date,
'dimensions': ["page"],
'rowLimit': 10000,
'startRow': 0
}
df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
If you want to control the number of results returned, you can adjust the `rowLimit` value. Changing the `startRow` value allows you to paginate your API query, so you can retrieve more than the per-request maximum by spreading the data over multiple queries.
| | page | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
0 | http://flyandlure.org/articles/fly_fishing/still_water_trout_flies_how_and_when_to_fish_them | 4582 | 66651 | 6.87 | 18.01 |
1 | http://flyandlure.org/articles/fly_fishing/how_to_fly_fish_with_a_strike_indicator | 3164 | 71893 | 4.40 | 11.00 |
2 | http://flyandlure.org/articles/fly_fishing/how_to_fly_fish_with_droppers | 2849 | 57266 | 4.98 | 15.86 |
3 | http://flyandlure.org/articles/fly_tying/7_of_the_best_boobies_youll_ever_see | 2816 | 45005 | 6.26 | 9.10 |
4 | http://flyandlure.org/articles/fly_fishing_gear_reviews/loop_cross_sx_fly_rod_review | 2702 | 26830 | 10.07 | 8.80 |
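The pagination approach described above can be sketched with a small helper. This is a minimal sketch, assuming each request is capped at 25,000 rows (check the current Search Analytics API reference for the exact limit); `fetch_page` is a hypothetical stand-in for a function like `lambda p: seo.query_google_search_console(key, site_url, p)`.

```python
import pandas as pd

def build_page_payload(base_payload, start_row, row_limit=25000):
    """Return a copy of the payload adjusted for one page of results."""
    payload = dict(base_payload)
    payload['startRow'] = start_row
    payload['rowLimit'] = row_limit
    return payload

def fetch_all_rows(fetch_page, base_payload, row_limit=25000):
    """Call fetch_page(payload) repeatedly, incrementing startRow,
    until a short (or empty) page signals the end of the data."""
    frames = []
    start_row = 0
    while True:
        df = fetch_page(build_page_payload(base_payload, start_row, row_limit))
        if df is None or len(df) == 0:
            break
        frames.append(df)
        if len(df) < row_limit:
            break  # final, partial page
        start_row += row_limit
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the real API this would be called as, for example, `fetch_all_rows(lambda p: seo.query_google_search_console(key, site_url, p), payload)`.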
The other really common report you’ll want to run is one that fetches clicks, impressions, CTR, and position for each query keyword. The code for this is exactly the same as the example above; we’re simply substituting the `page` dimension for `query`. We get back a Pandas dataframe containing the data for each keyword, irrespective of whether it came from a single page or several pages.
payload = {
'startDate': start_date,
'endDate': end_date,
'dimensions': ["query"],
'rowLimit': 10000,
'startRow': 0
}
df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
| | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
0 | fly fishing near me | 1041 | 9790 | 10.63 | 6.80 |
1 | bombarda float | 609 | 4484 | 13.58 | 2.14 |
2 | emerger fly patterns | 582 | 5400 | 10.78 | 1.28 |
3 | fly and lure | 529 | 2130 | 24.84 | 1.06 |
4 | pheasant tail nymph | 407 | 21736 | 1.87 | 8.32 |
The other thing you can do is query GSC for multiple dimensions simultaneously. For example, you might want to create a Pandas dataframe containing the clicks, impressions, CTR, and average ranking position for each keyword and page. To do this, all you need to do is provide both `query` and `page` in the list you pass to `dimensions`.
payload = {
'startDate': start_date,
'endDate': end_date,
'dimensions': ["query", "page"],
'rowLimit': 10000,
'startRow': 0
}
df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
| | query | page | clicks | impressions | ctr | position |
|---|---|---|---|---|---|---|
0 | bombarda float | http://flyandlure.org/articles/lure_fishing/how_to_use_a_bombarda_float | 609 | 4484 | 13.58 | 2.14 |
1 | emerger fly patterns | http://flyandlure.org/articles/fly_tying/5_emerger_fly_patterns | 582 | 5400 | 10.78 | 1.29 |
2 | fly and lure | http://flyandlure.org/ | 438 | 2114 | 20.72 | 1.04 |
3 | pheasant tail nymph | http://flyandlure.org/articles/fly_tying/5_pheasant_tail_nymph_patterns_to_tie | 407 | 21735 | 1.87 | 8.33 |
4 | loop cross sx review | http://flyandlure.org/articles/fly_fishing_gear_reviews/loop_cross_sx_fly_rod_review | 376 | 1461 | 25.74 | 1.81 |
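Because this report returns one row per keyword and page combination, it also makes it easy to spot keyword cannibalisation, where several of your pages compete for the same term. A minimal sketch using toy data (not real GSC output):

```python
import pandas as pd

# Toy rows standing in for a query + page GSC dataframe.
df = pd.DataFrame({
    'query': ['booby fly', 'booby fly', 'bombarda float'],
    'page': ['/articles/a', '/articles/b', '/articles/c'],
    'clicks': [200, 98, 609],
})

# Count distinct ranking pages per keyword and keep those with more than one.
competing = (df.groupby('query')
               .agg(pages=('page', 'nunique'), clicks=('clicks', 'sum'))
               .reset_index())
competing = competing[competing['pages'] > 1]
print(competing)  # only 'booby fly' appears against two pages
```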
Let’s say you want to find all the keywords for which a given page on your website is ranking. This is also fairly easy to do using the Google Search Console API. Here we define the `query` dimension and then use the more advanced `dimensionFilterGroups` option. This takes a list of dictionaries, each containing the specific filters you want to apply to the query.

In the example below, I’m filtering by the `page` dimension and passing in the URL of a specific page on my website. Please note that, in this case, the “boobies” mentioned in the keyword data are a type of trout fly, and not actually anything more saucy!
payload = {
'startDate': start_date,
'endDate': end_date,
'dimensions': ["query"],
'dimensionFilterGroups': [{
'filters': [{
'dimension': 'page',
'expression': 'http://flyandlure.org/articles/fly_tying/7_of_the_best_boobies_youll_ever_see'}]
}],
'rowLimit': 10000,
'startRow': 0
}
df = seo.query_google_search_console(key, site_url, payload)
df.sort_values(by='clicks', ascending=False).head()
| | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
0 | booby fly | 298 | 4310 | 6.91 | 1.80 |
1 | best boobies | 175 | 3476 | 5.03 | 9.23 |
2 | booby flies | 146 | 1516 | 9.63 | 3.96 |
3 | boobie fly | 104 | 1785 | 5.83 | 1.63 |
4 | boobie flies | 53 | 478 | 11.09 | 2.14 |
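Each dictionary in the `filters` list can also take an `operator` key. The API defaults to an exact match, and to the best of my knowledge operators such as `contains` are also supported (check the Search Analytics API reference for the full list before relying on this). As a hedged sketch, this hypothetical payload would restrict results to queries for pages within one section of the site:

```python
# Constants repeated from above so the example is self-contained.
start_date = "2019-01-01"
end_date = "2020-12-31"

# Filter on a partial page URL rather than an exact one. The 'contains'
# operator is an assumption on my part -- verify it against the current
# Search Analytics API documentation.
payload = {
    'startDate': start_date,
    'endDate': end_date,
    'dimensions': ['query'],
    'dimensionFilterGroups': [{
        'filters': [{
            'dimension': 'page',
            'operator': 'contains',
            'expression': '/articles/fly_tying/',
        }]
    }],
    'rowLimit': 10000,
    'startRow': 0,
}
```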
Next, we’ll fetch the Google Search Console API keyword data for two separate periods and create a dataframe showing how the rankings for each keyword have changed. To do this, we’ll first run a query for April and store it in `df_april`, then do the same for May and store it in `df_may`.
payload = {
'startDate': '2021-04-01',
'endDate': '2021-04-30',
'dimensions': ["query"],
'rowLimit': 10000,
'startRow': 0
}
df_april = seo.query_google_search_console(key, site_url, payload)
df_april.sort_values(by='clicks', ascending=False).head()
| | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
0 | fly fishing near me | 104 | 1001 | 10.39 | 9.80 |
1 | emerger fly patterns | 84 | 705 | 11.91 | 1.57 |
2 | swinsty reservoir fishing | 52 | 157 | 33.12 | 1.01 |
3 | sunray fly line review | 48 | 130 | 36.92 | 2.17 |
4 | fewston reservoir fishing | 42 | 126 | 33.33 | 1.70 |
payload = {
'startDate': '2021-05-01',
'endDate': '2021-05-30',
'dimensions': ["query"],
'rowLimit': 10000,
'startRow': 0
}
df_may = seo.query_google_search_console(key, site_url, payload)
df_may.sort_values(by='clicks', ascending=False).head()
| | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
0 | emerger fly patterns | 82 | 698 | 11.75 | 1.52 |
1 | fly fishing near me | 78 | 784 | 9.95 | 10.35 |
2 | bombarda float | 44 | 349 | 12.61 | 2.57 |
3 | washing line fly fishing | 39 | 118 | 33.05 | 2.52 |
4 | best stillwater trout flies uk | 38 | 220 | 17.27 | 2.82 |
We need to merge the `df_april` and `df_may` dataframes together with a left join. However, doing this directly would append Pandas’ default `_x` and `_y` suffixes to the overlapping column names, which aren’t very descriptive. To prevent this, we’ll first add a suffix to each column so that the two dataframes contain unique column names.
df_april.columns = [str(col) + '_april' for col in df_april.columns]
df_may.columns = [str(col) + '_may' for col in df_may.columns]
df_april.columns
Index(['query_april', 'clicks_april', 'impressions_april', 'ctr_april',
'position_april'],
dtype='object')
df_may.columns
Index(['query_may', 'clicks_may', 'impressions_may', 'ctr_may',
'position_may'],
dtype='object')
Now we can use the Pandas `merge()` function to left join the two dataframes on the query column in each one. Since these now have different suffixes, we need to use the `left_on` and `right_on` arguments to define their names.
df_all = df_april.merge(df_may, how='left', left_on='query_april', right_on='query_may')
df_all.head()
| | query_april | clicks_april | impressions_april | ctr_april | position_april | query_may | clicks_may | impressions_may | ctr_may | position_may |
|---|---|---|---|---|---|---|---|---|---|---|
0 | fly fishing near me | 104 | 1001 | 10.39 | 9.80 | fly fishing near me | 78.0 | 784.0 | 9.95 | 10.35 |
1 | emerger fly patterns | 84 | 705 | 11.91 | 1.57 | emerger fly patterns | 82.0 | 698.0 | 11.75 | 1.52 |
2 | swinsty reservoir fishing | 52 | 157 | 33.12 | 1.01 | swinsty reservoir fishing | 25.0 | 116.0 | 21.55 | 2.26 |
3 | sunray fly line review | 48 | 130 | 36.92 | 2.17 | sunray fly line review | 38.0 | 108.0 | 35.19 | 3.44 |
4 | fewston reservoir fishing | 42 | 126 | 33.33 | 1.70 | fewston reservoir fishing | 27.0 | 114.0 | 23.68 | 2.86 |
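Incidentally, if you keep a shared `query` column in both dataframes and join on it directly, Pandas’ `merge()` can apply the suffixes itself via its `suffixes` argument, which avoids both the manual renaming and the duplicate query column. A toy sketch with made-up values:

```python
import pandas as pd

# Minimal stand-ins for the April and May keyword dataframes.
df_a = pd.DataFrame({'query': ['fly fishing near me'], 'clicks': [104]})
df_b = pd.DataFrame({'query': ['fly fishing near me'], 'clicks': [78]})

# Joining on 'query' suffixes only the overlapping metric columns,
# so there is no redundant query column to drop afterwards.
df_all = df_a.merge(df_b, how='left', on='query', suffixes=('_april', '_may'))
print(df_all.columns.tolist())  # ['query', 'clicks_april', 'clicks_may']
```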
That works, but it could do with some tidying up, so we’ll rename `query_april` to lose the now-irrelevant suffix, then reorder the columns (which drops the redundant `query_may` column in the process) so the matching pairs sit alongside each other.
df_all = df_all.rename(columns={'query_april':'query'})
df_all = df_all[['query', 'impressions_april', 'impressions_may', 'clicks_april','clicks_may',
'ctr_april', 'ctr_may', 'position_april', 'position_may'
]]
df_all.head()
| | query | impressions_april | impressions_may | clicks_april | clicks_may | ctr_april | ctr_may | position_april | position_may |
|---|---|---|---|---|---|---|---|---|---|
0 | fly fishing near me | 1001 | 784.0 | 104 | 78.0 | 10.39 | 9.95 | 9.80 | 10.35 |
1 | emerger fly patterns | 705 | 698.0 | 84 | 82.0 | 11.91 | 11.75 | 1.57 | 1.52 |
2 | swinsty reservoir fishing | 157 | 116.0 | 52 | 25.0 | 33.12 | 21.55 | 1.01 | 2.26 |
3 | sunray fly line review | 130 | 108.0 | 48 | 38.0 | 36.92 | 35.19 | 2.17 | 3.44 |
4 | fewston reservoir fishing | 126 | 114.0 | 42 | 27.0 | 33.33 | 23.68 | 1.70 | 2.86 |
Finally, we can calculate the change in the average position in Google’s search results by subtracting the April average rank from the May average rank. Since a lower position number is better, a positive change means the keyword has dropped down the rankings, and a negative change means it has moved up.
df_all['position_change'] = df_all['position_may'] - df_all['position_april']
df_all.head()
| | query | impressions_april | impressions_may | clicks_april | clicks_may | ctr_april | ctr_may | position_april | position_may | position_change |
|---|---|---|---|---|---|---|---|---|---|---|
0 | fly fishing near me | 1001 | 784.0 | 104 | 78.0 | 10.39 | 9.95 | 9.80 | 10.35 | 0.55 |
1 | emerger fly patterns | 705 | 698.0 | 84 | 82.0 | 11.91 | 11.75 | 1.57 | 1.52 | -0.05 |
2 | swinsty reservoir fishing | 157 | 116.0 | 52 | 25.0 | 33.12 | 21.55 | 1.01 | 2.26 | 1.25 |
3 | sunray fly line review | 130 | 108.0 | 48 | 38.0 | 36.92 | 35.19 | 2.17 | 3.44 | 1.27 |
4 | fewston reservoir fishing | 126 | 114.0 | 42 | 27.0 | 33.33 | 23.68 | 1.70 | 2.86 | 1.16 |
By changing the sort order on the `position_change` column, it’s then easy to see which keyword queries showed the biggest increase and the biggest decrease in ranking between the two periods. The latest version of EcommerceTools performs out-of-the-box period comparisons of Google Search Console data and even allows you to run SEO tests via the Causal Impact model.
df_all.sort_values(by='position_change', ascending=True).head()
| | query | impressions_april | impressions_may | clicks_april | clicks_may | ctr_april | ctr_may | position_april | position_may | position_change |
|---|---|---|---|---|---|---|---|---|---|---|
2332 | alwen reservoir | 3 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 95.67 | 2.00 | -93.67 |
3186 | best fly line for indicator nymphing | 5 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 94.80 | 3.00 | -91.80 |
6511 | fishing waist waders | 1 | 2.0 | 0 | 0.0 | 0.0 | 0.0 | 97.00 | 6.00 | -91.00 |
6713 | fly fish | 3 | 3.0 | 0 | 0.0 | 0.0 | 0.0 | 98.33 | 7.33 | -91.00 |
4543 | click pawl fly reel | 2 | 2.0 | 0 | 0.0 | 0.0 | 0.0 | 98.00 | 7.50 | -90.50 |
df_all.sort_values(by='position_change', ascending=False).head()
| | query | impressions_april | impressions_may | clicks_april | clicks_may | ctr_april | ctr_may | position_april | position_may | position_change |
|---|---|---|---|---|---|---|---|---|---|---|
8265 | grayling | 3 | 2.0 | 0 | 0.0 | 0.0 | 0.0 | 2.67 | 101.5 | 98.83 |
9736 | lake flies | 9 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 4.11 | 99.0 | 94.89 |
9387 | indicator leader fly fishing | 1 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 2.00 | 93.0 | 91.00 |
9588 | kid-friendly fishing lakes near me | 1 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 13.00 | 102.0 | 89.00 |
6701 | fly casting technique | 1 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 3.00 | 91.0 | 88.00 |
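As the two tables above suggest, the most extreme movers tend to be zero-click, low-impression long-tail terms whose average positions swing wildly. One way to focus on keywords that matter is to apply a minimum impressions threshold before sorting. The cut-off of 10 below is an arbitrary choice, and the data is a toy stand-in for `df_all`:

```python
import pandas as pd

# Toy stand-in for df_all: a noisy long-tail term and a genuine traffic driver.
df_all = pd.DataFrame({
    'query': ['grayling', 'fly fishing near me'],
    'impressions_april': [3, 1001],
    'impressions_may': [2.0, 784.0],
    'position_change': [98.83, 0.55],
})

# Keep only keywords with a meaningful number of impressions in both periods.
significant = df_all[(df_all['impressions_april'] >= 10) &
                     (df_all['impressions_may'] >= 10)]
print(significant['query'].tolist())  # ['fly fishing near me']
```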
Matt Clarke, Friday, June 11, 2021