Google Search Console contains a wealth of useful information for technical SEO. However, there are limits to what you can do through the front-end interface, and extracting the data you need for your analyses takes time. You can make life much easier by accessing what you need via the Google Search Console API instead.
The API gives you access to your data directly from within Python, so you can extract what you need, build automated scripts, create data pipelines to manipulate the data, connect it to other sources, and move the data into other systems. Here’s how it’s done.
For a quicker, easier way to access Google Search Console data, please check out my EcommerceTools Python package that contains a set of tools for extracting data from Google Search Console.
To access data from the Google Search Console API you’ll need to create a Service Account using the Google APIs Console, download the JSON client secrets key file to your machine, and add the service account’s email address as a user on your Search Console property. The process for doing this is somewhat convoluted, but you only need to do it once.
Open a Jupyter notebook or Python script and import google.oauth2, googleapiclient.discovery, requests, json, and pandas. Any packages you don’t have can be installed by entering pip3 install package-name in your terminal.
from google.oauth2 import service_account
from googleapiclient.discovery import build
import requests
import json
import pandas as pd
To make the Pandas dataframes a bit easier to read, you may want to increase the maximum column width from the default value using pd.set_option('display.max_colwidth', 150). (Recent versions of Pandas expect the full option name display.max_colwidth rather than the old max_colwidth shorthand.)
pd.set_option('display.max_colwidth', 150)
Next, create a variable called key and assign it the path to the client secrets JSON keyfile that authenticates your Service Account with the Google Search Console API.
key = 'google-search-console.json'
To handle the connection, we’ll create a basic function that passes the key and scope to the API and returns a service object that we can use to run queries. This is fine for Jupyter notebook use, but you’ll want to include some error handling functionality if you want to use it in production.
def connect(key):
    """Create a connection to the Google Search Console API and return service object.

    Args:
        key (string): Google Search Console JSON client secrets path.

    Returns:
        service (object): Google Search Console service object.
    """

    scope = ['https://www.googleapis.com/auth/webmasters']
    credentials = service_account.Credentials.from_service_account_file(key, scopes=scope)
    service = build('webmasters', 'v3', credentials=credentials)
    return service
Next, we’ll create a function called query() that takes our authenticated service object, the site_url identifying the Search Console property we want to query, and a dictionary called payload which contains our API query. We’ll use execute() to run this query on the API, then extract the rows, reformat the data so it is neat and tidy, and add the output to a Pandas dataframe using from_dict().
def query(service, site_url, payload):
    """Run a query on the Google Search Console API and return a dataframe of results.

    Args:
        service (object): Service object from connect().
        site_url (string): URL of Google Search Console property.
        payload (dict): API query payload dictionary.

    Returns:
        df (dataframe): Pandas dataframe containing requested data.
    """

    response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()

    results = []

    # Queries that match no data return a response with no 'rows' key,
    # so use get() to return an empty dataframe rather than a KeyError.
    for row in response.get('rows', []):
        data = {}

        # Pair each requested dimension name with its value in row['keys']
        for i in range(len(payload['dimensions'])):
            data[payload['dimensions'][i]] = row['keys'][i]

        data['clicks'] = row['clicks']
        data['impressions'] = row['impressions']
        data['ctr'] = round(row['ctr'] * 100, 2)
        data['position'] = round(row['position'], 2)
        results.append(data)

    return pd.DataFrame.from_dict(results)
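To see what the flattening step inside query() does, here’s a small self-contained sketch using a hypothetical response dictionary in the shape the Search Analytics API returns (the URLs, queries, and metric values below are illustrative, not real data):

```python
import pandas as pd

# Hypothetical payload and API response, mimicking the structure that
# service.searchanalytics().query(...).execute() returns.
payload = {'dimensions': ['page', 'device', 'query']}
response = {
    'rows': [
        {'keys': ['http://example.com/', 'MOBILE', 'example query'],
         'clicks': 10, 'impressions': 100, 'ctr': 0.1, 'position': 3.456},
        {'keys': ['http://example.com/about', 'DESKTOP', 'about example'],
         'clicks': 5, 'impressions': 50, 'ctr': 0.1, 'position': 7.891},
    ]
}

# Pair each dimension name with its value from row['keys'], then append
# the metrics, converting ctr to a percentage and rounding position.
results = []
for row in response['rows']:
    data = dict(zip(payload['dimensions'], row['keys']))
    data['clicks'] = row['clicks']
    data['impressions'] = row['impressions']
    data['ctr'] = round(row['ctr'] * 100, 2)
    data['position'] = round(row['position'], 2)
    results.append(data)

df = pd.DataFrame.from_dict(results)
print(df)
```

Each row comes back as a 'keys' list (one entry per requested dimension, in order) plus the four metrics, which is why the function zips dimension names to key values before building the dataframe.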
Finally, we can put these steps together. We’ll pass our key to connect() to get a service object. We’ll create a simple API query payload dictionary, define the site_url of the property we want to query, and then fetch the data using query(). This returns a Pandas dataframe that we can manipulate as we wish.
service = connect(key)

payload = {
    'startDate': "2019-01-01",
    'endDate': "2019-12-31",
    'dimensions': ["page", "device", "query"],
    'rowLimit': 100,
    'startRow': 0
}

site_url = "http://flyandlure.org"

df = query(service, site_url, payload)
df.head()
|    | page | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|---|
| 0 | http://flyandlure.org/ | fly and lure | 144 | 467 | 30.84 | 2.73 |
| 1 | http://flyandlure.org/articles/fly_fishing_gea... | simms freestone waders review | 134 | 698 | 19.20 | 4.01 |
| 2 | http://flyandlure.org/articles/fly_fishing_gea... | orvis encounter waders review | 87 | 449 | 19.38 | 4.57 |
| 3 | http://flyandlure.org/articles/fly_tying/9_zon... | zonker fly | 80 | 1542 | 5.19 | 2.76 |
| 4 | http://flyandlure.org/articles/fly_fishing/how... | cats whisker fly | 72 | 654 | 11.01 | 1.40 |
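One thing to bear in mind is that the API caps the number of rows returned per request (currently 25,000), so larger properties need pagination via startRow. Here’s a sketch of a helper that repeatedly calls a query function like the one above until a short page signals the end of the data; the query_all name and its run_query callable parameter are my own additions, not part of the API:

```python
import pandas as pd

def query_all(run_query, payload, row_limit=25000):
    """Fetch every page of results.

    run_query(payload) should return a Pandas dataframe, e.g.
    lambda p: query(service, site_url, p) using the functions above.
    """
    payload = dict(payload, rowLimit=row_limit, startRow=0)
    frames = []

    while True:
        df = run_query(payload)
        frames.append(df)

        # A page shorter than row_limit means we've reached the end.
        if len(df) < row_limit:
            break

        payload['startRow'] += row_limit

    return pd.concat(frames, ignore_index=True)
```

With the functions defined earlier, this could be called as df = query_all(lambda p: query(service, site_url, p), payload) to collect every row into a single dataframe.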
Matt Clarke, Friday, March 12, 2021