How to join Google Analytics and Google Search Console data

Learn how to use Python to connect your Google Search Console API data to your Google Analytics Reporting API data using Pandas and GAPandas.

How to join Google Analytics and Google Search Console data
Picture by Lukas, Pexels.
11 minutes to read

Neither Google Search Console nor Google Analytics gives you access to the data found in both systems in one place. However, with a bit of ingenuity and some relatively simple Python code, you can combine or blend data from both sources in a single report. Here’s how it’s done.

Load the packages

Open a Jupyter notebook and import the google.oauth2 and googleapiclient packages, so we can connect to the Google Search Console API, the requests and json packages, so can send our query, and the Pandas and GAPandas packages so we can query the Google Analytics API.

from google.oauth2 import service_account
from googleapiclient.discovery import build
import requests
import json
import pandas as pd
import gapandas as gp

Configure some set-up variables

Next, create some set-up variables containing the name of your Google Search Console client secrets JSON key file, and your Google Analytics client secrets JSON key file, plus the ID of your Google Analytics view, the URL of your site, and the period you want to examine.

key_gsc = "client-secrets-gsc.json"
key_ga = "client-secrets-ga.json"
view_ga = "123456789"
start_date = "2020-01-01"
end_date = "2020-01-31"
site_url = "http://flyandlure.org"
filename_gsc_pages = "gsc_pages.csv"
filename_ga_pages = "gsc_pages.csv"

Connect to Google Search Console

First, we’ll create a little function to create a connection to the Search Console API and return an authenticated service object. If this is the first time you’ve used the Search Console API, I’d recommend checking out my beginner’s guide to using the Search Console API which explains how to get things set up.

def connect(key):
    """Create a connection to the Google Search Console API and return service object.

    Args:
        key (string): Google Search Console JSON client secrets path.

    Returns:
        service (object): Google Search Console service object.
    """

    scope = ['https://www.googleapis.com/auth/webmasters']
    credentials = service_account.Credentials.from_service_account_file(key, 
                                                                        scopes=scope)
    service = build(
        'webmasters',
        'v3',
        credentials=credentials
    )

    return service
service = connect(key_gsc)

Convert Google Search Console data to a dataframe

Next, we’ll create a function to send our query payload to the API and return the results in a Pandas dataframe. This takes the service object from the function above, the site_url containing the name of the property, and a payload dictionary containing your API query.

def query(service, site_url, payload):
    """Run a query on the Google Search Console API and return a dataframe of results.

    Args:
        service (object): Service object from connect()
        site_url (string): URL of Google Search Console property
        payload (dict): API query payload dictionary

    Return:
        df (dataframe): Pandas dataframe containing requested data. 

    """

    response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()

    results = []

    for row in response['rows']:    
        data = {}

        for i in range(len(payload['dimensions'])):
            data[payload['dimensions'][i]] = row['keys'][i]

        data['clicks'] = row['clicks']
        data['impressions'] = row['impressions']
        data['ctr'] = round(row['ctr'] * 100, 2)
        data['position'] = round(row['position'], 2)        
        results.append(data)

    return pd.DataFrame.from_dict(results)

Fetch Google Search Console data by page

Now that’s in place, we’ll run a Search Console query to return all the pages viewed within the date range selected and write them into a Pandas dataframe called df_gsc. We’ll also back this up to a CSV file, so we can use it elsewhere if we need to.

payload = {
    'startDate': start_date,
    'endDate': end_date,
    'dimensions': ["page"],  
    'rowLimit': 25000,
    'startRow': 0
}

df_gsc = query(service, site_url, payload)
df_gsc.to_csv(filename_gsc_pages, index=False)
df_gsc.head()
page clicks impressions ctr position
0 http://flyandlure.org/articles/fly_fishing/win... 265 3456 7.67 13.02
1 http://flyandlure.org/articles/fly_tying/9_zon... 212 3980 5.33 12.00
2 http://flyandlure.org/articles/fly_fishing/how... 206 4430 4.65 9.63
3 http://flyandlure.org/articles/fly_tying/7_of_... 198 2883 6.87 7.27
4 http://flyandlure.org/articles/fly_fishing/how... 170 2049 8.30 6.08

Connect to the Google Analytics API

Next we’ll use GAPandas to create a connection to the Google Analytics API. By passing the key_ga filepath for our client secrets JSON keyfile, we get back a service object just like the one above, that we can then use to run an API query. If it’s your first time using this package, please check out my beginner’s guide to using GAPandas.

service = gp.get_service(key_ga)

Fetch Google Analytics data on landing pages

Next, we’ll create a payload dictionary containing a Google Analytics Reporting API query. This fetches data from the same period as our GSC query, and includes the ga:landingPagePath dimension and is set to only include the traffic from the ga:medium==organic;ga:source==google so it should match (approximately…) the data we got back from GSC. In reality, it never matches perfectly, but it is pretty close.

payload = {
    'start_date': start_date,
    'end_date': end_date,
    'metrics': 'ga:entrances, ga:users, ga:pageviews, ga:bounces, ga:transactions, ga:transactionRevenue, ga:transactionsPerSession',
    'dimensions': 'ga:landingPagePath',
    'filters': 'ga:medium==organic;ga:source==google',
    'sort': '-ga:entrances'
}
df_ga = gp.run_query(service, view_ga, payload)
df_ga.to_csv(filename_ga_pages, index=False)
df_ga.head()
landingPagePath entrances users pageviews bounces transactions transactionRevenue transactionsPerSession
0 /articles/fly_fishing/winter_fly_fishing_tips_... 287 246 462 226 0 0.0 0.0
1 /articles/fly_tying/9_zonker_fly_patterns_to_tie 274 212 350 241 0 0.0 0.0
2 /articles/fly_tying/7_of_the_best_boobies_youl... 237 202 370 191 0 0.0 0.0
3 /articles/fly_fishing/how_to_fly_fish_with_a_s... 209 191 278 181 0 0.0 0.0
4 /articles/fly_fishing/how_to_fish_the_blob_fly 202 180 368 148 0 0.0 0.0

Add the missing URL to the landing page path

Since the data in Google Search Console uses a dimension called page which contains the absolute URL, and Google Analytics uses a relative URL (minus the site name and protocol), we need to prepend the start part of the URL to the landingPagePath. This is dead easy in Pandas.

df_ga['page'] = site_url+df_ga['landingPagePath']
df_ga.head(10)
landingPagePath entrances users pageviews bounces transactions transactionRevenue transactionsPerSession page
0 /articles/fly_fishing/winter_fly_fishing_tips_... 287 246 462 226 0 0.0 0.0 http://flyandlure.org/articles/fly_fishing/win...
1 /articles/fly_tying/9_zonker_fly_patterns_to_tie 274 212 350 241 0 0.0 0.0 http://flyandlure.org/articles/fly_tying/9_zon...
2 /articles/fly_tying/7_of_the_best_boobies_youl... 237 202 370 191 0 0.0 0.0 http://flyandlure.org/articles/fly_tying/7_of_...
3 /articles/fly_fishing/how_to_fly_fish_with_a_s... 209 191 278 181 0 0.0 0.0 http://flyandlure.org/articles/fly_fishing/how...
4 /articles/fly_fishing/how_to_fish_the_blob_fly 202 180 368 148 0 0.0 0.0 http://flyandlure.org/articles/fly_fishing/how...
5 /articles/fly_tying/7_of_the_best_fly_patterns... 194 124 300 143 0 0.0 0.0 http://flyandlure.org/articles/fly_tying/7_of_...
6 / 185 65 826 67 0 0.0 0.0 http://flyandlure.org/
7 /articles/fly_tying/6_stunning_freshwater_shri... 175 144 215 153 0 0.0 0.0 http://flyandlure.org/articles/fly_tying/6_stu...
8 /articles/fly_tying/5_pheasant_tail_nymph_patt... 159 137 213 130 0 0.0 0.0 http://flyandlure.org/articles/fly_tying/5_phe...
9 /articles/fly_fishing/how_to_fish_snake_flies_... 149 126 192 128 0 0.0 0.0 http://flyandlure.org/articles/fly_fishing/how...

Connect the GA data to the GSC data

Finally, we can now use the Pandas merge() function to join the df_gsc dataframe to the df_ga dataframe by using the page column as our key. This should give us back a dataframe containing the data from both sources, neatly connected to show data from Google Search Console that we can’t easily obtain in GA. As I mentioned above, it’s never perfect, but it should be fairly close!

df_all = pd.merge(df_gsc, df_ga, how='right', on=['page'])
df_all.head()
page clicks impressions ctr position landingPagePath entrances users pageviews bounces transactions transactionRevenue transactionsPerSession
0 http://flyandlure.org/articles/fly_fishing/win... 265.0 3456.0 7.67 13.02 /articles/fly_fishing/winter_fly_fishing_tips_... 287 246 462 226 0 0.0 0.0
1 http://flyandlure.org/articles/fly_tying/9_zon... 212.0 3980.0 5.33 12.00 /articles/fly_tying/9_zonker_fly_patterns_to_tie 274 212 350 241 0 0.0 0.0
2 http://flyandlure.org/articles/fly_tying/7_of_... 198.0 2883.0 6.87 7.27 /articles/fly_tying/7_of_the_best_boobies_youl... 237 202 370 191 0 0.0 0.0
3 http://flyandlure.org/articles/fly_fishing/how... 206.0 4430.0 4.65 9.63 /articles/fly_fishing/how_to_fly_fish_with_a_s... 209 191 278 181 0 0.0 0.0
4 http://flyandlure.org/articles/fly_fishing/how... 170.0 2049.0 8.30 6.08 /articles/fly_fishing/how_to_fish_the_blob_fly 202 180 368 148 0 0.0 0.0

Matt Clarke, Saturday, March 13, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Joining Data with pandas

Learn to combine data from multiple tables by joining data together using pandas.

Start course for FREE

Comments