How to use GAPandas to view your Google Analytics data

Learn how to use GAPandas to query the Google Analytics API and view, analyse, and visualise your web analytics data from within Pandas dataframes.

How to use GAPandas to view your Google Analytics data
5 minutes to read

Over the past decade I’ve written more Google Analytics API queries than I can remember. Initially, I favoured PHP for these (and still do for permanent web-based applications utilising GA data), but as Google Sheets became more sophisticated, I eventually switched to this for faster ad hoc analysis, following initial prototyping undertaken using the Query Explorer.

However, more recently, I’ve almost entirely switched over to Pandas and Jupyter notebooks for both ad hoc analyses and quick prototype models. The fact that you can combine your Python code with tables, charts, comments and models is just brilliant. If you regularly work with Google Analytics data, it’s something I’d definitely recommend.

Accessing Google Analytics data in Python

As usual with Google’s APIs, there is already a client API written in Python that does the heavy lifting involved in authenticating and querying the server. However, although it works fine in larger applications, it’s not conveniently packaged for use in Jupyter and it takes a little work to get it working with Pandas.

To save me valuable time at work, I spent the weekend creating a Python package to make it quicker and easier to analyse GA data in Pandas and Jupyter notebooks. While it’s certainly not complete, and could use some further work on forthcoming weekends, it’s been a big time saver at work and has helped me find out some interesting things about my personal website too.

Here’s how you can make use of it.

1. Install GAPandas

GAPandas can be found on the Python Package Index (PyPi) so you can install it using a simple pip (or pip3) command.

pip3 install gapandas

2. Create a client_secrets.json keyfile

To authenticate with the Google Analytics API you will need to obtain a client_secrets.json keyfile for a Service Account. Save your keyfile to a logical place in your directory tree so you can access it easily from all of your Jupyter notebooks.

3. Obtain your view ID

The next thing you’ll need is the view ID for the Google Analytics property you want to access. To find your view ID, you can do the following:

  1. Go to analytics.google.com
  2. Click the Admin button on the bottom left hand corner of the sidebar
  3. Select the Google Analytics Property you want to access using the dropdown menu at the top of the middle column
  4. Click the View Settings button at the top of the right column
  5. Copy the view ID (i.e. 12345678)

4. Create your first GAPandas script

Open up a Jupyter notebook by typing jupyter notebook in your terminal. You might want to create a virtual environment before you do this.

Create your connection

Load gapandas and import the connect and query modules. Obtain a service object by providing the get_service() function with the path to your client_secrets.json keyfile.

from gapandas import connect, query
service = connect.get_service('path/to/client_secrets.json')
Write your query

Now you have a connection, all you need to do is create a “payload” of API query parameters to send to the Google Analytics API. These are simple Python dictionaries, so should be intuitive to create, and they support all of the features listed in the Google Analytics API documentation. Here, we’ll fetch sessions, pageviews and bounces by date for the past 30 days since today.

payload = {
    'start_date': '30daysAgo',
    'end_date': 'today',
    'metrics': 'ga:sessions, ga:pageviews, ga:bounces',
    'dimensions': 'ga:date'
}
Run your query

To run your query and print a Pandas DataFrame containing your results, you simply pass your payload dictionary to the run_query() function along with your service object and your Google Analytics view ID. (You might want to store that in a variable to save repetition.)

results = query.run_query(service, '123456789', payload)
print(results)

Running multiple queries

As GAPandas returns a Pandas DataFrame (or the raw data if you provide the 'raw' flag on run_query()) you can manipulate the data using the normal Pandas techniques.

That means you can create and run multiple API queries and merge the data together in ways that would be much more fiddly in Google Sheets or PHP. Check it out by installing it with pip3 install gapandas.

Matt Clarke, Monday, March 01, 2021

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.