Over the past decade I’ve written more Google Analytics API queries than I can remember. Initially, I favoured PHP for these (and still do for permanent web-based applications utilising GA data), but as Google Sheets became more sophisticated, I eventually switched to this for faster ad hoc analysis, following initial prototyping undertaken using the Query Explorer.
However, more recently, I’ve almost entirely switched over to Pandas and Jupyter notebooks for both ad hoc analyses and quick prototype models. The fact that you can combine your Python code with tables, charts, comments and models is just brilliant. If you regularly work with Google Analytics data, it’s something I’d definitely recommend.
As usual with Google’s APIs, there is already a client API written in Python that does the heavy lifting involved in authenticating and querying the server. However, although it works fine in larger applications, it’s not conveniently packaged for use in Jupyter and it takes a little work to get it working with Pandas.
To save me valuable time at work, I spent the weekend creating a Python package to make it quicker and easier to analyse GA data in Pandas and Jupyter notebooks. While it’s certainly not complete, and could use some further work on forthcoming weekends, it’s been a big time saver at work and has helped me find out some interesting things about my personal website too.
Here’s how you can make use of it.
GAPandas can be found on the Python Package Index (PyPi) so you can install it using a simple pip
(or pip3
) command.
pip3 install gapandas
To authenticate with the Google Analytics API you will need to obtain a client_secrets.json
keyfile for a Service
Account. Save your keyfile to a logical place in your directory tree so you can access it easily from all of your Jupyter notebooks.
The next thing you’ll need is the view ID for the Google Analytics property you want to access. To find your view ID, you can do the following:
Open up a Jupyter notebook by typing jupyter notebook
in your terminal. You might want to create a virtual environment before you do this.
Load gapandas
and import the connect
and query
modules. Obtain a service object by providing the get_service()
function with the path to your client_secrets.json
keyfile.
from gapandas import connect, query
service = connect.get_service('path/to/client_secrets.json')
Now you have a connection, all you need to do is create a “payload” of API query parameters to send to the Google Analytics API. These are simple Python dictionaries, so should be intuitive to create, and they support all of the features listed in the Google Analytics API documentation. Here, we’ll fetch sessions, pageviews and bounces by date for the past 30 days since today.
payload = {
'start_date': '30daysAgo',
'end_date': 'today',
'metrics': 'ga:sessions, ga:pageviews, ga:bounces',
'dimensions': 'ga:date'
}
To run your query and print a Pandas DataFrame containing your results, you simply pass your payload dictionary to the run_query()
function along with your service object and your Google Analytics view ID. (You might want to store that in a variable to save repetition.)
results = query.run_query(service, '123456789', payload)
print(results)
As GAPandas returns a Pandas DataFrame (or the raw data if you provide the 'raw'
flag on run_query()
) you can manipulate the data using the normal Pandas techniques.
That means you can create and run multiple API queries and merge the data together in ways that would be much more fiddly in Google Sheets or PHP. Check it out by installing it with pip3 install gapandas
.
Matt Clarke, Monday, March 01, 2021